java - Regex search taking increasingly long time -


My regex is taking increasingly long to match (about 30 seconds by the 5th time), but it needs to be applied for around 500 rounds of matches. I suspect catastrophic backtracking. Please help! How can I optimize this regex:

String regex = "<tr bgcolor=\"ffffff\">\\s*?<td width=\"20%\"><b>((?:.|\\s)+?): *?</b></td>\\s*?<td width=\"80%\">((?:.|\\s)*?)(?=(?:</td>\\s*?</tr>\\s*?<tr bgcolor=\"ffffff\">)|(?:</td>\\s*?</tr>\\s*?</table>\\s*?<b>tags</b>))";

Edit: since it was not clear (my bad): I am trying to take an HTML-formatted document and reformat it by extracting the 2 search groups and adding formatting afterwards.

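The alternation (?:.|\\s)+? is very inefficient, as it involves a lot of backtracking: both branches can match the same characters (a space, for example), so the engine keeps an extra choice point at every position.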

If you want to match any characters, including whitespace and line breaks, with a regex, do it with

[\\s\\S]*?

or enable singleline mode with (?s) (or the Pattern.DOTALL flag) and use . (e.g. (?s)start(.*?)end).
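As a rough sketch of the DOTALL variant applied to the pattern from the question: the class name RowExtractor, the extract method and the sample HTML in main are made up for illustration, and the lazy quantifiers on \\s and on the space before </b> are simplified to greedy ones on the assumption that this does not change what is matched, since each is followed by a literal it cannot consume.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question or answer.
final class RowExtractor {

    // Same shape as the question's pattern, with (?:.|\\s) replaced by a plain dot.
    // Pattern.DOTALL lets the dot match line terminators as well.
    private static final Pattern ROW = Pattern.compile(
            "<tr bgcolor=\"ffffff\">\\s*<td width=\"20%\"><b>(.+?): *</b></td>\\s*"
            + "<td width=\"80%\">(.*?)"
            + "(?=</td>\\s*</tr>\\s*(?:<tr bgcolor=\"ffffff\">|</table>\\s*<b>tags</b>))",
            Pattern.DOTALL);

    // Returns one {label, value} pair per matched table row.
    static List<String[]> extract(String html) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            rows.add(new String[] { m.group(1), m.group(2) });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample input in the layout the pattern expects.
        String html = "<table>\n"
                + "<tr bgcolor=\"ffffff\">\n"
                + "<td width=\"20%\"><b>Name: </b></td>\n"
                + "<td width=\"80%\">Alice\nSmith</td>\n"
                + "</tr>\n"
                + "<tr bgcolor=\"ffffff\">\n"
                + "<td width=\"20%\"><b>City:</b></td>\n"
                + "<td width=\"80%\">Paris</td>\n"
                + "</tr>\n"
                + "</table>\n"
                + "<b>tags</b>";
        for (String[] row : extract(html)) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

Compiling the pattern once into a static field also avoids recompiling it on each of the roughly 500 rounds. The lazy (.*?) can still scan a long way on rows that never reach the lookahead, so this only removes the alternation overhead; it does not make the regex a substitute for an HTML parser.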

Note: to manipulate HTML, use a dedicated parser, like jsoup. Here is a post discussing Java HTML parsers.

