java - Regex search taking increasingly long time
My regex is taking increasingly long to match (about 30 seconds by the 5th time), and it needs to be applied for around 500 rounds of matches. I suspect catastrophic backtracking. Please help! How can I optimize this regex:
String regex = "<tr bgcolor=\"ffffff\">\\s*?<td width=\"20%\"><b>((?:.|\\s)+?): *?</b></td>\\s*?<td width=\"80%\">((?:.|\\s)*?)(?=(?:</td>\\s*?</tr>\\s*?<tr bgcolor=\"ffffff\">)|(?:</td>\\s*?</tr>\\s*?</table>\\s*?<b>tags</b>))";
Edit: Since it was not clear (my bad): I am trying to take an HTML-formatted document and reformat it by extracting the two search groups and adding formatting afterwards.
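The alternation (?:.|\\s)+? is inefficient, as it involves a lot of backtracking: both branches can match a whitespace character, so the engine has two ways to consume each one and may retry every combination when the rest of the pattern fails.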
If you want to match any characters, including whitespace, with a regex, do it with [\\s\\S]*? or enable singleline mode with (?s) (or the Pattern.DOTALL flag) and just use . (e.g. (?s)start(.*?)end).
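As a rough sketch of that suggestion, here is the pattern from the question with only the (?:.|\\s) alternation swapped for a plain dot under Pattern.DOTALL. The class name DotAllDemo, the method extractRows, and the sample input are made up for illustration, and the row layout is assumed to be exactly what the question's regex expects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllDemo {
    // Same pattern as in the question, except that (?:.|\\s) is replaced by "."
    // and Pattern.DOTALL lets "." match line terminators as well.
    // Writing [\\s\\S] instead of "." would work without the flag.
    private static final Pattern ROW = Pattern.compile(
        "<tr bgcolor=\"ffffff\">\\s*?<td width=\"20%\"><b>(.+?): *?</b></td>\\s*?"
        + "<td width=\"80%\">(.*?)"
        + "(?=(?:</td>\\s*?</tr>\\s*?<tr bgcolor=\"ffffff\">)"
        + "|(?:</td>\\s*?</tr>\\s*?</table>\\s*?<b>tags</b>))",
        Pattern.DOTALL);

    // Returns one {label, value} pair per matched table row.
    static List<String[]> extractRows(String html) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            rows.add(new String[] { m.group(1), m.group(2) });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical input shaped like the markup the question's regex targets.
        String html = "<table>\n<tr bgcolor=\"ffffff\">\n"
            + "<td width=\"20%\"><b>Name: </b></td>\n"
            + "<td width=\"80%\">line one\nline two</td>\n</tr>\n"
            + "</table>\n<b>tags</b>";
        for (String[] row : extractRows(html)) {
            System.out.println(row[0] + " = " + row[1]);
        }
    }
}
```

Compiling the pattern once into a static final field also avoids recompiling it on each of the ~500 rounds mentioned in the question.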
Note: to manipulate HTML, use a dedicated parser, such as jsoup. Here is a post discussing Java HTML parsers.