C# - Regex to match words but not HTML entities
I'm parsing HTML node text with a regex, looking for words to perform operations on.
I'm using (\w+)
I have situations like word&nbsp;word, where nbsp gets recognized as a word.
I can match an HTML entity with \&[a-z0-9A-Z]+\;
but I don't know how to unmatch a word if it is part of an entity.
Is there a way to have a regex match a word, but not if it is part of an HTML entity like the following?
&lt;
&#60;
&yacute;
&#253;
etc.
A negative lookbehind assertion might do the trick:
(?<!&#?)\b\w+
This matches a word only if it is not preceded by & or &#. It doesn't check for the trailing semicolon, though, since a semicolon might legitimately follow a normal word.
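As a rough illustration of how that pattern behaves in C#, here is a minimal sketch. The class name EntityAwareWords, the helper FindWords, and the sample string are made up for this example and are not from the original question; only the pattern itself comes from the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EntityAwareWords
{
    // Pattern from the answer: a run of word characters that starts at a
    // word boundary and is not immediately preceded by "&" or "&#".
    private static readonly Regex WordPattern = new Regex(@"(?<!&#?)\b\w+");

    // Hypothetical helper: returns the matched words in document order.
    public static string[] FindWords(string text)
    {
        return WordPattern.Matches(text)
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .ToArray();
    }

    public static void Main()
    {
        // Sample input (an assumption) mixing named and numeric entities.
        string sample = "word&nbsp;word &lt;tag&gt; &#253; done";

        // Prints: word,word,tag,done
        Console.WriteLine(string.Join(",", FindWords(sample)));
    }
}
```

Because the lookbehind only inspects what comes before the word, ordinary text such as AT&T would lose its final T as well. If that matters, one option is to match entities and words in a single alternation and keep only the word matches.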