C# - Regex to match words but not HTML entities
I'm parsing HTML node text with a regex, looking for words to perform operations on.
I'm using (\w+).
However, I have situations like word&nbsp;word, where nbsp gets recognized as a word.
I can match an HTML entity with \&[a-zA-Z0-9]+\; but I don't know how to stop matching a word when it is part of an entity.
Is there a way to have a regex match a word, but not when the word is part of an HTML entity like the following?
&lt; &#60;
&yacute; &#253;
etc.
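To make the problem concrete, here is a minimal C# sketch (top-level statements, so .NET 5 or later is assumed) that runs the question's (\w+) pattern and its entity pattern over a hypothetical sample string. The entity pattern's character class is read as [a-zA-Z0-9]; the helper names NaiveWords and Entities are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical text node, as it appears in the raw HTML source.
const string html = "word&nbsp;word &lt; &#60; &yacute; &#253;";

// Naive approach from the question: every run of word characters counts as a word.
string[] NaiveWords(string text) =>
    Regex.Matches(text, @"\w+").Cast<Match>().Select(m => m.Value).ToArray();

// Entity pattern from the question (character class assumed to be [a-zA-Z0-9]).
string[] Entities(string text) =>
    Regex.Matches(text, @"\&[a-zA-Z0-9]+\;").Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(", ", NaiveWords(html)));
// prints: word, nbsp, word, lt, 60, yacute, 253

Console.WriteLine(string.Join(", ", Entities(html)));
// prints: &nbsp;, &lt;, &yacute;
```

Besides the entity names leaking in as words, note that the entity pattern has no # in its character class, so numeric entities such as &#60; are not matched by it at all.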
A negative lookbehind assertion might do the trick:
(?<!&#?)\b\w+
This matches a word only if it is not preceded by & or &#. It doesn't check for the semicolon, though, since a semicolon might legitimately follow a normal word.
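A minimal C# sketch of the lookbehind pattern, assuming .NET 5 or later for top-level statements; the Words helper and the sample strings are hypothetical. The .NET regex engine accepts the variable-length lookbehind (?<!&#?) as written, and the \b keeps the match from starting in the middle of an entity name such as "bsp" or "0".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// A word that is not directly preceded by "&" or "&#".
var wordOutsideEntity = new Regex(@"(?<!&#?)\b\w+");

string[] Words(string text) =>
    wordOutsideEntity.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

// Entity names and numeric codes are skipped; ordinary words are kept.
Console.WriteLine(string.Join(", ", Words("word&nbsp;word &lt; &#60; &yacute; &#253;")));
// prints: word, word

// The semicolon is not checked, so a word followed by ';' still matches.
Console.WriteLine(string.Join(", ", Words("first; second")));
// prints: first, second

// Only the preceding characters are checked, so any word right after '&'
// is skipped, even when no closing ';' follows.
Console.WriteLine(string.Join(", ", Words("AT&T")));
// prints: AT
```

The last case shows the trade-off of not requiring the semicolon: text after a bare & is treated as an entity name.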