C# - Regex to match words but not HTML entities

I'm parsing HTML node text with a regex, looking for words to perform operations on.
I'm using (\w+).

I have situations like word&nbsp;word, where nbsp gets recognized as a word.

I can match an HTML entity with \&[a-zA-Z0-9]+\; but I don't know how to unmatch a word if it is part of an entity.

Is there a way to have a regex match a word, but not if it is part of an HTML entity like the following?

 
&nbsp;
&lt; <
&yacute; ý
etc.

(?<!&#?)\b\w+

This matches a word only if it is not preceded by & or &#. It doesn't check for the trailing semicolon, though, since a semicolon might legitimately follow a normal word.
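As a rough illustration of how that pattern could be used from C#, here is a minimal sketch. The class name `EntityAwareWords`, the helper `ExtractWords`, and the sample string are hypothetical and not from the original post; it assumes the input is still-encoded HTML text and relies on .NET's support for variable-length lookbehind.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EntityAwareWords
{
    // Pattern from the answer: a word that is not immediately preceded by "&" or "&#".
    private static readonly Regex WordPattern = new Regex(@"(?<!&#?)\b\w+");

    // Hypothetical helper: returns the words found in raw (still-encoded) HTML text.
    public static string[] ExtractWords(string text)
    {
        return WordPattern.Matches(text)
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .ToArray();
    }

    public static void Main()
    {
        // "nbsp", "lt" and "253" directly follow "&" or "&#", so they are skipped.
        string sample = "word&nbsp;word &lt; &#253; etc";
        Console.WriteLine(string.Join(", ", ExtractWords(sample)));
        // prints "word, word, etc"
    }
}
```

Because the semicolon is not checked, text such as AT&T yields only AT with this pattern: the T directly follows an ampersand and is skipped.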

JVParth