Levenshtein distance - Can you weight Levenshtein to the front of the string?
Levenshtein seems very agnostic in terms of how it scores the distance/similarity between terms.
For instance:
- olive garden vs olden garden = 3
whereas
- olive garden vs olive garden restaurant = 11
In the real world (as I see it, or at least for my applications) the latter should be weighted more heavily.
Is there a modification or 'distance' comparison tool that doesn't try to account for misspelling or transposition, and would weight the second example higher because of the sheer number of 100% matches on the first part of the phrase?
This is a difficult question to answer, and I am by no means an expert on the subject; however, I have at least a partial answer to your questions. Additionally, you didn't specify a language, so my examples will use PHP.
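To the best of my knowledge there is no single comparison tool or function that is able to determine the relevance, rather than the similarity, of two strings. However, there are different comparison tools out there that may give you better results. The similar_text function in PHP, for example, can report the percent similarity between two strings (through its optional third argument, which is passed by reference), and may be more accurate for what you're trying to do.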
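As a minimal sketch of how similar_text reports that percentage: the function itself returns the number of matching characters and writes the percentage into its third argument. The helper name percent_similarity is my own, purely for illustration, and not a PHP built-in.

```php
<?php
// Hypothetical helper (not a PHP built-in): wraps similar_text() so that
// it returns the percentage instead of the matching-character count.
function percent_similarity(string $a, string $b): float
{
    $percent = 0.0;
    // similar_text() returns the number of matching characters and
    // stores the percentage in its third, by-reference argument.
    similar_text($a, $b, $percent);
    return $percent;
}

// Rounded for display only.
echo round(percent_similarity("olive garden", "olive garden restaurant"), 1), "\n"; // prints 68.6
```

Note that the percentage depends on the combined length of both strings, so a long unmatched tail such as " restaurant" still pulls the score down.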
Additionally, you can account for misspellings when comparing the similarity of two strings by first calculating the phonetic "keys" of each string, and then calculating the Levenshtein distance between the phonetic keys. The best phonetic algorithm I know of for calculating the phonetic keys of strings is Metaphone. In PHP, metaphone is built in and can be used like this:

    echo metaphone("carrot"); // prints KRT

The cool part is that if the user misspells carrot and instead types "carrrot", the same phonetic key is generated, since "carrot" and "carrrot" sound the same:

    echo metaphone("carrot");  // prints KRT
    echo metaphone("carrrot"); // prints KRT

And the Levenshtein distance between KRT and KRT is 0.

The pitfall of this solution is that while Metaphone helps smooth out spelling errors that don't change how a word sounds, words that are misspelled to the point that they no longer have a phonetic resemblance will not generate similar phonetic keys. In your example, olive garden and olden garden don't have the same phonetic keys, and are still seen by Levenshtein as being relatively far apart:
    echo levenshtein(metaphone("olive garden"), metaphone("olden garden")); // prints 2

Conclusion
Even in conjunction with Metaphone, using the Levenshtein distance falls short, and is unable to provide the relevance between two strings. The best solution I can give you is to use similar_text in conjunction with metaphone to compare the strings. Like this:

    similar_text(metaphone("olive garden restaurant"), metaphone("olive garden"), $sim);
    echo $sim; // prints 70 (the percent similarity)
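
Neither levenshtein nor similar_text gives extra weight to matches at the front of the string. If that is what matters most, one rough option is to measure the shared prefix directly. The sketch below is only an illustration under my own assumptions: prefix_match_ratio is a hypothetical helper, not a built-in, and it compares byte by byte, so it assumes single-byte strings.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the fraction of the
// shorter string that is covered by the prefix both strings share.
// Assumes single-byte strings; multibyte text would need mb_* functions.
function prefix_match_ratio(string $a, string $b): float
{
    $limit = min(strlen($a), strlen($b));
    if ($limit === 0) {
        return 0.0; // nothing to compare against
    }
    $i = 0;
    // Walk forward while the characters at the same position agree.
    while ($i < $limit && $a[$i] === $b[$i]) {
        $i++;
    }
    return $i / $limit;
}

echo prefix_match_ratio("olive garden", "olive garden restaurant"), "\n"; // prints 1
echo round(prefix_match_ratio("olive garden", "olden garden"), 2), "\n";  // prints 0.17
```

A score like this ignores everything after the first mismatch, so it would probably work best as one signal alongside similar_text or metaphone rather than as a replacement for them.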