Exactly just What algorithm could you best utilize for string similarity?

Exactly just What algorithm could you best utilize for string similarity?

I will be creating a plugin to identify content on uniquely various webpages, centered on details.

Therefore I may get one target which appears like:

later on i might find this target in a format that is slightly different.

or simply since obscure as

They are theoretically the exact same target, however with an amount of similarity. I wish to a) create an unique identifier for each target to execute lookups, and b) determine whenever a rather comparable target turns up.

What algorithms techniques that ar / String metrics should we be taking a look at? Levenshtein distance appears like a choice that is obvious but interested if there’s any kind of approaches that could provide on their own right here.

7 Responses 7

Levenstein’s algorithm is founded on the true quantity of insertions, deletions, and substitutions in strings.

Unfortuitously it does not account fully for a typical misspelling which will be the transposition of 2 chars ( ag e.g. someawesome vs someaewsome). Therefore I’d like the more Damerau-Levenstein that is robust algorithm.

I do not think it is an idea that is good use the exact distance on entire strings since the time increases suddenly utilizing the period of the strings contrasted. But a whole lot worse, when target elements, like ZIP are removed, very different details may match better (calculated making use of online Levenshtein calculator):

These impacts have a tendency to aggravate for faster road name. Continue reading Exactly just What algorithm could you best utilize for string similarity?