4 library(isub): isub: a string similarity measure
- author: Giorgos Stoilos
- See also: A string metric for ontology alignment, by Giorgos Stoilos, 2005.
library(isub) implements a similarity measure between strings, comparable in purpose to the Levenshtein distance. Instead of counting edit operations, the method is based on the lengths of the substrings the two strings have in common.
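The following sketch shows what "based on the length of common substrings" means in practice. It follows the I-Sub metric from the 2005 paper as we understand it. The longest common substring is repeatedly cut out of both strings for as long as it is longer than two characters, and the summed length gives a commonality 2*common/(|s1|+|s2|). The unmatched fractions u1 and u2 give a dissimilarity u1*u2/(0.6 + 0.4*(u1 + u2 - u1*u2)). A Winkler-style bonus is added for a shared prefix of at most 4 characters, weighted 0.1. Finally, the raw score in [-1, 1] is mapped to [0, 1]. The predicate isub_sketch/3 is hypothetical and takes already-normalized text. Its constants, tie-breaking and empty-string handling are assumptions, not a transcription of the C code behind isub/4. On the normalized strings "e56language" and "languange" it gives about 0.711348, which matches the isub/4 example.

```prolog
:- use_module(library(lists)).
:- use_module(library(aggregate)).

%!  isub_sketch(+T1, +T2, -Sim:float) is det.
%   Hypothetical re-implementation of the I-Sub score for text that has
%   already been normalized. This is NOT the C code behind isub/4.
isub_sketch(T1, T2, Sim) :-
    string_codes(T1, A),
    string_codes(T2, B),
    length(A, LA),
    length(B, LB),
    (   LA =:= 0, LB =:= 0
    ->  Sim = 1.0                       % assumption: two empty texts match
    ;   ( LA =:= 0 ; LB =:= 0 )
    ->  Sim = 0.0                       % assumption: empty vs. non-empty
    ;   common_total(A, B, 0, Common),
        Comm is 2 * Common / (LA + LB), % commonality in [0,1]
        U1 is (LA - Common) / LA,       % unmatched fraction of T1
        U2 is (LB - Common) / LB,       % unmatched fraction of T2
        Prod is U1 * U2,
        % Hamacher product of the unmatched fractions, assumed p = 0.6
        Dissim is Prod / (0.6 + 0.4 * (U1 + U2 - Prod)),
        prefix_length(A, B, 0, P0),
        P is min(4, P0),                % common prefix, capped at 4
        Winkler is P * 0.1 * (1 - Comm),
        % raw score lies in [-1,1]; map it to [0,1]
        Sim is (Comm - Dissim + Winkler + 1) / 2
    ).

% common_total(+A, +B, +Acc, -Total): repeatedly cut the longest common
% substring out of both code lists and sum its length, until the longest
% remaining one is 2 codes or shorter (or there is none).
common_total(A, B, Acc, Total) :-
    (   longest_common(A, B, Len, A1, B1),
        Len > 2
    ->  Acc1 is Acc + Len,
        common_total(A1, B1, Acc1, Total)
    ;   Total = Acc
    ).

% longest_common(+A, +B, -Len, -A1, -B1): Len is the length of the
% longest common substring; A1 and B1 are A and B with its first
% occurrence cut out. Fails if A and B share no code at all.
longest_common(A, B, Len, A1, B1) :-
    aggregate_all(max(L), common_substring(A, B, L, _, _), Len),
    once(common_substring(A, B, Len, A1, B1)).

% common_substring(+A, +B, ?Len, -A1, -B1): enumerates non-empty
% substrings of A that also occur in B, with both occurrences removed.
common_substring(A, B, Len, A1, B1) :-
    append(Pre1, Rest1, A),
    append(Sub, Post1, Rest1),
    Sub = [_|_],
    append(Pre2, Rest2, B),
    append(Sub, Post2, Rest2),
    length(Sub, Len),
    append(Pre1, Post1, A1),
    append(Pre2, Post2, B1).

% prefix_length(+A, +B, +N0, -N): N is N0 plus the common prefix length.
prefix_length([X|Xs], [X|Ys], N0, N) :-
    !,
    N1 is N0 + 1,
    prefix_length(Xs, Ys, N1, N).
prefix_length(_, _, N, N).
```

The exhaustive substring search keeps the sketch short but is only practical for short strings. It may also break ties between equally long substrings differently from isub/4, so results can differ in corner cases.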
- [det] isub(+Text1:text, +Text2:text, +Normalize:bool, -Similarity:float)
- Similarity is a measure of how similar Text1 and Text2 are. For example:
?- isub('E56.Language', 'languange', true, D).
D = 0.711348.
If Normalize is true, isub/4 applies string normalization as implemented by the original authors: Text1 and Text2 are mapped to lowercase, and the characters "." (dot), "_" (underscore) and " " (space) are removed. Lowercase mapping is done with the C-library function towlower(). In general, the required normalization is domain dependent and is better left to the caller; see, e.g., unaccent_atom/2.
Text1 and Text2 are each an atom, a string, or a list of characters or character codes. Similarity is a float in the range [0.0..1.0], where 1.0 means most similar.
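The normalization applied when Normalize is true is simple enough to reproduce on the Prolog side. This is useful when a caller wants to extend it, for example with unaccent_atom/2, and then call isub/4 with Normalize set to false. The name isub_normalize/2 is hypothetical. The sketch uses string_lower/2 as a stand-in for towlower(), so results may differ for some non-ASCII characters.

```prolog
:- use_module(library(apply)).
:- use_module(library(lists)).

%!  isub_normalize(+Text, -Normalized:string) is det.
%   Hypothetical helper mirroring the normalization documented for
%   isub/4 with Normalize = true: lowercase, then drop '.', '_' and ' '.
%   Text may be an atom, a string, or a list of chars or codes.
isub_normalize(Text, Normalized) :-
    text_to_string(Text, S0),           % accept any of the text types
    string_lower(S0, S1),               % stand-in for C towlower()
    string_codes(S1, Codes),
    exclude(ignored_code, Codes, Kept),
    string_codes(Normalized, Kept).

% ignored_code(+Code): Code is one of the characters that are removed.
ignored_code(Code) :-
    char_code(Char, Code),
    memberchk(Char, ['.', '_', ' ']).
```

For instance, isub_normalize('E56.Language', N) yields N = "e56language".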