Thursday, March 8, 2012

Compare the similarity of two strings in Python

Again, I have adapted this code from some C code I found on the Internet years ago.

Side note: Did you know that toward and towards are completely interchangeable? Toward tends to get used in American English while towards is used more heavily in British English.



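def wordcompare(s1, s2, weight_toward_front=False):
    """Compare two strings and return a similarity score between 0 and 100."""

    # An empty (or None) string is never similar to anything.
    if not s1 or not s2:
        return 0

    # Identical strings get the maximum score right away.
    if s1 == s2:
        return 100

    s1_len = len(s1)
    s2_len = len(s2)

    n = 0  # running raw score

    for i in range(s1_len):
        # x starts at i once per outer pass; it is not rewound for each j,
        # so a matched run consumes characters of s1 for the rest of the pass.
        x = i
        for j in range(s2_len):
            y = j
            k = 0  # length of the current run of matching characters
            while x < s1_len and y < s2_len and s1[x] == s2[y]:
                eb = 1  # extra bonus multiplier
                if weight_toward_front:
                    # The bonuses stack, so a match at position 0 gets eb = 7.
                    if x < 1 and y < 1:
                        eb += 3
                    if x < 2 and y < 2:
                        eb += 2
                    if x < 3 and y < 3:
                        eb += 1

                # Longer runs are rewarded quadratically.
                k += 1
                n += k * k * eb
                x += 1
                y += 1

    # Scale by the product of the lengths and cap the result at 100.
    n = (n * 20) // (s1_len * s2_len)

    return min(n, 100)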
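If you want something to sanity-check the scores against, Python's standard library has difflib.SequenceMatcher. The sketch below is not the algorithm above; it is a different technique that I am scaling to the same 0 to 100 range purely for comparison. The name similarity_percent is my own, and returning 0 for empty input is an assumption made to mirror wordcompare.

```python
from difflib import SequenceMatcher


def similarity_percent(s1, s2):
    """Return a 0-100 similarity score based on difflib's ratio().

    Hypothetical helper for comparison only; it does not reproduce
    the scoring of wordcompare.
    """
    # Mirror wordcompare: an empty string is never similar to anything.
    if not s1 or not s2:
        return 0

    # ratio() is 2 * matches / (len(s1) + len(s2)), a float in [0, 1].
    ratio = SequenceMatcher(None, s1, s2).ratio()
    return round(ratio * 100)


if __name__ == "__main__":
    print(similarity_percent("toward", "towards"))  # 92
```

The two functions will not agree on exact numbers, since wordcompare rewards long runs quadratically and can weight the front of the string, but they should rank obviously similar and obviously different pairs the same way.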
