a list implementation of the simm
Briefly, here is the code for a list implementation of simm (as opposed to the superposition version). Note that it is unscaled: we do not rescale f and g so that w*f == w*g. I may give the code for that later (I don't think I have it typed up anywhere just yet).
Recall:
simm(w,f,g) = (w*f + w*g - w*[f - g])/2.max(w*f,w*g)
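Reading this shorthand against the Python on this page, w*f appears to stand for the weighted sum of absolute values, w*[f - g] for the weighted sum of absolute differences, and "2.max" for 2 times the max. Under that reading (an interpretation drawn from the code, not a definition stated here), the formula expands to:

```latex
\mathrm{simm}(w,f,g) =
  \frac{\sum_k |w_k f_k| + \sum_k |w_k g_k| - \sum_k |w_k f_k - w_k g_k|}
       {2\,\max\!\left(\sum_k |w_k f_k|,\ \sum_k |w_k g_k|\right)}
```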
Python:
def list_simm(w,f,g):
    the_len = min(len(f),len(g))
    # pad w with 1's, on a copy so the caller's list is not modified:
    # w = w + [0] * (the_len - len(w))
    w = w + [1] * (the_len - len(w))
    f = f[:the_len]
    g = g[:the_len]
    wf = sum(abs(w[k]*f[k]) for k in range(the_len))
    wg = sum(abs(w[k]*g[k]) for k in range(the_len))
    wfg = sum(abs(w[k]*f[k] - w[k]*g[k]) for k in range(the_len))
    if wf == 0 and wg == 0:
        return 0
    else:
        return (wf + wg - wfg)/(2*max(wf,wg))
And that's it! Heaps more to come!
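To see what "unscaled" means in practice, here is a small sketch. It uses a compact stand-in, `simm_sketch` (a hypothetical name, not part of the code above), and assumes w, f and g already have the same length. Note that lists pointing in the same direction, such as [2,4] and [1,2], only score 0.5, because w*f and w*g differ.

```python
def simm_sketch(w, f, g):
    # hypothetical helper for illustration: assumes len(w) == len(f) == len(g)
    wf = sum(abs(a*x) for a, x in zip(w, f))
    wg = sum(abs(a*y) for a, y in zip(w, g))
    wfg = sum(abs(a*x - a*y) for a, x, y in zip(w, f, g))
    if wf == 0 and wg == 0:
        return 0
    return (wf + wg - wfg)/(2*max(wf, wg))

w = [1, 1]
same = simm_sketch(w, [1, 2], [1, 2])      # identical lists
disjoint = simm_sketch(w, [1, 0], [0, 1])  # no overlap at all
parallel = simm_sketch(w, [2, 4], [1, 2])  # same direction, different size
print(same, disjoint, parallel)            # prints: 1.0 0.0 0.5
```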
Update: OK. I wrote a rescaled version of list simm. I haven't tested it, but I think it is probably right :)
Python:
def rescaled_list_simm(w,f,g):
    the_len = min(len(f),len(g))
    # normalize lengths of our lists:
    # (pad a copy of w, so the caller's list is not modified)
    # w = w + [0] * (the_len - len(w))
    w = w + [1] * (the_len - len(w))
    f = f[:the_len]
    g = g[:the_len]
    # rescale step, first find size:
    s1 = sum(abs(w[k]*f[k]) for k in range(the_len))
    s2 = sum(abs(w[k]*g[k]) for k in range(the_len))
    # if s1 == 0, or s2 == 0, we can't rescale:
    if s1 == 0 or s2 == 0:
        return 0
    # now rescale:
    # we just need w*f == w*g, the exact value doesn't matter, so we choose 1.
    # noting that our equation has symmetry under: "f => k.f, g => k.g"
    # also, note that with finite precision floats it sometimes does matter, but hopefully we will be fine.
    f = [f[k]/s1 for k in range(the_len)]
    g = [g[k]/s2 for k in range(the_len)]
    # proceed with algo:
    # if we did the rescale step correctly we will have:
    # wf == wg == 1
    # wf = sum(abs(w[k]*f[k]) for k in range(the_len))
    # wg = sum(abs(w[k]*g[k]) for k in range(the_len))
    wfg = sum(abs(w[k]*f[k] - w[k]*g[k]) for k in range(the_len))
    # we should never have wf or wg == 0 in the rescaled case:
    # if wf == 0 and wg == 0:
    #     return 0
    # else:
    #     return (wf + wg - wfg)/(2*max(wf,wg))
    return (2 - wfg)/2
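A quick sanity check of the rescaling idea, as a sketch only. `rescaled_sketch` is a hypothetical compact stand-in that assumes equal-length lists: it divides f and g by their weighted sizes and then applies (2 - wfg)/2. With rescaling, lists pointing in the same direction should score 1 regardless of their size.

```python
def rescaled_sketch(w, f, g):
    # hypothetical helper for illustration: assumes len(w) == len(f) == len(g)
    s1 = sum(abs(a*x) for a, x in zip(w, f))
    s2 = sum(abs(a*y) for a, y in zip(w, g))
    if s1 == 0 or s2 == 0:
        return 0  # can't rescale a zero-size list
    # after dividing by s1 and s2 we have w*f == w*g == 1
    wfg = sum(abs(a*x/s1 - a*y/s2) for a, x, y in zip(w, f, g))
    return (2 - wfg)/2

w = [1, 1]
parallel = rescaled_sketch(w, [2, 4], [1, 2])  # same direction, different size
disjoint = rescaled_sketch(w, [5, 0], [0, 1])  # no overlap at all
half = rescaled_sketch(w, [1, 1], [1, 0])      # partial overlap
print(parallel, disjoint, half)
```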
Update: OK. May as well do an implementation of Gaussian simm too:
gaussian-simm(s,f,g) = exp(-||f - g||^2/2s)
Python:
import math
# define Euclidean Distance function:
def ED(f,g):
    if len(f) != len(g):
        print("different length vectors!")
        return 0
    return math.sqrt(sum((f[k] - g[k])**2 for k in range(len(f))))
# define Gaussian simm:
# gaussian-simm(s,f,g) = exp(-||f - g||^2/2s)
def guass_simm(s,f,g):
    # the brackets matter: "/2*s" would multiply by s instead of dividing by 2s
    return math.exp(-ED(f,g)**2/(2*s))
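As a sketch, here is the Gaussian simm worked on a 3-4-5 triangle, assuming the formula means division by 2s. `gauss_sketch` is a hypothetical name, and it assumes equal-length lists and s > 0. The last two lines show why the denominator needs brackets in Python: `x/2*s` evaluates as `(x/2)*s`.

```python
import math

def gauss_sketch(s, f, g):
    # hypothetical helper for illustration: assumes len(f) == len(g) and s > 0
    dist_sq = sum((x - y)**2 for x, y in zip(f, g))  # ||f - g||^2
    return math.exp(-dist_sq/(2*s))

same = gauss_sketch(25, [3, 4], [3, 4])  # distance 0, so exp(0)
far = gauss_sketch(25, [0, 0], [3, 4])   # distance 5, so exp(-25/50)
print(same, far)

# operator precedence: division and multiplication bind left to right
unbracketed = -25/2*25    # (-25/2)*25 = -312.5
bracketed = -25/(2*25)    # -25/50 = -0.5
print(unbracketed, bracketed)
```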
updated: 19/12/2016
by Garry Morrison
email: garry -at- semantic-db.org