Recall:

simm(w,f,g) = (w*f + w*g - w*[f - g])/2.max(w*f,w*g)

Python:

    def list_simm(w,f,g):
        the_len = min(len(f),len(g))

        # pad the weights with 1's if w is too short:
        # w += [0] * (the_len - len(w))
        w = w + [1] * (the_len - len(w))   # builds a new list, so the caller's w is not modified
        f = f[:the_len]
        g = g[:the_len]

        wf = sum(abs(w[k]*f[k]) for k in range(the_len))
        wg = sum(abs(w[k]*g[k]) for k in range(the_len))
        wfg = sum(abs(w[k]*f[k] - w[k]*g[k]) for k in range(the_len))

        # both lists are all zero, so define the similarity to be 0:
        if wf == 0 and wg == 0:
            return 0
        else:
            return (wf + wg - wfg)/(2*max(wf,wg))

And that's it! Heaps more to come!
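Here is a quick usage sketch, just to get a feel for the numbers. The function is restated in compact form so the snippet runs on its own, and the input lists are made up for illustration.

```python
def list_simm(w, f, g):
    # compact restatement of the function above, so this snippet is self-contained
    the_len = min(len(f), len(g))
    w = list(w) + [1] * (the_len - len(w))   # missing weights default to 1
    wf = sum(abs(w[k] * f[k]) for k in range(the_len))
    wg = sum(abs(w[k] * g[k]) for k in range(the_len))
    wfg = sum(abs(w[k] * f[k] - w[k] * g[k]) for k in range(the_len))
    if wf == 0 and wg == 0:
        return 0
    return (wf + wg - wfg) / (2 * max(wf, wg))

same = list_simm([1, 1, 1], [1, 2, 3], [1, 2, 3])     # identical lists
disjoint = list_simm([], [1, 0], [0, 1])              # no overlap, w defaults to [1, 1]
partial = list_simm([1, 1], [2, 0], [1, 1])           # some overlap
scaled = list_simm([1, 1, 1], [1, 2, 3], [2, 4, 6])   # same shape, g = 2*f

print(same, disjoint, partial, scaled)   # prints: 1.0 0.0 0.5 0.5
```

Note the last case: the two lists have the same shape, but doubling g drops the score from 1.0 to 0.5. So, in this example at least, list simm cares about overall size as well as shape.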

Update: OK. I wrote a rescaled version of list simm. I haven't tested it, but I think it is probably right :)

Python:

    def rescaled_list_simm(w,f,g):
        the_len = min(len(f),len(g))

        # normalize lengths of our lists:
        # w += [0] * (the_len - len(w))
        w = w + [1] * (the_len - len(w))   # builds a new list, so the caller's w is not modified
        f = f[:the_len]
        g = g[:the_len]

        # rescale step, first find size:
        s1 = sum(abs(w[k]*f[k]) for k in range(the_len))
        s2 = sum(abs(w[k]*g[k]) for k in range(the_len))

        # if s1 == 0, or s2 == 0, we can't rescale:
        if s1 == 0 or s2 == 0:
            return 0

        # now rescale:
        # we just need w*f == w*g, the exact value doesn't matter, so we choose 1.
        # noting that our equation has symmetry under: "f => k.f, g => k.g"
        # also, note that finite precision floats means sometimes it does matter, but hopefully we will be fine.
        f = [f[k]/s1 for k in range(the_len)]
        g = [g[k]/s2 for k in range(the_len)]

        # proceed with algo:
        # if we did the rescale step correctly we will have:
        # wf == wg == 1
        # wf = sum(abs(w[k]*f[k]) for k in range(the_len))
        # wg = sum(abs(w[k]*g[k]) for k in range(the_len))
        wfg = sum(abs(w[k]*f[k] - w[k]*g[k]) for k in range(the_len))

        # we should never have wf or wg == 0 in the rescaled case:
        # if wf == 0 and wg == 0:
        #     return 0
        # else:
        #     return (wf + wg - wfg)/(2*max(wf,wg))
        return (2 - wfg)/2

Update: OK. May as well do an implementation of Gaussian simm too:

gaussian-simm(s,f,g) = exp(-||f - g||^2/2s)

Python:

    import math

    # define Euclidean Distance function:
    def ED(f,g):
        if len(f) != len(g):
            print("different length vectors!")
            # note: a distance of 0 here means guass_simm reports 1 for mismatched lengths
            return 0
        return math.sqrt(sum((f[k] - g[k])**2 for k in range(len(f))))

    # define Gaussian simm:
    # gaussian-simm(s,f,g) = exp(-||f - g||^2/2s)
    def guass_simm(s,f,g):
        # the brackets matter: /2*s would multiply by s instead of dividing by 2s
        return math.exp(-ED(f,g)**2/(2*s))
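The rescaled version was posted untested, and the Gaussian one depends on operator precedence, so here is a rough sanity check of both. The helper names `rescaled_sketch` and `gaussian_sketch` are made up for this snippet and are compact restatements, not the functions above. The test vectors are invented, and I am assuming the formula means "divide by 2s". Unlike `ED`, the sketch raises an error on mismatched lengths, which is a choice made here, not something from the post.

```python
import math

def rescaled_sketch(w, f, g):
    # hypothetical helper: rescale so sum|w*f| == sum|w*g| == 1, then compare
    n = min(len(f), len(g))
    w = list(w) + [1] * (n - len(w))   # missing weights default to 1
    s1 = sum(abs(w[k] * f[k]) for k in range(n))
    s2 = sum(abs(w[k] * g[k]) for k in range(n))
    if s1 == 0 or s2 == 0:
        return 0   # can't rescale an all-zero list
    wfg = sum(abs(w[k] * f[k] / s1 - w[k] * g[k] / s2) for k in range(n))
    return (2 - wfg) / 2

def gaussian_sketch(s, f, g):
    # hypothetical helper: exp(-||f - g||^2 / (2*s)), assuming "2s" means 2*s
    if len(f) != len(g):
        raise ValueError("different length vectors")
    dist_sq = sum((a - b) ** 2 for a, b in zip(f, g))
    return math.exp(-dist_sq / (2 * s))

# rescaled: same shape at a different scale should now score (close to) 1
same_shape = rescaled_sketch([1, 1, 1], [1, 2, 3], [2, 4, 6])
no_overlap = rescaled_sketch([], [1, 0], [0, 1])

# gaussian: identical vectors give 1, and the score decays with distance
identical = gaussian_sketch(1, [0, 0], [0, 0])
apart = gaussian_sketch(2, [0, 0], [2, 0])   # exp(-4/4) = exp(-1), roughly 0.368

# precedence: Python reads x/2*s as (x/2)*s, not x/(2*s)
without_brackets = -4 / 2 * 2     # -4.0
with_brackets = -4 / (2 * 2)      # -1.0

print(same_shape, no_overlap, identical, apart)
```

With these inputs the rescaled score for `[1,2,3]` against `[2,4,6]` comes out at 1 (up to float rounding), which is what rescaling is meant to achieve. The last two lines show why the brackets around `2*s` matter: without them the `apart` case would be exp(-4), roughly 0.018, instead of exp(-1).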

updated: 19/12/2016

by Garry Morrison

email: garry -at- semantic-db.org