p pattern similarity metric in python
Last post we gave the p pattern similarity metric in nice pretty LaTeX. In this post, I verify my maths by actually implementing it in python. Yeah, there were a couple of steps I wasn't 100% confident in, but the python says I'm fine.
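For the record, here is the metric as I read it back off the code below, so any notational drift from the LaTeX post is mine. With $f_0, \dots, f_{p-1}$ the input patterns:
$$\text{wf}(f) = \sum_m |f[m]|$$
$$\text{wf}^p(f_0,\dots,f_{p-1}) = \sum_m \Big| \sum_{k=0}^{p-1} e^{2\pi i k/p}\, f_k[m] \Big|$$
$$\text{simm}^p(f_0,\dots,f_{p-1}) = \frac{\sum_{k=0}^{p-1} \text{wf}(f_k) - \text{wf}^p(f_0,\dots,f_{p-1})}{p \cdot \max_k \text{wf}(f_k)}$$
$$\text{simm}^p_{\text{rescaled}}(f_0,\dots,f_{p-1}) = 1 - \frac{1}{p} \sum_m \Big| \sum_{k=0}^{p-1} e^{2\pi i k/p}\, \frac{f_k[m]}{\text{wf}(f_k)} \Big|$$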
Here is the python:
#!/usr/bin/env python3
import math
import cmath

# define p'th roots of unity:
def jpk(p, k):
    return cmath.exp(1j*2*math.pi*k/p)

# define wf_k, the weight of a pattern (sum of absolute values):
def wf(vect):
    return sum(abs(x) for x in vect)

# define wf^p:
def wfp(vects):
    p = len(vects)
    i_max = len(vects[0])  # assume all vects are the same size as the first one.
    r1 = 0
    for i in range(i_max):
        r2 = 0
        for k in range(p):
            r2 += jpk(p, k)*vects[k][i]
        r1 += abs(r2)
    return r1

def multi_simm(vects):
    p = len(vects)
    # sum over wf_k term:
    r1 = 0
    max_wf = 0
    for k in range(p):
        wf_k = wf(vects[k])
        max_wf = max(max_wf, wf_k)
        r1 += wf_k
    # wf^p term:
    r2 = wfp(vects)
    # p.max term:
    r3 = p*max_wf
    # prevent divide by 0:
    if r3 == 0:
        return 0
    # return result:
    return (r1 - r2)/r3

def rescaled_multi_simm(vects):
    p = len(vects)
    i_max = len(vects[0])  # assume all vects are the same size as the first one.
    # find normalization terms:
    norms = []
    for k in range(p):
        wf_k = wf(vects[k])
        if wf_k == 0:  # prevent divide by zero
            return 0
        norms.append(wf_k)
    # find normalized wf^p:
    r1 = 0
    for i in range(i_max):
        r2 = 0
        for k in range(p):
            r2 += jpk(p, k)*vects[k][i]/norms[k]
        r1 += abs(r2)
    # return result:
    return 1 - r1/p

# test the code (list_of_vects is assigned per test case below):
print("wfp: %s" % wfp(list_of_vects))
print("multi-simm: %s" % multi_simm(list_of_vects))
print("rescaled-multi-simm: %s" % rescaled_multi_simm(list_of_vects))
Now, some test cases. First, with all patterns equal, which should give 1, else we made a mistake!
list_of_vects = [[2,3,4,5,6], [2,3,4,5,6], [2,3,4,5,6]]
$ ./multi-simm.py
wfp: 5.887076992907251e-15
multi-simm: 0.9999999999999999
rescaled-multi-simm: 0.9999999999999999
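The wfp of roughly 6e-15 and the 0.999... results are just floating-point noise: the p'th roots of unity only sum to zero up to rounding. A quick sanity check of that (my snippet, not part of the original script):
import math
import cmath
p = 3
print(sum(cmath.exp(1j*2*math.pi*k/p) for k in range(p)))
# prints something on the order of 1e-16, i.e. zero up to rounding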
Next, all patterns "disjoint"; this time we expect 0, else we made a mistake:
list_of_vects = [[5,0,0,0], [0,-5,0,0], [0,0,-5,0], [0,0,0,5]]
$ ./multi-simm.py
wfp: 20.0
multi-simm: 0.0
rescaled-multi-simm: 0.0
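That checks out by hand: with disjoint patterns only one component is nonzero at each index, so wf^p collapses to the sum of the individual weights, and the multi_simm numerator vanishes. A quick check of that identity, reusing wf and wfp from above:
disjoint = [[5,0,0,0], [0,-5,0,0], [0,0,-5,0], [0,0,0,5]]
print(wfp(disjoint))                 # 20.0
print(sum(wf(v) for v in disjoint))  # 20, so r1 - r2 in multi_simm is 0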
Next, test that rescaling works, and gives a different answer from the non-rescaled version:
list_of_vects = [[2,3,4,5,6], [4,6,8,10,12], [6,9,12,15,18]]
$ ./multi-simm.py
wfp: 34.641016151377556
multi-simm: 0.47421657693679137
rescaled-multi-simm: 0.9999999999999999
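As expected: the three patterns are scalar multiples of one another, so after dividing by their wf norms they become identical, and the rescaled metric sees them as equal. A quick look at the norms, again reusing wf from above:
scaled = [[2,3,4,5,6], [4,6,8,10,12], [6,9,12,15,18]]
print([wf(v) for v in scaled])  # [20, 40, 60]
# dividing each vector by its norm leaves three identical copies of
# [0.1, 0.15, 0.2, 0.25, 0.3], hence rescaled-multi-simm of ~1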
And finally, a test case where we don't expect 0 or 1:
list_of_vects = [[2,3,4,5,6], [6,5,4,3,2], [2,4,3,5,6], [2,4,5,3,6]]
$ ./multi-simm.py
wfp: 10.82842712474619
multi-simm: 0.8646446609406727
rescaled-multi-simm: 0.8646446609406726
Cool. It all seems to work as desired. Heh, now I need to find a use case for p > 2.
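One last sanity check on the p = 2 case (my addition, not in the original script): the square roots of unity are just +1 and -1, so wf^2 reduces to sum_i |f_1[i] - f_2[i]| and multi_simm reduces to the familiar two-pattern simm:
v1, v2 = [2,3,4,5,6], [6,5,4,3,2]
two_simm = (wf(v1) + wf(v2) - sum(abs(a - b) for a, b in zip(v1, v2))) / (2*max(wf(v1), wf(v2)))
print(two_simm)              # 0.7
print(multi_simm([v1, v2]))  # agrees, up to floating point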
updated: 19/12/2016
by Garry Morrison
email: garry -at- semantic-db.org