# p pattern similarity metric in python

Last post we gave the p pattern similarity metric in nice pretty LaTeX. In this post I verify my maths by actually implementing it in python. Yeah, there were a couple of steps I wasn't 100% confident in, but the python says I'm fine.
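For reference, here is the metric as the code below implements it (notation reconstructed from the code itself; see the previous post for the full derivation):

```latex
j_p^k = e^{\,i 2\pi k/p}, \qquad
\mathrm{wf}(v) = \sum_i |v_i|, \qquad
\mathrm{wf}^p = \sum_i \Bigl|\, \sum_{k=0}^{p-1} j_p^k \,(v_k)_i \Bigr|
```

```latex
\mathrm{multi\text{-}simm} = \frac{\sum_{k=0}^{p-1} \mathrm{wf}(v_k) - \mathrm{wf}^p}{p \,\max_k \mathrm{wf}(v_k)}, \qquad
\mathrm{rescaled} = 1 - \frac{1}{p} \sum_i \Bigl|\, \sum_{k=0}^{p-1} \frac{j_p^k \,(v_k)_i}{\mathrm{wf}(v_k)} \Bigr|
```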

Here is the python:
```
import math
import cmath

# define the p'th roots of unity:
def jpk(p, k):
    return cmath.exp(1j * 2 * math.pi * k / p)

# define wf_k:
def wf(vect):
    return sum(abs(x) for x in vect)

# define wf^p:
def wfp(vects):
    p = len(vects)
    i_max = len(vects[0])          # assume all vects are the same size as the first one.
    r1 = 0
    for i in range(i_max):
        r2 = 0
        for k in range(p):
            r2 += jpk(p, k) * vects[k][i]
        r1 += abs(r2)
    return r1

def multi_simm(vects):
    p = len(vects)

    # sum over wf_k term:
    r1 = 0
    max_wf = 0
    for k in range(p):
        wf_k = wf(vects[k])
        max_wf = max(max_wf, wf_k)
        r1 += wf_k

    # wf^p term:
    r2 = wfp(vects)

    # p.max term:
    r3 = p * max_wf

    # prevent divide by zero:
    if r3 == 0:
        return 0

    # return result:
    return (r1 - r2) / r3

def rescaled_multi_simm(vects):
    p = len(vects)
    i_max = len(vects[0])          # assume all vects are the same size as the first one.

    # find normalization terms:
    norms = []
    for k in range(p):
        wf_k = wf(vects[k])
        if wf_k == 0:              # prevent divide by zero
            return 0
        norms.append(wf_k)

    # find normalized wf^p:
    r1 = 0
    for i in range(i_max):
        r2 = 0
        for k in range(p):
            r2 += jpk(p, k) * vects[k][i] / norms[k]
        r1 += abs(r2)

    # return result:
    return 1 - r1 / p

# test the code (list_of_vects is defined per test case below):
print("wfp: %s" % wfp(list_of_vects))
print("multi-simm: %s" % multi_simm(list_of_vects))
print("rescaled-multi-simm: %s" % rescaled_multi_simm(list_of_vects))
```
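As an aside, the nested loops above vectorize naturally. A minimal sketch using numpy (assuming numpy is available; the function name is my own, not from the original script):

```python
import numpy as np

def multi_simm_np(vects):
    # stack the p patterns into a (p, n) array; assumes equal lengths
    V = np.asarray(vects, dtype=float)
    p = V.shape[0]
    roots = np.exp(2j * np.pi * np.arange(p) / p)   # p'th roots of unity
    wfp = np.abs(roots @ V).sum()                   # sum_i |sum_k j_p^k * v_k[i]|
    wfs = np.abs(V).sum(axis=1)                     # wf of each pattern
    denom = p * wfs.max()
    return 0.0 if denom == 0 else (wfs.sum() - wfp) / denom
```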
Now, some test cases. First, with all patterns equal, which should give 1, else we made a mistake!
```
list_of_vects = [[2,3,4,5,6], [2,3,4,5,6], [2,3,4,5,6]]

$ ./multi-simm.py
wfp: 5.887076992907251e-15
multi-simm: 0.9999999999999999
rescaled-multi-simm: 0.9999999999999999
```
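The tiny wfp value is just floating-point noise: when all the patterns are equal, the inner sum factors into v_i times the sum of the p'th roots of unity, which is exactly zero. A quick check of that fact:

```python
import cmath
import math

p = 3
# 1 + e^{2*pi*i/3} + e^{4*pi*i/3} = 0, up to floating-point rounding
root_sum = sum(cmath.exp(1j * 2 * math.pi * k / p) for k in range(p))
print(abs(root_sum))
```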
Next, all patterns "disjoint", this time we expect 0, else we made a mistake:
```
list_of_vects = [[5,0,0,0], [0,-5,0,0], [0,0,-5,0], [0,0,0,5]]

$ ./multi-simm.py
wfp: 20.0
multi-simm: 0.0
rescaled-multi-simm: 0.0
```
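Here wfp hits its maximum: with disjoint supports each column of the inner sum has a single nonzero term, and |j_p^k * v| = |v|, so wfp collapses to the sum of the individual wf's. A quick check of that identity:

```python
import cmath
import math

vects = [[5,0,0,0], [0,-5,0,0], [0,0,-5,0], [0,0,0,5]]
p = len(vects)
# wfp computed directly from the definition
wfp = sum(abs(sum(cmath.exp(1j * 2 * math.pi * k / p) * vects[k][i]
                  for k in range(p)))
          for i in range(len(vects[0])))
# sum of the individual wf's
total_wf = sum(sum(abs(x) for x in v) for v in vects)
print(wfp, total_wf)   # both 20, up to rounding
```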
Next, test that rescaling works, and gives a different answer from the non-rescaled version:
```
list_of_vects = [[2,3,4,5,6], [4,6,8,10,12], [6,9,12,15,18]]

$ ./multi-simm.py
wfp: 34.641016151377556
multi-simm: 0.47421657693679137
rescaled-multi-simm: 0.9999999999999999
```
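This is the expected behaviour: the second and third patterns are scalar multiples of the first, and the rescaled version divides each pattern by its wf before comparing, which makes all three identical. A quick illustration (helper name is mine):

```python
def normalize(v):
    # divide by wf(v), so the pattern's absolute values sum to 1
    s = sum(abs(x) for x in v)
    return [x / s for x in v]

print(normalize([2, 3, 4, 5, 6]))
print(normalize([4, 6, 8, 10, 12]))   # same normalized pattern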
And finally, a test case where we don't expect 0 or 1:
```
list_of_vects = [[2,3,4,5,6], [6,5,4,3,2], [2,4,3,5,6], [2,4,5,3,6]]

$ ./multi-simm.py
wfp: 10.82842712474619
multi-simm: 0.8646446609406727
rescaled-multi-simm: 0.8646446609406726
```
Cool. It all seems to work as desired. Heh, now I need to find a use case for p > 2.


updated: 19/12/2016
by Garry Morrison
email: garry -at- semantic-db.org