random encode similarity matrices
In HTM-style sequence learning, there is an initial step where we randomly encode each object in the sequence. In this post, I want to look at the corresponding similarity matrices and see how much overlap there is between the encoded objects.
Here is the relevant code:
full |range> => range(|1>,|65536>)
encode |count> => pick[40] full |range>
encode |one> => pick[40] full |range>
encode |two> => pick[40] full |range>
encode |three> => pick[40] full |range>
encode |four> => pick[40] full |range>
encode |five> => pick[40] full |range>
encode |six> => pick[40] full |range>
encode |seven> => pick[40] full |range>
encode |Fibonacci> => pick[40] full |range>
encode |eight> => pick[40] full |range>
encode |thirteen> => pick[40] full |range>
encode |factorial> => pick[40] full |range>
encode |twenty-four> => pick[40] full |range>
encode |one-hundred-twenty> => pick[40] full |range>
Now, let's see the resulting similarity matrix:
simm-op |*> #=> 100 self-similar[encode] |_self>
map[simm-op,similarity] rel-kets[encode]
matrix[similarity]
[ count ] = [ 100.0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] [ count ]
[ eight ] [ 0 100.0 0 0 0 0 0 2.5 0 0 0 0 0 0 ] [ eight ]
[ factorial ] [ 0 0 100.0 0 0 0 0 0 0 0 0 2.5 0 0 ] [ factorial ]
[ Fibonacci ] [ 0 0 0 100.0 0 0 0 0 0 0 0 0 0 0 ] [ Fibonacci ]
[ five ] [ 0 0 0 0 100.0 0 0 0 0 0 0 0 0 0 ] [ five ]
[ four ] [ 0 0 0 0 0 100.0 0 0 0 0 0 0 0 0 ] [ four ]
[ one ] [ 0 0 0 0 0 0 100.0 0 0 0 0 0 0 0 ] [ one ]
[ one-hundred-twenty ] [ 0 2.5 0 0 0 0 0 100.0 0 0 0 0 0 0 ] [ one-hundred-twenty ]
[ seven ] [ 0 0 0 0 0 0 0 0 100.0 0 0 0 0 0 ] [ seven ]
[ six ] [ 0 0 0 0 0 0 0 0 0 100.0 0 0 0 0 ] [ six ]
[ thirteen ] [ 0 0 0 0 0 0 0 0 0 0 100.0 0 2.5 0 ] [ thirteen ]
[ three ] [ 0 0 2.5 0 0 0 0 0 0 0 0 100.0 0 0 ] [ three ]
[ twenty-four ] [ 0 0 0 0 0 0 0 0 0 0 2.5 0 100.0 0 ] [ twenty-four ]
[ two ] [ 0 0 0 0 0 0 0 0 0 0 0 0 0 100.0 ] [ two ]
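As a rough cross-check, the same experiment can be sketched in Python. This is not the code used above. It assumes that pick[40] draws 40 distinct elements, and that the similarity of two 40-element encodings is simply the number of shared elements divided by 40, which is consistent with the 2.5 steps in the matrix. The function names random_encode and similarity_matrix are made up for this sketch.

```python
import random

NAMES = ["count", "one", "two", "three", "four", "five", "six", "seven",
         "Fibonacci", "eight", "thirteen", "factorial", "twenty-four",
         "one-hundred-twenty"]

def random_encode(names, width, bits=40, seed=None):
    """Give each name a random set of `bits` distinct integers from 1..width."""
    rng = random.Random(seed)
    return {name: frozenset(rng.sample(range(1, width + 1), bits))
            for name in names}

def similarity_matrix(encodings):
    """Percentage overlap between every pair of encodings (assumed measure)."""
    names = sorted(encodings, key=str.lower)  # same ordering as the matrix above
    matrix = []
    for a in names:
        row = []
        for b in names:
            shared = len(encodings[a] & encodings[b])
            largest = max(len(encodings[a]), len(encodings[b]))
            row.append(100.0 * shared / largest)
        matrix.append(row)
    return names, matrix

names, matrix = similarity_matrix(random_encode(NAMES, 65536, seed=1))
for name, row in zip(names, matrix):
    print(f"{name:>20}", " ".join(f"{value:5.1f}" for value in row))
```

Changing 65536 to another width repeats the experiment for a different vector length. The exact off-diagonal entries depend on the seed, so they will not match the matrix above.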
In the 65536 matrix, only three pairs overlap at all, each by 2.5% (1 element in 40). Now, let's repeat, this time using vectors of length only 2048:
full |range> => range(|1>,|2048>)
encode |count> => pick[40] full |range>
encode |one> => pick[40] full |range>
encode |two> => pick[40] full |range>
encode |three> => pick[40] full |range>
encode |four> => pick[40] full |range>
encode |five> => pick[40] full |range>
encode |six> => pick[40] full |range>
encode |seven> => pick[40] full |range>
encode |Fibonacci> => pick[40] full |range>
encode |eight> => pick[40] full |range>
encode |thirteen> => pick[40] full |range>
encode |factorial> => pick[40] full |range>
encode |twenty-four> => pick[40] full |range>
encode |one-hundred-twenty> => pick[40] full |range>
simm-op |*> #=> 100 self-similar[encode] |_self>
map[simm-op,similarity] rel-kets[encode]
matrix[similarity]
[ count ] = [ 100.0 2.5 0 0 0 2.5 0 2.5 0 5 2.5 0 0 2.5 ] [ count ]
[ eight ] [ 2.5 100.0 2.5 2.5 0 0 2.5 0 5 2.5 5 7.5 5 0 ] [ eight ]
[ factorial ] [ 0 2.5 100.0 2.5 0 2.5 0 2.5 5 5 5 2.5 0 2.5 ] [ factorial ]
[ Fibonacci ] [ 0 2.5 2.5 100.0 0 5 7.5 2.5 0 2.5 5 5 2.5 2.5 ] [ Fibonacci ]
[ five ] [ 0 0 0 0 100.0 2.5 2.5 0 2.5 0 5 2.5 0 0 ] [ five ]
[ four ] [ 2.5 0 2.5 5 2.5 100.0 5 2.5 0 0 0 2.5 0 0 ] [ four ]
[ one ] [ 0 2.5 0 7.5 2.5 5 100.0 2.5 2.5 0 2.5 0 2.5 0 ] [ one ]
[ one-hundred-twenty ] [ 2.5 0 2.5 2.5 0 2.5 2.5 100.0 2.5 5 0 0 5 0 ] [ one-hundred-twenty ]
[ seven ] [ 0 5 5 0 2.5 0 2.5 2.5 100.0 5 2.5 0 0 5 ] [ seven ]
[ six ] [ 5 2.5 5 2.5 0 0 0 5 5 100.0 0 0 2.5 2.5 ] [ six ]
[ thirteen ] [ 2.5 5 5 5 5 0 2.5 0 2.5 0 100.0 2.5 0 2.5 ] [ thirteen ]
[ three ] [ 0 7.5 2.5 5 2.5 2.5 0 0 0 0 2.5 100.0 0 0 ] [ three ]
[ twenty-four ] [ 0 5 0 2.5 0 0 2.5 5 0 2.5 0 0 100.0 2.5 ] [ twenty-four ]
[ two ] [ 2.5 0 2.5 2.5 0 0 0 0 5 2.5 2.5 0 2.5 100.0 ] [ two ]
More than half of the pairs now overlap, some by as much as 7.5%, which confirms that 2048 is too small for my code. I wonder what happens with something bigger than 65536? Let's try 1,000,000:
[ count ] = [ 100.0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] [ count ]
[ eight ] [ 0 100.0 0 0 0 0 0 0 0 0 0 0 0 0 ] [ eight ]
[ factorial ] [ 0 0 100.0 0 0 0 0 0 0 0 0 0 0 0 ] [ factorial ]
[ Fibonacci ] [ 0 0 0 100.0 0 0 0 0 0 0 0 0 0 0 ] [ Fibonacci ]
[ five ] [ 0 0 0 0 100.0 0 0 0 0 0 0 0 0 0 ] [ five ]
[ four ] [ 0 0 0 0 0 100.0 0 0 0 0 0 0 0 0 ] [ four ]
[ one ] [ 0 0 0 0 0 0 100.0 0 0 0 0 0 0 0 ] [ one ]
[ one-hundred-twenty ] [ 0 0 0 0 0 0 0 100.0 0 0 0 0 0 0 ] [ one-hundred-twenty ]
[ seven ] [ 0 0 0 0 0 0 0 0 100.0 0 0 0 0 0 ] [ seven ]
[ six ] [ 0 0 0 0 0 0 0 0 0 100.0 0 0 0 0 ] [ six ]
[ thirteen ] [ 0 0 0 0 0 0 0 0 0 0 100.0 0 0 0 ] [ thirteen ]
[ three ] [ 0 0 0 0 0 0 0 0 0 0 0 100.0 0 0 ] [ three ]
[ twenty-four ] [ 0 0 0 0 0 0 0 0 0 0 0 0 100.0 0 ] [ twenty-four ]
[ two ] [ 0 0 0 0 0 0 0 0 0 0 0 0 0 100.0 ] [ two ]
Which is good: now there are no overlaps between the encodings at all. But 1,000,000 is far too slow in my code, and I'm not yet sure why range() is so slow. The other question is how many objects you can encode before you get too many collisions. I don't know.
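One way to get a feel for the collision question is a back-of-the-envelope estimate. The Python sketch below is not part of the code above. It assumes each encoding is a uniformly random set of 40 distinct elements drawn independently of the others, and it treats "too many collisions" as a hypothetical budget on the expected number of overlapping pairs. The function names are made up for the sketch.

```python
from math import comb

def p_overlap(width, bits=40):
    """Probability that two independent random `bits`-subsets of 1..width
    share at least one element."""
    return 1.0 - comb(width - bits, bits) / comb(width, bits)

def expected_overlapping_pairs(n_objects, width, bits=40):
    """Expected number of object pairs whose encodings share an element."""
    return comb(n_objects, 2) * p_overlap(width, bits)

def max_objects(width, bits=40, budget=1.0):
    """Largest object count whose expected number of overlapping pairs
    stays within `budget` (a hypothetical threshold, choose your own)."""
    n = 1
    while expected_overlapping_pairs(n + 1, width, bits) <= budget:
        n += 1
    return n

for width in (2048, 65536, 1000000):
    print(width,
          round(expected_overlapping_pairs(14, width), 2),
          max_objects(width))
```

Under these assumptions, the 14 objects used here should give roughly 50 overlapping pairs at 2048, about 2 at 65536, and well under 1 at 1,000,000. That is in line with the three matrices above. The number of pairs grows with the square of the number of objects, so the vector length has to grow quickly to keep collisions rare.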
That's it for this post.
updated: 19/12/2016
by Garry Morrison
email: garry -at- semantic-db.org