more htm sequence learning
So, today I wrote a little bit of Python to auto-generate sw files, and now my HTM sequence learning is pretty much automated. You specify the HTM parameters and the sequences you want to learn, run the script, load the resulting sw file in the console, and invoke a pretty table.
These are the needed parameters:
# number of on bits:
bits = 40
# total number of bits:
total_bits = 65536
# column size:
column_size = 10
# destination file:
destination = "sw-examples/high-order-sequences.sw"
# drop below threshold:
# use 0 for off
#t = 0.01
#t = 0.1
#t = 0.05
t = 0.5
# data:
# NB: sequence elements don't have to be single letters. Anything separated by space will work fine.
#data = ["a b c d e", "A B C D E F G", "X B C Y Z x y z","one two three four five six seven"]
data = ["count one two three four five six seven","Fibonacci one one two three five eight thirteen","factorial one two six twenty-four one-hundred-twenty"]
Now, an example:
$ ./learn-high-order-sequences.py
$ ./the_semantic_db_console.py
Welcome!
sa: load high-order-sequences.sw
sa: the-table
+--------------------+--------+----------+----------------------------+----------------------------+-------------------------+----------------------------+---------------+
| ket | step-1 | step-2 | step-3 | step-4 | step-5 | step-6 | step-7 |
+--------------------+--------+----------+----------------------------+----------------------------+-------------------------+----------------------------+---------------+
| count | one | 1.00 two | 1.00 three | 1.00 four | 1.00 five | 1.00 six, 0.03 twenty-four | 1.00 seven |
| one | | | | | | | |
| two | | | | | | | |
| three | | | | | | | |
| four | | | | | | | |
| five | | | | | | | |
| six | | | | | | | |
| seven | | | | | | | |
| Fibonacci | one | 1.00 one | 1.00 two | 1.00 three | 1.00 five | 1.00 eight | 1.00 thirteen |
| eight | | | | | | | |
| thirteen | | | | | | | |
| factorial | one | 1.00 two | 1.00 six, 0.03 twenty-four | 1.00 twenty-four, 0.03 six | 1.00 one-hundred-twenty | | |
| twenty-four | | | | | | | |
| one-hundred-twenty | | | | | | | |
+--------------------+--------+----------+----------------------------+----------------------------+-------------------------+----------------------------+---------------+
which works exactly as expected. Though it seems the patterns for six and twenty-four have a little overlap, even at t = 0.5. I'm not sure how large t would have to be to remove that entirely; maybe something big like t = 0.9.
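As an aside, the drop-below threshold itself is a simple operation: discard any ket in a superposition whose coefficient is below t. Here is a toy sketch, not the console's actual implementation; presumably it is applied at some intermediate stage rather than to the final table, which might be why the small 0.03 term still shows up:

# toy drop-below threshold: keep only kets with coefficient >= t
def drop_below(superposition, t):
    return {ket: coeff for ket, coeff in superposition.items() if coeff >= t}

step_6 = {"six": 1.00, "twenty-four": 0.03}
print(drop_below(step_6, 0.05))     # {'six': 1.0}
print(drop_below(step_6, 0.01))     # both kets survive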
Now, if we dial t down to, say, 0.05, there is a whole lot less certainty about our sequences. Here are the first three steps:
sa: table[ket,step-1,step-2,step-3] rel-kets[encode]
+--------------------+---------------------------------------------+---------------------------------------------------------+--------------------------------------------------------------------+
| ket | step-1 | step-2 | step-3 |
+--------------------+---------------------------------------------+---------------------------------------------------------+--------------------------------------------------------------------+
| count | one | 1.00 two, 0.03 one-hundred-twenty | 0.90 three, 0.10 six |
| one | 0.75 two, 0.25 one, 0.02 one-hundred-twenty | 0.53 three, 0.26 six, 0.21 two, 0.01 one-hundred-twenty | 0.28 four, 0.28 five, 0.24 twenty-four, 0.20 three |
| two | 0.67 three, 0.33 six | 0.35 four, 0.35 five, 0.31 twenty-four | 0.32 five, 0.32 eight, 0.29 one-hundred-twenty, 0.07 six, 0.01 two |
| three | 0.50 four, 0.50 five | 0.45 five, 0.45 eight, 0.10 six | 0.41 thirteen, 0.41 six, 0.10 seven, 0.09 eight |
| four | 1.00 five | 0.82 six, 0.18 eight | 0.82 seven, 0.20 thirteen |
| five | 0.50 six, 0.50 eight | 0.51 seven, 0.51 thirteen | |
| six | 0.50 seven, 0.50 twenty-four, 0.01 thirteen | 1.00 one-hundred-twenty, 0.03 two | |
| seven | | | |
| Fibonacci | one | 0.83 one, 0.17 two, 0.00 one-hundred-twenty | 0.76 two, 0.08 three, 0.08 six, 0.08 one, 0.02 one-hundred-twenty |
| eight | 1.00 thirteen, 0.02 seven | | |
| thirteen | | | |
| factorial | one | 0.91 two, 0.09 one, 0.02 one-hundred-twenty | 0.75 six, 0.17 three, 0.08 two, 0.00 one-hundred-twenty |
| twenty-four | 1.00 one-hundred-twenty, 0.02 two | | |
| one-hundred-twenty | | | |
+--------------------+---------------------------------------------+---------------------------------------------------------+--------------------------------------------------------------------+
which is still nicely consistent with what we expect from the sequence learning algorithm. For example, with no context you don't know whether six should be followed by seven or by twenty-four; it depends on whether you are counting or recalling factorials. So we see a 50% chance of either, and the 1% thirteen is just noise. Likewise, five is followed by six or eight with a 50% chance each, again depending on whether we are counting or recalling Fibonacci.
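As a hypothetical back-of-the-envelope for where that 50/50 comes from: if the successor predictions coming from six's two contexts are simply summed and then normalized, an even split falls out. This is my reading of the behaviour, not a line-by-line account of the algorithm:

# hypothetical: sum the successor predictions from six's two contexts
# (counting -> seven, factorial -> twenty-four), then normalize
predictions = [{"seven": 1.0}, {"twenty-four": 1.0}]
merged = {}
for p in predictions:
    for ket, coeff in p.items():
        merged[ket] = merged.get(ket, 0.0) + coeff
total = sum(merged.values())
merged = {ket: coeff / total for ket, coeff in merged.items()}
print(merged)    # {'seven': 0.5, 'twenty-four': 0.5}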
We should also note a difference here between SDRs (sparse distributed representations) and my superpositions. SDRs are strictly Boolean, while superpositions use floats, and I think that produces better results. Besides, there are plenty of other places in my project where having floats makes things possible that would not be possible with just 0 and 1. The biological interpretation of these floats is an open question; I have just assumed they are some kind of time average of spikes. Perhaps floats are not biologically plausible? I suspect they are, but I don't know for sure.
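To make the Boolean versus float distinction concrete, here is one possible graded similarity measure, written as plain Python. It is not necessarily the console's simm operator, just an illustration of the extra resolution floats give you:

# Boolean SDR: a set of on-bits. Similarity is integer overlap counting.
sdr_a = {3, 17, 42, 99}
sdr_b = {3, 42, 77, 99}
print(len(sdr_a & sdr_b) / len(sdr_a))     # 0.75, and only integer steps are possible

# Superposition: bit -> float coefficient. A graded similarity
# (sum of mins over sum of maxes) varies smoothly with the weights.
sup_a = {3: 1.0, 17: 0.4, 42: 1.0, 99: 0.2}
sup_b = {3: 0.9, 42: 1.0, 77: 0.5, 99: 0.2}
keys = sup_a.keys() | sup_b.keys()
num = sum(min(sup_a.get(k, 0.0), sup_b.get(k, 0.0)) for k in keys)
den = sum(max(sup_a.get(k, 0.0), sup_b.get(k, 0.0)) for k in keys)
print(num / den)                           # roughly 0.68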
Next, the above is in proof-of-concept land. That seems to be the fate of my language: proof-of-concept something in a few lines of code, then later implement a faster, but much longer, fully Python version :(. For example, I don't think the above version would scale all that well. But it does show that Hawkins' sequence learning algorithm works beautifully, and that my if-then machines are interesting and useful.
Finally, if the above tables line-wrap, maybe the raw text files will display better:
count-fibonacci-factorial--t-0_5.txt
count-fibonacci-factorial--t-0_05.txt
count-fibonacci-factorial--t-0_05--3-steps.txt
updated: 19/12/2016
by Garry Morrison
email: garry -at- semantic-db.org