the main event: pattern recognition of websites
We are finally there! We deliberately left the |website 11> pages out of our average website hashes. Now, given those left-out pages as input, do they classify correctly against the averages?
Here is the BKO:
-- define the list of average websites:
|ave list> => |average abc> + |average adelaidenow> + |average slashdot> + |average smh> + |average wikipedia> + |average youtube>
-- store the average hashes under their own operator, distinct from the per-page hashes:
|null> => map[hash-4B,average-hash-4B] "" |ave list>
-- now, let's see how well these patterns recognize the pages we left out of our average:
result |abc 11> => 100 similar[hash-4B,average-hash-4B] |abc 11>
result |adelaidenow 11> => 100 similar[hash-4B,average-hash-4B] |adelaidenow 11>
result |slashdot 11> => 100 similar[hash-4B,average-hash-4B] |slashdot 11>
result |smh 11> => 100 similar[hash-4B,average-hash-4B] |smh 11>
result |wikipedia 11> => 100 similar[hash-4B,average-hash-4B] |wikipedia 11>
result |youtube 11> => 100 similar[hash-4B,average-hash-4B] |youtube 11>
-- tidy results:
tidy-result |abc 11> => drop-below[40] result |_self>
tidy-result |adelaidenow 11> => drop-below[40] result |_self>
tidy-result |slashdot 11> => drop-below[40] result |_self>
tidy-result |smh 11> => drop-below[40] result |_self>
tidy-result |wikipedia 11> => drop-below[40] result |_self>
tidy-result |youtube 11> => drop-below[40] result |_self>
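If you don't speak BKO, here is a rough Python sketch of what those rules are doing, under the assumption that each page is reduced to a non-negative fragment-hash count vector, and that similar[] computes the simm measure (sum of element-wise minima divided by the larger of the two totals). The names simm, classify and averages are mine, purely for illustration:

# Rough sketch of the classification step above, assuming pages are
# represented as non-negative fragment-hash count dicts and similar[]
# uses the simm measure. All names here are illustrative only.

def simm(f, g):
    # similarity of two non-negative count dicts, in [0, 1]
    keys = set(f) | set(g)
    overlap = sum(min(f.get(k, 0), g.get(k, 0)) for k in keys)
    norm = max(sum(f.values()), sum(g.values()))
    return overlap / norm if norm else 0.0

def classify(page_hash, averages, threshold=40):
    # 100-scaled similarity against each class average (the "result" rules),
    # plus a tidy version with everything below the threshold dropped
    result = {name: 100 * simm(page_hash, ave) for name, ave in averages.items()}
    tidy = {name: v for name, v in result.items() if v >= threshold}
    return result, tidy

With the six average hashes in averages and the left-out |abc 11> hash as page_hash, result plays the role of result |abc 11> and tidy of tidy-result |abc 11>.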
And now, drum-roll, the results!
sa: load improved-fragment-webpages.sw
sa: load create-average-website-fragments.sw
sa: load create-website-pattern-recognition-matrix.sw
sa: matrix[result]
[ average abc         ] = [ 91.70  28.73  25.76  37.77  29.45  24.33 ] [ abc 11         ]
[ average adelaidenow ]   [ 28.77  78.11  26.71  29.85  25.25  28.18 ] [ adelaidenow 11 ]
[ average slashdot    ]   [ 25.76  26.88  79.05  28.27  26.86  23.20 ] [ slashdot 11    ]
[ average smh         ]   [ 37.80  29.75  28.16  85.55  32.06  24.95 ] [ smh 11         ]
[ average wikipedia   ]   [ 29.71  25.25  26.91  31.86  85.19  22.09 ] [ wikipedia 11   ]
[ average youtube     ]   [ 24.32  28.18  23.47  24.92  21.94  82.12 ] [ youtube 11     ]
Every diagonal entry is above 78, while every off-diagonal entry is below 38, so the drop-below[40] threshold cleanly strips away everything but the correct match:
sa: matrix[tidy-result]
[ average abc         ] = [ 91.70      0      0      0      0      0 ] [ abc 11         ]
[ average adelaidenow ]   [     0  78.11      0      0      0      0 ] [ adelaidenow 11 ]
[ average slashdot    ]   [     0      0  79.05      0      0      0 ] [ slashdot 11    ]
[ average smh         ]   [     0      0      0  85.55      0      0 ] [ smh 11         ]
[ average wikipedia   ]   [     0      0      0      0  85.19      0 ] [ wikipedia 11   ]
[ average youtube     ]   [     0      0      0      0      0  82.12 ] [ youtube 11     ]
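For comparison, a minimal Python sketch of that thresholding step, with the result matrix copied by hand from the output above (rows are the six averages, columns are the six left-out pages; the variable names are illustrative only):

# Apply the drop-below[40] threshold to the result matrix above.
result_matrix = [
    [91.70, 28.73, 25.76, 37.77, 29.45, 24.33],  # average abc
    [28.77, 78.11, 26.71, 29.85, 25.25, 28.18],  # average adelaidenow
    [25.76, 26.88, 79.05, 28.27, 26.86, 23.20],  # average slashdot
    [37.80, 29.75, 28.16, 85.55, 32.06, 24.95],  # average smh
    [29.71, 25.25, 26.91, 31.86, 85.19, 22.09],  # average wikipedia
    [24.32, 28.18, 23.47, 24.92, 21.94, 82.12],  # average youtube
]

tidy_matrix = [[v if v >= 40 else 0 for v in row] for row in result_matrix]

for row in tidy_matrix:
    print("  ".join(f"{v:6.2f}" for v in row))

Only the diagonal survives, which is exactly the tidy-result matrix.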
Finally, let's look at the discrimination, i.e. the difference between the highest matching result and the second highest:
sa: discrimination |*> #=> discrim result |_self>
sa: table[page,discrimination] rel-kets[result] |>
+----------------+----------------+
| page | discrimination |
+----------------+----------------+
| abc 11 | 53.90 |
| adelaidenow 11 | 48.36 |
| slashdot 11 | 50.89 |
| smh 11 | 47.78 |
| wikipedia 11 | 53.14 |
| youtube 11 | 53.94 |
+----------------+----------------+
There we have it. Discrimination on the order of 50%! That is good.
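As a sanity check, here is a small self-contained Python sketch of that calculation; the similarity lists are the columns of the result matrix above, copied by hand, and the names are again just for illustration:

# Discrimination per test page: highest similarity minus the second highest.
# Each list holds that page's similarities against the six averages.
results = {
    "abc 11":         [91.70, 28.77, 25.76, 37.80, 29.71, 24.32],
    "adelaidenow 11": [28.73, 78.11, 26.88, 29.75, 25.25, 28.18],
    "slashdot 11":    [25.76, 26.71, 79.05, 28.16, 26.91, 23.47],
    "smh 11":         [37.77, 29.85, 28.27, 85.55, 31.86, 24.92],
    "wikipedia 11":   [29.45, 25.25, 26.86, 32.06, 85.19, 21.94],
    "youtube 11":     [24.33, 28.18, 23.20, 24.95, 22.09, 82.12],
}

for page, sims in results.items():
    top, second = sorted(sims, reverse=True)[:2]
    print(f"{page:15s} {top - second:6.2f}")

which reproduces the table above, to within rounding of the displayed figures.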
Heaps more to come!
updated: 19/12/2016
by Garry Morrison
email: garry -at- semantic-db.org