website similarity matrices
This time, let's look at how similar each website is to itself over the 11 days.
Here is the BKO:
-- create website similarity matrices:
-- list of abc websites, note we include |abc 11> and |average abc>
|full abc list> => |abc 1> + |abc 2> + |abc 3> + |abc 4> + |abc 5> + |abc 6> + |abc 7> + |abc 8> + |abc 9> + |abc 10> + |abc 11> + |average abc>
-- we want abc-hash to be distinct from standard hash, to reduce the matrix to abc only
abc-hash-4B |*> #=> hash-4B |_self>
|null> => map[abc-hash-4B] "" |full abc list>
-- we want the abc-simm to be distinct from the standard simm, to reduce the matrix to abc only
abc-simm |*> #=> 100 self-similar[abc-hash-4B] |_self>
|null> => map[abc-simm,abc-similarity] "" |full abc list>
-- now the rest of them:
|full adelaidenow list> => |adelaidenow 1> + |adelaidenow 2> + |adelaidenow 3> + |adelaidenow 4> + |adelaidenow 5> + |adelaidenow 6> + |adelaidenow 7> + |adelaidenow 8> + |adelaidenow 9> + |adelaidenow 10> + |adelaidenow 11> + |average adelaidenow>
adelaidenow-hash-4B |*> #=> hash-4B |_self>
|null> => map[adelaidenow-hash-4B] "" |full adelaidenow list>
adelaidenow-simm |*> #=> 100 self-similar[adelaidenow-hash-4B] |_self>
|null> => map[adelaidenow-simm,adelaidenow-similarity] "" |full adelaidenow list>
|full slashdot list> => |slashdot 1> + |slashdot 2> + |slashdot 3> + |slashdot 4> + |slashdot 5> + |slashdot 6> + |slashdot 7> + |slashdot 8> + |slashdot 9> + |slashdot 10> + |slashdot 11> + |average slashdot>
slashdot-hash-4B |*> #=> hash-4B |_self>
|null> => map[slashdot-hash-4B] "" |full slashdot list>
slashdot-simm |*> #=> 100 self-similar[slashdot-hash-4B] |_self>
|null> => map[slashdot-simm,slashdot-similarity] "" |full slashdot list>
|full smh list> => |smh 1> + |smh 2> + |smh 3> + |smh 4> + |smh 5> + |smh 6> + |smh 7> + |smh 8> + |smh 9> + |smh 10> + |smh 11> + |average smh>
smh-hash-4B |*> #=> hash-4B |_self>
|null> => map[smh-hash-4B] "" |full smh list>
smh-simm |*> #=> 100 self-similar[smh-hash-4B] |_self>
|null> => map[smh-simm,smh-similarity] "" |full smh list>
|full wikipedia list> => |wikipedia 1> + |wikipedia 2> + |wikipedia 3> + |wikipedia 4> + |wikipedia 5> + |wikipedia 6> + |wikipedia 7> + |wikipedia 8> + |wikipedia 9> + |wikipedia 10> + |wikipedia 11> + |average wikipedia>
wikipedia-hash-4B |*> #=> hash-4B |_self>
|null> => map[wikipedia-hash-4B] "" |full wikipedia list>
wikipedia-simm |*> #=> 100 self-similar[wikipedia-hash-4B] |_self>
|null> => map[wikipedia-simm,wikipedia-similarity] "" |full wikipedia list>
|full youtube list> => |youtube 1> + |youtube 2> + |youtube 3> + |youtube 4> + |youtube 5> + |youtube 6> + |youtube 7> + |youtube 8> + |youtube 9> + |youtube 10> + |youtube 11> + |average youtube>
youtube-hash-4B |*> #=> hash-4B |_self>
|null> => map[youtube-hash-4B] "" |full youtube list>
youtube-simm |*> #=> 100 self-similar[youtube-hash-4B] |_self>
|null> => map[youtube-simm,youtube-similarity] "" |full youtube list>
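The five rules above are identical for every site apart from the site name, so the file can be generated rather than typed by hand. Here is a minimal Python sketch of that idea. The function names `site_rules` and `build_sw` are hypothetical helpers, not part of the project; only the rule text itself is taken from the BKO above.

```python
# Hypothetical helper: regenerate the repetitive BKO rules shown above.
# The site names and rule templates are copied from the listing; the
# function names are made up for this sketch.
SITES = ["abc", "adelaidenow", "slashdot", "smh", "wikipedia", "youtube"]
DAYS = 11


def site_rules(site, days=DAYS):
    """Return the five BKO lines for one site, following the pattern above."""
    # |site 1> ... |site 11>, plus the |average site> ket
    kets = [f"|{site} {k}>" for k in range(1, days + 1)] + [f"|average {site}>"]
    return [
        f"|full {site} list> => " + " + ".join(kets),
        f"{site}-hash-4B |*> #=> hash-4B |_self>",
        f'|null> => map[{site}-hash-4B] "" |full {site} list>',
        f"{site}-simm |*> #=> 100 self-similar[{site}-hash-4B] |_self>",
        f'|null> => map[{site}-simm,{site}-similarity] "" |full {site} list>',
    ]


def build_sw(sites=SITES):
    """Join the rules for all sites into the text of one .sw file."""
    lines = ["-- create website similarity matrices:"]
    for site in sites:
        lines.extend(site_rules(site))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_sw(), end="")
```

Redirecting the output to a file such as create-website-similarity-matrices.sw should give the same rules as the listing above, minus the explanatory comments.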
Here are the resulting matrices:
-- load the data:
sa: load improved-fragment-webpages.sw
sa: load create-average-website-fragments.sw
sa: load create-website-similarity-matrices.sw
-- show our resulting matrices:
sa: matrix[abc-similarity]
[ abc 1 ] = [ 100.00 95.52 95.03 91.85 91.36 91.86 91.88 91.85 92.19 92.19 91.42 93.47 ] [ abc 1 ]
[ abc 2 ] [ 95.52 100.00 95.50 91.86 91.38 91.91 92.04 91.71 92.41 92.47 91.25 93.51 ] [ abc 2 ]
[ abc 3 ] [ 95.03 95.50 100.00 91.80 91.32 92.03 92.10 91.66 92.41 92.41 91.20 93.46 ] [ abc 3 ]
[ abc 4 ] [ 91.85 91.86 91.80 100.00 92.64 92.55 91.80 92.19 91.69 91.68 92.13 92.93 ] [ abc 4 ]
[ abc 5 ] [ 91.36 91.38 91.32 92.64 100.00 92.44 91.40 91.85 91.29 91.25 91.41 92.56 ] [ abc 5 ]
[ abc 6 ] [ 91.86 91.91 92.03 92.55 92.44 100.00 92.95 92.78 91.93 91.90 91.59 93.23 ] [ abc 6 ]
[ abc 7 ] [ 91.88 92.04 92.10 91.80 91.40 92.95 100.00 93.05 93.13 93.02 91.74 93.16 ] [ abc 7 ]
[ abc 8 ] [ 91.85 91.71 91.66 92.19 91.85 92.78 93.05 100.00 93.71 93.52 92.06 93.40 ] [ abc 8 ]
[ abc 9 ] [ 92.19 92.41 92.41 91.69 91.29 91.93 93.13 93.71 100.00 95.57 91.56 93.46 ] [ abc 9 ]
[ abc 10 ] [ 92.19 92.47 92.41 91.68 91.25 91.90 93.02 93.52 95.57 100.00 91.61 93.45 ] [ abc 10 ]
[ abc 11 ] [ 91.42 91.25 91.20 92.13 91.41 91.59 91.74 92.06 91.56 91.61 100.00 91.70 ] [ abc 11 ]
[ average abc ] [ 93.47 93.51 93.46 92.93 92.56 93.23 93.16 93.40 93.46 93.45 91.70 100.00 ] [ average abc ]
sa: matrix[adelaidenow-similarity]
[ adelaidenow 1 ] = [ 100.00 86.13 83.22 78.11 77.27 76.23 75.86 75.61 75.79 76.08 76.13 80.64 ] [ adelaidenow 1 ]
[ adelaidenow 2 ] [ 86.13 100.00 87.38 81.46 77.52 77.60 77.07 76.90 76.26 77.26 76.53 82.36 ] [ adelaidenow 2 ]
[ adelaidenow 3 ] [ 83.22 87.38 100.00 83.60 78.24 77.29 76.68 76.56 76.45 77.41 76.22 82.18 ] [ adelaidenow 3 ]
[ adelaidenow 4 ] [ 78.11 81.46 83.60 100.00 83.39 78.50 77.28 77.24 75.75 76.77 76.67 81.82 ] [ adelaidenow 4 ]
[ adelaidenow 5 ] [ 77.27 77.52 78.24 83.39 100.00 81.76 77.29 76.84 76.31 76.39 77.43 81.12 ] [ adelaidenow 5 ]
[ adelaidenow 6 ] [ 76.23 77.60 77.29 78.50 81.76 100.00 79.82 78.85 76.34 77.09 76.98 81.08 ] [ adelaidenow 6 ]
[ adelaidenow 7 ] [ 75.86 77.07 76.68 77.28 77.29 79.82 100.00 84.92 78.38 77.13 76.90 81.08 ] [ adelaidenow 7 ]
[ adelaidenow 8 ] [ 75.61 76.90 76.56 77.24 76.84 78.85 84.92 100.00 82.06 79.31 77.56 81.59 ] [ adelaidenow 8 ]
[ adelaidenow 9 ] [ 75.79 76.26 76.45 75.75 76.31 76.34 78.38 82.06 100.00 85.68 78.15 80.79 ] [ adelaidenow 9 ]
[ adelaidenow 10 ] [ 76.08 77.26 77.41 76.77 76.39 77.09 77.13 79.31 85.68 100.00 82.97 80.86 ] [ adelaidenow 10 ]
[ adelaidenow 11 ] [ 76.13 76.53 76.22 76.67 77.43 76.98 76.90 77.56 78.15 82.97 100.00 78.11 ] [ adelaidenow 11 ]
[ average adelaidenow ] [ 80.64 82.36 82.18 81.82 81.12 81.08 81.08 81.59 80.79 80.86 78.11 100.00 ] [ average adelaidenow ]
sa: matrix[slashdot-similarity]
[ average slashdot ] = [ 100.00 81.12 80.99 81.20 81.45 81.67 81.31 81.17 81.37 81.24 81.09 79.05 ] [ average slashdot ]
[ slashdot 1 ] [ 81.12 100.00 79.43 77.62 79.45 78.59 78.19 78.66 78.62 77.47 79.05 79.06 ] [ slashdot 1 ]
[ slashdot 2 ] [ 80.99 79.43 100.00 79.26 78.77 78.15 79.32 77.31 77.63 78.11 78.44 77.96 ] [ slashdot 2 ]
[ slashdot 3 ] [ 81.20 77.62 79.26 100.00 79.08 78.22 79.18 78.34 77.85 78.71 78.50 78.37 ] [ slashdot 3 ]
[ slashdot 4 ] [ 81.45 79.45 78.77 79.08 100.00 81.20 78.27 79.20 78.12 78.15 78.83 78.82 ] [ slashdot 4 ]
[ slashdot 5 ] [ 81.67 78.59 78.15 78.22 81.20 100.00 78.58 79.85 79.29 79.04 78.56 77.78 ] [ slashdot 5 ]
[ slashdot 6 ] [ 81.31 78.19 79.32 79.18 78.27 78.58 100.00 78.62 78.54 79.33 78.07 78.00 ] [ slashdot 6 ]
[ slashdot 7 ] [ 81.17 78.66 77.31 78.34 79.20 79.85 78.62 100.00 80.17 78.86 78.40 78.65 ] [ slashdot 7 ]
[ slashdot 8 ] [ 81.37 78.62 77.63 77.85 78.12 79.29 78.54 80.17 100.00 79.38 79.60 79.02 ] [ slashdot 8 ]
[ slashdot 9 ] [ 81.24 77.47 78.11 78.71 78.15 79.04 79.33 78.86 79.38 100.00 78.32 78.34 ] [ slashdot 9 ]
[ slashdot 10 ] [ 81.09 79.05 78.44 78.50 78.83 78.56 78.07 78.40 79.60 78.32 100.00 81.39 ] [ slashdot 10 ]
[ slashdot 11 ] [ 79.05 79.06 77.96 78.37 78.82 77.78 78.00 78.65 79.02 78.34 81.39 100.00 ] [ slashdot 11 ]
sa: matrix[smh-similarity]
[ average smh ] = [ 100.00 87.76 88.16 87.95 87.62 86.82 87.31 87.12 87.72 87.54 87.61 85.55 ] [ average smh ]
[ smh 1 ] [ 87.76 100.00 89.44 87.69 86.23 85.33 85.30 84.97 85.34 84.70 85.04 84.75 ] [ smh 1 ]
[ smh 2 ] [ 88.16 89.44 100.00 89.80 86.25 85.27 85.58 85.36 85.54 85.61 85.38 85.40 ] [ smh 2 ]
[ smh 3 ] [ 87.95 87.69 89.80 100.00 86.81 85.04 85.31 85.40 85.21 85.19 85.47 84.93 ] [ smh 3 ]
[ smh 4 ] [ 87.62 86.23 86.25 86.81 100.00 86.63 86.12 85.06 85.64 85.15 85.34 85.00 ] [ smh 4 ]
[ smh 5 ] [ 86.82 85.33 85.27 85.04 86.63 100.00 85.24 84.36 85.20 84.55 85.00 84.99 ] [ smh 5 ]
[ smh 6 ] [ 87.31 85.30 85.58 85.31 86.12 85.24 100.00 86.19 86.59 85.03 85.26 85.81 ] [ smh 6 ]
[ smh 7 ] [ 87.12 84.97 85.36 85.40 85.06 84.36 86.19 100.00 86.36 85.48 85.38 85.48 ] [ smh 7 ]
[ smh 8 ] [ 87.72 85.34 85.54 85.21 85.64 85.20 86.59 86.36 100.00 87.76 86.94 85.89 ] [ smh 8 ]
[ smh 9 ] [ 87.54 84.70 85.61 85.19 85.15 84.55 85.03 85.48 87.76 100.00 90.65 85.16 ] [ smh 9 ]
[ smh 10 ] [ 87.61 85.04 85.38 85.47 85.34 85.00 85.26 85.38 86.94 90.65 100.00 85.75 ] [ smh 10 ]
[ smh 11 ] [ 85.55 84.75 85.40 84.93 85.00 84.99 85.81 85.48 85.89 85.16 85.75 100.00 ] [ smh 11 ]
sa: matrix[wikipedia-similarity]
[ average wikipedia ] = [ 100.00 87.67 87.51 87.82 85.86 88.06 88.10 87.71 86.65 87.33 87.47 85.19 ] [ average wikipedia ]
[ wikipedia 1 ] [ 87.67 100.00 88.60 87.80 85.57 85.41 85.93 84.48 84.13 83.71 84.48 84.21 ] [ wikipedia 1 ]
[ wikipedia 2 ] [ 87.51 88.60 100.00 89.28 84.95 85.85 86.16 85.72 82.95 83.88 86.01 82.89 ] [ wikipedia 2 ]
[ wikipedia 3 ] [ 87.82 87.80 89.28 100.00 85.36 86.57 86.41 85.67 83.09 84.50 85.48 83.13 ] [ wikipedia 3 ]
[ wikipedia 4 ] [ 85.86 85.57 84.95 85.36 100.00 84.04 83.77 82.42 83.65 82.37 82.08 83.32 ] [ wikipedia 4 ]
[ wikipedia 5 ] [ 88.06 85.41 85.85 86.57 84.04 100.00 88.34 86.72 84.71 85.80 85.91 84.22 ] [ wikipedia 5 ]
[ wikipedia 6 ] [ 88.10 85.93 86.16 86.41 83.77 88.34 100.00 87.70 85.43 85.85 86.64 84.31 ] [ wikipedia 6 ]
[ wikipedia 7 ] [ 87.71 84.48 85.72 85.67 82.42 86.72 87.70 100.00 86.18 87.02 88.46 85.24 ] [ wikipedia 7 ]
[ wikipedia 8 ] [ 86.65 84.13 82.95 83.09 83.65 84.71 85.43 86.18 100.00 85.88 85.55 86.28 ] [ wikipedia 8 ]
[ wikipedia 9 ] [ 87.33 83.71 83.88 84.50 82.37 85.80 85.85 87.02 85.88 100.00 88.10 86.46 ] [ wikipedia 9 ]
[ wikipedia 10 ] [ 87.47 84.48 86.01 85.48 82.08 85.91 86.64 88.46 85.55 88.10 100.00 87.17 ] [ wikipedia 10 ]
[ wikipedia 11 ] [ 85.19 84.21 82.89 83.13 83.32 84.22 84.31 85.24 86.28 86.46 87.17 100.00 ] [ wikipedia 11 ]
sa: matrix[youtube-similarity]
[ average youtube ] = [ 100.00 85.21 84.63 85.38 85.98 86.04 85.33 84.43 84.90 81.10 84.43 82.12 ] [ average youtube ]
[ youtube 1 ] [ 85.21 100.00 83.96 84.76 85.73 85.18 85.04 80.41 81.58 77.43 82.51 79.80 ] [ youtube 1 ]
[ youtube 2 ] [ 84.63 83.96 100.00 87.30 85.63 84.66 81.56 81.12 79.96 79.08 78.90 79.32 ] [ youtube 2 ]
[ youtube 3 ] [ 85.38 84.76 87.30 100.00 86.54 87.12 83.85 80.31 80.86 78.14 80.15 78.80 ] [ youtube 3 ]
[ youtube 4 ] [ 85.98 85.73 85.63 86.54 100.00 89.46 85.78 81.19 82.22 76.97 81.13 79.50 ] [ youtube 4 ]
[ youtube 5 ] [ 86.04 85.18 84.66 87.12 89.46 100.00 86.87 81.77 81.96 77.08 81.08 79.79 ] [ youtube 5 ]
[ youtube 6 ] [ 85.33 85.04 81.56 83.85 85.78 86.87 100.00 82.21 82.71 78.36 81.86 80.81 ] [ youtube 6 ]
[ youtube 7 ] [ 84.43 80.41 81.12 80.31 81.19 81.77 82.21 100.00 85.97 82.26 84.98 85.98 ] [ youtube 7 ]
[ youtube 8 ] [ 84.90 81.58 79.96 80.86 82.22 81.96 82.71 85.97 100.00 81.99 87.14 84.61 ] [ youtube 8 ]
[ youtube 9 ] [ 81.10 77.43 79.08 78.14 76.97 77.08 78.36 82.26 81.99 100.00 82.12 82.82 ] [ youtube 9 ]
[ youtube 10 ] [ 84.43 82.51 78.90 80.15 81.13 81.08 81.86 84.98 87.14 82.12 100.00 86.58 ] [ youtube 10 ]
[ youtube 11 ] [ 82.12 79.80 79.32 78.80 79.50 79.79 80.81 85.98 84.61 82.82 86.58 100.00 ] [ youtube 11 ]
OK, that is kind of cool, though the matrices will line-wrap if your screen is too narrow. The take-home message: each website's pages are more than 75% similar to one another over the 11-day period (the lowest entry above is 75.61, for adelaidenow 1 versus adelaidenow 8). Which I guess means we don't even need to average over 10 days! Presumably the average will give better results, though.
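The "greater than 75%" claim can be checked mechanically instead of by eye. The sketch below is a rough Python helper, not part of the project: it assumes each row of the `matrix[...]` output has the form `[ label ] [ values ] [ label ]` (with an `=` on the first row only), and that rows appear in the same order as columns so the diagonal can be skipped. The sample rows are the first three adelaidenow rows copied from the output above.

```python
import re

# First three rows of matrix[adelaidenow-similarity], copied from the output above.
ROWS = [
    "[ adelaidenow 1 ] = [ 100.00 86.13 83.22 78.11 77.27 76.23 75.86 75.61 75.79 76.08 76.13 80.64 ] [ adelaidenow 1 ]",
    "[ adelaidenow 2 ] [ 86.13 100.00 87.38 81.46 77.52 77.60 77.07 76.90 76.26 77.26 76.53 82.36 ] [ adelaidenow 2 ]",
    "[ adelaidenow 3 ] [ 83.22 87.38 100.00 83.60 78.24 77.29 76.68 76.56 76.45 77.41 76.22 82.18 ] [ adelaidenow 3 ]",
]


def parse_row(line):
    """Split one matrix row into (label, list of float values)."""
    # Each row holds three bracketed groups: label, values, label again.
    parts = re.findall(r"\[([^\]]*)\]", line)
    label = parts[0].strip()
    values = [float(v) for v in parts[1].split()]
    return label, values


def min_off_diagonal(rows):
    """Smallest entry in the given rows, ignoring the diagonal (row i, column i)."""
    smallest = None
    for i, line in enumerate(rows):
        _, values = parse_row(line)
        for j, value in enumerate(values):
            if i != j and (smallest is None or value < smallest):
                smallest = value
    return smallest


if __name__ == "__main__":
    print(min_off_diagonal(ROWS))  # 75.61 for the three sample rows
```

Pasting all twelve rows of a matrix into `ROWS` gives that site's lowest self-similarity; any result above 75 is consistent with the take-home message.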
updated: 19/12/2016
by Garry Morrison
email: garry -at- semantic-db.org