more inverse simm results
OK. In the last post we discovered that similar[inverse-links-to] seems to give some good results. Let's expand our test set and try it on a few more examples.
-- define our test set:
|list> => |WP: Erwin_Schrödinger> + |WP: Richard_Feynman> + |WP: Cat> + |WP: Dog> + |WP: Apple> + |WP: Adelaide> + |WP: University_of_Adelaide> + |WP: Particle_physics> + |WP: Lisp_(programming_language)> + |WP: APL_(programming_language)> + |WP: SQL> + |WP: SPARQL> + |WP: The_Doors> + |WP: Rugby> + |WP: Australian_Football_League>
-- how many incoming links?
sa: how-many-in-links |*> #=> how-many inverse-links-to |_self>
sa: table[wikipage,how-many-in-links] "" |list>
+-----------------------------+-------------------+
| wikipage | how-many-in-links |
+-----------------------------+-------------------+
| Erwin_Schrödinger | 53 |
| Richard_Feynman | 79 |
| Cat | 14 |
| Dog | 24 |
| Apple | 21 |
| Adelaide | 81 |
| University_of_Adelaide | 10 |
| Particle_physics | 17 |
| Lisp_(programming_language) | 64 |
| APL_(programming_language) | 24 |
| SQL | 41 |
| SPARQL | 6 |
| The_Doors | 41 |
| Rugby | 0 |
| Australian_Football_League | 30 |
+-----------------------------+-------------------+
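For readers without the console at hand, here is a rough Python sketch of what the how-many-in-links count amounts to: invert the outgoing-link table, then count the distinct sources per page. The `links_to` dict, the toy page names in it, and the function names are all made up for illustration. This is not the project's own code.

```python
from collections import defaultdict


def invert_links(links_to):
    """Turn {source: targets} into {target: set of sources}."""
    inverse = defaultdict(set)
    for source, targets in links_to.items():
        for target in targets:
            inverse[target].add(source)
    return dict(inverse)


def how_many_in_links(inverse, page):
    """Number of distinct pages that link to `page` (0 if none do)."""
    return len(inverse.get(page, ()))


# Hypothetical toy data, not taken from the Wikipedia sample.
links_to = {
    "Horse": ["Cat", "Dog"],
    "Donkey": ["Cat", "Horse"],
    "Goat": ["Cat"],
}
inverse_links_to = invert_links(links_to)
for page in ["Cat", "Dog", "Rugby"]:
    print(page, how_many_in_links(inverse_links_to, page))
```

A page that nothing links to, like Rugby in the table above, simply counts as 0.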
-- create the data:
sa: inverse-simm-op |WP: *> #=> select[1,500] 100 self-similar[inverse-links-to] |_self>
sa: |null> => map[inverse-simm-op,inverse-simm] "" |list>
-- define an operator to explore the resulting data:
sa: T |*> #=> table[wikipage,coeff] select[1,20] inverse-simm |_self>
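Here is a hedged Python sketch of roughly what the two operators above compute. One assumption to flag loudly: for plain link sets, the similarity is taken to be the number of shared in-linking pages divided by the larger of the two in-link counts, scaled to 100. That reading is consistent with the numbers in this post (Cat has 14 incoming links, and coefficients such as 28.571 and 21.429 in its table equal 100*4/14 and 100*3/14), but the console's actual implementation may differ. The function names and toy data are hypothetical.

```python
def simm(a, b):
    """Assumed similarity of two link sets, scaled to the range 0..100."""
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / max(len(a), len(b))


def inverse_simm(inverse_links_to, page, top=500):
    """Rank every page by how much its in-link set overlaps with `page`'s."""
    target = inverse_links_to.get(page, set())
    scored = [(other, simm(target, sources))
              for other, sources in inverse_links_to.items()]
    # Drop pages that share no in-links at all.
    scored = [(other, coeff) for other, coeff in scored if coeff > 0]
    # Highest coefficient first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]


# Hypothetical toy data: page -> set of pages that link to it.
inverse_links_to = {
    "Cat": {"p1", "p2", "p3", "p4"},
    "Horse": {"p1", "p2", "p3", "p5", "p6"},
    "Dog": {"p1", "p2", "p7"},
    "SQL": {"p9"},
}
# Keep the top 20 rows, as the T operator does with select[1,20].
for name, coeff in inverse_simm(inverse_links_to, "Cat")[:20]:
    print(f"{name:10} {coeff:.3f}")
```

Under this reading, a page with no incoming links gets an empty ranking, which is what the Rugby table further down looks like.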
-- now our examples:
sa: T |WP: Erwin_Schrödinger>
+---------------------------+--------+
| wikipage | coeff |
+---------------------------+--------+
| Erwin_Schrödinger | 100.0 |
| Max_Born | 32.075 |
| Niels_Bohr | 31.646 |
| Schrödinger_equation | 30.189 |
| Paul_Dirac | 29.31 |
| Wolfgang_Pauli | 28.302 |
| Werner_Heisenberg | 28.049 |
| Max_Planck | 26.984 |
| uncertainty_principle | 26.415 |
| photoelectric_effect | 24.528 |
| Roger_Penrose | 22.642 |
| Bohr_model | 20.755 |
| Arnold_Sommerfeld | 20.755 |
| Louis_de_Broglie | 20.755 |
| wave_function | 20.755 |
| Copenhagen_interpretation | 18.868 |
| quantum_state | 18.868 |
| Ernest_Rutherford | 17.742 |
| Maxwell's_equations | 17.241 |
| Pauli_exclusion_principle | 16.981 |
+---------------------------+--------+
sa: T |WP: Richard_Feynman>
+------------------------------------+--------+
| wikipage | coeff |
+------------------------------------+--------+
| Richard_Feynman | 100.0 |
| Werner_Heisenberg | 24.39 |
| special_relativity | 20.792 |
| Niels_Bohr | 20.253 |
| Paul_Dirac | 20.253 |
| particle_physics | 20.225 |
| classical_mechanics | 20.0 |
| fermion | 18.987 |
| spin_(physics) | 18.987 |
| Standard_Model | 17.722 |
| Schrödinger_equation | 17.722 |
| quantum_field_theory | 17.722 |
| electromagnetism | 17.241 |
| Erwin_Schrödinger | 16.456 |
| Pauli_exclusion_principle | 16.456 |
| quark | 16.456 |
| Stephen_Hawking | 16.456 |
| quantum_electrodynamics | 16.456 |
| Julian_Schwinger | 16.456 |
| Category:Concepts_in_physics | 16.279 |
+------------------------------------+--------+
sa: T |WP: Cat>
+----------+--------+
| wikipage | coeff |
+----------+--------+
| Cat | 100.0 |
| Horse | 31.25 |
| Donkey | 28.571 |
| Goat | 28.571 |
| Elephant | 21.429 |
| Pig | 21.429 |
| Rabbit | 21.429 |
| Deer | 21.429 |
| Mule | 21.429 |
| Goose | 21.429 |
| Dog | 20.833 |
| Sheep | 20 |
| Lion | 18.75 |
| Almond | 14.286 |
| Alder | 14.286 |
| Ant | 14.286 |
| Bear | 14.286 |
| Bee | 14.286 |
| Fox | 14.286 |
| Lizard | 14.286 |
+----------+--------+
sa: T |WP: Dog>
+------------------+--------+
| wikipage | coeff |
+------------------+--------+
| Dog | 100.0 |
| Horse | 29.167 |
| coyote | 22.222 |
| Gray_wolf | 21.429 |
| Arctic_fox | 20.833 |
| Cat | 20.833 |
| Canidae | 20.833 |
| Elephant | 20.833 |
| bobcat | 20.833 |
| red_fox | 20.833 |
| Donkey | 20.833 |
| red_wolf | 20.833 |
| Rabbit | 16.667 |
| African_wild_dog | 16.667 |
| gray_wolf | 16.667 |
| Domestic_sheep | 16.667 |
| dingo | 16.667 |
| Goat | 16.667 |
| Cattle | 16.667 |
| otter | 16.667 |
+------------------+--------+
sa: T |WP: Apple>
+----------------+--------+
| wikipage | coeff |
+----------------+--------+
| Apple | 100.0 |
| Strawberry | 33.333 |
| Cranberry | 23.81 |
| Grape | 23.81 |
| Tomato | 23.81 |
| Cherry | 23.81 |
| Kiwifruit | 19.048 |
| Blackberry | 19.048 |
| plum | 19.048 |
| Lime_(fruit) | 19.048 |
| Pineapple | 19.048 |
| Lemon | 19.048 |
| Blueberry | 19.048 |
| peach | 17.5 |
| pear | 17.391 |
| Orange_(fruit) | 16.667 |
| Pear | 14.286 |
| Banana | 14.286 |
| Peach | 14.286 |
| Squash_(plant) | 14.286 |
+----------------+--------+
sa: T |WP: Adelaide>
+-------------------------------------+--------+
| wikipage | coeff |
+-------------------------------------+--------+
| Adelaide | 100.0 |
| Brisbane | 37.079 |
| Perth | 32.099 |
| South_Australia | 26.042 |
| Melbourne | 18.687 |
| Canberra | 15.044 |
| The_Age | 14.634 |
| Sydney | 14.583 |
| Australian_Broadcasting_Corporation | 13.445 |
| Australian_rules_football | 12.346 |
| Auckland | 12.195 |
| Australian_Labor_Party | 11.111 |
| Darwin,_Northern_Territory | 11.111 |
| Triple_J | 11.111 |
| Seven_Network | 11.111 |
| States_and_territories_of_Australia | 11.111 |
| Karachi | 10.989 |
| Australian_Football_League | 9.877 |
| The_Australian | 9.877 |
| Western_Australia | 8.911 |
+-------------------------------------+--------+
sa: T |WP: University_of_Adelaide>
+----------------------------------------+-------+
| wikipage | coeff |
+----------------------------------------+-------+
| University_of_Adelaide | 100.0 |
| Port_Adelaide_Football_Club | 20 |
| Adelaide_Oval | 20 |
| Adelaide_city_centre | 20 |
| University_of_South_Australia | 20 |
| Port_Adelaide | 20 |
| Australian_Grand_Prix | 20 |
| State_Bank_of_South_Australia | 20 |
| Mount_Lofty | 20 |
| South_Eastern_Freeway | 20 |
| Southern_Expressway_(Australia) | 20 |
| Government_of_South_Australia | 20 |
| Flinders_University_of_South_Australia | 20 |
| South_Australian_Museum | 20 |
| Adelaide_Crows | 20 |
| Glenelg,_South_Australia | 20 |
| Australian_Central_Standard_Time | 20 |
| Australian_Central_Daylight_Time | 20 |
| Fleurieu_Peninsula | 20 |
| River_Torrens | 20 |
+----------------------------------------+-------+
sa: T |WP: Particle_physics>
+----------------------------------------+--------+
| wikipage | coeff |
+----------------------------------------+--------+
| Particle_physics | 100 |
| Optics | 20 |
| Cosmology | 20 |
| Acoustics | 17.647 |
| Condensed_matter_physics | 17.647 |
| Fluid_dynamics | 17.647 |
| Thermodynamics | 17.647 |
| kinematics | 17.647 |
| atomic,_molecular,_and_optical_physics | 17.647 |
| cosmic_inflation | 17.647 |
| Fluid_statics | 17.647 |
| Lambda-CDM_model | 17.647 |
| Biophysics | 17.647 |
| Category:Physics | 17.647 |
| Lev_Landau | 15.789 |
| Nuclear_physics | 14.286 |
| virtual_particle | 14.286 |
| quantum_chemistry | 13.636 |
| American_Physical_Society | 13.333 |
| Elementary_particle | 13.043 |
+----------------------------------------+--------+
sa: T |WP: Lisp_(programming_language)>
+------------------------------------+--------+
| wikipage | coeff |
+------------------------------------+--------+
| Lisp_(programming_language) | 100 |
| Smalltalk | 28.125 |
| Pascal_(programming_language) | 24.675 |
| Fortran | 23.881 |
| Scheme_(programming_language) | 23.438 |
| Ruby_(programming_language) | 23.188 |
| object-oriented_programming | 21.875 |
| PHP | 20.779 |
| Prolog | 20.312 |
| Haskell_(programming_language) | 20.312 |
| Ada_(programming_language) | 18.75 |
| APL_(programming_language) | 18.75 |
| BASIC | 18.75 |
| COBOL | 18.75 |
| functional_programming | 18.75 |
| John_McCarthy_(computer_scientist) | 18.75 |
| C_Sharp_(programming_language) | 18.75 |
| programming_language | 17.647 |
| JavaScript | 17.526 |
| compiler | 17 |
+------------------------------------+--------+
sa: T |WP: APL_(programming_language)>
+------------------------------------+--------+
| wikipage | coeff |
+------------------------------------+--------+
| APL_(programming_language) | 100.0 |
| Kenneth_E._Iverson | 33.333 |
| John_McCarthy_(computer_scientist) | 25.0 |
| John_Backus | 25.0 |
| Prolog | 23.333 |
| Alan_Kay | 20.833 |
| AWK | 20.833 |
| Grace_Hopper | 20.833 |
| ML_(programming_language) | 20.833 |
| Niklaus_Wirth | 20.833 |
| logic_programming | 20.833 |
| J_(programming_language) | 20.833 |
| bytecode | 19.231 |
| Lisp_(programming_language) | 18.75 |
| programmer | 17.857 |
| Objective-C | 17.241 |
| BASIC | 17.188 |
| Mathematica | 17.143 |
| SQL | 17.073 |
| ALGOL | 16.667 |
+------------------------------------+--------+
sa: T |WP: SQL>
+----------------------------------------+--------+
| wikipage | coeff |
+----------------------------------------+--------+
| SQL | 100.0 |
| Haskell_(programming_language) | 22.727 |
| PHP | 19.481 |
| APL_(programming_language) | 17.073 |
| Category:Cross-platform_software | 17.073 |
| Visual_Basic | 17.073 |
| relational_database | 17.073 |
| COBOL | 16.327 |
| PostgreSQL | 14.634 |
| R_(programming_language) | 14.634 |
| Run_time_(program_lifecycle_phase) | 14.634 |
| Ruby_(programming_language) | 14.493 |
| JavaScript | 13.402 |
| C_Sharp_(programming_language) | 13.208 |
| database | 12.644 |
| Lisp_(programming_language) | 12.5 |
| Common_Lisp | 12.195 |
| Graphical_user_interface | 12.195 |
| MySQL | 12.195 |
| Mathematica | 12.195 |
+----------------------------------------+--------+
sa: T |WP: SPARQL>
+--------------------------------------------------------------------------------------------+--------+
| wikipage | coeff |
+--------------------------------------------------------------------------------------------+--------+
| SPARQL | 100.0 |
| Web_Ontology_Language | 33.333 |
| Agris:_International_Information_System_for_the_Agricultural_Sciences_and_Technology | 33.333 |
| W3C_XML_Schema | 33.333 |
| GRDDL | 33.333 |
| Conceptual_interoperability | 33.333 |
| Category:Web_services | 33.333 |
| Resource_Description_Framework | 20 |
| Analytic_geometry | 16.667 |
| DHTML | 16.667 |
| Interpolation | 16.667 |
| GNU_nano | 16.667 |
| Pico_(text_editor) | 16.667 |
| Relational_database | 16.667 |
| Sir_Charles_Lyell | 16.667 |
| Synchronized_Multimedia_Integration_Language | 16.667 |
| Semantic_network | 16.667 |
| Backronym | 16.667 |
| Interoperability | 16.667 |
| RAS_syndrome | 16.667 |
+--------------------------------------------------------------------------------------------+--------+
sa: T |WP: The_Doors>
+---------------------------------------+--------+
| wikipage | coeff |
+---------------------------------------+--------+
| The_Doors | 100.0 |
| Jim_Morrison | 31.707 |
| Ray_Manzarek | 21.951 |
| Jefferson_Airplane | 17.073 |
| The_Band | 14.634 |
| Bee_Gees | 14.634 |
| Janis_Joplin | 12.195 |
| Summer_of_Love | 12.195 |
| Timothy_Leary | 12.195 |
| folk_rock | 12.195 |
| Governor_of_American_Samoa | 12.195 |
| Cream_(band) | 12.195 |
| Sgt._Pepper's_Lonely_Hearts_Club_Band | 12.195 |
| The_Byrds | 12.195 |
| Joan_Baez | 12.195 |
| Iron_Butterfly | 12.195 |
| Brian_Jones | 12.195 |
| Creedence_Clearwater_Revival | 12.195 |
| Johnny_Winter | 12.195 |
| The_Yardbirds | 12.195 |
+---------------------------------------+--------+
sa: T |WP: Rugby>
+----------+-------+
| wikipage | coeff |
+----------+-------+
+----------+-------+
sa: T |WP: Australian_Football_League>
+---------------------------------+--------+
| wikipage | coeff |
+---------------------------------+--------+
| Australian_Football_League | 100.0 |
| West_Coast_Eagles | 40 |
| Richmond_Football_Club | 40 |
| Sydney_Swans | 36.667 |
| St_Kilda_Football_Club | 36.667 |
| Collingwood_Football_Club | 36.667 |
| Hawthorn_Football_Club | 36.667 |
| Australian_rules_football | 33.871 |
| Essendon_Football_Club | 33.333 |
| North_Melbourne_Football_Club | 33.333 |
| Western_Bulldogs | 33.333 |
| Carlton_Football_Club | 33.333 |
| Brownlow_Medal | 33.333 |
| Seven_Network | 32.353 |
| Melbourne_Cricket_Ground | 30 |
| Australian_Bureau_of_Statistics | 30 |
| Special_Broadcasting_Service | 30 |
| Melbourne_Football_Club | 30 |
| Fitzroy_Football_Club | 30 |
| 2001_AFL_season | 30 |
+---------------------------------+--------+
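Wow! That works remarkably well, though I don't know exactly why. Nor do I know why similar[inverse-links-to] works better than similar[links-to]. Note that Rugby returns an empty table because it has 0 incoming links in this data, and SPARQL, with only 6 incoming links, gives noticeably noisier results than the pages with dozens. An obvious question is whether a larger subset of Wikipedia would make these results better or worse. I suspect better, but I'm not sure.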
BTW, this data is from only:
100*30000/15559125
= 0.19 %
of the total English Wikipedia.
And the resulting sw file is here.
updated: 19/12/2016
by Garry Morrison
email: garry -at- semantic-db.org