some more similar inverse links to results
This time I used 300,000 pages of Wikipedia (out of roughly 15,000,000 in total), so about 2% of the whole. Even with EC2, I don't really have the processing power (with the current code) to handle sets much larger than this.
sa: load 300k--wikipedia-links.sw
sa: find-inverse[links-to]
sa: H |*> #=> how-many inverse-links-to merge-labels(|WP: > + |_self>)
sa: S |*> #=> table[wikipage,coeff] select[1,60] 100 self-similar[inverse-links-to] merge-labels(|WP: > + |_self>)
sa: S |Love>
+-----------------------------------+--------+
| wikipage | coeff |
+-----------------------------------+--------+
| Love | 100.0 |
| Pride | 17.391 |
| Pleasure | 13.043 |
| Jealousy | 13.043 |
| Philotes_(mythology) | 13.043 |
| Imagination | 13.043 |
| Pity | 13.043 |
| Envy | 13.043 |
| Peace | 12.121 |
| Matter | 12 |
| Fear | 8.696 |
| Measurement | 8.696 |
| Number | 8.696 |
| Observation | 8.696 |
| Misanthropy | 8.696 |
| Piety | 8.696 |
| Courage | 8.696 |
| Hope | 8.696 |
| Lust | 8.696 |
| Asteria | 8.696 |
| Orthrus | 8.696 |
| Modesty | 8.696 |
| Punishment | 8.696 |
| Idea | 8.696 |
| Politeness | 8.696 |
| Learning | 8.696 |
| Luck | 8.696 |
| Sexual_attraction | 8.696 |
| Necessity | 8.696 |
| Physical_intimacy | 8.696 |
| Wrath | 8.696 |
| Gluttony | 8.696 |
| Prediction | 8.696 |
| Darkness | 8.696 |
| Safety | 8.696 |
| Optimism | 8.696 |
| Doubt | 8.696 |
| Moderation | 8.696 |
| Compassion | 8.696 |
| Respect | 8.696 |
| Nomenclature | 8.696 |
| Courtship | 8.696 |
| Jonathan_Barnes | 8.696 |
| Diels–Kranz_numbering_system | 8.696 |
| John_Raven | 8.696 |
| De_amore_(Andreas_Capellanus) | 8.696 |
| Infatuation | 8.696 |
| Category:Love | 8.696 |
| Contempt | 8.696 |
| Memory | 8.696 |
| Quantity | 8.696 |
| cyclops | 8.696 |
| Curiosity | 8.696 |
| Passion_(emotion) | 8.696 |
| Category:Philosophy_of_love | 8.696 |
| nonverbal_communication | 8.696 |
| Air | 8.696 |
| Neikea | 8.696 |
| Peter_Kingsley_(scholar) | 8.696 |
| Inquiry | 8.696 |
+-----------------------------------+--------+
Time taken: 1 hour, 42 minutes, 23 seconds, 210 milliseconds
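As a rough illustration of what H and S are doing, here is a minimal Python sketch. It is not the semantic-db code: the function names and the toy link data are made up for the example, and the similarity measure (number of shared inverse links divided by the larger of the two inverse-link counts, as a percentage) is an assumption. That assumption is at least consistent with the coefficients in the table above, since 2 shared links out of 23 gives 8.696 and 3 out of 23 gives 13.043.

```python
from collections import defaultdict


def find_inverse(links_to):
    """Invert a map of page -> pages it links to into page -> pages linking to it."""
    inverse = defaultdict(set)
    for page, targets in links_to.items():
        for target in targets:
            inverse[target].add(page)
    return dict(inverse)


def how_many(inverse, page):
    """Count the inverse links of a page (the role H plays in the session)."""
    return len(inverse.get(page, set()))


def similarity(a, b):
    """Assumed measure: shared inverse links over the larger set size, in percent."""
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / max(len(a), len(b))


def self_similar(inverse, page, top=60):
    """Rank all pages by similarity of their inverse links to those of `page`."""
    target = inverse.get(page, set())
    scores = []
    for other, sources in inverse.items():
        coeff = round(similarity(target, sources), 3)
        if coeff > 0:
            scores.append((other, coeff))
    # Highest coefficient first; ties broken alphabetically for a stable order.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top]


# Hypothetical toy data: four pages p1..p4 and the pages they link to.
links_to = {
    "p1": {"X", "Y", "Z"},
    "p2": {"X", "Y"},
    "p3": {"X", "Z"},
    "p4": {"X", "W"},
}

inverse = find_inverse(links_to)
print(how_many(inverse, "X"))  # 4
for page, coeff in self_similar(inverse, "X"):
    print(page, coeff)
# X 100.0
# Y 50.0
# Z 50.0
# W 25.0
```

Comparing one page against every other page this way is a full pass over the data for each query, which may be part of why the run times here are measured in hours.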
sa: S |Knowledge>
+----------------------------+--------+
| wikipage | coeff |
+----------------------------+--------+
| Knowledge | 100.0 |
| Inquiry | 16 |
| Measurement | 12 |
| Pride | 12 |
| Idea | 12 |
| Learning | 12 |
| Prediction | 12 |
| Experience | 12 |
| Memory | 12 |
| Intelligence_(trait) | 12 |
| understanding | 10.345 |
| Imre_Lakatos | 8.333 |
| Beauty | 8 |
| Outline_of_education | 8 |
| Faith | 8 |
| Love | 8 |
| Meaning_of_life | 8 |
| Metaphor | 8 |
| Nominalism | 8 |
| Number | 8 |
| Observation | 8 |
| Platonic_idealism | 8 |
| Pain | 8 |
| Pathological_science | 8 |
| Problem_of_other_minds | 8 |
| Misanthropy | 8 |
| Piety | 8 |
| Virtue | 8 |
| Lust | 8 |
| Discovery_(observation) | 8 |
| Ineffability | 8 |
| Belief | 8 |
| Organization | 8 |
| Modesty | 8 |
| Placebo | 8 |
| Punishment | 8 |
| Quasi-empirical_method | 8 |
| Pleasure | 8 |
| Jealousy | 8 |
| Authority | 8 |
| Karl_Mannheim | 8 |
| Paradigm | 8 |
| Intensionality | 8 |
| Problem_of_induction | 8 |
| Necessity | 8 |
| Elegance | 8 |
| Pratītyasamutpāda | 8 |
| Moderation | 8 |
| Phenomenalism | 8 |
| Nomenclature | 8 |
| Potentiality_and_actuality | 8 |
| Max_Scheler | 8 |
| Matter | 8 |
| Panpsychism | 8 |
| Information | 8 |
| knowledge_management | 8 |
| Lev_Shestov | 8 |
| Interpretation_(logic) | 8 |
| Outline_of_philosophy | 8 |
| Outline_of_logic | 8 |
+----------------------------+--------+
Time taken: 1 hour, 48 minutes, 29 seconds, 868 milliseconds
sa: H |Google>
|number: 704>
sa: S |Google>
+---------------------------------------+--------+
| wikipage | coeff |
+---------------------------------------+--------+
| Google | 100.0 |
| Apple_Inc. | 14.063 |
| Microsoft | 12.732 |
| Facebook | 11.222 |
| Yahoo! | 9.375 |
| World_Wide_Web | 8.807 |
| IBM | 8.093 |
| Sun_Microsystems | 7.955 |
| Android_(operating_system) | 7.812 |
| Internet | 7.487 |
| Amazon.com | 7.102 |
| Intel | 6.676 |
| Linux | 6.537 |
| Hewlett-Packard | 6.25 |
| Stanford_University | 6.108 |
| Twitter | 6.108 |
| web_browser | 6.108 |
| HTML | 5.824 |
| operating_system | 5.803 |
| YouTube | 5.657 |
| Forbes | 5.384 |
| Massachusetts_Institute_of_Technology | 5.324 |
| Java_(programming_language) | 4.83 |
| AOL | 4.687 |
| smartphone | 4.687 |
| open_source | 4.687 |
| C_(programming_language) | 4.608 |
| Silicon_Valley | 4.545 |
| Nokia | 4.403 |
| C++ | 4.403 |
| Microsoft_Windows | 4.354 |
| JavaScript | 4.261 |
| Wired_(magazine) | 4.261 |
| Motorola | 4.119 |
| XML | 4.119 |
| Wall_Street_Journal | 4.119 |
| CNET | 4.119 |
| copyright | 4.119 |
| software | 4.119 |
| Oracle_Corporation | 3.977 |
| Sony | 3.977 |
| Unix | 3.977 |
| Mac_OS_X | 3.977 |
| Wikipedia | 3.977 |
| Internet_Explorer | 3.835 |
| OS_X | 3.835 |
| source_code | 3.835 |
| eBay | 3.835 |
| computer_science | 3.748 |
| University_of_California,_Berkeley | 3.732 |
| IP_address | 3.693 |
| Larry_Page | 3.693 |
| iPhone | 3.693 |
| algorithm | 3.693 |
| free_software | 3.693 |
| University_of_Michigan | 3.551 |
| GNU_General_Public_License | 3.551 |
| database | 3.551 |
| Carnegie_Mellon_University | 3.409 |
| Cisco_Systems | 3.409 |
+---------------------------------------+--------+
Time taken: 1 day, 18 hours, 53 minutes, 1 second, 791 milliseconds
sa: H |Blog>
|number: 32>
sa: S |Blog>
+-----------------------------------------------------------------+-------+
| wikipage | coeff |
+-----------------------------------------------------------------+-------+
| Blog | 100 |
| Active_Server_Pages | 9.375 |
| Desktop_publishing | 9.375 |
| Online_chat | 9.375 |
| CAPTCHA | 9.375 |
| RSS | 9.302 |
| Dynamic_HTML | 6.25 |
| Malware | 6.25 |
| Chat_room | 6.25 |
| Content_management_system | 6.25 |
| ABC_World_News_Tonight | 6.25 |
| Cross-site_scripting | 6.25 |
| Primetime_(TV_series) | 6.25 |
| Phishing | 6.25 |
| home_page | 6.25 |
| Open_source_software | 6.25 |
| impact_factor | 6.25 |
| Terminate_and_Stay_Resident | 6.25 |
| electronic_mailing_list | 6.25 |
| Podcast | 6.25 |
| Google_Scholar | 6.25 |
| OPML | 6.25 |
| feed_aggregator | 6.25 |
| peer-review | 6.25 |
| Social_networking_service | 6.25 |
| Digg | 6.25 |
| carbon_copy | 6.25 |
| online_community | 6.25 |
| Freemium | 6.25 |
| Microsoft_Silverlight | 6.25 |
| Wikia | 6.25 |
| Peer-to-peer_file_sharing | 6.25 |
| Fully_qualified_domain_name | 6.25 |
| Category:Internet_forums | 6.25 |
| Category:American_broadcast_news_analysts | 6.25 |
| arXiv.org | 6.25 |
| preprint | 6.25 |
| Cicada_3301 | 6.25 |
| fansite | 6.25 |
| Affiliate_marketing | 6.25 |
| Category:American_television_news_anchors | 6.25 |
| Category:ABC_News_personalities | 6.25 |
| Category:American_television_reporters_and_correspondents | 6.25 |
| Lisa_McRee | 6.25 |
| Category:Electronic_publishing | 6.25 |
| Kevin_Newman_(journalist) | 6.25 |
| Robin_Roberts_(sportscaster) | 6.25 |
| Internet_Information_Services | 6.061 |
| newsmagazine | 5.882 |
| George_Stephanopoulos | 5.714 |
| news_presenter | 5.714 |
| FAQ | 5.556 |
| Internet_meme | 5.405 |
| Common_Gateway_Interface | 5.263 |
| Bulletin_board_system | 5.172 |
| Internet_slang | 5 |
| news_anchor | 4.651 |
| Document_Object_Model | 4.444 |
| Staff_writer | 4.444 |
| web_application | 4.348 |
+-----------------------------------------------------------------+-------+
Time taken: 2 hours, 12 minutes, 50 seconds, 381 milliseconds
sa: H |arXiv.org>
|number: 3>
sa: S |arXiv.org>
+------------------------------------------------------------------------------------+--------+
| wikipage | coeff |
+------------------------------------------------------------------------------------+--------+
| arXiv.org | 100 |
| citation_impact | 40 |
| serials_crisis | 40 |
| NEC_Research_Institute | 40 |
| postprint | 40 |
| institutional_repository | 40 |
| OAIster | 40 |
| SHERPA_(organisation) | 40 |
| Category:Electronic_publishing | 40 |
| Paul_Ginsparg | 33.333 |
| preprint | 27.273 |
| self-archiving | 25 |
| Category:Academic_publishing | 23.077 |
| Methodological_naturalism | 20 |
| Presocratics | 20 |
| Cryptology_ePrint_Archive | 20 |
| Open_publishing | 20 |
| Hubble_diagram | 20 |
| GZK_paradox | 20 |
| List_of_unsolved_problems_in_physics | 20 |
| Print_on_demand | 20 |
| TeV | 20 |
| Boundary_condition | 20 |
| Black_body_radiation | 20 |
| Subscriptions | 20 |
| R.P._Feynman | 20 |
| Citeseer | 20 |
| Citation_index | 20 |
| File:Solvay_conference_1927.jpg | 20 |
| File:Senenmut-Grab.JPG | 20 |
| bioacoustics | 20 |
| pattern_formation | 20 |
| University_Physics | 20 |
| File:Archimedes-screw_one-screw-threads_with-ball_3D-view_animated_small.gif | 20 |
| Bryn_Mawr_Classical_Review | 20 |
| File:Acceleration_components.JPG | 20 |
| Delayed_open-access_journal | 20 |
| Astronomical_ceiling_of_Senemut_Tomb | 20 |
| quantitative_finance | 20 |
| File:CMS_Higgs-event.jpg | 20 |
| James_Madison_Award | 20 |
| Public_Knowledge_Project | 20 |
| the_central_science | 20 |
| Difference_between_chemistry_and_physics | 20 |
| theses | 20 |
| Optical_physics | 20 |
| analytic_solution | 20 |
| weakly_interacting_massive_particle | 20 |
| superclusters | 20 |
| Open_Humanities_Press | 20 |
| iBooks_Author | 20 |
| econophysics | 20 |
| ultrasonics | 20 |
| OAI-PMH | 20 |
| Journal_of_Library_Administration | 20 |
| File:Einstein1921_by_F_Schmutzer_2.jpg | 20 |
| Ancient_Greek_poetry | 20 |
| Publish_or_perish | 20 |
| higher_dimension | 20 |
| IBEX | 20 |
+------------------------------------------------------------------------------------+--------+
Time taken: 41 minutes, 16 seconds, 954 milliseconds
sa: H |Theory_of_everything>
|number: 13>
sa: S |Theory_of_everything>
+-------------------------------------------------------------+--------+
| wikipage | coeff |
+-------------------------------------------------------------+--------+
| Theory_of_everything | 100.0 |
| Ultimate_fate_of_the_universe | 21.429 |
| Planck_scale | 17.391 |
| Big_Rip | 15.385 |
| Eddington_limit | 15.385 |
| Supersymmetry | 15.385 |
| Arrow_of_time | 15.385 |
| Dimensionless_physical_constant | 15.385 |
| Plumian_Professor_of_Astronomy_and_Experimental_Philosophy | 15.385 |
| Sir_Roger_Penrose | 15.385 |
| Bakerian_Lecture | 15.385 |
| grand_unified_theory | 15.385 |
| Big_Freeze | 15.385 |
| Topological_order | 15.385 |
| Baryon_asymmetry | 15.385 |
| Neutrino_mass | 15.385 |
| Unified_field_theory | 15.385 |
| Membrane_(M-theory) | 15.385 |
| Static_forces_and_virtual-particle_exchange | 15.385 |
| Generation_(particle_physics) | 15.385 |
| Stellar_nucleosynthesis | 14.286 |
| Compact_Muon_Solenoid | 13.333 |
| Cosmic_inflation | 13.333 |
| neutrino_oscillation | 12.5 |
| Hermann_Bondi | 11.765 |
| Category:Presidents_of_the_Royal_Astronomical_Society | 11.111 |
| Yang–Mills_theory | 11.111 |
| anthropic_principle | 10.345 |
| Dark_matter | 9.524 |
| James_Watson | 9.091 |
| CP_violation | 8 |
| Anisotropy | 7.692 |
| Antiparticle | 7.692 |
| Acts | 7.692 |
| Centripetal_force | 7.692 |
| Graviton | 7.692 |
| Gluon | 7.692 |
| Hydrogen_atom | 7.692 |
| Liquid_crystal | 7.692 |
| Main_sequence | 7.692 |
| Morphogenesis | 7.692 |
| Panspermia | 7.692 |
| Proton_decay | 7.692 |
| Qubit | 7.692 |
| Tokamak | 7.692 |
| Quintessence_(physics) | 7.692 |
| Sonoluminescence | 7.692 |
| Gravitational_lens | 7.692 |
| High-temperature_superconductor | 7.692 |
| Fact | 7.692 |
| Timeline_of_gravitational_physics_and_relativity | 7.692 |
| Timeline_of_stellar_astronomy | 7.692 |
| List_of_astronomers | 7.692 |
| Astrophysicist | 7.692 |
| Triple-alpha_process | 7.692 |
| Religious | 7.692 |
| Quark_matter | 7.692 |
| Gravity_assist | 7.692 |
| Theory_of_Everything | 7.692 |
| Color_confinement | 7.692 |
+-------------------------------------------------------------+--------+
Time taken: 1 hour, 8 minutes, 42 seconds, 470 milliseconds
OK. Some cool results in there. Actually, I think they are amazing! I think I have done enough examples of this now.
Though maybe I should note that the bigger the number H returns, the better the result. That presumably means that if we used even more of Wikipedia, we would get even better results! It also brings to mind the question: how many wikipages do we need to know more than the average human?
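One possible reading of why a larger H helps, sketched in Python. This assumes the similarity coefficient is the number of shared inverse links divided by the larger of the two inverse-link counts (an assumption, not taken from the semantic-db source). Under that assumption, a single shared link can contribute at most 100/H percent, so for a page with a small H one coincidental link is enough to score highly. The function name is hypothetical; the H values are the ones reported in the session above.

```python
def max_single_link_coeff(h):
    """Upper bound (in percent) on the coefficient one shared inverse link
    can give a page that has h inverse links, under the assumed measure."""
    return round(100.0 / h, 3)


# H values as reported by the console session in this post.
h_values = {
    "arXiv.org": 3,
    "Theory_of_everything": 13,
    "Blog": 32,
    "Google": 704,
}

for page, h in h_values.items():
    print(page, h, max_single_link_coeff(h))
# arXiv.org 3 33.333
# Theory_of_everything 13 7.692
# Blog 32 3.125
# Google 704 0.142
```

With H = 3 the arXiv.org table is dominated by a handful of coarse values, while with H = 704 the Google table needs dozens of shared links before a page makes the list.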
BTW, I don't think I have linked to this yet: the full Wikipedia link structure in sw notation. It bzip2s down to about 2 GB, as far as I recall.
updated: 19/12/2016
by Garry Morrison
email: garry -at- semantic-db.org