letter 3-grams that precede a full stop
Just a quick one, using our letter 3/5 ngram structures to find those 3-grams that can precede both a comma and a full stop.
Simply enough:
sa: load ngram-letter-pairs--sherlock-holmes.sw
sa: find-inverse[next-2-letters]
sa: table[3gram] ket-sort common[inverse-next-2-letters] (|, > + |. >)
+-------+
| 3gram |
+-------+
| 2nd |
| 3rd |
| 4th |
| be |
| by |
| do |
| go |
| he |
| in |
| is |
| it |
| me |
| No |
| no |
| of |
| on |
| pa |
| so |
| to |
| up |
| us |
| "No |
| '85 |
| -by |
| -tm |
| ace |
| ach |
| ack |
| act |
| acy |
| ade |
| ads |
| ady |
| afe |
| aff |
| age |
| ago |
| aid |
| ail |
| aim |
| ain |
| air |
| ait |
| ake |
| ale |
| alk |
| all |
| als |
| ame |
| amp |
| and |
| ane |
| ang |
| ank |
| ans |
| ant |
| ape |
| aph |
| aps |
| ard |
| are |
| ark |
| arm |
| ars |
| art |
| ary |
| ase |
| ash |
| ask |
| ass |
| ast |
| asy |
| ata |
| ate |
| ath |
| ave |
| awn |
| ays |
| aze |
| bad |
| bag |
| bed |
| ber |
| ble |
| bly |
| box |
| bts |
| bye |
| cal |
| can |
| cap |
| cat |
| cco |
| ced |
| ces |
| cks |
| cle |
| cts |
| d I |
| day |
| dea |
| ded |
| dee |
| den |
| der |
| des |
| dge |
| dia |
| did |
| dle |
| dly |
| dog |
| don |
| dor |
| dow |
| ead |
| eak |
| eal |
| eam |
| ear |
| eat |
| eau |
| ece |
| ech |
| eck |
| ect |
| eds |
| eed |
| eek |
| eel |
| een |
| eep |
| eer |
| ees |
| eet |
| eft |
| egs |
| eks |
| eld |
| elf |
| ell |
| elp |
| els |
| ely |
| ems |
| end |
| ens |
| ent |
| eps |
| ere |
| ern |
| ers |
| ery |
| esh |
| esk |
| ess |
| est |
| ete |
| ets |
| ety |
| eve |
| ews |
| ext |
| eye |
| F.3 |
| fed |
| fee |
| fer |
| fle |
| fly |
| for |
| ful |
| gar |
| ged |
| gel |
| ger |
| ges |
| ght |
| gle |
| gro |
| gth |
| gue |
| had |
| ham |
| hat |
| haw |
| hed |
| hem |
| hen |
| her |
| hes |
| him |
| hin |
| hip |
| his |
| hod |
| hop |
| hot |
| hts |
| hur |
| hus |
| ial |
| ian |
| ica |
| ice |
| ich |
| ick |
| ics |
| ida |
| ide |
| ids |
| ied |
| ief |
| ier |
| ies |
| iew |
| ife |
| iff |
| ify |
| ign |
| ike |
| ild |
| ile |
| ill |
| ils |
| ily |
| ime |
| ina |
| ind |
| ine |
| ing |
| ink |
| Inn |
| ins |
| int |
| iny |
| ion |
| ips |
| ird |
| ire |
| irl |
| irm |
| irs |
| irt |
| iry |
| ise |
| ish |
| iss |
| ist |
| ite |
| ith |
| its |
| ity |
| ium |
| ius |
| ive |
| ize |
| ked |
| ken |
| ker |
| ket |
| key |
| kly |
| lar |
| law |
| lay |
| lds |
| led |
| Lee |
| leg |
| lem |
| len |
| ler |
| les |
| ley |
| lic |
| lip |
| lls |
| lly |
| lor |
| low |
| lse |
| lso |
| lts |
| lue |
| lve |
| mad |
| mal |
| man |
| mas |
| may |
| med |
| men |
| mer |
| mes |
| met |
| mly |
| mon |
| mpt |
| n 4 |
| nah |
| nal |
| nce |
| nch |
| ncy |
| nds |
| ndy |
| ned |
| nee |
| nel |
| nen |
| ner |
| nes |
| net |
| ney |
| nge |
| ngs |
| nks |
| nly |
| nny |
| not |
| now |
| nse |
| nth |
| nto |
| nts |
| nty |
| nue |
| oad |
| oak |
| oal |
| oat |
| obe |
| ock |
| ods |
| ody |
| oes |
| ofa |
| off |
| ofs |
| oke |
| oks |
| old |
| ole |
| ome |
| oms |
| one |
| ong |
| ons |
| ont |
| ood |
| oof |
| ook |
| ool |
| oom |
| oon |
| oor |
| oot |
| ope |
| ord |
| ore |
| ork |
| orm |
| orn |
| ors |
| ort |
| ory |
| ose |
| oss |
| ost |
| ote |
| oth |
| ots |
| oul |
| our |
| ous |
| out |
| ove |
| owd |
| own |
| ows |
| ped |
| pen |
| per |
| pes |
| pet |
| pew |
| phy |
| ple |
| ply |
| pty |
| que |
| r A |
| r's |
| ram |
| ran |
| rap |
| rat |
| rce |
| rch |
| rds |
| red |
| ree |
| ren |
| rer |
| res |
| ret |
| rey |
| rge |
| rks |
| rld |
| rly |
| rms |
| rol |
| rop |
| ror |
| row |
| rse |
| rst |
| rth |
| rts |
| rty |
| rue |
| rug |
| run |
| rve |
| sal |
| saw |
| say |
| sco |
| sed |
| see |
| sen |
| ser |
| ses |
| set |
| sex |
| she |
| sin |
| sir |
| sit |
| six |
| sky |
| sly |
| som |
| son |
| sts |
| sty |
| sun |
| t I |
| tal |
| tar |
| tch |
| ted |
| tel |
| ten |
| tep |
| ter |
| tes |
| ths |
| thy |
| tic |
| tie |
| tle |
| tly |
| tol |
| ton |
| too |
| tor |
| tre |
| try |
| tte |
| two |
| ual |
| ubt |
| uch |
| uct |
| ued |
| ues |
| uff |
| ugh |
| ull |
| ulp |
| ult |
| umb |
| ume |
| umn |
| und |
| une |
| ung |
| unk |
| unt |
| ure |
| urn |
| urs |
| urt |
| ury |
| use |
| uth |
| uty |
| van |
| ved |
| vel |
| ven |
| ver |
| ves |
| vil |
| War |
| was |
| way |
| wed |
| wer |
| wit |
| xes |
| yed |
| yer |
| yes |
| Yes |
| yet |
| yle |
| you |
| zes |
+-------+
So we see there are a lot of them, but not all possible combinations. I don't know, but to me this is starting to feel like grammar. Grammar seems to amount to "these structures are common and therefore likely correct, and those structures are rare and therefore likely wrong". Sure, it is not exactly grammar yet, but it feels like we are getting closer. Anyway, I will keep thinking about it.
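For anyone without the console handy, here is a rough Python sketch of what the query at the top of this post is doing. It is not the console's implementation. The function names are made up for illustration, and it assumes the ngram structure comes from sliding a 5-letter window over the raw text and mapping the first 3 letters to the 2 that follow. The sample sentence is invented too, it is not from Sherlock Holmes.

```python
from collections import defaultdict

def next_letters(text, p=3, q=5):
    """Map each p-gram to the set of (q - p)-letter strings seen right after it.
    Assumption: a plain sliding window over the raw text."""
    table = defaultdict(set)
    for i in range(len(text) - q + 1):
        window = text[i:i + q]
        table[window[:p]].add(window[p:])
    return table

def find_inverse(table):
    """Flip the mapping: each follower string maps to the p-grams that precede it."""
    inverse = defaultdict(set)
    for gram, followers in table.items():
        for follower in followers:
            inverse[follower].add(gram)
    return inverse

def common(inverse, keys):
    """Intersection of the p-gram sets for every key, roughly what
    common[inverse-next-2-letters] (|, > + |. >) asks for."""
    sets = [inverse.get(key, set()) for key in keys]
    return set.intersection(*sets) if sets else set()

# Invented sample text, just to exercise the functions.
sample = "I went home, then to bed. He stayed home. She went to bed, too."
inv = find_inverse(next_letters(sample))
print(sorted(common(inv, [", ", ". "])))  # ['bed', 'ome']
```

Note the keys are ", " and ". " with the trailing space, matching the kets in the console query.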
Maybe down the line I will try for a bigger set of ngram structures: the full set of p/q ngram structures where:
p is in {1,2,3,4,5,6,7,8,9}
and
q is in {2,3,4,5,6,7,8,9,10}
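A quick Python sketch of what that full set might look like. This rests on an assumption: that a p/q structure maps each p-gram to the next q - p letters, the way the 3/5 structure maps a 3-gram to its next 2 letters, so only pairs with p < q make sense. The names `pq_pairs` and `build_structure` are hypothetical, and the sample text is invented.

```python
P_VALUES = range(1, 10)   # p in {1,...,9}
Q_VALUES = range(2, 11)   # q in {2,...,10}

def pq_pairs():
    """All (p, q) pairs from the two sets, keeping only p < q (assumed)."""
    return [(p, q) for p in P_VALUES for q in Q_VALUES if p < q]

def build_structure(text, p, q):
    """Map each p-gram to the set of (q - p)-letter strings that follow it."""
    table = {}
    for i in range(len(text) - q + 1):
        table.setdefault(text[i:i + p], set()).add(text[i + p:i + q])
    return table

pairs = pq_pairs()
print(len(pairs))  # 45

# Build every structure for a tiny invented text.
text = "the cat sat on the mat."
structures = {(p, q): build_structure(text, p, q) for p, q in pairs}
print(sorted(structures[(3, 5)]["the"]))  # [' c', ' m']
```

So that is 45 structures per text under this reading, which is a fair bit to store for a whole book.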
updated: 19/12/2016
by Garry Morrison
email: garry -at- semantic-db.org