introducing the ngram stitch
Otherwise known as the Rambler algo. The basic outline: you have a big corpus of conversational text, e.g. from a web board, you process it a little, and then the algo creative-writes/rambles.
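I'll just give the algo for the 3/5 ngram stitch, but it should extend in the obvious way to other p/q (match on the last p words of the string, look up q-grams, append the remaining q - p words).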
extract all the 5-grams from your corpus
start with a seed string
extract the last 3 words from the string
find the set of 5-grams that start with those 3 words and pick one randomly
add the last 2 words from that 5-gram to your string
repeat from the "extract the last 3 words" step for as long as you like
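The steps above can be sketched in plain Python, without any of the sw/BKO machinery. This is only a minimal sketch: the names build_table and stitch, the toy corpus, and the choice to stop at a dead end are my assumptions for illustration, not the project's code.

```python
import random
from collections import defaultdict

def build_table(words, p=3, q=5):
    """Map each p-word head to the list of (q - p)-word tails that follow it."""
    table = defaultdict(list)
    for i in range(len(words) - q + 1):
        head = " ".join(words[i:i + p])
        tail = " ".join(words[i + p:i + q])
        table[head].append(tail)
    return table

def stitch(table, seed, steps, p=3, rng=random):
    """Extend seed by repeatedly appending a random tail for its last p words."""
    text = seed
    for _ in range(steps):
        head = " ".join(text.split()[-p:])
        tails = table.get(head)
        if not tails:
            # assumption: stop when no q-gram starts with these words
            break
        text += " " + rng.choice(tails)
    return text

# toy corpus, just to have something to stitch
corpus = "the cat sat on the mat and the cat sat on the rug today".split()
table = build_table(corpus)
print(stitch(table, "the cat sat", steps=4, rng=random.Random(0)))
```

The defaults p=3, q=5 give the 3/5 stitch; other p/q only change how many words are matched and how many are appended. Duplicate tails are kept in the list on purpose, so more common continuations get picked more often.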
Then we use this code to find our n-grams:
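def create_ngram_pairs(s):
    # s is a list of words; each 5-gram is split into a 3-word head and a 2-word tail
    return [[" ".join(s[i:i+3])," ".join(s[i+3:i+5])] for i in range(len(s) - 4)]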
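import re

# learn ngram pairs:
with open(filename,'r') as f:
    text = f.read()
    # strip the characters used by the ket / learn-rule notation
    words = re.sub('[<|>=]','',text)
    for ngram_pairs in create_ngram_pairs(words.split()):
        head,tail = ngram_pairs
        # each head/tail pair is then stored as a next-2 learn rule (that step is not shown in this excerpt)

dest = "sw-examples/ngram-pairs--webboard.sw"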
Some example learn rules in that sw file are:
next-2 |Looking forward to> => |that. it> + |doing something> + |it. I> + |when the> + |the Paranoid> + |tomorrow's. ("flow",> + |seeing The> + |tomorrow. 3.1415926...can't> + |you posting> + |the "Geometric> + |it. Breaking> + |being a> + |Joe Biden>
next-2 |forward to that.> => |it was>
next-2 |to that. it> => |was 4>
next-2 |that. it was> => |4 below> + |only 100db>
next-2 |it was 4> => |below zero> + |years ago>
next-2 |was 4 below> => |zero maybe>
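To see where rules like these come from, here is a small sketch that runs the same 3-word head / 2-word tail split over one sentence and prints the pairs in learn-rule form. The sentence is my reconstruction, chosen to be consistent with the rules above, not a quote from the corpus, and the rule formatting is for illustration only, not the project's sw writer.

```python
def ngram_pairs(words):
    # same split as the post: 3-word head, 2-word tail, one pair per 5-gram
    return [(" ".join(words[i:i + 3]), " ".join(words[i + 3:i + 5]))
            for i in range(len(words) - 4)]

# hypothetical sentence, reconstructed from the example rules
text = "Looking forward to that. it was 4 below zero maybe"

rules = [f"next-2 |{head}> => |{tail}>" for head, tail in ngram_pairs(text.split())]
for rule in rules:
    print(rule)

# prints:
# next-2 |Looking forward to> => |that. it>
# next-2 |forward to that.> => |it was>
# next-2 |to that. it> => |was 4>
# next-2 |that. it was> => |4 below>
# next-2 |it was 4> => |below zero>
# next-2 |was 4 below> => |zero maybe>
```

In the real sw file the same head turns up many times across the corpus, which is why a rule such as the one for |Looking forward to> ends up with a long sum of tails.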
And then we need this function operator:
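# extract-3-tail |a b c d e f g h> == |f g h>
# assumes one is a ket
def extract_3_tail(one):  # function name assumed; the def line is missing from this excerpt
    split_str = one.label.rsplit(' ',3)
    if len(split_str) < 4:
        # three words or fewer: nothing to trim (assumed; this branch is missing from the excerpt)
        return one
    return ket(" ".join(split_str[1:]))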
Then after all that preparation, our Rambler algo simplifies to:
ramble |*> #=> merge-labels(|_self> + | > + pick-elt next-2 extract-3-tail |_self>)
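Reading that line right to left: take the last 3 words of the current string, look up the candidate 2-word tails, pick one, and glue it onto the string with a space. A rough Python analogue on plain strings might look like the following. The dict NEXT_2 is a hand-copied subset of the example rules above, the function names are mine, and leaving the string unchanged at a dead end is an assumption rather than a statement about what the BKO version does.

```python
import random

# hypothetical stand-in for the learned next-2 rules (subset of the examples above)
NEXT_2 = {
    "forward to that.": ["it was"],
    "that. it was": ["4 below", "only 100db"],
    "it was 4": ["below zero", "years ago"],
}

def extract_3_tail(label):
    # last three space-separated words of the label
    return " ".join(label.split(" ")[-3:])

def ramble(label, pick=random.choice):
    tails = NEXT_2.get(extract_3_tail(label), [])
    if not tails:
        # assumption: leave the string unchanged when no rule matches
        return label
    return label + " " + pick(tails)

def first(options):
    # deterministic pick, so the result below is repeatable
    return options[0]

s = "Looking forward to that."
s = ramble(s, pick=first)
s = ramble(s, pick=first)
print(s)  # Looking forward to that. it was 4 below
```

Swapping first for the default random.choice gives the random behaviour of pick-elt.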
Examples in the next post.
BTW, I find it interesting that we can compact down the Rambler algo to 1 line of BKO.
by Garry Morrison
email: garry -at- semantic-db.org