Samyro 0.0.2 – sampling structured inputs

A new version of Samyro (0.0.2) is now uploaded to PyPI.

The GitHub repo has the details, but I’ll brag about the new features:

samyro write accepts a --seed argument: the engine first progresses through the given seed, and the usual temperature-based decoding starts only *after* that. The default seed is now the BOS character, which plays nicely with the structured inputs described below.
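To make the seed-then-sample order concrete, here is a minimal sketch in plain Python. It is not Samyro's code: the step(state, symbol) callback, which returns a new state and a list of logits, is a hypothetical stand-in for whatever the real engine does, and the function names are mine.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Draw an index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def write(step, initial_state, seed, length, temperature, rng):
    """Run the seed through the model, then sample `length` more symbols.

    `step(state, symbol) -> (new_state, logits)` is a hypothetical model
    interface, assumed only for this illustration.
    """
    if not seed:
        raise ValueError("seed must contain at least one symbol (e.g. BOS)")
    state, logits = initial_state, None
    # Phase 1: no sampling, just advance the model through the seed.
    for symbol in seed:
        state, logits = step(state, symbol)
    # Phase 2: temperature-based decoding starts from the post-seed state.
    output = []
    for _ in range(length):
        symbol = sample_with_temperature(logits, temperature, rng)
        output.append(symbol)
        state, logits = step(state, symbol)
    return output
```

With a lower temperature the draw concentrates on the most likely symbol; with a higher one it flattens out. The seed itself is never sampled, only consumed.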

samyro learn now supports a --sampler option, which changes the shape of the samples: paragraphs (separated by \n\n), lines (separated by \n) or patches (collections of bytes). The sample_length arguments work the same with each shape.

The reason we might want this: the Bible, for example, has structured inputs (verses), and we’d like all the learning examples to begin the same way. The sampler supports a source that (for example) is split on whitespace separators, which makes it possible to generate Bible verses with chapter and verse coded into the first few characters.
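A rough sketch of what the three sample shapes could look like, in plain Python. This is an illustration under my own assumptions, not Samyro's implementation: draw_sample and SEPARATORS are hypothetical names, and truncating each unit to sample_length is a guess at the simplest behaviour.

```python
import random

# Hypothetical mapping from sampler name to the separator it splits on.
SEPARATORS = {"paragraphs": "\n\n", "lines": "\n"}


def draw_sample(text, sampler, sample_length, rng):
    """Return one sample of at most `sample_length` characters."""
    if sampler == "patches":
        # A patch is an arbitrary window, so it can start mid-verse.
        start = rng.randrange(max(1, len(text) - sample_length + 1))
        return text[start:start + sample_length]
    # Structured samplers always start at the beginning of a unit.
    units = [u for u in text.split(SEPARATORS[sampler]) if u]
    return rng.choice(units)[:sample_length]
```

The point of the structured shapes is visible in the last two lines: every sample begins at a unit boundary, so on a verse-per-line source every learning example starts with the chapter and verse reference.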

samyro sample is a new subcommand with the same --sampler options, but displaying the actual text snippets extracted rather than passing them to the learner. It is primarily intended as a sample debugger.

Example command-line files are updated; there are now examples/{kjv,shakespeare}{,-write}.cli special cases that are helpful in understanding how the sampler might work.

Backwards compatibility note: the default padding bytes have changed to accommodate the new sample shapes: \x01 is still the EOS byte, but \x7f DEL is the BOS byte. (Though on looking it up, BOS/EOS should really be \x02 START OF TEXT and \x03 END OF TEXT. The value 0x01 is inherited from word vocabularies that often have 0 as “unknown” and 1,2 as BOS/EOS; this may change again.)
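For illustration, here is one way a sample might be framed with those bytes. Only the two byte values come from the note above; the frame function, and the choice to put a single BOS in front and fill the tail with EOS, are my assumptions rather than a description of what Samyro does internally.

```python
BOS = b"\x7f"  # DEL, the current beginning-of-sample byte
EOS = b"\x01"  # the current end-of-sample byte


def frame(sample, sample_length, bos=BOS, eos=EOS):
    """Return BOS + sample + EOS, padded with EOS to `sample_length` bytes.

    Hypothetical helper: the layout is an assumption for this sketch.
    """
    if sample_length < 2:
        raise ValueError("sample_length must leave room for BOS and EOS")
    body = sample[:sample_length - 2]  # truncate so both markers fit
    return (bos + body + eos).ljust(sample_length, eos)
```

Keeping the markers as parameters means a later change of the default bytes only touches the two constants.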
