A new version of Samyro (0.0.2) is now uploaded to PyPI.
The GitHub repo has the details, but I’ll brag about the new features:
samyro write accepts a --seed argument: the engine first progresses through the given seed, and the usual temperature-based decoding begins only *after* that. The default seed is now the BOS character, which plays nicely with the structured inputs (below).
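As a rough illustration of what seeded decoding means, here is a minimal sketch. It is not Samyro's code: `step` is a hypothetical callable standing in for one step of the engine (it takes a state and a byte and returns a new state plus one logit per byte value), and `write` and `sample_with_temperature` are names made up for this example. Returning only the continuation, without the seed, is also an assumption.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

BOS = 0x7F  # default seed byte described in this post

# Hypothetical engine interface: (state, byte) -> (new_state, logits over 256 bytes)
Step = Callable[[object, int], Tuple[object, List[float]]]


def sample_with_temperature(logits: List[float], temperature: float,
                            rng: random.Random) -> int:
    """Draw one index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def write(step: Step, state: object, seed: bytes = bytes([BOS]), n: int = 20,
          temperature: float = 1.0, rng: Optional[random.Random] = None) -> bytes:
    """Feed the seed through the engine, then sample n bytes."""
    if not seed:
        raise ValueError("seed must contain at least one byte")
    rng = rng or random.Random()
    logits: List[float] = []
    # Phase 1: advance through the seed without sampling anything.
    for byte in seed:
        state, logits = step(state, byte)
    # Phase 2: ordinary temperature-based decoding from the primed state.
    out = bytearray()
    for _ in range(n):
        byte = sample_with_temperature(logits, temperature, rng)
        out.append(byte)
        state, logits = step(state, byte)
    return bytes(out)
```

The point of the two phases is that the seed only conditions the state; no randomness is spent until the seed has been consumed.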
samyro learn now supports a --sampler option, which allows you to change the shape of samples among paragraphs (separated by \n\n), lines (separated by \n) or patches (collections of bytes), while supporting sample_length arguments that work the same.
The reason we might want this: the Bible, for example, has structured inputs (verses), and we’d like all the learning examples to begin the same way. The sampler supports a sample source that is (for example) delimited by whitespace, which in turn supports generating Bible verses with chapter and verse encoded in the first few characters.
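To make the three shapes concrete, here is a small sketch of how a byte string might be cut up under each sampler. This is an illustration, not Samyro's implementation: the function name `split_units`, the `patch_size` parameter, and the fixed-stride treatment of patches are all assumptions made for the example.

```python
from typing import List


def split_units(data: bytes, sampler: str, patch_size: int = 16) -> List[bytes]:
    """Split raw bytes into sample units for a (hypothetical) sampler name."""
    if sampler == "paragraphs":
        units = data.split(b"\n\n")  # blank-line separated
    elif sampler == "lines":
        units = data.split(b"\n")    # newline separated
    elif sampler == "patches":
        # Assumption: fixed-size, non-overlapping windows of bytes.
        units = [data[i:i + patch_size] for i in range(0, len(data), patch_size)]
    else:
        raise ValueError(f"unknown sampler: {sampler!r}")
    return [u for u in units if u]   # drop empty units


text = b"1:1 In the beginning\n1:2 And the earth\n\n2:1 Thus the heavens\n"
for unit in split_units(text, "lines"):
    print(unit)
```

With the lines shape, every unit starts with its chapter:verse prefix, which is the property the verse example relies on; patches make no such promise.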
samyro sample is a new subcommand with the same --sampler options, but displaying the actual text snippets extracted rather than passing them to the learner. It is primarily intended as a sample debugger.
Example command-line files are updated; there are now examples/{kjv,shakespeare}{,-write}.cli special cases that are helpful in understanding how the sampler might work.
Backwards compatibility note: the default padding bytes have changed to accommodate the new sample shapes: \x01 is still the EOS byte, but \x7f DEL is now the BOS byte. (Though on looking it up, BOS/EOS should really be \x02 START OF TEXT and \x03 END OF TEXT. The value 0x01 is inherited from word vocabularies that often have 0 as “unknown” and 1, 2 as BOS/EOS; this may change again.)
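For a picture of what those bytes do to a sample, here is a minimal sketch. The byte values come from the note above; the function name `frame` and the exact layout (one BOS in front, EOS padding out to a fixed length) are assumptions for illustration, not a description of Samyro's internals.

```python
BOS = b"\x7f"  # DEL, the new beginning-of-sample byte
EOS = b"\x01"  # unchanged end-of-sample byte


def frame(sample: bytes, length: int) -> bytes:
    """Prefix BOS, truncate if needed, and pad with EOS to exactly `length` bytes."""
    if length < 2:
        raise ValueError("length must leave room for BOS and one EOS")
    body = (BOS + sample)[:length - 1]  # always keep room for at least one EOS
    return body + EOS * (length - len(body))


print(frame(b"abc", 8))  # b'\x7fabc\x01\x01\x01\x01'
```

Because every framed sample starts with the same byte, a generator seeded with BOS starts from the same place the training examples did.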