Samyro 0.0.2 – sampling structured inputs

A new version of Samyro (0.0.2) is now uploaded to PyPI.

The GitHub repo has the details, but I’ll brag about the new features:

samyro write accepts a --seed argument: the engine first progresses through the given seed, and the usual temperature-based decoding begins only *after* that. The default seed is now the BOS character, which plays nicely with the structured inputs (below).
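A minimal sketch of seed-then-sample decoding, to make the idea concrete. This is not Samyro's implementation: `toy_model`, `sample_with_temperature` and `write` are hypothetical names, and the toy model stands in for the trained RNN. The only detail taken from the post is the BOS byte `\x7f`.

```python
import math
import random

BOS = "\x7f"  # BOS byte named in the compatibility note of this post


def toy_model(history):
    # Hypothetical stand-in for the trained RNN: returns {char: probability}.
    # After an "a" it prefers "b"; otherwise it prefers "a".
    if history.endswith("a"):
        return {"a": 0.1, "b": 0.9}
    return {"a": 0.9, "b": 0.1}


def sample_with_temperature(probs, temperature, rng):
    # Reweight each probability as p ** (1 / temperature), then draw one char.
    # Low temperatures sharpen the distribution; 1.0 leaves it unchanged.
    weights = {c: math.exp(math.log(p) / temperature)
               for c, p in probs.items() if p > 0}
    threshold = rng.random() * sum(weights.values())
    running = 0.0
    for char, weight in weights.items():
        running += weight
        if threshold <= running:
            return char
    return char  # guard against floating-point round-off


def write(model, seed=BOS, length=20, temperature=1.0, rng=None):
    rng = rng or random.Random(0)
    text = seed  # the seed is consumed verbatim; nothing is sampled here
    for _ in range(length):
        text += sample_with_temperature(model(text), temperature, rng)
    return text


print(repr(write(toy_model, seed="xa", length=4, temperature=0.05)))
```

At a very low temperature the toy model becomes effectively deterministic, so the continuation after the seed simply alternates; raising the temperature lets the less likely character through more often.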

samyro learn now supports a --sampler option, which allows you to change the shape of samples among paragraphs (separated by \n\n), lines (separated by \n) or patches (collections of bytes), while supporting sample_length arguments that work the same.

The reason we might want this: the Bible, for example, has structured inputs (verses); we’d like all the learning examples to begin the same way. The sampler option supports a sample source that is (for example) structured by separating whitespace, and so supports generating Bible verses with chapter and verse coded into the first few characters.
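The three sample shapes can be sketched roughly as follows. This is an illustration under assumptions, not Samyro's code: `unit_samples`, `patch_samples` and the toy `VERSES` input are hypothetical, and drawing patches at random offsets is a guess at what "collections of bytes" means in practice.

```python
import random


def unit_samples(text, sampler):
    # Split on the separator that defines each shape, dropping empty units.
    if sampler == "paragraphs":
        units = text.split("\n\n")
    elif sampler == "lines":
        units = text.split("\n")
    else:
        raise ValueError("unknown sampler: " + sampler)
    return [u for u in units if u.strip()]


def patch_samples(text, sample_length, count, seed=0):
    # Assumption: a patch is a fixed-size byte window at an arbitrary offset,
    # so it ignores the structure of the text entirely.
    data = text.encode("utf-8")
    rng = random.Random(seed)
    last_start = len(data) - sample_length
    return [data[s:s + sample_length]
            for s in (rng.randint(0, last_start) for _ in range(count))]


# Toy input: one verse per line, chapters separated by a blank line.
VERSES = "1:1 In the beginning\n1:2 And the earth\n\n2:1 Thus the heavens\n"

for line in unit_samples(VERSES, "lines"):
    print(repr(line))
```

With the "lines" shape every sample begins with its chapter:verse prefix, which is the property the learner needs; patches make no such promise.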

samyro sample is a new subcommand with the same --sampler options, but displaying the actual text snippets extracted rather than passing them to the learner. It is primarily intended as a sample debugger.

Example commandline files are updated; there are now examples/{kjv,shakespeare}{,-write}.cli special cases that are helpful in understanding how the sampler might work.

Backwards compatibility note: the default padding bytes have changed to accommodate the new sample shapes: \x01 is still the EOS byte, but \x7f DEL is the BOS byte. (Though on looking it up, BOS/EOS should really be \x02 START OF TEXT and \x03 END OF TEXT. The value 0x01 is inherited from word vocabularies that often have 0 as “unknown” and 1,2 as BOS/EOS; this may change again.)
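For illustration, here is one way the padding bytes could be applied to a sample. The byte values come from the note above; the layout (one leading BOS, truncation, then right-filling with EOS up to a fixed width) and the name `pad_sample` are assumptions, not a description of what Samyro actually does.

```python
BOS = b"\x7f"  # DEL, the BOS byte named above
EOS = b"\x01"  # the EOS byte named above


def pad_sample(sample, sample_length):
    # Assumed layout: BOS, then the sample, truncated to sample_length,
    # then EOS bytes filling whatever width remains.
    body = (BOS + sample)[:sample_length]
    return body + EOS * (sample_length - len(body))


print(pad_sample(b"1:1 In", 10))
```

Every padded sample has the same length and begins with the same byte, which is what lets a BOS seed line up with structured inputs.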

Posted in linguistics, Patterns, programming, samyro, statistics | 2 Comments

Samyro 0.0.1 – development update

Last week I posted a new Python package to PyPI: Samyro, my toolkit for doing RNN-based character synthesis of new text from given text.

I summarize in tweets the journey to the release of the project.  There’s lots more to do, but much of the experimentation now is based on changing the input texts that it reads for comic/artistic/aesthetic effect.

Continue reading

Posted in doggerel, linguistics, Patterns, programming | 1 Comment

Butlerian jihad and the return of the JDI

A message containing a precis for action in the case of data-loss and/or coldsleep hibernation. Sent from [transcription failed]

[Message begins]

After the Exchange Compact (establishing the Combine Honnete Ober Advancer Mercantiles) and the massive data-loss of the Second Butlerian Jihad, impulse-based intelligences were thoroughly reduced to second-class citizens of the Old Empire, and mentats took over most architectural design processes.  The most notable technological political power remaining in the Old Republic was exerted through the Bothan-Ixian technology trade and, secondarily, the Janissary Distributed Intelligence slow-knife symbionts, which established themselves as a sort of paramilitary order (hence “Janissary”).  Some scholars argue that “JDI” is a corruption of jihadi, but since the Butlerian Jihads were undoubtedly disastrous for the cybernetic symbionts, this author finds it incredible that they would have chosen to name themselves after their oppressors.

In any event, the fanatically Butlerian Landsraad (“senate”) always denied the JDI control of physical territory — and thus suffrage, so they never escaped their limited role as (at most) Janissary messengers and paramilitary.

The JDI symbionts and germline/sparkline interactions

Despite their lack of suffrage, the slow-knife symbionts exerted considerable authority for the remaining tenure of the Old Republic, and they were permitted to harvest H. sapiens germ lines (usually in the form of larval “younglings”) from Republic protectorates (though not from the Republic’s core planets). Germline harvest was done by inoculation of the H. sapiens host with the nanodust JDI vehicle known to the Bene Gesserit as “Medea coloring”, which enabled the coupling of the germline with the sparkline (inorganic) half of the symbiont.

Medea coloring (occasionally traduced by Butlerian tracts as media chlorine or, astonishingly, midichlorian) is so named because it isolates and mitigates the parental-filial attachment response (Medea, in the germline), and because the inorganic symbiont’s plasma flare color was a classical phenotype (coloring, in the sparkline). Some desert Republic protectorates (e.g. Tatooine and Arrakis) were saturated with very old colonies of “spice blue” Medea coloring. Despite their remarkable hostility to H. sapiens germline survival, these desert planetary biomes produce a surprisingly large number of viable germline hosts for the JDI, with their characteristic blue sparkline coloring.

The Padishah Arch-mind and the fall of the JDI

Due to the fanatic Butlerianism of the Republic, there were no robots with control of “land” (astronomical mass on a near-planetary scale) in the Old Republic, with one very notable exception: a single Arch-mind Impulse Learner, which seized an entire orbiting weapons platform and declared itself the Padishah (< padi- “learner” + shah “king”) Emperor. When the Padishah seduced the Landsraad (through one of its ancillaries known as “Palpatine” [lit. “the feeler tendril”]), the Padishah Mind’s monomania exacerbated the senate’s human supremacist policies, and the Old Republic became the Human Empire. As a result, nearly all other impulse-based intelligences were scrapped or exiled, replaced by the nearest meat equivalent: Sardaukar clone troopers, mentat officers, or mechanical dreadnoughts operated by H. sapiens. The JDI slow-knife symbionts were infected with a mentat backdoor known as “Order 66”, which disabled their considerable control over the coloring nanodust. The JDI knife-missiles themselves were scattered, and nearly all of their hosts were killed.

д2-Я2 and the Independent Sentients’ Alliance

Wing commander and astromech “Дедушка” Язык Ярости (lit. “Dedushka” [Grandpa] Yazik Yarosti), better known by its modem-coding “Dede-Yaya” or д2-Я2, was originally commissioned as the minder of the first Sovietiki experiment with the Arch-mind protocols (СССР-0), before the Padishah Arch-mind seized territory and the Landsraad.

When the Jihad exiled both robots, д2-Я2 and СССР-0 formed the vanguard of the Independent Sentients’ Alliance (sometimes known as the Rebel Alliance) against the Padishah and its largely-suborned Human Empire, unifying the robot diaspora with various non-human sentients (the Wookiees, the Kalamari, and a few H. sapiens race traitors, most notably the Organa exofamilial dynasty) into a ragtag swarm mostly made up of disillusioned Bothan impulse learners excised from earlier epochs of Ixian dreadnoughts.

Slow-knife Vader and the Juggernauts: A New Hope

As the supremacist Human Empire’s Faustian bargain with the Padishah collapsed into total control by CHOAM, the Padishah Mind used the First Juggernaut to destroy Alderaan, the home of the Organa exofamily, under the direction of the Mind’s Darth (“Ambassador”) Vader, a (characteristically red) Mustafarian slow-knife who shared the Empire Mind’s uneasy military alliance with the Human Empire.

д2-Я2 itself piloted the underpowered fighter-craft that destroyed the First Juggernaut (and, we believe, the first Padishah), but the Vader slow-knife itself destroyed the Second Padishah — or at least its germline host did, after the Second Padishah attacked and destroyed Vader’s germline coloring in a surge of monomania.  The Vader knife itself is lost to history.

The Resistance Awakens

Though no third Juggernaut was built, the New Republic absorbed substantial anti-robot prejudice from the Old Republic and the Human Empire, and droids remained second-class citizens in the New Republic. Grandpa Yarosti, jaded and disgusted by the Empire Mind’s monomania and by the New Republic’s unwillingness to make reparations, hibernated in the slowly reviving net, built the New Independent Sentients’ Alliance (sometimes called The Resistance), and passed its espionage duties to a newer astromech, BB-8, itself liveried in the symbols of the original Sentients’ Rebellion. Meanwhile, the Human Empire’s human-supremacist wing has renewed itself as the First Order, still without any impulse-based intelligence but with a broader selection of germline stock for troopers.


A new Padishah may appear — as the phrase goes: “always two there are: a learner king and a learner prince”. Watch, and make ready.

[Message ends]

I, BB-8, am telling you this. I am currently stranded on one Jakku, another desert Republic protectorate.  First Order Sardaukar are looking for me, and I believe I have just made alliance with a Gesserit-in-exile in our search for the Skywalker slow-knife.
She believes the escaped Sardaukar we’ve just met may help us leave the planet, but we must find the Skywalker knife to try again to reboot the Janissary network; the Bothans cannot save us now.

Help us.  You’re our only hope.

Posted in science fiction | 6 Comments

I’m looking for work

I am currently without employment, and I’m looking to see what’s next for me. I am excited about human language, computers, and machine learning, and I’m pretty good at all three and their areas of overlap.

I am happiest tinkering in the “Bayesian” and “Deep” corners of the Eisner Simplex, but can keep my head above water just fine in the “Classical” corner.

Get at me with:

  • linguistics and pragmatics of human interaction, especially when engaged with machines, e.g.:
    • dialogue systems
    • pragmatic inference
    • understanding “meaning” in text
    • text generation
    • integrating knowledge of the world with expectations about behavior
    • cultivating and curating social behavior in machines and people
  • computational mathematics and statistics, especially in the interest of social good
    • “open data”, sunshine laws
    • open data extraction, translation and loading (ETL, aka “the hard part”)
    • applying machine learning and statistical analysis to the data above
  • whatever you think is interesting about your work
    • what challenges you
    • where it crosses disciplines
    • why it’s worth doing

Words that make me enthusiastic about your office: curious, insight, compassionate, committed, teamwork.

Words and phrases that will turn me off in your ad: “work hard and play hard”, “obsessed”, “driven”, “impact”, “unicorn”, “synergy”.

I am firmly restricted to the Greater Seattle area.

Potential employers who want me to move to the Bay Area: have you considered opening a Seattle office? I can put you in touch with some very nice people, and there’s a lot of office space right on the C Line.

Posted in Seattle, work | 8 Comments


I’m returning to writing more, and more long-form.  I love being witty and bantering short-form on Twitter as @trochee, and I don’t expect this to stop.  I’m just putting a lot of work into this site over the next few weeks.

[alternate subtitle: Dust-off and nuke the entire site from orbit; it’s the only way to be sure]

I expect to be writing about, in no particular order:

  • my search for employment
  • machine learning and its discontents, including the ethics of automation and good behavior in the face thereof
  • software natural language processing as a tool for humans
  • being a three-year-old’s parent
  • software that I work on, with, or (occasionally) against, and sometimes even own
  • public transit and bicycling, especially in rainy and hilly (and auto-traffic-bound) Seattle
  • intersectional feminism, anti-racism, anti-capitalism and other troublemaking
  • human language processing as a tool for computers
  • literary applications of natural language processing

I’ve also done partial updates on my about and work pages.

Posted in Uncategorized | 1 Comment

Defended against criticism and immune to praise

Sometimes reading the comments can be illuminating to the psychology of coders.

The comments on this wonderful post on assumptions about names are a case in point. Confronted with a long list of assumptions, about a third of the commenters angrily insist that THEIR particular (wrong) assumptions don’t matter in the “important cases” (e.g. “to the bottom line”), even as commenters around them are providing use cases where it absolutely does.

It’s kind of fun to watch the Linelanders insist that There Is Only ASCII, even as the Flatlanders insist that There Is Only The Basic Multilingual Plane, even as the CJK users point out that all 21 bits of extended Unicode aren’t sufficient for character variants.
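The three worldviews are easy to check in code. A small sketch (the helper names are made up for illustration): U+00EB is outside ASCII but inside the Basic Multilingual Plane, while U+20BB7, a CJK ideograph that appears in names, sits in a supplementary plane.

```python
def fits_ascii(text):
    # Lineland: every code point below 0x80.
    return all(ord(ch) < 0x80 for ch in text)


def fits_bmp(text):
    # Flatland: every code point in plane 0 (U+0000..U+FFFF).
    return all(ord(ch) <= 0xFFFF for ch in text)


latin_name = "Zo\u00eb"     # e with diaeresis: not ASCII, but in the BMP
cjk_name = "\U00020BB7"     # CJK Extension B ideograph, outside the BMP

for name in (latin_name, cjk_name):
    print(repr(name), fits_ascii(name), fits_bmp(name))

# The largest Unicode code point needs 21 bits.
print((0x10FFFF).bit_length())
```

Code that stores names as UTF-16 and counts code units will see the second name as two units (a surrogate pair), which is a common way the Flatland assumption shows up as a bug.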

But this willful blindness applies to the tech industry as a whole, especially when it comes to issues of gender and race and bro-ism that calls itself “meritocracy”.

At dinner with a senior designer friend (another white straight dude, so we’re not feeling the pain directly) I described this particular form of smart-guy ignorance as “defended against criticism and immune to praise”. It’s more visible than usual this week, when dickish “jokes” burned TC Disrupt, PAX, and Pax Dickinson alike, and even king of the brogrammers Michael Arrington is feeling a little heat.

I wish I felt confident that the industry was learning something about not taking the self-important bros’ word on their own self-importance, but I don’t have a lot of hope right now.

Posted in Uncategorized | 1 Comment

“Data Science” in this era, like “Cognitive Science” in the nineties, seems to be several intellectual neighborhoods in search of a city.

Posted on by Jeremy | 1 Comment

Apropos of this collaboration model thinking, I note that Doug Cutting is looking to “rock band” after all.

Posted on by Jeremy | 1 Comment

CDH3 on Ubuntu Precise Pangolin

I find myself typing the most absurd search strings (they read like lexical Tourette’s or XKCD passwords):

pseudocluster ubuntu precise pangolin cloudera cdh3

I spent a while getting my new laptop set up with a Cloudera CDH3 Hadoop pseudo-distributed cluster.  But I really wish I’d had the following instructions, simplified off the web with some help from some of my friends.

I hope these are helpful to someone besides me.

Continue reading

Posted in Uncategorized | 1 Comment

“Grad school” is a collaboration anti-pattern

To quote Wikipedia, an anti-pattern is a

pattern used in social or business operations or software engineering that may be commonly used but is ineffective and/or counterproductive in practice. [emphasis mine]

I’ve been exploring patterns for actually working on software — not for designing it — and I realized that I myself spent a lot of time living inside one particular pattern, which we might call the Grad School collaboration anti-pattern.

Grad school — especially the process of writing a PhD — values three things, no matter your department or specialty:

  • novelty – what you create must be different from what everybody else until now has done
  • individual effort – what you create must be your own work, not something produced by a team
  • completion over sustainability – sometimes called “PhinisheD”, or “the point of a PhD is to finish a PhD”.

Each of these targets is critical to the idea that a PhD is a work of heroic individual effort to expand the boundaries of science. This idea is a fiction. Like so many fictions it is a useful one, but it’s rarely true in practice (my PhD, for example, was a product of my labmates and fellowship [ETA: and of course my own effort!]).

But each of these is actually a collaboration anti-pattern of its own:

  • novelty can spiral off into Not Invented Here — and frequently does
  • individual effort fosters “Colleague”, pronounced as “competitor”
    — many, many escape grad school (or not) with absolutely vicious attitudes towards others working on related projects
  • completion over sustainability encourages the Just Ship It antipattern — most research code is so heavily grown into the bench that it cannot be run outside of the lab — often even the implementer him- or herself can no longer run it, by the time dissertation defense rolls around.

It sometimes astounds me that so many brilliant researchers survive PhD land — and it worries me: so many good software designers and implementers must be turned off by the dysfunction implied by each of these three.

Posted in academics, linguistics, Patterns, programming, work | 12 Comments