I’m returning to writing more, and more long-form. I love being witty and bantering short-form on Twitter as @trochee, and I don’t expect this to stop. I’m just putting a lot of work into this site over the next few weeks.
[alternate subtitle: Dust-off and nuke the entire site from orbit; it’s the only way to be sure]
I expect to be writing about, in no particular order:
- my search for employment
- machine learning and its discontents, including the ethics of automation and good behavior in the face thereof
- software natural language processing as a tool for humans
- being a three-year-old’s parent
- software that I work on, with, or (occasionally) against, and sometimes even own
- public transit and bicycling, especially in rainy and hilly (and auto-traffic-bound) Seattle
- intersectional feminism, anti-racism, anti-capitalism and other troublemaking
- human language processing as a tool for computers
- literary applications of natural language processing
I’ve also done partial updates on my about and work pages.
Sometimes reading the comments can be illuminating about the psychology of coders.
The comments on this wonderful post on assumptions about names are a case in point. Confronted with a long list of assumptions, about a third of the commenters angrily insist that THEIR particular (wrong) assumptions don’t matter in the “important cases” (e.g. “to the bottom line”), even as commenters around them are providing use cases where it absolutely does.
It’s kind of fun to watch the Linelanders insist that There Is Only ASCII, even as the Flatlanders insist that There Is Only The Basic Multilingual Plane, even as the CJK users point out that all 21 bits of extended Unicode aren’t sufficient for character variants.
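For readers who want the three "lands" made concrete, here is a minimal sketch. The function name `plane_of` and the sample characters are my own illustration, not anything from the linked post. It sorts a character into ASCII, the Basic Multilingual Plane, or the supplementary planes by its code point, and shows how many UTF-16 code units each one needs.

```python
def plane_of(char):
    """Classify one character by code point (hypothetical helper for illustration)."""
    code_point = ord(char)
    if code_point < 0x80:
        return "ASCII"
    if code_point <= 0xFFFF:
        return "BMP"
    return "supplementary"


def utf16_units(char):
    """Number of 16-bit code units needed to encode the character in UTF-16."""
    return len(char.encode("utf-16-le")) // 2


for ch in ["a", "é", "語", "😀"]:
    print(ch, hex(ord(ch)), plane_of(ch), utf16_units(ch))
```

Anything outside the BMP takes two UTF-16 code units, which is usually where the Flatlander assumptions start to break.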
But this willful blindness applies to the tech industry as a whole, especially when it comes to issues of gender and race and bro-ism that calls itself “meritocracy”.
At dinner with a senior designer friend (another white straight dude, so we’re not feeling the pain directly) I described this particular form of smart-guy ignorance as “defended against criticism and immune to praise”. It’s more visible than usual this week, when dickish “jokes” burned TC Disrupt, PAX, and Pax Dickinson alike, and even king of the brogrammers Michael Arrington is feeling a little heat.
I wish I felt confident that the industry was learning something about not taking the self-important bros’ word on their own self-importance, but I don’t have a lot of hope right now.
“Data Science” in this era, like “Cognitive Science” in the nineties, seems to be several intellectual neighborhoods in search of a city.
I find myself typing the most absurd search strings (they read like lexical Tourette’s or XKCD passwords):
pseudocluster ubuntu precise pangolin cloudera cdh3
I spent a while getting my new laptop set up with a Cloudera CDH3 Hadoop pseudo-distributed cluster. But I really wish I’d had the following instructions, simplified off the web with some help from some of my friends.
I hope these are helpful to someone besides me.
To quote Wikipedia, an anti-pattern is:
a pattern used in social or business operations or software engineering that may be commonly used but is ineffective and/or counterproductive in practice. [emphasis mine]
I’ve been exploring patterns for actually working on software — not for designing it — and I realized that I myself spent a lot of time living inside one particular pattern, which we might call the Grad School collaboration anti-pattern.
Grad school — especially the process of writing a PhD — values three things, no matter your department or specialty:
- novelty – what you create must be different from what everybody else has done until now
- individual effort – what you create must be your own work, not something produced by a team
- completion over sustainability – sometimes called “PhinisheD”, or “the point of a PhD is to finish a PhD”.
Each of these targets is critical to the idea that a PhD is a work of heroic individual effort to expand the boundaries of science. Like so many fictions, this idea is a useful one, though it’s rarely true in practice (my PhD, for example, was a product of my labmates and fellowship [ETA: and of course my own effort!]).
But each of these is actually a collaboration anti-pattern of its own:
- novelty can spiral off into Not Invented Here — and frequently does
- individual effort fosters “Colleague”, pronounced as “competitor”: many, many escape grad school (or not) with absolutely vicious attitudes towards others working on related projects
- completion over sustainability encourages the Just Ship It antipattern: most research code is so heavily grown into the bench that it cannot be run outside of the lab, and often even the implementer him- or herself can no longer run it by the time the dissertation defense rolls around.
It sometimes astounds me that so many brilliant researchers survive PhD land — and it worries me: so many good software designers and implementers must be turned off by the dysfunction implied by each of these three.
Here’s my favorite collaboration pattern so far: the Bank Heist collaboration pattern. This pattern, which we know from The A-Team, Ocean’s 11 and Leverage, among others, shares many properties with an excellent developer team:
- You don’t have to like following orders to be on the team.
- Everybody’s a generalist, and an expert in one area (pickpocket, cat burglar, safe-cracker, grifter, etc) but nobody is an expert at everything.
- “Building the team” is part of the fun.
- There is – or should be – mutual respect for complementary skills.
- Everybody on the team needs to do their part and get out of the other people’s way.
- Prima donnas ruin the whole party.
- There’s even a role for management: the Nate Ford/Danny Ocean “mastermind” character is an ideal manager: he can do enough of all the other players’ roles to see how they can all work together and set up the whole job.
I don’t know if identifying this collaboration pattern is actually useful, or if it’s just entertaining, but it is undoubtedly attractive: most people I’ve shared it with get very excited about working with a team that uses it. If you or a team you’re on derives some benefit from this pattern, drop me a note.
A few afterthoughts (connecting to the “theater ensemble” thoughts from Beth on Twitter):
Heist movies pick up the drama when the team starts to violate these prescriptions: when the grifter decides he’d be a better mastermind than the current leader, for example. This opens up two perspective games I like to play:
- heistify: take your boring office politics (“QA is dawdling because they’re convinced the dev will botch it anyway”) and rewrite it as a bank heist: “safe-cracker didn’t bother bringing his stethoscope because he figured the second-story man wouldn’t be able to kill the alarms”. Much more fun, isn’t it?
- shyster: make heist movies boring again by inverting the transformation above.
Finally, heist movies have awesome soundtracks. Who wouldn’t want their workday scored with horn stings? (And, as Josh points out: you’d have a sweet van.)
Software development is a fundamentally social process: it’s all about working together. We (software developers as a caste) have expressions like “programming by contract” and design patterns like “Delegation” that reflect how we humans work together – and we use these patterns to describe how we instruct our robot minions to function. We think about our programs with social metaphors because we’re social apes: we think well with social metaphors, and our software design patterns reflect how we think best.
But we rarely use these social metaphors to think about how we make software. We need patterns for collaboration that match our social creature wetware, the way “delegation” and “factory” and “handshake” patterns help design software.
From John D. Cook’s Probability Facts Twitter feed, I discovered the infamous RANDU, and this absolutely marvelous quote:
One of us recalls producing a “random” plot with only 11 planes, and being told by his computer center’s programming consultant that he had misused the random number generator: “We guarantee that each number is random individually, but we don’t guarantee that more than one of them is random.” Figure that out.
which in turn reminds me of this:
“RFC 1149.5 specifies 4 as the standard IEEE-vetted random number.” — Randall Munroe
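For anyone wondering where RANDU's planes come from, here is a small sketch. The generator itself is the well-known recurrence x_{n+1} = 65539 · x_n mod 2^31; the function names and the seed of 1 are my own choices for illustration. Because 65539 = 2^16 + 3, every three consecutive outputs satisfy x_{k+2} = 6·x_{k+1} − 9·x_k (mod 2^31), which is why points built from consecutive triples land on a handful of parallel planes instead of filling the cube.

```python
MODULUS = 2**31
MULTIPLIER = 65539  # 2**16 + 3


def randu(seed, count):
    """Return `count` successive RANDU outputs following `seed` (seed should be odd)."""
    values = []
    x = seed
    for _ in range(count):
        x = (MULTIPLIER * x) % MODULUS
        values.append(x)
    return values


def triples_are_linearly_tied(values):
    """Check x[k+2] == 6*x[k+1] - 9*x[k] (mod 2**31) for every consecutive triple."""
    return all(
        (values[k + 2] - 6 * values[k + 1] + 9 * values[k]) % MODULUS == 0
        for k in range(len(values) - 2)
    )


sample = randu(seed=1, count=1000)
print(sample[:3])
print(triples_are_linearly_tied(sample))
```

Each number may look fine on its own, but no triple is free to be anything other than what the previous two dictate, which is roughly what the programming consultant was (unintentionally) admitting.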
My brother Daniel introduced me to a new term he and his security-geek friends are trying to encourage the rest of us mere mortals to adopt: “TPC”, or “Trusted Physical Console”.
In short, it’s the sturdy, small laptop running a trusted operating system, on which you (and probably only you) control the available tools. (Probably not counting: a laptop provided by your boss or school, especially if you don’t have root privileges.) It’s a nifty term, especially from the point of view of improving security and privacy culture, because it reminds us that “the cloud” is not necessarily trustworthy, even (perhaps especially) for technically savvy people.
I made a lazyweb request for recommendations for a TPC that I can use on the bus, and here are my out-loud thoughts summarizing the responses (received on Twitter and elsewhere):