Category: Patterns

  • Greater data science, part 1.1: the undisciplined

    This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. Fields that aren’t disciplines As discussed previously, disciplines are fields that have all three of: content: the field has something to say; organizational structure: the field has well-formed ways of saying it; and a standard of validity:…

  • Greater data science, part 1: the discipline

    This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. Donoho compares “data science” (or “data analysis”, a term he inherits from John Tukey) to statistics in terms of three foundational conditions, quoting Tukey: Let’s call these three core conditions content, structure, and (a means of determining) validity. Anything with an answer…

  • IDEs are Code Smell

    Some wise thoughts from my complementary-distribution doppelganger Bill McNeill, currently occupying our ecological niche in Austin: IDE-independence has a lot of advantages.

  • Relational skills and the three wh’s

    There’s a fairly tidy, but imperfect, correspondence between the three wh’s and the relational skillsets I proposed yesterday: how corresponds well to the tooling skillset; what roughly corresponds to the data stewardship skillset; … leaving why to correspond to the collaboration skillset, which seems apt: why do data science if you don’t have someone you’re doing it with, or for?…

  • Relational data science skills

    Here’s what I see as ideal “data science” leadership. This post is a nod to the classic Conway Venn Diagram, but focused more on relational skills than on the specific individual output (much as Tunkelang suggests here). Tooling skills Here, it’s most helpful to be comfortable with the family of “data science” tools that is out there, and be…

  • Three wh-‘s of data science

    “Big data” bandwagoneers may remember the three Vs of big data: volume, variety, and velocity (sometimes joined by veracity or variability[0]). These concerns are real, though (if you’re not Google, Amazon or the NSA), your data is probably not as big as you think it is. Data “science”, though, is a bigger question than working with big data. Sometimes…

  • Samyro 0.0.2 – sampling structured inputs

    A new version of Samyro (0.0.2) is now uploaded to PyPI. The GitHub repo has the details, but I’ll brag about the new features: samyro write accepts a --seed argument, which allows the usual temperature-based decoding *after* the engine has progressed through the given seed. The default seed is now the BOS character, which plays nicely with the structured…

  • Samyro 0.0.1 – development update

    Last week I posted a new Python package to PyPI: Samyro, my toolkit for doing RNN-based character synthesis of new text from given text. I summarize in tweets the journey to the release of the project. There’s lots more to do, but much of the experimentation now is based on changing the input texts that…

  • “Grad school” is a collaboration anti-pattern

    To quote Wikipedia: an anti-pattern is: a pattern used in social or business operations or software engineering that may be commonly used but is ineffective and/or counterproductive in practice. [emphasis mine] I’ve been exploring patterns for actually working on software — not for designing it — and I realized that I myself spent a lot of time…

  • “Bank heist” collaboration pattern

    Here’s my favorite collaboration pattern so far: the Bank Heist collaboration pattern. This pattern, which we know from The A-Team, Ocean’s 11 and Leverage, among others, shares many properties with an excellent developer team: You don’t have to like following orders to be on the team. Everybody’s a generalist, and an expert in one area (pickpocket, cat burglar, safe-cracker, grifter, etc)…