Category: work

  • Greater data science, part 1.1: the undisciplined

    This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. Fields that aren’t disciplines: As discussed previously, disciplines are fields that have all three of: content: the field has something to say; organizational structure: the field has well-formed ways of saying it; and a standard of validity:…

  • Greater data science, part 2.1 – software engineering for scientists

    This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. In many scientific labs, the skills and knowledge required for the research (e.g. linguistics fieldwork, sociological interview practices, wet-lab biological analysis) are not the same skills involved in software engineering or in data curation and maintenance. Some scientists…

  • IDEs are Code Smell

    Some wise thoughts from my complementary-distribution doppelganger Bill McNeill, currently occupying our ecological niche in Austin: IDE-independence has a lot of advantages.

  • Visualization libraries in Jupyter, Python, & R

    I’ve become a near-rabid fan of the Jupyter data analysis environment (hello Scott!), and I am deeply impressed by the work that Continuum (and some of my former colleagues at Google) have put into supporting it.  (I share some of these concerns, but that’s a post for another time.) This week I have been teaching myself…

  • Relational skills and the three wh’s

    There’s a fairly tidy, but imperfect, correspondence between the three wh’s and the relational skillsets I proposed yesterday: how corresponds well to the tooling skillset; what roughly corresponds to the data stewardship skillset; … leaving why to correspond to the collaboration skillset, which seems apt: why do data science if you don’t have someone you’re doing it with, or for?…

  • Relational data science skills

    Here’s what I see as ideal “data science” leadership. This post is a nod to the classic Conway Venn Diagram, but focused on relational skills rather than on specific individual output (much as Tunkelang suggests here). Tooling skills: Here, it’s most helpful to be comfortable with the family of “data science” tools that is out there, and be…

  • Three wh-‘s of data science

    “Big data” bandwagoneers may remember the three Vs of big data: volume, variety, and velocity (sometimes joined by veracity or variability[0]).  These concerns are real, though (if you’re not Google, Amazon or the NSA), your data is probably not as big as you think it is. Data “science”, though, is a bigger question than working with big data.  Sometimes…

  • I’m looking for work

    I am currently without employment, and I’m looking to see what’s next for me. I am excited about human language, computers, and machine learning, and I’m pretty good at all three and their areas of overlap. I am happiest tinkering in the “Bayesian” and “Deep” corners of the Eisner Simplex, but can keep my head above water…

  • Leaving LA

    “Data Science” in this era, like “Cognitive Science” in the nineties, seems to be several intellectual neighborhoods in search of a city.

  • “Grad school” is a collaboration anti-pattern

    To quote Wikipedia: an anti-pattern is: a pattern used in social or business operations or software engineering that may be commonly used but is ineffective and/or counterproductive in practice. [emphasis mine] I’ve been exploring patterns for actually working on software — not for designing it — and I realized that I myself spent a lot of time…