Author: Jeremy

  • Greater data science, part 1.1: the undisciplined

    This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. Fields that aren’t disciplines As discussed previously, disciplines are fields that have all three of: content: the field has something to say, organizational structure: the field has well-formed ways of saying it, and a standard of validity:…

  • reviewing software engineering for scientists – sourmash

    Apropos of software engineering for scientists, I had the opportunity to be a reviewer for C. Titus Brown’s JOSS publication of sourmash, a pretty cool Python library around some very fast C code for computing (and comparing) MinHash sketches on (gene) sequences. My critique of sourmash is marked “minor revisions only” because the core…

  • Greater data science, part 2.1 – software engineering for scientists

    This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. In many scientific labs, the skills and knowledge required for the research (e.g. linguistics fieldwork, sociological interview practices, wet-lab biological analysis) are not the same skills involved in software engineering or in data curation and maintenance. Some scientists…

  • spelling be hard

    I’ve written a half dozen pieces of commentary on David Donoho’s work, all the while spelling his name wrong; at least once in a permalink URL. Oh, well.  At least I can edit the posts here.

  • Greater data science, part 2: data science for scientists

    This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. Many aspects of Donoho’s 2015 “greater data science” can support scientists of other stripes — and not just because “data scientist is like food cook” — if data science is a thing after all, then it has specific expertise that…

  • Greater data science, part 1: the discipline

    This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. Donoho compares “data science” (or “data analysis”, a term he inherits from John Tukey) to statistics in terms of three foundational conditions, quoting Tukey: Let’s call these three core conditions content, structure, and (a means of determining) validity.  Anything with an answer…

  • Orlando

    I was going to post a follow up entry or two on data science, but then a red-blooded American homophobic terrorist went and murdered fifty people in Orlando with a military weapon, because they were GLBTQ, or because they were Latinx, or both. And another one got stopped on his way to Pride in LA,…

  • IDEs are Code Smell

    Some wise thoughts from my complementary-distribution doppelganger Bill McNeill, currently occupying our ecological niche in Austin: IDE-independence has a lot of advantages.

  • RMarkdown notebooks with Jupyter front-end

    Hey, nifty. I just found out that you can write RMarkdown-style literate Python files and use the Jupyter notebook environment to view and execute them (with the notedown package, which also allows you to edit them in place).  This has nice implications for source control — changes to ipython notebooks are pretty ugly.

  • Donoho’s “Greater Data Science”, part 0

    “50 years of Data Science”. Donoho, David.  2015. [link to downloadable versions] Donoho’s got a manifesto that ain’t foolin’ around.  I have a lot of thoughts about it, but I’m going to write them up as an open-ended series of marginalia on this remarkable essay. Data science is a thing after all I’ve said elsewhere (probably…