Trochaisms – Letters make words; sentences make paragraphs

Category: programming

reviewing software engineering for scientists – sourmash

Apropos of software engineering for scientists, I had the opportunity to be a reviewer for C. Titus Brown’s JOSS publication of sourmash, a pretty cool Python library built around some very fast C code for computing (and comparing) MinHash sketches on (gene) sequences. My critique of sourmash is marked “minor revisions only” because the core…
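The core idea behind sourmash, MinHash sketching, fits in a few lines of plain Python. What follows is a minimal sketch under stated assumptions, not sourmash’s API or implementation. It uses SHA-1 truncated to 64 bits as the hash and ignores reverse complements. It keeps only the `size` smallest k-mer hashes of each sequence (a “bottom-s” sketch), then estimates the Jaccard similarity of two sequences from those small sketches. The function names, `k=21`, and `size=500` are hypothetical illustration choices.

```python
import hashlib
import random

# Illustrative sketch only; NOT sourmash's API. The hash function, k, and the
# sketch size are assumptions, and reverse complements are ignored for brevity.


def kmers(seq, k):
    """Return the set of overlapping k-mers in an uppercased DNA string."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Map a k-mer to a 64-bit integer (first 8 bytes of its SHA-1 digest)."""
    return int.from_bytes(hashlib.sha1(kmer.encode("ascii")).digest()[:8], "big")


def bottom_sketch(seq, k=21, size=500):
    """Bottom-s MinHash: keep the `size` smallest distinct k-mer hashes."""
    return sorted({hash64(km) for km in kmers(seq, k)})[:size]


def estimate_jaccard(sketch_a, sketch_b, size=500):
    """Estimate the Jaccard similarity of the underlying k-mer sets.

    Among the `size` smallest hashes in the union of both sketches, return
    the fraction that appear in both sketches.
    """
    union_bottom = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not union_bottom:
        return 0.0  # both sequences shorter than k: similarity is undefined
    shared = set(sketch_a) & set(sketch_b)
    return sum(h in shared for h in union_bottom) / len(union_bottom)


def exact_jaccard(seq_a, seq_b, k=21):
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b) if a | b else 0.0


# Usage on synthetic data: seq2 shares its first 150 bases with seq1.
rng = random.Random(0)
seq1 = "".join(rng.choice("ACGT") for _ in range(200))
seq2 = seq1[:150] + "".join(rng.choice("ACGT") for _ in range(50))

full = estimate_jaccard(bottom_sketch(seq1), bottom_sketch(seq2))
small = estimate_jaccard(bottom_sketch(seq1, size=50),
                         bottom_sketch(seq2, size=50), size=50)
print(full, small, exact_jaccard(seq1, seq2))
```

With sequences this short, the `size=500` sketches hold every hash, so that estimate matches the exact value. The `size=50` comparison shows the actual trade: small, fixed-size sketches are compared instead of full k-mer sets, at the cost of some estimation error.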

June 17, 2016
Greater data science, part 2.1 – software engineering for scientists

This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. In many scientific labs, the skills and knowledge required for the research (e.g. linguistics fieldwork, sociological interview practices, wet-lab biological analysis) are not the same skills involved in software engineering or in data curation and maintenance. Some scientists…

June 16, 2016
Greater data science, part 2: data science for scientists

This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. Many aspects of Donoho’s 2015 “greater data science” can support scientists of other stripes, and not just because “data scientist is like food cook”: if data science is a thing after all, then it has specific expertise that…

June 16, 2016
Greater data science, part 1: the discipline

This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. Donoho compares “data science” (or “data analysis”, a term he inherits from John Tukey) to statistics in terms of three foundational conditions, quoting Tukey: Let’s call these three core conditions content, structure, and (a means of determining) validity. Anything with an answer…

June 13, 2016
IDEs are Code Smell

Some wise thoughts from my complementary-distribution doppelganger Bill McNeill, currently occupying our ecological niche in Austin: IDE-independence has a lot of advantages.

June 10, 2016
RMarkdown notebooks with Jupyter front-end

Hey, nifty. I just found out that you can write RMarkdown-style literate Python files and use the Jupyter notebook environment to view and execute them (with the notedown package, which also allows you to edit them in place). This has nice implications for source control: diffs of IPython notebooks, which are stored as JSON, are pretty ugly.
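To make the source-control point concrete: an .ipynb file is JSON that stores execution state alongside the code, so re-running a notebook changes the file even when nobody edited a line. The sketch below is a hypothetical, simplified illustration, not notedown’s implementation or API. `md_to_notebook` splits literate Markdown at Python code fences into nbformat-4-style cells. `rerun` only bumps the execution counter, and that alone is enough to produce a diff. All function names here are made up for the example.

```python
import difflib
import json
import re

# Hypothetical, simplified illustration; NOT notedown's implementation or API.
TICKS = "`" * 3  # built at runtime so no literal fence appears in this listing
CODE_FENCE = re.compile(
    "^" + TICKS + r"python\n(.*?)^" + TICKS + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def _cell(cell_type, text):
    """Build one nbformat-4-style cell dict (source stored as a list of lines)."""
    cell = {"cell_type": cell_type, "metadata": {},
            "source": text.splitlines(keepends=True)}
    if cell_type == "code":
        # Execution state lives in the notebook JSON, not in the Markdown.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def md_to_notebook(text):
    """Split literate Markdown into markdown and code cells at python fences."""
    cells, pos = [], 0
    for match in CODE_FENCE.finditer(text):
        prose = text[pos:match.start()].strip("\n")
        if prose:
            cells.append(_cell("markdown", prose))
        cells.append(_cell("code", match.group(1).strip("\n")))
        pos = match.end()
    tail = text[pos:].strip("\n")
    if tail:
        cells.append(_cell("markdown", tail))
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def rerun(notebook, count):
    """Mimic re-executing every code cell: only the execution counter changes."""
    ran = json.loads(json.dumps(notebook))  # deep copy via a JSON round-trip
    for cell in ran["cells"]:
        if cell["cell_type"] == "code":
            cell["execution_count"] = count
    return ran


def changed_lines(old, new):
    """Return the added/removed lines of a unified diff of two notebooks' JSON."""
    old_lines = json.dumps(old, indent=1, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=1, sort_keys=True).splitlines()
    return [line for line in difflib.unified_diff(old_lines, new_lines, lineterm="")
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]


# Usage: one literate document, "executed" twice without any source edits.
doc = "\n".join([
    "# A literate analysis",
    "",
    "Some prose explaining the step.",
    "",
    TICKS + "python",
    "x = 1 + 1",
    "print(x)",
    TICKS,
    "",
    "More prose.",
    "",
])
first = rerun(md_to_notebook(doc), count=1)
second = rerun(md_to_notebook(doc), count=2)
print("\n".join(changed_lines(first, second)))
```

The Markdown source is identical across both runs, yet the notebook JSON differs. In real notebooks, stored outputs (including images embedded as base64) add far more of this noise.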

June 8, 2016
Donoho’s “Greater Data Science”, part 0

“50 Years of Data Science”. Donoho, David. 2015. [link to downloadable versions] Donoho’s got a manifesto that ain’t foolin’ around. I have a lot of thoughts about it, but I’m going to write them up as an open-ended series of marginalia on this remarkable essay. Data science is a thing after all. I’ve said elsewhere (probably…

June 7, 2016
Relational data science skills

Here’s what I see as ideal “data science” leadership. This post is a nod to the classic Conway Venn diagram, but it focuses on relational skills rather than on specific individual output (much as Tunkelang suggests here). Tooling skills: Here, it’s most helpful to be comfortable with the family of “data science” tools that is out there, and be…

May 18, 2016
Rolling the dice at the Just World Casino

tl;dr: The tech frame of “lean startup”, venture capital funding, “exit strategies”, and relentless “valuation” talk is fundamentally anti-human for nearly all of us. [ETA (immediately after publication):] “Startup idea: They are treated like bees; they are robbed of the honey they make.” (Hottest Startups, @HottestStartups, May 16, 2016) The knee-jerk libertarianism and Randian…

May 15, 2016
Three wh-’s of data science

“Big data” bandwagoneers may remember the three Vs of big data: volume, variety, and velocity (sometimes joined by veracity or variability[0]). These concerns are real, but if you’re not Google, Amazon, or the NSA, your data is probably not as big as you think it is. Data “science”, though, is a bigger question than working with big data. Sometimes…

May 3, 2016