Category Archives: data science

Greater data science, part 1.1: the undisciplined

This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. Fields that aren’t disciplines As discussed previously, disciplines are fields that have all three of: content: the field has something to say, … Continue reading

Posted in data science, Patterns, work | 1 Comment

Apropos of software engineering for scientists, I had the opportunity to be a reviewer for C. Titus Brown’s JOSS publication of sourmash, a pretty cool Python library around some very fast C code for computing (and comparing) MinHash sketches … Continue reading

Posted by Jeremy | 1 Comment

Greater data science, part 2.1 – software engineering for scientists

This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. In many scientific labs, the skills and knowledge required for the research (e.g. linguistics fieldwork, sociological interview practices, wet-lab biological analysis) are not … Continue reading

Posted in data science, programming, statistics, work | 6 Comments

I’ve written a half dozen pieces of commentary on David Donoho’s work, all the while spelling his name wrong, at least once in a permalink URL. Oh, well. At least I can edit the posts here.

Posted by Jeremy | 1 Comment

Greater data science, part 2: data science for scientists

This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. Many aspects of Donoho’s 2015 “greater data science” can support scientists of other stripes — and not just because “data scientist is like food cook” … Continue reading

Posted in data science, programming, statistics | 3 Comments

Greater data science, part 1: the discipline

This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. Donoho compares “data science” (or “data analysis”, a term he inherits from John Tukey) to statistics in terms of three foundational conditions, quoting … Continue reading

Posted in data science, math, Patterns, programming, statistics | 1 Comment

Hey, nifty. I just found out that you can write RMarkdown-style literate Python files and use the Jupyter notebook environment to view and execute them (with the notedown package, which also allows you to edit them in place).  This has nice implications for … Continue reading

Posted by Jeremy | 1 Comment

Donoho’s “Greater Data Science”, part 0

“50 years of Data Science”. Donoho, David.  2015. [link to downloadable versions] Donoho’s got a manifesto that ain’t foolin’ around.  I have a lot of thoughts about it, but I’m going to write them up as an open-ended series of marginalia … Continue reading

Posted in academics, data science, programming, statistics | 4 Comments

Visualization libraries in Jupyter, Python, & R

I’ve become a near-rabid fan of the Jupyter data analysis environment (hello Scott!), and I am deeply impressed by the work that Continuum (and some of my former colleagues at Google) have put into supporting it.  (I share some of these … Continue reading

Posted in data science, statistics, tech, work | Leave a comment

Relational data science skills

Here’s what I see as ideal “data science” leadership. This post is a nod to the classic Conway Venn Diagram, but focused more on relational skills than on the specific individual output (much as Tunkelang suggests here). Tooling skills Here, it’s most helpful to … Continue reading

Posted in data science, Patterns, programming, statistics, work | 2 Comments