Category Archives: statistics

Greater data science, part 2.1 – software engineering for scientists

This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. In many scientific labs, the skills and knowledge required for the research (e.g. linguistics fieldwork, sociological interview practices, wet-lab biological analysis) are not … Continue reading

Posted in data science, programming, statistics, work | 6 Comments

Greater data science, part 2: data science for scientists

This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. Many aspects of Donoho’s 2015 “greater data science” can support scientists of other stripes — and not just because “data scientist is like food cook” … Continue reading

Posted in data science, programming, statistics | 3 Comments

Greater data science, part 1: the discipline

This is part of an open-ended series of marginalia to Donoho’s 50 Years of Data Science 2015 paper. Donoho compares “data science” (or “data analysis”, a term he inherits from John Tukey) to statistics in terms of three foundational conditions, quoting … Continue reading

Posted in data science, math, Patterns, programming, statistics | 1 Comment

Donoho’s “Greater Data Science”, part 0

“50 years of Data Science”. Donoho, David.  2015. [link to downloadable versions] Donoho’s got a manifesto that ain’t foolin’ around.  I have a lot of thoughts about it, but I’m going to write them up as an open-ended series of marginalia … Continue reading

Posted in academics, data science, programming, statistics | 4 Comments

Visualization libraries in Jupyter, Python, & R

I’ve become a near-rabid fan of the Jupyter data analysis environment (hello Scott!), and I am deeply impressed by the work that Continuum (and some of my former colleagues at Google) have put into supporting it.  (I share some of these … Continue reading

Posted in data science, statistics, tech, work | Leave a comment

Relational data science skills

Here’s what I see as ideal “data science” leadership. This post is a nod to the classic Conway Venn Diagram, but more focused on relational skills rather than the specific individual output (much as Tunkelang suggests here). Tooling skills Here, it’s most helpful to … Continue reading

Posted in data science, Patterns, programming, statistics, work | 2 Comments

Three wh-‘s of data science

“Big data” bandwagoneers may remember the three Vs of big data: volume, variety, and velocity (sometimes joined by veracity or variability[0]).  These concerns are real, though (if you’re not Google, Amazon or the NSA), your data is probably not as big as you think … Continue reading

Posted in Patterns, programming, statistics, work | 2 Comments

Samyro 0.0.2 – sampling structured inputs

New version of Samyro (0.0.2) now uploaded to PyPI. GitHub repo has the details, but I’ll brag about the new features: samyro write accepts a --seed argument, which allows the usual temperature-based decoding *after* the engine has progressed through the given … Continue reading

Posted in linguistics, Patterns, programming, samyro, statistics | 2 Comments

Looking for work, 2012 edition

A short note (implied by my updates on Twitter), just to say: I was laid off last week from my previous employment in an abrupt downsizing — a company pivot, evidently away from the work I like to do.  I’m … Continue reading

Posted in admin, information theory, linguistics, programming, statistics, work | Leave a comment

Those who got fired up about Chomsky’s difficult comments regarding empiricism, including myself, will be gratified to see that Peter Norvig, patron saint of data-driven computational linguistics (inter alia), has released his own comments, along the same lines as mine, only … Continue reading

Posted on by Jeremy | 7 Comments