“50 years of Data Science”. Donoho, David. 2015. [link to downloadable versions]
Donoho’s got a manifesto that ain’t foolin’ around. I have a lot of thoughts about it, but I’m going to write them up as an open-ended series of marginalia on this remarkable essay.
Data science is a thing after all
I’ve said elsewhere (probably also elsewhere on this blog) that I’m not sure “data science” is a thing: to paraphrase Aldous Huxley, data science has always seemed like “nine suburbs in search of a metropolis”.
But I’m here to bring the good word: Greater Data Science, as Donoho describes it, probably is a thing.
Data science is more than “merely” CS or statistics
Donoho’s six divisions of Greater Data Science aren’t covered by statistics (as an academic discipline), nor by computer science (though the machine learning, distributed computation, and databases wings together cover some of them), nor by the union of the two, which largely leaves out (1), (6), and, to some degree, (5). Existing “Data Science” master’s programs tend to cover some of the overlap between GDS and the union of statistics and computer science, and applied data scientists “in the field” (usually in industry) sometimes have fairly deep knowledge of (1) as it applies to their particular subdomain, e.g. geocoding.
Almost nobody covers (6), “Science about data science”, in a well-informed and structured way, and I’d like to see more data scientists getting involved in all six parts here. Even more important to me is the idea that we share theory and praxis, in all six “activities”, across applications. That idea is familiar to statisticians but not to computer scientists or to people in applied domains like biostatistics or NLP.
Future posts in this series will include thoughts about:
- what is a discipline (and what isn’t — at least not yet)
- the power of meta-analysis (including surveys of methods) and the analogies to the common task framework
- the value (and risks) of mentorship and the common task framework