Donoho’s “Greater Data Science”, part 0

“50 years of Data Science”. Donoho, David. 2015. [link to downloadable versions]

Donoho’s got a manifesto that ain’t foolin’ around. I have a lot of thoughts about it, but I’m going to write them up as an open-ended series of marginalia on this remarkable essay.

Data science is a thing after all

I’ve said elsewhere (probably also elsewhere on this blog) that I’m not sure “data science” is a thing: to paraphrase ~~Dorothy Parker~~ Aldous Huxley, data science has always seemed like “~~72~~ nine suburbs in search of a ~~city~~ metropolis”.

But I’m here to bring the good word: Greater Data Science, as Donoho describes it, probably is a thing.

Data science is more than “merely” CS or statistics

The activities of Greater Data Science are classified into 6 divisions:

1. Data Exploration and Preparation
2. Data Representation and Transformation
3. Computing with Data
4. Data Modeling
5. Data Visualization and Presentation
6. Science about Data Science

These six are not neatly captured in statistics, computer science, *or* the union of the two.

These six divisions aren’t covered by statistics (as an academic discipline), by computer science (though the union of its machine learning, distributed computation, and databases wings covers some of them), or by the union of the two, which largely leaves out (1), (6), and, to some degree, (5). Existing “Data Science” master’s programs tend to cover some of the overlap between GDS and the union of statistics and computer science, and applied data scientists “in the field” (usually in industry) sometimes have fairly deep knowledge of (1) as it applies to their particular subdomain, e.g. geocoding.

Almost nobody covers (6), “Science about Data Science”, with well-informed content and structure, and I’d like to see more data scientists getting involved in all six parts here. Even more important to me is the idea that we share theory and praxis, in all six “activities”, across applications. That idea is familiar to statisticians but not to computer scientists or to people in applied domains like biostatistics or NLP.

Forward pointers

Future posts in this series will include thoughts about:

what is a discipline (and what isn’t, at least not yet)
the power of meta-analysis (including surveys of methods) and the analogies to the common task framework
the value (and risks) of mentorship and the common task framework

Posted

June 7, 2016

academics, data science, programming, statistics

Jeremy
