Greater data science, part 1: the discipline

This is part of an open-ended series of marginalia to Donoho’s 2015 paper 50 Years of Data Science.

Donoho compares “data science” (or “data analysis”, a term he inherits from John Tukey) to statistics in terms of three foundational conditions, quoting Tukey:

(a1) intellectual content, (a2) organization in an understandable form, (a3) reliance upon the test of experience as the ultimate standard of validity.

These three are the answers to “what”, “how”, and “why” — for any science.

Let’s call these three core conditions content, structure, and (a means of determining) validity. Anything with an answer in all three rows we might call a discipline, but we distinguish among disciplines by their choice of validity conditions.

Disciplines without empiricism

Tukey (and Donoho) suggest that a science is defined by its use of experience (or predictive power) as its means of validity. Tukey himself declared mathematics a different kind of discipline thus:

By these tests mathematics is not a science, since its ultimate standard of validity is an agreed-upon sort of logical consistency and provability.

Other disciplines may use different validity conditions:

  • Mathematics uses a “sort of… provability” as its standard of validity, as claimed by Tukey himself.
  • New Criticism and American Protestantism both make claims based on the literality (the “immediateness”) of their respective reference texts to uphold their validity.
  • Some branches of linguistics and philosophy appeal to explanatory simplicity as a standard of validity (but I see you, Minimalism — you just pushed the complexity somewhere else and declared it not interesting; that’s called “ignoring externalities”).
  • Secular judicial law, the Talmud, the Hadiths, Roman Catholic papal decrees, and the Marvel No-Prize all incorporate adherence to precedent as a standard of validity.

I don’t mean to dismiss fields that use something other than “experience” (empiricism) as a standard of validity (though that standard is obviously appealing).  Other standards are relevant even within fields we might still consider sciences: the Bohr model of the atom is a form that can be explained to high school students, and it has validity because it’s an effective scaffold for teaching, not because it’s the most accurate in experiment (it’s known to be wrong).
