Greater data science, part 1.1: the undisciplined

This is part of an open-ended series of marginalia to Donoho’s 2015 paper 50 Years of Data Science.

Fields that aren’t disciplines

As discussed previously, disciplines are fields that have all three of:

  • content: the field has something to say,
  • organizational structure: the field has well-formed ways of saying it, and
  • a standard of validity: a way in which the field admits new contributions.
Three intersecting circles ("useful", "structured", and "delimited") make up the "Knowledge" diagram, intersecting in the middle at "discipline". Other overlaps are also labeled.

Tukey & Donoho’s three categories make a nice Venn diagram (pace Drew Conway).

Two out of three ain’t bad

Fields that have content but no organizational structure are usually learned “by osmosis”, where information is passed through informal or formal co-placement like apprenticeships. These fields are not disciplines (for lack of structure), and I dub them lore (best practices in the diagram). There are plenty of nice things about lore; arguably, “data science” (and much software) industry practices are mostly lore:

  • Hadoop and its niche
  • graph database wizardry
  • visualization demos that sparkle
  • ontologies that SPARQL
  • regex and formatting tricks for standardizing non-standardized input
  • etc

but the industry practice as a whole suffers from this being lore: access to secret knowledge requires social engagements that are unevenly distributed (yes, in the Gibson “future is already here; it’s just unevenly distributed” sense).  (Keen readers may have noticed the Marvel No-Prize mentioned in the previous post is probably lore, rather than a discipline, since the entire point of the prize is to award in-group knowledge.)

There are also fields with content and structure but no standard of validity, except perhaps the approval of the community’s gatekeepers. These fields are not disciplines either; they might be called a craft or a guild, depending on your level of cynicism.

Lastly, there are fields without much content, but with organizational structure and well-formed means of admission. We might call one of these fields a society or a pedagogy, in that we have a means to instruct but not necessarily something worthwhile in which to instruct. There’s not much to say about these, but plenty of academic departments and industry org charts have structure, rules, and process for climbing (or preventing the climbing of) the ladders. Much of that has little to do with the intellectual content of the organization, but it does invoke structure and a standard of validity.

One-legged tables aren’t very stable

And a brief lap around the outside of the diagram:

  • useful information without a means of admitting validity or theoretical backing is at best an insight; the name reflects how difficult it is to transmit to others
  • theoretical structure without content or easily gleaned utility is (at best) a code of behavior; the rules of chivalry, for example, have (since the transition away from feudalism) become a formal code without guiding content or much utility with respect to the facts of the present day. (This is not to say that codes of behavior aren’t worthwhile: combined with best practices, they can form a complete discipline.)
  • Finally, a well-formed machinery for validation, without content or theory, is a Chinese Room sort of crank-turner. The Sieve of Eratosthenes, for example, is a procedure for validation that produces prime numbers, but it says nothing about what to do with those primes, nor about what theoretical framework is reinforced by producing more of them.
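
For the curious, here is a minimal sketch of the sieve in Python. It is my own illustration, not part of any tooling mentioned in this post, and the function name `sieve_of_eratosthenes` is a hypothetical choice. It turns the crank and hands back primes, and that is all it does:

```python
def sieve_of_eratosthenes(limit: int) -> list[int]:
    """Return every prime p with 2 <= p <= limit."""
    if limit < 2:
        return []

    # is_prime[n] starts out True and is struck to False once n is
    # shown to be a multiple of a smaller prime.
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False

    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Multiples below p * p were already struck by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1

    return [n for n, flag in enumerate(is_prime) if flag]


print(sieve_of_eratosthenes(30))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Note what is missing: nothing in the procedure says why you wanted the primes or what to do with them next.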

And Goebbels has no b**ls at all

As a post-scriptum: the background (negative space) in the diagram represents those kinds of knowledge that have no utility, no theory, and no procedure for validation. Those spaces are “not even wrong”, a phrase attributed to the infamous jerk Pauli.
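
The whole taxonomy fits in a lookup table. The sketch below is one reading of this post, not the diagram tooling: the names `LABELS` and `classify` are hypothetical, and it assumes (going by the “Two out of three” heading) that lore keeps content and a standard of validity while lacking structure.

```python
# The three properties a field may or may not have, per the list at the top.
PROPERTIES = frozenset({"content", "structure", "validity"})

# Each subset of properties maps to the name this post gives it.
# ASSUMPTION: lore = content + validity (the post only says "no structure").
LABELS = {
    frozenset({"content", "structure", "validity"}): "discipline",
    frozenset({"content", "validity"}): "lore",
    frozenset({"content", "structure"}): "craft or guild",
    frozenset({"structure", "validity"}): "society or pedagogy",
    frozenset({"content"}): "insight",
    frozenset({"structure"}): "code of behavior",
    frozenset({"validity"}): "crank-turner",
    frozenset(): "not even wrong",
}


def classify(*properties: str) -> str:
    """Name the kind of field that has exactly the given properties."""
    key = frozenset(properties)
    unknown = key - PROPERTIES
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    return LABELS[key]


print(classify("content", "validity"))  # lore
print(classify("validity"))             # crank-turner
print(classify())                       # not even wrong
```

Eight subsets, eight names: three circles leave no region of the diagram unlabeled, background included.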

As a post-post-scriptum, I developed some Python tooling to generate that diagram, and I got to be funny with it and use all the least-pleasant versions of each category name:

I plan to bundle up the tooling into a joke-template package to publish sometime in the next week or two, to completely kill that joke dead.

