Cowboy Data Science

Meme from Steve Temple

The opposite of "cowboy coding" is having a solid process, which includes unit testing. Data Science has matured to the point now that we should look to add this as standard practice and start to expect it.

The challenges are:

A good data science unit test would load known data, execute the transformations and algorithms, and compare against expected results. If after several iterations of tweaking code to answer new questions these expected results from various intermediate processing stages diverge from the actual results, the unit tests will flag them, preventing the reporting of erroneous results.

Plus there are all the other usual benefits of unit tests:

  • Validates from the outset that the processing and algorithms are behaving as expected
  • Serves as documentation to readers of the code what the expected inputs and outputs are
  • Provides confidence during refactoring