Perils of a Chief Data Scientist at the National Government Level

Last week, my compatriot gushed over the new Chief Data Scientist of the United States Government, DJ Patil. He was not alone; the guys at the Partially Derivative podcast did as well.

Now, yes, we should be happy that government is not keeping itself in the stone age. But my two specific criticisms of all this gushing are:

  1. We should be rooting more for data science to be in the hands of the people than we are rooting for it to be in the hands of government.
  2. The catchphrase from the 1990s was that the Internet was the ultimate disintermediator, reducing transaction costs and enabling peer-to-peer transactions without the need for large organizations such as government. During the 20th century, we were lulled into relying on government as the default, and with the Internet and now Data Science, it's time to return to normal-sized government.

Data Science for the people

As I wrote in my Reverse Democracy blog post (image below), government (or at least, the political candidates) has already been bludgeoning the populace with data science. My hope was that by 2020, the people would get together and counter the politicians' data science with their own data science.

We already have some of that with,, and, but we need much more. As much as I like the work has been doing over the past decade exposing voter fraud and electronic voting machine tampering, I'd like to see more data science-oriented analysis of skewed voting results. As yet another application of data science, I'd like to see grassroots populist candidates be able to use data science to run their campaigns the way the billion-dollar campaigns did in 2012.

Prisoner's Dilemma

The Prisoner's Dilemma is about cooperation vs. selfish interests, and the reasoning is that government is more efficient than the market at providing certain services. Michael Walker said something to this effect.

But why is government more efficient at some things? It comes down to Coase's Theorem. It says that the free market is always the most efficient, assuming transaction costs are zero, but since transaction costs are rarely zero, the free market is not always the most efficient. Roads are a good example. Here in Colorado, we have full-speed tolling (75 MPH posted, 80 MPH prevailing, and even faster for those luxury pickup trucks coming out of Douglas County). You don't slow down, and you don't even need a transponder; billing by license plate video capture (again, at full speed in the main lanes) is only 25% higher in price. This manner of toll collection has reduced transaction costs to near zero -- no wages for human toll operators, and no lost time for drivers to fumble for change. Now, E-470 was organized and built by the Colorado government, but imagine combining this wireless tolling technology with Kickstarter to get a new road built. Government would have to become involved only to provide the courts for eminent domain. Kickstarter itself is another example of reduced transaction costs -- imagine trying to advertise and organize that large a group of people in the pre-Internet world.
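To make the transaction-cost argument concrete, here is a toy sketch in Python. It is not a real economic model: the function name, the break-even rule, and every number are made up for illustration. It only shows how the same crowdfunded road can flip from unworkable to workable when the per-person cost of organizing drops.

```python
def crowdfunding_viable(backers, value_per_backer, organizing_cost_per_backer, project_cost):
    """Toy rule: the project works if what backers are willing to pay,
    after subtracting the cost of reaching and billing each of them,
    still covers the cost of the project."""
    net_contribution = backers * (value_per_backer - organizing_cost_per_backer)
    return net_contribution >= project_cost


# Hypothetical numbers, chosen only to illustrate the argument.
BACKERS = 100_000            # drivers who would use the road
VALUE_PER_BACKER = 500       # dollars each would be willing to pledge
ROAD_COST = 40_000_000       # dollars to build

# Pre-Internet: mailings, phone banks, paper billing (assumed $200 per person).
print(crowdfunding_viable(BACKERS, VALUE_PER_BACKER, 200, ROAD_COST))

# Kickstarter-style platform plus wireless tolling (assumed $5 per person).
print(crowdfunding_viable(BACKERS, VALUE_PER_BACKER, 5, ROAD_COST))
```

With these assumed figures, the high-transaction-cost case falls short of the road's cost and the low-transaction-cost case clears it. Nothing about the road or the drivers changed, only the cost of coordinating them.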

Another example is the very ski travel machine learning contest that the Data Science Association is running along with Big Data in Denver. It is an example of what transportation geeks call "Transportation Demand Management" (TDM). Other forms of TDM include electronic message boards, demand-priced tolling, government-coordinated van pools, incentives for residential building near transit stops, taxes on gas or mileage, and many more. TDM was once strictly the purview of government, but now, thanks to the availability of data sources and data science tools (that a decade ago would have cost thousands or tens of thousands of dollars), individuals are competing in the Data Science Association contest to supply TDM to the people.

Transaction costs have indeed been lowered: wireless communication, Internet, Kickstarter, PayPal. And with these lowered transaction costs, we need to think about what the people can do for themselves instead of automatically thinking about what the government can do.

I found the gushing by Michael Walker and the Partially Derivative guys over the U.S. government's use of healthcare data particularly scary. The Partially Derivative guys were like, "hey, wouldn't it be neat to log in to a website to view my medical history?" No, no it wouldn't. Such information would be ripe for abuse by government. It would be a dream come true for J. Edgar Hoover. Combine it with Data Science, and a J. Edgar Hoover would have automated means to identify "outliers". Hope you aren't an outlier.
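To show how little it takes to automate the hunt for "outliers", here is a minimal sketch in Python using only the standard library. The z-score rule, the threshold of 2, the function name, and the data are all assumptions made up for illustration; nothing here describes any real government system or dataset.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=2.0):
    """Return the indices of values lying more than `threshold`
    sample standard deviations away from the mean (a simple z-score rule)."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # Everyone is identical, so nobody stands out.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical data: yearly clinic visits for ten made-up people.
visits = [2, 3, 1, 2, 4, 3, 2, 30, 3, 2]
print(flag_outliers(visits))  # → [7]
```

A dozen lines, no special tools, and the person at index 7 is on a list. Real systems would use fancier methods, but the point stands: once the data is centralized, flagging whoever is different is the easy part.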


It would be ridiculous to hold back data science from government. But a prerequisite to celebration is to have equal power in the hands of the people. And with the tools available to us, it's more a matter of changing our way of thinking than any other restriction.