# Do We "Deserve" Unreasonable Effectiveness?

The "Unreasonable Effectiveness" meme has been accelerating. I count seven such papers and presentations applied to machine learning since Google's 2009 paper, *The Unreasonable Effectiveness of Data*.

But what is meant by "unreasonable effectiveness"? If you read those machine learning papers, there are a few different takes on it:

• The effectiveness is unexpected, or
• The effectiveness is a great relief after decades of struggle with old techniques, or
• Empirical rather than theoretical mathematical techniques are needed to cope with the complexity of real-world data, and such empirical techniques turn out to be effective.

How about the dictionary definition of "unreasonable effectiveness"? Unreasonable is defined as:

> not reasonable or rational; acting at variance with or contrary to reason; not guided by reason or sound judgment; irrational

Does this sound like an ethical approach to data science? Let's take a step back. The 2009 Google paper was not the first to use the term "unreasonable effectiveness". In fact, it cites the 1960 paper *The Unreasonable Effectiveness of Mathematics in the Natural Sciences*. In that paper, the author, Eugene Wigner, puts forth that it is unreasonable to expect mathematics to describe macroscopic physics so well, such as the formula for the distance fallen under gravity, y = gt²/2. Wigner agrees with Newton that the formula is non-intuitive since it involves a second derivative.
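Wigner's point about the second derivative can be made concrete: the law of falling bodies is not observed directly but recovered by integrating a constant acceleration twice (taking initial position and velocity to be zero):

```latex
\ddot{y} = g
\quad\Rightarrow\quad
\dot{y}(t) = g t
\quad\Rightarrow\quad
y(t) = \frac{g t^{2}}{2}
```

The measurable quantity, position, sits two derivatives away from the simple constant g that the mathematics actually governs.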

How much more non-intuitive, then, is something like the ideal gas law or Hooke's law for a spring? These are quite far from the four fundamental forces of physics, yet these formulas work. They describe emergent properties of complex systems. They are wholly empirical.
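For reference, the two laws in question, both simple linear relationships sitting atop enormously complex microscopic systems, are:

```latex
PV = nRT \qquad \text{(ideal gas law)}
```

```latex
F = -kx \qquad \text{(Hooke's law for a spring)}
```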

Wigner takes it further. In his paper, we see an echo of the dictionary definition of "unreasonable" cited above (emphasis added):

> A possible explanation of the physicist's use of mathematics to formulate his laws of nature is that he is a somewhat *irresponsible* person

And then to emphasize the realm of non-understanding (emphasis added):

> The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand *nor deserve*. We should be grateful for it and hope that it will remain valid in future research and that it will extend, for better or for worse, to our pleasure, even though perhaps also to our bafflement, to wide branches of learning.

## Is the use of machine learning responsible?

There are two reasons why we might think the use of machine learning is irresponsible. First, a machine learning system may be involved in a critical or human-safety situation, and all we will know about it is that it works most of the time, but not necessarily why it works, how it behaves in the worst case, or how it handles outliers. This is especially true of neural networks.

Second, as machine learning progresses, we might reach the singularity. Machine learning to date has just been about optimization. In fact, that is one reason machine learning researchers and practitioners have stuck with that moniker: to distinguish it clearly from artificial intelligence, especially human-level Artificial General Intelligence (AGI). But as neural networks continue to be refined, as in recurrent neural networks (RNNs) and Neural Turing Machines (NTMs), and as researchers strive to make neural networks ever more like the human brain, they might eventually actually get there, or get close enough to create a dangerous situation.

If these results in recent years from machine learning are unexpected and "unreasonably effective", how can we be sure that, as the years roll on and the technical advances pile up, "unreasonably effective" doesn't eventually mean a positive feedback loop in intelligence?

## What happened to O(n log n)?

Even putting the singularity aside and focusing on the other concern, whether a machine learning model is appropriate for a human-safety application, think back to sophomore Computer Science classes. Remember how sorting algorithms were analyzed to death to bound them to O(n log n)? Where is this analysis for neural network algorithms? We don't even fully know how they work. We know empirically, statistically, that they work, and we know some things around the edges, but we lack a complete understanding. That's fine for recommending the next movie to watch, and it's even fine for resource allocation (which indirectly does create life-or-death situations) because statistically the population is better off, but it wouldn't be fine for a nuclear launch decision. Not that there is actually a neural network in such a situation, but there's a line somewhere in that spectrum.
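As a reminder of what that classroom analysis bought us, here is a minimal merge sort sketch. Its worst-case O(n log n) bound follows from the recurrence T(n) = 2T(n/2) + O(n), regardless of the input, which is exactly the kind of guarantee we lack for a trained neural network.

```python
# Minimal merge sort: worst-case O(n log n), provable from the
# recurrence T(n) = 2T(n/2) + O(n), independent of the input data.
def merge_sort(xs):
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    # Merge the two sorted halves in a single linear pass.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The point is not the code itself but that every line of it is amenable to proof; no such derivation exists for the weights of a trained network.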

And the concluding statement of the 2009 Google paper?

> Go out and gather some data, and see what it can do.