Rise of Big Compute

I've been harping on the importance of GPUs since my October 2012 blog post Supercomputing for $500, and more recently in my reviews here of the SC13 conference. A couple of news stories this month indicate broader recognition of the growing importance of "Big Compute".

First is the November 9, 2013 TEDx Virginia Tech talk (released to YouTube Dec. 5) Big compute vs big data by Wu Feng, who created the GPU-based HokieSpeed supercomputer. In the talk, embedded below, Wu Feng shares his observation that the U.S. seems to be focused more on Big Data, while China, home to the fastest supercomputer in the world, seems to be focused more on Big Compute, treating Big Data as just a special application, or subset, of the realm of Big Compute.

Wu Feng concludes that both Big Compute and Big Data are needed. We've been hearing a lot about "Big Data" over the past three years or so (heavily over the past 1-2 years, and a little starting with the 2004 Google papers), but "Big Compute" hasn't yet reached the same buzzword status. It needs to in order for real Data Science to progress. A lot of today's data science is either simple statistics over large data sets or advanced machine learning over small data sets (small enough to fit in the RAM of a single machine for processing in R or IPython Notebook). There is a top-tier Big Data vendor out there that can't sell any Hadoop nodes with more than 384GB of RAM. In this new era of distributed-RAM processing systems such as Spark and Druid, vendors need to catch up so that we data scientists can catch up too.
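As a rough sketch of what distributed-RAM processing looks like (the HDFS path and cluster setup here are hypothetical), a few lines of PySpark can pin a data set in the cluster's aggregate memory so every subsequent pass runs from RAM rather than disk:

```python
# Minimal PySpark sketch -- the HDFS path and cluster configuration are
# hypothetical. cache() pins the RDD in the cluster's aggregate RAM, so
# every subsequent pass avoids re-reading from disk.
from pyspark import SparkContext

sc = SparkContext(appName="BigComputeSketch")

points = (sc.textFile("hdfs:///data/points.txt")  # hypothetical data set
            .map(float)
            .cache())                             # hold in distributed memory

# Two full passes over the data, both served from RAM.
mean = points.sum() / points.count()
variance = points.map(lambda x: (x - mean) ** 2).mean()
print(mean, variance)
```

Every action after the cache() runs against memory spread across the whole cluster, which is exactly why per-node RAM ceilings like the 384GB limit above matter.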

The second news story this month is ExtremeTech's Massive surge in Litecoin mining leads to graphics card shortage. While Litecoin, a competitor to Bitcoin, is a niche application that is all-compute-no-data, the fact that such a niche application can cause a run on GPU cards illustrates how thin the GPU market is. If U.S. IT were embracing Big Compute more fully, the run would instead have been caused by business and scientific applications. We all felt it a couple of years ago when the Thai monsoon floods doubled the price of hard drives, and that story made all the major newspapers and media outlets. But there's been nothing in the non-tech media about the GPU shortage.
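For a sense of what "all-compute-no-data" means, here is a toy proof-of-work loop. It is only illustrative: Litecoin actually uses the memory-hard scrypt hash rather than SHA-256, and the header bytes and difficulty target below are made up. The point is the workload's shape: enormous arithmetic over a tiny, fixed input.

```python
# Toy proof-of-work loop (illustrative only; Litecoin really uses scrypt,
# and this header and target are invented). Note the profile: unbounded
# compute, almost zero data.
import hashlib

header = b"example-block-header"   # hypothetical block header
target = 2 ** 236                  # hypothetical difficulty target

nonce = 0
while True:
    digest = hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
    if int.from_bytes(digest, "big") < target:
        break                      # "solved" the block
    nonce += 1

print("found nonce:", nonce)
```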

The recognition of the importance of Big Compute -- GPUs, high core counts, SIMD width, and large RAM -- is rising, at least in TED talks and tech news outlets. But it's not mainstream yet. It needs to be for Data Science to progress.

Comments

Michael.Walker

Our own Michael Malak makes the news. See: http://venturebeat.com/2013/12/16/marketers-take-note-some-data-scientists-want-big-compute/

"Michael Malak, an engineer working with data at Time Warner Cable and a board member of the Data Science Association, identified a few specific hardware components that could help data scientists do better big compute: graphic-processing units (GPUs) and random-access memory (RAM)."

rindeck

We couldn’t agree more that a focus on compute is missing from the discussion and design of most solutions. Systems now have the ability to generate, move, and store massive amounts of data, but preparing that data for intelligent use and creating competitive advantage requires increasingly more computation. For decades, high-performance computing has been successful only in isolated, narrowly focused arenas, but we’ve been driving an industry shift toward broadly applicable heterogeneous system architectures. We believe that the true nature of Big Compute involves applying the right resources to the right data at the right time for optimal system performance.

You point out that GPUs can be a great complement to conventional CPUs for the right class of problem. We’ve found that a variety of components work best across many types of problems. General-purpose CPUs, which “aren’t bad” at everything, typically handle problems with branching, random memory access, and complex business logic. GPUs are great at matrix and floating-point operations, or for scaling SIMD problems to thousands of threads. Imagine adding another tool to that mix, one custom-built for the bit/byte manipulations of cryptography, regular expressions, searching, and the like. I’m referring to FPGAs, which have seen widespread use in academic investigation but limited commercial application. VelociData and its related companies have been delivering solutions built on heterogeneous computing platforms for the past decade. These applications include biocomputation, government intelligence, Wall Street market data, and, more recently, enterprise Big Data: dealing with structured and unstructured data, addressing the rich variety of data sources, and handling the velocity of data at scale.
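As a rough illustration of that division of labor, consider this small NumPy sketch: the element-wise, branch-free arithmetic in the first half is the shape of work that GPUs scale to thousands of threads, while the branchy loop after it is the kind that stays on a general-purpose CPU.

```python
# Rough illustration (NumPy on a CPU; the same element-wise, branch-free
# shape is what GPUs scale to thousands of threads).
import numpy as np

a = np.random.rand(1_000_000).astype(np.float32)
b = np.random.rand(1_000_000).astype(np.float32)

# GPU-friendly: one identical floating-point operation per element,
# no branches, predictable memory access.
c = a * b + 0.5

# CPU-friendly: data-dependent branching and irregular control flow --
# the "complex business logic" case.
total = 0.0
for x in a[:1000]:
    if x > 0.5:
        total += x
    else:
        total -= x * 2

print(c[:3], total)
```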

Again, we wholeheartedly agree that Big Compute is an important topic and that Big Data isn’t useful unless you have the compute power to process it properly. VelociData embraces a heterogeneous compute architecture involving CPUs, GPUs, and FPGAs, and we’ve seen tremendous results: 10x, 100x, or even 1,000x the speed of CPU-only approaches. Thank you for keeping the topic active and reminding people that there are alternatives to simply throwing conventional compute resources at every problem.