Multi-Tenancy Myths

Although the term has been around Hadoop circles for a few years, 2014 saw its rise to prominence. And while it's fine for an organization mature in Hadoop to adopt multi-tenancy, the technology is still too immature to fire up in an organization new to Hadoop.

First of all, Hadoop is just like any other IT asset: you need dev, test, and prod systems. If you don't have at least these three Hadoop clusters, it's too early to think about multi-tenancy. And don't even think about trying to use multi-tenancy to shove dev, test, and prod onto a single physical cluster!

The need for separate environments is more pronounced in Hadoop, not less. Developers need to be able to experiment with all the new technologies, such as data streaming and in-memory computing, without having to worry about stomping on production or accessing PII (personally identifiable information).

Second of all, the multi-tenancy software, whether YARN or Mesos, is still new. To be sure, large, established Hadoop shops have put both into production. But YARN allocates only RAM and the number of CPU cores. Allocating disk and network is planned for the future, according to a Hortonworks blog post. Even within the current YARN realm of RAM and CPU cores, the default scheduler doesn't consider the two jointly.
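
To see why it matters that RAM and CPU cores are not considered jointly, here is a minimal Python sketch. It is a toy model, not YARN code: the Resource class, the two counting functions, and the node and container sizes are all made up for illustration. It simply compares how many identical containers a scheduler would place on one node when it checks RAM alone versus when it requires both RAM and cores to be available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource vector: RAM in MB plus a count of CPU cores."""
    memory_mb: int
    vcores: int


def containers_by_memory_only(node: Resource, request: Resource) -> int:
    """Count containers that fit when only RAM is compared."""
    return node.memory_mb // request.memory_mb


def containers_by_both(node: Resource, request: Resource) -> int:
    """Count containers that fit when RAM and cores must both be available."""
    return min(node.memory_mb // request.memory_mb,
               node.vcores // request.vcores)


# Assumed sizes: a 64 GB, 16-core node and a CPU-heavy 2 GB, 4-core container.
node = Resource(memory_mb=65536, vcores=16)
request = Resource(memory_mb=2048, vcores=4)

memory_only = containers_by_memory_only(node, request)
joint = containers_by_both(node, request)
cores_promised = memory_only * request.vcores

print(f"RAM-only placement: {memory_only} containers, "
      f"{cores_promised} cores promised on a {node.vcores}-core node")
print(f"Joint placement:    {joint} containers")
```

With these assumed numbers, the RAM-only view promises far more cores than the node has, while the joint view stops once the cores run out. That is the kind of gap a tenant sharing the cluster with CPU-heavy jobs would feel.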

And while major Hadoop distributions now incorporate YARN (probably because the standard Apache distribution of Hadoop includes it), none incorporates Mesos, let alone Aurora. The 2010 Mesos had the same limitations as YARN, managing just RAM and the number of CPU cores. Aurora, which builds on Mesos, adds CPU time slices and disk space as resources.

If your organization is having trouble justifying the capital expense of a Hadoop cluster, multi-tenancy is not a way to get it to pencil out, no matter how nice that might look in an Excel spreadsheet. Instead, multi-tenancy is for an organization that already has multiple sets of Hadoop clusters running proven software (two or more sets of dev, test, and prod) and is looking to consolidate them.