michaelmalak's blog

AWS Spark matches Edison supercomputer for Big Data HPC

I've been blogging about the overlap between HPC and Big Data for a couple of years now. I found out today that IDC has even given it an acronym: HPDA (High Performance Data Analytics). HPDA applications go beyond the simulation tasks traditionally associated with HPC and work with the prodigious amounts of data usually associated with business (click streams, social media interactions, etc.), volumes that are increasingly common in scientific domains as well (DNA sequencing, sensor data that is no longer thrown away, imaging, etc.).

Spark 1.4 for Data Scientists; Spark 1.5 & 1.6 for core improvements

The theme at Spark Summit 2015 this week can be boiled down to "Spark 1.4 is for data scientists". The highlight of Spark 1.4 is the "first new supported language in over a year": SparkR, originally an AMPLab project, is now part of the Apache Spark distribution. Another data science improvement is that Spark ML (which has the ML pipelines, and which may eventually replace Spark MLlib) is now out of alpha.