Like many others last night, I was stuck staring at server errors and blank browser screens while trying to purchase advance movie tickets to Star Wars Episode VII: The Force Awakens. There is a bit of irony here. I had no trouble purchasing tickets online in 1999 for Episode I, in 2002 for Episode II, or in 2005 for Episode III. But now that "buying movie tickets on the Internet" is a "thing", servers were overloaded.
Apache Spark itself
Spark originally came out of Berkeley's AMPLab, and even today AMPLab projects, though not part of the Apache Software Foundation, enjoy a status a bit above your everyday GitHub project.
Spark's own MLlib forms the bottom layer of the three-layer MLbase, with MLI as the middle layer and ML Optimizer as the most abstract layer.
The data scientist is dead. Long live data science!
Well, not dead, but certainly dying. Up until late 2012, the Google search popularity for "data scientist" tracked that for "data science" but thereafter has sagged.
This trend is also confirmed, though to a lesser degree, in Indeed.com job postings:
Why is this? I can think of three possible reasons:
Spark 1.5 was released today. Of the 1,516 Jira tickets that comprise the 1.5 release, I have highlighted a few important ones below, broken down by major Spark component.
The first major phase of Project Tungsten (aside from a small portion that went into 1.4)
Last year, I blogged about The Fourth Bubble in the Data Science Venn Diagram: Social Sciences, where ignoring the human aspect of data science -- how people consume and interact with products and services and conversely how data science affects people -- leads to misdirected data science that is suboptimal or even possibly harmful.
Remember the "business rules" craze of the early 2000s? They were popular especially with mortgage lenders. An example ILOG JRules decision table for mortgage lending is shown above.