michaelmalak's blog

Data Governance Begins With Data Acquisition

Like many others last night, I was stuck staring at server errors and blank browser screens while trying to purchase advance movie tickets to Star Wars Episode VII: The Force Awakens. There is a bit of irony here. I had no trouble purchasing tickets online in 1999 for Episode I, in 2002 for Episode II, or in 2005 for Episode III. But now that "buying movie tickets on the Internet" is a "thing", servers were overloaded.

39 Machine Learning Libraries for Spark, Categorized

Apache Spark itself

1. MLlib


Spark originally came out of Berkeley's AMPLab, and even today AMPLab projects, though not part of the Apache Software Foundation, enjoy a status a bit above that of your everyday GitHub project.

ML Base

Spark's own MLlib forms the bottom layer of the three-layer ML Base, with MLI as the middle layer and ML Optimizer as the most abstract layer.

Data Science Overtaking The Data Scientist

The data scientist is dead. Long live data science!

Well, not dead, but certainly dying. Until late 2012, Google search popularity for "data scientist" tracked that for "data science," but it has sagged since then.

Indeed.com job postings confirm this trend, though to a lesser degree.

Why is this? I can think of three possible reasons:

Google Car and The Fourth Bubble in the Data Science Venn Diagram

Last year, I blogged about "The Fourth Bubble in the Data Science Venn Diagram: Social Sciences." That post argued that ignoring the human aspect of data science leads to misdirected data science that is suboptimal or even harmful. The human aspect covers how people consume and interact with products and services, and conversely how data science affects people.