michaelmalak's blog

Table of XX2Vec Algorithms

XX2Vec      Embed              In         Sup/Unsup      Algorithms used
Char2Vec    Character          Sentence   Unsupervised   CNN -> LSTM
Word2Vec    Word               Sentence   Unsupervised   ANN
GloVe       Word               Sentence   Unsupervised   SGD
Doc2Vec     Paragraph Vector   Document   Supervised     ANN -> Logistic Regression
Image2Vec   Image Elements     Image      Unsupervised   DNN
Video2Vec   Video Elements     Video      Supervised     CNN -> MLP

The powerful word2vec algorithm has inspired a host of other algorithms, listed in the table above. (For a description of word2vec, see my Spark Summit 2015 presentation.) word2vec is a convenient way to assign vectors to words, and of course vectors are the currency of machine learning. Once you've vectorized your data, you are free to apply any number of machine learning algorithms.
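To make "vectors are the currency" concrete, here is a minimal sketch in plain Python of the kind of thing you can do once words have vectors: a nearest-neighbor lookup by cosine similarity. The three-dimensional vectors are made up for illustration (real word2vec vectors are learned from a corpus and typically have hundreds of dimensions), and the helper names cosine_similarity and most_similar are hypothetical.

```python
import math

# Made-up toy vectors, NOT output from a trained word2vec model.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word):
    """Return the other word whose vector points most nearly the same way."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )


print(most_similar("king"))
```

With these toy numbers, "king" lands closer to "queen" than to "apple". The same vectors could just as easily be fed to a clustering or classification algorithm.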

Spark Streaming 1.6: Stop Using updateStateByKey()

Last night, Tathagata Das resolved SPARK-11290, "Implement trackStateByKey for improved state management", which will bring a 7x performance improvement to Spark Streaming when Spark 1.6 is released in December 2015.
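For readers who haven't used it, here is a rough plain-Python model of how updateStateByKey() behaves on each batch. It is a sketch of the semantics only, not Spark code: the names update_state_by_key and running_count are hypothetical, and a dict stands in for the distributed state.

```python
# Plain-Python model of updateStateByKey() semantics; this is not the Spark API.
def update_state_by_key(state, batch, update_func):
    """Run update_func for every key in the old state or in the new batch."""
    # Group this batch's values by key.
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)

    new_state = {}
    calls = 0
    # Every known key is visited, even keys with no new data in this batch.
    for key in set(state) | set(grouped):
        calls += 1
        result = update_func(grouped.get(key, []), state.get(key))
        if result is not None:  # returning None drops the key from the state
            new_state[key] = result
    return new_state, calls


def running_count(new_values, old_count):
    """Update function: add this batch's values to the previous count."""
    return (old_count or 0) + sum(new_values)


state = {}
state, calls_1 = update_state_by_key(
    state, [("spark", 1), ("flink", 1), ("spark", 1)], running_count
)
state, calls_2 = update_state_by_key(state, [("spark", 1)], running_count)
print(state, calls_1, calls_2)
```

In the second batch only "spark" has new data, yet the update function still runs for both keys. In this model the work per batch grows with the total number of tracked keys, not with the number of keys that actually changed.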

trackStateByKey() offers three benefits over updateStateByKey(), which has served as the workhorse of Spark Streaming since its inception in 2012: