2015-05-26 Intro to Apache Ignite & Semi-supervised Learning

Register here.

Rock Bottom Restaurant & Brewery - Tuesday May 19, 2015 @ 6:00pm MDT

NOTE: If you are unable to attend in person, register anyway and we will email you a livestream link 2 hours before the event.

Location: Rock Bottom Brewery - 16th Street Mall #100, Denver, CO 80265 - Map: https://goo.gl/maps/Pphtt

Agenda:

6:00 - 6:20 Schmooze - Food shall be served

6:20 - 6:30 Announcements

6:30 - 7:30 Intro to Apache Ignite: Distributed Framework for Unified In-memory Data Fabric by Nikita Ivanov

7:30 - 8:00 Extending Word2Vec for Performance and Semi-supervised Learning by Michael Malak

8:00 - 8:30 Networking

Intro to Apache Ignite: Distributed Framework for Unified In-memory Data Fabric - Abstract

An introduction to Apache Ignite™ (incubating), which is an open source, distributed framework for a unified In-Memory Data Fabric. Ignite provides a high-performance, distributed in-memory data management software layer that has been designed to operate between both new and existing data sources and applications, boosting application performance and scale by orders of magnitude. We will start with a summary of the technical drivers and market forces, and will cover popular and emerging use cases for in-memory computing, from financial industry trading platforms to mobile payment processing, online advertising, online/mobile gaming back-ends and more. We will then present some foundational concepts and terminology, and discuss the architecture, capabilities and benefits of the Ignite In-Memory Data Fabric in quite some detail.

Nikita Ivanov - Bio

Nikita Ivanov is founder and CTO of GridGain Systems, maker of the leading Java in-memory data fabric, and a PPMC member of the Apache Ignite™ (incubating) project. Nikita has over 20 years of experience in software application development, building HPC and middleware platforms and contributing to the efforts of other startups and notable companies including Adaptec, Visa and BEA Systems.

Extending Word2Vec for Performance and Semi-supervised Learning - Abstract

MLlib Word2Vec is an unsupervised learning technique that generates feature vectors that can then be clustered. The weakness of unsupervised learning is that although it can say an apple is close to a banana, it can’t put the label of “fruit” on that group. We show how MLlib Word2Vec can be combined with the human-created data of YAGO2 (which is derived from the crowd-sourced Wikipedia metadata), along with the NLP metrics Levenshtein and Jaccard, to properly label categories. Although YAGO2 is a graph, as an alternative to GraphX we make use of Ankur Dave’s powerful IndexedRDD, which is slated for inclusion in Spark 1.3 or 1.4. IndexedRDD is also used in a second way: to further parallelize MLlib Word2Vec. The use case is labeling columns of unlabeled data uploaded to the Oracle Data Enrichment Cloud Service (ODECS) cloud app, which processes big data in the cloud.
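For readers unfamiliar with the two metrics named in the abstract, here is a minimal sketch in plain Python. It is not the speaker's implementation and uses neither Spark nor YAGO2. The cluster, the category table, and the helper best_label are hypothetical, made up purely for illustration. It assumes Jaccard similarity is used to compare a cluster of words against the members of a candidate category, and that Levenshtein distance is used to tolerate spelling variants.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: fewest single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def jaccard(x: set, y: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def best_label(cluster: set, categories: dict) -> str:
    """Hypothetical helper: pick the category whose members overlap the cluster most."""
    return max(categories, key=lambda name: jaccard(cluster, categories[name]))


# Made-up data standing in for a word cluster and a human-curated category table.
cluster = {"apple", "banana", "cherry"}
categories = {
    "fruit": {"apple", "banana", "cherry", "mango"},
    "company": {"apple", "oracle"},
}
label = best_label(cluster, categories)
print(label, levenshtein("bananna", "banana"))
```

In this toy data the cluster shares three of four words with "fruit" and one of four with "company", so the overlap score favors "fruit". A small edit distance such as the one between "bananna" and "banana" could be used to treat a misspelling as the same word before the sets are compared.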

Michael Malak - Bio

Michael Malak has been implementing Spark solutions for two Fortune 200 companies since early 2013. He is currently at Oracle in Colorado on a team developing a Spark-based Big Data cloud app. He has an M.S. in Math from George Mason University. His book Spark GraphX In Action is due to be published later in 2015.

Date: Tuesday, May 19, 2015 - 6:00pm to 8:30pm