Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

The innards of H2O

Cliff Click (0xdata)
11:15–11:55 Thursday, 2 June 2016
Data innovations
Location: Capital Suite 14
Level: Advanced
Average rating: 3.67 (3 ratings)

Prerequisite knowledge

Attendees should know Java and have some parallel or distributed computing experience.

Description

H2O is an open source platform for machine learning and big data/big math. Cliff Click offers a technical talk on the insides of H2O, focusing on its single-system-image design. Cliff explains how you can write simple, single-threaded Java code and have H2O automatically parallelize it and scale it out to hundreds of nodes and thousands of cores.
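The programming model described above, writing plain single-threaded code and letting the framework parallelize it, can be sketched on a single machine with the JDK's fork/join framework. This is a hypothetical illustration, not H2O's API: the names `ChunkedSum`, `mapChunk`, `reduce`, and `parallelSum` are made up for this example, and real H2O also distributes work across cluster nodes, which this sketch does not attempt. The user writes only `mapChunk` (a sequential loop over one chunk) and `reduce` (combine two partial results); the `Task` class plays the framework's role of splitting the data and running chunks on all cores.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ChunkedSum {

    // User code, part 1 (hypothetical): a plain sequential loop over one chunk.
    static double mapChunk(double[] data, int from, int to) {
        double sum = 0;
        for (int i = from; i < to; i++) sum += data[i];
        return sum;
    }

    // User code, part 2 (hypothetical): combine two partial results.
    static double reduce(double a, double b) {
        return a + b;
    }

    // "Framework" code: split the index range until it is chunk-sized, run
    // mapChunk on the pieces in parallel, and merge the results with reduce.
    static final class Task extends RecursiveTask<Double> {
        private final double[] data;
        private final int from, to, chunkSize;

        Task(double[] data, int from, int to, int chunkSize) {
            this.data = data;
            this.from = from;
            this.to = to;
            this.chunkSize = chunkSize;
        }

        @Override
        protected Double compute() {
            if (to - from <= chunkSize) return mapChunk(data, from, to);
            int mid = (from + to) >>> 1;
            Task left = new Task(data, from, mid, chunkSize);
            Task right = new Task(data, mid, to, chunkSize);
            left.fork();                    // run the left half asynchronously
            double rightResult = right.compute();
            return reduce(left.join(), rightResult);
        }
    }

    static double parallelSum(double[] data, int chunkSize) {
        if (chunkSize < 1) throw new IllegalArgumentException("chunkSize must be >= 1");
        return ForkJoinPool.commonPool().invoke(new Task(data, 0, data.length, chunkSize));
    }

    public static void main(String[] args) {
        double[] data = new double[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i % 10;
        System.out.println(parallelSum(data, 4096)); // prints 4500000.0
    }
}
```

The split-and-combine structure only gives a correct answer because `reduce` is associative. With general floating-point data, a parallel sum can differ from a sequential one in the last few bits; the integer-valued data in `main` avoids that.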

H2O is clustering: from your laptop to hundreds of nodes, you get a single system image that aggregates all the memory and all the cores, along with a simple coding style that scales wide at in-memory speeds. H2O is easily 1,000x faster than disk-based clustering solutions and often 10x faster than best-of-breed in-memory alternatives.

H2O is big data: we ingest a wide variety of formats (in parallel and distributed across the cluster) and store the data column-compressed, often achieving 2x to 4x better compression than gzip on disk.
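Column compression of the kind mentioned above can be illustrated with a deliberately simple scheme: if every value in a column lies within 255 of the column's minimum, store the minimum once as a bias and each value as a single unsigned byte. This is an assumed, hypothetical encoding chosen for illustration (`ByteColumn`, `tryCompress`, `get`, and `sum` are not H2O names), and it shrinks a `long` column to roughly one-eighth of its size. The `sum` method also shows that an aggregate can be computed directly from the compressed bytes without materializing the decoded values.

```java
import java.util.Arrays;

public final class ByteColumn {
    private final long bias;     // smallest value in the column, stored once
    private final byte[] packed; // each value stored as (value - bias) in one unsigned byte

    private ByteColumn(long bias, byte[] packed) {
        this.bias = bias;
        this.packed = packed;
    }

    // Returns a compressed column, or null if the value range does not fit in one unsigned byte.
    public static ByteColumn tryCompress(long[] values) {
        if (values.length == 0) return new ByteColumn(0, new byte[0]);
        long min = Arrays.stream(values).min().getAsLong();
        long max = Arrays.stream(values).max().getAsLong();
        // max - min may wrap around, but read as an unsigned number it is the exact range.
        if (Long.compareUnsigned(max - min, 255) > 0) return null;
        byte[] packed = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            packed[i] = (byte) (values[i] - min); // offsets 128..255 are stored as negative bytes
        }
        return new ByteColumn(min, packed);
    }

    // Decode one value on demand; the mask undoes Java's signed byte.
    public long get(int i) {
        return bias + (packed[i] & 0xFF);
    }

    public int size() {
        return packed.length;
    }

    // Sum computed on the compressed form: n * bias + sum of unsigned offsets.
    public long sum() {
        long offsets = 0;
        for (byte b : packed) offsets += b & 0xFF;
        return bias * packed.length + offsets;
    }

    public static void main(String[] args) {
        ByteColumn col = ByteColumn.tryCompress(new long[] {1000, 1003, 1001, 1250, 1000});
        System.out.println(col.get(3)); // prints 1250
        System.out.println(col.sum());  // prints 5254
    }
}
```

A real system would choose among several encodings per block of data (and fall back to uncompressed storage); returning `null` here simply marks the case this one scheme cannot handle.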

H2O is big math: we do scale-out math at memory-bandwidth speeds (on compressed data!), making terabyte-scale munging an interactive experience.

H2O is machine learning: on this big data, big math platform, we have best-of-breed implementations of effective and popular machine-learning algorithms (e.g., deep learning with neural nets, GBM, random forest, GLM, K-means, PCA, naive Bayes, and more), with all the features you need for real data science built in. Finally, H2O interacts directly with Python, R, Scala, Spark, REST/JSON, and a JS-based web browser interface, making it the most interconnected machine-learning platform out there.


Cliff Click

0xdata

Cliff Click is the CTO and cofounder of 0xdata, a firm dedicated to creating a new way to think about web-scale math and real-time analytics. Cliff wrote his first compiler when he was 15 (Pascal for the Z80-based TRS-80), although his most famous compiler is the HotSpot Server Compiler (the Sea of Nodes IR). He helped Azul Systems build an 864-core pure-Java mainframe that keeps GC pauses on 500 GB heaps under 10 ms, and he worked on all aspects of that JVM. Previously, Cliff worked on HotSpot at Sun Microsystems. He is at least partially responsible for bringing Java into the mainstream. He is the author of about 15 patents, has published many papers about HotSpot technology, and is regularly invited to speak at industry and academic conferences. Cliff holds a PhD in computer science from Rice University.