H2O is an open source platform for machine learning and big data/big math. In this technical talk, Cliff Click looks inside H2O, focusing on its single-system-image design. Cliff explains how you can write simple, single-threaded Java code and have H2O automatically parallelize it and scale it out to hundreds of nodes and thousands of cores.
H2O is clustering: from your laptop to hundreds of nodes, you get a single system image, allowing easy aggregation of all the memory and all the cores and a simple coding style that scales wide at in-memory speeds. H2O is easily 1,000x faster than disk-based clustering solutions and often 10x faster than best-of-breed alternative in-memory solutions.
H2O is big data: we ingest a wide variety of formats, in parallel and distributed across the cluster, and store the data column-compressed, often 2x to 4x smaller than gzip on disk.
H2O is big math: we do scale-out math at memory-bandwidth speeds (on compressed data!), making terabyte-scale munging an interactive experience.
H2O is machine learning: on this big data, big math platform we have best-of-breed implementations of effective and popular machine-learning algorithms, including deep learning (neural nets), GBM, random forest, GLM, K-means, PCA, and naive Bayes, with all the features you need to do real data science built in. Finally, H2O interacts directly with Python, R, Scala, Spark, REST/JSON, and a JS-based web browser, making it the most interconnected machine-learning platform out there.
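As a rough illustration of the coding style described above, here is a minimal sketch in plain Java. It does not use the H2O API: the names `MapReduceSketch`, `Chunk`, and `SumTask` and the one-byte offset encoding are assumptions made for this example, and a parallel stream on one machine stands in for the cluster-wide scheduling the talk describes. The point is only that the user-written `map` and `reduce` bodies are ordinary single-threaded loops that read compressed column data directly.

```java
import java.util.Arrays;

class MapReduceSketch {
    // Hypothetical compressed column chunk: each value is stored as a
    // one-byte unsigned offset from a per-chunk base value.
    static final class Chunk {
        final long base;
        final byte[] offsets;
        Chunk(long base, byte[] offsets) { this.base = base; this.offsets = offsets; }
        int len() { return offsets.length; }
        // Decode a single element on the fly; nothing is inflated in bulk.
        long at(int i) { return base + (offsets[i] & 0xFF); }
    }

    // User-written task: plain single-threaded code, no locks or threads.
    static final class SumTask {
        long sum;
        void map(Chunk c) {
            for (int i = 0; i < c.len(); i++) sum += c.at(i);
        }
        void reduce(SumTask other) { sum += other.sum; }
    }

    // Split a column into chunks and compress each one independently.
    static Chunk[] compress(long[] values, int chunkSize) {
        int n = (values.length + chunkSize - 1) / chunkSize;
        Chunk[] chunks = new Chunk[n];
        for (int c = 0; c < n; c++) {
            int start = c * chunkSize;
            int end = Math.min(start + chunkSize, values.length);
            long base = Long.MAX_VALUE;
            for (int i = start; i < end; i++) base = Math.min(base, values[i]);
            byte[] offsets = new byte[end - start];
            for (int i = start; i < end; i++) {
                long d = values[i] - base;
                if (d < 0 || d > 255)
                    throw new IllegalArgumentException("chunk range too wide for 1-byte offsets");
                offsets[i - start] = (byte) d;
            }
            chunks[c] = new Chunk(base, offsets);
        }
        return chunks;
    }

    // Stand-in for the framework: run map on every chunk in parallel,
    // then merge the partial results pairwise with reduce.
    static long runSum(Chunk[] chunks) {
        return Arrays.stream(chunks).parallel()
            .map(c -> { SumTask t = new SumTask(); t.map(c); return t; })
            .reduce(new SumTask(), (a, b) -> {
                SumTask merged = new SumTask();
                merged.reduce(a);
                merged.reduce(b);
                return merged;
            })
            .sum;
    }

    public static void main(String[] args) {
        long[] values = new long[1000];
        for (int i = 0; i < values.length; i++) values[i] = 1_000_000L + (i % 200);
        Chunk[] chunks = compress(values, 128);
        System.out.println("sum = " + runSum(chunks));
    }
}
```

In this sketch each chunk stores one byte per value instead of eight, and the sum is computed without decompressing the column first. A real system would choose among several encodings per chunk and distribute chunks across nodes; neither is modeled here.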
Cliff Click is the CTO and cofounder of 0xdata, a firm dedicated to creating a new way to think about web-scale math and real-time analytics. Cliff wrote his first compiler when he was 15 (Pascal to TRS Z-80), although his most famous compiler is the HotSpot Server Compiler (the Sea of Nodes IR). He helped Azul Systems build an 864-core pure-Java mainframe that keeps GC pauses on 500 GB heaps to under 10 ms and worked on all aspects of that JVM. Previously, Cliff worked on HotSpot at Sun Microsystems. He is at least partially responsible for bringing Java into the mainstream. He is the author of about 15 patents, has published many papers about HotSpot technology, and is regularly invited to speak at industry and academic conferences. Cliff holds a PhD in computer science from Rice University.