Source code and install instructions can be found here.
Jayant Shekhar, Vartika Singh, and Krishna Sankar explore techniques for building machine-learning apps using Spark ML as well as the principles of graph processing with Spark GraphX. Jayant, Vartika, and Krishna cover the various algorithms available in Spark ML—including those for doing basic statistics, classification and regression, collaborative filtering, clustering, dimensionality reduction, and frequent pattern mining, as well as streaming k-means clustering—and walk attendees through demos of the provided source code, solving various problems using these algorithms. They will also outline use cases for graph processing and offer an overview of programming with Spark GraphX followed by coding for different graph processing problems using GraphX.
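Streaming k-means, mentioned above, differs from batch k-means because it updates cluster centers as each mini-batch arrives. Spark's MLlib documentation describes a "forgetful" per-batch rule: new center = (c * n * a + x * m) / (n * a + m) and new weight = n * a + m. Here c is the old center, n is the number of points it has absorbed so far, x is the mean of the m new points assigned to it, and a is a decay factor in [0, 1]. A decay of 1 weights all history equally, and a decay of 0 keeps only the newest batch. The following is a minimal pure-Python sketch of that rule only. It is not Spark's API, the function names are hypothetical, and it omits other details of Spark's `StreamingKMeans` implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(centers: Sequence[Point], x: Point) -> int:
    """Index of the center closest to x (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(centers[i], x)),
    )


def streaming_kmeans_update(
    centers: Sequence[Point],
    weights: Sequence[float],
    batch: Sequence[Point],
    decay: float,
) -> Tuple[List[Point], List[float]]:
    """Apply one mini-batch of the forgetful streaming k-means update (hypothetical helper)."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k

    # Assign each point in the batch to its nearest current center.
    for x in batch:
        i = nearest(centers, x)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += x[d]

    new_centers: List[Point] = []
    new_weights: List[float] = []
    for i in range(k):
        n_old = weights[i] * decay  # discount the center's history
        m = counts[i]
        n_new = n_old + m
        if m == 0:
            # No new points: the center stays put, but its weight still decays.
            new_centers.append(tuple(centers[i]))
        else:
            batch_mean = [s / m for s in sums[i]]
            new_centers.append(
                tuple((c * n_old + xm * m) / n_new for c, xm in zip(centers[i], batch_mean))
            )
        new_weights.append(n_new)
    return new_centers, new_weights
```

For example, with centers (0, 0) and (10, 10), prior weights of 1 each, and a batch of two points at (1, 1), a decay of 1.0 moves the first center two-thirds of the way toward (1, 1), while a decay of 0.0 moves it all the way there. In both cases the second center does not move.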
How to apply Spark ML libraries for:
How to use GraphX to:
Jayant Shekhar is the founder of Sparkflows Inc., which enables machine learning on large datasets using Spark ML and intelligent workflows. Jayant focuses on Spark, streaming, and machine learning and is a contributor to Spark. Previously, Jayant was a principal solutions architect at Cloudera working with companies both large and small in various verticals on big data use cases, architecture, algorithms, and deployments. Prior to Cloudera, Jayant worked at Yahoo, where he was instrumental in building out the large-scale content/listings platform using Hadoop and big data technologies. Jayant also worked at eBay, building out a new shopping platform, K2, using Nutch and Hadoop among others, as well as at KLA-Tencor, building software for reticle inspection stations and defect analysis systems. Jayant holds a bachelor's degree in computer science from IIT Kharagpur and a master's degree in computer engineering from San Jose State University.
Vartika Singh is a field data science architect at Cloudera. Previously, Vartika was a data scientist applying machine learning algorithms to real-world use cases ranging from clickstream to image processing. She has 12 years of experience designing and developing solutions and frameworks utilizing machine learning techniques.
Krishna Sankar is a Distinguished Engineer, Artificial Intelligence & Machine Learning, at U.S. Bank, focusing on augmented intelligence and digital humans as well as areas like AI explainability. Earlier stints include senior data scientist at Volvo Cars, chief data scientist at blackarrow.tv, data scientist at Tata America International, director of data science at a bioinformatics startup, and Distinguished Engineer at Cisco. He has spoken at various conferences, including ML tutorials at Strata San Jose and London 2016, Spark Summit [goo.gl/ab30lD], Strata-Sparkcamp, OSCON, PyCon, and PyData. He writes about Nash equilibrium, Isaac Asimov, and robot rules [goo.gl/5yyRv6] and has been a guest lecturer at the Naval Postgraduate School. His occasional blogs can be found at https://medium.com/@ksankar.
His blog posts include "NeurIPS 2018: Conference Summary" [https://goo.gl/VgeyDT], "Deep Thinking by Garry Kasparov: The Education of a Machine" [https://goo.gl/9qv671], and "Ask not if AlphaZero can beat humans in Go; ask if AlphaZero can teach humans to be a Go champion" [https://goo.gl/vPzN9B]. His other passions are semantic Go engines, flying drones (he is working toward an FAA UAS drone pilot license), and Lego robotics; you will find him at the Detroit FLL World Competition as a robot design judge.
©2016, O'Reilly UK Ltd • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.