1. Download Spark
2. Install Spark on Mac
3. Install Spark on Windows
4. Run spark-shell. (On Windows, you may see Java exceptions related to the winutils binary; they can be ignored.)
5. Install Git
Vartika Singh and Jayant Shekhar walk you through techniques for building and tuning machine-learning apps with Spark MLlib and Spark ML Pipelines, and for graph processing with GraphX. Vartika and Jayant cover regression, classification, clustering, graph, and deep learning algorithms in the Spark MLlib, ML, and GraphX libraries. They also cover the nuances of feature extraction, parameter tuning, statistical analysis for optimization, and dimensionality reduction as they apply to these algorithms, and the use cases these techniques address. Through hands-on coding, you'll learn to solve a range of problems with these algorithms and techniques.
Vartika Singh is a solutions architect at Cloudera with over 12 years of experience applying machine learning techniques to big data problems.
Jayant Shekhar is the founder of Sparkflows Inc., which enables machine learning on large datasets using Spark ML and intelligent workflows. Jayant focuses on Spark, streaming, and machine learning and is a contributor to Spark. Previously, Jayant was a principal solutions architect at Cloudera, working with companies both large and small in various verticals on big data use cases, architecture, algorithms, and deployments. Prior to Cloudera, Jayant worked at Yahoo, where he was instrumental in building out the large-scale content/listings platform using Hadoop and big data technologies. Jayant also worked at eBay, where he built out a new shopping platform, K2, using Nutch, Hadoop, and other technologies, and at KLA-Tencor, where he built software for reticle inspection stations and defect analysis systems. Jayant holds a bachelor's degree in computer science from IIT Kharagpur and a master's degree in computer engineering from San Jose State University.
©2016, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.