Some experience coding in Python, SQL, Java, or Scala, plus some familiarity with Big Data issues/concepts.
What’s required for a laptop to use in the tutorial?
NB: do not install Spark with Homebrew or Cygwin
We will provide USB sticks with the necessary data+code. To save time, if people participating in the tutorial want to download in advance, the USB contents are here
Also, please see the Apache Spark developer certification exam being held at Strata on Fri Feb 20: http://www.oreilly.com/go/sparkcert
Spark Camp, organized by the creators of the Apache Spark project at Databricks, will be a day-long, hands-on introduction to the Spark platform including Spark Core, the Spark Shell, Spark Streaming, Spark SQL, MLlib, GraphX, and more. We will start with an overview of use cases and demonstrate writing simple Spark applications. We will cover each of the main components of the Spark stack via a series of technical talks targeted at developers who are new to Spark. Intermixed with the talks will be periods of hands-on lab work. Attendees will download and use Spark on their own laptops, as well as learn how to configure and deploy Spark in distributed big data environments including common Hadoop distributions and Mesos.
Developer Certification for Apache Spark
O’Reilly has partnered with Databricks, creators of Spark, to offer the Developer Certification for Apache Spark. The next Spark certification exam takes place at Strata + Hadoop World in San Jose on Friday, February 20. Learn more.
O’Reilly author (Just Enough Math and Enterprise Data Workflows with Cascading) and a “player/coach” who’s led innovative Data teams building large-scale apps. Director of Community Evangelism for Apache Spark with Databricks, advisor to Amplify Partners. Expert in machine learning, cluster computing, and Enterprise use cases for Big Data. Interests: Spark, Ag+Data, Open Data, Mesos, PMML, Cascalog, Scalding, Clojure, Python, Chatbots, NLP.
Holden Karau is a software development engineer at Databricks and is active in open source. She is the author of a book on Spark and has assisted with Spark workshops. Prior to Databricks she worked on a variety of search and classification problems at Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science.
Krishna Sankar is a consulting data scientist working on retail analytics, social media data science, and forays into deep learning, as well as codeveloping the DeepLearnR package interfacing R over TensorFlow/Skflow. Previously, Krishna was a chief data scientist at Blackarrow.tv, where he focused on optimizing user experience via inference, intelligence, and interfaces. Earlier stints include principal architect/data scientist at Tata America Intl., director of data science at a bioinformatics startup, and distinguished engineer at Cisco. He is a frequent speaker at conferences, including Spark Summit, Spark Camp, OSCON, PyCon, and PyData, on topics such as predicting NFL winners, Spark, data science, machine learning, and social media analysis, as well as a guest lecturer at the Naval Postgraduate School. Krishna’s occasional blogs can be found at Doubleclix.wordpress.com. His other passion is Lego robotics. You will find him at the St. Louis First Lego League World Competition as a robot design judge.
Consulting professor at Stanford within ICME, conducting research and teaching courses targeting doctorate students. Technical Advisor at Databricks. I focus on Discrete Applied Mathematics, Machine Learning Theory and Applications, and Large-Scale Distributed Computing.
Denny Lee is a Principal Program Manager at Microsoft. He is a hands-on distributed systems and data sciences engineer with more than 15 years of experience developing internet-scale infrastructure, data platforms, and distributed systems for both on-premises and cloud. His work focuses on solving complex large-scale data problems, providing not only architectural direction but also the hands-on implementation of these systems.
He has extensive experience in building greenfield teams as well as acting as a turnaround and change catalyst. Prior to joining Azure DocumentDB, Denny worked as a Technology Evangelist at Databricks, Senior Director of Data Sciences Engineering at Concur, and was part of the incubation team that built Hadoop on Windows and Azure (currently known as Microsoft HDInsight).
Chris Fregly is a Research Scientist at PipelineIO – a Machine Learning and Artificial Intelligence Startup in San Francisco.
Chris is an Apache Spark Contributor, Netflix Open Source Committer, Founder of the Advanced Spark and TensorFlow Meetup, and Author of the upcoming O’Reilly Video Series and Online Training, “High Performance Distributed Tensorflow in Production: Hands-on Experience Training and Serving Tensorflow AI Models.”
Previously, Chris was a Distributed Systems Engineer at Netflix, Data Solutions Engineer at Databricks, and a Founding Member of the IBM Spark Technology Center in San Francisco.
©2015, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.