Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Spark Camp: Ask Us Anything

Paco Nathan (derwen.ai), Matei Zaharia (Databricks), michael dddd (Databricks), Reynold Xin (Databricks), Holden Karau (Google), Reza Zadeh (Matroid | Stanford), Sameer Farooqui (Databricks), Denny Guang-yeu Lee (Microsoft), Chris Fregly (PipelineAI)
2:20pm–3:00pm Friday, 02/20/2015
Ask Us Anything
Location: 211 B

Join the Spark team for an informal question and answer session. Several of the Spark committers, trainers, etc., from Databricks will be on hand to field a wide range of detailed questions. Even if you don’t have a specific question, join in to hear what others are asking.

Photo of Paco Nathan

Paco Nathan

derwen.ai

Paco Nathan is known as a “player/coach” with core expertise in data science, natural language processing, machine learning, and cloud computing. He has 35+ years of experience in the tech industry, at companies ranging from Bell Labs to early-stage startups. His recent roles include director of the Learning Group at O’Reilly and director of community evangelism at Databricks and Apache Spark. Paco is the cochair of Rev conference and an advisor for Amplify Partners, Deep Learning Analytics, Recognai, and Primer. He was named one of the "top 30 people in big data and analytics" in 2015 by Innovation Enterprise.

Photo of Matei Zaharia

Matei Zaharia

Databricks

Matei Zaharia started the Spark project at UC Berkeley and is currently CTO of Databricks. He serves as Spark’s vice president at Apache. In spring 2015, he is also beginning an assistant professor position at MIT.

Photo of michael dddd

michael dddd

Databricks

Michael Armbrust is the lead developer of the Spark SQL and Structured Streaming projects at Databricks. Michael’s interests broadly include distributed systems, large-scale structured storage, and query optimization. Michael holds a PhD from UC Berkeley, where his thesis focused on building systems that allow developers to rapidly build scalable interactive applications and specifically defined the notion of scale independence.

Photo of Reynold Xin

Reynold Xin

Databricks

Reynold Xin is a cofounder and chief architect at Databricks as well as an Apache Spark PMC member and release manager for Spark’s 2.0 release. Prior to Databricks, Reynold was pursuing a PhD at the UC Berkeley AMPLab, where he worked on large-scale data processing.

Photo of Holden Karau

Holden Karau

Google

Holden Karau is a transgender Canadian open source developer advocate at Google focusing on Apache Spark, Beam, and related big data tools. Previously, she worked at IBM, Alpine, Databricks, Google (yes, this is her second time), Foursquare, and Amazon. Holden is the coauthor of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She’s a committer on the Apache Spark, SystemML, and Mahout projects. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work, she enjoys playing with fire, riding scooters, and dancing.

Photo of Reza Zadeh

Reza Zadeh

Matroid | Stanford

Reza Bosagh Zadeh is founder and CEO at Matroid and an adjunct professor at Stanford University, where he teaches two PhD-level classes: Distributed Algorithms and Optimization and Discrete Mathematics and Algorithms. His work focuses on machine learning, distributed computing, and discrete applied mathematics. His awards include a KDD best paper award and the Gene Golub Outstanding Thesis Award. Reza has served on the technical advisory boards of Microsoft and Databricks. He is the initial creator of the linear algebra package in Apache Spark. Through Apache Spark, Reza’s work has been incorporated into industrial and academic cluster computing environments. Reza holds a PhD in computational mathematics from Stanford, where he worked under the supervision of Gunnar Carlsson. As part of his research, Reza built the machine learning algorithms behind Twitter’s who-to-follow system, the first product to use machine learning at Twitter.

Photo of Sameer Farooqui

Sameer Farooqui

Databricks

Sameer Farooqui is a client services engineer at Databricks, where he works with customers on Apache Spark deployments. Sameer works with the Hadoop ecosystem, Cassandra, Couchbase, and general NoSQL domain. Prior to Databricks, he worked as a freelance big data consultant and trainer globally and taught big data courses. Before that, Sameer was a systems architect at Hortonworks, an emerging data platforms consultant at Accenture R&D, and an enterprise consultant for Symantec/Veritas (specializing in VCS, VVR, and SF-HA).

Photo of Denny Guang-yeu Lee

Denny Guang-yeu Lee

Microsoft

Denny Lee is a Principal Program Manager at Microsoft. He is a hands-on distributed systems and data sciences engineer with more than 15 years of experience developing internet-scale infrastructure, data platforms, and distributed systems for both on-premises and cloud. His key focuses surround solving complex large scale data problems – providing not only architectural direction but the hands-on implementation of these systems.

He has extensive experience in building greenfield teams as well as turn around / change catalyst. Prior to joining Azure DocumentDB, Denny worked as a Technology Evangelist at Databricks, Senior Director of Data Sciences Engineering at Concur, and was part of the incubation team that built Hadoop on Windows and Azure (currently known as Microsoft HDInsight).

Photo of Chris Fregly

Chris Fregly

PipelineAI

Chris Fregly is founder and research engineer at PipelineAI, a San Francisco-based streaming machine learning and artificial intelligence startup. Previously, Chris was a distributed systems engineer at Netflix, a data solutions engineer at Databricks, and a founding member of the IBM Spark Technology Center in San Francisco. Chris is a regular speaker at conferences and meetups throughout the world. He’s also an Apache Spark contributor, a Netflix Open Source committer, founder of the Global Advanced Spark and TensorFlow meetup, author of the upcoming book Advanced Spark, and creator of the O’Reilly video series Deploying and Scaling Distributed TensorFlow in Production.