Apache Spark: Ask Us Anything

Paco Nathan (O'Reilly Media), Aaron Davidson (Databricks), Sameer Farooqui (Databricks), Hossein Falaki (Databricks Inc.), Alex Sicoe (Elsevier), Olivier Girardot (Lateral Thoughts)
Hadoop & Beyond
Location: 127-128
Average rating: 4.00 (2 ratings)

Join the Spark Team for an informal question and answer session.

Paco Nathan

O'Reilly Media

Paco Nathan leads the Learning Group at O’Reilly Media. Known as a “player/coach” data scientist, Paco led innovative data teams building ML apps at scale for several years and more recently was an evangelist for Apache Spark, Apache Mesos, and Cascading. Paco has expertise in machine learning, distributed systems, functional programming, and cloud computing with 30+ years of tech-industry experience, ranging from Bell Labs to early-stage startups. Paco is an advisor for Amplify Partners and was cited in 2015 as one of the top 30 people in big data and analytics by Innovation Enterprise. He is the author of Just Enough Math, Intro to Apache Spark, and Enterprise Data Workflows with Cascading.

Aaron Davidson


Aaron Davidson is an Apache Spark committer and software engineer at Databricks. He has implemented Spark standalone cluster fault tolerance and shuffle file consolidation, and has helped in the design, implementation, and testing of Spark’s external sorting and driver fault tolerance.

Sameer Farooqui


Sameer Farooqui is a client services engineer at Databricks, where he works with customers on Apache Spark deployments. Sameer works with the Hadoop ecosystem, Cassandra, Couchbase, and the general NoSQL domain. Prior to Databricks, he worked as a freelance big data consultant and trainer globally and taught big data courses. Before that, Sameer was a systems architect at Hortonworks, an emerging data platforms consultant at Accenture R&D, and an enterprise consultant for Symantec/Veritas (specializing in VCS, VVR, and SF-HA).

Hossein Falaki

Databricks Inc.

Hossein Falaki is a software engineer at Databricks working on the next big thing. Prior to that he was a data scientist working on Apple’s personal assistant, Siri. He graduated with a Ph.D. in Computer Science from UCLA, where he was a member of the Center for Embedded Networked Sensing (CENS).

Alex Sicoe


Alex Sicoe recently joined Elsevier as a software developer within the company’s big data analytics platform team. Previously he worked as an engineer with Big Data Partnership, working with clients on projects involving Apache Spark, Apache Cassandra, Apache Storm, and Apache Hadoop. He has extensive experience building data pipelines involving such systems as well as giving training courses on them. He also worked at CERN on building a large-scale monitoring system for the ATLAS experiment on top of Apache Cassandra.

Olivier Girardot

Lateral Thoughts

Olivier Girardot is a software engineer and co-founder of Lateral Thoughts, working on machine learning, big data, and DevOps solutions with clients to help them tackle problems that require both expertise and experience, in order to become more efficient both as a company and as a team.

Comments on this page are now closed.


Paco Nathan
8-03-2015 6:58 CET

Hi Avusherla,

Best to direct these kinds of questions to the <user@spark.apache.org> email list, where there are many people who can join the discussion: http://spark.apache.org/community.html

Avusherla Bharath
8-03-2015 4:51 CET

I have a question regarding Spark. A few days back I tried Spark on my system and it is working fine. Now I want to install a Spark cluster on a Hadoop multinode cluster. Do I need to install Spark on each slave node where a Hadoop slave is present? How do I install it on a Hadoop multinode cluster?