Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Deep learning and natural language processing with Spark

Andy Petrella (Kensu), Melanie Warrick (Google)
11:15–11:55 Thursday, 2 June 2016
Data science & advanced analytics
Location: Capital Suite 8/9
Level: Advanced
Average rating: 3.35 (17 ratings)

Prerequisite knowledge

Attendees should be proficient in probability, statistics, algebra, and programming, and familiar with distributed computing techniques.


Deep learning is taking data science by storm. Unfortunately, most existing solutions aren't particularly scalable. Andy Petrella and Melanie Warrick show how to implement a Spark-ready version of the long short-term memory (LSTM) neural network, which is widely used in the hardest natural language processing and understanding problems, such as automatic summarization, machine translation, question answering, and discourse. Andy and Melanie then demo an LSTM network with interactive, real-time visualizations using the Spark Notebook and Spark Streaming.


Andy Petrella


Andy Petrella is a mathematician turned distributed computing entrepreneur. Besides being a Scala/Spark trainer, Andy has participated in many projects built using Spark, Cassandra, and other distributed technologies in various fields, including geospatial analysis, the IoT, and automotive and smart city projects. Andy is the creator of the Spark Notebook, the only reactive and fully Scala notebook for Apache Spark. In 2015, Andy cofounded Data Fellas with Xavier Tordoir around their product, the Agile Data Science Toolkit, which facilitates the productization of data science projects and guarantees their maintainability and sustainability over time. Andy is also a member of the program committee for the O'Reilly Strata, Scala eXchange, Data Science eXchange, and Devoxx events.


Melanie Warrick


Melanie Warrick is a senior developer advocate at Google with a passion for machine learning problems at scale. Melanie’s previous experience includes work as a founding engineer on Deeplearning4j and as a data scientist and engineer at