Schedule: Nerdcore sessions

Location: Thames Suite Level: Intermediate
Ben Smith (InReach Ventures)
Average rating: ***..
(3.00, 1 rating)
A practical step-by-step description of how the LAMP-based Top10 Alpha was turned into a fully data-driven product. Built around a real-time data processing pipeline and an asynchronous stack, Top10's infrastructure now hinges on Akka, along with Scala, Node.js and a host of other technologies. This has enabled interesting uses of the data and new, exciting user-facing features.
Location: Thames Suite Level: Intermediate
Noel Welsh (Underscore Consulting)
Average rating: ***..
(3.57, 7 ratings)
Big data often doesn't sit well with companies that want to move fast. Technologies like Hadoop can be expensive to set up, slow to produce results, and time-consuming to maintain. Streaming algorithms provide an alternative. They are simple to implement, very efficient, and give real-time results. In this talk I will describe several key streaming algorithms and give examples of their use.
Location: Thames Suite Level: Intermediate
Tags: 20min
Edmund Jackson (Cambridge Data Science)
Average rating: *****
(5.00, 2 ratings)
Data science projects are difficult to realise because they require both mathematical and IT abstractions at once: databases, linear algebra, message queues and more. Traditional environments like Java/C#/Matlab/Mathematica provide only one of the two. I will talk about how the new language, Clojure, provides all the platform power of the JVM, as well as the language and libraries to do data science.
Location: Thames Suite Level: Intermediate
Average rating: ****.
(4.50, 2 ratings)
Logic programming has recently gained new interest among people processing large data volumes with Hadoop. This talk demonstrates the basic concepts using Cascalog.
Location: Thames Suite Level: Intermediate
Amund Tveit (Atbrox)
Average rating: **...
(2.60, 5 ratings)
This presentation will give an overview of MapReduce-based algorithms described in recent papers by academic and industrial researchers. Areas covered include AI/machine learning, bioinformatics and information retrieval. The focus will be on patterns of problems and the corresponding MapReduce solution patterns.
Location: Thames Suite Level: Advanced
Paolo Castagna (Cloudera)
Average rating: ****.
(4.00, 1 rating)
As open data and linked data communities grow, so do the number and average size of freely available datasets. Often these datasets are modelled and interlinked using RDF. This talk shares tips and tricks, use cases and practical examples of how to effectively use tools from the Hadoop ecosystem to process large RDF datasets.

