Schedule: Data sessions

Location: Sutton South Level: Intermediate
Tyler Bell (Factual), Leo Polovets (Factual)
Average rating: 4.00 (1 rating)
Factual creates canonical reference sets of 40 million entities from over 2.5 billion fragmentary inputs. This talk explains the Hadoop-based science of our approach combined with what we believe to be a necessary art -- the application of domain-specific knowledge -- in creating pragmatic data services.
Location: Sutton South Level: Non-technical
Average rating: 4.00 (1 rating)
You’ve collected a ton of data and your team is busily crunching numbers and coming to conclusions... but are they the right ones? You can only know with the right context, and you can’t get context while working in a silo. We invite you to bring the rest of the world into your data warehouse. Don’t worry, it’ll add more value than it takes, and instead of working on the data, you can work on your vision.
Location: Sutton South Level: Non-technical
Moderated by: Daniel Tunkelang (Various)
Andrew Hogue (Foursquare), Breck Baldwin (Alias-i), Evan Sandhaus (New York Times), Wlodek Zadrozny (IBM)
Average rating: 4.00 (1 rating)
Structured search improves the search experience by identifying entities and their relationships in documents and queries. This panel will explore the current state of structured and semi-structured search, as well as the open problems in an area that promises to revolutionize information seeking.
Location: Sutton South Level: Intermediate
Elizabeth Charnock (Cataphora)
Average rating: 3.50 (2 ratings)
Experts say that there is no such thing as clean data, yet every day critical decisions are made based on electronic data. Elizabeth Charnock, author of E-Habits, will discuss how to make decisions based on digital character rather than on individual bits or bytes.
Location: Sutton South Level: Intermediate
In this session we'll discuss strategies for building agile big-data clouds that make it much faster and easier for data scientists to discover, provision, and analyze data. We'll also discuss where and how new technologies (both vendor and OSS) fit into this model.
Location: Murray Hill Suite A Level: Intermediate
Scott Nicholson (Poynt)
Average rating: 4.33 (3 ratings)
Economists use a data analysis toolkit and intuition that can be very helpful to data scientists. In particular, econometric methods are quite useful in disentangling correlation and causation, a use case not well handled by standard machine learning and statistical techniques. This session will cover examples of econometric methods in action, as well as other economics-related insights.
Location: Sutton South Level: Intermediate
Paul Brown (Paradigm4 Inc.)
The scientific and commercial worlds share requirements for a high-performance informatics platform to support collection, curation, collaboration, exploration, and analysis of massive datasets. SciDB is an open source analytical database that provides better analytical performance than relational databases and supports key features such as provenance and versioning.
Location: Murray Hill Suite A
Monica Rogati (Data Natives)
How do data infrastructure, insights and products change when your user base grows by orders of magnitude?
Location: Sutton North Level: Intermediate
Chris van der Walt (United Nations Global Pulse), Dane Petersen (Adaptive Path), Sara Farmer (UN Global Pulse)
Average rating: 5.00 (1 rating)
United Nations Global Pulse and Adaptive Path have been collaborating on a new global crisis impact tool called HunchWorks that allows experts to post hypotheses about emerging crises and crowdsource verification. The presentation will focus on lessons learned from a complex project that combines human expertise and big data algorithms using human-centered design and assistive intelligence.
Location: Murray Hill Suite A
Peter Sirota (Amazon Web Services), Justin Moore (Facebook)
This session will address specific use cases relevant to customers with big data needs. We will highlight customers already successfully using this service, showcase top scenarios, and explain why it makes sense to leverage the cloud for big data needs.
Location: Sutton South Level: Intermediate
Ben Gimpert (Altos Research)
All big data models are wrong but some are useful, as George Box might have said. Models are not the end result of a big data architecture, but exploratory tools in their own right. They are most useful when data scientists try to understand the business, and when our users learn a bit about data. How can the actual process of modeling improve a big data system, and teach the organization?
Location: Murray Hill Suite A Level: Intermediate
Ryan Boyd (Neo4j), Chris Schalk (Google)
Average rating: 2.00 (1 rating)
Google is a data business: over the past few years, many of the tools Google created to store, query, analyze, and visualize its data have been exposed to developers as services. This talk will give you an overview of Google services for data crunchers.
Location: Sutton South Level: Non-technical
Justin Moore (Facebook)
With over 700 million check-ins, 10 million nodes in the social graph, and billions of cumulative signals, Justin Moore will explain how Foursquare processes, analyzes, and builds products to help people explore the real world.
Location: Sutton North Level: Intermediate
Dwight Merriman (10gen)
Average rating: 2.00 (1 rating)
This session will introduce the history and philosophy of MongoDB. We'll also review a few key use cases for NoSQL and MongoDB in particular.
Location: Sutton South Level: Non-technical
Ken Farmer (ProtectWise)
While most of the focus in data science is on the rapid analysis of vast volumes of data, the hardest part of most solutions is the data acquisition, movement, transformation, and loading - the "data logistics". This presentation will describe the common challenges and solutions - including the best and worst practices that can be reused from Data Warehousing.


