Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY
 
1 E 07/1 E 08
11:20am Parallel SQL and analytics with Solr Yonik Seeley (Cloudera)
1:15pm JupyterLab: The evolution of the Jupyter Notebook Brian Granger (Cal Poly San Luis Obispo), Sylvain Corlay (QuantStack), Jason Grout (Bloomberg LP)
2:55pm The future of column-oriented data processing with Arrow and Parquet Julien Le Dem (WeWork), Jacques Nadeau (Dremio)
1 E 10/1 E11
11:20am Creating and evaluating a distance measure Melissa Santos (Big Cartel)
1:15pm Holographic data visualizations: Welcome to the real world Brad Sarsfield (Microsoft HoloLens)
2:05pm A unified ecosystem for market data visualization Janaki Parameswaran (FINRA), Kishore Ramachandran (FINRA)
1 E 12/1 E 13
11:20am Powering real-time analytics on Xfinity using Kudu Sridhar Alla (BlueWhale), Kiran Muglurmath (Comcast)
2:55pm Triggers in Apache Beam (incubating) Kenneth Knowles (Google)
5:25pm Fast cars, big data: How streaming data can help Formula 1 Ted Dunning (MapR, now part of HPE)
1 E 15/1 E 16
1:15pm Where's the puck headed? Michael Dauber (Amplify Partners), Shivon Zilis (Bloomberg Beta), Sarah Guo (Greylock), Matt Witheiler, Sam Pullara (Sutter Hill Ventures)
2:05pm The insight-driven business Brian Hopkins (Forrester Research)
2:55pm A data-first approach to drive real-time applications Jack Norris (MapR Technologies)
5:25pm What Crimean War gunboats teach us about the need for schema registries Alexander Dean (Snowplow Analytics Ltd)
3D 12
11:20am Using graph databases to operationalize insights from big data Emil Eifrem (Neo Technology), Tim Williamson (Monsanto)
1:15pm Big data in healthcare Taposh Roy (Kaiser Permanente), Rajiv Synghal (Kaiser Permanente), Sabrina Dahlgren (Kaiser Permanente)
2:55pm Architecting for change: LinkedIn's new data ecosystem Shirshanka Das (LinkedIn), Yael Garten (LinkedIn)
4:35pm Big data architectural patterns and best practices on AWS Siva Raghupathy (Amazon Web Services)
5:25pm BI and SQL analytics with Hadoop in the cloud Henry Robinson (Cloudera), Justin Erickson (Cloudera)
River Pavilion
11:20am Adventures from the frontlines of data for good JeanCarlo Bonilla (DataKind), Susan Sun (DataKind), Caitlin Augustin (DataKind)
1:15pm Building data lakes in the cloud Alex Bordei (Bigstep)
2:05pm Big data, big decisions: Key legal considerations for the collection and use of big data Kristi Wolff (Kelley Drye & Warren LLP), Crystal Skelton (Kelley Drye & Warren LLP)
4:35pm The personalization spectrum Sara Watson (Digital Asia Hub)
3D 10
11:20am File format benchmark: Avro, JSON, ORC, and Parquet Owen O'Malley (Cloudera)
1:15pm Tuning Impala: The top five performance optimizations for the best BI and SQL analytics on Hadoop Marcel Kornacker (Cloudera), Mostafa Mokhtar (Cloudera)
2:05pm Elastic data services on Mesos via Mesosphere’s DC/OS Adam Bordelon (Mesosphere), Mohit Soni (Mesosphere)
2:55pm Debunking HDFS erasure coding performance myths Zhe Zhang (LinkedIn), Uma Maheswara Rao G (Intel)
4:35pm Using parallel graph-processing libraries for cancer genomics Crystal Valentine (MapR Technologies)
5:25pm Unlocking unstructured text data with summarization Mike Lee Williams (Cloudera Fast Forward Labs)
3D 08
2:05pm Creating real-time, data-centric applications with Impala and Kudu Marcel Kornacker (Cloudera), Todd Lipcon (Cloudera)
2:55pm How a Spark-based feature store can accelerate big data adoption in financial services Kaushik Deka (Novantas), Phil Jarymiszyn (Novantas)
Hall 1C
11:20am Why should I trust you? Explaining the predictions of machine-learning models Carlos Guestrin (Apple | University of Washington)
2:55pm How the Washington Post uses machine learning to predict article popularity Eui-Hong Han (The Washington Post), Shuguang Wang (The Washington Post)
4:35pm Conditional recurrent neural nets, generative AI Twitter bots, and DL4J Josh Patterson (Patterson Consulting), Dave Kale (Skymind)
Hall 1B
11:20am The state of Spark and what's next after Spark 2.0 Ram Sriharsha (Databricks)
1:15pm Top five mistakes when writing Spark applications Ted Malaska (Capital One), Mark Grover (Lyft)
2:05pm Tuning Spark machine-learning workloads Raj Krishnamurthy (IBM)
2:55pm Big data processing with Hadoop and Spark, the Uber way Praveen Murugesan (Uber Technologies Inc)
4:35pm Delivering near real-time mobility insights at Swisscom Francois Garillot (Swisscom)
1 C04 / 1 C05
2:05pm Citi, Standard Chartered Bank, and Polaris: The modern information pipeline that fuels investigations of money laundering, fraud, and human trafficking Shankar Ganapathy (Paxata), Mark Nelson (Standard Chartered Bank), Veronica Liwak (Polaris)
2:55pm Data science for executives Jeremy Achin (DataRobot), Tom de Godoy (DataRobot)
4:35pm Beyond the numbers: Expanding the size of your analytic discovery team Edd Wilder-James (Google), Maksim Pecherskiy (City of San Diego), Robert Stratton (Neustar), Chris Kakkanatt (Pfizer)
5:25pm Making real-time analytics on the data lake a reality Amit Vij (Kinetica), Mark Brooks (Kinetica DB, Inc.)
1 E 09
11:20am The keys to an event-based microservices application Crystal Valentine (MapR Technologies)
1:15pm Five ways to modernize your BI tools and make them work on more data Scott Anderson (ClearStory Data), Andrew Yeung (ClearStory Data)
2:05pm Building a modern data architecture Ben Sharma (Zaloni)
5:25pm Why is this disruption different from all other disruptions? Hadoop as a game changer in financial services Matt Turck (FirstMark Capital), Einat Burshtine (Credit Suisse), Shui Cheung Yip (Pershing LLC (Bank of New York Mellon)), Alasdair Anderson (Nordea)
1 E 14
11:20am Big data meets the IoT Cheryl Wiebe (Think Big, a Teradata Company)
2:05pm Trusted IoT and big data ecosystems Reiner Kappenberger (HPE Security–Data Security)
1B 01/02
2:05pm Filling the data lake Chuck Yarbrough (Pentaho)
1B 03/04
1:15pm A new “Sparkitecture” for modernizing your data warehouse Jack Gudenkauf (Hewlett Packard Enterprise)
2:05pm Top data wrangling use cases in enterprise analytics Connor Carreras (Trifacta)
2:55pm From data to insights using analytics Johan Bjerke (Splunk Inc)
4:35pm Gaining extreme agility and performance using a Spark-free approach to data management Jake Dolezal (McKnight Consulting Group Global Services)
7:05pm Dinner | Room: On Your Own
Javits North
8:50am Wednesday keynotes Roger Magoulas (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
9:00am The new dynamics of big data Mike Olson (Cloudera)
9:15am Decision 2016: What is your data platform? Jack Norris (MapR Technologies)
9:25am US venture: Risk, values, founder outcomes Susan Woodward (Sand Hill Econometrics)
9:50am The art and science of serendipity Pagan Kennedy (Inventology: How We Dream Up Things That Change the World)
10:05am Modern analytics with Dell EMC Patricia Florissi (Dell EMC)
10:10am Transforming healthcare through precision data science: Myths and facts Sriram Vishwanath (Accordion Health Inc. | University of Texas, Austin)
10:20am The trouble with polls Jill Lepore (Harvard University | The New Yorker)
10:50am Morning Break sponsored by Intel | Room: Hall 3E
3:35pm Afternoon Break sponsored by Teradata | Room: Hall 3E
6:05pm Booth Crawl | Room: Hall 3E
7:30am Coffee Break sponsored by Basho | Room: Break
12:00pm Lunch sponsored by MapR Wednesday BoF Tables | Room: Hall 3 A/B
6:30am Data Dash | Room: Hudson River Park, Pier 84
8:00pm Data After Dark: Aboard the City at Sea (sponsored by Cisco, Cloudera, and ThoughtSpot) | Room: Intrepid Sea, Air and Space Museum, Pier 86, W 46th St & 12th Ave, New York, NY 10036
11:20am-12:00pm (40m) Data innovations
Parallel SQL and analytics with Solr
Yonik Seeley (Cloudera)
Yonik Seeley explores recent Apache Solr features in the areas of faceting and analytics, including parallel SQL, streaming expressions, distributed join, and distributed graph queries, as well as the trade-offs of different approaches and strategies for maximizing scalability.
1:15pm-1:55pm (40m) Data innovations
JupyterLab: The evolution of the Jupyter Notebook
Brian Granger (Cal Poly San Luis Obispo), Sylvain Corlay (QuantStack), Jason Grout (Bloomberg LP)
Brian Granger, Sylvain Corlay, and Jason Grout offer an overview of JupyterLab, the next-generation user interface for Project Jupyter, which places Jupyter Notebooks in an environment where the building blocks of interactive computing can be assembled to support a wide range of data science workflows.
2:05pm-2:45pm (40m) Data innovations
Designing a location intelligence platform for everyone by integrating data, analysis, and cartography
Stuart Lynn (CartoDB), Andy Eschbacher (CARTO)
Geospatial analysis can provide deep insights into many datasets. Unfortunately, the key tools for unlocking these insights (geospatial statistics, machine learning, and meaningful cartography) remain inaccessible to nontechnical audiences. Stuart Lynn and Andy Eschbacher explore the design challenges in making these tools accessible and integrating them into an intuitive location intelligence platform.
2:55pm-3:35pm (40m) Data innovations
The future of column-oriented data processing with Arrow and Parquet
Julien Le Dem (WeWork), Jacques Nadeau (Dremio)
In pursuit of speed, big data is evolving toward columnar execution. The solid foundation laid by Arrow and Parquet for a shared columnar representation across the ecosystem promises a great future. Julien Le Dem and Jacques Nadeau discuss the future of columnar and the hardware trends it takes advantage of, like RDMA, SSDs, and nonvolatile memory.
4:35pm-5:15pm (40m) Data innovations
Beyond Hadoop at Yahoo: Interactive analytics with Druid
Himanshu Gupta (Yahoo)
Himanshu Gupta explains why Yahoo has been increasingly investing in interactive analytics and how it leverages Druid to power a variety of internal- and external-facing data applications.
5:25pm-6:05pm (40m) Data innovations
The Netflix data platform: Now and in the future
Kurt Brown (Netflix)
The Netflix data platform is constantly evolving, but fundamentally it's an all-cloud platform at a massive scale (40+ PB and over 700 billion new events per day) focused on empowering developers. Kurt Brown dives into the current technology landscape at Netflix and offers some thoughts on what the future holds.
11:20am-12:00pm (40m) Data-driven business
Creating and evaluating a distance measure
Melissa Santos (Big Cartel)
Whether we're talking about spam emails, merging records, or investigating clusters, there are many times when having a measure of how alike things are makes them easier to work with (e.g., with unstructured data that isn't incorporated into your data models). Melissa Santos offers a practical approach to creating a distance metric and validating with business owners that it provides value.
1:15pm-1:55pm (40m) Visualization & user experience
Holographic data visualizations: Welcome to the real world
Brad Sarsfield (Microsoft HoloLens)
Data visualizations using interactive holograms help us make smarter decisions and explore ideas faster by inspecting every vantage point of our data and interacting with it in new, more personal and human ways. There are new rules for the new world. Join Brad Sarsfield as he explores and experiments with the possibilities of the next generation of data visualization experiences.
2:05pm-2:45pm (40m) Enterprise adoption
A unified ecosystem for market data visualization
Janaki Parameswaran (FINRA), Kishore Ramachandran (FINRA)
FINRA ingests over 50 billion records of stock market trading data daily into multipetabyte databases. Janaki Parameswaran and Kishore Ramachandran explain how FINRA technology integrates data feeds from disparate systems to provide analytics and visuals for regulating equities, options, and fixed-income markets.
2:55pm-3:35pm (40m) Visualization & user experience
The devil is in the details: Interactive, multiscale visualization of data lineage
Sean Kandel (Trifacta)
Traditional ways of visualizing data lineage provide a static mapping of source datasets to various targets or outputs. As the breadth of analysis occurring in schema-on-read environments increases, tracking how elements of the data were derived is critical. Sean Kandel introduces a new way to visualize data lineage that gives stakeholders a transparent view into their data.
4:35pm-5:15pm (40m) Visualization & user experience
What ties to what? Visualizing large-scale customer text data with bipartite graphs
Mark Turner (Teradata)
Which suppliers are most likely to have delivery or quality issues? Does service, product placement, or price make the biggest difference in customer sentiment? Text data from sources like email and social media can give answers. Mark Turner explains how to see the associations between any two variables in text data by combining text analytics and the bipartite graph visualization technique.
5:25pm-6:05pm (40m) Visualization & user experience
Investigating event graphs at scale: Going from theory to practice
Leo Meyerovich (Graphistry)
Visual analysis is changing in the era of GPU clusters. Now that compute at scale is easier, the bottleneck is mapping data to visualizations and intelligently interacting with them. Using datasets uploaded to Graphistry, Leo Meyerovich provides a glimpse into the emerging workflows for graph and linked event analysis and offers common tricks for success.
11:20am-12:00pm (40m) IoT & real-time
Powering real-time analytics on Xfinity using Kudu
Sridhar Alla (BlueWhale), Kiran Muglurmath (Comcast)
Sridhar Alla and Kiran Muglurmath explain how real-time analytics on Comcast Xfinity set-top boxes (STBs) help drive several customer-facing and internal data-science-oriented applications. They also describe how Comcast uses Kudu to fill the gaps in batch and real-time storage and computation needs, allowing it to process high-speed data without the elaborate solutions needed until now.
1:15pm-1:55pm (40m) IoT & real-time
Apache Kafka: The rise of real-time data and stream processing
Neha Narkhede (Confluent)
Neha Narkhede explains how Apache Kafka serves as a foundation for streaming data applications that consume and process real-time data streams and introduces Kafka Connect, a system for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library. Neha also describes the lessons companies like LinkedIn learned building massive streaming data architectures.
2:05pm-2:45pm (40m) IoT & real-time
Watermarks: Time and progress in Apache Beam (incubating) and beyond
Slava Chernyak (Google)
Watermarks are a system for measuring progress and completeness in out-of-order streaming systems and are utilized to emit correct results in a timely manner. Given the trend toward out-of-order processing in existing streaming systems, watermarks are an increasingly important tool when designing streaming pipelines. Slava Chernyak explains watermarks and explores real-world applications.
2:55pm-3:35pm (40m) IoT & real-time
Triggers in Apache Beam (incubating)
Kenneth Knowles (Google)
Triggers specify when a stage of computation should emit output. With a small language of primitive conditions, triggers provide the flexibility to tailor a streaming pipeline to a variety of use cases and data sources. Kenneth Knowles delves into the details of language- and runner-independent semantics for triggers in Apache Beam and explores real-world implementations in Google Cloud Dataflow.
4:35pm-5:15pm (40m) IoT & real-time
Analytics for large-scale time series and event data
Ira Cohen (Anodot)
Time series and event data form the basis for real-time insights about the performance of businesses such as ecommerce, the IoT, and web services, but gaining these insights involves designing a learning system that scales to millions and billions of data streams. Ira Cohen outlines a system that performs real-time machine learning and analytics on streams at massive scale.
5:25pm-6:05pm (40m) IoT & real-time
Fast cars, big data: How streaming data can help Formula 1
Ted Dunning (MapR, now part of HPE)
Modern cars produce data. Lots of data. And Formula 1 cars produce more than their fair share. Ted Dunning presents a demo of how data streaming can be applied to the analytics problems posed by modern motorsports. Although he won't be bringing Formula 1 cars to the talk, Ted demonstrates a physics-based simulator to analyze realistic data from simulated cars.
11:20am-12:00pm (40m) Data-driven business
Making on-demand grocery delivery profitable with data science
Jeremy Stanley (Instacart)
Fifteen years ago, Webvan spectacularly failed to bring grocery delivery online. Speculation has been high that the current wave of on-demand grocery delivery startups will meet similar fates. Jeremy Stanley explains why this time the story will be different—data science is the key.
1:15pm-1:55pm (40m) Data-driven business
Where's the puck headed?
Michael Dauber (Amplify Partners), Shivon Zilis (Bloomberg Beta), Sarah Guo (Greylock), Matt Witheiler, Sam Pullara (Sutter Hill Ventures)
In a panel discussion, top-tier VCs look over the horizon and consider the big trends in big data, explaining what they think the field will look like a few years (or more) down the road. Join us to hear about the trends that everyone is seeing and areas for investment that they find exciting.
2:05pm-2:45pm (40m) Data-driven business
The insight-driven business
Brian Hopkins (Forrester Research)
Uber, Netflix, LinkedIn, Tesla, Stitch Fix, Earnest: the list of digital disruptors using data to steal customers grows every month. But is it just that these firms are data driven? Is it because they have smart data scientists and Hadoop? The secret to their success is that these firms go further in order to be insight driven. Brian Hopkins explains what they're doing and how to join them.
2:55pm-3:35pm (40m) Data-driven business
A data-first approach to drive real-time applications
Jack Norris (MapR Technologies)
Leading companies that are getting the most out of their data are not focusing on queries and data lakes; they are actively integrating analytics into their operations. Jack Norris reviews three customer case studies in ad/media, financial services, and healthcare to show how a focus on real-time data streams can transform the development, deployment, and future agility of applications.
4:35pm-5:15pm (40m) Data-driven business
Winning with data: How ThredUp, Twilio, and Warby Parker use data to build advantage
Daniel Mintz (Looker)
Daniel Mintz dives into case studies from three companies—ThredUp, Twilio, and Warby Parker—that use data to generate sustainable competitive advantages in their industries.
5:25pm-6:05pm (40m) Data-driven business
What Crimean War gunboats teach us about the need for schema registries
Alexander Dean (Snowplow Analytics Ltd)
In 1853, Britain’s workshops built 90 new gunboats for the Royal Navy in just 90 days—an astonishing feat of engineering made possible by industrial standardization. Snowplow's Alexander Dean argues that data-sophisticated corporations need a new standardization of their own, in the form of schema registries like Confluent Schema Registry or Snowplow’s own Iglu.
11:20am-12:00pm (40m) Enterprise adoption
Using graph databases to operationalize insights from big data
Emil Eifrem (Neo Technology), Tim Williamson (Monsanto)
Tim Williamson and Emil Eifrem explain how organizations can use graph databases to operationalize insights from big data, drawing on the real-life example of Monsanto’s use of graph databases to conduct real-time graph analysis of the company’s data to transform the business in ways that were previously impossible.
1:15pm-1:55pm (40m) Enterprise adoption
Big data in healthcare
Taposh Roy (Kaiser Permanente), Rajiv Synghal (Kaiser Permanente), Sabrina Dahlgren (Kaiser Permanente)
While other industries have embraced the digital era, healthcare is still playing catch-up. Kaiser Permanente has been a leader in healthcare technology and first started using computing to improve healthcare results in the 1960s. Taposh Roy, Rajiv Synghal, and Sabrina Dahlgren offer an overview of Kaiser’s big data strategy and explain how other organizations can adopt similar strategies.
2:05pm-2:45pm (40m) Enterprise adoption
Swipe, dip, and hover: Managing card payment data at Visa
Nandu Jayakumar (Oracle)
Visa, the world’s largest electronic payments network, is transforming the way it manages data: database appliances are giving way to Hadoop and HBase; proprietary ETL technologies are being replaced by Spark; and enterprise warehouse data models will be complemented by flexible data schemas. Nandu Jayakumar explores the adoption of big data practices at a conservative financial enterprise.
2:55pm-3:35pm (40m) Data-driven business
Architecting for change: LinkedIn's new data ecosystem
Shirshanka Das (LinkedIn), Yael Garten (LinkedIn)
Shirshanka Das and Yael Garten describe how LinkedIn redesigned its data analytics ecosystem in the face of a significant product rewrite, covering the infrastructure changes, such as client-side activity tracking, a unified reporting platform, and data virtualization techniques to simplify migration, that enable LinkedIn to roll out future product innovations with minimal downstream impact.
4:35pm-5:15pm (40m) Data innovations
Big data architectural patterns and best practices on AWS
Siva Raghupathy (Amazon Web Services)
Siva Raghupathy demonstrates how to use Hadoop innovations in conjunction with Amazon Web Services (cloud) innovations.
5:25pm-6:05pm (40m) Enterprise adoption
BI and SQL analytics with Hadoop in the cloud
Henry Robinson (Cloudera), Justin Erickson (Cloudera)
Henry Robinson and Justin Erickson explain how to best take advantage of the flexibility and cost-effectiveness of the cloud with your BI and SQL analytic workloads using Apache Hadoop and Apache Impala (incubating), covering the architectural considerations, best practices, tuning, and functionality available when deploying or migrating BI and SQL analytic workloads to the cloud.
11:20am-12:00pm (40m)
Adventures from the frontlines of data for good
JeanCarlo Bonilla (DataKind), Susan Sun (DataKind), Caitlin Augustin (DataKind)
JeanCarlo Bonilla, Susan Sun, and Caitlin Augustin explore how DataKind volunteer teams navigate the road to social impact by automating evidence collection for conservationists and helping expand the reach of mobile surveys so that more voices can be heard.
1:15pm-1:55pm (40m) Enterprise adoption
Building data lakes in the cloud
Alex Bordei (Bigstep)
Alex Bordei walks you through the steps required to build a data lake in the cloud and connect it to on-premises environments, covering best practices in architecting cloud data lakes and key aspects such as performance, security, data lineage, and data maintenance. The technologies presented range from basic HDFS storage to real-time processing with Spark Streaming.
2:05pm-2:45pm (40m) Law, ethics, governance
Big data, big decisions: Key legal considerations for the collection and use of big data
Kristi Wolff (Kelley Drye & Warren LLP), Crystal Skelton (Kelley Drye & Warren LLP)
Companies making data-driven decisions must consider critical legal obligations that may apply to the collection and use of data. Failing to do so has landed many tech stars and startups in hot legal water. Attorneys Kristi Wolff and Crystal Skelton discuss privacy, data security, and other legal considerations for using data across several industry types.
2:55pm-3:35pm (40m) Law, ethics, governance
Thinking outside the black box: The imperative for accountability and transparency in predictive analytics
Brett Goldstein (University of Chicago)
How can we usher in a future of data-driven decision making that is characterized by more—not less—accountability and accessibility? Brett Goldstein discusses the imperative to couple new developments in data science with a renewed commitment to transparency and open source—with a particular focus on open source models to optimize deployment of policing resources.
4:35pm-5:15pm (40m) Law, ethics, governance
The personalization spectrum
Sara Watson (Digital Asia Hub)
How are users meant to interpret the influence of big data and personalization in their targeted experiences? What signals do we have to show us how our data is used, how it improves or constrains our experience? Sara Watson explains that in order to develop normative opinions to shape policy and practice, users need means to guide their experience—the personalization spectrum.
5:25pm-6:05pm (40m) IoT & real-time
Pulsar: Real-time analytics at scale leveraging Kafka, Kylin, and Druid
Tony Ng (WeWork)
Enterprises are increasingly demanding real-time analytics and insights. Tony Ng offers an overview of Pulsar, an open source real-time streaming system used at eBay. Tony explains how Pulsar integrates Kafka, Kylin, and Druid to provide flexibility and scalability in event and metrics consumption.
11:20am-12:00pm (40m) Data innovations
File format benchmark: Avro, JSON, ORC, and Parquet
Owen O'Malley (Cloudera)
Picking the best data format depends on what kind of data you have and how you plan to use it. Owen O'Malley outlines the performance differences between formats in different use cases and offers an overview of the advantages and disadvantages of each to help you improve the performance of your applications.
1:15pm-1:55pm (40m) Hadoop internals & development
Tuning Impala: The top five performance optimizations for the best BI and SQL analytics on Hadoop
Marcel Kornacker (Cloudera), Mostafa Mokhtar (Cloudera)
Performance tuning your SQL-on-Hadoop deployment may seem overwhelming at times, especially for BI workloads that need interactive response times with high concurrency. Marcel Kornacker and Mostafa Mokhtar simplify the process and cover top performance optimizations for Apache Impala (incubating), from schema design and memory optimization to query tuning.
2:05pm-2:45pm (40m) Hadoop internals & development
Elastic data services on Mesos via Mesosphere’s DC/OS
Adam Bordelon (Mesosphere), Mohit Soni (Mesosphere)
Adam Bordelon and Mohit Soni demonstrate how projects like Apache Myriad (incubating) can install Hadoop on Mesosphere DC/OS alongside other data center-scale applications, enabling efficient resource sharing and isolation across a variety of distributed applications on the same cluster and thereby breaking down silos.
2:55pm-3:35pm (40m) Hadoop internals & development
Debunking HDFS erasure coding performance myths
Zhe Zhang (LinkedIn), Uma Maheswara Rao G (Intel)
The new erasure coding feature in Apache Hadoop (HDFS-EC) reduces the storage cost by ~50% compared with 3x replication. Zhe Zhang and Uma Maheswara Rao G present the first-ever performance study of HDFS-EC and share insights on when and how to use the feature.
4:35pm-5:15pm (40m) Data science & advanced analytics
Using parallel graph-processing libraries for cancer genomics
Crystal Valentine (MapR Technologies)
Crystal Valentine explains how the large graph-processing frameworks that run on Hadoop can be used to detect significantly mutated protein signaling pathways in cancer genomes through a probabilistic analysis of large protein-protein interaction networks, using techniques similar to those used in social network analysis algorithms.
5:25pm-6:05pm (40m) Data science & advanced analytics
Unlocking unstructured text data with summarization
Mike Lee Williams (Cloudera Fast Forward Labs)
Our ability to extract meaning from unstructured text data has not kept pace with our ability to produce and store it, but recent breakthroughs in recurrent neural networks are allowing us to make exciting progress in computer understanding of language. Building on these new ideas, Michael Williams explores three ways to summarize text and presents prototype products for each approach.
11:20am-12:00pm (40m) Hadoop use cases
How the largest US healthcare dataset in Hadoop enables patient-level analytics in near real time
Navdeep Alam (IMS Health)
The need to find efficiencies in healthcare is becoming paramount as our society and the global population continue to grow and live longer. Navdeep Alam shares his experience and reviews current and emerging technologies in the marketplace that handle working with unbounded, de-identified patient datasets in the billions of rows in an efficient and scalable way.
1:15pm-1:55pm (40m) Hadoop use cases
Planning your SQL-on-Hadoop cluster for a multiuser environment with heterogeneous and concurrent query workloads
Jun Liu (Intel), Zhaojuan Bian (Intel)
Many challenges exist in designing an SQL-on-Hadoop cluster for production in a multiuser environment with heterogeneous and concurrent query workloads. Jun Liu and Zhaojuan Bian draw on their personal experience to address these challenges, explaining how to determine the right size of your cluster with different combinations of hardware and software resources using a simulation-based approach.
2:05pm-2:45pm (40m) Hadoop use cases
Creating real-time, data-centric applications with Impala and Kudu
Marcel Kornacker (Cloudera), Todd Lipcon (Cloudera)
Todd Lipcon and Marcel Kornacker explain how to simplify Hadoop-based data-centric applications with the CRUD (create, read, update, and delete) and interactive analytic functionality of Apache Impala (incubating) and Apache Kudu (incubating).
2:55pm-3:35pm (40m) Hadoop use cases
How a Spark-based feature store can accelerate big data adoption in financial services
Kaushik Deka (Novantas), Phil Jarymiszyn (Novantas)
Kaushik Deka and Phil Jarymiszyn discuss the benefits of a Spark-based feature store, a library of reusable features that allows data scientists to solve business problems across the enterprise. Kaushik and Phil outline three challenges they faced—semantic data integration within a data lake, high-performance feature engineering, and metadata governance—and explain how they overcame them.
4:35pm-5:15pm (40m) Hadoop use cases
Zillow: Transforming real estate through big data and data science
Jasjeet Thind (Zillow)
Zillow pioneered providing access to unprecedented information about the housing market. Long gone are the days when you needed an agent to get comparables and prior sale and listing data. And with more data, data science has enabled more use cases. Jasjeet Thind explains how Zillow uses Spark and machine learning to transform real estate.
5:25pm-6:05pm (40m) Hadoop use cases
Hadoop and Spark at ING: An overview of the architecture, security, and business cases at a large international bank
Bas Geerdink (Aizonic)
Bas Geerdink offers an overview of the evolution that the Hadoop ecosystem has taken at ING. Since 2013, ING has invested heavily in a central data lake and data management practice. Bas shares historical lessons and best practices for enterprises that are incorporating Hadoop into their infrastructure landscape.
11:20am-12:00pm (40m) Data science & advanced analytics
Why should I trust you? Explaining the predictions of machine-learning models
Carlos Guestrin (Apple | University of Washington)
Despite widespread adoption, machine-learning models remain mostly black boxes, making it very difficult to understand the reasons behind a prediction. Such understanding is fundamentally important to assess trust in a model before we take actions based on a prediction or choose to deploy a new ML service. Carlos Guestrin offers a general approach for explaining predictions made by any ML model.
1:15pm-1:55pm (40m) Data science & advanced analytics
Data science at eHarmony: A generalized framework for personalization
Jonathan Morra (ZEFR)
Data science has always been a focus at eHarmony, but recently more business units have needed data-driven models. Jonathan Morra introduces Aloha, an open source project that allows the modeling group to quickly deploy type-safe accurate models to production, and explores how eHarmony creates models with Apache Spark and how it uses them.
2:05pm-2:45pm (40m) Data science & advanced analytics
Iterative supervised clustering: A dance between data scientists and machine learning
June Andrews (Wise / GE Digital)
Clustering algorithms produce vectors of information, which are almost surely difficult to interpret. These are then laboriously translated by data scientists into insights for influencing product and executive decisions. June Andrews offers an overview of a human-in-the-loop method used at Pinterest and LinkedIn that has led to fast, accurate, and pertinent human-readable insights.
2:55pm-3:35pm (40m) Data science & advanced analytics
How the Washington Post uses machine learning to predict article popularity
Eui-Hong Han (The Washington Post), Shuguang Wang (The Washington Post)
Predicting which stories will become popular is an invaluable tool for newsrooms. Eui-Hong Han and Shuguang Wang explain how the Washington Post predicts which stories on its site will be popular with readers, share the challenges they faced in developing the tool, and present metrics on how they refined it to increase accuracy.
4:35pm-5:15pm (40m) Data science & advanced analytics
Conditional recurrent neural nets, generative AI Twitter bots, and DL4J
Josh Patterson (Patterson Consulting), Dave Kale (Skymind)
Can machines be creative? Josh Patterson and David Kale offer a practical demonstration—an interactive Twitter bot that users can ping to receive a response dynamically generated by a conditional recurrent neural net implemented using DL4J—that suggests the answer may be yes.
5:25pm-6:05pm (40m) Data science & advanced analytics
Removing complexity from scalable machine learning
Martin Wicke (Google)
Much of the success of deep learning in recent years can be attributed to scale (bigger datasets and more computing power), but scale can quickly become a problem. Distributed, asynchronous computing in heterogeneous environments is complex, hard to debug, and hard to profile and optimize. Martin Wicke demonstrates how to automate or abstract away such complexity, using TensorFlow as an example.
11:20am-12:00pm (40m) Spark & beyond
The state of Spark and what's next after Spark 2.0
Ram Sriharsha (Databricks)
Ram Sriharsha reviews major developments in Apache Spark 2.0 and discusses future directions for the project to make Spark faster and easier to use for a wider array of workloads, with an emphasis on API evolution, single-node performance (Project Tungsten Phase 3), and Structured Streaming.
1:15pm-1:55pm (40m) Spark & beyond
Top five mistakes when writing Spark applications
Ted Malaska (Capital One), Mark Grover (Lyft)
Ted Malaska and Mark Grover cover the top five things that prevent Spark developers from getting the most out of their Spark clusters. When these issues are addressed, it is not uncommon to see the same job running 10x or 100x faster with the same clusters and the same data, using just a different approach.
2:05pm-2:45pm (40m) Spark & beyond
Tuning Spark machine-learning workloads
Raj Krishnamurthy (IBM)
Spark's efficiency and speed can help reduce the TCO of existing clusters, because its performance advantages allow it to complete processing in drastically shorter batch windows with higher performance per dollar. Raj Krishnamurthy offers a detailed walk-through of tuning an alternating least squares-based matrix factorization workload, improving runtimes by a factor of 2.22.
2:55pm-3:35pm (40m) Hadoop use cases
Big data processing with Hadoop and Spark, the Uber way
Praveen Murugesan (Uber Technologies Inc)
Praveen Murugesan explains how Uber leverages Hadoop and Spark as the cornerstones of its data infrastructure. Praveen details the current data architecture at Uber and outlines some of the unique challenges with data processing Uber faced as well as its approach to solving some key issues in order to continue to power Uber's real-time marketplace.
4:35pm-5:15pm (40m) Spark & beyond
Delivering near real-time mobility insights at Swisscom
François Garillot (Swisscom)
Swisscom, the leading mobile service provider in Switzerland, also provides data-driven intelligence through the analysis of its mobile network. Its Mobility Insights team works to help administrators understand the flow of people through their location of interest. François Garillot explores the platform, tooling, and choices that help achieve this service and some challenges the team has faced.
5:25pm-6:05pm (40m) Spark & beyond
Breaking Spark: The top five mistakes to avoid when using Apache Spark in production
Neelesh Salian (Stitch Fix)
Drawing on his experiences across 150+ production deployments, Neelesh Srinivas Salian focuses on five common issues observed in a cluster environment set up with Apache Spark (Core, Streaming, and SQL) to help you improve the usability and supportability of Apache Spark and avoid such issues in future deployments.
11:20am-12:00pm (40m) Sponsored
Create advanced analytic models with open source
Kyle Ambert (Intel)
Creating production-ready analytical pipelines can be a messy, error-prone undertaking. Kyle Ambert explores the Trusted Analytics Platform, an open source-based platform that enables data scientists to ask bigger questions of their data and carry out principled data science experiments—all while engaging in iterative, collaborative development of production solutions with application developers.
1:15pm-1:55pm (40m) Sponsored
Getting it right exactly once: Principles for streaming architectures
Darryl Smith (Dell)
Hear the Chief Data Platform Architect of Dell Technologies outline streaming principles.
2:05pm-2:45pm (40m) Sponsored
Citi, Standard Chartered Bank, and Polaris: The modern information pipeline that fuels investigations of money laundering, fraud, and human trafficking
Shankar Ganapathy (Paxata), Mark Nelson (Standard Chartered Bank), Veronica Liwak (Polaris)
Join data experts from Citi, Standard Chartered Bank, and Polaris for a panel discussion moderated by Shankar Ganapathy. Learn about the principles, technologies, and processes they have used to design a highly efficient information management pipeline architected around the Hadoop ecosystem.
2:55pm-3:35pm (40m) Sponsored
Data science for executives
Jeremy Achin (DataRobot), Tom de Godoy (DataRobot)
In today's world, executives need to be the drivers for data science solutions. Data analysis has moved from the domain of data scientists to the forefront of core strategic initiatives. Are you empowering your team to identify and execute on every opportunity to optimize business with machine learning? In this session, you will learn how executives are transforming business with machine learning.
4:35pm-5:15pm (40m) Sponsored
Beyond the numbers: Expanding the size of your analytic discovery team
Edd Wilder-James (Google), Maksim Pecherskiy (City of San Diego), Robert Stratton (Neustar), Chris Kakkanatt (Pfizer)
Analytic discovery is a team sport; the lone hero data scientist is a thing of the past. John Akred of Silicon Valley Data Science leads a panel of analytics and data experts from Pfizer, the City of San Diego, and Neustar that explores how these businesses were changed through analytic collaboration.
5:25pm-6:05pm (40m) Sponsored
Making real-time analytics on the data lake a reality
Amit Vij (Kinetica), Mark Brooks (Kinetica DB, Inc.)
Data lakes provide large-scale data processing and storage at low cost but struggle to deliver real-time analytics without investment in large clusters. If you need subsecond analytic response on streaming data, consider a GPU database. Amit Vij and Mark Brooks outline the dramatic performance benefits a GPU database offers and explain how to integrate it with Hadoop.
11:20am-12:00pm (40m) Sponsored
The keys to an event-based microservices application
Crystal Valentine (MapR Technologies)
Crystal Valentine draws on lessons learned from companies like Uber and Ericsson to outline the key principles for developing a microservices application. Along the way, Crystal describes how certain next-gen application areas, such as machine learning, are particularly well suited to implementation in a microservices architecture rather than a legacy application paradigm.
1:15pm-1:55pm (40m) Sponsored
Five ways to modernize your BI tools and make them work on more data
Scott Anderson (ClearStory Data), Andrew Yeung (ClearStory Data)
More data exists than ever before and in more disparate silos. Getting the insights you need, sifting through data, and answering new questions have all been complex, hairy tasks that only data jocks have been able to do. Andrew Yeung and Scott Anderson explore new ways to challenge the status quo and speed insights on diverse sources and demonstrate real customer use cases.
2:05pm-2:45pm (40m) Sponsored
Building a modern data architecture
Ben Sharma (Zaloni)
When building your data stack, the architecture could be your biggest challenge. Yet it could also be the best predictor for success. With so many elements to consider and no proven playbook, where do you begin to assemble best practices for a scalable data architecture? Ben Sharma offers lessons learned from the field to get you started.
2:55pm-3:35pm (40m) Sponsored
Unified integration for data lakes and modern data applications
Jonathan Gray (Cask)
Building, running, and governing a data lake on Hadoop is often a difficult process filled with slow development cycles and painful operations. Jonathan Gray proposes a modern, unified integration architecture that helps IT mitigate these issues while enabling businesses to reduce time to insights and make decisions faster through a modern self-service environment.
4:35pm-5:15pm (40m) Sponsored
Successful open data science on Hadoop: From sandbox to production
Peter Wang (Anaconda)
Although Python and R promise powerful data science insights, they can also be complex to manage and deploy with Hadoop infrastructure. Peter Wang distills the vast array of Hadoop and data science tools and architectures down to the essentials that deliver a powerful and lightweight stack quickly so that you can accelerate time to value while meeting your data science, governance, and IT needs.
5:25pm-6:05pm (40m) Sponsored
Why is this disruption different from all other disruptions? Hadoop as a game changer in financial services
Matt Turck (FirstMark Capital), Einat Burshtine (Credit Suisse), Shui Cheung yip (Pershing LLC (Bank of New York Mellon)), Alasdair Anderson (Nordea)
What's the point at which Hadoop tips from a Swiss-army knife of use cases to a new foundation that rearranges how the financial services marketplace turns data into profit and competitive advantage? This panel of expert practitioners looks into the near future to see if the inflection point is at hand.
11:20am-12:00pm (40m) Sponsored
Big data meets the IoT
Cheryl Wiebe (Think Big, a Teradata Company)
The IoT is fundamentally transforming industries and reconfiguring the technology landscape, but challenges exist for enterprises to effectively realize the value from this next wave of information and opportunity. Cheryl Wiebe explores how leading companies harness the IoT by putting IoT data in context, fostering collaboration between IT and OT and enabling a new breed of scalable analytics.
1:15pm-1:55pm (40m) Sponsored
Accelerating time to analytical value in the enterprise with data lake management
Viral Shah (Asurion Services)
Viral Shah explains how enterprises like Asurion Services are leveraging big data management solutions to accelerate enterprise data lake initiatives for business value.
2:05pm-2:45pm (40m) Sponsored
Trusted IoT and big data ecosystems
Reiner Kappenberger (HPE Security–Data Security)
Reiner Kappenberger explores the new standards and innovations enabling architects and developers to take a “build it in” approach to security in early design phases for big data and IoT systems, explaining why emerging technologies such as format-preserving encryption are rapidly delivering more trusted big data and IoT ecosystems without altering application behavior or device functionality.
2:55pm-3:35pm (40m) Sponsored
Turning petabytes of data into millions in cost savings for the world’s biggest retailers
Jonathon Whitton (PRGX USA)
Jonathon Whitton details how PRGX is using Talend and Cloudera to load two million annual client flat files into a Hadoop cluster and perform recovery audit services in order to help clients detect, find, and fix leakage in their procurement and payment processes.
4:35pm-5:15pm (40m) Sponsored
Virtualizing big data: Effective approaches derived from real-world deployments
Martin Yip (VMware)
The trend of deploying Hadoop on virtual infrastructure is rapidly increasing. Martin Yip explores the benefits of virtualizing Hadoop through the lens of three real-world examples. You'll leave with the confidence to deploy your Hadoop clusters using virtualization.
5:25pm-6:05pm (40m) Sponsored
From lake to reservoir: Harnessing big data’s power for the enterprise
Thomas Place (First Data)
Thomas Place explores the big data journey of the world’s biggest payment processor, which came dangerously close to building a data swamp before pivoting to embrace governance and quality-first patterns. This case study includes patterns, partners, successes, failures, and lessons learned to date and reviews the journey ahead.
11:20am-12:00pm (40m) Sponsored
Achieve richer insights and business outcomes with Dell EMC big data and analytics
Carey James (BlueTalon)
Big data and analytics is a team sport empowering companies of all kinds to achieve business outcomes faster and with greater levels of success. Carey James explains how the formation of Dell Technologies and Dell EMC can help you on your data analytics journey and how you can turn actionable insights into new business opportunities.
1:15pm-1:55pm (40m) Sponsored
The flux capacitor of machine learning: Turn data garbage into 1.21 gigawatt-powered acceleration
Ingo Mierswa (RapidMiner)
The flux capacitor was the core component that made time travel possible in Back to the Future, processing garbage as a power source. Did you know that you can achieve the same effect in machine learning? Ingo Mierswa demonstrates how you can power through your analytics faster than ever before using the knowledge of 250K data scientists.
2:05pm-2:45pm (40m) Sponsored
Filling the data lake
Chuck Yarbrough (Pentaho)
It’s hard to get data into a data lake. Organizations hand-code their way through this, but with hundreds of data sources, it soon becomes unmanageable. Chuck Yarbrough offers a solution that uses metadata to autogenerate ingestion processes. Teams can drive hundreds of Hadoop onboarding processes through just a few templates, reducing development time and risk.
2:55pm-3:35pm (40m) Sponsored
Enhancing the customer experience when driving Hadoop adoption
Anthony Dina (Dell EMC)
Mastercard's Nick Curcuru hosts an interactive fireside chat with Anthony Dina from Dell to explore how the flexibility, scalability, and agility of Hadoop big data solutions allow one of the world’s leading organizations to innovate, enable, and enhance the customer experience while still expanding emerging opportunities.
4:35pm-5:15pm (40m) Sponsored
Open source operations: Building on Apache Spark with InsightEdge, TensorFlow, Apache Zeppelin, and your own project
Antonio Rosales (Canonical)
Antonio Rosales offers an overview of Juju, an open source method to distill the best practices and operations needed to use interconnected big data solutions. Because Juju provides an open source means to describe services and solutions, users can focus on using the science, and developers can focus on delivering best practices.
5:25pm-6:05pm (40m) Sponsored
Changing the landscape with deep learning and accelerated analytics
Jim McHugh (NVIDIA)
Customers are looking to extend the benefits beyond big data with the power of the deep learning and accelerated analytics ecosystems. Jim McHugh explains how customers are leveraging deep learning and accelerated analytics to turn insights into AI-driven knowledge and covers the growing ecosystem of solutions and technologies that are delivering on this promise.
11:20am-12:00pm (40m) Sponsored
Future-proofing BI: An unexpected journey to leverage in-chip analytics in the IoT and AI
Guy Levy-Yurista, PhD (Sisense)
Guy Levy-Yurista explains the unexpected consequences of making big data processing significantly more agile than ever before and the impact it's having on human insight consumption.
1:15pm-1:55pm (40m) Sponsored
A new “Sparkitecture” for modernizing your data warehouse
Jack Gudenkauf (Hewlett Packard Enterprise)
Jack Gudenkauf explores how organizations have successfully deployed tiered hyperscale architecture for real-time streaming with Spark, Kafka, Hadoop, and Vertica and discusses how advancements in hardware technologies such as nonvolatile memory, SSDs, and accelerators are changing the role of big data and big analytics platforms in an overall enterprise-data-platform strategy.
2:05pm-2:45pm (40m) Sponsored
Top data wrangling use cases in enterprise analytics
Connor Carreras (Trifacta)
Connor Carreras offers an in-depth review of the most popular use cases for data wrangling solutions among enterprise organizations, drawing on real customer deployments to explain how data wrangling has enabled them to accelerate analysis and uncover new sources of business value.
2:55pm-3:35pm (40m) Sponsored
From data to insights using analytics
Johan Bjerke (Splunk Inc)
Machine data is growing at an exponential rate, and a key driver for this growth is the Internet of Things (IoT) revolution. Johan Bjerke explains how to find value in and make use of the unstructured machine data that plays an important role in the new connected world.
4:35pm-5:15pm (40m) Sponsored
Gaining extreme agility and performance using a Spark-free approach to data management
Jake Dolezal (McKnight Consulting Group Global Services)
Jake Dolezal shares research into the performance of data quality and data management workloads on Hadoop clusters. Jake discusses a YARN-based approach to data management and outlines highly effective IT resource utilization techniques to achieve extreme agility for organizations and performance gains in Hadoop.
5:25pm-6:05pm (40m) Sponsored
Data warehouse augmentation and modernization using Hadoop
Amar Arsikere (infoworks.io)
Current data warehouse technologies are increasingly challenged to handle the growth in data volume, new data types, and multiple analytics types. Hadoop has the potential to address these issues, but you need to solve several complexities before you can realize its full benefits. Amar Arsikere showcases the business and technical aspects of augmenting and modernizing data warehouses on Hadoop.
7:05pm-8:00pm (55m)
Break: Dinner
8:50am-9:00am (10m)
Wednesday keynotes
Roger Magoulas (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Strata + Hadoop World program chairs Roger Magoulas, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes.
9:00am-9:15am (15m)
The new dynamics of big data
Mike Olson (Cloudera)
Since their inception, big data solutions have been best known for their ability to master the complexity of the volume, variety, and velocity of data. But as we enter the era of data democratization, there’s a new set of concerns to consider. Mike Olson discusses the new dynamics of big data and how a renewed approach focused on where, who, and why can lead to cutting-edge solutions.
9:15am-9:25am (10m) Sponsored keynote
Decision 2016: What is your data platform?
Jack Norris (MapR Technologies)
During election season, we’re tasked with considering the next four years and comparing platforms across candidates. What’s good for the country is good for your data. Consider what the next four years will look like for your organization. How will you lower costs and deliver innovation? Jack Norris reviews the requirements for a winning data platform, such as speed, scale, and agility.
9:25am-9:40am (15m)
US venture: Risk, values, founder outcomes
Susan Woodward (Sand Hill Econometrics)
Susan Woodward discusses venture outcomes: what fraction make lots of money, what fraction just barely return capital, and what fraction fail completely. Susan uses updated figures on the fraction of entrepreneurs who succeed, including some interesting details on female founders of venture companies.
9:40am-9:45am (5m) Sponsored keynote
Collaboration and openness drive innovation in artificial intelligence
Martin Hall (Intel)
The power of artificial intelligence and advanced analytics emerges from the ability to analyze and compute large, disparate datasets from varied devices and locations, such as predictive medicine and automated cars, at lightning-fast speed. Martin Hall explains why collaboration and openness are the key elements driving innovation in AI.
9:45am-9:50am (5m) Sponsored keynote
Driving open source adoption within the enterprise
Ron Bodkin (Google)
There’s been much discussion on open source versus commercial; CIOs and CTOs are increasingly interested in solutions that blend the benefits of both worlds. Ron Bodkin explains how Teradata drives open source adoption inside enterprises through a range of initiatives: direct contributions to open source projects, building orchestration software, and providing technical expertise.
9:50am-10:05am (15m)
The art and science of serendipity
Pagan Kennedy (Inventology: How We Dream Up Things That Change the World)
How do we discover what we're not looking for? In the age of big data and bioinformatics, the answer is more relevant than ever. We develop new tools to help us spot clues in mountains of information, and yet, serendipity remains a very human art. Pagan Kennedy discusses the origins of the word serendipity and qualities of mind that lead to successful searches in the deep unknown.
10:05am-10:10am (5m) Sponsored keynote
Modern analytics with Dell EMC
Patricia Florissi (Dell EMC)
Data, your most precious commodity, is increasing at an alarming rate. At the same time, an emerging business imperative has made this data a component of your deepest insights, allowing you to focus on your business outcomes. Patricia Florissi explains why the recent formation of Dell EMC ensures that your analytics capabilities will be stronger than ever.
10:10am-10:20am (10m)
Transforming healthcare through precision data science: Myths and facts
Sriram Vishwanath (Accordion Health Inc. | University of Texas, Austin)
Healthcare, a $3 trillion industry, is ripe for disruption through data science. However, there are many challenges in the journey to make healthcare a truly transparent, consumer-centric, data-driven industry. Sriram Vishwanath shares some myths and facts about data science's impact on healthcare.
10:20am-10:40am (20m)
The trouble with polls
Jill Lepore (Harvard University | The New Yorker)
American politics is adrift in a sea of polls. This year, that sea is deeper than ever before—and darker. Data science is upending the public opinion industry. But to what end? In a brief, illustrated history of the field, Jill Lepore demonstrates how pollsters rose to prominence by claiming that measuring public opinion is good for democracy and asks, "But what if it’s bad?"
10:50am-11:20am (30m)
Break: Morning Break sponsored by Intel
3:35pm-4:35pm (1h)
Break: Afternoon Break sponsored by Teradata
6:05pm-7:05pm (1h) Event
Booth Crawl
Quench your thirst with vendor-hosted libations (and snacks) while you check out all the exhibitors in the Expo Hall.
7:30am-8:45am (1h 15m)
Break: Coffee Break sponsored by Basho
12:00pm-1:15pm (1h 15m) Event
Wednesday BoF Tables
Birds of a Feather (BoF) discussions are a great way to informally network with people in similar industries or interested in the same topics.
6:30am-7:30am (1h) Event
Data Dash
Please join Cloudera and O'Reilly Media for the Data Dash run/walk. Meet fellow data enthusiasts, find a new pal to run with, and enjoy the fresh air as you run along the river and catch the beautiful sunrise over New Jersey and New York City.
8:00pm-11:00pm (3h) Event
Data After Dark: Aboard the City at Sea (sponsored by Cisco, Cloudera, and ThoughtSpot)
Don’t miss Data After Dark, the social highlight of Strata + Hadoop World, happening at the Intrepid Sea, Air & Space Museum.