Vision Zero is a global movement that aims to eliminate traffic fatalities and severe injuries. Erin Akred and Michael Dowd explore a partnership among Microsoft, a team of DataKind data scientists, government officials, and researchers that uses newly available datasets to inform cities’ efforts nationwide toward that goal.
Roy Ben-Alta explores the Amazon Kinesis platform in detail and discusses best practices for scaling your core streaming data ingestion pipeline as well as real-world customer use cases and design pattern integration with Amazon Elasticsearch, AWS Lambda, and Apache Spark.
Time series and event data form the basis for real-time insights into the performance of businesses in areas such as ecommerce, the IoT, and web services, but gaining these insights involves designing a learning system that scales to millions or even billions of data streams. Ira Cohen outlines a system that performs real-time machine learning and analytics on streams at massive scale.
The need to quickly acquire, process, prepare, store, and analyze data has never been greater. The need for performance crosses the big data ecosystem too—from the edge to the server to the analytics software, speed matters. Raghunath Nambiar shares a few use cases that have had significant organizational impact where performance was key.
We're likely just at the beginning of data science. The people and things that are starting to be equipped with sensors will enable entirely new classes of problems that will have to be approached more scientifically. Mike Stringer outlines some of the issues that may arise for business, for data scientists, and for society.
GE Oil & Gas is at the forefront of leveraging the Industrial Internet and advanced analytics to drive profitability and growth. Jolene Jeffries and Tara Prakriya explain how subject-matter experts are using advanced analytics and machine learning to directly contribute to the profitability of the business unit.
In the realm of predictive maintenance, the event of interest is an equipment failure. In real scenarios, this is usually a rare event. Unless the data collection has been taking place over a long period of time, the data will have very few of these events or, in the worst case, none at all. Danielle Dean and Shaheen Gauher discuss the various ways of building and evaluating models for such data.
Modern cars produce data. Lots of data. And Formula 1 cars produce more than their fair share. Ted Dunning presents a demo of how data streaming can be applied to the analytics problems posed by modern motorsports. Although he won't be bringing Formula 1 cars to the talk, Ted demonstrates a physics-based simulator to analyze realistic data from simulated cars.
Opportunities in the industrial world are expected to outpace consumer business cases. Time series data is growing exponentially as new machines get connected. Venkatesh Sivasubramanian and Luis Ramos explain how GE makes it faster and easier for systems to access (using a common layer) and perform analytics on a massive volume of time series data by combining Apache Apex, Spark, and Kudu.
Yaron Haviv explains how to design real-time IoT and FSI applications, leveraging Spark with advanced data frame acceleration. Yaron then presents a detailed, practical use case, diving deep into the architectural paradigm shift that makes the powerful processing of millions of events both efficient and simple to program.
Kentucky's Transportation Cabinet is integrating streaming data—crowdsourced from Waze, Twitter, weather reports, sensors, and snow truck status—to improve public safety, reduce congestion, and enhance operations. Vineet Kumar shares how the data is processed using GeoEvent Processor, ArcServer, SDE, and Hadoop.
Sridhar Alla and Kiran Muglurmath explain how real-time analytics on Comcast Xfinity set-top boxes (STBs) help drive several customer-facing and internal data-science-oriented applications. They also show how Comcast uses Kudu to fill the gaps in batch and real-time storage and computation, allowing it to process high-speed data without the elaborate solutions previously required.
Radish Lab teamed up with science news nonprofit Climate Central to transform temperature data from 1,001 US cities into a compelling, simple interactive that received more than 1 million views within three days of launch. Alana Range and Brian Kahn offer an overview of the process of creating a viral, interactive data visualization with teams that regularly produce powerful data stories.
Smart data allows fire services to better protect the people they serve and keep their firefighters safe. The combination of open and nonpublic data used in a smart way generates new insights both in preparation and operations. Bart van Leeuwen discusses how the fire service is benefiting from open standards and best practices.
Moty Fania shares Intel’s IT experience implementing an on-premises IoT platform for internal use cases. The platform was designed as a multitenant platform with built-in analytical capabilities and based on open source big data technologies and containers. Moty highlights the lessons learned from this journey with a thorough review of the platform’s architecture.
Reiner Kappenberger explores the new standards and innovations that enable architects and developers to take a “build it in” approach to security in the early design phases of big data and IoT systems. He explains why emerging technologies such as format-preserving encryption are rapidly delivering more trusted big data and IoT ecosystems without altering application behavior or device functionality.
With the advent of smart grid technology, the quantity of data collected by electrical utilities has increased by 3–5 orders of magnitude. To make full use of this data, utilities must expand their analytical capabilities and develop new analytical techniques. Kim Montgomery discusses some ways that big data tools are advancing the practice of preventative maintenance in the utility industry.