Presented by O'Reilly and Cloudera
Make Data Work
July 12-13, 2017: Training
July 13-15, 2017: Tutorials & Conference
Beijing, China

Presented in English

09:00–12:30 Thursday, 2017-07-13
Location: Function Room 3 | Level: Beginner
Yufeng Guo (Google)
TensorFlow is a popular open source machine learning library that is especially well-suited for deep learning. Yufeng Guo introduces machine learning and deep learning with concrete examples, walking you through hands-on exercises using TensorFlow and TensorBoard.
09:00–12:30 Thursday, 2017-07-13
Location: Function Room 5B | Level: Beginner
Ted Malaska (Capital One)
Recent advances in distributed processing engines, from Spark to Impala to Spark Streaming and Storm, have been exciting. Ted Malaska explains why, if your design focuses only on the processing layer for speed and power, you may be missing half the story and leaving significant optimization untapped.
13:30–17:00 Thursday, 2017-07-13
Location: Function Room 5B | Level: Intermediate
Ted Malaska (Capital One)
Ted Malaska walks you through building a fraud-detection system, using an end-to-end case study as a concrete example of how to architect and implement real-time systems with Apache Hadoop ecosystem components such as Kafka, HBase, Impala, and Spark.
09:05–09:20 Friday, 2017-07-14
Location: Grand Hall A
Mick Hollison (Cloudera), Jien Zhou (UnionPay)
Mick Hollison and Jien Zhou discuss how organizations are applying machine learning and advanced analytics to improve customer service and reduce the threat of fraud and cyberattacks. They also explain how China UnionPay is using big data to deliver a better customer experience and manage risk.
11:15–11:55 Friday, 2017-07-14
Location: Grand Hall B | Level: Non-technical
Feng Cheng (Grab), Edwin Law (Grab)
Grab sits at the junction of the digital and physical worlds. Its vision is to drive Southeast Asia forward and transform the way people travel and pay across the region. Feng Cheng and Edwin Law explain Grab's data architecture and recount the history of its data platform migration and its stream-processing apps.
11:15–11:55 Friday, 2017-07-14
Location: Function Room 2 | Level: Beginner
Andrew Wang (Cloudera), Daniel Templeton (Cloudera)
Apache Hadoop 3.0 has made steady progress toward a planned release this year. Andrew Wang and Daniel Templeton offer an overview of new features, including HDFS erasure coding, YARN Timeline Service v2, and MapReduce task-level optimization, and discuss current release management status and community testing efforts dedicated to making Hadoop 3.0 the best Hadoop major release yet.
13:10–13:50 Friday, 2017-07-14
Location: Function Room 2 | Level: Intermediate
Jimmy Zhigang Su (JD.COM), Tony Lee (
JD.com is one of the largest B2C online retailers in the world. Its mission is to provide a safe and secure marketplace for its 226 million active users and 120,000 third-party vendors. Jimmy Zhigang Su and Tony Lee discuss the transformations big data has enabled at JD, including threat intelligence, account security, and endpoint security.
14:00–14:40 Friday, 2017-07-14
Location: Function Room 2 | Level: Intermediate
Ted Malaska (Capital One)
It's one thing to write an Apache Spark application that gets you to an answer. It's another to know you've used every trick in the book to make it run as fast as possible. Ted Malaska shares some of those tricks.
14:00–14:40 Friday, 2017-07-14
Location: Function Room 6A+B | Level: Advanced
Yu Li (Alibaba), Ramkrishna Vasudevan (Intel)
Yu Li explains how Alibaba met the challenge of tens of millions of requests per second to its Alibaba-Search HBase cluster on Singles' Day 2016. By moving the read path off-heap, Alibaba improved throughput by 30% and achieved predictable latency.
14:00–14:40 Friday, 2017-07-14
Location: Function Room 8A+8B
Yifeng Jiang (Hortonworks)
Yifeng Jiang offers an overview of Hortonworks DataFlow (HDF) 3.0, an open source IoT platform that anyone can start using right away. Built on open source technology, HDF supports data collection at the edge, flow management to move data to the data center and the cloud, real-time processing, and visualization and analytics, all through simple drag-and-drop operations.
14:50–15:30 Friday, 2017-07-14
Location: Function Room 6A+B | Level: Intermediate
Mathieu Dumoulin (McKinsey & Company), Mateusz Dymczyk (
Mathieu Dumoulin and Mateusz Dymczyk walk you step by step through building a scalable, real-time anomaly detection pipeline for an industrial robot. You'll learn how to gather data from a wireless movement sensor, process it with H2O on a MapR cluster, and present the output to an operator through an AR headset.
14:50–15:30 Friday, 2017-07-14
Location: Function Room 3B | Level: Intermediate
Adam Gibson (Konduit)
Adam Gibson offers a high-level overview of jumpy, a better Python interface for deep learning applications, and explains why Spark's Py4J-based Python interface is impractical for deep learning.
09:05–09:15 Saturday, 2017-07-15
Location: Grand Hall A
Amr Awadallah (Cloudera)
Amr Awadallah explains how data science and machine learning methods are evolving to bring a more comprehensive, secure, and enterprise-grade data science experience to the enterprise.
10:05–10:20 Saturday, 2017-07-15
Location: Grand Hall A
Lukas Biewald (Weights & Biases)
As companies move machine learning out of R&D and into production, they face a whole new set of challenges. Lukas Biewald explains why human-in-the-loop computing, active learning, and transfer learning are essential design patterns for making deep learning work in practice.
11:15–11:55 Saturday, 2017-07-15
Location: Auditorium | Level: Intermediate
Lukas Biewald (Weights & Biases)
Training data collection strategies are often the most important and overlooked part of deploying real-world machine learning algorithms. Lukas Biewald explains why active learning is the best way to collect training data and can make the difference between a failed research project and a deployed production algorithm.
13:10–13:50 Saturday, 2017-07-15
Location: Auditorium | Level: Beginner
Yufeng Guo (Google)
Machine learning has traditionally been performed only on servers and high-performance machines, but on-device machine learning on mobile devices can be very valuable. Yufeng Guo uses TensorFlow to implement a deep learning model for image classification on an Android device, tailored to a custom dataset. You'll leave ready to get started on your own mobile deep learning solutions.
14:00–14:40 Saturday, 2017-07-15
Location: Table A, O'Reilly Booth
Lukas Biewald (Weights & Biases)
Lukas Biewald shares best practices in training data collection and human-in-the-loop computing that make it possible to deploy imperfect machine learning algorithms in mission-critical applications. He explains how to make the best possible use of training data and why it is essential to making your machine learning work well.
16:20–17:00 Saturday, 2017-07-15
Location: Function Room 2 | Level: Intermediate
Andrew Wang (Cloudera), 郑锴 (Intel)
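Hadoop 3.0 introduces erasure coding. In common configurations, erasure coding can cut storage costs by 50% compared with traditional three-way replication while also improving data reliability. In this talk, we first briefly introduce HDFS erasure coding, then take a deep dive into the work done before Hadoop 3.0 GA to ensure the stability of the erasure coding feature. We also share performance results for key members of the Hadoop ecosystem, such as Spark, Hive, Impala, and Kylin, running on HDFS erasure coding. Finally, we offer some considerations and recommendations for deploying erasure coding in production.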
