Making Open Work
May 8–9, 2017: Training & Tutorials
May 10–11, 2017: Conference
Austin, TX

Schedule: Data, Big and Small sessions

Data is everywhere you look, and our devices and computers are working with bigger and more diverse sets of data than ever before. How do you manage this deluge? How do you tackle big data’s growing influence over the entire business world? How can you make it work for you? How do you show others what you’ve collected in a way that is digestible?

1:30pm–5:00pm Monday, May 8, 2017
Location: Ballroom E
Level: Intermediate
Barbara Fusinska (Microsoft)
Machine learning is growing increasingly popular. R is an open source platform that offers numerous libraries and implementations of machine-learning algorithms. Barbara Fusinska demonstrates how to use R to prepare data, create a predictive model, and display the results.

1:30pm–5:00pm Monday, May 8, 2017
Location: Meeting Room 10 A/B
Level: Beginner
Jeremy Wilken (VMware)
Understanding data as it streams is vital today. Using Angular and D3, Jeremy Wilken demonstrates how to build out an example visualization application that consumes a live stream and shows meaningful metrics that could help businesses make critical, real-time decisions.

1:30pm–5:00pm Tuesday, May 9, 2017
Location: Meeting Room 9
Level: Intermediate
William Lyon (Neo Technology)
William Lyon explains how to use a graph database to generate real-time recommendations using real-world data. William introduces graph data modeling and querying concepts using Neo4j and Cypher, the query language for graphs, to import and query data, before demonstrating how to apply graph algorithms and NLP using Python data science tools to enhance your recommendations.

11:00am–11:40am Wednesday, May 10, 2017
Location: Meeting Room 18 C/D
Level: Beginner
Vida Williams (Axis Partners, Inc)
Vida Williams offers an overview of a project that transmuted qualitative indicators of risk and success in foster care into quantitative indicators using real-life child welfare datasets and shares lessons learned along the way about capturing, assembling, and sharing datasets.

11:50am–12:30pm Wednesday, May 10, 2017
Location: Meeting Room 18 C/D
Level: Beginner
New York City has released its taxi dataset to the public. Ana Sa explains how she used Python to find areas of frequent pick-ups and drop-offs within a time frame, superimposed those hotspots on a map of the subway system to identify which ones fall within or outside a particular radius of established subway stops, and used this data as the basis for a proposed bus route.

1:45pm–2:25pm Wednesday, May 10, 2017
Location: Meeting Room 18 C/D
Level: Intermediate
Mita Mahadevan (Intuit)
Many leading tech companies (Uber, Netflix, etc.) are building scalable, in-house product testing data platforms from the ground up to enable experimentation and engender a data-driven mentality. Mita Mahadevan explores how these companies are developing in-house A/B testing frameworks using open source tools and shares dos and don’ts for those in the midst of their journey to become data driven.

2:35pm–3:15pm Wednesday, May 10, 2017
Location: Meeting Room 18 C/D
Level: Beginner
Taras Matyashovsky explains how to use Apache Spark MLlib to build a supervised learning NLP pipeline that distinguishes pop music from heavy metal, and how to have fun in the process.

4:15pm–4:55pm Wednesday, May 10, 2017
Location: Meeting Room 18 C/D
Level: Intermediate
Alena Hall (Microsoft Research), Natallia Dzenisenka (Independent Contractor)
Alena Hall and Natallia Dzenisenka explore the set of algorithms behind distributed systems, including snapshot algorithms, traversal algorithms, election algorithms, and reliable broadcast, giving you a clear understanding of how those systems work.

5:05pm–5:45pm Wednesday, May 10, 2017
Location: Meeting Room 18 C/D
Level: Intermediate
Yufeng Guo (Google)
Deep learning has already revolutionized machine-learning research, but it remains opaque to many developers. Yufeng Guo explains just how easy it is to get started with advanced machine learning by live-coding a wide and deep learning model using TensorFlow, training it using TensorFlow's tf.learn library, and evaluating it. You'll leave ready to use deep learning on your own data.

11:00am–11:40am Thursday, May 11, 2017
Location: Meeting Room 18 C/D
Level: Beginner
Jonathon Morgan (New Knowledge)
Jonathon Morgan explores computer vision, deep learning, and natural language processing techniques for uncovering communities of white nationalists and neo-Nazis on social media and identifying which ones are on the path to radicalization.

11:50am–12:30pm Thursday, May 11, 2017
Location: Meeting Room 18 C/D
Level: Intermediate
Heather Nelson (Silicon Valley Data Science), Mark Mims (Silicon Valley Data Science)
Configuring a data platform and data science environment can be a tedious, error-prone process. Heather Nelson and Mark Mims explain how to create a cloud-agnostic environment, combining cloud platforms such as AWS or Azure with Terraform and Ansible, that spins up quickly and is easy to configure as required.

1:45pm–2:25pm Thursday, May 11, 2017
Location: Meeting Room 18 C/D
Level: Beginner
Edward Finkler (Graph Story)
Most of us have worked with relational databases like MySQL or PostgreSQL, but they aren't the best option for many use cases. Graph databases have a simpler, more powerful model for handling complex, related data. Edward Finkler uses Neo4j to explore the advantages of graph databases, showing how graphs work and how they give you the power to do things that are difficult or impossible in SQL.

2:35pm–3:15pm Thursday, May 11, 2017
Location: Meeting Room 18 C/D
Level: Intermediate
Tim Ellison (IBM UK)
Private information retrieval techniques enable you to perform searches while keeping secret from the data controller not only the results but also the questions you are asking. Tim Ellison explores practical private information retrieval through homomorphic encryption, an efficient crypto-calculus procedure that provides a provably secure mechanism for executing private queries over data.

4:15pm–4:55pm Thursday, May 11, 2017
Location: Meeting Room 18 C/D
Level: Intermediate
Sean Mackrory (Cloudera)
Sean Mackrory offers an overview of filesystems in public cloud infrastructures, how they relate to traditional filesystems, and best practices for using them. Many of the examples will relate to Hadoop, namely moving from HDFS to S3.

5:05pm–5:45pm Thursday, May 11, 2017
Location: Meeting Room 18 C/D
Level: Intermediate
Barbara Fusinska (Microsoft)
Data science and machine learning are growing increasingly popular. R is an open source platform that offers numerous libraries and implementations of machine-learning algorithms. Barbara Fusinska explains how to use R as a tool for data analysis, performing machine-learning computations, and displaying the results of predictions.