Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore
 

    321-322
    11:50am Graph data analytics in finance: The Mitsubishi experience Yuichi Kuroda (Mitsubishi UFJ Information Technology (MUIT))
    1:30pm Scaling the Python data experience Wes McKinney (Two Sigma Investments)
    4:00pm Petascale genomics Uri Laserson (Cloudera)
    4:50pm Enterprise Deep Learning Workflows with DL4J Josh Patterson (Skymind)
    324
    11:00am Huawei advanced data science with Spark Streaming Albert Bifet (Télécom ParisTech), Silviu Maniu (Huawei)
    1:30pm Next-generation platforms for IoT-driven contextual awareness Markus Kirchberg (Deep Labs Pte. Ltd.)
    4:50pm How are your morals? Ethics in algorithms and IoT Majken Sander (Majken Sander), Joerg Blumtritt (Datarella)
    331
    4:00pm How to run Neural Nets on GPUs Melanie Warrick (Google)
    4:50pm Leveraging data analytics for high performance design Rakesh Menon (McLaren Applied Technology)
    334-335
    11:00am 12 steps to cloud security Vettrivel Viswanathan (Nephos)
    11:50am Computational privacy: The privacy bounds of human behavior Yves-Alexandre de Montjoye (Imperial College London | MIT Media Lab)
    2:20pm Data visualizations decoded Julia Rodriguez (Eagle Investment Systems)
    4:00pm Visualising multi-dimensional data Amit Kapoor (narrativeVIZ)
    4:50pm Customer record deduplication using Spark and Reifier Dave Chan (UBM Asia), Sonal Goyal (Nube)
    328-329
    11:00am GDELT + BigQuery: Understanding global society through SQL Felipe Hoffa (Google), Kalev Leetaru (GDELT Project (http://gdeltproject.org/))
    11:50am Patterns and paradigms: Managing semi-structured data with high velocity change for large scale e-commerce Utkarsh B (Flipkart Internet Private Limited), Vinod Venkatraman (Flipkart Internet Private Limited)
    4:00pm Make Tachyon ready for next-gen data center platforms with NVM Mingfei Shi (Intel), Bin Fan (Alluxio)
    4:50pm Estimating financial risk with Spark Sandy Ryza (Cloudera)
    332
    2:20pm TBC
    333
    11:00am HPE Big Data Analytics platform optimized for Hadoop Avind Shrivastava (Hewlett Packard Enterprise)
    11:50am Innovation Powering Digital Transformation Paul Marriott (SAP Asia Pacific Japan)
    1:30pm Hadoop everywhere: Geo-distributed storage for big data Nikhil Joshi (EMC, Advanced Software Division), Priya Lakshminarayanan (EMC Corporation)
    7:30am Coffee Break
    Room: Summit 1-2
    8:45am Plenary
    Room: Summit 1-2
    Thursday Keynotes Roger Magoulas (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
    8:50am Plenary
    Room: Summit 1-2
    Challenges for the Data Ecosystem Doug Cutting (Cloudera)
    9:00am Plenary
    Room: Summit 1-2
    Keynote with Dr. Balakrishnan Vivian Balakrishnan (Government of Singapore)
    9:15am Plenary
    Room: Summit 1-2
    Big data: The way forward, sponsored by Fusionex Ivan Teh (Fusionex)
    9:25am Plenary
    Room: Summit 1-2
    Toward Big Data driven network, sponsored by Huawei Sanqi Li (Huawei)
    9:35am Plenary
    Room: Summit 1-2
    Drive value faster: New optimizations for Big Data and analytics, sponsored by Intel 马子雅 (Ziya Ma) (Intel)
    9:40am Plenary
    Room: Summit 1-2
    Deep Learning Melanie Warrick (Google)
    9:55am Plenary
    Room: Summit 1-2
    State of Spark, and where it is going Reynold Xin (Databricks)
    10:05am Plenary
    Room: Summit 1-2
    How To Stop Worrying and Learn to Love Qualitative Data Farrah Bostic (The Difference Engine)
    10:30am Morning Break Sponsored by Intel
    Room: Concourse 1-4 (Sponsor Pavilion)
    12:30pm Lunch Sponsored by Fusionex
    Room: Concourse 1-4 (Sponsor Pavilion)
    Lunch / Thursday Industry Tables
    3:00pm Afternoon Break
    Room: Concourse 1-4 (Sponsor Pavilion)
    11:00am-11:40am (40m) Data Science and Advanced Analytics
    Building and deploying real time big data prediction models
    Deepak Agrawal (24[7] Inc.)
    This talk is about an application of big data predictive analytics to improve the online customer experience. The application is built on big data infrastructure (Hadoop and Cassandra) with machine learning algorithms, written in R and Python, that predict customer intent and act in real time to deliver an enhanced experience. Key challenges and lessons learned are also discussed.
    11:50am-12:30pm (40m) Data Science and Advanced Analytics
    Graph data analytics in finance: The Mitsubishi experience
    Yuichi Kuroda (Mitsubishi UFJ Information Technology (MUIT))
    In this session, attendees will learn the concepts underlying graph data analytics, based on MUFG's experience. The session will also cover how to analyze very large graphs with Apache Spark GraphX, and finally explore which types of data tend to cause problems and how to solve them.
    1:30pm-2:10pm (40m) Data Science and Advanced Analytics
    Scaling the Python data experience
    Wes McKinney (Two Sigma Investments)
    Many data applications are written in Python or R, but developing and deploying these applications at scale or in production is a pain point for many users. We will discuss our new efforts to bridge the gap between familiar in-memory data tools and distributed data systems. In particular, we are working to enable users to streamline interactions with Hadoop and scalable query engines like Impala.
    2:20pm-3:00pm (40m) Data Science and Advanced Analytics
    The revolution of location: Geospatial applications in marketing research
    Whye Loon Tung (Nielsen)
    Geospatial data is revolutionising the marketing research industry. In this talk, Nielsen researchers will describe how such information is being used by the company to improve internal processes and to give new insights into client behaviour. The goal is to give clients an analytic edge, as will be illustrated through key methodology and insights of recent projects.
    4:00pm-4:40pm (40m) Data Science and Advanced Analytics
    Petascale genomics
    Uri Laserson (Cloudera)
    The advent of next-generation DNA sequencing technologies is revolutionizing life sciences research by routinely generating extremely large data sets. Big data tools developed to handle large-scale internet data (like Hadoop) will help scientists manage this new scale of data effectively, and will also make it possible to address a host of questions that were previously out of reach.
    4:50pm-5:30pm (40m) Data Science and Advanced Analytics
    Enterprise Deep Learning Workflows with DL4J
    Josh Patterson (Skymind)
    In this session, we will give a practical review of what deep learning is and introduce DL4J. We'll look at how it supports deep learning in the enterprise on the JVM, and discuss the architecture of DL4J's scale-out parallelization on Hadoop and Spark in support of modern machine learning workflows.
    11:00am-11:40am (40m) IoT and Real-time
    Huawei advanced data science with Spark Streaming
    Albert Bifet (Télécom ParisTech), Silviu Maniu (Huawei)
    Real-time analytics are becoming increasingly important to telecommunication operators due to the large amount of data that flows through their networks. Drawing on our experience at Huawei, we present StreamDM, a new open source data mining and machine learning library built on top of Spark Streaming. We will present the advanced methods it implements and demonstrate its ease of use and extensibility.
    11:50am-12:30pm (40m) IoT and Real-time
    Sketching big data with Spark: Randomized algorithms for large-scale data analytics
    Reynold Xin (Databricks)
    In this talk, we introduce a recent effort in Spark to employ randomized algorithms for a number of common but expensive operations: membership testing, cardinality estimation, stratified sampling, frequent items, and quantile estimation.
    1:30pm-2:10pm (40m) IoT and Real-time
    Next-generation platforms for IoT-driven contextual awareness
    Markus Kirchberg (Deep Labs Pte. Ltd.)
    In this talk, we will first take a look at current IoT standards, solutions, and common challenges; change management; and near real-time decision-making capabilities that are yet to be adequately addressed.
    2:20pm-3:00pm (40m) IoT and Real-time
    Modeling the smart and connected city of the future with Kafka and Spark
    Eric Frenkiel (MemSQL)
    Eric Frenkiel, CEO and cofounder of MemSQL, will demonstrate a prototype of a futuristic smart city in which every household energy device is tracked in real time. He will show the challenges, design choices, and architecture required to let urban planners and energy companies see what is possible for efficient energy consumption, using a real-time data pipeline that combines Kafka, Spark, and an in-memory database.
    4:00pm-4:40pm (40m) IoT and Real-time
    How to improve mobile radio network planning based on a new big data structure analysis
    Vianney Martinez Alcantara (Datameer)
    The fast evolution of services and mobile terminals, combined with aggressive competition between mobile operators, is driving a continuous upgrade of the radio access network (RAN). This upgrade process is expensive and time-consuming, and its cost scales with the number of base stations. This talk stresses the importance of the customer and proposes a new methodology for an efficient RAN upgrade.
    4:50pm-5:30pm (40m) IoT and Real-time
    How are your morals? Ethics in algorithms and IoT
    Majken Sander (Majken Sander), Joerg Blumtritt (Datarella)
    Algorithms are what make things "smart." More or less arbitrary, subjective decisions are regularly built into our connected things when we choose a certain method or set parameters. These underlying value judgments, imposed on users, are largely absent from both the privacy discussion and the business perspective. However, they may matter more than the more obvious issues of data collection and security.
    11:00am-11:40am (40m) Data-driven Business
    Pro bono data science in action: Rallying the globe to fight global warming
    Oliver Chen (DataKind Singapore)
    Private sector companies are becoming more data-driven, but what does it take to help the social sector become data-driven? DataKind is a global nonprofit that harnesses the power of data science in the service of humanity. Learn about two DataKind Singapore projects that brought together data science volunteers and nonprofit organizations to move the needle in the fight against climate change.
    11:50am-12:30pm (40m) Data-driven Business
    Innovative use of big data technologies to optimise and simplify the IT landscape
    Ken Medlock (ANZ Banking Group)
    ANZ has adopted an innovative approach that uses new and disruptive big data technologies to drive the business continuously and to identify business value opportunities.
    1:30pm-2:10pm (40m) Data-driven Business
    How Uber is using data science to make better strategic financial decisions
    Prakhar Mehrotra (Uber)
    Using data science to make better corporate, financial, and strategic decisions.
    2:20pm-3:00pm (40m) Data-driven Business
    How to tell compelling data stories: Why stories are still important in a data-driven world
    Selene Chew (Adatao)
    In "The Power of Myth," Joseph Campbell distilled the modern story form as "A Hero's Journey." This talk presents the relevance of the story form in data analysis, and shows examples of how to tell insightful data stories.
    4:00pm-4:40pm (40m) Data Science and Advanced Analytics
    How to run Neural Nets on GPUs
    Melanie Warrick (Google)
    This talk will briefly explain what neural nets are and why they're important, and give some context about GPUs. Then we will walk through code and launch a neural net on a GPU. I will cover key pitfalls you may hit and techniques to diagnose and troubleshoot them. You will walk away understanding how to start using GPUs and where to go for additional help.
    4:50pm-5:30pm (40m) Data-driven Business
    Leveraging data analytics for high performance design
    Rakesh Menon (McLaren Applied Technology)
    High performance design requires the ability to optimally measure a system, understand its working, predict its performance, and continuously refine it. Central to this process is leveraging the data in an intelligent way. This talk explains how this can be achieved.
    11:00am-11:40am (40m) Security & Governance
    12 steps to cloud security
    Vettrivel Viswanathan (Nephos)
    This talk introduces a 12-step guide to help secure a data deployment in the cloud. Drawing on open source solutions and security best practices, it presents a simple yet effective framework you can use to fortify your own data-driven cloud deployment against accidental and malicious data breaches.
    11:50am-12:30pm (40m) Security & Governance
    Computational privacy: The privacy bounds of human behavior
    Yves-Alexandre de Montjoye (Imperial College London | MIT Media Lab)
    We're living in an age of big data, a time when metadata about most of our movements and actions are collected and stored in real time. These data offer unprecedented insights into how we behave. Mathematical analysis of metadata, however, reveals how unique our behavior is and how this uniqueness puts fundamental constraints on our privacy.
    1:30pm-2:10pm (40m) Security & Governance
    How to avoid building a "data swamp": Case studies in data management and governance
    Mark Donsky (Okera), Naren Koneru (Cloudera)
    Find out how the world's most sophisticated Hadoop deployments are addressing data governance challenges head-on, while preserving Hadoop's flexibility, through an integrated data management and governance approach.
    2:20pm-3:00pm (40m) Design, User Experience, Visualization
    Data visualizations decoded
    Julia Rodriguez (Eagle Investment Systems)
    Designing data visualizations presents us with unique and interesting challenges: how to tell a compelling story; how to deliver important information in a forthright, clear format; and how to make visualizations beautiful and engaging. In this talk, Julie will share a few disruptive designs and connect those back to vizipedia, her compiled data visualization library.
    4:00pm-4:40pm (40m) Design, User Experience, Visualization
    Visualising multi-dimensional data
    Amit Kapoor (narrativeVIZ)
    Understand techniques to effectively visualise multi-dimensional data to aid exploratory data analysis. We will look at standard 2D/3D, geometric transformations, glyph-based, pixel-based, and stacking-based approaches to visualise this data, and also explore the interactive approaches needed to make them work.
    4:50pm-5:30pm (40m) Hadoop & Beyond
    Customer record deduplication using Spark and Reifier
    Dave Chan (UBM Asia), Sonal Goyal (Nube)
    UBM Asia is the largest trade show organizer in Asia. To deal with duplicate customer records and ensure clean marketing data, UBM Asia has built an end-to-end solution using Reifier from Nube Technologies, built on top of Spark. This talk will discuss UBM's use case and its use of the Reifier fuzzy matching engine, Spark, and machine learning. We will also cover Reifier's architecture and its use of Spark.
    11:00am-11:40am (40m) Hadoop & Beyond
    GDELT + BigQuery: Understanding global society through SQL
    Felipe Hoffa (Google), Kalev Leetaru (GDELT Project (http://gdeltproject.org/))
    The GDELT Project is a real-time open data global graph over human society, inventorying the world’s events, emotions, and narratives in 65 languages, used by organizations from the UN to Wall Street. Google BigQuery enables real-time querying and whole-of-data analysis of GDELT, such as exploring the cycles of world history through mass cross-correlation.
    11:50am-12:30pm (40m) Hadoop & Beyond
    Patterns and paradigms: Managing semi-structured data with high velocity change for large scale e-commerce
    Utkarsh B (Flipkart Internet Private Limited), Vinod Venkatraman (Flipkart Internet Private Limited)
    Have you faced the challenge of storing and optimally serving multibillion-row EAV-modeled data out of a traditional data store? For a large online marketplace, quantified here as 3 billion catalog entries and 100 million catalog updates a day, monolithic data stores fall short, even with fast storage such as SSDs. This talk is about the paradigms and patterns we adopted to address this problem.
    1:30pm-2:10pm (40m) Hadoop & Beyond
    Breakthrough OLAP performance on Cassandra and Spark
    Evan Chan (Tuplejump)
    This talk will show architectures and techniques for combining Apache Cassandra and Spark to yield a 10-1000x improvement in OLAP analytical performance, and introduce a new open source database that takes advantage of these techniques.
    2:20pm-3:00pm (40m) IoT and Real-time
    Druid: Power Applications to Analyze Sensor Data
    Fangjin Yang (Imply)
    Organizations frequently rely on dedicated query layers, such as relational databases and key/value stores, for faster query latencies, but these technologies suffer many drawbacks for analytic use cases. In this session, we examine using Druid to power applications designed to analyze sensor data, and why its architecture is well suited to different use cases in "smart cities."
    4:00pm-4:40pm (40m) Hadoop & Beyond
    Make Tachyon ready for next-gen data center platforms with NVM
    Mingfei Shi (Intel), Bin Fan (Alluxio)
    Current memory capacity is far from sufficient to host today's data sets, and NVM (non-volatile memory) has emerged in response to this need. However, integrating NVM into a modern big data system is a challenge. In this talk, we present our efforts to build a tiered store in Tachyon, which provides a software solution for next-generation data center platforms with NVM.
    4:50pm-5:30pm (40m) Hadoop & Beyond
    Estimating financial risk with Spark
    Sandy Ryza (Cloudera)
    This talk will cover Spark design patterns in time series analysis, visualizing data, and Monte Carlo simulation; and will show you what it is like to approach financial modeling with Spark.
    11:00am-11:40am (40m) Sponsored
    Reinventing your business using Big Data Analytics
    Isaac Jacob (Fusionex)
    As organizations strive to survive and break into other markets, discover how big data analytics can be the key to unlocking new business opportunities, or simply to exploiting a company's full potential in its own competitive space. Companies are swimming in ever more data, so developing the ability to breathe data will be crucial to staying ahead of the competition.
    11:50am-12:30pm (40m) Sponsored
    Advanced analytics with large scale distributed machine learning on Apache Spark
    Shengsheng Huang (Intel)
    In this talk, we will present our work with many "web-scale" companies on building large-scale distributed ML on Apache Spark, including very complex and advanced analytics applications and algorithms (e.g., topic modelling and deep neural networks), as well as a massively scalable learning system/platform that leverages both application-specific and infrastructure-specific optimizations.
    1:30pm-2:10pm (40m) Sponsored
    Achieving business transformation with Open Enterprise Hadoop
    Jeff Markham (Hortonworks)
    Attend this session to:
    * Learn about the key drivers behind the shift to Hadoop-based platforms
    * Understand the common steps in the journey to adopting Hadoop
    * Hear real-world case studies of business transformation using Hadoop
    * Get insight into the core components that make up Open Enterprise Hadoop
    2:20pm-3:00pm (40m)
    Session
    To be confirmed
    11:00am-11:40am (40m) Sponsored
    HPE Big Data Analytics platform optimized for Hadoop
    Avind Shrivastava (Hewlett Packard Enterprise)
    How different industry verticals are using the HPE platform for Hadoop and big data analytics. The session covers various use cases across industry verticals that customers across APJ have implemented, and why HPE has a unique value proposition for Hadoop solutions.
    11:50am-12:30pm (40m) Sponsored
    Innovation Powering Digital Transformation
    Paul Marriott (SAP Asia Pacific Japan)
    SAP HANA Vora is a new in-memory query engine that leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop.
    1:30pm-2:10pm (40m) Sponsored
    Hadoop everywhere: Geo-distributed storage for big data
    Nikhil Joshi (EMC, Advanced Software Division), Priya Lakshminarayanan (EMC Corporation)
    This session focuses on strategies and technologies you can use to build a global Hadoop cloud with geo-distributed access and protection for analytics in various use cases, such as IoT, handling billions of small files or multi-terabyte files in the same system.
    7:30am-8:45am (1h 15m)
    Break: Coffee Break
    8:45am-8:50am (5m)
    Thursday Keynotes
    Roger Magoulas (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
    Strata + Hadoop World Program Chairs Roger Magoulas, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes.
    8:50am-9:00am (10m)
    Challenges for the Data Ecosystem
    Doug Cutting (Cloudera)
    The data century is upon us, and Apache Hadoop has emerged as the platform for managing your big data opportunity. The path to success is not without its perils, however, and without a thoughtful approach, progress can be hindered by the impact of change, trust, and security.
    9:00am-9:15am (15m)
    Keynote with Dr. Balakrishnan
    Vivian Balakrishnan (Government of Singapore)
    Dr. Vivian Balakrishnan is the Singapore Minister for Foreign Affairs and the Minister-in-charge of the Smart Nation Programme Office.
    9:15am-9:25am (10m) Sponsored
    Big data: The way forward, sponsored by Fusionex
    Ivan Teh (Fusionex)
    If you could simulate the results of your business decisions, wouldn't that change the way you manage your business? The availability of big data solutions today introduces new management principles, opportunities as well as challenges.
    9:25am-9:35am (10m) Sponsored
    Toward Big Data driven network, sponsored by Huawei
    Sanqi Li (Huawei)
    With the recent advances in big data and machine learning technologies, there has never been a better time for developing telecom data products. However, there are various challenges associated with researching and developing telecom data products at scale.
    9:35am-9:40am (5m) Sponsored
    Drive value faster: New optimizations for Big Data and analytics, sponsored by Intel
    马子雅 (Ziya Ma) (Intel)
    The high volume, velocity, variety and veracity of big data have been pushing for more comprehensive solutions and services to enable decision-making and insight discovery across business market segments. Ziya Ma, General Manager of Big Data Software Technologies in Intel's Software and Services Group, will discuss Intel’s software enabling role for making this possible and easier.
    9:40am-9:55am (15m)
    Deep Learning
    Melanie Warrick (Google)
    Deep learning is taking hold as a popular machine learning modeling technique because of its real-world applications, especially with regard to image, signal, and language data sets (e.g., medical diagnosis, self-driving cars, real-time language translation). This talk provides an overview of what deep learning is, with a focus on recent applications.
    9:55am-10:05am (10m)
    State of Spark, and where it is going
    Reynold Xin (Databricks)
    In this talk, Reynold will look back and review Spark’s growth in adoption, use cases, and development. He will then look forward and discuss both technical initiatives and the evolution of the Spark community for 2016.
    10:05am-10:20am (15m)
    How To Stop Worrying and Learn to Love Qualitative Data
    Farrah Bostic (The Difference Engine)
    This talk will highlight the top 5 mistakes we make in collecting and analyzing qualitative data, how to do it better, and how it can inspire your next big thing.
    10:30am-11:00am (30m)
    Break: Morning Break Sponsored by Intel
    12:30pm-1:30pm (1h) Event
    Lunch / Thursday Industry Tables
    Industry Table discussions are a great way to informally network with people in similar industries or interested in the same topics.
    3:00pm-4:00pm (1h)
    Break: Afternoon Break