Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Strata + Hadoop World Speakers

New speakers are added continuously. Please check back to see the latest updates to the program.


Michael Abbott
Michael Abbott (Stanford University)

Mike Abbott is a general partner at Kleiner Perkins Caufield & Byers, where he focuses on investments in the firm’s digital practice, helping entrepreneurs in the social, mobile, and cloud computing sectors rapidly scale teams and ventures. Mike serves as an expert resource on enterprise infrastructure, cloud computing, and big data. He also helps entrepreneurs win the race for talent in a hypercompetitive recruitment environment. Mike is an engineering leader, entrepreneur, and investor and an expert in big data businesses. Formerly the vice president of engineering at Twitter, Mike led the team to rebuild and solidify Twitter’s infrastructure, growing the... Read More.

Joseph Adler
Joseph Adler (Facebook), @jadler

Joseph Adler has many years of experience in data mining and data analysis at companies including DoubleClick, Verisign, and LinkedIn. Currently, he is director of product management and data science at Confluent. He is the holder of several patents for computer security and cryptography and the author of Baseball Hacks and R in a Nutshell. He graduated from MIT with a BSc and MEng in computer science and electrical engineering.

Sarah Aerni
Sarah Aerni (Pivotal)

Sarah Aerni has a background in the field of bioinformatics, developing tools to help biomedical researchers understand their data. She holds a B.S. in biology with a specialization in bioinformatics and a minor in French literature from the University of California-San Diego, and an M.S. and Ph.D. in biomedical informatics from Stanford University. During her time as a researcher she focused her efforts on building computational models enabling research for a broad range of fields in biomedicine. She also co-founded a startup providing informatics services to researchers and small companies. At Pivotal she works with customers in life science and healthcare,... Read More.

Nidhi Aggarwal
Nidhi Aggarwal (Tamr, Inc.)

Nidhi Aggarwal leads strategy and marketing at Tamr. Prior to joining Tamr, Nidhi founded Cloud vLab, makers of qwikLAB, a software-learning platform used to create and deploy on-demand lab environments. In the years before Cloud vLab, Nidhi worked at McKinsey & Company, advising Fortune 150 companies on big data strategy. Nidhi holds a PhD in computer science from the University of Wisconsin-Madison.

Jaipaul Agonus
Jaipaul Agonus (FINRA)

Jaipaul Agonus is a director in the Market Regulation Technology Department at FINRA. Jaipaul is a big data engineering leader with nearly 18 years of IT industry experience, specializing in big data analytics and cloud-based solutions. He’s currently involved in building next-generation big data market analytic platforms with machine learning, advanced visualization, and contextual access across applications.

John Akred
John Akred (Silicon Valley Data Science), @BigDataAnalysis

With over 15 years in advanced analytical applications and architecture, John Akred is dedicated to helping organizations become more data driven. As CTO of Silicon Valley Data Science, John combines deep expertise in analytics and data science with business acumen and dynamic engineering leadership.

Sridhar Alla
Sridhar Alla (BlueWhale)

Sridhar Alla is cofounder and CTO at BlueWhale, which brings together the worlds of big data and artificial intelligence to provide comprehensive solutions to meet the business needs of organizations of all sizes. He and his team are cloud and tool agnostic and strive to embed themselves into the workstream to provide strategic and technical assistance, with solutions such as predictive modeling and analytics, capacity planning, forecasting, anomaly detection, advanced NLP, chatbot development, SAS to Python migration, and deep learning-based model building and operationalization. Sridhar is also the author of three books and an avid presenter at... Read More.

David Alves

David Alves is a software engineer at Cloudera and a PhD student at UT Austin. He is a committer at the Apache Foundation and in the past has contributed to several open source projects, such as Apache Cassandra and Apache Drill.

Anima Anandkumar
Anima Anandkumar (UC Irvine)

Anima Anandkumar is a principal scientist at Amazon Web Services. Anima is currently on leave from UC Irvine, where she is an associate professor. Her research interests are in the areas of large-scale machine learning, nonconvex optimization, and high-dimensional statistics. In particular, she has been spearheading the development and analysis of tensor algorithms. Previously, she was a postdoctoral researcher at MIT and a visiting researcher at Microsoft Research New England. Anima is the recipient of several awards, including the Alfred P. Sloan fellowship, the Microsoft faculty fellowship, the Google research award, the ARO and AFOSR Young Investigator... Read More.

Jesse Anderson
Jesse Anderson (Big Data Institute), @jessetanderson

Jesse Anderson is a data engineer, creative engineer, and managing director of the Big Data Institute. Jesse trains employees on big data—including cutting-edge technology like Apache Kafka, Apache Hadoop, and Apache Spark. He’s taught thousands of students at companies ranging from startups to Fortune 100 companies the skills to become data engineers. He’s widely regarded as an expert in the field and recognized for his novel teaching practices. Jesse is published by O’Reilly and Pragmatic Programmers and has been covered in such prestigious media outlets as the Wall Street Journal, CNN, BBC, NPR, Engadget, and... Read More.

Amar Arsikere
Amar Arsikere, @amar500

Amar Arsikere has a large-scale data infrastructure background with 18 years of experience in building software products at several companies including Google and Zynga. He is currently a co-founder and CEO. Amar founded the Systems Engineering Group at Zynga and led the design and deployment of one of the largest in-memory databases there. At Google he pioneered the development of a data warehousing platform on BigTable. This platform successfully replaced the Informatica/Oracle/Microstrategy/QlikView technology stack. Amar is a recipient of the InfoVision award from IEC and the Jars Top 25 award. He holds several patents in... Read More.

Astrid Atkinson

Astrid Atkinson is director of software engineering at Google, where she leads development frameworks. During her 10+ years at Google, Astrid has built infrastructure and managed a variety of engineering teams and spent more than five years on call. She has led teams across the infrastructure map, from the team responsible for running and building Google’s web-serving layer to App Engine and cloud systems to core search.

Fredrik Backner
Fredrik Backner (Telia Company)

Fredrik Backner is Vice President of Data & Analytics at TeliaSonera, a leading Nordic operator with headquarters in Stockholm, Sweden. In his role Fredrik has global responsibility for enabling business value from Data & Analytics across six countries and is tasked with ensuring that big data capabilities are provided to the countries as internal cloud services, ranging from data lakes to advanced analytics and data visualization. Fredrik’s organization also provides data science, analytics and data visualization services and business consultancy to the countries and business units.

Fredrik has a solid entrepreneurial background from initiating and driving large change and improvement programs within... Read More.

Paige Bailey
PyData at Strata Tutorial

Paige Bailey is a senior cloud developer advocate at Microsoft specializing in machine learning and artificial intelligence. Previously, Paige was a data scientist and machine learning engineer in the energy industry (drilling and completions optimization, subsurface characterization). Paige has over a decade of experience doing data analysis with Python and five years of building predictive models with R. She serves on the core committee for JupyterCon and SciPy, is a Python instructor for EdX, founded PyLadies-HTX in Houston, and is currently writing both an introductory children’s book on machine learning and a technical cookbook for machine learning at scale... Read More.

Vishal Bamba
Vishal Bamba (Transamerica), @vishalbamba

Vishal Bamba is vice president of strategy and architecture at Transamerica Technology, where he leads a team focusing on innovation initiatives within the enterprise. Vishal has over 15 years of experience in distributed systems and has led many innovation projects. He has consulted and worked for several companies including Disney, Getty, Northrop, and AIG/SunAmerica. Vishal holds an MS in computer science from the University of Southern California.

Lauralea Banks Edwards
Lauralea Banks Edwards (Washington State University)

Lauralea Banks Edwards is a systems-oriented data analyst who works at the intersection of business and technology. Her research investigates and challenges the ways data creation, storage, and analytics reinforce oversimplified ideas of our social reality. While Ms. Edwards currently wrangles project teams and data within higher education, her previous experience includes co-founding a non-profit, lobbying for the restaurant industry, and building data models for the United States Military Academy at West Point. She holds a BS in behavioral science and a master of international affairs from Columbia University and is currently pursuing a Ph.D. in cultural studies and social thought... Read More.

Cecile Barbaroux
Cecile Barbaroux (Schibsted Classified Media)

Cécile Barbaroux is head of data and insight at Schibsted Classified Media, where she leads a central team of data scientists and engineers with a clear mission to facilitate and inspire data-driven product development. Since joining the company in 2012, she has focused on democratizing access to data and evolving the group data strategy. Before working at Schibsted, Cécile worked as a marketing analyst at Air France and Shell, where she developed a passion for data and business intelligence.

Alexander Barclay
Alexander Barclay (UnitedHealthcare Shared Services)

To be updated

Nenshad Bardoliwalla

Nenshad Bardoliwalla is the founding vice president of products at Paxata, where he is responsible for product strategy, product management, and product marketing. Nenshad is an executive and thought leader with a proven track record of success leading product strategy, product management, and development in business analytics. Previously, he cofounded Tidemark Systems, Inc., where he drove the market, product, and technology efforts for its next-generation analytic applications built for the cloud through its series C funding; served as vice president for product management, product development, and technology at SAP, where he helped to craft the business analytics vision, strategy,... Read More.

Marie Beaugureau
Marie Beaugureau (O'Reilly Media, Inc.)
Data 101 Tutorial

Marie Beaugureau is the lead data editor for O’Reilly Media.

Alexander Behm (Cloudera)

Alex Behm is a software engineer at Cloudera, working on the Impala team. He holds a PhD in computer science from UC Irvine.

Roy Ben Alta
Roy Ben Alta (Amazon Web Services), @benalt

Roy Ben-Alta is a solution architect and principal business development manager at Amazon Web Services, where he focuses on AI and real-time streaming technologies and working with AWS customers to build data-driven products (whether batch or real time) and create solutions powered by ML in the cloud. Roy has worked in the data and analytics industry for over a decade and has helped hundreds of customers bring compelling data-driven products to the market. He serves on the advisory board of Applied Mathematics and Data Science at Post University in Connecticut. Roy holds a BSc in information systems and an... Read More.

Tim Berglund
Tim Berglund (Confluent), @tlberglund
Data 101 Tutorial

Tim Berglund is the senior director of developer experience with Confluent, where he serves as a teacher, author, and technology leader. Tim can frequently be found speaking at conferences internationally and in the United States. He’s the copresenter of various O’Reilly training videos on topics ranging from Git to distributed systems and is the author of Gradle Beyond the Basics. He tweets as @tlberglund, blogs very occasionally, and is the cohost of the DevRel Radio podcast. He lives in Littleton, Colorado, with the wife of his youth and their youngest child, the other two having... Read More.

Albert Bifet
Albert Bifet (Télécom ParisTech), @abifet

Albert Bifet is a professor and head of the Data, Intelligence, and Graphs (DIG) Group at Télécom ParisTech and a scientific collaborator at École Polytechnique. A big data scientist with 10+ years of international experience in research, Albert has led new open source software projects for business analytics, data mining, and machine learning at Huawei, Yahoo, the University of Waikato, and UPC. At Yahoo Labs, he cofounded Apache scalable advanced massive online analysis (SAMOA), a distributed streaming machine learning framework that contains a programming abstraction for distributed streaming ML algorithms. At the WEKA Machine Learning Group,... Read More.

Mikhail Bilenko
Mikhail Bilenko (Microsoft)

Misha Bilenko is the principal researcher leading the Machine Learning Algorithms team in the Cloud+Enterprise division of Microsoft. Before that, he worked for seven years in the Machine Learning Group at Microsoft Research, where he collaborated with a number of product groups on applied ML algorithms, systems, and tools. Misha joined Microsoft in 2006 after receiving his Ph.D. in computer science from the University of Texas at Austin. He co-edited Scaling Up Machine Learning, published by Cambridge University Press, and his work has received best paper awards from KDD and SIGIR. His research interests include parallel and distributed... Read More.

Sarah Bird
Sarah Bird (Aptivate)
PyData at Strata Tutorial

After a brief spell designing ejection seats for fighter jets, Sarah Bird’s career turned to applying technology to international development. She has worked in many sectors including mobile health and data collection in Pakistan, Peru, Haiti, and elsewhere. Having always dabbled in software in her spare time, in 2012 Sarah gave in and became a full-time software developer. She is now a full-stack web developer at Aptivate, a non-profit that builds IT solutions for the international development sector.

David Blei
David Blei (Columbia University)

David Blei is a professor of statistics and computer science at Columbia University, and a member of the Columbia Data Science Institute. His research is in statistical machine learning, involving probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference algorithms for massive data. He works on a variety of applications, including text, images, music, social networks, user behavior, and scientific data.

David earned his bachelor’s degree in computer science and mathematics from Brown University (1997) and his PhD in computer science from the University of California, Berkeley (2004). Before arriving at Columbia, he was an associate professor of computer... Read More.

Ryan Blue
Ryan Blue (Cloudera)

Ryan Blue is a software engineer at Cloudera, currently working on the Kite SDK team.

David Boardman

David Boardman is a senior interaction design lead at IDEO New York, where he guides teams designing interactions across multiple touch points that elevate people’s experiences and innovate businesses. David has contributed in bringing to life several complex digital ecosystems for a broad set of industries including healthcare, the public sector, finance, and media, for clients such as the US Department of State, WebMD, UBS, Sky Television, Telstra, Hewlett-Packard, Cisco Systems, the Clinton Global Initiative, Nokia, and Telefónica. David has also worked as a design consultant at frog, a global design innovation firm, and has been involved as... Read More.

Ron Bodkin

Ron Bodkin is a technical director on the applied artificial intelligence team at Google, where he provides leadership for AI success for customers in Google’s Cloud CTO office. Ron engages deeply with Global F500 enterprises to unlock strategic value with AI, acts as executive sponsor with Google product and engineering to deliver value from AI solutions, and leads strategic initiatives working with customers and partners. Previously, Ron was the founding CEO of Think Big Analytics, a company that provides end-to-end support for enterprise big data, including data science, data engineering, advisory, and managed services and frameworks such as... Read More.

Farrah Bostic
Farrah Bostic (The Difference Engine), @farrahbostic

Farrah Bostic is the founder of the Difference Engine, which she created based on her belief that deep understanding of customer needs is essential to growing businesses through great products and services. Farrah has honed her customer-centric insights as an advisor to some of the world’s most respected brands, including Apple, Microsoft, Disney, Samsung, and UPS. Previously, she began her career as a creative and then went on to be a strategist at leading agencies, including Wieden+Kennedy, TBWA\Chiat\Day, Mad Dogs & Englishmen, and Digitas, where she was group planning director and mobile strategy lead; she also ran innovation... Read More.

danah boyd
danah boyd (Microsoft Research | Data & Society), @zephoria

danah boyd is the founder and president of Data & Society, a research institute focused on understanding the role of data-driven technologies in society, a principal researcher at Microsoft Research, and a visiting professor in NYU’s Interactive Telecommunications Program. danah’s research focuses on the intersection of technology, society, and policy. She is currently doing work on questions related to bias in big data and artificial intelligence, how people negotiate privacy and publicity, and the social ramifications of using data in education, criminal justice, labor, and public life. For over a decade, she examined how American youth incorporate social media into... Read More.

David Boyle
David Boyle (Audience Strategies), @beglen

David Boyle is passionate about helping businesses to build analytics-driven decision making to help them make quicker, smarter, and bolder decisions. Previously, he built global analytics and insight capabilities for a number of leading global entertainment businesses covering television (the BBC), book publishing (HarperCollins Publishers), and the music industry (EMI Music), helping to drive each organization’s decision making at all levels. He builds on experiences working to build analytics for global retailers as well as political campaigns in the US and UK, in philanthropy, and in strategy consulting.

Mary Yoko Brannen
Mary Yoko Brannen (CLIA Consulting), @MaryYokoBrannen

Mary Yoko Brannen is the Jarislowsky East Asia (Japan) chair at the Centre for Asia Pacific Initiatives, professor of international business and research director at the University of Victoria Gustavson School of Business, and holds a visiting professorship of strategy and management at INSEAD in Fontainebleau, France. She is also deputy editor of the Journal of International Business Studies—the highest ranked journal in the field of IB. She received her M.B.A. with emphasis in international business and Ph.D. in organizational behavior with a minor in cultural anthropology from the University of Massachusetts at Amherst, and a B.A. in comparative... Read More.

Richard Brath
Richard Brath (Uncharted Software), @rkbrath

Richard Brath is a partner at Uncharted Software. Richard has been designing and building innovative information visualizations for 20 years, ranging from one of the first interactive 3D financial visualizations on the web in 1995 to visualizations embedded in financial data systems used every day by thousands of market professionals. Richard is pursuing a PhD in new data visualization techniques at LSBU.

Jenelle Bray
Jenelle Bray (LinkedIn)

Jenelle Bray is a staff data scientist at LinkedIn on the Security team, where she builds models to detect and prevent fraudulent and abusive behavior, including scraping and fake accounts. Jenelle has a PhD in computational chemistry from Caltech, where she developed methods to predict membrane protein structures. She then moved to Stanford University as a postdoctoral fellow, where she designed algorithms to study large-scale protein motion and to predict small molecule binding in proteins.

Eric Brewer

Eric Brewer is a vice president of infrastructure at Google. He pioneered the use of clusters of commodity servers for internet services based on his research at Berkeley. His CAP theorem covers basic trade-offs required in the design of distributed systems and followed from his work on a wide variety of systems from live services to caching and distribution services and to sensor networks. He’s a member of the National Academy of Engineering and winner of the ACM Infosys Foundation award for his work on large-scale services. Eric was named a “Global Leader for Tomorrow” by the World... Read More.

Peter Brodsky
Peter Brodsky (HyperScience), @brodsky13

Peter Brodsky is a middle school dropout, a college graduate, and a PhD dropout. Peter built and sold his first company and is now building his second company.

Kurt Brown
Kurt Brown (Netflix)

Kurt Brown leads the data platform team at Netflix, which architects and manages the technical infrastructure underpinning the company’s analytics, including various big data technologies like Hadoop, Spark, and Presto, machine learning infrastructure for Netflix data scientists, and traditional BI tools including Tableau.

Andrew Brust
Andrew Brust (Datameer)

Andrew Brust is senior director, technical product marketing and evangelism at Datameer, and writes a blog for ZDNet called Big on Data. Andrew is co-author of Programming Microsoft SQL Server 2012 (Microsoft Press); an advisor to NYTECH, the New York Technology Council; and writes the Redmond Review column for

Michael Bui
Michael Bui (Adatao, Inc.)

Michael (Bach) Bui is a co-founder and engineering lead of Adatao. Previously, he worked on Hadoop 2.0 at Yahoo!, having completed his PhD in CS from the University of Illinois, Urbana-Champaign, where he focused on real-time distributed systems engineering. Michael was a lead developer of Adatao’s PredictiveEngine, and has contributed to the early development of Apache Spark.

Dan Burkert (Cloudera)

Dan Burkert is a software engineer at Cloudera. Previously he worked at WibiData and Near Infinity.

Călin-Andrei Burloiu

Călin-Andrei Burloiu has worked at Avira since 2013 as a big data engineer. His interest in this area started in 2012 during an internship at the National University of Singapore, where he first made contact with the Hadoop ecosystem and big data while working on a source code search engine. Călin-Andrei has a master’s degree in computer science. He has a passion for distributed systems and recently became interested in data science.

Joe Caserta
Joe Caserta (Caserta Concepts), @CasertaConcepts

Joe Caserta is president of Caserta Concepts, an award-winning New York-based innovation consulting and technology implementation firm specializing in big data analytics, data warehousing, business intelligence solutions, and helping clients maximize data value. A recognized big data strategy consultant, author, and educator, Joe is coauthor of the best-selling book The Data Warehouse ETL Toolkit (Wiley, 2004), a contributor to industry publications, and frequent keynote speaker and expert panelist at industry conferences and events. He also serves on the advisory boards of financial and technical institutions and is the organizer and host of the Big Data Warehousing Meetup group in... Read More.

Maciej Ceglowski
Maciej Ceglowski, @pinboard
Haunted by data Keynote

Maciej Ceglowski is the founder and sole employee of Pinboard, a personal web archive and bookmarking site with an emphasis on user privacy. He’s been an outspoken advocate of small pay-for-service websites as an alternative to the hype and impermanence of Silicon Valley startup culture. He has also spoken extensively about the dangers of universal surveillance as a business model and the need to decentralize the Internet. Before founding Pinboard in 2009, Ceglowski worked as an engineer at a variety of tech companies, most notably Yahoo. He lives and works in San Francisco.

Jagdish Chand
Jagdish Chand (Apigee)

Jagdish Chand is VP technology for big data predictive analytics at Apigee. Previously, he served as director of engineering at Yahoo! before co-founding predictive analytics company InsightsOne, where he served as VP engineering until it was acquired by Apigee in 2013. At Apigee, he successfully integrated the InsightsOne engineering team and renamed the product as Apigee Insights. Jagdish continues to drive Apigee Insights adoption with customers and leads advanced product development for big data predictive analytics.

Jennifer Chayes
Jennifer Chayes (Microsoft Research), @jenniferchayes

Jennifer Tour Chayes is Distinguished Scientist and Managing Director of Microsoft Research New England in Cambridge, Massachusetts, which she co-founded in 2008, and Microsoft Research New York City, which she co-founded in 2012. Before joining Microsoft in 1997, Chayes was for many years professor of mathematics at UCLA. Chayes is the author of over 125 academic papers and holds over 30 patents. Her research areas include phase transitions in discrete mathematics and computer science, structural and dynamical properties of self-engineered networks, graph algorithms and algorithmic game theory.

Chayes received her B.A. in biology and physics at Wesleyan University, where... Read More.

Jerry Chen
Jerry Chen (Greylock)

Jerry Chen is a partner at Greylock where he invests in new enterprise applications and in all aspects of cloud and application infrastructure. Prior to joining Greylock, Jerry was vice president of cloud and application services at VMware, where he was part of the executive team that scaled the company from 400 to over 15,000 employees and $5B in revenue. During his nine years at VMware, he launched dozens of products including several “1.0” releases, and started two new business units for VMware including the Cloud Application Platform and the Enterprise Desktop business units. In particular, Jerry enjoys the challenge... Read More.

Roger Chen
Roger Chen (Computable), @rgrchen

Roger Chen is cofounder and CEO of Computable and program chair for the O’Reilly Artificial Intelligence Conference. Previously, he was a principal at O’Reilly AlphaTech Ventures (OATV), where he invested in and worked with early-stage startups primarily in the realm of data, machine learning, and robotics. Roger has a deep and hands-on history with technology. Before startups and venture capital, he was an engineer at Oracle, EMC, and Vicor. He also developed novel nanoscale and quantum optics technology as a PhD researcher at UC Berkeley. Roger holds a BS from Boston University and a PhD from UC... Read More.

Ewen Cheslack-Postava

Ewen Cheslack-Postava is an engineer at Confluent building a stream data platform based on Apache Kafka to help organizations reliably and robustly capture and leverage all their real-time data. Ewen received his PhD from Stanford University, where he developed Sirikata, an open source system for massive virtual environments. His dissertation defined a novel type of spatial query giving significantly improved visual fidelity and described a system for efficiently processing these queries at scale.

Anant Chintamaneni

Anant Chintamaneni is vice president of products at BlueData, where he is responsible for product management and focuses on helping enterprises deploy big data technologies such as Hadoop and Spark. Anant has more than 15 years’ experience in business intelligence, advanced analytics, and big data infrastructure. Previously, Anant led the product management team for Pivotal’s big data suite.

Alan Choi
Alan Choi (Cloudera)

Alan Choi is a software engineer at Cloudera working on the Impala project. Previously, he worked at Greenplum on the Greenplum-Hadoop integration and worked extensively on PL/SQL and SQL at Oracle.

Tanzeem Choudhury
Tanzeem Choudhury (Cornell and HealthRhythms)

Tanzeem Choudhury received her Ph.D. from the Media Laboratory at the Massachusetts Institute of Technology. As part of her doctoral work, she created the sociometer and conducted the first experiment that uses mobile sensors to model social networks, which led to a new field of research referred to as Reality Mining. She holds a B.S. in electrical engineering from the University of Rochester, and an M.S. from the MIT Media Laboratory.

Miklos Christine
Miklos Christine (Databricks), @Miklos_C

Miklos Christine is a solutions engineer for Databricks. Miklos was previously a system engineer at Cloudera where he helped strategic customers deploy and use the Apache Hadoop ecosystem in production. He has contributed to several projects in the open source community, previously worked on the design and implementation of the system infrastructure for the OS that runs on Cisco’s routers and switches, and holds a BS in electrical engineering and computer sciences from the University of California-Berkeley.

Phil Cloud
Phil Cloud (Continuum)
PyData at Strata Tutorial

Phillip Cloud is a software engineer at Continuum Analytics. He started doing open source work by contributing heavily to Pandas. Now he works mostly on Blaze and its associated libraries, along with a bit of consulting. He enjoys building data-related tools that help people get their jobs done.

Chris Colbert
Chris Colbert (Anaconda Powered by Continuum Analytics)
PyData at Strata Tutorial

Chris is a software architect for Continuum Analytics, and is based in the New York City area. He has worked previously for top Wall St. firms and was the lead designer of the UI framework for a front office trading platform. He is the creator of the PhosphorJS and Nucleic projects which provide libraries for developing enterprise quality applications on the desktop and in the browser. He received his MS in Mechanical Engineering from the University of South Florida.

Raymond Collins
Raymond Collins (TE Connectivity), @raymondcollins3

Raymond Collins has led and implemented data integration projects and analytics projects for companies like Sony, Veterans Affairs, Bausch & Lomb, TE Connectivity, and Rolls Royce.

Ben Collins-Sussman
Debugging teams Cultivate

Ben Collins-Sussman is the engineering site lead for Google’s Chicago office. A founding developer of the Subversion version control system, he co-authored O’Reilly’s Version Control with Subversion book as well as Team Geek. Since joining Google in 2005, he has led engineering teams for Google Code, Google Affiliate Network, the DFP advertising platform, and now manages teams working on the serving stack for Google Search.

Ben collects hobbies that explore the tension between art and science. He has given numerous conference talks about the social challenges of software development. He writes interactive fiction games and tools, and was the... Read More.

Jacomo Corbo
Jacomo Corbo (QuantumBlack), @jacomocorbo

Jacomo Corbo is the chief scientist for QuantumBlack, a visual analytics firm that helps clients meet the analysis challenges of big data to make better decisions. Corbo is also the Canada Research Chair in Information and Performance Management at the University of Ottawa, and a Wharton Clayright Scholar at the University of Pennsylvania’s Wharton School of Business. His research has been funded by grants from the National Research Council, the Alfred P. Sloan Foundation, the Wharton Mack Center for Technological Innovation, the Wharton Customer Analytics Initiative, as well as by companies such as GE Finance and IBM.

Between January... Read More.

Elliott Cordo
Elliott Cordo (Caserta Concepts, LLC)

Elliott is a big data, data warehouse, information management and technology innovation expert with a passion for helping transform data into powerful information. He has more than a decade of experience in implementing tailored big data and data warehouse solutions with hands-on experience in every component of the data warehouse software development lifecycle. At Caserta Concepts, Elliott oversees large-scale major technology projects, including those involving cloud, business intelligence, data analytics, big data and data warehousing.
Elliott is recognized for his many successful big data projects, ranging from big data warehousing to machine learning and his personal favorite, recommendation engines. His... Read More.

Samuel Cozannet is a technology enthusiast, solution-oriented get-things-done professional, with a track record in product and program management. He has a passion for innovation and believes technology can make the world a better place. He spends most of his time and energy driving the adoption of IoT and big data technologies by companies and enterprises of all sizes and industries.

  • Graduate of the Ecole Polytechnique (Engineering degree, majors in economics and physics)
  • Graduate of Ecole des Mines de Paris / Ecole des Ponts & Chaussées (Master’s degree in management of innovation, ranked 1st)

Jim Crist (Continuum Analytics)
PyData at Strata Tutorial

Charlie Crocker
Charlie Crocker (Autodesk)

Charlie Crocker is a data geek with 20 years of experience bringing data out of the shadows to drive business value and optimize operational costs. At Autodesk, he is currently working across divisions to identify and validate potential reliable data sources and access mechanisms, while also focusing on delivering real-time analytics to stakeholders. Prior to Autodesk, Charlie was a partner in a startup focused on spatial databases and web-based tools for state and local government agencies and utility companies.

Alistair Croll
Alistair Croll (Solve For Interesting), @acroll
Closing remarks Keynote

Alistair Croll is an entrepreneur with a background in web performance, analytics, cloud computing, and business strategy. In 2001, he cofounded Coradiant (acquired by BMC in 2011) and has since helped launch Rednod, CloudOps, Bitcurrent, Year One Labs, and several other early-stage companies. He works with startups on business acceleration and advises a number of larger companies on innovation and technology. A sought-after public speaker on data-driven innovation and the impact of technology on society, Alistair has founded and run a variety of conferences, including Cloud Connect, Bitnorth, and the International Startup Festival, and is the chair of O’Reilly’s Strata Data... Read More.

Michael Crutcher
Michael Crutcher (Cloudera)

Michael Crutcher is the director of product management at Cloudera, where he is responsible for the direction of Cloudera’s storage products, which include HDFS, HBase, and Parquet. He’s also responsible for managing strategic partnerships with storage vendors.

JD Cryans
JD Cryans (Cloudera)

JD Cryans is a software engineer at Cloudera and an Apache HBase PMC member.

Kristi Cunningham
Kristi Cunningham (Capital One)

Kristi Cunningham leads the Enterprise Data Management (EDM) group within the Risk Management organization at Capital One. Her responsibilities include setting policy and standards for effective data quality management across the enterprise, monitoring compliance to standards, building data management competency, and providing effective data management solutions for the organization. A primary responsibility involves being a change leader for the organization in building effective data management practices into everyone’s day-to-day responsibilities and job functions.

Nick Curcuru
Nick Curcuru (Mastercard)

Nick Curcuru is vice president of enterprise information management at Mastercard, where he’s responsible for leading a team that works with organizations to generate revenue through smart data, architect next-generation technology platforms, and protect data assets from cyberattacks by leveraging Mastercard’s information technology and information security resources and creating peer-to-peer collaboration with their clients. Nick brings over 20 years of global experience successfully delivering large-scale advanced analytics initiatives for such companies as the Walt Disney Company, Capital One, Home Depot, Burlington Northern Railroad, Merrill Lynch, Nordea Bank, and GE. He frequently speaks on big data trends and data security strategy... Read More.

Doug Cutting
Doug Cutting (Cloudera), @cutting
Closing remarks Keynote

Doug Cutting is the chief architect at Cloudera and the founder of numerous successful open source projects, including Lucene, Nutch, Avro, and Hadoop. Doug joined Cloudera from Yahoo, where he was a key member of the team that built and deployed a production Hadoop storage-and-analysis cluster for mission-critical business analytics. Doug holds a bachelor’s degree from Stanford University and sits on the board of the Apache Software Foundation.

Timothy Danford
Timothy Danford (Tamr, Inc.)

Timothy Danford is a computer scientist working on advanced automation approaches to big data variety in the pharmaceutical and healthcare industries. Previously, Timothy was a software architect, engineer, and founding team member for Genome Bridge LLC, a Broad Institute subsidiary organized to develop cloud-based SaaS genomic analysis pipelines. He has experience in developing data-management services, applications, and ontologies for bioinformatics and genomics systems at Novartis and Massachusetts General Hospital. As a PhD student in computer science at MIT CSAIL, he focused on computational functional genomics. He is a contributor to ADAM, an open source project... Read More.

Tathagata Das
Tathagata Das (Databricks)

Tathagata Das is an Apache Spark committer and a member of the PMC. He is the lead developer behind Spark Streaming, which he started while a PhD student in the UC Berkeley AMPLab, and is currently employed at Databricks. Prior to Databricks, Tathagata worked at the AMPLab, conducting research about data-center frameworks and networks with Scott Shenker and Ion Stoica.

Michael Dauber
Michael Dauber (Amplify Partners), @dauber

Michael Dauber is a general partner at Amplify Partners. Previously, Mike spent over six years at Battery Ventures, where he led early-stage enterprise investments on the West Coast, including Battery’s investment in a stealth security company that is also in Amplify’s portfolio. Mike has served on the boards of a number of companies, including Continuuity, Duetto, Interana, and Platfora. Mike’s investments include Splunk and RelateIQ, which was recently acquired by Salesforce. Mike began his career as a hardware engineer at a startup and held product, business development, and sales roles at Altera and Xilinx. Mike is a frequent speaker at... Read More.

Margaret Dawson

A 20-year tech industry veteran, Margaret leads global product marketing for the Integrated Solutions business unit at Red Hat. She is a frequent author and speaker on cloud computing, big data, open source, women in tech, and the intersection of business and technology. Margaret is a proven entrepreneur and intrapreneur, having led successful programs and teams at several startups, such as Aventail and Hubspan, and Fortune 500 companies, including Amazon, Microsoft, and HP. Prior to Red Hat, she was VP of Product Marketing and Cloud Evangelist for HP Helion, the cloud computing division of Hewlett-Packard. Her passions include agile marketing,... Read More.

Vincent Dell'Anno
Vincent Dell'Anno (Accenture)

Based in Denver, Vincent Dell’Anno is managing director, Information Management-Data Supply Chain, Accenture Analytics, now a part of Accenture Digital. He also serves as a member of the Accenture Analytics global leadership team. As Accenture’s Data Supply Chain lead, Vincent manages a global team of technologists and data scientists that leverage new and emerging technologies to help clients manage large volumes of data to drive high performing analytic-driven outcomes, cost effectively, at scale. Vince has a BA in economics from Dickinson College and an MBA from the George Washington University School of Business.

Michael DePrizio (Akamai Technologies)

Senior Architect at Akamai Technologies

Matthew Derda
Matthew Derda (Pepsi)

Matt Derda is a CPFR analyst with PepsiCo Customer Supply Chain. CPFR, which stands for collaborative planning, forecasting, and replenishment, is a new program in PepsiCo Customer Supply Chain, and Matt has had the opportunity to be a part of the piloting group. Through CPFR, Matt and his team have delivered improved forecast accuracy and fill rates by expanding collaboration with customers and leveraging shared data to provide best-in-class service. Matt’s team has built multiple “CPFR Tools” that use large datasets to drive the program forward.

Adam Devine
Adam Devine (WorkFusion), @workfusion

Adam Devine leads product marketing for WorkFusion, a SaaS platform for collecting, cleansing, and controlling data. Adam has 15 years of experience growing businesses through product marketing, including product positioning, market intelligence, messaging, and content creation. He began his career in management consulting at BearingPoint’s Banking & Capital Markets practice. Adam speaks frequently about human-machine collaboration, machine learning, and automation at conferences, including FIMA, FISD, Massolution, MarketTech, NAFIS, NFAIS, and SIIA.

Vasant Dhar

Vasant Dhar is professor, Stern School of Business and Center for Data Science at New York University, and founder of SCT Capital Management. He created the Adaptive Quant Trading (AQT) program, a data-driven learning machine that trades the world’s most liquid futures contracts systematically. Dhar has written over 100 research articles and dozens of opinion editorials in media including the Financial Times, Wall Street Journal, Forbes, and Wired Magazine. He is editor-in-chief of the Big Data journal.

Robby Dick
Robby Dick (BMC Software), @robbydbmc

Robby Dick has been working with the workload automation discipline in various capacities since 1994.

Anthony Dina
Anthony Dina (Dell EMC)

Anthony Dina serves as the director of enterprise technologists at Dell, Inc., where he leads a team of solutions architects with expertise in big data and application acceleration to work with customers on how to transform IT into better business outcomes. Anthony has 17 years in the IT industry and has held a number of executive director of strategy and director of solutions marketing titles. Some of his successes include ramping the blades business to number one, launching the first Opteron server, and championing virtual IO solutions, all within 10 years. Anthony holds a master of business administration from the... Read More.

Sheetal Dolas
Sheetal Dolas (Hortonworks)

Sheetal Dolas is a principal architect working with Hortonworks with strong expertise in the Hadoop ecosystem and rich field experience. He helps small to large enterprises solve their business problems strategically and functionally as well as at scale by using big data technologies. Sheetal has over 14 years of strong IT experience and has served in key positions as lead big data architect, SOA architect, and technology architect in multiple large and complex enterprise programs. He has extensive knowledge of big data/NoSQL technologies including Hadoop, Hive, Pig, HBase, Storm, and Kafka, and has been working in this space for... Read More.

Mark Donsky
Mark Donsky (Okera)

Mark Donsky leads product management at Okera, a software provider offering discovery, access control, and governance at scale for today’s modern heterogeneous data environments. Previously, Mark led data management and governance solutions at Cloudera, and he’s held product management roles at companies such as Wily Technology, where he managed the flagship application performance management solution, and Silver Spring Networks, where he managed big data analytics solutions that reduced greenhouse gas emissions by millions of dollars annually. He holds a BS with honors in computer science from Western University, Ontario, Canada.

Allen Downey
Allen Downey (Olin College of Engineering), @allendowney

Allen Downey is a professor at Olin College and the author of Think Python, Think Stats, Think Bayes, and more. He writes about statistics in his blog Probably Overthinking It.

Michael Droettboom
Michael Droettboom (Space Telescope Science Institute)
PyData at Strata Tutorial

Michael Droettboom is a main contributor to matplotlib, the premier plotting library in the scientific Python ecosystem. He is the creator of “airspeed velocity” for benchmarking Python projects over time, the author of Understanding JSON Schema, and a primary contributor to astropy.

Chris DuBois

Chris DuBois is a data scientist focused on building tools for other data scientists. At Dato, Chris has helped design and implement tools for creating recommendation systems and for large-scale text analysis. His current work makes it simpler to train models that generalize well. After studying applied mathematics at Pomona College, he earned a PhD in statistics from the University of California, Irvine, where he researched latent variable models for social-network data occurring over time.

Vladimir Dubovskiy
Vladimir Dubovskiy (

Vlad is a Chief Data Scientist at Aside from working with “datasets that change mindsets”, Vlad likes good design, nature and backpacking. Previously, he was a co-founder at The Unreasonable Institute and Startup Festival, India. He’s currently learning construction by building a DIY tiny house on wheels.

Ted Dunning

Ted Dunning is the chief technology officer at MapR. He’s also a board member for the Apache Software Foundation; a PMC member and committer of the Apache Mahout, Apache Zookeeper, and Apache Drill projects; and a mentor for various incubator projects. Ted has years of experience with machine learning and other big data solutions across a range of sectors. He’s contributed to clustering, classification, and matrix decomposition algorithms in Mahout and to the new Mahout Math library and designed the t-digest algorithm used in several open source projects and by a variety of companies. Previously, Ted was chief architect... Read More.

Gary Dusbabek
Gary Dusbabek (Silicon Valley Data Science)

An Apache Cassandra committer and PMC member, Gary Dusbabek specializes in building distributed systems. His recent experience includes creating an open source high-volume metrics processing pipeline and building out several geographically distributed API services in the cloud.

Jana Eggers
Jana Eggers (Nara Logics), @jeggers

Jana Eggers is CEO of Nara Logics, a neuroscience-inspired artificial intelligence company providing a platform for recommendations and decision support. A math and computer nerd who took the business path, Jana has had a career that’s taken her from a three-person business to fifty-thousand-plus-person enterprises. She opened the European logistics software offices as part of American Airlines, dove into the internet in ’96 at Lycos, founded Intuit’s corporate Innovation Lab, helped define mass customization at Spreadshirt, and researched conducting polymers at Los Alamos National Laboratory. Her passions are working with teams to define and deliver products customers love, algorithms... Read More.

Mike Emerick

Mike Emerick is currently acting as the healthcare industry architect for MapR Technologies. Prior to this, Mike was the co-founder of the IBM Healthcare Transformation Lab, where he focused on the buildout of large healthcare infrastructures and healthcare data analytics. His work included federal level health information exchanges for Australia, China, and Canada, and regional health information exchanges throughout the U.S. Mike worked on building out business and technical models for health benefit exchanges and accountable care organizations. He also worked on early solutions for genomic optimized care for patients with HIV/AIDS using HPC architectures. Mike’s... Read More.

Tim Estes
Tim Estes (Digital Reasoning)

Tim Estes is the president and founder of Digital Reasoning, a leader in trusted cognitive computing. Driven by the belief that all software can learn and that all people should have access to it, Tim and his team work closely with leaders in government and industry to solve extraordinarily valuable and morally compelling problems in national security, finance, healthcare, and other markets by automating the understanding of human communication.

Robert Eve

Bob leads Technical Marketing for Cisco’s Data Virtualization (formerly Composite Software) and Analytics Business Units. In this role Bob guides thought leadership, analyst relations and new market penetration efforts.
Bob was the EVP of Marketing at Composite Software for seven years prior to its acquisition by Cisco in 2013. At Composite, Bob established data virtualization as a category and Composite as the market leader including co-authoring the first book on Data Virtualization, Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility.
Bob has driven multiple market transitions including creation of the Data Virtualization, and... Read More.

Hossein Falaki
Hossein Falaki (Databricks Inc.)

Hossein Falaki is a software engineer at Databricks working on the next big thing. Prior to that he was a data scientist at Apple’s personal assistant, Siri. He graduated with a Ph.D. in Computer Science from UCLA, where he was a member of the Center for Embedded Networked Sensing (CENS).

Vivek Farias
Vivek Farias (Celect), @celect

Vivek Farias is chief technology officer and co-founder of Celect. He is the Robert N. Noyce Professor of Management at MIT’s Sloan School. His research has led to numerous innovations in operations, supply-chain, and yield management. Prior to academia he worked in algorithmic finance. He received his PhD in electronic engineering at Stanford.

Bob Filbin
Bob Filbin (Crisis Text Line), @bobfilbin

Bob Filbin is chief data scientist at Crisis Text Line, the first large-scale 24/7 national crisis line for teens on the medium they use and trust most: texting. Bob specializes in the application of behavioral psychology to questions of data collection, analysis, and reporting, to make sure data leads to good behavioral change. Bob has given lectures on using data to drive behavioral change at places including MIT, the University of Pennsylvania, and the North American International Auto Show, and has authored several articles in the Harvard Business Review on data. He runs in Prospect Park.

Andrew First
Andrew First (Leanplum)

Andrew is the CTO and Co-founder of Leanplum, based in San Francisco. Leanplum is solving personalization on mobile by empowering companies to engage with their users via targeted messages and user experience optimization. Before Leanplum, Andrew was a Software Engineer at Google, working on optimizing video ad revenue. He graduated from Duke University with a BS in Electrical and Computer Engineering and Computer Science.

Brian Fitzpatrick
Debugging teams Cultivate

Brian started Google’s Chicago engineering office in 2005 and led several of Google’s global engineering efforts, including the Data Liberation Front, and Transparency Engineering. He also served as internal advisor for Google’s open data efforts, having previously led the Google Code and Google Affiliate Network teams. Prior to joining Google, Brian worked as an engineer at CollabNet, Apple, and a local Chicago development shop.

Brian first started contributing to open source software in 1998 and was a core Subversion developer from 2000 to 2005 as well as the lead developer of the cvs2svn utility. He is a member of the... Read More.

Camille Fournier
Camille Fournier (Independent), @skamille

Camille Fournier is the former head of engineering at Rent the Runway. She was previously a vice president at Goldman Sachs. Camille is an Apache ZooKeeper committer and PMC member and a Dropwizard framework PMC member.

Martin Fowler
Martin Fowler (ThoughtWorks), @martinfowler

Martin Fowler is an author, speaker, consultant, and self-described loud-mouthed pundit on the topic of software development. He works for ThoughtWorks, a software delivery company, where he has the exceedingly inappropriate title of chief scientist. Martin has written half-a-dozen books on software development, including Refactoring and Patterns of Enterprise Application Architecture. He writes regularly about software development on Martin’s main interest is to understand how to design software systems to maximize the productivity of development teams, which includes both the patterns of good software design and the processes that support software design. He has become a big fan... Read More.

Mimi Fox Melton

The proud offspring of Haitian immigrants and Kentucky farmers, Mimi’s work for social, racial, and economic justice meets at the intersection of immigrants, women of color, & low-income communities. In her role at CODE2040, she oversees student-facing programming including the annual Fellows Program and the Technical Applicant Prep suite of programs and tools. Before CODE2040, Mimi was the Executive Director at Code for Progress, a non-profit coding bootcamp that pays adults of color to learn to code and helps them start careers in tech.

Mimi grew up in New York, and is a recent transplant to the Bay Area by... Read More.

Bill Franks
Bill Franks (Teradata Corporation)

Bill Franks is chief analytics officer for Teradata, providing insight on trends in the analytics and big data space, and helping clients understand how Teradata and its analytic partners can support their efforts. In addition, Bill is a faculty member of the International Institute for Analytics and the author of the book Taming the Big Data Tidal Wave (John Wiley & Sons, Inc., April 2012).

He is also an active speaker and blogger. Bill’s focus has always been to help translate complex analytics into terms that business users can understand, and to then help an organization implement the results effectively... Read More.

Michael Freeman
Michael Freeman (University of Washington), @mf_viz

Michael Freeman is a senior lecturer at the Information School at the University of Washington, where he teaches courses on data science, data visualization, and web development. With a background in public health, Michael works alongside research teams to design and build interactive data visualizations to explore and communicate complex relationships in large datasets. Previously, he was a data visualization specialist and research fellow at the Institute for Health Metrics and Evaluation, where he performed quantitative global health research and built a variety of interactive visualization systems to help researchers and the public explore global health trends. Michael is interested... Read More.

Chris Fregly
Chris Fregly (PipelineAI), @cfregly

Chris Fregly is an AWS Technical Evangelist for Machine Learning and AI based in San Francisco. He is founder of the Advanced KubeFlow Meetup and author of the O’Reilly Video Series titled, “High Performance TensorFlow in Production.” Previously, Chris was Founder and Product Manager at PipelineAI where he worked with many small startups and large enterprises to optimize and tune their ML/AI pipelines.

Calvin French-Owen
Calvin French-Owen (Segment)

Co-Founder at Segment

Eric Frenkiel

Eric Frenkiel is the cofounder and CEO of MemSQL, an in-memory distributed database that combines real-time and historical big data analytics. MemSQL is a Y Combinator company that has raised more than $45M in venture capital. Prior to MemSQL, Eric worked at Facebook on partnership development. He has worked in various engineering and sales engineering capacities at both consumer and enterprise startups. Eric is a graduate of Stanford University’s School of Engineering. In 2011 and 2012, Eric was named to Forbes’s 30 under 30 list of technology innovators.

Venky Ganti
Venky Ganti (Alation)

Venky Ganti has been a data enthusiast since graduate school, and has enjoyed working at various levels of the data analysis stack. At Google, he was an avid data consumer who helped engineer innovative data products that now generate over one billion dollars in yearly revenue. At Microsoft, he worked on advanced data quality infrastructure in ETL platforms. Venky started out working on advanced data analysis and mining technology during his PhD at the University of Wisconsin-Madison. Venky thoroughly enjoys spending time with his family, going on walks, and roller-blading, when he feels adventurous.

Yael Garten
Yael Garten (LinkedIn)
Data 101 Tutorial

Yael Garten is director of data science at LinkedIn, where she leads a team that focuses on understanding and increasing growth and engagement of LinkedIn’s 400 million members across mobile and desktop consumer products. Yael is an expert at converting data into actionable product and business insights that impact strategy. Her team partners with product, engineering, design, and marketing to optimize the LinkedIn user experience, creating powerful data-driven products to help LinkedIn’s members be productive and successful. Yael champions data quality at LinkedIn; she has devised organizational best practices for data quality and developed internal data tools to democratize data... Read More.

Alan Gates
Alan Gates (Hortonworks)

Alan Gates is a co-founder at Hortonworks, and an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project. Alan also designed HCatalog and guided its adoption as an Apache Incubator project. Alan has a BS in mathematics from Oregon State University and an MA in theology from Fuller Theological Seminary. He is also the author of Programming Pig from O’Reilly Press.

Matthew Gee
Matthew Gee (Impact Lab/University of Chicago)
Data 101 Tutorial

Matthew Gee is cofounder and principal at the Impact Lab, a data-analytics company focused exclusively on developing scalable data science solutions to social-sector problems. He is also a senior research scientist at the University of Chicago’s Center for Data Science and Public Policy and a research fellow at the Urban Center for Computation and Data. Matt is the cofounder of the Eric and Wendy Schmidt Data Science for Social Good fellowship, which in its first three years has paired 126 fellows with over 40 national, state, and local government organizations and NGOs to build data-driven solutions to social problems.

Matt’s... Read More.

Ari Gesher
Ari Gesher (Kairos Aerospace), @alephbass

Ari Gesher is the founding director of software engineering at Kairos Aerospace, a startup building and operating the next-generation of airborne and spaceborne sensors for monitoring oil and gas infrastructure. Ari also serves as consulting architect for Jupiter, a company productizing high-quality datasets that describe the long-term effects of climate change. Previously, he was a very early engineer at Palantir Technologies and later served as Palantir’s engineering ambassador to the tech community at large; before Palantir, he was the maintainer of the open source archive. Ari is the coauthor of The Architecture of Privacy, which explains... Read More.

Charles Givre
Charles Givre (Deutsche Bank), @cgivre

Charles Givre is an unapologetic data geek who is passionate about helping others learn about data science and become passionate about it themselves. For the last five years, Charles has worked as a data scientist at Booz Allen Hamilton for various government clients and has done some really neat data science work along the way, hopefully saving US taxpayers some money. Most of his work has been in developing meaningful metrics to assess how well the workforce is performing. For the last two years, Charles has been part of the management team for one of Booz Allen Hamilton’s largest... Read More.

Michele Goetz (Forrester Research), @Mgoetz_FORR

Michele Goetz is principal analyst at Forrester Research, serving enterprise and data architecture professionals. She is a leading expert on data management, artificial intelligence, data governance, master data management, and data quality. Michele helps enterprises leverage data assets more effectively by improving the availability and accuracy of the information that businesses use in processes and analytics.

Prior to joining Forrester, Michele managed the business intelligence and data management programs at PTC. During her tenure, she developed and led the global consolidation of customer data across multiple customer relationship management (CRM) platforms to support a single view of the... Read More.

Brett Goldstein
Brett Goldstein (University of Chicago), @bjgol

Brett Goldstein is a leader in enterprise architecture, big data analytics, and government technology with 15 years of experience in operations, management, and leadership in technical environments in both the public and private sector. Brett was recently named the inaugural recipient of the Fellowship in Urban Science at the University of Chicago’s Harris School of Public Policy. As a senior fellow in urban science, he will focus on issues of computation and public policy to inform better decision making in government. Previously, Brett was the commissioner and chief information officer of the Chicago Department of Innovation and Technology (DoIT), appointed... Read More.

Micha Gorelick
Micha Gorelick (Fast Forward Labs), @mynameisfiber

Micha Gorelick was the first man on Mars in 2023 and won the Nobel Prize in 2046 for his contributions to time travel. He then went back to the 2000s to study astronomy, teach scientific computing, and work on data at bitly. After writing a book on High Performance Python, he helped start Fast Forward Labs as a resident mad scientist. There he worked on many issues, from machine learning to performant stream algorithms. A monument celebrating his life can be found in Central Park, 1857.

Alex Gorelik
Alex Gorelik (Waterline Data), @gorelikalex

Alex Gorelik is the founder and CEO of Waterline Data, a startup focused on enhancing the value of Hadoop through data self-service and governance. Alex is a serial entrepreneur and innovator who has spent over 25 years inventing and bringing to market cutting-edge data-oriented technology.

Prior to Waterline, Alex was an EIR at Menlo Ventures. He joined Menlo from Informatica, where he held several executive roles, including GM of Informatica’s Data Quality Business Unit—driving marketing, product management, and R&D for an $80M business—and SVP of R&D for Core Technology—driving innovation in big data and social media while... Read More.

Daniel Goroff
Daniel Goroff (Alfred P. Sloan Foundation), @DGoroff

Daniel L. Goroff is vice president and program director at the Alfred P. Sloan Foundation, a grant-making philanthropy that supports breakthroughs in science, technology, and economics. He is professor emeritus of mathematics and economics at Claremont’s Harvey Mudd College, where he previously served as vice president for academic affairs and dean of the faculty.

Goroff earned his B.A.-M.A. degree in mathematics Summa Cum Laude at Harvard as a Borden Scholar, an M.Phil. in economics at Cambridge University as a Churchill Scholar, a masters in mathematical finance at Boston University, and a Ph.D. in mathematics at Princeton University as a Danforth... Read More.

Matthew Granade
Matthew Granade (Domino Data Lab), @MatthewGranade

Matthew Granade is a cofounder of Domino Data Lab, which makes a workbench for data scientists to run, scale, share, and deploy analytical models, where he works with companies such as Quantopian, Premise, and Orbital Insights. He also invests in, advises, and serves on the boards of startups in data, data analysis, finance, and financial tech. Previously, Matthew was co-head of research at Bridgewater Associates, where he built and managed teams that ensured Bridgewater’s understanding of the global economy, created new systems for generating alpha, produced daily trading signals, and published Bridgewater’s market commentary, and an engagement manager at McKinsey... Read More.

Jonathan Gray
Jonathan Gray (Cask)

Jonathan Gray is the founder and CEO of Cask. Jonathan is an entrepreneur and software engineer with a background in startups, open source, and all things data. Previously, he was a software engineer at Facebook, where he helped drive HBase engineering efforts, including Facebook Messages and several other large-scale projects, from inception to production. An open source evangelist, Jonathan was responsible for helping build the Facebook engineering brand through developer outreach and refocusing the open source strategy of the company. Prior to Facebook, Jonathan founded, where he became an early adopter of Hadoop and HBase. He is now... Read More.

Garrett Grolemund
Garrett Grolemund (RStudio)
R Day Tutorial

Garrett Grolemund is the editor-in-chief of, the development center for the Shiny R package, and is the author of Hands-On Programming with R as well as Data Science with R, a forthcoming book by O’Reilly Media. Garrett works as a data scientist and chief instructor for RStudio, Inc.

Robert Grossman
Robert Grossman (University of Chicago)

Robert Grossman is a faculty member and the chief research informatics officer in the Biological Sciences Division of the University of Chicago. Robert is the director of the Center for Data Intensive Science (CDIS) and a senior fellow at both the Computation Institute (CI) and the Institute for Genomics and Systems Biology (IGSB). He is also the founder and a partner of the Open Data Group, which specializes in building predictive models over big data. Robert has led the development of open source software tools for analyzing big data (Augustus), distributed computing (Read More.

Jason Grout
Jason Grout (Bloomberg LP)
PyData at Strata Tutorial

Jason Grout is a Jupyter developer at Bloomberg, working primarily on JupyterLab and the interactive Jupyter widgets library. He has also been a major contributor to the open source Sage mathematical software system and co-organizes the PyDataNYC Meetup. Previously, Jason was an assistant professor of mathematics at Drake University in Des Moines, Iowa. He holds a PhD in mathematics from Brigham Young University.

Mark Grover

Mark Grover is a product manager at Lyft. Mark’s a committer on Apache Bigtop, a committer and PPMC member on Apache Spot (incubating), and a committer and PMC member on Apache Sentry. He’s also contributed to a number of open source projects, including Apache Hadoop, Apache Hive, Apache Sqoop, and Apache Flume. He’s a coauthor of Hadoop Application Architectures and wrote a section in Programming Hive. Mark is a sought-after speaker on topics related to big data. He occasionally blogs on topics related to technology.

Peter Guerra
Peter Guerra (Booz Allen Hamilton)

Peter Guerra is Chief Data Scientist and Vice President leading Booz Allen Hamilton’s Data Science commercial team. He has 15 years of experience in creating big data and data science solutions for government and commercial clients. He was responsible for the architecture and implementation of one of the world’s largest Hadoop clusters for the Federal Government. He has consulted with Fortune 500 companies and federal government organizations throughout his career. Recently he has focused on data governance and security of large data systems, working on a book for O’Reilly titled “Data Security for Modern Enterprises”. He is a frequent speaker... Read More.

Carlos Guestrin
Carlos Guestrin (Apple | University of Washington)

Carlos Guestrin is the director of machine learning at Apple and the Amazon Professor of Machine Learning in Computer Science and Engineering at the University of Washington. Carlos was the cofounder and CEO of Turi (formerly Dato and GraphLab), a machine-learning company acquired by Apple. A world-recognized leader in the field of machine learning, Carlos was named one of the 2008 Brilliant 10 by Popular Science. He received the 2009 IJCAI Computers and Thought Award for his contributions to artificial intelligence and a Presidential Early Career Award for Scientists and Engineers (PECASE).

Ankur Gupta
Ankur Gupta (Bitwise Inc.), @unamigo

Meet us at Booth #105 and check out our Open Source ETL on Hadoop Utility, developed in partnership with Capital One.

Ali Habib (Northwestern Feinberg School of Medicine)

Ali Habib is a medical student at Northwestern University’s Feinberg School of Medicine. In addition to beginning a radiology residency, Ali is interested in data science applied toward analysis in financial markets.

Jon Haddad
Jon Haddad (The Last Pickle), @rustyrazorblade

Jon Haddad has almost 20 years of professional experience with open source software and is currently the Principal Consultant at The Last Pickle. In the open source world, he’s a committer and PMC member for Apache Cassandra. Prior to The Last Pickle, Jon was a technical evangelist at DataStax. He has worked on dozens of Cassandra clusters across a wide variety of hardware, both on-prem and in the cloud.

Alan Hannaway
Alan Hannaway (7digital)

Alan Hannaway is the product owner for data at 7digital, where he is responsible for ensuring the company is developing and extracting value from its line of data products. Prior to 7digital, Alan worked in a variety of roles, most recently providing data to the entertainment industry through his own startup. Alan started his career working as a researcher in computer science, focusing his interests on the application of technology to measure the scale and distribution of content consumption on large Internet networks.

Ian Hansen (Digital Ocean)

Ian Hansen is a software engineer at DigitalOcean.

Ben Harden
Ben Harden (CapTech Consulting), @benjaminhharden

Ben Harden leads the Big Data Practice at CapTech, and has over 17 years of enterprise software development experience in the areas of data warehousing, metadata management, data governance, business intelligence, and enterprise scale Hadoop data ingestion and refinement. He is a certified IBM Cognos Specialist, Business Objects Report Designer, Certified Scrum Master, Scaled Agilist, and Project Management Professional.

Rob Harper
Rob Harper (Uncharted), @rdharper

Rob Harper is a partner and lead product architect at Uncharted and has been building technical platforms and products in the visualization industry for a decade. In recent years, Rob has focused on developing web-based HTML5 technology approaches for big data.

Michael Hausenblas

Michael Hausenblas is a developer advocate at AWS, part of the container service team, focusing on container security. Michael shares his experience around cloud native infrastructure and apps through demos, blog posts, books, and public speaking engagements and contributes to open source software. Previously, he was at Red Hat, Mesosphere, MapR, and two research institutions in Ireland and Austria.

Jeffrey Heer
Jeffrey Heer (Trifacta | University of Washington), @jeffrey_heer

Jeffrey Heer is Trifacta’s chief experience officer and cofounder as well as a professor of computer science at the University of Washington, where he directs the Interactive Data Lab. Jeff’s passion is the design of novel user interfaces for exploring, managing, and communicating data. The data visualization tools developed by his lab (D3.js, Protovis, Prefuse) are used by thousands of data enthusiasts around the world. In 2009, Jeff was named to MIT Technology Review’s list of “top innovators under 35.”

Joe Hellerstein

Joseph M. Hellerstein is the Jim Gray Chair of Computer Science at UC Berkeley and cofounder and CSO at Trifacta. Joe’s work focuses on data-centric systems and the way they drive computing. He is an ACM fellow, an Alfred P. Sloan fellow, and the recipient of three ACM-SIGMOD Test of Time awards for his research. He has been listed by Fortune among the 50 smartest people in technology, and MIT Technology Review included his work on their TR10 list of the 10 technologies most likely to change our world.

Sam Heywood
Sam Heywood (Cloudera)

Sam Heywood is responsible for driving Cloudera’s portfolio of security technologies. He is a seasoned product and marketing executive with leadership experience at several notable technology startups and is well versed in systems management, online CRM platforms, consumer eCommerce, and security technologies. Prior to Cloudera, Sam was VP products and marketing for Gazzang, leading global product innovation and delivery, corporate marketing, and demand generation programs. Sam was senior director of products at uShip, driving the company’s expansion into multiple product lines spanning the consumer retail and commercial freight markets. Sam also held product and marketing management roles at Convio,... Read More.

Andrew Hill

Andrew Hill is cofounder and CEO of Textile, where he is building technology to help data scientists create the future of predictive models from personal location and behavior data. Textile provides an SDK to access more than 200 features extracted in real time and designed for machine learning. Previously, Andrew was chief science officer at CARTO. He holds a PhD from the University of Colorado, Boulder.

Eva Ho
Eva Ho (Susa Ventures), @eva_ho

Eva Ho is a General Partner at Susa Ventures, an early stage technology fund investing in companies that leverage the power of data to create market-leading platforms, tools, and analytics with inherent network effects. Eva is a serial entrepreneur and founder, most recently a founding executive at Factual, a leading location data provider in Los Angeles. Before that, she was a Senior Product Marketing Manager at Google and YouTube for 5 years. Prior to Google, she was the head of marketing for Applied Semantics, a company that sold to Google in 2003. She also co-founded Navigating Cancer, a health startup, in... Read More.

Jeff Holoman

Jeff Holoman is a systems engineer at Cloudera. Jeff is a Kafka contributor and has focused on helping customers with large-scale Hadoop deployments, primarily in financial services. Prior to his time at Cloudera, Jeff worked as an application developer, system administrator, and Oracle technology specialist.

Juliet Hougland
Juliet Hougland (Cloudera)

Juliet Hougland is a data scientist at Cloudera and contributor/committer/maintainer for the Sparkling Pandas project. Her commercial applications of data science include developing predictive maintenance models for oil and gas pipelines at Deep Signal and designing and building a platform for real-time model application, data storage, and model building at WibiData. Juliet was the technical editor for Learning Spark by Karau et al. and Advanced Analytics with Spark by Ryza et al. She holds an MS in applied mathematics from the University of Colorado, Boulder and graduated Phi Beta Kappa from Reed College with a BA in math-physics.

Tim Howes
Tim Howes (ClearStory Data), @howes28

Dr. Timothy Howes, co-inventor of LDAP and holder of numerous patents, leads innovation on ClearStory’s Spark-based data analysis platform. A respected entrepreneur and computer scientist, he was a co-founder of Loudcloud/Opsware and Rockmelt, and previously served as VP of engineering at Yahoo and CTO of HP Software and Netscape’s Server Products Division. He holds a bachelor of science degree in aerospace engineering, a master of science in computer science and engineering, and a Ph.D. in computer science, all from the University of Michigan.

Jonathan Hsieh
Jonathan Hsieh (Cloudera, Inc), @jmhsieh

Jonathan Hsieh is a software engineer at Cloudera. He is an Apache HBase committer and the founder of Apache Flume.

Juan Huerta
Juan Huerta (Dow Jones)

Juan M. Huerta is the Head of Data Science at Dow Jones, where he and his team focus on bringing the most innovative data and algorithmic approaches to the analysis of Dow Jones news and information, as well as toward the transformation of our business. Prior to Dow Jones, Juan’s work focused on developing algorithms to decode, understand, and extract information from location data, financial and banking data, as well as natural language, dialog, and speech signals. Working in premier R&D organizations like the IBM Research Division, Carnegie Mellon University, and Dragon Systems, as well as leading financial... Read More.

Ignacio Hwang

Ignacio Hwang is the senior product manager responsible for Hadoop initiatives at the Hewlett Packard Enterprise Big Data Software division, with over 15 years of IT infrastructure experience finding innovative solutions for real enterprise applications. His professional background covers storage, cloud, virtualization, and Hadoop technologies, giving him deep insight into what is required to build robust products that help drive today’s high-performance analytics operations. He received his bachelor’s degree at Tufts University and his M.B.A. at Boston College.

Bar Ifrach
Bar Ifrach (Airbnb), @bifrach

Bar Ifrach completed his BA in economics at Tel Aviv University in 2007 and continued on to graduate school at Columbia Business School in New York. He completed his PhD in operations research and economics in 2012. His academic research focused on learning and pricing in online marketplaces and game theory. Following that, Bar completed a postdoc at Stanford, where he researched visibility and ranking for mobile applications. He joined Airbnb as a data scientist on the search team in September 2013 and is currently leading a team of data scientists on the conversion team.

Ihab Ilyas
Ihab Ilyas (University of Waterloo), @ihabilyas

Ihab Ilyas is a professor in the Cheriton School of Computer Science at the University of Waterloo, where his research focuses on the areas of big data and database systems, with special interest in data quality and integration, managing uncertain data, rank-aware query processing, and information extraction. Ihab is also a cofounder of Tamr, a startup focusing on large-scale data integration and cleaning. He’s a recipient of the Ontario Early Researcher Award (2009), a Cheriton faculty fellowship (2013), an NSERC Discovery Accelerator Award (2014), and a Google Faculty Award (2014), and he’s an ACM Distinguished Scientist. Ihab is... Read More.

Michał Iwanowski
Michał Iwanowski (

Michał Iwanowski holds the position of product director at He graduated from the Warsaw University of Technology, specializing in software engineering and machine learning. He gained experience at IBM while working with big data exploration, predictive analytics, and data warehouses. At IBM he’s been developing an analytical toolkit for machine learning and data mining, while authoring a number of invention disclosures and a patent claim. He has collaborated with medical researchers, performed statistical analyses of medical research results, and created systems for computer-aided experiment design.

Anand Iyer
Anand Iyer (Cloudera)

Anand Iyer is a senior product manager at Cloudera, the leading vendor of open source Apache Hadoop. His primary areas of focus are platforms for real-time streaming, Apache Spark, and tools for data ingestion into the Hadoop platform. Before joining Cloudera, Anand worked as an engineer at LinkedIn, where he applied machine-learning techniques to improve the relevance and personalization of LinkedIn’s Feed. Anand has extensive experience leveraging big data platforms to deliver products that delight customers. He holds a master’s in computer science from Stanford and a bachelor’s from the University of Arizona.

Jeff Jarrell
Jeff Jarrell (American Airlines), @Magicamel424

Jeff Jarrell is a data architect at American Airlines on both the Big Data and the Web Analytics teams. He’s been through all the battles with the team in getting Hadoop into production and is now working with the various business groups to gain insights from their big data system.

Stefanie Jegelka
Stefanie Jegelka (M.I.T.)

Stefanie Jegelka is the X-Consortium career development assistant professor at the Department of Electrical Engineering and Computer Science at MIT, and a member of CSAIL and the Institute for Data, Systems and Society. Before joining MIT in Spring 2015, she was a postdoctoral scholar in the AMPLab at UC Berkeley, working with Michael Jordan and Trevor Darrell. She earned her PhD from ETH Zurich in collaboration with the Max Planck Institutes in Tuebingen, Germany, and a Diplom from the University of Tuebingen. She has been a fellow of the German National Academic Foundation, and has received... Read More.

Rahel Jhirad

Rahel holds a PhD in Economics from Princeton, and MS in Mathematics from NYU. She is Director of Data Science at Hearst. She is passionate about big data and cross-disciplinary literacy. She has consulted with Fortune 100 companies leveraging machine learning, domain expertise, modeling, time series and econometrics tools to solve and address business challenges. She has worked in the space of Big Data since 2010 and worked with data and analytics throughout her career starting in financial investments and trading, and now working with content and digital media. She runs the popular meetup: Economics and Big Data and... Read More.

Weihua Jiang
Weihua Jiang (Intel)

Weihua Jiang is the engineering manager at Intel for big data enabling. He has worked on big data since 2011. He was the release manager for Intel’s Hadoop distribution from 2011 to 2014. Currently he is focusing on big data enabling, including optimizing the software stack for better performance and making the ecosystem enterprise ready.

Ann Johnson
Ann Johnson (Interana)

Ann Johnson is cofounder and CEO of Interana, the experts in event data analytics, where she has created a community of all-star talent working to make data-informed decisions a natural extension of everyone’s workflow. Previously, Ann served as a new product manager and integration engineer at Intel. Ann received an MS in electrical engineering from Caltech, where she was selected for the Intel Scholarship program and subsequently offered a leadership position at Intel.

Joy Johnson
Joy Johnson (AudioCommon), @joyjohnson

Joy Johnson leads mobile at music technology startup AudioCommon, a team of MIT musicians and PhD hackers revolutionizing the way music is created, organized, and shared in today’s interconnected world. Through AudioCommon’s cloud-based collaboration platform, musicians and the greater industry can collaborate in new ways during the very early stages of the creative process (capturing data that has never been captured before) and share a new type of content to engage fans with a new interactive experience, giving artists a new way to monetize and thrive in today’s industry.

Joy is a recent graduate of the Massachusetts Institute of... Read More.

Jeff Jonas

Jeff Jonas is an IBM Fellow and chief scientist of Context Computing. His work in context-aware computing was originally developed at Systems Research & Development (SRD), founded by Jonas in 1985, and acquired by IBM in January, 2005.

Prior to SRD’s acquisition, Jonas spearheaded the design and development of a number of innovative systems, including technology used by the Las Vegas gaming industry. One such innovation played a pivotal role in protecting that industry from aggressive card-counting teams. The most notable, known as the “MIT team,” was featured in the book Bringing Down the House, and... Read More.

Håkan Jonsson
Håkan Jonsson (Sony Mobile Communications), @hajons

Håkan Jonsson is a data scientist in the Lifelog Insights team at Sony Mobile. He is a PhD student at Lund University with context awareness, mobile sensing, and social computing as his subjects.

Anthony D. Joseph
Anthony D. Joseph (UC Berkeley | Databricks)

Anthony D. Joseph is a Professor in Electrical Engineering and Computer Science at UC Berkeley. He received his B.S., S.M., and Ph.D. degrees in Computer Science from MIT. He joined the UC Berkeley faculty in 1998, where he is developing adaptive techniques for cloud computing, network and computer security, and security defenses for machine learning-based decision systems. He also co-leads the DETERlab testbed, a secure scalable testbed for conducting cybersecurity research, and he is a Technical Advisor at Databricks.

Sven Junkergård
Sven Junkergård (Zephyr Health), @zephyrhealth

Sven Junkergård is the Chief Technology Officer of Zephyr Health, the Insights-as-a-Service leader harnessing the power of global healthcare data to address critical business and patient needs.

Sven specializes in identifying new technologies, partnerships and data sources that further advance Zephyr Health’s insights focused on product lifecycle success for BioPharma and Medical Device companies.

Russell Jurney
Russell Jurney (Data Syndrome), @rjurney

Russell Jurney is principal consultant at Data Syndrome, a product analytics consultancy dedicated to advancing the adoption of the development methodology Agile Data Science, as outlined in the book Agile Data Science 2.0 (O’Reilly, 2017). He has worked as a data scientist building data products for over a decade, starting in interactive web visualization and then moving towards full-stack data products, machine learning, and artificial intelligence at companies such as Ning, LinkedIn, Hortonworks, and Relato. He is a self-taught visualization software engineer, data engineer, data scientist, and writer, and most recently he’s becoming a teacher. In addition to helping companies... Read More.

Ritu Kama
Ritu Kama (Intel)

Ritu Kama is the director of product management for big data at Intel. She has over 15 years of experience in building software solutions for enterprises. She led engineering, QA, and solution delivery organizations within data center software divisions for security and identity products. She has led the product and program management responsibilities for Intel’s distribution of Hadoop and big data solutions. Prior to joining Intel, Ritu led technical and architecture teams at IBM and Ascom. She has an M.B.A. degree from the University of Chicago and a bachelor’s degree in computer science.

Reiner Kappenberger
Reiner Kappenberger (HP Security Voltage)

Reiner Kappenberger has over 20 years of computer software industry experience focusing on encryption and security for big data environments. His background ranges from device management in the telecommunications sector to GIS and database systems. He holds a diploma from the FH Regensburg, Germany in computer science.

Holden Karau
Holden Karau (Independent), @holdenkarau

Holden Karau is a transgender Canadian software engineer working in the Bay Area. Previously, she worked at IBM, Alpine, Databricks, Google (twice), Foursquare, and Amazon. Holden is the coauthor of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She’s a committer on the Apache Spark, SystemML, and Mahout projects. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work,... Read More.

Ron Kasabian
Ron Kasabian (Intel)

Ronald E. Kasabian is vice president in the Data Center Group and general manager of big data solutions at Intel Corporation. He has overall responsibility for Intel’s strategy and plans in the big data arena, spanning hardware platforms, software solutions, services, strategic business partnerships, and paths to market.

Ron joined Intel in 1984 and spent his first 14 years at the company developing and managing software solutions for various enterprise applications. For several years beginning in 1998, he led product development at Pandesic LLC, a joint venture formed by Intel and SAP to deliver e-commerce solutions.

Over the... Read More.

Jim Kaskade

Jim is the CEO of Janrain, a Digital Identity Cloud provider. We believe that your identity is the most important thing you own, and that your identity should not only be easy to use, but it should be safe to use when accessing your digital world.

Janrain is an Identity Cloud servicing Global 3000 enterprises providing a consistent, seamless, and safe experience for end-users when they access their digital applications with Registration aaS, Login aaS, Single-Sign-On aaS, and preference/consent management aaS.

Jim led DXC’s Digital Applications BU, and global Big Data & Analytics (BD&A) business unit, their fastest growing... Read More.

Adam Kelleher
Adam Kelleher (Buzzfeed), @akelleh

Adam Kelleher is a data scientist and mathemagician. He has a physics PhD from the University of North Carolina-Chapel Hill.

Kyle Kelley
Kyle Kelley (Netflix), @rgbkrk
PyData at Strata Tutorial

Kyle Kelley is a senior software engineer at Netflix, a maintainer on, and a core developer of the IPython/Jupyter project. He wants to help build great environments for collaborative analysis, development, and production workloads for everyone, from small teams to massive scale.

Alex Kelly
Alex Kelly (General Motors), @pros599

Alex Kelly is currently a software development manager for General Motors who is very passionate about Big Data, IoT, Cars, Planes, and many other things. Before General Motors, Alex worked for Microsoft as a Product Manager on the Power BI team where he became familiar with many Big Data Tools. He passionately believes technology is the gateway to changing the world, and his goal is to empower everyone with technology.

In his free time, he usually can be found acting as a consultant for startups where he focuses on UX, UI, team/culture building, and service technologies.

Jake Kendall
Jake Kendall (Bill & Melinda Gates Foundation)

Jake Kendall leads the research and innovation initiative of the Financial Services for the Poor team at the Bill & Melinda Gates Foundation. Jake’s team manages FSP’s major research grants, data collection activities, and technology innovation projects. Prior to joining the Foundation, he spent time as an economist with the Consultative Group to Assist the Poor (CGAP) housed in the World Bank. Jake holds a PhD in development economics from UC Santa Cruz and a BS in physics from MIT. Jake has also worked as a Peace Corps volunteer, a brand analyst for a major advertising firm, in... Read More.

Katie Kent
Katie Kent (Galvanize), @k80kent
Data 101 Tutorial

Katie Kent is the Product Manager for Galvanize Enterprise, the learning community for technology. In this role she builds executive and contributor training in software development, data science, and data engineering. Katie was part of the founding of data science training startup Zipfian Academy, where she was responsible for growth of the business from concept to acquisition. Previously Katie worked in venture capital, working with startups building data- and design-driven products. Katie’s academic background is in environmental social science research at the University of Michigan.

Paul Kent

Paul Kent is vice president of big data initiatives at SAS, where he divides his time between customers, partners, and the research and development teams discussing, evangelizing, and developing software at the confluence of big data and high-performance computing. Previously, Paul was vice president of the Platform R&D Division at SAS, where he led groups responsible for the SAS foundation and mid-tier technologies—teams that develop, maintain, and test Base SAS, as well as related data access, storage, management, presentation, connectivity, and middleware software products. Paul has contributed to the development of SAS software components including... Read More.

Jooseong Kim
Jooseong Kim (Pinterest)

Jooseong is a software engineer at Pinterest on the data engineering team. He has worked on various components of the offline data stack, including Pinalytics (analytics and visualization engine), the A/B experiments framework, and platforms for processing pipelines. Before joining Pinterest, Jooseong was a software engineer at Oracle on the kernel service team, where he worked on the CPU scheduler and parallel statement scheduler for data warehouses.

Phil Kim
Phil Kim (Capital One Labs)

Phil Kim leads the Data Lab, a passionate group of creative thinkers and builders who blend data science, engineering, product, and design to develop breakthrough solutions for Capital One. The Data Lab works to deliver more intuitive and intelligent experiences to help its customers succeed, as well as develop new approaches to high-value analytical problems, such as risk prediction and fraud detection.

Phil has an extensive background in designing and building technology-driven products and businesses. Most recently, he co-founded Bundle (personal finance analytics), where he was the CTO and head of product. Phil joined Capital One in November 2012... Read More.

Aaron Kimball
Aaron Kimball (Zymergen, Inc.)

Aaron Kimball is the CTO of Zymergen, Inc. Zymergen uses high-throughput techniques, combined with big data analysis, to improve genetic strains for microbial chemical production. Aaron has been working with Hadoop since 2007. In 2008 he was Cloudera’s first employee, where he wrote Apache Sqoop and MRUnit, as well as performed a lot of Hadoop training. In 2010, Aaron founded WibiData and assumed the role of chief architect. WibiData helps organizations build big data applications. Aaron holds a BS in computer science from Cornell University and an MS in computer science from the University of Washington.

Jeremy King
Jeremy King (Walmart Global eCommerce), @jeremybking

As chief technology officer and senior vice president of global e-commerce at Walmart, Jeremy King leads product, engineering, and the web ops teams charged with developing Walmart’s online business on a global scale as the company moves to the next generation of e-commerce. Jeremy received a B.S. in information technology from San Jose State University.

Martin Kleppmann
Martin Kleppmann (University of Cambridge), @martinkl

Martin Kleppmann is a researcher in distributed systems at the University of Cambridge. Previously, he cofounded and sold two startups and worked on large-scale data infrastructure at internet companies including LinkedIn. Martin is the author of Designing Data-Intensive Applications from O’Reilly.

Joe Klobusicky (Geisinger Health System)

Joe Klobusicky is an applied mathematics/predictive analyst at Geisinger Health System. His interests include recommender systems, natural language processing, and Markov theory with a slant toward bioinformatics and pharmacoeconomics.

Maria Konnikova
Maria Konnikova (The New Yorker | Mastermind)

Maria Konnikova writes about human behavior, science, and psychology, most notably for her weekly blog at The New Yorker. In her bestseller, Mastermind: How to Think Like Sherlock Holmes, she offers tips and advice for improving cognitive ability. And in all her work, she displays a flair for finding new angles through which to explore popular topics such as motivation, performance, and the brain. Maria’s breakout book, Mastermind, has been translated into 16 languages. In it, she explores the famous detective’s signature methods of observation, logical deduction, and mindfulness, showing readers how to apply his techniques in everyday situations. Her... Read More.

Marcel Kornacker
Marcel Kornacker (Cloudera)

Marcel Kornacker is a tech lead at Cloudera and the architect of Apache Impala (incubating). Marcel has held engineering jobs at a few database-related startup companies and at Google, where he worked on several ad-serving and storage infrastructure projects. His last engagement was as the tech lead for the distributed query engine component of Google’s F1 project. Marcel holds a PhD in databases from UC Berkeley.

Balaji Krishna has been with SAP for over 16 years, with customer-facing experience as support consultant, RIG, solution management, and currently product management. He has been a trusted advisor to customers in architecting and implementing the best end-to-end EDW and analytics solutions. In his current role, Balaji is responsible for SAP Vora and HANA/Hadoop integration topics.

Chris Kudelka
Chris Kudelka (Riot Games)

Chris Kudelka has worked on the big data team at Riot Games since 2011. In his current role, Chris is product lead and engineering manager for the Insights Tech team (Riot’s big data initiative). His team enables core-game, backend-platform, and other feature teams to integrate with Riot’s data ecosystem and analytics tools so they can focus on the player from both behavior and performance perspectives.

Prior to Riot, Chris was a researcher and developer at Washington University’s Cognitive Aging Lab. He received his degree in philosophy-neuroscience-psychology from Washington University in St Louis, with a focus on linguistics. He used to... Read More.

Lenni Kuff
Lenni Kuff (Facebook)

Lenni Kuff is an engineering manager at Facebook within Core Systems Infrastructure. Before joining Facebook, he worked at Cloudera for 5 years on Impala, Hive, and Sentry. Prior to Cloudera, Lenni was a Software Engineer at Microsoft on a number of projects including SQL Server storage engine, SQL Azure, and Hadoop on Azure. Lenni graduated from the University of Wisconsin-Madison with degrees in computer science and computer engineering.

Scott Kurth
Scott Kurth (Silicon Valley Data Science)

Scott Kurth is the vice president of client solutions at Silicon Valley Data Science, where he helps clients define and execute the strategies and data architectures that enable differentiated business growth. Building on 20 years of experience making emerging technologies relevant to enterprises, he has advised clients on the impact of technological change, typically working with CIOs, CTOs, and heads of business. Scott has helped clients drive global technology strategy, conduct prioritization of technology investments, shape alliance strategy based on technology, and build solutions for their businesses. Previously, Scott was director of the Data Insights R&D practice within the Accenture... Read More.

Haden Land
Haden Land (Lockheed Martin IS&GS), @hadenaland

Haden Land is vice president of research and technology for Lockheed Martin IS&GS, with 30 years of professional experience. He serves numerous U.S. government agencies, allied nations, and regulated commercial industries. Haden is responsible for technical solutions, strategic partnerships, global innovation centers, research and development, and emerging technology planning. His areas of expertise include cloud computing, big data, cyber security, enterprise mobility, complex adaptive systems, enterprise architecture, and advanced concepts. He has domain knowledge within government, space, energy, law enforcement, transportation, and healthcare.

Previously, Haden was vice president of solutions engineering for Lockheed Martin IS&GS, vice president of engineering and... Read More.

Philip Langdale
Philip Langdale (Cloudera)

Philip Langdale is the engineering lead for cloud at Cloudera. He joined the company as one of the first engineers building Cloudera Manager and served as an engineering lead for that project until moving to work on cloud products. Previously, Philip worked at VMware, developing various desktop virtualization technologies. Philip holds a bachelor’s degree with honors in electrical engineering from the University of Texas at Austin.

Uri Laserson
Uri Laserson (Cloudera), @laserson

Uri Laserson is a data scientist at Cloudera. Previously, he obtained his PhD from MIT where he developed applications of high-throughput DNA sequencing to immunology. During that time, he co-founded Good Start Genetics, a next-generation diagnostics company focused on genetic carrier screening. In 2012, he was selected to Forbes’s list of 30 under 30.

Rachel Laycock
Rachel Laycock (ThoughtWorks), @rachellaycock

Rachel Laycock is a market technical principal at ThoughtWorks in New York, where she has played the role of coach, trainer, technical lead, architect, and developer, coaching teams on Agile and continuous delivery technical practices. She is now a member of the Technical Advisory Board to the CTO, which regularly produces the ThoughtWorks Technology Radar. Rachel has over 10 years of experience in systems development and has worked on a wide range of technologies and the integration of many disparate systems. She is fascinated by problem solving and has discovered that people problems are often more difficult to solve... Read More.

Kim Le
Kim Le (General Motors), @kmle21

Kim Le is currently a program manager for General Motors with a proven track record in leading large-scale initiatives in the sales, marketing, and finance space. She holds a master’s degree in management information systems and is passionate about providing technology that helps drive better utilization of data for everyone.

John Leach
John Leach (Splice Machine), @JleachJohn

John led the development of Splice Machine, receiving several patents in distributed transaction processing and focusing on the development of Splice Machine’s dual engine architecture. Prior to Splice Machine, John founded Incite Retail in June 2008 and led the company’s strategy and development efforts. At Incite Retail, he built custom big data systems (leveraging HBase and Hadoop) for Fortune 500 companies.

Prior to Incite Retail, he ran the business intelligence practice at Blue Martini Software and built strategic partnerships with integration partners. John was a key subject matter expert for Blue Martini Software in many strategic implementations across the world.... Read More.

Raphael Lee
Raphael Lee (Airbnb)

Raph Lee manages the Data Tools team at Airbnb, which is responsible for lowering barriers toward data-informed decision-making through automation, education, data visualization, and storytelling. A full-stack engineer by training and a four-year-plus veteran of Airbnb, he’s worked on everything from host-facing features to database tuning to SEO.

Mike Lee Williams
Mike Lee Williams (Cloudera Fast Forward Labs), @mikepqr

Mike Lee Williams is a research engineer at Cloudera Fast Forward Labs, where he builds prototypes that bring the latest ideas in machine learning and AI to life and helps Cloudera’s customers understand how to make use of these new technologies. Mike holds a PhD in astrophysics from Oxford.

Matt LeMay
Matt LeMay (Constellate Data)

Matt LeMay is the co-founder of Constellate Data, where he designs human-centered systems for contextualizing and collaborating around data. In his work as a technology communicator, Matt has designed and led workshops about product management and data strategy for companies including Pfizer, Visa, McCann, and Johnson & Johnson. Previously, Matt worked as Senior Product Manager at music startup Songza (acquired by Google), and Head of Consumer Product and Platform Manager at Bitly. Matt is also a musician, recording engineer, senior contributor to music website, and the author of a book about singer-songwriter Elliott Smith.

Haoyuan Li
Haoyuan Li (Alluxio), @haoyuan

Haoyuan (H.Y.) Li is the founder, chairman, and CTO of Alluxio. He holds a PhD in computer science from UC Berkeley’s AMPLab, where he created the Alluxio (formerly Tachyon) open source data orchestration system, cocreated Apache Spark Streaming, and became an Apache Spark founding committer. He also holds an MS from Cornell University and a BS from Peking University, both in computer science.

Nong Li
Nong Li (Cloudera)

Nong Li is a software engineer at Cloudera working on the RecordService and Impala projects. Before joining Cloudera, he worked at Microsoft developing new APIs for the Windows graphics system (DirectX). Nong holds a Sc.B. in computer science from Brown University.

Todd Lipcon
Todd Lipcon (Cloudera), @tlipcon

Todd Lipcon is an engineer at Cloudera, where he primarily contributes to open source distributed systems in the Apache Hadoop ecosystem. Previously, he focused on Apache HBase, HDFS, and MapReduce, where he designed and implemented redundant metadata storage for the NameNode (QuorumJournalManager), ZooKeeper-based automatic failover, and numerous performance, durability, and stability improvements. In 2012, Todd founded the Apache Kudu project and has spent the last three years leading this team. Todd is a committer and PMC member on Apache HBase, Hadoop, Thrift, and Kudu, as well as a member of the Apache Software Foundation. Prior to Cloudera, Todd worked... Read More.

Alex Loffler
Alex Loffler (TELUS)

Alex Loffler is a principal technology architect at TELUS, one of Canada’s largest providers of cellular, fixed-line, and cable television services. He has nearly 20 years of experience in architecting enterprise software solutions. Alex holds several patents in the U.S. and Europe. Alex has an MSc from University College London and a BSc, with honors, from the University of Sheffield.

Ben Lorica
Ben Lorica (O'Reilly), @bigdata

Ben Lorica is the chief data scientist at O’Reilly. Ben has applied business intelligence, data mining, machine learning, and statistical analysis in a variety of settings, including direct marketing, consumer and market research, targeted advertising, text mining, and financial engineering. His background includes stints with an investment management company, internet startups, and financial services.

Mike Loukides
Mike Loukides (O'Reilly Media), @mikeloukides
Closing remarks Cultivate
Welcome Cultivate

Mike Loukides is vice president of content strategy for O’Reilly Media. He’s edited many highly regarded books on technical subjects that don’t involve Windows programming. He’s particularly interested in programming languages, Unix and what passes for Unix these days, and system and network administration. Mike is the author of System Performance Tuning and a coauthor of Unix Power Tools. Most recently, he’s been fooling around with data and data analysis, languages like R, Mathematica, and Octave, and thinking about how to make books social.

Jason Loveland
Jason Loveland (Lockheed Martin)

Jason Loveland is a software architect at Lockheed Martin IS&GS, with 12 years of professional experience. Jason is responsible for leading research and development programs applying big data applications and advanced analytics for space systems and cyber security domains. He has expertise in enterprise architecture, software engineering, modeling and simulation, big data, and cloud solutions.

Jason holds a Bachelors in Computer Engineering from Villanova University. He also holds a Masters in Engineering from Old Dominion University.


Brandon MacKenzie is the Data Science on Hadoop leader on IBM’s Worldwide Technical Sales team for Information Management Software. He is an expert on statistical processing in Hadoop and HPC environments. Brandon earned his master’s degree from The University of Edinburgh.

Jock Mackinlay
Jock Mackinlay (Tableau)

Jock D. Mackinlay is an American information visualization expert and vice president of visual analysis at Tableau Software. Jock has a Ph.D. in computer science from Stanford University, where he pioneered the automatic design of graphical presentations of relational information. He joined Xerox PARC in 1986, where he collaborated with the User Interface Research Group to develop many novel applications of computer graphics for information access, coining the term Information Visualization. Many of the fruits of this research can be seen in his book, Readings in Information Visualization: Using Vision to Think (Morgan Kaufmann, written and edited with Stuart... Read More.

Mark Madsen
Mark Madsen (Teradata), @markmadsen

Mark Madsen is a fellow at Teradata, where he’s responsible for understanding, forecasting, and defining the analytics ecosystem and architecture. Previously, he was CEO of Third Nature, where he advised companies on data strategy and technology planning and vendors on product management. Mark has designed analysis, machine learning, data collection, and data management infrastructure for companies worldwide.

Roger Magoulas
Roger Magoulas (O'Reilly Media), @rogerm
Closing remarks Keynote

Roger Magoulas is the vice president of O’Reilly Radar. Previously, Roger was the research director at O’Reilly, where he and his team built the company’s analysis infrastructure and provided analytic services and insights on technology-adoption trends to business decision makers at O’Reilly and beyond. He and his team find what excites key innovators and use those insights to gather and analyze faint signals from various sources to make sense of what others may adopt and why.

Rajiv Maheswaran
Rajiv Maheswaran (Second Spectrum), @RajivMaheswaran

Rajiv Maheswaran is CEO of Second Spectrum, an innovative sports analytics and data visualization startup located in Los Angeles, California. His work spans the fields of data analytics, data visualization, real-time interaction, spatiotemporal pattern recognition, artificial intelligence, decision theory, and game theory. Previously, Rajiv served as a research assistant professor within the University of Southern California’s Department of Computer Science and a project leader at the Information Sciences Institute at the USC Viterbi School of Engineering. He and Second Spectrum COO Yu-Han Chang codirected the Computational Behavior Group at USC. Rajiv has received numerous awards and... Read More.

Ted Malaska
Ted Malaska (Capital One), @TedMalaska

Ted Malaska is a director of enterprise architecture at Capital One. Previously, he was the director of engineering in the Global Insight Department at Blizzard; principal solutions architect at Cloudera, helping clients find success with the Hadoop ecosystem; and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache Yarn, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is a coauthor of Hadoop Application Architectures, a frequent speaker at many conferences, and a frequent blogger on data architectures.

Rishi Malhotra

Rishi Malhotra is co-founder and CEO of Saavn, India’s leading music streaming service. As CEO, Rishi has led the company through significant and rapid user growth, while helping to secure partnerships with companies like Twitter, Facebook, Google, Shazam, T-Mobile, and Sonos. He is focused on driving the global Saavn team to deliver award-winning mobile products, big data systems for media, and industry-defining business innovation. Saavn is on a path to become one of the largest streaming music companies in the world by 2017. Rishi has led the team in raising more than $125MM in funding from leading institutional investors,... Read More.

Silviu Maniu (Huawei)

Silviu Maniu is a researcher at Noah’s Ark Lab, Huawei Technologies. He holds a PhD in computer science from Telecom ParisTech. His main research interests are social and uncertain data management, databases, and stream machine learning.

Sriranjan Manjunath (Saavn Inc)

Sri Manjunath is a cofounding engineer at Saavn and former lead engineer at Yahoo! He has 10+ years of experience building scalable websites and back-end systems. He currently heads the engineering team at Saavn.

Adam Marcus

Adam Marcus is a cofounder and CTO of B12, a company building a better future of creative and analytical work, starting with design. With Orchestra, its open source project management system for experts and machines, B12 automatically generates websites for clients (algorithmic design) and then recruits wonderful designers and art directors to fill in the details from the algorithmically generated starting points. (This summer, B12 announced the close of a $12.4M Series A funding round.) Previously, Adam was director of data at Locu, a startup that was acquired by GoDaddy. He has written widely on crowdsourcing and data management... Read More.

Gary Marcus
Gary Marcus (Geometric Intelligence), @GaryMarcus

A scientist, best-selling author, and entrepreneur, Gary Marcus is currently professor of psychology and neural science at NYU and CEO and cofounder of the recently formed Geometric Intelligence, Inc. Gary’s efforts to update the Turing test have spurred a worldwide movement, and his research on language, computation, artificial intelligence, and cognitive development has been published widely in leading journals such as Science and Nature. He is also the author of four books, including The Algebraic Mind, Kluge: The Haphazard Evolution of the Human Mind, and the New York Times best-seller Guitar Zero, and contributes frequently to the... Read More.

Jay Margalus

Jayson Margalus is a demo engineer at MapR and a faculty member at DePaul University, with a background in design and a specialty in games, interactive exhibits, and data. He lives in Mokena, Illinois, where he chairs the Mokena Technology Committee and the Mokena makerspace SpaceLab and runs a Maker Faire. He also founded the Glen Ellyn makerspace Workshop 88. Some maker-related projects include the “Big Data Outbreak” project for Big Data Everywhere and Hackerspaces in Space. Jay also writes for outlets like Make, NBC Chicago, and the Mokena Messenger.

Kristi Marotta (Allstate)

Kristi Marotta received a bachelor’s degree in actuarial science from the University of Iowa. She currently uses Tableau to visualize data and solve business problems in her position as a competitive intelligence consultant at Allstate.

Hilary Mason
Hilary Mason (Cloudera Fast Forward Labs), @hmason

Hilary Mason is vice president of research at Cloudera Fast Forward Labs and data scientist in residence at Accel Partners. Previously, Hilary was chief scientist at Bitly. She cohosts DataGotham, a conference for New York’s homegrown data community, and cofounded HackNY, a nonprofit that helps engineering students find opportunities in New York’s creative technical economy. She’s on the board of the Anita Borg Institute and an advisor to several companies, including SparkFun Electronics, Wildcard, and Wonder. Hilary served on Mayor Bloomberg’s Technology Advisory Board and is a member of Brooklyn hacker collective NYC Resistor.

Murthy Mathiprakasam

Murthy Mathiprakasam is a director of product marketing for Informatica’s big data products, where he is responsible for outbound marketing activities. Murthy has a decade and a half of experience working with emerging high-growth software technologies, including roles at Mercury Interactive/HP, Google, eBay, VMware, and Oracle. Murthy holds an MS in management science from Stanford University and BS degrees in management science and computer science from the Massachusetts Institute of Technology.

Damon McDougall
Damon McDougall (UT Austin)
PyData at Strata Tutorial

Damon McDougall did his PhD in mathematics at the University of Warwick in the UK. His research focuses on Bayesian inverse problems, parameter estimation, learning, computational science, high-performance computing, and software engineering. Damon is a core developer of Matplotlib and contributes heavily to the open source community.

Patrick McFadin

Patrick McFadin is the vice president of developer relations at DataStax, where he leads a team devoted to making users of DataStax products successful. Previously, he was chief evangelist for Apache Cassandra and a consultant for DataStax, where he helped build some of the largest and most exciting deployments in production; a chief architect at Hobsons; and an Oracle DBA and developer for over 15 years.

Emma McGrattan
Emma McGrattan (Actian)

Emma McGrattan is SVP of engineering at Actian, where she leads the Actian Vector, Actian Vector Hadoop Edition, and Actian Matrix development teams. A leading authority in DBMS technologies, Emma has over 20 years’ experience managing, supporting, and developing a variety of databases, from her early days with Ingres to the cutting-edge Actian Vortex. Emma joined the original Ingres Corp. in 1992 and held a senior leadership role on the Ingres engineering team through a number of acquisitions. Born in Ireland, Emma earned a bachelor of electronic engineering from Dublin City University.

Hugh McGrory
Hugh McGrory (datavized), @mcgrory

Hugh McGrory brings expertise in film production, art, and technology to the world of immersive media. He was a partner at Culture Shock, consulting for clients including The National Film Board of Canada. In 2011 Hugh brought the partners together to create The Andy Warhol Film Digitization Project, featuring over 500 films by Warhol, developed in collaboration with The Moving Picture Company and Technicolor and described in the New York Times as “the largest effort to digitize the work of a single artist in MoMA’s collection.”

Hugh grew up in Derry, Northern Ireland. He co-founded the Belfast-based studio in... Read More.

Jim McHugh
Jim McHugh (Cisco)

Jim McHugh has over 20 years of experience as a marketing executive and in leadership positions at startup, mid-sized, and high-profile companies, including Sun Microsystems and Apple, prior to joining Cisco Systems. Jim is the vice president of product and solutions marketing for Unified Computing Systems at Cisco. He leads and drives marketing initiatives for UCS and partner solutions marketing (including EMC, Intel, NetApp, SAP, Microsoft, and VCE).

Jim is focused on building a vision for organizational success and executing marketing strategies measured by achievement of UCS revenue, market share, and growth. He has a... Read More.

Wes McKinney
Wes McKinney (Two Sigma Investments), @wesmckinn

Wes McKinney is a software architect at Two Sigma Investments. He is the creator of Python’s pandas library and a PMC member for Apache Arrow and Apache Parquet. He wrote the book Python for Data Analysis. Previously, Wes worked for Cloudera and was the founder and CEO of DataPad.

Eric McNulty
Eric McNulty (Richer Earth), @richerearth

Director of Research, Harvard’s National Preparedness Leadership Initiative
Contributing Editor, Strategy+Business Magazine
Contributing Editor, Business Review (China)
Contributing Editor, Center for Higher Ambition Leadership
Former Contributing Editor, Harvard Business Publishing

Hussein Mehanna
Hussein Mehanna (Facebook)

Hussein Mehanna is an engineering manager at Facebook, where he founded and manages the Applied Machine Learning platform team. Hussein started as the original developer on the team, which quickly developed from an ads-focused ML platform to a Facebook-wide platform. Prior to Facebook, Hussein worked as a software engineer for Bing at Microsoft. He holds a master’s degree in speech recognition from the University of Cambridge, UK.

Gian Merlino

Gian Merlino is CTO and cofounder of Imply and is one of the original committers of the Druid project. Previously, he worked at Metamarkets and Yahoo. Gian holds a BS in computer science from the California Institute of Technology.

Katherine Milkman
Katherine Milkman (Wharton School at the University of Pennsylvania), @Katy_Milkman

Katherine Milkman is a tenured associate professor at the Wharton School at the University of Pennsylvania, and the winner of numerous research and teaching awards. Her research relies heavily on big data to document various ways in which individuals systematically make counterintuitive choices. Before becoming an academic, she was a Division I collegiate athlete and one of the top 120 women’s junior tennis players in the U.S. She also worked briefly in investment banking at Goldman Sachs and equity research at Morgan Stanley.

Katherine has published over two dozen research papers in the last decade in leading social science journals.... Read More.

Prat Moghe

Prat Moghe is the founder and CEO of Cazena. Prat is a successful big data entrepreneur with nearly 20 years of experience inventing next-generation products and building strong teams in the technology sector. Prior to founding Cazena, as SVP of strategy, products, and marketing at Netezza, Prat led a 400-person team that launched the latest-generation Netezza appliance, which led the market in price and performance. Netezza was acquired by IBM for $1.7B in 2010.

Dennis Mortensen is a personal assistant who schedules meetings for you.

Bill Moschella
Bill Moschella (Evariant)

Bill Moschella is the co-founder and chief executive officer of Evariant, Inc., a company that provides a SaaS healthcare CRM and big data platform to healthcare providers. He has used his leadership and entrepreneurial skills to build Evariant into a dominant healthcare market leader by helping these organizations execute successful patient and physician engagement strategies. The company has more than doubled its revenue for software subscriptions each year for the past two years, grown headcount year-over-year (2014 to 2015) by 144%, and was recognized as being one of the fastest growing organizations in the state of Connecticut for three... Read More.

Andreas Mueller
Andreas Mueller (NYU, scikit-learn)
PyData at Strata Tutorial

Andreas Mueller received his PhD in machine learning from the University of Bonn. After working as a machine learning researcher on computer vision applications at Amazon for a year, he recently joined the Center for Data Science at New York University. In the last four years, he has been maintainer and one of the core contributors of scikit-learn, a machine learning toolkit widely used in industry and academia, and author and contributor to several other widely-used machine learning packages. His mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science, and... Read More.

Terry Mughan
Terry Mughan (CLIA Consulting)

Terry Mughan (PhD) is Associate Professor in the School of Business at Royal Roads University and Associate Fellow at the Centre for Global Studies, University of Victoria, both in Canada. His research interests have revolved around the place of language and cultural skills in business internationalisation strategies, including a study of 1,200 companies in the East of England. He has authored several research reports for policy bodies such as UK Trade and Investment and the OECD (Organisation for Economic Cooperation and Development) on the internationalisation of small and medium-sized companies (SMEs). He has published articles in The... Read More.

Jacques Nadeau
Jacques Nadeau (Dremio)

Jacques Nadeau is the cofounder and CTO of Dremio. Previously, he ran MapR’s distributed systems team; was CTO and cofounder of YapMap, an enterprise search startup; and held engineering leadership roles at Quigo, Offermatica, and aQuantive. Jacques is cocreator and PMC chair of Apache Arrow, a PMC member of Apache Calcite, a mentor for Apache Heron, and the founding PMC chair of the open source Apache Drill project.

Neha Narkhede

Neha Narkhede is the cofounder and CTO at Confluent, a company backing the popular Apache Kafka messaging system. Previously, Neha led streams infrastructure at LinkedIn, where she was responsible for LinkedIn’s petabyte-scale streaming infrastructure built on top of Apache Kafka and Apache Samza. Neha specializes in building and scaling large distributed systems and is one of the initial authors of Apache Kafka. A distributed systems engineer by training, Neha works with data scientists, analysts, and business professionals to move the needle on results.

Paco Nathan
Paco Nathan, @pacoid
Data 101 Tutorial
Data 101 Data 101

Paco Nathan is known as a “player/coach” with core expertise in data science, natural language processing, machine learning, and cloud computing. He has 35+ years of experience in the tech industry, at companies ranging from Bell Labs to early-stage startups. His recent roles include director of the Learning Group at O’Reilly and director of community evangelism at Databricks and Apache Spark. Paco is the cochair of Rev conference and an advisor for Amplify Partners, Deep Learning Analytics, Recognai, and... Read More.

Jan Neumann

Jan Neumann leads Comcast’s Applied Artificial Intelligence Research Group, which combines large-scale machine learning, deep learning, NLP, and computer vision to develop novel algorithms and product concepts such as voice interfaces, virtual assistants, and video and IoT analytics that improve the experience of Comcast’s customers. Previously, Jan worked for Siemens Corporate Research on various computer vision-related projects, such as driver assistance systems and video surveillance. He has published over 20 papers in scientific conferences and journals and is a frequent speaker on machine learning and data science. He holds a PhD in computer science from the University of Maryland, College... Read More.

Billy Newport (Goldman Sachs)

Billy Newport has been at Goldman Sachs as a Technology Fellow since 2011, working on big data and graph problems at the firm. Prior to that he was a Distinguished Engineer at IBM for 10 years, where he worked primarily on distributed systems and high availability for the WebSphere platform. He graduated from Waterford Institute of Technology with a first-class honors degree in industrial computing in 1989.

Christopher Nguyen

Christopher Nguyen is president and CEO of Arimo, a Panasonic company in Silicon Valley, where he leads the development of AI platforms and solutions for the enterprise. Previously, he was engineering director of Google Apps and cofounded two other successful startups. As a professor, Christopher cofounded the Computer Engineering Program at HKUST. He holds a BS (summa cum laude) from the University of California, Berkeley, and a PhD from Stanford, where he created the first standard-encoding Vietnamese software suite, authored RFC 1456, and contributed to Unicode 1.1.

Jack Norris
Jack Norris (MapR Technologies), @Norrisjack

Jack Norris is the senior vice president of data and applications at MapR Technologies, where he works with leading customers and partners worldwide to drive the understanding and adoption of new applications enabled by data and analytics. With over 25 years of enterprise software experience, he has demonstrated success from identifying new markets to defining new products to launching companies. Jack’s background includes senior executive positions with establishing analytic, virtualization, and storage companies. Jack was an early employee of MapR Technologies and held senior executive roles with EMC, Brio Technology, and Bain and Company.

Robert Novak

Robert Novak is a consulting systems engineer for big data in the Cisco Americas Partner Organization. In short, he is, or so he’s told, a big data unicorn. He has been a sysadmin for 20 years, a big data admin since 2003 or so, a Hadoop admin since 2009, and a Cisco UCS C-Series admin since Christmas 2011. Robert brings the viewpoint of the practitioner and customer into the sales, channel partner, and independent software partner fields at Cisco to integrate Hadoop, big data, and analytics into Cisco’s data center technologies, especially UCS.

Amy O'Connor
Amy O'Connor (Cloudera), @imamyo

Amy O’Connor is a big data evangelist and telecommunications specialist at Cloudera, the leading big data vendor. She advises customers globally as they introduce big data solutions and adopt enterprise-wide big data delivery capabilities. Amy was recently named one of Information Management’s 10 Big Data Experts to Know. Prior to joining Cloudera, Amy built and ran Nokia’s big data team, developing and managing Nokia’s data assets and leading a team of data scientists to drive insights. Previously, Amy was vice president of services marketing and also led strategy for the software and storage business units of Sun Microsystems.

John O'Duinn
John O'Duinn (Release Mechanix), @joduinn

As a software developer, systems architect, director and now founder, John O’Duinn has designed and helped build release engineering infrastructure that is practical, reliable, cross-platform, scalable and efficient. In addition to technology, John loves growing a culture where distributed teams and individuals work seamlessly together no matter where they are physically in the world. At Mozilla this involved building a tightly knit team of 18 release engineers in 14 cities, in four non-adjacent timezones working with the geo-distributed Mozilla open source project. At Hortonworks, the team was in four cities, in three non-adjacent time zones, working closely with the geo-distributed... Read More.

Cathy O'Neil
Cathy O'Neil (Weapons of Math Destruction)

Cathy O’Neil is a data scientist for the startup media company Intent Media. Cathy began her career as a postdoc in MIT’s Math Department. She has been a professor at Barnard College, where she published a number of research papers in arithmetic algebraic geometry, and worked as a quant for the hedge fund D.E. Shaw in the middle of the credit crisis and for RiskMetrics, a risk software company that assesses risk for the holdings of hedge funds and banks. Cathy holds a PhD in math from Harvard.

Stephen O'Sullivan
Stephen O'Sullivan (Data Whisperers), @steveos

A leading expert on big data architectures, Stephen O’Sullivan has 25 years of experience creating scalable, high-availability data and applications solutions. A veteran of Silicon Valley Data Science, @WalmartLabs, Sun, and Yahoo, Stephen is an independent adviser to enterprises on all things data.

Matthew Ocko
Matthew Ocko (Data Collective), @mattocko

Matt Ocko has three decades of experience as a technology entrepreneur and VC. Over his career, he has invested in Cotendo, Zynga, Facebook, XenSource, UltraDNS, FlashSoft, Fortinet, Aggregate Knowledge, Virtuata, DataMirror, Couchbase, Ayasdi, Kenshoo, D-Wave Systems, MetaMarkets, Uber, AngelList, and many others, including multiple acquisitions by Google, Facebook, NetApp, and other Fortune 1000 tech companies. Matt has been active in helping develop China’s venture capital and technology regulatory framework for two decades. He is the founder of Da Vinci Systems, a pioneering email software vendor with over 1 million users worldwide prior to its acquisition, and holds over 40 granted... Read More.

Andrew Odewahn
Andrew Odewahn (O'Reilly Media), @odewahn
PyData at Strata Tutorial

Andrew Odewahn is the CTO of O’Reilly Media, where he helps define and create the new products, services, and business models that will help O’Reilly continue to make the transition to an increasingly digital future. The author of two books on database development, he has experience as a software developer and consultant in a number of industries, including manufacturing, pharmaceuticals, and publishing. Andrew holds an MBA from New York University and a degree in computer science from the University of Alabama. He’s also thru-hiked the Appalachian Trail from Georgia to Maine.

Travis Oliphant
Travis Oliphant (Anaconda)
PyData at Strata Tutorial

Travis Oliphant has a Ph.D. from the Mayo Clinic and B.S. and M.S. degrees in Mathematics and Electrical Engineering from Brigham Young University. Since 1997, he has worked extensively with Python for numerical and scientific programming, most notably as the primary developer of the NumPy package, and as a founding contributor of the SciPy package. He is also the author of the definitive Guide to NumPy.

Travis was an assistant professor of Electrical and Computer Engineering at BYU from 2001-2007, where he taught courses in probability theory, electromagnetics, inverse problems, and signal processing. He also served as Director of... Read More.

Mike Olson
Mike Olson (Cloudera), @mikeolson

Mike Olson cofounded Cloudera in 2008 and served as its CEO until 2013, when he took on his current role of chief strategy officer. As CSO, Mike is responsible for Cloudera’s product strategy, open source leadership, engineering alignment, and direct engagement with customers. Previously, Mike was CEO of Sleepycat Software, makers of Berkeley DB, the open source embedded database engine, and he spent two years at Oracle Corporation as vice president for embedded technologies after Oracle’s acquisition of Sleepycat. Prior to joining Sleepycat, Mike held technical and business positions at database vendors Britton Lee, Illustra Information Technologies,... Read More.

Peter Olson

Peter Olson is a director and creative technologist at IDEO where he focuses on creative, practical, and human-centered applications of technology for clients and the larger design and technical community. He is passionate about using technology and data as tools for storytelling, insight, communication, and understanding.

Prior to joining IDEO, Peter was a founder of and served as a vice president of technology for Marvel Entertainment’s Digital Media Group, where he helped drive innovation and technical strategy within the larger Marvel and Disney organizations. Peter has additionally worked as a consultant for a variety of companies and as... Read More.

Sean Owen
Sean Owen (Cloudera), @sean_r_owen

Sean Owen is director of data science at Cloudera in London. Before Cloudera, he founded Myrrix Ltd. (now the Oryx project) to commercialize large-scale real-time recommender systems on Hadoop. He is an Apache Spark committer, was a committer and VP for Apache Mahout, and is the coauthor of Advanced Analytics on Spark and Mahout in Action. Previously, Sean was a senior engineer at Google.

David Paige
David Paige (Cox Automotive)

David Paige is the senior director of enterprise data platform at Cox Automotive, Inc. He has over 20 years of experience in distributed systems, having led many innovative data platform projects. His group of architects builds out and manages the technical infrastructure for the company’s analytics and data platforms. The Cox Automotive, Inc. data and analytic infrastructure includes various big data technologies (Hadoop, Hive, Spark, Pig, HBase, and others), and traditional BI tools (Netezza, MicroStrategy, SAS, etc.).

Iulia Pasov

Iulia Pasov is a machine learning engineer at Avira, the German antivirus company, where she has worked since December 2014. She likes to tackle complex machine learning and natural language processing tasks, and has experience in web development as well. Iulia holds two master’s degrees in artificial intelligence, one from the Polytechnic University of Bucharest, and the other from Lumiere Lyon 2 University and Polytech, Nantes.

DJ Patil
DJ Patil (White House Office of Science and Technology Policy), @dpatil
Data and Ethics Session

DJ Patil is the chief data scientist and deputy chief technology officer for data policy at the White House Office of Science and Technology Policy, where he advises on policies and practices to maintain US leadership in technology and innovation, fosters partnerships to maximize the nation’s return on its investment in data, and helps to attract and retain the best minds in data science to serve the public. Since joining OSTP, DJ has collaborated with colleagues across government, including the chief information officer and the US Digital Service as part of the Obama administration’s commitment to open data and... Read More.

Pamela Pavliscak

Pamela Pavliscak (pronounced pav-li-check) is the CEO of SoundingBox, where she advises designers, developers, and decision makers on how to create technologies with emotional intelligence. Pamela is also on the faculty at Pratt Institute’s School of Information and is leading an effort for IEEE Standards for ethics and artificial intelligence. Pamela explores our conflicted and emotional relationship with technology and often speaks on creativity in the digital age, generation Z, and emotion and technology, most recently at SXSW and Collision.

Arthur Peng (Intel)

Arthur Peng is a software engineer at Intel, where he works on applications of Intel’s CPU technology to Impala.

Mike Percy
Mike Percy (Cloudera), @mike_percy

Mike Percy is a software engineer currently working at Cloudera on Kudu, a native columnar database for the Hadoop ecosystem. He is also a committer and PMC member on Apache Flume. Prior to joining Cloudera, Mike worked at Yahoo! building a content recommendation system on top of Hadoop and HBase. Mike holds an MS in Computer Science from Stanford University and a BS in Computer Science from the University of California, Santa Cruz.

Kevin Perko
Kevin Perko (Scribd)

Kevin Perko is the Data Team Lead at Scribd, the leading subscription reading service. He focuses on evaluating search engine performance, building data pipelines, and democratizing access to data through various initiatives including Reddit-style AMAs, emails, and individual outreach. With nearly a decade of analytics experience, Kevin has worked for a multitude of Bay Area startups including Eventbrite, GREE, and He has a background in Finance from Santa Clara University and has volunteered with The University of Cape Town to teach computer skills in the townships of South Africa.

Claudia Perlich

Prior to joining Dstillery (formerly Media6Degrees), Claudia Perlich spent five years working at the Data Analytics Research group at the IBM T.J. Watson Research Center, concentrating on research in data analytics and machine learning for complex real-world domains and applications. She has authored over 30 scientific publications and holds multiple patents in the area of machine learning. Claudia has won many data mining competitions, including the prestigious 2007 KDD CUP on movie ratings, the 2008 KDD CUP on breast-cancer detection, and the 2009 KDD CUP on churn and propensity predictions for... Read More.

Steven Petrevski
Steven Petrevski (First Data Corporation)

SVP/GM Security and Fraud Solutions at First Data Corporation.

Vu Pham
Vu Pham (Adatao, Inc)

Vu Pham is a machine learning software engineer at Adatao, with a focus on deep learning. He helps build Adatao’s deep learning solutions. He is an avid contributor to various open-source projects such as cubgs, Deepnet, and deeplearning4j. Prior to Adatao, he worked in academia and industry, and authored and co-authored several scientific papers.

Thomas Phelan
Thomas Phelan (HPE BlueData), @tapbluedata

Thomas Phelan is cofounder and chief architect of BlueData. Previously, he was a member of the original team at Silicon Graphics that designed and implemented XFS, the first commercially available 64-bit file system. He was also an early employee at VMware, where he was a senior staff engineer and a key member of the ESX storage architecture team; there he designed and developed the ESX storage I/O load-balancing subsystem and modular pluggable storage architecture and led teams working on many key storage initiatives such as the cloud storage gateway and vFlash.

Piotr Niedzwiedz

Piotr Niedzwiedz is a founder and CTO of, a big data science company based in Menlo Park, California, and Warsaw, Poland. provides machine-learning and deep learning consulting and has developed Seahorse, a scalable data analytics workbench powered by Apache Spark, which lets users build data-processing workflows without needing to write any code. Piotr is a successful entrepreneur. Prior to, he cofounded CodiLime, an IT company delivering software services in networks and security areas. Previously, he worked as a software engineer at Google and Facebook on projects related to big data and distributed systems. He supports and... Read More.

Susanna Pirttikangas
Susanna Pirttikangas (University of Oulu), @mspTW

Susanna Pirttikangas, D. Sc. (Tech.) received her PhD in embedded systems from the University of Oulu, Finland. Her post-doctoral visits were to Japan (Waseda University, 2004-2005 and Tokyo Denki University, 2008) and China (Tsinghua University, 2011). She is a co-leader of the Interactive Spaces research group within the Department of Computer Science and Engineering. The group is led by Dean Jukka Riekki, and other co-leaders are Senior Research Fellow Mika Rautiainen and Iván Sánchez. In the team, Susanna works as a data scientist specializing in situation awareness. She has experience in developing methodology to de-noise, fuse, segment, and classify real-time... Read More.

Jeff Pollock
Jeff Pollock (Oracle)

Jeff Pollock is an expert data integration technology leader. He is currently vice president of product management for the Oracle Data Integration & Governance business unit, and previously was responsible for all IBM Information Integration & Governance products. Prior to Oracle and IBM, Jeff was an independent architect for the U.S. Defense Department, vice president of technology at Cerebra, and chief technical officer of Modulant; he has been developing data integration, semantic middleware, and inference-driven SOA platforms since 2001. Prior to that, Mr. Pollock was a principal engineer with Modem Media and senior architect with Ernst... Read More.

Jules Polonetsky
Jules Polonetsky (Future of Privacy Forum), @JulesPolonetsky

Jules Polonetsky serves as executive director and co-chair of the Future of Privacy Forum, a Washington, D.C.-based think tank that seeks to advance responsible data practices. FPF is supported by the chief privacy officers of more than 110 leading companies, several foundations, as well as by an advisory board comprised of the country’s leading academics and advocates. FPF’s current projects focus on big data, mobile, location, apps, the internet of things, wearables, de-identification, connected cars and student privacy.

His previous roles have included serving as chief privacy officer at AOL and before that at DoubleClick; as consumer affairs... Read More.

Beate Porst
Beate Porst (IBM)

Beate Porst is the lead product manager for data integration in the information integration and governance group at IBM. Her primary focus is on setting the vision, strategy, and tactical advancement of IBM’s data preparation and integration technology. Prior to being a product manager, Beate was a solution architect in the IBM Advanced Engineering and Solution group, leading the architecture and development of reusable assets to support a richer integration amongst IBM Information Management products. Beate has more than 15 years of experience in data management, virtualization, integration, and governance, in both engineering and product management roles. Beate... Read More.

Bill Porto (RedPoint Global)

Bill Porto is an expert in applying computational intelligence to solve real-world problems across various problem domains. As senior analytics engineer at RedPoint Global, he develops automated business optimization software that incorporates evolutionary optimization, neural networks, and a host of other non-traditional machine learning techniques. An applied mathematician by trade, Bill has created adaptive solutions to dynamic problems for resource allocation, pattern recognition, drug discovery, and logistics scheduling. Before RedPoint, he was president of Natural Selection, Inc. where he received the 2010 FDA Honor Award for his work on their PREDICT automated risk-assessment system.

Jake Porway
Jake Porway (DataKind)

Jake Porway is the founder and executive director of DataKind, a nonprofit that harnesses the power of data science in the service of humanity. He is an alum of the New York Times R&D Lab and has worked at Google and Bell Labs. A recognized leader in the Data for Good Movement, he has spoken at IBM, Microsoft, Google, and the White House. Jake is also a PopTech Social Innovation fellow and a National Geographic Emerging Explorer. He holds a BS in computer science from Columbia University and an MS and PhD in statistics from UCLA.

James Powell
PyData at Strata Tutorial

James Powell is a NYC-based Python programmer and master trainer with experience in quantitative finance and data science. James is very active in the Python community in NYC, where he organizes NYC Python (the world’s largest and most active Python meetup group). He also works with the numeric and scientific computing nonprofit NumFOCUS to help organize the PyData conference series. James is a frequent speaker at Python conferences and has been invited to speak at events such as PyData New York, PyData London, PyGotham, the conference For Python Quants, and PyCon Spain.

Sean Power
Sean Power (Watching Websites), @seanpower

Sean Power is a consultant, analyst, author, and speaker. He is the co-founder of Watching Websites, a boutique consulting firm focusing on early stage startups, products, and non-profits as they emerge and mature in their niches. He has built professional services organizations, and traveled across North America delivering engagements to Fortune 1000 companies. He helps executives understand their competitive landscape and the future of their industry. He has done technical editing for Troubleshooting Linux Firewall for Addison-Wesley, and co-authored Complete Web Monitoring with Alistair Croll for O’Reilly Media.

Sean has had first-hand experience creating and implementing social computing strategies with... Read More.

Arvind Prabhakar

Arvind Prabhakar is co-founder and CTO at StreamSets, a Big Data Startup based in San Francisco. He is an Apache Software Foundation member and a PMC member on Flume, Sqoop, Storm and MetaModel projects.

Prior to starting StreamSets, Arvind held many roles at Cloudera ranging from software engineer to director of engineering. Before Cloudera, Arvind was an architect in the core platform engineering team at Informatica and a staff engineer at Sun Microsystems.

Ravi Prakash
Ravi Prakash (Altiscale)

Ravi Prakash is a Hadoop committer and a senior software engineer at Altiscale. Previously, he was a senior software developer at Yahoo!, where he worked on Hadoop Core development (HDFS, MapReduce, and YARN). Ravi has also worked in software development at Tavare Research Labs and Motorola. Ravi has a BS in computer science from GGS Indraprastha University and an MS in computer science from the University of Southern California.

Peter Prettenhofer
Peter Prettenhofer (DataRobot)
PyData at Strata Tutorial

Peter Prettenhofer is a data scientist / software engineer at DataRobot. He studied computer science at Graz University of Technology, Austria and Bauhaus University Weimar, Germany, focusing on machine learning and natural language processing. He is a contributor to scikit-learn where he co-authored a number of modules such as Gradient Boosted Regression Trees, Stochastic Gradient Descent, and Decision Trees.

Randall Pruim
Randall Pruim (Calvin College)
R Day Tutorial

Randall Pruim is a professor of mathematics and statistics at Calvin College, author of Foundations and Applications of Statistics: An Introduction Using R, and the maintainer of several R packages, including fastR and mosaic. His research interests include statistical computing and statistics education (especially for students in the natural sciences).

Evan Prodromou
Evan Prodromou (, @evanpro

Evan Prodromou is founder and CTO of, an AI-as-a-service startup based in Montreal. His previous startups include Wikitravel, StatusNet, where he led development of StatusNet and Open Source social software, and Breather. He is chair of the W3C working group on Social Web standards.

Greg Rahn
Greg Rahn (Cloudera), @gregrahn

Greg Rahn has worked as a performance engineer for over a decade on parallel RDBMS systems and Hadoop SQL engines. He spent eight years running competitive data warehouse benchmarks at Oracle as a member of the esteemed Real-World Performance Group as well as working on Impala performance while at Cloudera. Currently he is leading product at Snowflake Computing.

Karthik Ramasamy
Karthik Ramasamy (Streamlio)

Karthik Ramasamy is the cofounder of Streamlio, a company building next-generation real-time processing engines. Karthik has more than two decades of experience working in parallel databases, big data infrastructure, and networking. Previously, he was engineering manager and technical lead for real-time analytics at Twitter, where he was the cocreator of Heron; cofounded Locomatix, a company that specialized in real-time stream processing on Hadoop and Cassandra using SQL (acquired by Twitter); briefly worked on parallel query scheduling at Greenplum (acquired by EMC for more than $300M); and designed and delivered platforms, protocols, databases, and high-availability solutions for network... Read More.

Anand Ranganathan

Anand Ranganathan is the director of solutions at Unscrambl, LLC, which is a startup building solutions incorporating a variety of big data platforms and analytics for different industries. He is a data scientist, big data developer, architect, and researcher rolled into one person. He has worked with over 100 customers worldwide to design, implement, and deploy big data solutions, involving technologies such as IBM InfoSphere Streams, Hadoop, and lately, Spark.

Before joining Unscrambl, Anand was a global technical ambassador for big data in IBM’s Software Group. He evangelized IBM’s big data products and services, and led WW technical... Read More.

Jairam Ranganathan
Jairam Ranganathan (Cloudera)

Jai Ranganathan is the director of product strategy at Cloudera, where he is responsible for planning the future roadmap of Cloudera products. Before Cloudera, he spent a decade at VMware, where among other things he was one of the developers on vMotion, storage vMotion, and the distributed management framework for vSphere.

Nirmal Ranganathan
Nirmal Ranganathan (Rackspace)

Nirmal Ranganathan is a Principal Engineer working on the Data Stores Platform at Rackspace. He constantly works with various teams within Rackspace and customers alike, directing them on how best to take advantage of Big Data technologies. Nirmal plays an active role in the local Austin tech scene by volunteering for organizing meetups and other events in the Austin area. Nirmal was one of the founding members of Trove (OpenStack’s Database as a Service) and has contributed to various OpenStack initiatives, Cassandra, Alluxio and Thrift.

Kamalesh Rao
Kamalesh Rao (DataKind)

Kamalesh Rao is a North Carolina native who moved to New York City during the second term of Grover Cleveland. He entered the family trade because he was not cool, talented, or brave enough to attempt a career in something interesting or worthwhile like dance, parkour, or accounting. He really likes to write about himself in the third person.

Bruce Reading
Bruce Reading (VoltDB)

As president and CEO of VoltDB, Bruce Reading brings nearly 30 years of experience building teams and creating business value in a variety of strategic roles including sales, marketing, asset management, mergers & acquisitions, and operations.

Before joining VoltDB, Bruce was senior vice president and general manager for Compuware Corporation (formerly NASDAQ:CPWR). Prior to Compuware, he spent six years as president, chief operating officer, and senior vice president at Gomez, Inc. Previously, Bruce served in senior management capacities at Access International, Cayman Systems, and Dictaphone Corporation. A native Canadian, Bruce maintains an active role in the startup... Read More.

Jeff Reback (Continuum Analytics)
PyData at Strata Tutorial

Jeff Reback is a senior software developer for Continuum Analytics. As a former quant, he has lots of experience building financial trading systems, using Python, and working with very large data. Jeff has been a core committer to the pandas project for the past few years and currently manages the project.

Ben Recht
Ben Recht (University of California, Berkeley)

Ben Recht is an associate professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at the University of California, Berkeley. Ben’s research focuses on scalable computational tools for large-scale data analysis, statistical signal processing, and machine learning. He explores the intersections of convex optimization, mathematical statistics, and randomized algorithms. He is particularly interested in simplifying the analysis and manipulation of noisy and incomplete data by exploiting domain-specific knowledge and prior information about structure. Ben is the recipient of an NSF Career Award, an Alfred P. Sloan Research Fellowship, and the 2012 SIAM/... Read More.

Harper Reed
Harper Reed (Modest), @harper

Harper Reed is a hacker/engineer who builds paradigm-shifting tech and leads others to do the same. Harper loves using the enormity of the internet to bring people together, whether as CTO of Obama for America, CTO at, or on his own projects. Harper and his team created Dashboard, a site that connects volunteer teams and acts as an online component of the field office. Harper can often be found playing with new technology, looking for something to hack, or enjoying life in Chicago with his amazing wife. Currently Harper is focusing on defining the future of commerce... Read More.

Kim Rees
Kim Rees (Periscopic), @krees

Kim Rees is a founding partner of Periscopic, an award-winning information visualization firm. Their work has been featured in the MOMA, CommArts, PRINT, Adobe Success Stories, and others.

Kim is a prominent individual in the data visualization community. She has been featured in CommArts and the Huffington Post, and has presented at several industry events including Strata, Eyeo, Visualized, and OpenVis among others. She also runs the popular Portland Data Visualization Meetup. Kim received her BA in computer science from New York University.

Alex Rice
Alex Rice (HackerOne), @senorarroz

Alex Rice is a cofounder and chief technology officer at HackerOne, which provides a platform that enables organizations to build strong relationships with a community of security experts. Alex is responsible for developing the HackerOne technology vision, driving engineering efforts, and counseling customers as they build world-class security programs. Previously, Alex worked at Facebook for over six years, where he founded the product security team, built one of the industry’s most successful security programs, and introduced new transport layer encryption used by more than a billion users. Alex also serves on the board of the Internet Bug Bounty, a nonprofit... Read More.

Henry Robinson
Henry Robinson (Cloudera), @HenryR

Henry Robinson is a software engineer at Cloudera. For the past few years, he has worked on Apache Impala, an SQL query engine for data stored in Apache Hadoop, and leads the scalability effort to bring Impala to clusters of thousands of nodes. Henry’s main interest is in distributed systems. He is a PMC member for the Apache ZooKeeper, Apache Flume, and Apache Impala open source projects.

John Rollins
John Rollins (IBM)

John B. Rollins, Ph.D. is a data scientist in the IBM Analytics division of IBM. His background is in the fields of data mining, engineering, and econometrics in many industries. He holds seven patents, and has authored a best-selling engineering textbook and many technical papers. He holds doctoral degrees in economics and petroleum engineering from Texas A&M University.

Stephen Romanoff
Stephen Romanoff (Capital One)

Stephen Romanoff is a director in Capital One’s Technology organization. He leads teams in developing data management solutions for Capital One’s big data initiatives. Before joining Capital One, he was a consultant specializing in big data capabilities—development, architecture, and strategy—for numerous federal government agencies. He has degrees from Emory University and the University of Virginia.

Mike Rosenthal
Mike Rosenthal (Mick Management), @mikearosenthal

Mike Rosenthal spent the last six years overseeing brand partnerships and digital strategy for the band OK Go before joining Mick Management in 2015. In his role as head of strategic marketing at Mick, Mike works with a roster of artists including Walk the Moon, Of Monsters and Men, Leon Bridges, and Childish Gambino in developing new approaches to artist development and partnership strategy.

Jacques Roy
Jacques Roy (IBM)

Jacques Roy is a member of the IBM worldwide analytics platform technical team, specializing in big data streaming analytics. He has also worked in many technology areas including operating systems, databases, and application development. He is the author of multiple books, with the most recent being The Power of Now: Real-Time Analytics and IBM InfoSphere Streams. He is also a regular contributor to IBM Data magazine. Jacques has been a presenter at many conferences including IBM’s Information on Demand (IOD).

Karen Rubin
Karen Rubin (Quantopian), @KarenRubin

Karen Rubin has spent the past 10 years building products and managing product development teams. She is currently on the product team at Quantopian, building the world’s first algorithmic trading platform in the cloud. She is currently focused on a new IPython research platform that will allow quants to access curated financial data in an interactive research environment.

Before coming to Quantopian, Karen spent time working on the investing team at Matrix Partners, where she helped evaluate potential investments and supported portfolio companies. She also spent five years on the product team at HubSpot, where she was responsible for building... Read More.

Laurel Ruma
Laurel Ruma (O'Reilly Media), @laurelatoreilly
Closing remarks Cultivate
Welcome Cultivate

Laurel Ruma is a content director at O’Reilly Media. Laurel has chaired a number of O’Reilly conferences and workshops, including Next:Economy, Cultivate, Where 2.0, OSCON Java, and Gov 2.0 Expo.

Sandy Ryza
Sandy Ryza (Clover Health), @s_ryz

Sandy Ryza is a senior data scientist at Clover Health. He was previously at Cloudera doing engineering and data science. He is an author of O’Reilly’s Advanced Analytics with Spark, as well as a Spark committer and member of the Hadoop project management committee. He graduated Phi Beta Kappa from Brown University.

Melissa Santos
Melissa Santos (Big Cartel), @ansate

Melissa Santos has over a decade of experience with all parts of the data pipeline, from ETLs to modeling. Her role as a data scientist at Big Cartel involves teaching both engineers and nontechnical people how to get the data they need. Melissa holds a PhD in applied math.

Rahul Saxena
Rahul Saxena (Saavn)

Rahul Saxena is the engineering lead at Saavn for Search and Recommendations. His team architects and manages search and recommendation algorithms. They work on technologies like Solr, Neo4j, and Mahout.

Peter Schlampp

Peter Schlampp is passionate about designing products that change the way users live, work, and interact with their world. He experienced first-hand the utility and complexity of big data while building products to secure enterprise networks. Peter has led Product and Marketing teams at Solera Networks, IronPort Systems, and Cisco Systems.

Bill Schmarzo
Bill Schmarzo (EMC Consulting), @schmarzo

Bill Schmarzo, author of the upcoming Big Data: Understanding How Data Powers Big Business, to be published by Wiley, is responsible for setting the strategy and defining the service line offerings and capabilities for the EMC Consulting Enterprise Information Management and Analytics service line. He’s written several white papers and is a frequent speaker on the use of big data and advanced analytics to power an organization’s key business initiatives.

Bill has more than two decades of experience in data warehousing, BI, and analytics applications. Bill authored the Business Benefits Analysis methodology that links an organization’s strategic business initiatives... Read More.

Eric Schmidt
Eric Schmidt (Google)

Eric Schmidt is the product management lead for Cloud Dataflow on the Cloud engineering team at Google, where his primary role is to help shape the future of fully managed, large-scale data processing. Eric spends the majority of his time working with existing cloud customers and on-premises developers who are moving their MapReduce and related data processing workloads to the cloud. He led the announcement of Cloud Dataflow (as Google I/O’s 2014 keynote) with the development of a real-time sentiment analysis and results prediction framework for the 2014 World Cup. Eric has a deep passion for user interaction modeling, data... Read More.

Jim Scott
Jim Scott (NVIDIA), @kingmesal

Jim Scott is the head of developer relations, data science, at NVIDIA. He’s passionate about building combined big data and blockchain solutions. Over his career, Jim has held positions running operations, engineering, architecture, and QA teams in the financial services, regulatory, digital advertising, IoT, manufacturing, healthcare, chemicals, and geographical management systems industries. Jim has built systems that handle more than 50 billion transactions per day, and his work with high-throughput computing at Dow was a precursor to more standardized big data concepts like Hadoop. Jim is also the cofounder of the Chicago Hadoop Users Group (CHUG).

Yonik Seeley
Yonik Seeley (Cloudera)

Yonik Seeley is the creator of Solr. He works at Cloudera integrating and leveraging “big search” technologies into the many components comprising the Cloudera Enterprise Data Hub (EDH). Yonik was previously chief open source architect and cofounder at LucidWorks.

Michael Segel
Michael Segel (Segel & Associates)

Michael Segel has been working with Hadoop since 2009 at various companies as a solution architect, solving the tough challenges. He is currently globe-trotting as a principal architect with Segel & Associates, looking for the next challenging problem to solve. Michael spends his free time thinking about solutions as he walks his dogs around the River North neighborhood in Chicago. While the founder of CHUG (Chicago area Hadoop User Group), Michael is also in the process of starting a Big Data Anonymous work group for those recovering big data-holics.

Jonathan Seidman

Jonathan Seidman is a software engineer on the cloud team at Cloudera. Previously, he was a lead engineer on the big data team at Orbitz, helping to build out the Hadoop clusters supporting the data storage and analysis needs of one of the most heavily trafficked sites on the internet. Jonathan is a cofounder of the Chicago Hadoop User Group and the Chicago Big Data Meetup and a frequent speaker on Hadoop and big data at industry conferences such as Hadoop World, Strata, and OSCON. Jonathan is the coauthor of Hadoop Application Architectures from O’Reilly.

Evan Selinger
Evan Selinger (Rochester Institute of Technology), @EvanSelinger

Evan Selinger is an associate professor of philosophy at Rochester Institute of Technology, where he is also affiliated with the Center for Media, Arts, Games, Interaction, and Creativity (MAGIC). He’s also a fellow at The Institute for Ethics and Emerging Technology, and serves on the Advisory Board of The Future of Privacy Forum. Evan’s research primarily addresses ethical issues concerning technology, science, the law, expertise, and sustainability.

A prolific academic author, Evan also cares deeply about public engagement, and regularly writes for popular magazines, newspapers, and blogs, including: Wired, The Atlantic, Slate, The Wall Street Journal, The Nation, Salon,... Read More.

Gwen Shapira
Gwen Shapira (Confluent), @gwenshap

Gwen Shapira is a system architect at Confluent, where she helps customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen currently specializes in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, the coauthor of Hadoop Application Architectures, and a frequent presenter at industry conferences. She is also a committer on Apache Kafka and Apache Sqoop. When Gwen isn’t coding or building data pipelines, you can find her pedaling her bike, exploring... Read More.

Vin Sharma
Vin Sharma (Intel)

Vin Sharma is the director of machine learning solutions in the Data Center group at Intel, where he focuses on autonomous driving and automated trading. Vin has helped build data center infrastructure software platforms—most recently the Trusted Analytics Platform—and has helped drive enterprise adoption of open source software like Linux, KVM, OpenStack, Hadoop and analytics for over 20 years. Before joining Intel, Vin held various engineering and management roles at HP for 15 years, building enterprise software products based on Linux, Java, XML, and other open source software.

Tomer Shiran

Tomer Shiran is cofounder and CEO of Dremio. Previously, Tomer was the vice president of product at MapR, where he was responsible for product strategy, road map, and new feature development. As a member of the executive team, he helped grow the company from 5 employees to over 300 employees and 700 enterprise customers. Previously, Tomer held numerous product management and engineering positions at Microsoft and IBM Research. He’s the author of five US patents. Tomer holds an MS in electrical and computer engineering from Carnegie Mellon University and a BS in computer science from the Technion, the... Read More.

Shiva Shivakumar
Shiva Shivakumar (Urban Engines)

Shiva is CEO/Co-Founder of Urban Engines, a startup focused on improving urban mobility. Prior to Urban Engines, from 2001 through 2010, Shiva was a Vice President and Distinguished Entrepreneur at Google, helping to build AdSense, Cloud Apps & ‘big data’ infrastructure such as Dremel/WebIQ, and research and development centers across the world. With a deep interest in open data, Shiva created to surface Web data with an industry-first collaboration between Google, Yahoo!, and Microsoft. Shiva has a Ph.D. in Computer Science from Stanford University, where he was awarded the Samuel Thesis Award.

Gary Short
Gary Short (Microsoft), @garyshort

Gary Short is a data solution architect for Microsoft, where he specializes in machine learning and big data on the Azure Platform. Gary is interested in data science in all forms, especially computational linguistics and social network analysis.

Hari Shreedharan
Hari Shreedharan (Cloudera)

Hari Shreedharan is a software engineer at Cloudera, an Apache Flume committer/PMC member, and a Spark contributor. He is the author of the O’Reilly Media book Using Flume.

Rosaria Silipo
Rosaria Silipo ( AG), @DMR_Rosaria

Rosaria Silipo is not only an expert in data mining, machine learning, reporting, and data warehousing but also a recognized expert on the KNIME data mining engine, on which she has published three books: KNIME Beginner’s Luck, The KNIME Cookbook, and The KNIME Booklet for SAS Users.

Previously Dr. Silipo worked as a freelance data analyst for many companies throughout Europe. She has also led the SAS development group at Viseca (Zürich), implemented the speech-to-text and text-to-speech interfaces in C# at Spoken Translation (Berkeley, California), and developed a number of speech recognition... Read More.

Joseph Sirosh

Joseph Sirosh is the corporate vice president of the Cloud AI Platform at Microsoft, where he leads the company’s enterprise AI strategy and products such as Azure Machine Learning, Azure Cognitive Services, Azure Search, and Bot Framework. Previously, he was the corporate vice president for Microsoft’s Data Platform; the vice president for Amazon’s Global Inventory Platform, responsible for the science and software behind Amazon’s supply chain and order fulfillment systems, as well as the central Machine Learning Group, which he built and led; and the vice president of research and development at Fair Isaac Corp., where he led R&D projects... Read More.

Ryan Smith (DigitalGlobe)

As a member of the U.S. Government Tech Solutions team at DigitalGlobe, Ryan Smith leads cross-functional teams to build analytic solutions for various customers. These solutions focus on extracting insights from customer, commercial, and open data to support decision makers.

Scott Sokoloff

Scott Sokoloff transforms mountains of data on consumer behavior into actionable data-driven insights. His methodologies allow for the attribution of online activity to offline behavior and vice versa. He has worked with many industry giants including Microsoft, Capital One, Dominos Pizza, Burger King, Visa, PayPal, Forbes, Constant Contact and countless others combining best practices in analytics, econometrics, statistics, data science, and sales forecasting. The focus of his work is listening to what consumers are saying via their direct actions, to determine how they will behave in the future in order to maximize the profitability of decision making.

Offering a track... Read More.

Dima Spivak
Dima Spivak (StreamSets), @dimaspivak

Dima Spivak is a software engineer at StreamSets, where he works on test infrastructure. He is also a committer and PMC member on the Apache HBase project. Before joining StreamSets, he developed test infrastructure at Cloudera.

Srikrishna Sridhar

Krishna Sridhar is a data scientist at Dato. He holds a PhD in computer science from the University of Wisconsin-Madison, where he worked on high-performance software for large-scale problems in mathematical optimization and data analysis. Krishna’s work has been used in applications such as healthcare, industrial production planning, and machine learning.

Jessica Stauth
Jessica Stauth (Quantopian), @jstauth

Jessica Stauth is managing director of research at Quantopian, a crowdsourced quantitative investment firm, where she and her team are responsible for selecting algorithms from the Quantopian community for the company’s portfolio. Previously, Jessica was an equity quant analyst at the StarMine Corporation and director of quant product strategy for Thomson Reuters.

Julie Steele

Julie Steele is Director of Marketing at Manifold, an artificial intelligence engineering services firm with offices in Boston and Silicon Valley. Julie was Director of Communications at Silicon Valley Data Science (SVDS), where she was instrumental in developing a strong sense of brand through content and design. Her experience on both sides of the content creation process began when she worked at O’Reilly Media, first as an Acquisitions Editor and then on the Strata conferences. Julie is author and coauthor of multiple books, published reports, and articles on data visualization and data science.

Nathan Stephens
Nathan Stephens (RStudio, Inc.)
R Day Tutorial

Nathan Stephens recently joined RStudio as director of solutions engineering. His background is in applied analytics and consulting. He has experience building data science teams, creating innovative data products, analyzing big data, and architecting analytic platforms. He was an early adopter of R and has introduced it into many organizations. Nathan holds an MS in statistics from Brigham Young University.

Douglas Stradley
Douglas Stradley (Trifacta)

Doug Stradley is the director of customer success at Trifacta. He and his team work with Enterprise customers around the world, helping them wrangle enormous, complex, and nasty mountains of data into usable information. Prior to Trifacta, Doug was the director of customer success at Informatica Cloud. With adoption as a focus, Doug has worked with many Fortune 500 companies to build productive relationships between humans and technology.

Brian Suda
Brian Suda, @briansuda

Brian Suda is a master informatician currently residing in Reykjavík, Iceland. Since first logging on in the mid-’90s, he has spent a good portion of each day connected to the internet. When he is not hacking on microformats or writing about web technologies, he enjoys taking kite aerial photography. His own little patch of internet can be found at, where many of his past projects, publications, interviews, and crazy ideas can be found.

Jagane Sundar
Jagane Sundar (WANdisco)

Jagane Sundar is the CTO at WANdisco. Jagane has extensive big data, cloud, virtualization, and networking experience. He joined WANdisco through its acquisition of AltoStor, a Hadoop-as-a-service platform company. Previously, Jagane was founder and CEO of AltoScale, a Hadoop- and HBase-as-a-platform company acquired by VertiCloud. His experience with Hadoop began as director of Hadoop performance and operability at Yahoo. Jagane’s accomplishments include creating Livebackup, an open source project for KVM VM backup, developing a user mode TCP stack for Precision I/O, developing the NFS and PPP clients and parts of the TCP stack... Read More.

David Tabacco
David Tabacco (Merck & Co., Inc.)

David Tabacco has worked extensively on the data strategy and the practical aspects of creating a data lake architecture to empower pharmaceutical data analytics. Over the course of this journey, David has focused on building and licensing big data tools that will enable the process of discovering, cataloging, enriching and governing data in the big data platform.

In previous roles, David led identity and access management initiatives such as single-sign-on and federation. Later, he was embedded in the clinical trials solution architecture team to understand and pair technology with business challenges.

David holds a BS in computer science from... Read More.

Matthew Tamayo-Rios

Matthew Tamayo-Rios is founder and CEO of Kryptnostic, a team of determined optimists united by the belief that individuals and organizations can safely leverage their data in the cloud. Previously, Matthew has worked at Microsoft on the OS Security team and at Palantir on the government side of the business. He studied mathematics and computer science at RPI and applied mathematics at the University of Washington. His initial foray into computer security was at the early age of nine, hacking his mother’s point-of-sale retail system to adjust the ice cream inventory.

Andy Terrel
Andy Terrel (NumFOCUS), @aterrel
PyData at Strata Tutorial

Andy Terrel is a data architect, computational scientist, and technical leader. He is the CTO of Fashion Metric, where he is bringing his experience building smart, scalable data systems to the fashion industry. You will also find him leading the board of the NumFOCUS foundation. As a passionate advocate for open source scientific codes, Andy has been involved in the wider scientific Python community since 2006, contributing to numerous projects in the scientific stack.

Piotr Teterwak

Piotr Teterwak works on the toolkit development team at Dato. He received a BA in computer science from Dartmouth College, where he conducted work exploring the learning of convolutional deep neural nets with applications in computer vision.

William Theisinger

William Theisinger is VP of engineering for and is responsible for the data collection, processing, warehousing, and reporting of both internal and external data for the company. Prior to, William founded a consulting company that specialized in data collection, processing, and warehousing for both large (Microsoft, AT&T Interactive) and small (Pricegrabber, Idealab) companies. William started his focus on data engineering while working for (an Idealab company) and later for Overture and Yahoo, before returning to Idealab to concentrate on early start-up tech companies.

AnnMarie Thomas
AnnMarie Thomas (School of Engineering and Schulze School of Entrepreneurship, University of St. Thomas), @amptMN

AnnMarie Thomas is an engineering and entrepreneurship professor at the University of St. Thomas, where she directs the Center for Engineering Education and the Playful Learning Lab. She was the founding executive director of the Maker Education nonprofit, and is the author of Making Makers: Kids, Tools, and the Future of Innovation. AnnMarie has an SB in ocean engineering from MIT, and MS and PhD degrees from Caltech.

Joy Thomas

Dr. Joy Thomas, chief data scientist at Apigee, joined the company through the acquisition of InsightsOne, which he co-founded in 2011. Dr. Thomas served as chief scientist at Purpleyogi/Stratify from its founding in 1999 and led the development of advanced mining, clustering, and classification algorithms that formed the basis of the Stratify Legal Discovery Service. After Stratify was acquired by Iron Mountain in 2007, he became chief scientist at Iron Mountain Digital, where he led advanced technology development until 2011. From 1990 to 1999, he was a research staff member at the IBM T.J. Watson Research Center, where he... Read More.

Kathleen Ting
Kathleen Ting (Cloudera)

Kathleen Ting is currently a technical account manager at Cloudera, where she helps strategic customers deploy and use the Hadoop ecosystem in production. Kathleen has spoken on Hadoop, ZooKeeper, and Sqoop at many big data conferences, including Hadoop World, ApacheCon, and OSCON. She’s contributed to several projects in the open source community, is a committer and PMC Member on Sqoop, and is a coauthor of the Apache Sqoop Cookbook.

Ali Tore
Ali Tore (ClearStory Data)

Ali Tore has more than 20 years of experience leading enterprise product development. Most recently, he cofounded and served as CPO and VP of analytics at Model N, a leading provider of revenue management solutions that went public in 2013. Previously, Ali was a product and program manager at NetDynamics (acquired by Sun Microsystems), which pioneered the first Java-based application server. He holds an undergraduate degree in industrial engineering and management science from Northwestern University and a graduate degree in management science and engineering from Stanford University.

Steven Totman

Steven Totman is Cloudera’s big data subject-matter expert, helping companies monetize their big data assets using Cloudera’s Enterprise Data Hub. Steve works with over 180 customers worldwide and helps across verticals in architectures around data management tools, data models, and ethical data usage. Previously, Steve ran strategy for a mainframe-to-Hadoop company and drove product strategy at IBM for DataStage and Information Server after joining with the Ascential acquisition. He architected IBM’s Infosphere product suite and led the design and creation of governance and metadata products like Business Glossary and Metadata Workbench. Steve holds several patents in data integration and... Read More.

Florin Trandafir

Florin Trandafir is the global IT program manager for BI and Analytics at Nokia. Florin has a solid background in analytics, having worked in various positions in the telecom and consultancy businesses. He started as a business intelligence consultant focusing on Nokia’s operations business and later took on further responsibilities in operational and service management for Analytics and Financial Applications in the network field. In a business role, he also led the Analytics area (focused on SAP technologies) for a major consultancy company in the European Nordic market.

Currently, his main responsibility is in the program management area for the Business Intelligence and... Read More.

Shivakumar Vaithyanathan is an IBM fellow and director, Watson Content Services. Prior to his current position he managed the Machine Learning Systems group at IBM Research, and prior to that he started and built the Search & Analytics Department at IBM Almaden, with research focus ranging from Natural Language Processing to Entity Resolution and Machine Learning. Multiple technologies developed in this department ship with several IBM products, including IBM’s big data efforts. He also initiated and oversaw the build-out of IBM’s next generation Enterprise search technology that currently powers IBM’s external-facing His research is at... Read More.

Bryan Van de Ven
Bryan Van de Ven (Continuum Analytics), @ContinuumIO
PyData at Strata Tutorial

Bryan Van de Ven is a software engineer at Continuum Analytics. Previously, Bryan worked at the Applied Research Labs, developing software for sonar feature detection and classification systems on US Naval submarine platforms, and Enthought, where he worked on problems in financial risk modeling and fluid mixing simulation. Bryan has also worked on an assortment of iOS projects as an independent consultant. Bryan is a core contributor of Bokeh and contributed to the Chaco visualization library. Bryan holds undergraduate degrees in computer science and mathematics from UT Austin and a master’s degree in physics from UCLA.

Victor Vazquez
Victor Vazquez (Airbnb)

Victor Vazquez is a data scientist at Airbnb. His work currently focuses on search and the marketplace; he also has experience with international payments and compliance. He received his BA in economics from MIT.

Krish Venkataraman

Krish Venkataraman is Syncsort’s chief financial officer and chief operations officer. He has strategic and tactical expertise and demonstrated success across the industry spectrum – from M&A to corporate finance, investment banking, global corporate strategy, equity research, consulting, trading and exchanges, and payment systems.

Prior to joining Syncsort in March 2014, Krish served as chief financial officer and chief administrative officer of global information technology for NYSE Euronext, where he managed significant portions of capital and expenses for the S&P 500 company with $4 billion in global annual revenues. He helped drive the strategy, governance, financial reporting, and management of a workforce... Read More.

Ashish Verma
Ashish Verma (Deloitte)

Ashish Verma is a managing director at Deloitte, where he leads the Big Data and IoT Analytics practice, building offerings and accelerators to enhance business processes and effectiveness. Ashish has more than 18 years of management consulting experience helping Fortune 100 companies build solutions that focus on addressing complex business problems related to realizing the value of information assets within an enterprise.

Ekaterina Volkova
Ekaterina Volkova (Cornell University)

Ekaterina Volkova is a PhD candidate in finance at Cornell University. Among other topics, Ekaterina is interested in using financial data to track likely instances of insider trading.

Chris Wake
Chris Wake (Spire Global, Inc.), @cjwake

Chris Wake is the head of business operations for Spire, the satellite-powered data company. He joined Spire as its first non-founder in early 2013, and has worked on many areas of its development from initial customer identification to expansion abroad. Prior to joining Spire, Wake spent time working in venture capital, and assisted a selection of early stage technology companies in scaling. He holds an MBA from the University of Oxford, and his work has been featured in Forbes, The Huffington Post, and Wired, among others.

Dean Wampler
Dean Wampler (Lightbend), @deanwampler
Spark on Mesos Session

Dean Wampler is an expert in streaming data systems, focusing on applications of ML/AI. Formerly, he was the vice president of fast data engineering at Lightbend, where he led the development of Lightbend CloudFlow, an integrated system for building and running streaming data applications with Akka Streams, Apache Spark, Apache Flink, and Apache Kafka. Dean is the author of Fast Data Architectures for Streaming Applications, Programming Scala, and Functional Programming for Java Developers and the coauthor of Programming Hive, all from O’Reilly. He’s a contributor to several open source projects. A frequent Strata speaker,... Read More.

Andrew Wang
Andrew Wang (Cloudera)

Andrew Wang is a software engineer on the HDFS team at Cloudera. Previously, he was a graduate student in the AMPLab at the University of California, Berkeley, advised by Ion Stoica, where he worked on research related to in-memory caching and quality of service. In his spare time, he enjoys going on bike rides, cooking, and playing guitar.

Peter Wang
Peter Wang (Anaconda), @pwang
PyData at Strata Tutorial

Peter Wang is the cofounder and CTO of Anaconda, where he leads the product engineering team for the Anaconda platform and open source projects including Bokeh and Blaze. Peter’s been developing commercial scientific computing and visualization software for over 15 years and has software design and development experience across a broad variety of areas, including 3-D graphics, geophysics, financial risk modeling, large data simulation and visualization, and medical imaging. As a creator of the PyData conference, he also devotes time and energy to growing the Python data community by advocating, teaching, and speaking about Python at conferences worldwide. Peter... Read More.

Tricia Wang
Tricia Wang (Constellate Data), @triciawang

With more than 15 years’ experience working with designers, engineers, and scientists, Tricia Wang has a particular interest in designing human-centered systems. Tricia advises organizations on integrating big data and what she calls “thick data”—data brought to light using digital-age ethnographic research methods that uncover emotions, stories, and meaning—to improve strategy, policy, products, and services. Organizations she has worked with include P&G, Nokia, GE, Kickstarter, the United Nations, and NASA. Tricia recently finished an expert-in-residency at IDEO, where she extended and amplified IDEO’s impact in design research. When not working with organizations, she spends the other half of... Read More.

Zuo Wang
Zuo Wang (Wanda), @harpe1999

Zuo Wang is a principal researcher at Wanda AI Technology Center. For the past few years, he has worked on large-scale distributed deep learning systems including PaddlePaddle, MXNet, and TensorFlow, and led the effort to apply deep learning to clothes classification, clothing fashion analysis, and cross-domain clothing similarity matching. Zuo’s main interests are deep learning, computer vision, and distributed systems. He previously worked on MicroStrategy, a high-performance enterprise analytics platform, and Apache Impala, an SQL query engine for data stored in Apache Hadoop.

Daniel Weeks
Daniel Weeks (Netflix)

Daniel Weeks manages the big data compute team at Netflix and is a Parquet committer. Previously, Daniel focused on research in big data solutions and distributed systems.

Laurent Weichberger
Laurent Weichberger (OmPoint Innovations, LLC)

Laurent Weichberger is in constant motion as the Big Data Bear and Sr. Technical Instructor for Datameer, Inc. Laurent has been teaching Java since 2000 and started his work in Big Data in 2012, when he worked for Hortonworks and Cloudera. He was the Director of Training at DataStax, and later became Director of Practice at Couchbase. More recently he spent the better half of 2015 working for Databricks writing and teaching about Spark, and he is now focused full time on promoting the wondrous Datameer software worldwide.

Patrick Wendell
Patrick Wendell (Databricks)

Patrick Wendell is a cofounder of Databricks as well as a founding committer and PMC member of Apache Spark. Patrick has acted as release manager for several Spark releases in addition to maintaining several subsystems of Spark’s core engine. At Databricks, Patrick directs the company’s maintenance and development of Spark.

Patrick holds an MS in computer science from UC Berkeley, where his research focused on low-latency scheduling for large-scale analytics workloads, and a BSE in computer science from Princeton University.

Ben Werther
Ben Werther (Platfora), @bwerther

Ben Werther is the Founder and Executive Chairman of Platfora. Ben launched Platfora, and was the founding CEO for four years, with the goal of transforming how ‘citizen data scientists’ in every company make sense and drive action through direct and effortless use of big data. Before founding Platfora, Ben was vice president of products for DataStax, where he shaped the company’s enterprise and Hadoop strategy, and was also head of products at Greenplum through its acquisition by EMC. Ben has a B.S. in Computer Science from Monash University (Australia) and an M.S. in Computer Science from Stanford... Read More.

Alexander White
Alexander White (Next Big Sound), @mralexwhite

Alex White co-founded Next Big Sound in 2008, while in his last semester at Northwestern University. The analytics service measures daily music consumption and purchase decisions around the globe. From social to streaming to sales, Next Big Sound combines artist activity with context to help the modern music industry make decisions.

White and his co-founders have been featured in Fast Company (#1 most innovative company in the music industry, 2015), Forbes (30 under 30) in the music category three times, Billboard (10 best music companies), Bloomberg BusinessWeek (25 under 25), Entrepreneur Magazine’s 30 under 30 list in the New York... Read More.

Tom White
Tom White (Cloudera)

Tom White has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He is the author of Hadoop: The Definitive Guide for O’Reilly. Previously he worked as an independent consultant specializing in Hadoop, and before that was co-founder and lead developer at Kizoom, a UK mobile applications startup. Tom has a bachelor’s degree in mathematics from the University of Cambridge, and a master’s degree in history and philosophy of science from the Universities of Leeds, UK, and Florence, Italy.

Thomas Wiecki
Thomas Wiecki (Quantopian), @twiecki

Thomas Wiecki is the lead data science researcher at Quantopian, where he uses probabilistic programming and machine learning to help build the world’s first crowdsourced hedge fund. Among other open source projects, he is involved in the development of PyMC—a probabilistic programming framework written in Python. A recognized international speaker, Thomas has given talks at various conferences and meetups across the US, Europe, and Asia. He holds a PhD from Brown University.

Edd Wilder-James
Data 101 Tutorial

Edd Wilder-James is a strategist at Google, where he is helping build a strong and vital open source community around TensorFlow. A technology analyst, writer, and entrepreneur based in California, Edd previously helped transform businesses with data as vice president of strategy for Silicon Valley Data Science. Formerly Edd Dumbill, Edd was the founding program chair for the O’Reilly Strata Data Conference and chaired the Open Source Software Conference for six years. He was also the founding editor of the peer-reviewed journal Big Data. A startup veteran, Edd was the founder and creator... Read More.

Cack Wilhelm
Cack Wilhelm (Scale Venture Partners)

Cack Wilhelm is a principal at Scale Venture Partners, where she focuses on investments in early-stage software companies, with an eye toward those helping businesses better utilize data, automate workflows, incorporate AI, and build more resilient software. Looking further ahead, Cack is watching closely as platforms such as virtual reality and augmented reality take shape. Cack cut her teeth selling 11g databases at Oracle and Hadoop clusters at Cloudera in the months before Hadoop reached Version 1.0. Cack has since transferred that operational and go-to-market experience into helping Scale portfolio companies such as Treasure Data, Realm, and CircleCI. Cack was... Read More.

Josh Wills
Josh Wills (Cloudera), @josh_wills

Josh Wills is director of data science at Cloudera, where he works with customers and engineers to develop Hadoop-based solutions across a wide range of industries. Prior to joining Cloudera, Josh was at Google where he worked on the ad auction system and then led the development of the analytics infrastructure used in Google+. He earned his bachelor’s degree in mathematics from Duke University and his master’s in operations research from the University of Texas-Austin.

Matt Winkler (C+E)
Matt Winkler (C+E) (Microsoft)

Matt Winkler is a principal group program manager in the Data Group at Microsoft, where he leads a program management team building services and tools for developers to build intelligent apps using cognitive APIs, the Bot Framework, and the Cortana Intelligence Suite. Matt has worked at Microsoft for the last 10 years as an evangelist and a program manager working on the .NET Framework, Visual Studio, and Azure Web Sites. As part of the Microsoft big data team, Matt led a PM team building HDInsight, Microsoft’s managed Hadoop and Spark service and Azure data lake analytics. Matt holds a... Read More.

Doug Wolfe
Doug Wolfe (CIA)

Doug Wolfe was selected to serve as CIO of the CIA in 2013. In his role, he oversees the agency’s information technology vision and strategic direction and is an advisor to the intelligence community CIO. Previously, he served as deputy director for acquisition, technology, and facilities at the Office of the Director of National Intelligence. Wolfe joined the CIA in 1984. He worked for 16 years as a part of the CIA component in the National Reconnaissance Office, and was involved in the launch and operations of multiple satellite systems. He partnered with the aerospace... Read More.

Jenn Wortman Vaughan
Jenn Wortman Vaughan (Microsoft Research), @jennwvaughan

Jenn Wortman Vaughan is a Senior Researcher at Microsoft Research, New York City, where she studies algorithmic economics, machine learning, and social computing, with a recent focus on prediction markets and crowdsourcing. Jenn came to MSR in 2012 from UCLA, where she was an assistant professor in the computer science department. She completed her Ph.D. at the University of Pennsylvania in 2009, and subsequently spent a year as a Computing Innovation Fellow at Harvard. She is the recipient of Penn’s 2009 Rubinoff dissertation award for innovative applications of computer technology, a National Science Foundation CAREER award, and... Read More.

Ryan Wright (Kelley Blue Book)

Ryan Wright serves as manager of data management for Kelley Blue Book. In this role, he oversees the company’s team of engineers and analysts who load and review the quality of potential data sources for Kelley Blue Book’s trusted vehicle value information. In addition, Wright monitors the performance of ongoing scheduled loads of external data files into Kelley Blue Book’s enterprise databases.

As a member of Kelley Blue Book’s enterprise data warehouse team, Wright is charged with scaling and deploying business intelligence tool suites across Kelley Blue Book’s enterprise. Wright is directly involved with creating business intelligence solutions for Kelley Blue Book,... Read More.

Yihui Xie
Yihui Xie (RStudio, Inc.), @xieyihui
R Day Tutorial

Yihui Xie is an active R user and the author of several R packages, such as animation, formatR, Rd2roxygen, and knitr, among which the animation package won the 2009 John M. Chambers Statistical Software Award (ASA). He is also the author of the book Dynamic Documents with R and knitr. In 2006 he founded the “Capital of Statistics,” which has grown into a large online community on statistics in China. He initiated the first Chinese R conference in 2008 and has been organizing R conferences in China since then. During his PhD training at the Iowa State University,... Read More.

Reynold Xin
Reynold Xin (Databricks)

Reynold Xin is a cofounder and chief architect at Databricks as well as an Apache Spark PMC member and release manager for Spark’s 2.0 release. Prior to Databricks, Reynold was pursuing a PhD at the UC Berkeley AMPLab, where he worked on large-scale data processing.

Matt Yanchyshyn
Matt Yanchyshyn (Amazon Web Services)

Matt Yanchyshyn leads the AWS Technology Partner Solutions Architecture team at Amazon Web Services. He helps AWS partners architect secure and high-performance applications for the cloud. Matt has worked in the digital media and cloud computing industry for over a decade and has helped hundreds of customers bring compelling AWS-backed products to the market.

Fangjin Yang
Fangjin Yang (Imply)

Fangjin Yang is a coauthor of the open source Druid project and a cofounder of Imply, a data analytics startup based in San Francisco. Previously, Fangjin held senior engineering positions at Metamarkets and Cisco Systems. Fangjin has a BASc in electrical engineering and an MASc in computer engineering from the University of Waterloo, Canada.

Chuck Yarbrough
Chuck Yarbrough (Pentaho)

Chuck Yarbrough is the senior director of solutions marketing and management at Pentaho, a leading big data analytics company that helps organizations engineer big data connections, blend data, and report and visualize all of their data. Chuck is responsible for creating and driving Pentaho solutions that leverage the Pentaho platform, enabling customers to implement big data solutions quicker and achieve greater ROI faster. Chuck has more than 20 years of experience helping organizations use technology to their advantage to ensure they can run, manage, and transform their business through better use of data. A lifelong participant in the data... Read More.

Reza Zadeh
Reza Zadeh (Matroid | Stanford), @Reza_Zadeh

Reza Bosagh Zadeh is founder and CEO at Matroid and an adjunct professor at Stanford University, where he teaches two PhD-level classes: Distributed Algorithms and Optimization and Discrete Mathematics and Algorithms. His work focuses on machine learning, distributed computing, and discrete applied mathematics. His awards include a KDD best paper award and the Gene Golub Outstanding Thesis Award. Reza has served on the technical advisory boards of Microsoft and Databricks. He is the initial creator of the linear algebra package in Apache Spark. Through Apache Spark, Reza’s work has been incorporated into industrial and academic cluster computing environments.... Read More.

Benjamin Zaitlen
Benjamin Zaitlen (Anaconda)
PyData at Strata Tutorial

Ben Zaitlen is the technical lead of the Anaconda Cluster product at Continuum Analytics. Ben received undergraduate degrees in mathematics and physics from UC Santa Cruz and a master’s degree in physics from Indiana University. Prior to Continuum, he worked at the Biocomplexity Institute developing and supporting a multi-scale modeling environment for developmental biology. Ben is also passionate about electronics and has developed a number of embedded and wearable hardware projects.

Philip Zeyliger
Philip Zeyliger (Cloudera)

Philip Zeyliger is a software engineer at Cloudera. He came to Cloudera from Google, where he worked on scalable storage for user-facing applications. Before that, he worked in finance at D.E. Shaw. Philip holds a bachelor’s degree in mathematics from Harvard University. His interests include systems and databases. He’s a committer on the Apache Avro project.

Owen Zhang
Owen Zhang (DataRobot)
PyData at Strata Tutorial

Owen Zhang is the chief product officer at DataRobot. Owen spent most of his career in the property and casualty insurance industry. Most recently Owen served as vice president of modeling of the newly formed AIG Science team.

After spending several years in IT building transactional systems for commercial insurance, Owen discovered his passion for machine learning and started building insurance underwriting, pricing, and claims models. Owen has a master’s degree in electrical engineering from the University of Toronto and a bachelor’s degree from the University of Science and Technology of China. Owen is currently ranked #1 on the... Read More.

Yan Zhang
Yan Zhang (Microsoft)

Yan Zhang is a senior data scientist with the algorithm and data science team of the Data Group within Cloud and Enterprise at Microsoft. She builds predictive analytics models and generalizes machine learning solutions on the cloud machine learning platform. Her recent research includes cost prediction and fraud claim detection in the healthcare domain, predictive maintenance in IoT applications, customer segmentation, and text mining. Previously, she was a research faculty member at Syracuse University. Yan earned her PhD in data mining from the Computer Science Department at the University of Vermont. She’s the author of 23 publications, including journal articles,... Read More.

Zhe Zhang
Zhe Zhang (LinkedIn), @oldcap

Zhe Zhang is an engineering manager at LinkedIn, where he leads an excellent engineering team to provide big data services (Hadoop distributed file system (HDFS), YARN, Spark, TensorFlow, and beyond) to power LinkedIn’s business intelligence and relevance applications. Zhe’s an Apache Hadoop PMC member; he led the design and development of HDFS Erasure Coding (HDFS-EC).

Alice Zheng

Alice Zheng leads the machine learning optimization team on Amazon’s advertising platform. She specializes in research and development of machine learning methods, tools, and applications. Outside of work, she is writing a book, Mastering Feature Engineering. Previously, Alice worked at GraphLab/Dato/Turi, where she led the machine learning toolkits team and spearheaded user outreach. Prior to joining GraphLab, she was a researcher in the Machine Learning group at Microsoft Research, Redmond. Alice holds PhD and BA degrees in computer science and a BA in mathematics, all from UC Berkeley.

Siwei Zhu
Siwei Zhu (Scribd)

Siwei Zhu is a data scientist at Scribd focused on understanding how users engage with the product. Previously, he worked as a data scientist at Facebook.

Shivon Zilis
Shivon Zilis (Bloomberg Beta), @shivon

Shivon Zilis is a venture capitalist and founding member of Bloomberg Beta, where she focuses on early-stage data and machine-intelligence investments. Shivon has led 12 investments since launch. One, Newsle, was acquired by LinkedIn; others include Context Relevant, Alation, and InfluxDB. She recently released a report on the current state of machine intelligence that analyzed thousands of companies and put forward predictions on where the industry is headed. Shivon’s previous experience includes building startups at Bloomberg Ventures, the firm’s incubator, and developing cloud core banking solutions for microfinance institutions at IBM. She is a C100 charter member and was... Read More.

Noah Zucker
Noah Zucker (Novus Partners), @noahlz

Noah Zucker is vice president of engineering at Novus Partners.

Monte Zweben
Monte Zweben (Splice Machine Inc.), @mzweben

Monte Zweben is the CEO and co-founder of Splice Machine, provider of the Hadoop RDBMS. A SQL-on-Hadoop solution, Splice Machine has helped many companies scale real-time applications using commodity hardware without application rewrites. A technology industry veteran, Monte’s early career was spent with the NASA Ames Research Center as the deputy chief of the artificial intelligence branch, where he won the prestigious Space Act Award for his work on the Space Shuttle program. Monte then founded and was the chairman and CEO of Red Pepper Software, a leading supply chain optimization company. In 1996 it... Read More.

Margit Zwemer
Margit Zwemer (LiquidLandscape), @MPZwemer

Margit Zwemer is the founder of data visualization company LiquidLandscape. She was formerly a data scientist at Kaggle and an algorithmic trader at Societe Generale.

Dave Zwieback
Dave Zwieback (Next Big Sound), @mindweather

Dave Zwieback has been working with large-scale mission-critical infrastructure and teams for almost two decades. Dave is the VP of engineering at Next Big Sound (acquired by Pandora Media, Inc.) and CTO of Lotus Outreach. He has previously worked with the adaptive learning startup Knewton, the quantitative investment management firm D.E. Shaw & Co., and the financial services behemoth Morgan Stanley. He also ran an infrastructure architecture consultancy for seven years. Dave is the author of Beyond Blame: Learning from Failure and Success from O’Reilly Media. He blogs at