Data science is a hot topic, but much of it is simply business intelligence in a new mantle. In this track, we push the envelope of data science, exploring emerging topics and new areas of study, made possible by vast troves of raw data and cutting-edge architectures for analyzing and exploring information. We’ll cover topics such as data management, machine learning, natural language processing, crowdsourcing, and algorithm design.
Who should attend: Data scientists, data engineers, statisticians, data modellers, and analysts with a strong understanding of data science fundamentals will find themselves at home in this tutorial, as will CTOs, chief scientists, and academic researchers.
Ben Lorica is the chief data scientist at O’Reilly Media. Ben has applied business intelligence, data mining, machine learning, and statistical analysis in a variety of settings, including direct marketing, consumer and market research, targeted advertising, text mining, and financial engineering. His background includes stints with an investment management company, internet startups, and financial services.
Reza Bosagh Zadeh is founder and CEO at Matroid and an adjunct professor at Stanford University, where he teaches two PhD-level classes: Distributed Algorithms and Optimization and Discrete Mathematics and Algorithms. His work focuses on machine learning, distributed computing, and discrete applied mathematics. His awards include a KDD best paper award and the Gene Golub Outstanding Thesis Award. Reza has served on the technical advisory boards of Microsoft and Databricks. He is the initial creator of the linear algebra package in Apache Spark. Through Apache Spark, Reza’s work has been incorporated into industrial and academic cluster computing environments. Reza holds a PhD in computational mathematics from Stanford, where he worked under the supervision of Gunnar Carlsson. As part of his research, Reza built the machine learning algorithms behind Twitter’s who-to-follow system, the first product to use machine learning at Twitter.
David Blei is a professor of statistics and computer science at Columbia University, and a member of the Columbia Data Science Institute. His research is in statistical machine learning, involving probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference algorithms for massive data. He works on a variety of applications, including text, images, music, social networks, user behavior, and scientific data.
David earned his bachelor’s degree in computer science and mathematics from Brown University (1997) and his PhD in computer science from the University of California, Berkeley (2004). Before arriving at Columbia, he was an associate professor of computer science at Princeton University. He has received several awards for his research, including a Sloan Fellowship (2010), Office of Naval Research Young Investigator Award (2011), Presidential Early Career Award for Scientists and Engineers (2011), Blavatnik Faculty Award (2013), and ACM-Infosys Foundation Award (2013).
Anima Anandkumar is a principal scientist at Amazon Web Services. Anima is currently on leave from UC Irvine, where she is an associate professor. Her research interests are in the areas of large-scale machine learning, nonconvex optimization, and high-dimensional statistics. In particular, she has been spearheading the development and analysis of tensor algorithms. Previously, she was a postdoctoral researcher at MIT and a visiting researcher at Microsoft Research New England. Anima is the recipient of several awards, including the Alfred P. Sloan fellowship, the Microsoft faculty fellowship, the Google research award, the ARO and AFOSR Young Investigator awards, the NSF CAREER Award, the Early Career Excellence in Research Award at UCI, the Best Thesis Award from the ACM SIGMETRICS society, the IBM Fran Allen PhD fellowship, and several best paper awards. She has been featured in a number of forums, such as the Quora ML session, Huffington Post, Forbes, and O’Reilly Media. Anima holds a BTech in electrical engineering from IIT Madras and a PhD from Cornell University.
Hussein Mehanna is an engineering manager at Facebook, where he founded and manages the Applied Machine Learning platform team. Hussein started as the original developer on the team, whose platform quickly grew from an ads-focused ML platform into a Facebook-wide one. Prior to Facebook, Hussein worked as a software engineer for Bing at Microsoft. He holds a master’s degree in speech recognition from the University of Cambridge, UK.
Jennifer Tour Chayes is Distinguished Scientist and Managing Director of Microsoft Research New England in Cambridge, Massachusetts, which she co-founded in 2008, and Microsoft Research New York City, which she co-founded in 2012. Before joining Microsoft in 1997, Chayes was for many years professor of mathematics at UCLA. Chayes is the author of over 125 academic papers and holds over 30 patents. Her research areas include phase transitions in discrete mathematics and computer science, structural and dynamical properties of self-engineered networks, graph algorithms and algorithmic game theory.
Chayes received her B.A. in biology and physics at Wesleyan University, where she graduated first in her class, and her Ph.D. in mathematical physics at Princeton. She did her postdoctoral work in the mathematics and physics departments at Harvard and Cornell. She is the recipient of a National Science Foundation Postdoctoral Fellowship, a Sloan Fellowship, and the UCLA Distinguished Teaching Award. Chayes has been the recipient of many leadership awards including the Leadership Award of Women Entrepreneurs in Science and Technology, the Women Who Lead Award, and the Women of Leadership Vision Award of the Anita Borg Institute. She has twice been a member of the Institute for Advanced Study in Princeton. Chayes is a Fellow of the American Association for the Advancement of Science, the Fields Institute, the Association for Computing Machinery, and the American Mathematical Society, and an Elected Member of the American Academy of Arts and Sciences.
Ben Recht is an associate professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at the University of California, Berkeley. Ben’s research focuses on scalable computational tools for large-scale data analysis, statistical signal processing, and machine learning. He explores the intersections of convex optimization, mathematical statistics, and randomized algorithms. He is particularly interested in simplifying the analysis and manipulation of noisy and incomplete data by exploiting domain-specific knowledge and prior information about structure. Ben is the recipient of an NSF CAREER Award, an Alfred P. Sloan Research Fellowship, and the 2012 SIAM/MOS Lagrange Prize in Continuous Optimization. He is currently on the editorial boards of Mathematical Programming and the Journal of Machine Learning Research.
Tanzeem Choudhury received her Ph.D. from the Media Laboratory at the Massachusetts Institute of Technology. As part of her doctoral work, she created the sociometer and conducted the first experiment that uses mobile sensors to model social networks, which led to a new field of research referred to as Reality Mining. She holds a B.S. in electrical engineering from the University of Rochester, and an M.S. from the MIT Media Laboratory.
Jenn Wortman Vaughan is a Senior Researcher at Microsoft Research, New York City, where she studies algorithmic economics, machine learning, and social computing, with a recent focus on prediction markets and crowdsourcing. Jenn came to MSR in 2012 from UCLA, where she was an assistant professor in the computer science department. She completed her Ph.D. at the University of Pennsylvania in 2009, and subsequently spent a year as a Computing Innovation Fellow at Harvard. She is the recipient of Penn’s 2009 Rubinoff dissertation award for innovative applications of computer technology, a National Science Foundation CAREER award, and a Presidential Early Career Award for Scientists and Engineers. In her “spare” time, Jenn is involved in a variety of efforts to provide support for women in computer science; most notably, she co-founded the Annual Workshop for Women in Machine Learning, which will be held for the tenth straight year in 2015.
Adam Marcus is a cofounder and CTO of B12, a company building a better future of creative and analytical work, starting with design. With Orchestra, its open source project management system for experts and machines, B12 automatically generates websites for clients (algorithmic design) and then recruits wonderful designers and art directors to fill in the details from the algorithmically generated starting points. (This summer, B12 announced the close of a $12.4M Series A funding round.) Previously, Adam was director of data at Locu, a startup that was acquired by GoDaddy. He has written widely on crowdsourcing and data management and processing, including coauthoring a book, Crowdsourced Data Management: Industry and Academic Perspectives. He is a recipient of the NSF and NDSEG fellowships and has worked at ITA, Google, IBM, and FactSet. Adam holds a PhD in computer science from MIT, where he researched database systems and human computation. In his free time, he builds course content to get people excited about data and programming.
Stefanie Jegelka is the X-Consortium career development assistant professor at the Department of Electrical Engineering and Computer Science at MIT, and a member of CSAIL and the Institute for Data, Systems and Society. Before joining MIT in Spring 2015, she was a postdoctoral scholar in the AMPLab at UC Berkeley, working with Michael Jordan and Trevor Darrell. She earned her PhD from ETH Zurich in collaboration with the Max Planck Institutes in Tuebingen, Germany, and a Diplom from the University of Tuebingen. She has been a fellow of the German National Academic Foundation, and has received an Anita Borg and several other fellowships, as well as a Best Paper Award at ICML. Her research interests lie in algorithmic machine learning, in particular scalable analytics with combinatorial structure, with applications in various fields including computer vision, biology, and the development of new materials. She has given four tutorials on Submodularity in Machine Learning at international conferences, and has organized several workshops.
Misha Bilenko is the principal researcher leading the Machine Learning Algorithms team in the Cloud+Enterprise division of Microsoft. Before that, he worked for seven years in the Machine Learning Group at Microsoft Research, where he collaborated with a number of product groups on applied ML algorithms, systems, and tools. Misha joined Microsoft in 2006 after receiving his Ph.D. in computer science from the University of Texas at Austin. He co-edited Scaling Up Machine Learning, published by Cambridge University Press, and his work has received best paper awards from KDD and SIGIR. His research interests include parallel and distributed learning algorithms, accuracy debugging methods, and learnable similarity functions.
Reynold Xin is a cofounder and chief architect at Databricks as well as an Apache Spark PMC member and release manager for Spark’s 2.0 release. Prior to Databricks, Reynold was pursuing a PhD at the UC Berkeley AMPLab, where he worked on large-scale data processing.
©2015, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.