New speakers are added continuously. Please check back to see the latest updates to the program.
Mike Abbott is a general partner at Kleiner Perkins Caufield & Byers, where he focuses on investments in the firm’s digital practice, helping entrepreneurs in the social, mobile, and cloud computing sectors rapidly scale teams and ventures. Mike serves as an expert resource on enterprise infrastructure, cloud computing, and big data. He also helps entrepreneurs win the race for talent in a hypercompetitive recruitment environment. Mike is an engineering leader, entrepreneur, and investor and an expert in big data businesses. Formerly the vice president of engineering at Twitter, Mike led the team to rebuild and solidify Twitter’s infrastructure, growing the... Read More.
Joseph Adler has many years of experience in data mining and data analysis at companies including DoubleClick, Verisign, and LinkedIn. Currently, he is director of product management and data science at Confluent. He is the holder of several patents for computer security and cryptography and the author of Baseball Hacks and R in a Nutshell. He graduated from MIT with a BSc and MEng in computer science and electrical engineering.
Sarah Aerni has a background in the field of bioinformatics, developing tools to help biomedical researchers understand their data. She holds a B.S. in biology with a specialization in bioinformatics and a minor in French literature from the University of California-San Diego, and an M.S. and Ph.D. in biomedical informatics from Stanford University. During her time as a researcher she focused her efforts on building computational models enabling research for a broad range of fields in biomedicine. She also co-founded a startup providing informatics services to researchers and small companies. At Pivotal she works with customers in life science and healthcare,... Read More.
Nidhi Aggarwal leads strategy and marketing at Tamr. Prior to joining Tamr, Nidhi founded Cloud vLab, makers of qwikLAB, a software-learning platform used to create and deploy on-demand lab environments. In the years before Cloud vLab, Nidhi worked at McKinsey & Company, advising Fortune 150 companies on big data strategy. Nidhi holds a PhD in computer science from the University of Wisconsin-Madison.
Jaipaul Agonus is a director at FINRA’s Market Regulation Technology. He leads a multi-year effort to migrate FINRA’s Market Regulation batch analytics portfolio to a big data platform using Hadoop ecosystem products in the cloud.
With over 15 years in advanced analytical applications and architecture, John Akred is dedicated to helping organizations become more data driven. As CTO of Silicon Valley Data Science, John combines deep expertise in analytics and data science with business acumen and dynamic engineering leadership.
Sridhar Alla is the director of big data solutions and architecture at Comcast, where he has delivered several key solutions, such as the Xfinity personalization platform, clickthru analytics, and the correlation platform. Sridhar started his career in network appliances on NAS and caching technologies. Previously, he served as the CTO of security company eIQNetworks, where he merged the concepts of big data and security products. He holds patents on the topics of very large-scale processing algorithms and caching.
David Alves is a software engineer at Cloudera and a PhD student at UT Austin. He is a committer at the Apache Foundation and in the past has contributed to several open source projects, such as Apache Cassandra and Apache Drill.
Anima Anandkumar is a principal scientist at Amazon Web Services. Anima is currently on leave from UC Irvine, where she is an associate professor. Her research interests are in the areas of large-scale machine learning, nonconvex optimization, and high-dimensional statistics. In particular, she has been spearheading the development and analysis of tensor algorithms. Previously, she was a postdoctoral researcher at MIT and a visiting researcher at Microsoft Research New England. Anima is the recipient of several awards, including the Alfred P. Sloan fellowship, the Microsoft faculty fellowship, the Google research award, the ARO and AFOSR Young Investigator... Read More.
Jesse Anderson is a data engineer, creative engineer, and managing director of the Big Data Institute. Jesse trains employees on big data—including cutting-edge technology like Apache Kafka, Apache Hadoop, and Apache Spark. He has taught thousands of students at companies ranging from startups to Fortune 100 companies the skills to become data engineers. He is widely regarded as an expert in the field and recognized for his novel teaching practices. Jesse is published by O’Reilly and Pragmatic Programmers and has been covered in such prestigious media outlets as the Wall Street Journal, CNN, BBC, NPR, Engadget,... Read More.
Amar Arsikere has a large-scale data infrastructure background, with 18 years of experience in building software products at several companies, including Google and Zynga. He is currently a co-founder and CEO at Infoworks.io. Amar founded the Systems Engineering Group at Zynga and led the design and deployment of one of the largest in-memory databases there. At Google he pioneered the development of a data warehousing platform on BigTable. This platform successfully replaced the Informatica/Oracle/Microstrategy/QlikView technology stack. Amar is a recipient of the InfoVision award from IEC and the Jars Top 25 award. He holds several patents in... Read More.
Astrid Atkinson currently works in search infrastructure as director of software engineering at Google. Astrid has built infrastructure and managed a variety of engineering teams during her 10+ years at Google and spent 5+ years on call for Google.com. She led the team responsible for running and building Google’s web-serving layer and managed site reliability for Google’s social products. As part of the Cloud Platform team, Astrid led the development of the next generation of app- and service-level infrastructure, including the next-generation app engine.
Fredrik Backner is Vice President of Data & Analytics at TeliaSonera, a leading Nordic operator with headquarters in Stockholm, Sweden. In his role Fredrik has global responsibility for enabling business value from Data & Analytics across six countries and is tasked with ensuring that big data capabilities are provided to the countries as internal cloud services, ranging from data lakes to advanced analytics and data visualization. Fredrik’s organization also provides data science, analytics, and data visualization services and business consultancy to the countries and business units.
Fredrik has a solid entrepreneurial background from initiating and driving large change and improvement programs within... Read More.
Geophysicist with computer science habits; founder of PyLadies-HTX. Paige is passionate about public transportation, sustainable energy, scientific computation, STEM education reform, adventures — and how Python integrates with all of the above. She is currently an Earth Sciences graduate student at Rice University, and is employed full-time by Chevron in upstream technical computing. http://www.paige-bailey.com
Vishal Bamba is vice president of strategy and architecture at Transamerica Technology, where he leads a team focusing on innovation initiatives within the enterprise. Vishal has over 15 years of experience in distributed systems and has led many innovation projects. He has consulted and worked for several companies including Disney, Getty, Northrop, and AIG/SunAmerica. Vishal holds an MS in computer science from the University of Southern California.
Lauralea Banks Edwards is a systems-oriented data analyst who works at the intersection of business and technology. Her research investigates and challenges the ways data creation, storage, and analytics reinforce oversimplified ideas of our social reality. While Ms. Edwards currently wrangles project teams and data within higher education, her previous experience includes co-founding a non-profit, lobbying for the restaurant industry, and building data models for the United States Military Academy at West Point. She holds a BS in behavioral science and a master of international affairs from Columbia University, and is currently pursuing a Ph.D. in cultural studies and social thought... Read More.
Cécile Barbaroux is head of data and insight at Schibsted Classified Media, where she leads a central team of data scientists and engineers with a clear mission to facilitate and inspire data-driven product development. Since joining the company in 2012, she has focused on democratizing access to data and evolving the group data strategy. Before working at Schibsted, Cécile worked as a marketing analyst at AirFrance and Shell, where she developed a passion for data and business intelligence.
Nenshad Bardoliwalla is the founding vice president of products at Paxata, where he is responsible for product strategy, product management, and product marketing. Nenshad is an executive and thought leader with a proven track record of success leading product strategy, product management, and development in business analytics. Previously, he cofounded Tidemark Systems, Inc., where he drove the market, product, and technology efforts for its next-generation analytic applications built for the cloud through its series C funding; served as vice president for product management, product development, and technology at SAP, where he helped to craft the business analytics vision, strategy,... Read More.
Marie Beaugureau is the lead data editor for O’Reilly Media.
Alex Behm is a software engineer at Cloudera, working on the Impala team. He holds a PhD in computer science from UC Irvine.
Roy Ben-Alta is big data analytics business development manager at Amazon Web Services. He is working with AWS customers in building data-driven products, whether batch or real-time, and creating analytics solutions in the cloud. Roy has worked in the data and analytics industry for over a decade, and has helped hundreds of customers bring compelling data-driven products to the market.
Tim Berglund is a teacher, author, and technology leader with DataStax. He has spoken at numerous conferences internationally and in the United States and contributes to the Denver tech community as president of the Denver Open Source User Group. He is the copresenter of various O’Reilly training videos on topics ranging from Git to Mac OS X productivity tips to Apache Cassandra and is the author of Gradle Beyond the Basics. Tim blogs very occasionally at Timberglund.com. He lives in Littleton, Colorado, with the wife of his youth and their three children.
Albert Bifet is a big data scientist with 10+ years of international experience in research and in leading new open source software projects for business analytics, data mining, and machine learning (Huawei, Yahoo, University of Waikato, UPC). He obtained a Ph.D. from UPC-BarcelonaTech. Albert has worked in Hong Kong, New Zealand, and Europe. At Yahoo Labs, he co-founded Apache SAMOA (Scalable Advanced Massive Online Analysis) in 2013. Apache SAMOA is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. At the WEKA Machine Learning group, he has... Read More.
Misha Bilenko is the principal researcher leading the Machine Learning Algorithms team in the Cloud+Enterprise division of Microsoft. Before that, he worked for seven years in the Machine Learning Group at Microsoft Research, where he collaborated with a number of product groups on applied ML algorithms, systems, and tools. Misha joined Microsoft in 2006 after receiving his Ph.D. in computer science from the University of Texas at Austin. He co-edited Scaling Up Machine Learning, published by Cambridge University Press, and his work has received best paper awards from KDD and SIGIR. His research interests include parallel and distributed... Read More.
Bill Schmarzo, author of the upcoming Big Data: Understanding How Data Powers Big Business, to be published by Wiley, is responsible for setting the strategy and defining the service line offerings and capabilities for the EMC Consulting Enterprise Information Management and Analytics service line. He’s written several white papers and is a frequent speaker on the use of big data and advanced analytics to power an organization’s key business initiatives.
Bill has more than two decades of experience in data warehousing, BI, and analytics applications. Bill authored the Business Benefits Analysis methodology that links an organization’s strategic business initiatives... Read More.
After a brief spell designing ejection seats for fighter jets, Sarah Bird’s career turned to applying technology to international development. She has worked in many sectors including mobile health and data collection in Pakistan, Peru, Haiti, and elsewhere. Having always dabbled in software in her spare time, in 2012 Sarah gave in and became a full-time software developer. She is now a full-stack web developer at Aptivate, a non-profit that builds IT solutions for the international development sector.
David Blei is a professor of statistics and computer science at Columbia University, and a member of the Columbia Data Science Institute. His research is in statistical machine learning, involving probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference algorithms for massive data. He works on a variety of applications, including text, images, music, social networks, user behavior, and scientific data.
David earned his bachelor’s degree in computer science and mathematics from Brown University (1997) and his PhD in computer science from the University of California, Berkeley (2004). Before arriving at Columbia, he was an associate professor of computer... Read More.
Ryan Blue is a software engineer at Cloudera, currently working on the Kite SDK team.
David Boardman is a senior interaction design lead at IDEO New York, where he guides teams designing interactions across multiple touch points that elevate people’s experiences and innovate businesses. David has contributed to bringing to life several complex digital ecosystems for a broad set of industries including healthcare, the public sector, finance, and media, for clients such as the US Department of State, WebMD, UBS, Sky Television, Telstra, Hewlett-Packard, Cisco Systems, the Clinton Global Initiative, Nokia, and Telefónica. David has also worked as a design consultant at frog, a global design innovation firm, and has been involved as... Read More.
Ron Bodkin is the founder and CEO of Think Big Analytics, the first and leading provider of independent consulting and integration services specifically focused on big data solutions. Previously, Ron was vice president of engineering at Quantcast, where he led the data science and engineering teams that pioneered the use of Hadoop and NoSQL for batch and real-time decision making. Prior to that, Ron founded New Aspects, which provided enterprise consulting for aspect-oriented programming. Ron was also cofounder and CTO of B2B applications provider C-Bridge, where he headed a team of 900 people and led the company to... Read More.
Farrah Bostic created the Difference Engine based on her belief that deep understanding of customer needs is essential to growing businesses through great products and services. Farrah has honed her customer-centric insights as an advisor to some of the world’s most respected brands, including Apple, Microsoft, Disney, Samsung, and UPS. She began her career as a creative and then went on to be a strategist at leading agencies, including Wieden + Kennedy, TBWA\Chiat\Day, Mad Dogs & Englishmen, and Digitas, where she was group planning director and mobile strategy lead. Farrah also ran innovation as a partner at Hall... Read More.
Danah Boyd is a researcher at Microsoft Research New England and a Fellow at the Harvard University Berkman Center for Internet and Society. She recently completed her PhD in the School of Information at the University of California-Berkeley. Dr. Boyd’s dissertation “Taken Out of Context: American Teen Sociality in Networked Publics” focused on how American youth use networked publics for sociable purposes. She examined the role that social network sites like MySpace and Facebook play in everyday teen interactions and social relations. She was interested in how mediated environments alter the structural conditions in which teens operate, forcing them to... Read More.
David Boyle leads the work of the Insight team at BBC Worldwide, the commercial and global wing of the BBC, where he helps to transform the relationship that BBC Worldwide has with its audience by building premium, industry-leading insight capabilities into consumers, BBC brands, and the market, as well as what connects with audiences emotionally and inspires them. David has spent the last seven years constructing global insight capabilities for the publishing and music industries, which were widely acknowledged as having helped them make quicker, smarter, and bolder decisions for their brands. He joined BBC... Read More.
Mary Yoko Brannen is the Jarislowsky East Asia (Japan) chair at the Centre for Asia Pacific Initiatives, professor of international business and research director at the University of Victoria Gustavson School of Business, and holds a visiting professorship of strategy and management at INSEAD in Fontainebleau, France. She is also deputy editor of the Journal of International Business Studies—the highest ranked journal in the field of IB. She received her M.B.A. with emphasis in international business and Ph.D. in organizational behavior with a minor in cultural anthropology from the University of Massachusetts at Amherst, and a B.A. in comparative... Read More.
Richard Brath has been designing and building innovative information visualizations for 20 years, ranging from one of the first interactive 3D financial visualizations on the web in 1995, to visualizations embedded in financial data systems used every day by thousands of market professionals. Richard is pursuing a PhD in new data visualization techniques at LSBU.
Jenelle Bray is a staff data scientist at LinkedIn on the Security team, where she builds models to detect and prevent fraudulent and abusive behavior, including scraping and fake accounts. Jenelle has a PhD in computational chemistry from Caltech, where she developed methods to predict membrane protein structures. She then moved to Stanford University as a postdoctoral fellow, where she designed algorithms to study large-scale protein motion and to predict small molecule binding in proteins.
Eric Brewer is a vice president of infrastructure at Google. He pioneered the use of clusters of commodity servers for internet services, based on his research at Berkeley. His “CAP Theorem” covers basic tradeoffs required in the design of distributed systems, and followed from his work on a wide variety of systems, from live services to caching and distribution services to sensor networks. He is a member of the National Academy of Engineering, and winner of the ACM Infosys Foundation award for his work on large-scale services. Eric was named a “Global Leader for Tomorrow” by the World Economic... Read More.
Peter Brodsky is a middle school dropout, a college graduate, and a PhD dropout. Peter built and sold his first company and is now building his second company.
Kurt Brown leads the Data Platform team at Netflix. Kurt’s group architects and manages the technical infrastructure underpinning the company’s analytics, which includes big data technologies like Hadoop, Spark, and Presto; Netflix open source applications and services such as Genie and Lipstick; and traditional BI tools including Tableau and Redshift.
Andrew Brust is senior director, technical product marketing and evangelism at Datameer, and writes a blog for ZDNet called Big on Data. Andrew is co-author of Programming Microsoft SQL Server 2012 (Microsoft Press); an advisor to NYTECH, the New York Technology Council; and writes the Redmond Review column for VisualStudioMagazine.com.
Michael (Bach) Bui is a co-founder and engineering lead of Adatao. Previously, he worked on Hadoop 2.0 at Yahoo!, having completed his PhD in CS at the University of Illinois, Urbana-Champaign, where he focused on real-time distributed systems engineering. Michael was a lead developer of Adatao’s PredictiveEngine and contributed to the early development of Apache Spark.
Dan Burkert is a software engineer at Cloudera. Previously he worked at WibiData and Near Infinity.
Călin-Andrei Burloiu has worked at Avira since 2013 as a big data engineer. His interest in this area started in 2012 during an internship at the National University of Singapore, where he first made contact with the Hadoop ecosystem and big data while working on a source code search engine. Călin-Andrei has a master’s degree in computer science. He has a passion for distributed systems and recently became interested in data science.
Joe Caserta is president of Caserta Concepts, an award-winning New York-based innovation consulting and technology implementation firm specializing in big data analytics, data warehousing, business intelligence solutions, and helping clients maximize data value. A recognized big data strategy consultant, author, and educator, Joe is coauthor of the best-selling book The Data Warehouse ETL Toolkit (Wiley, 2004), a contributor to industry publications, and frequent keynote speaker and expert panelist at industry conferences and events. He also serves on the advisory boards of financial and technical institutions and is the organizer and host of the Big Data Warehousing Meetup group in... Read More.
Maciej Ceglowski is the founder and sole employee of Pinboard, a personal web archive and bookmarking site with an emphasis on user privacy. He’s been an outspoken advocate of small pay-for-service websites as an alternative to the hype and impermanence of Silicon Valley startup culture. He has also spoken extensively about the dangers of universal surveillance as a business model and the need to decentralize the Internet. Before founding Pinboard in 2009, Ceglowski worked as an engineer at a variety of tech companies, most notably Yahoo. He lives and works in San Francisco.
Jagdish Chand is VP technology for big data predictive analytics at Apigee. Previously, he served as director of engineering at Yahoo! before co-founding predictive analytics company InsightsOne, where he served as VP engineering until it was acquired by Apigee in 2013. At Apigee, he successfully integrated the InsightsOne engineering team and renamed the product as Apigee Insights. Jagdish continues to drive Apigee Insights adoption with customers and leads advanced product development for big data predictive analytics.
Jennifer Tour Chayes is Distinguished Scientist and Managing Director of Microsoft Research New England in Cambridge, Massachusetts, which she co-founded in 2008, and Microsoft Research New York City, which she co-founded in 2012. Before joining Microsoft in 1997, Chayes was for many years professor of mathematics at UCLA. Chayes is the author of over 125 academic papers and holds over 30 patents. Her research areas include phase transitions in discrete mathematics and computer science, structural and dynamical properties of self-engineered networks, graph algorithms and algorithmic game theory.
Chayes received her B.A. in biology and physics at Wesleyan University, where... Read More.
Jerry Chen is a partner at Greylock where he invests in new enterprise applications and in all aspects of cloud and application infrastructure. Prior to joining Greylock, Jerry was vice president of cloud and application services at VMware, where he was part of the executive team that scaled the company from 400 to over 15,000 employees and $5B in revenue. During his nine years at VMware, he launched dozens of products including several “1.0” releases, and started two new business units for VMware including the Cloud Application Platform and the Enterprise Desktop business units. In particular, Jerry enjoys the challenge... Read More.
Roger Chen is working on a new venture and cochairs the O’Reilly Artificial Intelligence Conference. Previously, he was a principal at O’Reilly AlphaTech Ventures (OATV), where he invested in and worked with early stage startups primarily in the realm of data, machine learning and robotics. Roger has a deep and hands-on history with technology, having spent a past life as an engineer and scientist prior to working in venture capital. He developed novel nanotechnology as a PhD researcher at UC Berkeley and spent stints as an engineer with Oracle, EMC and Vicor. Roger holds a BS from Boston... Read More.
Ewen Cheslack-Postava is an engineer at Confluent building a stream data platform based on Apache Kafka to help organizations reliably and robustly capture and leverage all their real-time data. Ewen received his PhD from Stanford University, where he developed Sirikata, an open source system for massive virtual environments. His dissertation defined a novel type of spatial query giving significantly improved visual fidelity and described a system for efficiently processing these queries at scale.
Anant Chintamaneni is VP of products at BlueData. Anant has more than 15 years of experience in business intelligence, advanced analytics, and big data infrastructure. He is currently responsible for product management at BlueData, where he focuses on helping enterprises deploy big data technologies including Hadoop and Spark. Prior to BlueData, Anant led the product management team for Pivotal’s big data suite.
Alan Choi is a software engineer at Cloudera working on the Impala project. Before joining Cloudera, he worked at Greenplum on the Greenplum-Hadoop integration. Prior to that, Alan worked extensively on PL/SQL and SQL at Oracle.
Tanzeem Choudhury received her Ph.D. from the Media Laboratory at the Massachusetts Institute of Technology. As part of her doctoral work, she created the sociometer and conducted the first experiment that uses mobile sensors to model social networks, which led to a new field of research referred to as Reality Mining. She holds a B.S. in electrical engineering from the University of Rochester, and an M.S. from the MIT Media Laboratory.
Miklos Christine is a solutions engineer for Databricks. Miklos was previously a system engineer at Cloudera where he helped strategic customers deploy and use the Apache Hadoop ecosystem in production. He has contributed to several projects in the open source community, previously worked on the design and implementation of the system infrastructure for the OS that runs on Cisco’s routers and switches, and holds a BS in electrical engineering and computer sciences from the University of California-Berkeley.
Phillip Cloud is a software engineer at Continuum Analytics. He started doing open source work by contributing heavily to Pandas. Now he works mostly on Blaze and its associated libraries, along with a bit of consulting. He enjoys building data-related tools that help people get their jobs done.
Chris is a software architect for Continuum Analytics, and is based in the New York City area. He has worked previously for top Wall St. firms and was the lead designer of the UI framework for a front office trading platform. He is the creator of the PhosphorJS and Nucleic projects which provide libraries for developing enterprise quality applications on the desktop and in the browser. He received his MS in Mechanical Engineering from the University of South Florida.
Raymond Collins has led and implemented data integration projects and analytics projects for companies like Sony, Veterans Affairs, Bausch & Lomb, TE Connectivity, and Rolls Royce.
Ben Collins-Sussman is the engineering site lead for Google’s Chicago office. A founding developer of the Subversion version control system, he co-authored O’Reilly’s Version Control with Subversion book as well as Team Geek. Since joining Google in 2005, he has led engineering teams for Google Code, Google Affiliate Network, the DFP advertising platform, and now manages teams working on the serving stack for Google Search.
Ben collects hobbies that explore the tension between art and science. He has given numerous conference talks about the social challenges of software development. He writes interactive fiction games and tools, and was the... Read More.
Jacomo Corbo is the chief scientist for QuantumBlack, a visual analytics firm that helps clients meet the analysis challenges of big data to make better decisions. Corbo is also the Canada Research Chair in Information and Performance Management at the University of Ottawa, and a Wharton Clayright Scholar at the University of Pennsylvania’s Wharton School of Business. His research has been funded by grants from the National Research Council, the Alfred P. Sloan Foundation, the Wharton Mack Center for Technological Innovation, the Wharton Customer Analytics Initiative, as well as by companies such as GE Finance and IBM.
Between January... Read More.
Elliott is a big data, data warehouse, information management and technology innovation expert with a passion for helping transform data into powerful information. He has more than a decade of experience in implementing tailored big data and data warehouse solutions with hands-on experience in every component of the data warehouse software development lifecycle. At Caserta Concepts, Elliott oversees large-scale major technology projects, including those involving cloud, business intelligence, data analytics, big data and data warehousing.
Elliott is recognized for his many successful big data projects, ranging from big data warehousing to machine learning to his personal favorite, recommendation engines. His... Read More.
Samuel Cozannet is a technology enthusiast, solution-oriented get-things-done professional, with a track record in product and program management. He has a passion for innovation and believes technology can make the world a better place. He spends most of his time and energy driving the adoption of IoT and big data technologies by companies and enterprises of all sizes and industries.
Charlie Crocker is a data geek with 20 years of experience bringing data out of the shadows to drive business value and optimize operational costs. At Autodesk, he is currently working across divisions to identify and validate potential reliable data sources and access mechanisms, while also focusing on delivering real-time analytics to stakeholders. Prior to Autodesk, Charlie was a partner in a startup focused on spatial databases and web-based tools for state and local government agencies and utility companies.
Alistair Croll is an entrepreneur with a background in web performance, analytics, cloud computing, and business strategy. In 2001, he cofounded Coradiant (acquired by BMC in 2011) and has since helped launch Rednod, CloudOps, Bitcurrent, Year One Labs, and several other early-stage companies. He works with startups on business acceleration and advises a number of larger companies on innovation and technology. A sought-after public speaker on data-driven innovation and the impact of technology on society, Alistair has founded and run a variety of conferences, including Cloud Connect, Bitnorth, and the International Startup Festival, and is the chair of O’Reilly’s Strata +... Read More.
Michael Crutcher is responsible for the direction of Cloudera’s storage products. These include HDFS, HBase, Parquet, and several others. He’s also responsible for managing strategic partnerships with storage vendors.
JD Cryans is a software engineer at Cloudera and an Apache HBase PMC member.
Kristi Cunningham leads the Enterprise Data Management (EDM) group within the Risk Management organization at Capital One. Her responsibilities include setting policy and standards for effective data quality management across the enterprise, monitoring compliance to standards, building data management competency, and providing effective data management solutions for the organization. A primary responsibility involves being a change leader for the organization in building effective data management practices into everyone’s day-to-day responsibilities and job functions.
Nick Curcuru has been delivering analytics solutions for nearly 20 years in operations and consulting. He is currently principal of the big data analytics practice at MasterCard Advisors, where he works with the executive suite cascading to the operational level to enable data-driven strategy for the organization. Nick has extensive experience in retail, manufacturing, transportation, hospitality, entertainment, media, and communications. He joined MasterCard Advisors from the SAS Institute; prior to SAS, he worked for Andersen Consulting.
Nick has contributed to several books, including Arthur Andersen’s Global Lessons in Activity-Based Management (John Wiley & Sons, Inc., 1999) and... Read More.
Doug Cutting is the chief architect at Cloudera and the founder of numerous successful open source projects, including Lucene, Nutch, Avro, and Hadoop. Doug joined Cloudera from Yahoo, where he was a key member of the team that built and deployed a production Hadoop storage-and-analysis cluster for mission-critical business analytics. Doug holds a bachelor’s degree from Stanford University and sits on the board of the Apache Software Foundation.
Timothy Danford is a computer scientist working on advanced automation approaches to big data variety in the pharmaceutical and healthcare industries. Previously, Timothy was a software architect, engineer, and founding team member for Genome Bridge LLC, a Broad Institute subsidiary organized to develop cloud-based SaaS genomic analysis pipelines. He has experience in developing data-management services, applications, and ontologies for bioinformatics and genomics systems at Novartis and Massachusetts General Hospital. As a PhD student in computer science at MIT CSAIL, he focused on computational functional genomics. He is a contributor to ADAM, an open source project... Read More.
Tathagata Das is an Apache Spark committer and a member of the PMC. He is the lead developer behind Spark Streaming, which he started while a PhD student in the UC Berkeley AMPLab, and is currently employed at Databricks. Prior to Databricks, Tathagata worked at the AMPLab, conducting research about data-center frameworks and networks with Scott Shenker and Ion Stoica.
Prior to joining Amplify as a general partner, Mike Dauber spent over six years at Battery Ventures, where he led early-stage enterprise investments on the West Coast, including Battery’s investment in a stealth security company that is also in Amplify’s portfolio. Most recently, Mike sat on the boards of Continuuity, Duetto, Interana, and Platfora. Mike previously invested in Splunk and RelateIQ, which was recently acquired by Salesforce. Mike began his career as a hardware engineer at a startup and later held product, business development, and sales roles at Altera and Xilinx. Mike is a frequent speaker at conferences and is... Read More.
A 20-year tech industry veteran, Margaret leads global product marketing for the Integrated Solutions business unit at Red Hat. She is a frequent author and speaker on cloud computing, big data, open source, women in tech, and the intersection of business and technology. Margaret is a proven entrepreneur and intrapreneur, having led successful programs and teams at several startups, such as Aventail and Hubspan, and Fortune 500 companies, including Amazon, Microsoft, and HP. Prior to Red Hat, she was VP of Product Marketing and Cloud Evangelist for HP Helion, the cloud computing division of Hewlett-Packard. Her passions include agile marketing,... Read More.
Based in Denver, Vincent Dell’Anno is managing director, Information Management-Data Supply Chain, Accenture Analytics, now a part of Accenture Digital. He also serves as a member of the Accenture Analytics global leadership team. As Accenture’s Data Supply Chain lead, Vincent manages a global team of technologists and data scientists that leverage new and emerging technologies to help clients manage large volumes of data to drive high performing analytic-driven outcomes, cost effectively, at scale. Vince has a BA in economics from Dickinson College and an MBA from the George Washington University School of Business.
Senior Architect at Akamai Technologies
Matt Derda is a CPFR analyst with PepsiCo Customer Supply Chain. CPFR, which stands for collaborative planning, forecasting, and replenishment, is a new program in PepsiCo Customer Supply Chain, and Matt has had the opportunity to be a part of the piloting group. Through CPFR, Matt and his team have delivered improved forecast accuracy and fill rates by expanding collaboration with customers and leveraging shared data to provide best-in-class service. Matt’s team has built multiple “CPFR Tools” that use large datasets to drive the program forward.
Adam Devine leads product marketing for WorkFusion, a SaaS platform for collecting, cleansing, and controlling data. Adam has 15 years of experience growing businesses through product marketing, including product positioning, market intelligence, messaging, and content creation. He began his career in management consulting at BearingPoint’s Banking & Capital Markets practice. Adam speaks frequently about human-machine collaboration, machine learning, and automation at conferences, including FIMA, FISD, Massolution, MarketTech, NAFIS, NFAIS, and SIIA.
Vasant Dhar is a professor at the Stern School of Business and the Center for Data Science at New York University, and founder of SCT Capital Management. He created the Adaptive Quant Trading (AQT) program, a data-driven learning machine that trades the world’s most liquid futures contracts systematically. Dhar has written over 100 research articles and dozens of opinion editorials in media including the Financial Times, Wall Street Journal, Forbes, and Wired Magazine. He is editor-in-chief of the Big Data journal.
Robby Dick has been working with the workload automation discipline in various capacities since 1994.
Anthony Dina serves as the director of enterprise technologists at Dell, Inc., where he leads a team of solutions architects with expertise in big data and application acceleration to work with customers on how to transform IT into better business outcomes. Anthony has 17 years in the IT industry and has held a number of titles, including executive director of strategy and director of solutions marketing. Some of his successes include ramping the blades business to number one, launching the first Opteron server, and championing virtual IO solutions, all within 10 years. Anthony holds a master of business administration from the... Read More.
Sheetal Dolas is a principal architect working with Hortonworks with strong expertise in the Hadoop ecosystem and rich field experience. He helps small to large enterprises solve their business problems strategically and functionally as well as at scale by using big data technologies. Sheetal has over 14 years of strong IT experience and has served in key positions as lead big data architect, SOA architect, and technology architect in multiple large and complex enterprise programs. He has extensive knowledge of big data/NoSQL technologies including Hadoop, Hive, Pig, HBase, Storm, Kafka, etc., and has been working in this space for... Read More.
Mark Donsky leads data management and governance solutions at Cloudera. Previously, Mark held product management roles at companies such as Wily Technology, where he managed the flagship application performance management solution, and Silver Spring Networks, where he managed big data analytics solutions that reduced greenhouse gas emissions by millions of dollars annually. He holds a BS with honors in computer science from the University of Western Ontario.
Allen Downey is a professor at Olin College and the author of Think Python, Think Stats, Think Bayes, and more. He writes about statistics in his blog Probably Overthinking It.
Michael Droettboom is a main contributor to matplotlib, the premier plotting library in the scientific Python ecosystem. He is the creator of “airspeed velocity” for benchmarking Python projects over time, the author of Understanding JSON Schema, and a primary contributor to astropy.
Chris DuBois is a data scientist focused on building tools for other data scientists. At Dato, Chris has helped design and implement tools for creating recommendation systems and for large-scale text analysis. His current work makes it simpler to train models that generalize well. After studying applied mathematics at Pomona College, he earned a PhD in statistics from the University of California, Irvine, where he researched latent variable models for social-network data occurring over time.
Vlad is Chief Data Scientist at DonorsChoose.org. Aside from working with “datasets that change mindsets,” Vlad likes good design, nature, and backpacking. Previously, he was a co-founder of The Unreasonable Institute and Startup Festival, India. He’s currently learning construction by building a DIY tiny house on wheels.
Ted Dunning has been involved with a number of startups—the latest is MapR Technologies, where he is chief application architect working on advanced Hadoop-related technologies. Ted is also a PMC member for the Apache Zookeeper and Mahout projects and contributed to the Mahout clustering, classification, and matrix decomposition algorithms. He was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems and built fraud-detection systems for ID Analytics. Opinionated about software and data-mining and passionate about open source, he is an active participant of Hadoop and related communities and loves helping projects get going with new... Read More.
An Apache Cassandra committer and PMC member, Gary Dusbabek is a lifelong programmer specializing in distributed systems. His past experience includes working with large-scale text and image indexes in the newspaper industry and building a multi-data-center distributed metrics and monitoring system for a large cloud provider. Gary is the principal architect behind the open source Blueflood metrics platform. He is currently employed at Silicon Valley Data Science.
Jana Eggers is a tech executive focused on products and the messages surrounding them. Jana has started and grown companies and led large organizations within even bigger companies. She supports, subscribes to, and contributes to customer-inspired innovation, systems thinking, lean analytics, and Autonomy/Mastery/Purpose-style leadership. Jana’s software and technology experience includes technology and executive positions at Intuit, Blackbaud (software for nonprofits), Basis Technology (internationalization technology), Lycos, American Airlines’ Sabre (decision support systems for logistics), Los Alamos National Laboratory (computational chemistry and supercomputing), Spreadshirt (customized apparel platform and ecommerce), and acquired startups that you’ve never heard of. Jana holds a bachelor’s degree... Read More.
Mike Emerick is the healthcare industry architect for MapR Technologies. Prior to this, Mike was the co-founder of the IBM Healthcare Transformation Lab, where he focused on the buildout of large healthcare infrastructures and healthcare data analytics. His work included federal-level health information exchanges for Australia, China, and Canada, and regional health information exchanges throughout the U.S. Mike worked on building out business and technical models for health benefit exchanges and accountable care organizations. He also worked on early solutions for genomic optimized care for patients with HIV/AIDS using HPC architectures. Mike’s... Read More.
Tim Estes is the chairman, CEO, and founder of Digital Reasoning. Tim’s academic work at the University of Virginia focused in the areas of philosophy of language, mathematical logic, semiotics, epistemology, and phenomenology. It was that eclectic academic background, coupled with the belief that in the future all software would learn from data as a core capability, that gave rise to Digital Reasoning. Tim and his team work closely with leaders in government and industry to solve extraordinarily valuable and morally compelling problems in National Security, Finance, Legal, and Health Care by automating the understanding of unstructured data.
Bob leads Technical Marketing for Cisco’s Data Virtualization (formerly Composite Software) and Analytics Business Units. In this role Bob guides thought leadership, analyst relations and new market penetration efforts.
Bob was the EVP of Marketing at Composite Software for seven years prior to its acquisition by Cisco in 2013. At Composite, Bob established data virtualization as a category and Composite as the market leader including co-authoring the first book on Data Virtualization, Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility.
Bob has driven multiple market transitions including creation of the Data Virtualization, and... Read More.
Hossein Falaki is a software engineer at Databricks working on the next big thing. Prior to that he was a data scientist at Apple’s personal assistant, Siri. He graduated with a Ph.D. in Computer Science from UCLA, where he was a member of the Center for Embedded Networked Sensing (CENS).
Vivek Farias is chief technology officer and co-founder of Celect. He is the Robert N. Noyce Professor of Management at MIT’s Sloan School. His research has led to numerous innovations in operations, supply-chain, and yield management. Prior to academia he worked in algorithmic finance. He received his PhD in electronic engineering at Stanford.
Bob Filbin is chief data scientist at Crisis Text Line, the first large-scale 24/7 national crisis line for teens on the medium they use and trust most: texting. Bob specializes in the application of behavioral psychology to questions of data collection, analysis, and reporting, to make sure data leads to good behavioral change. Bob has given lectures on using data to drive behavioral change at places including MIT, the University of Pennsylvania, and the North American International Auto Show, and has authored several articles in the Harvard Business Review on data. He runs in Prospect Park.
Andrew is the CTO and Co-founder of Leanplum, based in San Francisco. Leanplum is solving personalization on mobile by empowering companies to engage with their users via targeted messages and user experience optimization. Before Leanplum, Andrew was a Software Engineer at Google, working on optimizing video ad revenue. He graduated from Duke University with a BS in Electrical and Computer Engineering and Computer Science.
Brian started Google’s Chicago engineering office in 2005 and led several of Google’s global engineering efforts, including the Data Liberation Front, and Transparency Engineering. He also served as internal advisor for Google’s open data efforts, having previously led the Google Code and Google Affiliate Network teams. Prior to joining Google, Brian worked as an engineer at CollabNet, Apple, and a local Chicago development shop.
Brian first started contributing to open source software in 1998 and was a core Subversion developer from 2000 to 2005 as well as the lead developer of the cvs2svn utility. He is a member of the... Read More.
Camille Fournier is the former head of engineering at Rent the Runway. She was previously a vice president at Goldman Sachs. Camille is an Apache ZooKeeper committer and PMC member and a Dropwizard framework PMC member.
Martin Fowler is an author, speaker, consultant, and self-described loud-mouthed pundit on the topic of software development. He works for ThoughtWorks, a software delivery company, where he has the exceedingly inappropriate title of “Chief Scientist.” Martin has written half-a-dozen books on software development, including Refactoring and Patterns of Enterprise Application Architecture. He writes regularly about software development on martinfowler.com.
Martin’s main interest is to understand how to design software systems to maximize the productivity of development teams. In doing this he’s looked to understand the patterns of good software design, and also the processes that support software design. He... Read More.
The proud offspring of Haitian immigrants and Kentucky farmers, Mimi’s work for social, racial, and economic justice meets at the intersection of immigrants, women of color, & low-income communities. In her role at CODE2040, she oversees student-facing programming including the annual Fellows Program and the Technical Applicant Prep suite of programs and tools. Before CODE2040, Mimi was the Executive Director at Code for Progress, a non-profit coding bootcamp that pays adults of color to learn to code and helps them start careers in tech.
Mimi grew up in New York, and is a recent transplant to the Bay Area by... Read More.
Bill Franks is chief analytics officer for Teradata, providing insight on trends in the analytics and big data space, and helping clients understand how Teradata and its analytic partners can support their efforts. In addition, Bill is a faculty member of the International Institute for Analytics and the author of the book Taming the Big Data Tidal Wave (John Wiley & Sons, Inc., April 2012).
He is also an active speaker and blogger. Bill’s focus has always been to help translate complex analytics into terms that business users can understand, and to then help an organization implement the results effectively... Read More.
Michael Freeman is a lecturer at the University of Washington Information School, where he teaches courses on data visualization and web development. With a background in public health, Michael works alongside research teams to design and build interactive data visualizations to explore and communicate complex relationships in large datasets. His freelance work ranges from web design to software consulting. You can take a look at samples from his projects here.
Chris Fregly is a research scientist at PipelineIO, a San Francisco-based streaming machine learning and artificial intelligence startup. Previously, Chris was a distributed systems engineer at Netflix, a data solutions engineer at Databricks, and a founding member of the IBM Spark Technology Center in San Francisco. Chris is a regular speaker at conferences and meetups throughout the world. He’s also an Apache Spark contributor, a Netflix Open Source committer, founder of the Global Advanced Spark and TensorFlow meetup, author of the upcoming book Advanced Spark, and creator of the upcoming O’Reilly video series Deploying and Scaling Distributed TensorFlow in... Read More.
Eric Frenkiel is the cofounder and CEO of MemSQL, an in-memory distributed database that combines real-time and historical big data analytics. MemSQL is a Y Combinator company that has raised more than $45M in venture capital. Prior to MemSQL, Eric worked at Facebook on partnership development. He has worked in various engineering and sales engineering capacities at both consumer and enterprise startups. Eric is a graduate of Stanford University’s School of Engineering. In 2011 and 2012, Eric was named to Forbes’s 30 under 30 list of technology innovators.
Venky Ganti has been a data enthusiast since graduate school, and has enjoyed working at various levels of the data analysis stack. At Google, he was an avid data consumer who helped engineer innovative data products that now generate over one billion dollars in yearly revenue. At Microsoft, he worked on advanced data quality infrastructure in ETL platforms. Venky started out working on advanced data analysis and mining technology during his PhD at the University of Wisconsin-Madison. Venky thoroughly enjoys spending time with his family, going on walks, and roller-blading, when he feels adventurous.
Yael Garten leads a team of data scientists at LinkedIn that focuses on understanding and increasing growth and engagement of LinkedIn’s 400 million members across mobile and desktop consumer products. Yael is an expert at converting data into actionable product and business insights that impact strategy. Her team partners with product, engineering, design, and marketing to optimize the LinkedIn user experience, creating powerful data-driven products to help LinkedIn’s members be productive and successful. Yael champions data quality at LinkedIn; she has devised organizational best practices for data quality and developed internal data tools to democratize data within the company. Yael... Read More.
Alan Gates is a co-founder at Hortonworks, and an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project. Alan also designed HCatalog and guided its adoption as an Apache Incubator project. Alan has a BS in mathematics from Oregon State University and an MA in theology from Fuller Theological Seminary. He is also the author of Programming Pig from O’Reilly Press.
Matthew Gee is cofounder and principal at the Impact Lab, a data-analytics company focused exclusively on developing scalable data science solutions to social-sector problems. He is also a senior research scientist at the University of Chicago’s Center for Data Science and Public Policy and a research fellow at the Urban Center for Computation and Data. Matt is the cofounder of the Eric and Wendy Schmidt Data Science for Social Good fellowship, which in its first three years has paired 126 fellows with over 40 national, state, and local government organizations and NGOs to build data-driven solutions to social problems.
Matt’s... Read More.
Ari Gesher is a senior software engineer and engineering ambassador at Palantir Technologies. He is co-author of the upcoming book Architecture of Privacy, about building data systems that responsibly handle sensitive data. Ari can often be found speaking on the topic of privacy protections, big data, and the limits of automated decision making.
At Palantir Technologies, Ari has split his time between working as a design prototyper for the user interface team, a backend engineer on Palantir’s analysis platform, thinking and writing about Palantir’s vision for human-driven information data systems, and moonlighting on both Palantir’s Privacy and Civil Liberties team... Read More.
Charles Givre is an unapologetic data geek who is passionate about helping others learn about data science and become passionate about it themselves. For the last five years, Charles has worked as a data scientist at Booz Allen Hamilton for various government clients and has done some really neat data science work along the way, hopefully saving US taxpayers some money. Most of his work has been in developing meaningful metrics to assess how well the workforce is performing. For the last two years, Charles has been part of the management team for one of Booz Allen Hamilton’s largest analytic... Read More.
Michele Goetz is principal analyst at Forrester Research, serving enterprise and data architecture professionals. She is a leading expert on data management, artificial intelligence, data governance, master data management, and data quality. Michele helps enterprises leverage data assets more effectively by improving the availability and accuracy of the information that businesses use in processes and analytics.
Prior to joining Forrester, Michele managed the business intelligence and data management programs at PTC. During her tenure, she developed and led the global consolidation of customer data across multiple customer relationship management (CRM) platforms to support a single view of the... Read More.
Brett Goldstein is a leader in enterprise architecture, big data analytics, and government technology with 15 years of experience in operations, management, and leadership in technical environments in both the public and private sector. Brett was recently named the inaugural recipient of the Fellowship in Urban Science at the University of Chicago’s Harris School of Public Policy. As a senior fellow in urban science, he will focus on issues of computation and public policy to inform better decision making in government. Previously, Brett was the commissioner and chief information officer of the Chicago Department of Innovation and Technology (DoIT), appointed... Read More.
Micha Gorelick was the first man on Mars in 2023 and won the Nobel Prize in 2046 for his contributions to time travel. He then went back to the 2000s to study astronomy, teach scientific computing, and work on data at bitly. After writing a book on High Performance Python, he helped start Fast Forward Labs as a resident mad scientist. There he worked on many issues, from machine learning to performant stream algorithms. A monument celebrating his life can be found in Central Park, 1857.
Alex Gorelik is the founder and CEO of Waterline Data, a startup focused on enhancing the value of Hadoop through data self-service and governance. Alex is a serial entrepreneur and innovator who has spent over 25 years inventing and bringing to market cutting-edge data-oriented technology.
Prior to Waterline, Alex was an EIR at Menlo Ventures. He joined Menlo from Informatica, where he held several executive roles, including GM of Informatica’s Data Quality Business Unit—driving marketing, product management, and R&D for an $80M business—and SVP of R&D for Core Technology—driving innovation in big data and social media while... Read More.
Daniel L. Goroff is vice president and program director at the Alfred P. Sloan Foundation, a grant-making philanthropy that supports breakthroughs in science, technology, and economics. He is professor emeritus of mathematics and economics at Claremont’s Harvey Mudd College, where he previously served as vice president for academic affairs and dean of the faculty.
Goroff earned his B.A.-M.A. degree in mathematics Summa Cum Laude at Harvard as a Borden Scholar, an M.Phil. in economics at Cambridge University as a Churchill Scholar, a masters in mathematical finance at Boston University, and a Ph.D. in mathematics at Princeton University as a Danforth... Read More.
Matthew Granade is a cofounder of Domino Data Lab, which makes a workbench for data scientists to run, scale, share, and deploy analytical models. He also invests in, advises, and serves on the boards of startups in data, data analysis, finance, and financial tech. He currently works with multiple companies including Quantopian, Premise, and Orbital Insights. Previously, Matthew was co-head of research at Bridgewater Associates, where he built and managed teams that ensured Bridgewater’s understanding of the global economy, created new systems for generating alpha, produced daily trading signals, and published Bridgewater’s market commentary. Before that, Matthew was an engagement... Read More.
Jonathan Gray is the founder and CEO of Cask. Jonathan is an entrepreneur and software engineer with a background in startups, open source, and all things data. Prior to founding Cask, he was a software engineer at Facebook, where he helped drive HBase engineering efforts, including Facebook Messages and several other large-scale projects, from inception to production. An open source evangelist, Jonathan was responsible for helping build the Facebook engineering brand through developer outreach and refocusing the open source strategy of the company. Prior to Facebook, Jonathan founded Streamy.com, where he became an early adopter of Hadoop and HBase.... Read More.
Garrett Grolemund is the editor-in-chief of shiny.rstudio.com, the development center for the Shiny R package, and is the author of Hands-On Programming with R as well as Data Science with R, a forthcoming book by O’Reilly Media. Garrett works as a data scientist and chief instructor for RStudio, Inc.
Robert Grossman is a faculty member and the chief research informatics officer in the Biological Sciences Division of the University of Chicago. Robert is the director of the Center for Data Intensive Science (CDIS) and a senior fellow at both the Computation Institute (CI) and the Institute for Genomics and Systems Biology (IGSB). He is also the founder and a partner of the Open Data Group, which specializes in building predictive models over big data. Robert has led the development of open source software tools for analyzing big data (Augustus), distributed computing (
Jason Grout is a Jupyter developer at Bloomberg, working primarily on JupyterLab and the interactive Jupyter widgets library. He has also been a major contributor to the open source Sage mathematical software system and co-organizes the PyDataNYC Meetup. Previously, Jason was an assistant professor of mathematics at Drake University in Des Moines, Iowa. He earned a PhD in mathematics from Brigham Young University.
Mark Grover is a software engineer working on Apache Spark at Cloudera. Mark is a committer on Apache Bigtop and a committer and PMC member on Apache Sentry and has contributed to a number of open source projects including Apache Hadoop, Apache Hive, Apache Sqoop, and Apache Flume. He is a coauthor of Hadoop Application Architectures and also wrote a section in Programming Hive. Mark is a sought-after speaker on topics related to big data at various national and international conferences. He occasionally blogs on topics related to technology.
Peter Guerra is Chief Data Scientist and Vice President leading Booz Allen Hamilton’s Data Science commercial team. He has 15 years of experience in creating big data and data science solutions for government and commercial clients. He was responsible for the architecture and implementation of one of the world’s largest Hadoop clusters for the Federal Government. He has consulted with Fortune 500 companies and federal government organizations throughout his career. Recently he has focused on data governance and security of large data systems, working on a book for O’Reilly titled “Data Security for Modern Enterprises”. He is a frequent speaker... Read More.
Carlos Guestrin is the director of machine learning at Apple and the Amazon Professor of Machine Learning in Computer Science and Engineering at the University of Washington. Carlos was the cofounder and CEO of Turi (formerly Dato and GraphLab), a machine-learning company acquired by Apple. A world-recognized leader in the field of machine learning, Carlos was named one of the 2008 Brilliant 10 by Popular Science. He received the 2009 IJCAI Computers and Thought Award for his contributions to artificial intelligence and a Presidential Early Career Award for Scientists and Engineers (PECASE).
Meet us at Booth #105 and check out our open source ETL on Hadoop utility, developed in partnership with Capital One.
Ali Habib is a medical student at Northwestern University’s Feinberg School of Medicine. In addition to beginning a radiology residency, Ali is also interested in data science applied toward analysis in financial markets.
Jon Haddad has 15 years’ experience in both development and operations. For the last 10, he’s worked at various startups in Southern California. For the last two years, he’s been the maintainer of cqlengine, the Python object mapper for Cassandra, now integrated into the native Cassandra driver. Jon is currently a technical evangelist at DataStax, where he continues to focus on advancing Cassandra in the Python, operations, and data science communities. Jon holds a degree in computer science from the University of Vermont.
Alan Hannaway is the product owner for data at 7digital, where he is responsible for ensuring the company is developing and extracting value from its line of data products. Prior to 7digital, Alan worked in a variety of roles, most recently providing data to the entertainment industry through his own startup. Alan started his career working as a researcher in computer science, focusing his interests on the application of technology to measure the scale and distribution of content consumption on large Internet networks.
Software Engineer at DigitalOcean
Ben Harden leads the Big Data Practice at CapTech, and has over 17 years of enterprise software development experience in the areas of data warehousing, metadata management, data governance, business intelligence, and enterprise scale Hadoop data ingestion and refinement. He is a certified IBM Cognos Specialist, Business Objects Report Designer, Certified Scrum Master, Scaled Agilist, and Project Management Professional.
Rob Harper is a partner and lead product architect at Uncharted and has been building technical platforms and products in the visualization industry for a decade. Over the past several years, Rob has focused on developing web-based HTML5 technology approaches for big data.
Michael Hausenblas is a data center application architect with Mesosphere. He helps DevOps to build and operate scalable and elastic distributed applications. His background is in large-scale data integration, Hadoop, and NoSQL. Michael is also contributing to open source software at Apache (Myriad, Drill).
Jeffrey Heer is Trifacta’s chief experience officer and cofounder as well as a professor of computer science at the University of Washington, where he directs the Interactive Data Lab. Jeff’s passion is the design of novel user interfaces for exploring, managing, and communicating data. The data visualization tools developed by his lab (D3.js, Protovis, Prefuse) are used by thousands of data enthusiasts around the world. In 2009, Jeff was named to MIT Technology Review’s list of Top Innovators under 35.
Joseph M. Hellerstein is the Jim Gray Chair of Computer Science at UC Berkeley and cofounder and CSO at Trifacta. Joe’s work focuses on data-centric systems and the way they drive computing. He is an ACM fellow, an Alfred P. Sloan fellow, and the recipient of three ACM-SIGMOD Test of Time awards for his research. He has been listed by Fortune among the 50 smartest people in technology, and MIT Technology Review included his work on their TR10 list of the 10 technologies most likely to change our world.
Sam Heywood is responsible for driving Cloudera’s portfolio of security technologies. He is a seasoned product and marketing executive with leadership experience at several notable technology startups and is well versed in systems management, online CRM platforms, consumer eCommerce, and security technologies. Prior to Cloudera, Sam was VP products and marketing for Gazzang, leading global product innovation and delivery, corporate marketing, and demand generation programs. Sam was senior director of products at uShip, driving the company’s expansion into multiple product lines spanning the consumer retail and commercial freight markets. Sam also held product and marketing management roles at Convio,... Read More.
Andrew Hill is a biologist with a technology bent, focusing on informatics and the use of big data. He has worked in diverse domains within biology, including epidemiology, microbiology, biodiversity informatics, and phyloinformatics. Andrew is the chief science officer at CartoDB, where he explores the future of online mapping to help guide innovation and education.
Eva Ho is a General Partner at Susa Ventures, an early stage technology fund investing in companies that leverage the power of data to create market-leading platforms, tools, and analytics with inherent network effects. Eva is a serial entrepreneur and founder, most recently a founding executive at Factual, a leading location data provider in Los Angeles. Previously, she was a Senior Product Marketing Manager at Google and YouTube for 5 years. Prior to Google, she was the head of marketing for Applied Semantics, a company that was acquired by Google in 2003. She also co-founded Navigating Cancer, a health startup, in... Read More.
Jeff Holoman is a systems engineer at Cloudera. Jeff is a Kafka contributor and has focused on helping customers with large-scale Hadoop deployments, primarily in financial services. Prior to his time at Cloudera, Jeff worked as an application developer, system administrator, and Oracle technology specialist.
Juliet Hougland is a data scientist at Cloudera and contributor/committer/maintainer for the Sparkling Pandas project. Her commercial applications of data science include developing predictive maintenance models for oil and gas pipelines at Deep Signal and designing and building a platform for real-time model application, data storage, and model building at WibiData. Juliet was the technical editor for Learning Spark by Karau et al. and Advanced Analytics with Spark by Ryza et al. She holds an MS in applied mathematics from the University of Colorado, Boulder and graduated Phi Beta Kappa from Reed College with a BA in math-physics.
Dr. Timothy Howes, co-inventor of LDAP and holder of numerous patents, leads innovation on ClearStory’s Spark-based data analysis platform. A respected entrepreneur and computer scientist, he was a co-founder of Loudcloud/Opsware and Rockmelt, and previously served as VP of engineering at Yahoo and CTO of HP Software and Netscape’s Server Products Division. He holds a bachelor of science degree in aerospace engineering, a master of science in computer science and engineering, and a Ph.D. in computer science, all from the University of Michigan.
Jonathan Hsieh is a software engineer at Cloudera. He is an Apache HBase committer and the founder of Apache Flume.
Juan M. Huerta is the Head of Data Science at Dow Jones, where he and his team focus on bringing the most innovative data and algorithmic approaches to the analysis of Dow Jones news and information, as well as toward the transformation of our business. Prior to Dow Jones, Juan’s work focused on developing algorithms to decode, understand, and extract information from location data, financial and banking data, as well as natural language, dialog, and speech signals. Working in premier R&D organizations like the IBM Research Division, Carnegie Mellon University, and Dragon Systems, as well as leading financial... Read More.
Ignacio Hwang is the senior product manager responsible for Hadoop initiatives at the Hewlett Packard Enterprise Big Data Software division, with over 15 years of IT infrastructure experience finding innovative solutions for real enterprise applications. His professional background covers storage, cloud, virtualization, and Hadoop technologies, giving him deep insight into what is required to build robust products that help drive today’s high-performance analytics operations. He received his bachelor’s degree from Tufts University and his M.B.A. from Boston College.
Bar Ifrach completed his BA in economics at Tel Aviv University in 2007 and continued on to graduate school at Columbia Business School in New York. He completed his PhD in operations research and economics in 2012. His academic research focused on learning and pricing in online marketplaces and game theory. Following that, Bar completed a postdoc at Stanford, where he researched visibility and ranking for mobile applications. He joined Airbnb as a data scientist on the search team in September 2013 and is currently leading a team of data scientists on the conversion team.
Ihab Ilyas is a professor in the Cheriton School of Computer Science at the University of Waterloo, where his main research is in the area of database systems, with special interest in data quality and integration, managing uncertain data, rank-aware query processing, and information extraction. Ihab is also a cofounder of Tamr, a startup focusing on large-scale data integration and cleaning. He is a recipient of the Ontario Early Researcher Award (2009), a Cheriton Faculty Fellowship (2013), an NSERC Discovery Accelerator Award (2014), and a Google Faculty Award (2014), and he is an ACM Distinguished Scientist. Ihab is... Read More.
Michał Iwanowski is product director at DeepSense.io. He graduated from the Warsaw University of Technology, specializing in software engineering and machine learning. He gained experience at IBM working with big data exploration, predictive analytics, and data warehouses. At IBM, he developed an analytical toolkit for machine learning and data mining while authoring a number of invention disclosures and a patent claim. He has collaborated with medical researchers, performed statistical analyses of medical research results, and created systems for computer-aided experiment design.
Anand Iyer is a senior product manager at Cloudera, the leading vendor of open source Apache Hadoop. His primary areas of focus are platforms for real-time streaming, Apache Spark, and tools for data ingestion into the Hadoop platform. Before joining Cloudera, Anand worked as an engineer at LinkedIn, where he applied machine-learning techniques to improve the relevance and personalization of LinkedIn’s Feed. Anand has extensive experience leveraging big data platforms to deliver products that delight customers. He holds a master’s in computer science from Stanford and a bachelor’s from the University of Arizona.
Jeff Jarrell is a data architect at American Airlines on both the Big Data and the AA.com Web Analytics teams. He’s been through all the battles with the team in getting Hadoop into Production and is now working with the various business groups gaining insights from their Big Data system.
Stefanie Jegelka is the X-Consortium career development assistant professor at the Department of Electrical Engineering and Computer Science at MIT, and a member of CSAIL and the Institute for Data, Systems and Society. Before joining MIT in Spring 2015, she was a postdoctoral scholar in the AMPLab at UC Berkeley, working with Michael Jordan and Trevor Darrell. She earned her PhD from ETH Zurich in collaboration with the Max Planck Institutes in Tuebingen, Germany, and a Diplom from the University of Tuebingen. She has been a fellow of the German National Academic Foundation, and has received... Read More.
Rahel holds a PhD in Economics from Princeton, and MS in Mathematics from NYU. She is Director of Data Science at Hearst. She is passionate about big data and cross-disciplinary literacy. She has consulted with Fortune 100 companies leveraging machine learning, domain expertise, modeling, time series and econometrics tools to solve and address business challenges. She has worked in the space of Big Data since 2010 and worked with data and analytics throughout her career starting in financial investments and trading, and now working with content and digital media. She runs the popular meetup: Economics and Big Data and... Read More.
Weihua Jiang is the engineering manager at Intel for big data enabling. He has worked on big data since 2011. He was the release manager for Intel’s Hadoop distribution from 2011 to 2014. Currently he is focusing on big data enabling, including optimizing the software stack for better performance and to make the ecosystem enterprise ready.
Ann Johnson is cofounder and CEO of Interana, the experts in event data analytics, where she has created a community of all-star talent working to make data-informed decisions a natural extension of everyone’s workflow. Previously, Ann served as a new product manager and integration engineer at Intel. Ann received an MS in electrical engineering from Caltech, where she was selected for the Intel Scholarship program and subsequently offered a leadership position at Intel.
Joy Johnson leads mobile at music technology startup AudioCommon, a team of MIT musicians and PhD hackers revolutionizing the way music is created, organized, and shared in today’s interconnected world. Through AudioCommon’s cloud-based collaboration platform, musicians and the greater industry can collaborate in new ways during the very early stages of the creative process (capturing data that has never been captured before) and share a new type of content to engage fans with a new interactive experience, giving artists a new way to monetize and thrive in today’s industry.
Joy is a recent graduate of the Massachusetts Institute of... Read More.
Jeff Jonas is an IBM Fellow and chief scientist of Context Computing. His work in context-aware computing was originally developed at Systems Research & Development (SRD), founded by Jonas in 1985, and acquired by IBM in January, 2005.
Prior to SRD’s acquisition, Jonas spearheaded the design and development of a number of innovative systems, including technology used by the Las Vegas gaming industry. One such innovation played a pivotal role in protecting that industry from aggressive card-counting teams. The most notable, known as the “MIT team,” was featured in the book Bringing Down the House, and... Read More.
Håkan Jonsson is a data scientist in the Lifelog Insights team at Sony Mobile. He is a PhD student at Lund University with context awareness, mobile sensing, and social computing as his subjects.
Anthony D. Joseph is a Professor in Electrical Engineering and Computer Science at UC Berkeley. He received his B.S., S.M., and Ph.D. degrees in Computer Science from MIT. He joined the UC Berkeley faculty in 1998, where he is developing adaptive techniques for cloud computing, network and computer security, and security defenses for machine learning-based decision systems. He also co-leads the DETERlab testbed, a secure scalable testbed for conducting cybersecurity research, and he is a Technical Advisor at Databricks.
Sven Junkergård is the Chief Technology Officer of Zephyr Health, the Insights-as-a-Service leader harnessing the power of global healthcare data to address critical business and patient needs.
Sven specializes in identifying new technologies, partnerships and data sources that further advance Zephyr Health’s insights focused on product lifecycle success for BioPharma and Medical Device companies.
Russell Jurney is principal consultant at Data Syndrome, a product analytics consultancy dedicated to advancing the adoption of the development methodology Agile Data Science, as outlined in the book Agile Data Science 2.0 (O’Reilly, 2017). He has worked as a data scientist building data products for over a decade, starting in interactive web visualization and then moving towards full-stack data products, machine learning, and artificial intelligence at companies such as Ning, LinkedIn, Hortonworks, and Relato. He is a self-taught visualization software engineer, data engineer, data scientist, and writer, and most recently he’s becoming a teacher. In addition to helping companies... Read More.
Ritu Kama is the director of product management for big data at Intel. She has over 15 years of experience in building software solutions for enterprises. She led engineering, QA, and solution delivery organizations within data center software divisions for security and identity products. She has led the product and program management responsibilities for Intel’s distribution of Hadoop and big data solutions. Prior to joining Intel, Ritu led technical and architecture teams at IBM and Ascom. She has an M.B.A. degree from the University of Chicago and a bachelor’s degree in computer science.
Reiner Kappenberger has over 20 years of computer software industry experience focusing on encryption and security for big data environments. His background ranges from device management in the telecommunications sector to GIS and database systems. He holds a diploma from the FH Regensburg, Germany in computer science.
Holden Karau is a software development engineer at IBM and is active in open source. Prior to IBM, she worked on a variety of big data, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. Holden is the author of Learning Spark and has assisted with Spark workshops. She graduated from the University of Waterloo with a bachelor of mathematics in computer science.
Ronald E. Kasabian is vice president in the Data Center Group and general manager of big data solutions at Intel Corporation. He has overall responsibility for Intel’s strategy and plans in the big data arena, spanning hardware platforms, software solutions, services, strategic business partnerships, and paths to market.
Ron joined Intel in 1984 and spent his first 14 years at the company developing and managing software solutions for various enterprise applications. For several years beginning in 1998, he led product development at Pandesic LLC, a joint venture formed by Intel and SAP to deliver e-commerce solutions.
Over the... Read More.
Jim led CSC’s global Big Data & Analytics (BD&A) business unit, their fastest growing business, focused on turning data into revenue through the use of analytics. This unit had over 1,000 people, with a focus on solving really hard problems.
As a senior executive/CEO, he built high-tech products, teams, and businesses in Big Data, cloud computing, Software as a Service (SaaS), online and mobile digital media, advertising, and semiconductors. He also spent 10+ years in leadership roles developing enterprise data warehousing, data mining, and business intelligence solutions.
Adam Kelleher is a data scientist and mathemagician. He has a physics PhD from the University of North Carolina-Chapel Hill.
Kyle Kelley was a software developer at Rackspace and a core developer of the IPython/Jupyter project. He wants to help build great environments for collaborative analysis, development, and production workloads for everyone, from small teams to massive scale.
Alex Kelly is currently a software development manager for General Motors who is very passionate about Big Data, IoT, Cars, Planes, and many other things. Before General Motors, Alex worked for Microsoft as a Product Manager on the Power BI team where he became familiar with many Big Data Tools. He passionately believes technology is the gateway to changing the world, and his goal is to empower everyone with technology.
In his free time, he usually can be found acting as a consultant for startups where he focuses on UX, UI, team/culture building, and service technologies.
Jake Kendall leads the research and innovation initiative of the Financial Services for the Poor team at the Bill & Melinda Gates Foundation. Jake’s team manages FSP’s major research grants, data collection activities, and technology innovation projects. Previous to joining the Foundation, he spent time as an economist with the Consultative Group to Assist the Poor (CGAP) housed in the World Bank. Jake holds a PhD in development economics from UC Santa Cruz and a BS in physics from MIT. Jake has also worked as a Peace Corps volunteer, a brand analyst for a major advertising firm, in... Read More.
Katie Kent is the Product Manager for Galvanize Enterprise, the learning community for technology. In this role she builds executive and contributor training in software development, data science, and data engineering. Katie was part of the founding of data science training startup Zipfian Academy, where she was responsible for growth of the business from concept to acquisition. Previously Katie worked in venture capital, working with startups building data- and design-driven products. Katie’s academic background is in environmental social science research at the University of Michigan.
Paul Kent is vice president of big data initiatives at SAS, where he divides his time between customers, partners, and the Research & Development teams discussing, evangelizing, and developing software at the confluence of big data and high-performance computing. Paul was previously vice president of the Platform R&D division at SAS, where he led groups responsible for the SAS foundation and mid-tier technologies—teams that develop, maintain, and test Base SAS, as well as related data access, storage, management, presentation, connectivity, and middleware software products. Paul has contributed to the development of SAS software components including... Read More.
Jooseong is a software engineer at Pinterest on the data engineering team. He has worked on various components of the offline data stack, including Pinalytics (an analytics and visualization engine), the A/B experiments framework, and platforms for processing pipelines. Before joining Pinterest, Jooseong was a software engineer at Oracle on the kernel service team, where he worked on the CPU scheduler and parallel statement scheduler for data warehouses.
Phil Kim leads the Data Lab, a passionate group of creative thinkers and builders who blend data science, engineering, product, and design to develop breakthrough solutions for Capital One. The Data Lab works to deliver more intuitive and intelligent experiences to help its customers succeed, as well as develop new approaches to high-value analytical problems, such as risk prediction and fraud detection.
Phil has an extensive background in designing and building technology-driven products and businesses. Most recently, he co-founded Bundle (personal finance analytics), where he was the CTO and head of product. Phil joined Capital One in November 2012... Read More.
Aaron Kimball is the CTO of Zymergen, Inc. Zymergen uses high-throughput techniques, combined with big data analysis, to improve genetic strains for microbial chemical production. Aaron has been working with Hadoop since 2007. In 2008 he was Cloudera’s first employee, where he wrote Apache Sqoop and MRUnit, as well as performed a lot of Hadoop training. In 2010, Aaron founded WibiData and assumed the role of chief architect. WibiData helps organizations build big data applications. Aaron holds a BS in computer science from Cornell University and an MS in computer science from the University of Washington.
As chief technology officer and senior vice president of global e-commerce at Walmart, Jeremy King leads product, engineering, and the web ops teams charged with developing Walmart’s online business on a global scale as the company moves to the next generation of e-commerce. Jeremy received a B.S. in information technology from San Jose State University.
Martin Kleppmann is a software engineer and entrepreneur, specialising in the data infrastructure of internet companies. His last startup, Rapportive, was acquired by LinkedIn in 2012. He is a committer on Apache Samza and Apache Avro, and author of the O’Reilly book Designing Data-Intensive Applications. His technical blog is at martin.kleppmann.com.
Joe Klobusicky is an applied mathematics/predictive analyst at Geisinger Health System. His interests include recommender systems, natural language processing, and Markov theory with a slant toward bioinformatics and pharmacoeconomics.
Maria Konnikova writes about human behavior, science, and psychology, most notably for her weekly blog at The New Yorker. In her bestseller, Mastermind: How to Think Like Sherlock Holmes, she offers tips and advice for improving cognitive ability. And in all her work, she displays a flair for finding new angles through which to explore popular topics such as motivation, performance, and the brain. Maria’s breakout book, Mastermind, has been translated into 16 languages. In it, she explores the famous detective’s signature methods of observation, logical deduction, and mindfulness, showing readers how to apply his techniques in everyday situations. Her... Read More.
Marcel Kornacker is a tech lead at Cloudera and the architect of Apache Impala (incubating). Marcel has held engineering jobs at a few database-related startup companies and at Google, where he worked on several ad-serving and storage infrastructure projects. His last engagement was as the tech lead for the distributed query engine component of Google’s F1 project. Marcel holds a PhD in databases from UC Berkeley.
Balaji Krishna has been with SAP for over 16 years, with customer-facing experience as support consultant, RIG, solution management, and currently product management. He has been a trusted advisor to customers in architecting and implementing the best end-to-end EDW and analytics solutions. In his current role, Balaji is responsible for SAP Vora and HANA/Hadoop integration topics.
Chris Kudelka has worked on the big data team at Riot Games since 2011. In his current role, Chris is product lead and engineering manager for the Insights Tech team (Riot’s big data initiative). His team enables core-game, backend-platform, and other feature teams to integrate with Riot’s data ecosystem and analytics tools so they can focus on the player from both behavior and performance perspectives.
Prior to Riot, Chris was a researcher and developer at Washington University’s Cognitive Aging Lab. He received his degree in philosophy-neuroscience-psychology from Washington University in St Louis, with a focus on linguistics. He used to... Read More.
Lenni Kuff is an engineering manager at Cloudera. Before joining Cloudera, he worked at Microsoft on a number of projects including SQL Server storage engine, SQL Azure, and Hadoop on Azure. Lenni graduated from the University of Wisconsin-Madison with degrees in computer science and computer engineering.
Scott Kurth is the vice president of advisory services at Silicon Valley Data Science, where he helps clients define and execute the strategies and data architectures that enable differentiated business growth. Building on 20 years of experience making emerging technologies relevant to enterprises, he has advised clients on the impact of technological change, typically working with CIOs, CTOs, and heads of business. Scott has helped clients drive global technology strategy, conduct prioritization of technology investments, shape alliance strategy based on technology, and build solutions for their businesses. Previously, Scott was director of the Data Insights R&D practice within the Accenture... Read More.
Haden Land is vice president of research and technology for Lockheed Martin IS&GS, with 30 years of professional experience. He serves numerous U.S. government agencies, allied nations, and regulated commercial industries. Haden is responsible for technical solutions, strategic partnerships, global innovation centers, research and development, and emerging technology planning. His areas of expertise include cloud computing, big data, cyber security, enterprise mobility, complex adaptive systems, enterprise architecture, and advanced concepts. He has domain knowledge within government, space, energy, law enforcement, transportation, and healthcare.
Previously, Haden was vice president of solutions engineering for Lockheed Martin IS&GS, vice president of engineering and... Read More.
Phil Langdale is a lead engineer at Cloudera working on Cloudera Manager. He has worked on all versions of Cloudera Manager since inception.
Uri Laserson is a data scientist at Cloudera. Previously, he obtained his PhD from MIT where he developed applications of high-throughput DNA sequencing to immunology. During that time, he co-founded Good Start Genetics, a next-generation diagnostics company focused on genetic carrier screening. In 2012, he was selected to Forbes’s list of 30 under 30.
Rachel Laycock is a market technical principal at ThoughtWorks in New York, where she has played the role of coach, trainer, technical lead, architect, and developer, coaching teams on Agile and continuous delivery technical practices. She is now a member of the Technical Advisory Board to the CTO, which regularly produces the ThoughtWorks Technology Radar. Rachel has over 10 years of experience in systems development and has worked on a wide range of technologies and the integration of many disparate systems. She is fascinated by problem solving and has discovered that people problems are often more difficult to solve... Read More.
Kim Le is currently a Program Manager for General Motors with a proven track record in leading large-scale initiatives in the Sales and Marketing & Finance space. She holds a master’s in management information systems and is passionate about providing technology that will help drive better utilization of data for everyone.
With over 15 years of software experience under his belt, John Leach’s expertise in analytics and BI drives his role as chief technology officer. Prior to Splice Machine, John founded Incite Retail in June 2008 and led the company’s strategy and development efforts. At Incite Retail, he built custom big data systems (leveraging HBase and Hadoop) for Fortune 500 companies.
Prior to Incite Retail, he ran the business intelligence practice at Blue Martini Software and built strategic partnerships with integration partners. John was a key subject matter expert for Blue Martini Software in many strategic implementations across the world. His... Read More.
Raph Lee manages the Data Tools team at Airbnb, which is responsible for lowering barriers toward data-informed decision-making through automation, education, data visualization, and storytelling. A full-stack engineer by training and a four-year-plus veteran of Airbnb, he’s worked on everything from host-facing features to database tuning to SEO.
Mike Lee Williams is director of research at Fast Forward Labs, an applied machine intelligence lab in New York City, where he builds prototypes that bring the latest ideas in machine learning and AI to life and helps Fast Forward Labs’s clients understand how to make use of these new technologies. Mike holds a PhD in astrophysics from Oxford.
Matt LeMay is the co-founder of Constellate Data, where he designs human-centered systems for contextualizing and collaborating around data. In his work as a technology communicator, Matt has designed and led workshops about product management and data strategy for companies including Pfizer, Visa, McCann, and Johnson & Johnson. Previously, Matt worked as Senior Product Manager at music startup Songza (acquired by Google), and Head of Consumer Product and Platform Manager at Bitly. Matt is also a musician, recording engineer, senior contributor to music website Pitchfork.com, and the author of a book about singer-songwriter Elliott Smith.
Haoyuan Li is founder and CEO of Alluxio (formerly Tachyon Nexus), a memory-speed virtual distributed storage system. Before founding the company, Haoyuan was working on his PhD at UC Berkeley’s AMPLab, where he cocreated Alluxio. He is also a founding committer of Apache Spark. Previously, he worked at Conviva and Google. Haoyuan holds an MS from Cornell University and a BS from Peking University.
Nong Li is a software engineer at Cloudera working on the RecordService and Impala projects. Before joining Cloudera, he worked at Microsoft developing new APIs for the Windows graphics system (DirectX). Nong holds a Sc.B. in computer science from Brown University.
Todd Lipcon is an engineer at Cloudera, where he primarily contributes to open source distributed systems in the Apache Hadoop ecosystem. Previously, he focused on Apache HBase, HDFS, and MapReduce, where he designed and implemented redundant metadata storage for the NameNode (QuorumJournalManager), ZooKeeper-based automatic failover, and numerous performance, durability, and stability improvements. In 2012, Todd founded the Apache Kudu project and has spent the last three years leading this team. Todd is a committer and PMC member on Apache HBase, Hadoop, Thrift, and Kudu, as well as a member of the Apache Software Foundation. Prior to Cloudera, Todd worked... Read More.
Alex Loffler is a principal technology architect at TELUS, one of Canada’s largest providers of cellular, fixed-line, and cable television services. He has nearly 20 years of experience in architecting enterprise software solutions. Alex holds several patents in the U.S. and Europe. Alex has an MSc from University College London and a BSc, with honors, from the University of Sheffield.
Ben Lorica is the chief data scientist at O’Reilly Media. Ben has applied business intelligence, data mining, machine learning, and statistical analysis in a variety of settings, including direct marketing, consumer and market research, targeted advertising, text mining, and financial engineering. His background includes stints with an investment management company, Internet startups, and financial services.
Mike Loukides is vice president of content strategy for O’Reilly Media. He’s edited many highly regarded books on technical subjects that don’t involve Windows programming. He’s particularly interested in programming languages, Unix and what passes for Unix these days, and system and network administration. Mike is the author of System Performance Tuning and a coauthor of Unix Power Tools. Most recently, he’s been fooling around with data and data analysis, languages like R, Mathematica, and Octave, and thinking about how to make books social.
Jason Loveland is a software architect at Lockheed Martin IS&GS, with 12 years of professional experience. Jason is responsible for leading research and development programs applying big data applications and advanced analytics for space systems and cyber security domains. He has expertise in enterprise architecture, software engineering, modeling and simulation, big data, and cloud solutions.
Jason holds a bachelor’s degree in computer engineering from Villanova University and a master’s degree in engineering from Old Dominion University.
Brandon MacKenzie is the Data Science on Hadoop leader on IBM’s Worldwide Technical Sales team for Information Management Software. He is an expert on statistical processing in Hadoop and HPC environments. Brandon earned his master’s degree from The University of Edinburgh.
Jock D. Mackinlay is an American information visualization expert and vice president of visual analysis at Tableau Software. Jock has a Ph.D. in computer science from Stanford University, where he pioneered the automatic design of graphical presentations of relational information. He joined Xerox PARC in 1986, where he collaborated with the User Interface Research Group to develop many novel applications of computer graphics for information access, coining the term Information Visualization. Many of the fruits of this research can be seen in his book, Readings in Information Visualization: Using Vision to Think (Morgan Kaufmann, written and edited with Stuart... Read More.
Mark Madsen is a research analyst at Third Nature, where he advises companies on data strategy and technology planning. Mark has designed analysis, data collection, and data management infrastructure for companies worldwide. He focuses on two types of work: the business applications of data and guiding the construction of data infrastructure. As a result, Mark does as much information strategy and IT architecture work as he does performance management and analytics.
Roger Magoulas is the research director at O’Reilly Media and chair of the Strata + Hadoop World conferences. Roger and his team build the analysis infrastructure and provide analytic services and insights on technology-adoption trends to business decision makers at O’Reilly and beyond. He and his team find what excites key innovators and use those insights to gather and analyze faint signals from various sources to make sense of what others may adopt and why.
Rajiv Maheswaran is CEO of Second Spectrum, an innovative sports analytics and data visualization startup located in Los Angeles, California. His work spans the fields of data analytics, data visualization, real-time interaction, spatiotemporal pattern recognition, artificial intelligence, decision theory, and game theory. Previously, Rajiv served as a research assistant professor within the University of Southern California’s Department of Computer Science and a project leader at the Information Sciences Institute at the USC Viterbi School of Engineering. He and Second Spectrum COO Yu-Han Chang codirected the Computational Behavior Group at USC. Rajiv has received numerous awards and... Read More.
Ted Malaska is a senior solution architect at Blizzard. Previously, he was a principal solutions architect at Cloudera. Ted has 18 years of professional experience working for startups, the US government, some of the world’s largest banks, commercial firms, bio firms, retail firms, hardware appliance firms, and the largest nonprofit financial regulator in the US and has worked on close to one hundred clusters for over two dozen clients with hundreds of use cases. He has architecture experience across topics including Hadoop, Web 2.0, mobile, SOA (ESB, BPM), and big data. Ted is a regular contributor... Read More.
Rishi Malhotra is co-founder and CEO of Saavn, India’s leading music streaming service. As CEO, Rishi has led the company through significant and rapid user growth, while helping to secure partnerships with companies like Twitter, Facebook, Google, Shazam, T-Mobile, and Sonos. He is focused on driving the global Saavn team to deliver award-winning mobile products, big data systems for media, and industry-defining business innovation. Saavn is on path to become one of the largest streaming music companies in the world by 2017. Rishi has led the team in raising more than $125MM in funding from leading institutional investors,... Read More.
Silviu Maniu is a researcher at Noah’s Ark Lab, Huawei Technologies. He holds a PhD in computer science from Telecom ParisTech. His main research interests are social and uncertain data management, databases, and stream machine learning.
Sri Manjunath is a cofounding engineer at Saavn and former lead engineer at Yahoo! He has 10+ years of experience building scalable websites and back-end systems. He currently heads the engineering team at Saavn.
Adam is a co-founder of Unlimited Labs. Previously, he was Locu’s director of data. Prior to that, he completed his Ph.D. in Computer Science at MIT. His dissertation is on database systems and human computation. He is a recipient of the NSF and NDSEG fellowships, and has previously worked at ITA, Google, IBM, and FactSet. In his free time, he builds course content to get people excited about data and programming.
A scientist, best-selling author, and entrepreneur, Gary Marcus is currently professor of psychology and neural science at NYU and CEO and cofounder of the recently formed Geometric Intelligence, Inc. Gary’s efforts to update the Turing test have spurred a worldwide movement, and his research on language, computation, artificial intelligence, and cognitive development has been published widely in leading journals such as Science and Nature. He is also the author of four books, including The Algebraic Mind, Kluge: The Haphazard Evolution of the Human Mind, and the New York Times best-seller Guitar Zero, and contributes frequently to the... Read More.
Jayson Margalus is a demo engineer at MapR and a faculty member at DePaul University, with a background in design and a specialty in games, interactive exhibits, and data. He lives in Mokena, Illinois, where he chairs the Mokena Technology Committee, the Mokena makerspace SpaceLab, and runs a Maker Faire. He also founded the Glen Ellyn makerspace Workshop 88. Some maker-related projects include the “Big Data Outbreak” project for Big Data Everywhere and Hackerspaces in Space. Jay also writes for outlets like Make, NBC Chicago, and the Mokena Messenger.
Kristi Marotta received a bachelor’s degree in actuarial science from the University of Iowa. She currently uses Tableau to visualize data and solve business problems in her position as a competitive intelligence consultant at Allstate.
Hilary Mason is founder and CEO of Fast Forward Labs, a machine intelligence research company, and data scientist in residence at Accel Partners. Previously Hilary was chief scientist at Bitly. She cohosts DataGotham, a conference for New York’s home-grown data community, and cofounded HackNY, a nonprofit that helps engineering students find opportunities in New York’s creative technical economy. Hilary served on Mayor Bloomberg’s Technology Advisory Board and is a member of Brooklyn hacker collective NYC Resistor.
Murthy Mathiprakasam is a director of product marketing for Informatica’s big data products. Murthy has a decade and a half of experience working with emerging high-growth software technologies, including roles at Mercury Interactive/HP, Google, eBay, VMware, and Oracle. Murthy holds an MS in management science from Stanford University and BS degrees in management science and computer science from the Massachusetts Institute of Technology.
Damon McDougall did his PhD in Mathematics at the University of Warwick in the UK. His research focuses are in Bayesian inverse problems, parameter estimation, learning, computational science, high-performance computing, and software engineering. Damon is a core developer of Matplotlib and contributes heavily to the open source community.
Patrick McFadin is one of the leading experts in Apache Cassandra and data-modeling techniques. As a consultant and the chief evangelist for Apache Cassandra at DataStax, Patrick has helped build some of the largest and most exciting deployments in production. Prior to DataStax, he was chief architect at Hobsons, an education services company. There, Patrick spoke often on web application design and performance.
Emma McGrattan is SVP of engineering at Actian, where she leads the Actian Vector, Actian Vector Hadoop Edition, and Actian Matrix development teams. A leading authority in DBMS technologies, Emma has over 20 years’ experience managing, supporting, and developing a variety of databases, from her early days with Ingres to the cutting-edge Actian Vortex. Emma joined the original Ingres Corp. in 1992 and held a senior leadership role on the Ingres engineering team through a number of acquisitions. Born in Ireland, Emma earned a bachelor of electronic engineering from Dublin City University.
Hugh McGrory brings expertise in film production, art, and technology to the world of immersive media. He was a partner at Culture Shock, consulting for clients including The National Film Board of Canada. In 2011 Hugh brought the partners together to create The Andy Warhol Film Digitization Project, featuring over 500 films by Warhol, developed in collaboration with The Moving Picture Company and Technicolor and described in the New York Times as “the largest effort to digitize the work of a single artist in MoMA’s collection.”
Hugh grew up in Derry, Northern Ireland. He co-founded the Belfast-based studio Make.ie in... Read More.
Jim McHugh has over 20 years of experience as a marketing executive and in leadership positions with startup, mid-sized, and high-profile companies, including Sun Microsystems and Apple, prior to joining Cisco Systems. Jim is the vice president of product and solutions marketing for Unified Computing Systems at Cisco. He leads and drives marketing initiatives for UCS and partner solutions marketing (including EMC, Intel, NetApp, SAP, Microsoft, and VCE).
Jim is focused on building a vision for organizational success and executing marketing strategies measured by achievement of UCS revenue, market share, and growth. He has a... Read More.
Wes McKinney is a software architect at Two Sigma Investments. He is the creator of Python’s pandas library, and he is a PMC member for Apache Arrow and Apache Parquet. He wrote the book Python for Data Analysis. Previously, Wes worked for Cloudera, and he was the founder and CEO of DataPad.
Director of Research, Harvard’s National Preparedness Leadership Initiative
Contributing Editor, Strategy+Business Magazine
Contributing Editor, Business Review (China)
Contributing Editor, Center for Higher Ambition Leadership
Former Contributing Editor, Harvard Business Publishing
Hussein Mehanna is an engineering manager at Facebook, where he founded and manages the Applied Machine Learning platform team. Hussein started as the original developer on the team, which quickly developed from an ads-focused ML platform to a Facebook-wide platform. Prior to Facebook, Hussein worked as a software engineer for Bing at Microsoft. He holds a master’s degree in speech recognition from the University of Cambridge, UK.
Gian is a committer on the Druid project. Previously, as a senior software engineer at Metamarkets, he was responsible for the infrastructure powering real-time data processing and ingestion. Gian holds a BS in computer science from the California Institute of Technology.
Katherine Milkman is a tenured associate professor at the Wharton School at the University of Pennsylvania, and the winner of numerous research and teaching awards. Her research relies heavily on big data to document various ways in which individuals systematically make counterintuitive choices. Before becoming an academic, she was a Division I collegiate athlete and one of the top 120 women’s junior tennis players in the U.S. She also worked briefly in investment banking at Goldman Sachs and equity research at Morgan Stanley.
Katherine has published over two dozen research papers in the last decade in leading social science journals.... Read More.
Prat Moghe is the founder and CEO of Cazena. Prat is a successful big data entrepreneur with nearly 20 years of experience inventing next-generation products and building strong teams in the technology sector. Prior to founding Cazena, as SVP of strategy, products, and marketing at Netezza, Prat led a 400-person team that launched the latest-generation Netezza appliance, which led the market in price and performance. Netezza was acquired by IBM for $1.7B in 2010.
x.ai is a personal assistant who schedules meetings for you.
Bill Moschella is the co-founder and chief executive officer of Evariant, Inc., a company that provides a SaaS healthcare CRM and big data platform to healthcare providers. He has used his leadership and entrepreneurial skills to build Evariant into a dominant healthcare market leader by helping these organizations execute successful patient and physician engagement strategies. The company has more than doubled its revenue for software subscriptions each year for the past two years, grown headcount year-over-year (2014 to 2015) by 144%, and was recognized as being one of the fastest growing organizations in the state of Connecticut for three... Read More.
Andreas Mueller received his PhD in machine learning from the University of Bonn. After working as a machine learning researcher on computer vision applications at Amazon for a year, he recently joined the Center for Data Science at New York University. In the last four years, he has been maintainer and one of the core contributors of scikit-learn, a machine learning toolkit widely used in industry and academia, and author and contributor to several other widely-used machine learning packages. His mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science, and... Read More.
Terry Mughan (PhD) is Associate Professor in the School of Business at Royal Roads University and Associate Fellow at the Centre for Global Studies, University of Victoria, both in Canada. His research interests have revolved around the place of language and cultural skills in business internationalization strategies, including a study of 1,200 companies in the East of England. He has authored several research reports for policy bodies such as UK Trade and Investment and the OECD (Organisation for Economic Cooperation and Development) on the internationalization of small and medium-sized companies (SMEs). He has published articles in The... Read More.
Jacques Nadeau is the cofounder and CTO of Dremio. He is also the founding PMC chair of the open source Apache Drill project, spearheading the project’s technology and community. Previously, Jacques was the architect and engineering manager for Drill and other distributed systems technologies at MapR and the CTO and cofounder of YapMap, an enterprise search startup, and held engineering leadership roles at Quigo (AOL), Offermatica (ADBE), and aQuantive (MSFT).
Neha Narkhede is the cofounder and head of engineering at Confluent, a company backing the popular Apache Kafka messaging system. Prior to founding Confluent, Neha led streams infrastructure at LinkedIn, where she was responsible for LinkedIn’s petabyte-scale streaming infrastructure built on top of Apache Kafka and Apache Samza. Neha specializes in building and scaling large distributed systems and is one of the initial authors of Apache Kafka. A distributed systems engineer by training, Neha works with data scientists, analysts, and business professionals to move the needle on results.
Paco Nathan leads the Learning Group at O’Reilly Media. Known as a “player/coach” data scientist, Paco led innovative data teams building ML apps at scale for several years and more recently was evangelist for Apache Spark, Apache Mesos, and Cascading. Paco has expertise in machine learning, distributed systems, functional programming, and cloud computing with 30+ years of tech-industry experience, ranging from Bell Labs to early-stage startups. Paco is an advisor for Amplify Partners and was cited in 2015 as one of the Top 30 People in Big Data and Analytics by Innovation Enterprise. He... Read More.
Jan Neumann manages the research group at Comcast Labs DC where he and his team focus on using machine learning and large scale computing for content discovery, multimedia information extraction, and big data analysis with the goal to innovate the TV and home consumer experience. Before Comcast, he worked for Siemens Corporate Research on various computer vision related projects. He holds a Ph.D. in Computer Science from the University of Maryland, College Park.
Billy Newport has been at Goldman Sachs as a Technology Fellow since 2011, working on big data and graph problems at the firm. Prior to that, he was a Distinguished Engineer at IBM for 10 years, where he worked primarily on distributed systems and high availability for the WebSphere platform. He graduated from Waterford Institute of Technology with a first-class honours degree in industrial computing in 1989.
Christopher Nguyen is CEO and cofounder of Arimo (née Adatao), the leader in collaborative, predictive intelligence for enterprises. Previously, Christopher served as engineering director of Google Apps and cofounded two successful startups. As a professor, he also cofounded the computer engineering program at HKUST (香港科技大学). Christopher has a BS from UC Berkeley, where he graduated summa cum laude, and a PhD from Stanford, where he created the first standard-encoding Vietnamese software suite, authored RFC 1456, and contributed to Unicode 1.1. He is also a cocreator of the open source Distributed DataFrame project.
Piotr Niedzwiedz is a founder and CTO of deepsense.io, a big data science company based in Menlo Park, California, and Warsaw, Poland. Deepsense.io provides machine-learning and deep learning consulting and has developed Seahorse, a scalable data analytics workbench powered by Apache Spark, which lets users build data-processing workflows without needing to write any code. Piotr is a successful entrepreneur. Prior to deepsense.io, he cofounded CodiLime, an IT company delivering software services in networks and security areas. Previously, he worked as a software engineer at Google and Facebook on projects related to big data and distributed systems. He supports and... Read More.
Jack Norris is the senior vice president of data and applications at MapR Technologies. Jack has a wide range of demonstrated successes, from defining new markets for small companies to increasing sales of new products for large public companies, in his 20 years spent in enterprise software marketing. Jack’s broad experience includes launching and establishing analytics, virtualization, and storage companies and leading marketing and business development for an early-stage cloud storage software provider. Jack has also held senior executive roles with EMC, Rainfinity, Brio Technology, SQRIBE, and Bain & Company. Jack earned an MBA from UCLA’s Anderson... Read More.
Robert Novak is a consulting systems engineer for big data in the Cisco Americas Partner Organization. In short, he’s told, he is a Big Data Unicorn. He has been a sysadmin for 20 years, a big data admin since 2003 or so, a Hadoop admin since 2009, and a Cisco UCS C-Series admin since Christmas 2011. Robert brings the viewpoint of the practitioner and customer into the sales, channel partner, and independent software partner fields at Cisco to integrate Hadoop, big data, and analytics into Cisco’s data center technologies, especially UCS.
As a software developer, systems architect, director, and now founder, John O’Duinn has designed and helped build release engineering infrastructure that is practical, reliable, cross-platform, scalable, and efficient. In addition to technology, John loves growing a culture where distributed teams and individuals work seamlessly together no matter where they are physically in the world. At Mozilla this involved building a tightly knit team of 18 release engineers in 14 cities, in four non-adjacent time zones, working with the geo-distributed Mozilla open source project. At Hortonworks, the team was in four cities, in three non-adjacent time zones, working closely with the geo-distributed... Read More.
Cathy O’Neil is a data scientist for the startup media company Intent Media. Cathy began her career as a postdoc in MIT’s Math department. She has been a professor at Barnard College, where she published a number of research papers in arithmetic algebraic geometry, and has worked as a quant for the hedge fund D.E. Shaw in the middle of the credit crisis and for RiskMetrics, a risk software company that assesses risk for the holdings of hedge funds and banks. Cathy holds a PhD in math from Harvard.
A leading expert on big data architecture and Hadoop, Stephen O’Sullivan has 20 years of experience creating scalable, high-availability data and applications solutions. A veteran of WalmartLabs, Sun, and Yahoo, Stephen leads data architecture and infrastructure at Silicon Valley Data Science.
Matt Ocko has three decades of experience as a technology entrepreneur and VC, in the U.S. and globally. His prior investments include Cotendo (AKAM), Zynga (ZNGA), Facebook (FB), XenSource (CTRX), UltraDNS (NSR), FlashSoft (SDNK), Fortinet (FTNT), Aggregate Knowledge (NSR), Virtuata (CSCO), DataMirror (IBM), Couchbase, Ayasdi, Kenshoo, D-Wave Systems, MetaMarkets, Uber, AngelList, and many others, including multiple additional acquisitions by Google, Facebook, Netapp, and other Fortune 1000 tech companies.
Matt founded Da Vinci Systems, a pioneering e-mail software vendor with over 1 million users worldwide prior to its acquisition. He is an... Read More.
Amy O’Connor is a big data evangelist and telecommunications specialist at Cloudera, the leading big data vendor. She advises customers globally as they introduce big data solutions and adopt enterprise-wide big data delivery capabilities. Amy was recently named one of Information Management’s 10 Big Data Experts to Know. Prior to joining Cloudera, Amy built and ran Nokia’s big data team, developing and managing Nokia’s data assets and leading a team of data scientists to drive insights. Previously, Amy was vice president of services marketing and also led strategy for the software and storage business units of Sun Microsystems.
Andrew Odewahn is the CTO of O’Reilly Media, where he helps define and create the new products, services, and business models that will help O’Reilly continue to make the transition to an increasingly digital future. The author of two books on database development, he has experience as a software developer and consultant in a number of industries, including manufacturing, pharmaceuticals, and publishing. Andrew has an MBA from New York University and a degree in computer science from the University of Alabama. He’s also thru-hiked the Appalachian Trail from Georgia to Maine.
Travis Oliphant has a Ph.D. from the Mayo Clinic and B.S. and M.S. degrees in Mathematics and Electrical Engineering from Brigham Young University. Since 1997, he has worked extensively with Python for numerical and scientific programming, most notably as the primary developer of the NumPy package, and as a founding contributor of the SciPy package. He is also the author of the definitive Guide to NumPy.
Travis was an assistant professor of Electrical and Computer Engineering at BYU from 2001-2007, where he taught courses in probability theory, electromagnetics, inverse problems, and signal processing. He also served as Director of... Read More.
Mike Olson cofounded Cloudera in 2008 and served as its CEO until 2013, when he took on his current role of chief strategy officer. As CSO, Mike is responsible for Cloudera’s product strategy, open source leadership, engineering alignment, and direct engagement with customers. Previously, Mike was CEO of Sleepycat Software, makers of Berkeley DB, the open source embedded database engine, and he spent two years at Oracle Corporation as vice president for embedded technologies after Oracle’s acquisition of Sleepycat. Prior to joining Sleepycat, Mike held technical and business positions at database vendors Britton Lee, Illustra Information Technologies,... Read More.
Peter Olson is a director and creative technologist at IDEO where he focuses on creative, practical, and human-centered applications of technology for clients and the larger design and technical community. He is passionate about using technology and data as tools for storytelling, insight, communication, and understanding.
Prior to joining IDEO, Peter was a founder of and served as a vice president of technology for Marvel Entertainment’s Digital Media Group, where he helped drive innovation and technical strategy within the larger Marvel and Disney organizations. Peter has additionally worked as a consultant for a variety of companies and as... Read More.
Sean Owen is director of data science at Cloudera in London. Before Cloudera, he founded Myrrix Ltd. (now the Oryx project) to commercialize large-scale real-time recommender systems on Hadoop. He is an Apache Spark committer, was a committer and VP for Apache Mahout, and is the coauthor of Advanced Analytics on Spark and Mahout in Action. Previously, Sean was a senior engineer at Google.
David Paige is the senior director of enterprise data platform at Cox Automotive, Inc. He has over 20 years of experience in distributed systems, having led many innovative data platform projects. His group of architects builds out and manages the technical infrastructure for the company’s analytics and data platforms. The Cox Automotive, Inc. data and analytic infrastructure includes various big data technologies (Hadoop, Hive, Spark, Pig, HBase, and others), and traditional BI tools (Netezza, MicroStrategy, SAS, etc.).
Iulia Pasov is a machine learning engineer at Avira, the German antivirus company, where she has worked since December 2014. She likes to tackle complex machine learning and natural language processing tasks and has experience in web development as well. Iulia holds two master’s degrees in artificial intelligence, one from the Polytechnic University of Bucharest and the other from Lumiere Lyon 2 University and Polytech, Nantes.
DJ Patil is the chief data scientist and deputy chief technology officer for data policy at the White House Office of Science and Technology Policy, where he advises on policies and practices to maintain US leadership in technology and innovation, fosters partnerships to maximize the nation’s return on its investment in data, and helps to attract and retain the best minds in data science to serve the public. Since joining OSTP, DJ has collaborated with colleagues across government, including the chief information officer and the US Digital Service as part of the Obama administration’s commitment to open data and... Read More.
Pamela Pavliscak (pronounced pav-li-check) is the CEO of SoundingBox, where she advises designers, developers, and decision makers on how to create technologies with emotional intelligence. Pamela is also on the faculty at Pratt Institute’s School of Information and is leading an effort for IEEE Standards for ethics and artificial intelligence. Pamela explores our conflicted and emotional relationship with technology and often speaks on creativity in the digital age, generation Z, and emotion and technology, most recently at SXSW and Collision.
Arthur Peng is a software engineer at Intel, where he works on applications of Intel’s CPU technology to Impala.
Mike Percy is a software engineer currently working at Cloudera on Kudu, a native columnar database for the Hadoop ecosystem. He is also a committer and PMC member on Apache Flume. Prior to joining Cloudera, Mike worked at Yahoo! building a content recommendation system on top of Hadoop and HBase. Mike holds an MS in Computer Science from Stanford University and a BS in Computer Science from the University of California, Santa Cruz.
Kevin Perko is the Data Team Lead at Scribd, the leading subscription reading service. He focuses on evaluating search engine performance, building data pipelines, and democratizing access to data through various initiatives including Reddit-style AMAs, emails, and individual outreach. With nearly a decade of analytics experience, Kevin has worked for a multitude of Bay Area startups including Eventbrite, GREE, and Education.com. He has a background in Finance from Santa Clara University and has volunteered with The University of Cape Town to teach computer skills in the townships of South Africa.
Prior to joining Dstillery (formerly Media6Degrees), Claudia Perlich spent five years working at the Data Analytics Research group at the IBM T.J. Watson Research Center, concentrating on research in data analytics and machine learning for complex real-world domains and applications. She has over 30 scientific publications and holds multiple patents in the area of machine learning. Claudia has won many data mining competitions, including the prestigious 2007 KDD CUP on movie ratings, the 2008 KDD CUP on breast-cancer detection, and the 2009 KDD CUP on churn and propensity predictions for... Read More.
SVP/GM Security and Fraud Solutions at First Data Corporation.
Vu Pham is a machine learning software engineer at Adatao, with a focus on deep learning. He helps build Adatao’s deep learning solutions. He is an avid contributor to various open-source projects such as cubgs, Deepnet, and deeplearning4j. Prior to Adatao, he worked in academia and industry and authored or co-authored several scientific papers.
Thomas Phelan is cofounder and chief architect of BlueData. Prior to BlueData, Tom was an early employee at VMware and as senior staff engineer was a key member of the ESX storage architecture team. During his 10-year stint at VMware, he designed and developed the ESX storage I/O load-balancing subsystem and modular “pluggable storage architecture.” He went on to lead teams working on many key storage initiatives, such as the cloud storage gateway and vFlash. Earlier, Tom was a member of the original team at Silicon Graphics that designed and implemented XFS, the first commercially available 64-bit... Read More.
Susanna Pirttikangas, D. Sc. (Tech.) received her PhD in embedded systems from the University of Oulu, Finland. Her post-doctoral visits were to Japan (Waseda University, 2004-2005 and Tokyo Denki University, 2008) and China (Tsinghua University, 2011). She is a co-leader of the Interactive Spaces research group within the Department of Computer Science and Engineering. The group is led by Dean Jukka Riekki, and the other co-leaders are Senior Research Fellow Mika Rautiainen and Iván Sánchez. In the team, Susanna works as a data scientist specializing in situation awareness. She has experience in developing methodology to de-noise, fuse, segment, and classify real-time... Read More.
Jeff Pollock is an expert data integration technology leader. He is currently vice president of product management for the Oracle Data Integration & Governance business unit, and previously was responsible for all IBM Information Integration & Governance products. Prior to Oracle and IBM, Jeff was an independent architect for the U.S. Defense Department, vice president of technology at Cerebra, and chief technical officer of Modulant. He has been developing data integration, semantic middleware, and inference-driven SOA platforms since 2001. Prior to that, Mr. Pollock was a principal engineer with Modem Media and senior architect with Ernst... Read More.
Jules Polonetsky serves as executive director and co-chair of the Future of Privacy Forum, a Washington, D.C.-based think tank that seeks to advance responsible data practices. FPF is supported by the chief privacy officers of more than 110 leading companies, several foundations, as well as by an advisory board comprised of the country’s leading academics and advocates. FPF’s current projects focus on big data, mobile, location, apps, the internet of things, wearables, de-identification, connected cars and student privacy.
His previous roles have included serving as chief privacy officer at AOL and before that at DoubleClick; as consumer affairs... Read More.
Beate Porst is the lead product manager for data integration in the information integration and governance group at IBM. Her primary focus is on setting the vision, strategy, and tactical advancement of IBM’s data preparation and integration technology. Prior to being a product manager, Beate was a solution architect in the IBM Advanced Engineering and Solution group, leading the architecture and development of reusable assets to support a richer integration amongst IBM Information Management products. Beate has more than 15 years of experience in data management, virtualization, integration, and governance, in both engineering and product management roles. Beate... Read More.
Bill Porto is an expert in applying computational intelligence to solve real-world problems across various problem domains. As senior analytics engineer at RedPoint Global, he develops automated business optimization software that incorporates evolutionary optimization, neural networks, and a host of other non-traditional machine learning techniques. An applied mathematician by trade, Bill has created adaptive solutions to dynamic problems for resource allocation, pattern recognition, drug discovery, and logistics scheduling. Before RedPoint, he was president of Natural Selection, Inc. where he received the 2010 FDA Honor Award for his work on their PREDICT automated risk-assessment system.
Jake Porway is the founder and executive director of DataKind, a nonprofit that harnesses the power of data science in the service of humanity. He is an alum of the New York Times R&D Lab and has worked at Google and Bell Labs. A recognized leader in the Data for Good Movement, he has spoken at IBM, Microsoft, Google, and the White House. Jake is also a PopTech Social Innovation fellow and a National Geographic Emerging Explorer. He holds a BS in computer science from Columbia University and an MS and PhD in statistics from UCLA.
James Powell is a NYC-based Python programmer with experience in quantitative finance and data science. James is also very active in the Python community, where he organizes NYC Python, the world’s largest and most active Python meetup group. He also works with the numeric and scientific computing nonprofit NumFOCUS to help organize the PyData conference series. James is a frequent speaker at Python conferences and has been invited to speak at events such as PyData New York, PyData London, PyGotham, the conference For Python Quants, and PyCon Spain.
Sean Power is a consultant, analyst, author, and speaker. He is the co-founder of Watching Websites, a boutique consulting firm focusing on early stage startups, products, and non-profits as they emerge and mature in their niches. He has built professional services organizations, and traveled across North America delivering engagements to Fortune 1000 companies. He helps executives understand their competitive landscape and the future of their industry. He has done technical editing for Troubleshooting Linux Firewall for Addison-Wesley, and co-authored Complete Web Monitoring with Alistair Croll for O’Reilly Media.
Sean has had first-hand experience creating and implementing social computing strategies with... Read More.
Arvind Prabhakar is co-founder and CTO at StreamSets, a big data startup based in San Francisco. He is an Apache Software Foundation member and a PMC member on the Flume, Sqoop, Storm, and MetaModel projects.
Prior to starting StreamSets, Arvind held many roles at Cloudera ranging from software engineer to director of engineering. Before Cloudera, Arvind was an architect in the core platform engineering team at Informatica and a staff engineer at Sun Microsystems.
Ravi Prakash is a Hadoop committer and a senior software engineer at Altiscale. Previously, he was a senior software developer at Yahoo!, where he worked on Hadoop Core development (HDFS, MapReduce, and YARN). Ravi has also worked in software development at Tavare Research Labs and Motorola. Ravi has a BS in computer science from GGS Indraprastha University and an MS in computer science from the University of Southern California.
Peter Prettenhofer is a data scientist / software engineer at DataRobot. He studied computer science at Graz University of Technology, Austria and Bauhaus University Weimar, Germany, focusing on machine learning and natural language processing. He is a contributor to scikit-learn where he co-authored a number of modules such as Gradient Boosted Regression Trees, Stochastic Gradient Descent, and Decision Trees.
Randall Pruim is a professor of mathematics and statistics at Calvin College, author of Foundations and Applications of Statistics: An Introduction Using R, and the maintainer of several R packages, including fastR and mosaic. His research interests include statistical computing and statistics education (especially for students in the natural sciences).
Evan Prodromou is founder and CTO of Fuzzy.io, an AI-as-a-service startup based in Montreal. His previous startups include Wikitravel, StatusNet, where he led development of StatusNet and pump.io Open Source social software, and Breather. He is chair of the W3C working group on Social Web standards.
Greg Rahn has worked as a performance engineer for over a decade on parallel RDBMS systems and Hadoop SQL engines. He spent eight years running competitive data warehouse benchmarks at Oracle as a member of the esteemed Real-World Performance Group and later worked on Impala performance at Cloudera. Currently he is leading product at Snowflake Computing.
Karthik Ramasamy is the engineering manager and technical lead for real-time analytics at Twitter. He has two decades of experience working in parallel databases, big data infrastructure, and networking. He cofounded Locomatix, a company that specializes in real-time streaming processing on Hadoop and Cassandra using SQL, that was acquired by Twitter. Before Locomatix, he had a brief stint with Greenplum, where he worked on parallel query scheduling. Greenplum was eventually acquired by EMC for more than $300M. Prior to Greenplum, Karthik was at Juniper Networks, where he designed and delivered platforms, protocols, databases, and high availability solutions for... Read More.
Anand Ranganathan is the director of solutions at Unscrambl, LLC, which is a startup building solutions incorporating a variety of big data platforms and analytics for different industries. He is a data scientist, big data developer, architect, and researcher rolled into one person. He has worked with over 100 customers worldwide to design, implement, and deploy big data solutions, involving technologies such as IBM InfoSphere Streams, Hadoop, and lately, Spark.
Before joining Unscrambl, Anand was a global technical ambassador for big data in IBM’s Software Group. He evangelized IBM’s big data products and services, and led WW technical... Read More.
Jai Ranganathan is the director of product strategy at Cloudera, where he is responsible for planning the future roadmap of Cloudera products. Before Cloudera, he spent a decade at VMware, where among other things he was one of the developers on vMotion, storage vMotion, and the distributed management framework for vSphere.
Nirmal Ranganathan is a Principal Engineer working on the Data Stores Platform at Rackspace. He constantly works with various teams within Rackspace and customers alike, directing them on how best to take advantage of Big Data technologies. Nirmal plays an active role in the local Austin tech scene by volunteering to organize meetups and other events in the Austin area. Nirmal was one of the founding members of Trove (OpenStack’s Database as a Service) and has contributed to various OpenStack initiatives, Cassandra, Alluxio, and Thrift.
Kamalesh Rao is a North Carolina native who moved to New York City during the second term of Grover Cleveland. He entered the family trade because he was not cool, talented, or brave enough to attempt a career in something interesting or worthwhile like dance, parkour, or accounting. He really likes to write about himself in the third person.
As president and CEO of VoltDB, Bruce Reading brings nearly 30 years of experience building teams and creating business value in a variety of strategic roles including sales, marketing, asset management, mergers & acquisitions, and operations.
Before joining VoltDB, Bruce was senior vice president and general manager for Compuware Corporation (formerly NASDAQ:CPWR). Prior to Compuware, he spent six years as president, chief operating officer, and senior vice president at Gomez, Inc. Previously, Bruce served in senior management capacities at Access International, Cayman Systems, and Dictaphone Corporation. A native Canadian, Bruce maintains an active role in the startup... Read More.
Jeff Reback is a senior software developer for Continuum Analytics. As a former quant, he has lots of experience building financial trading systems, using Python, and working with very large data. Jeff has been a core committer to the pandas project for the past few years and currently manages the project.
Ben Recht is an associate professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at the University of California, Berkeley. Ben’s research focuses on scalable computational tools for large-scale data analysis, statistical signal processing, and machine learning. He explores the intersections of convex optimization, mathematical statistics, and randomized algorithms. He is particularly interested in simplifying the analysis and manipulation of noisy and incomplete data by exploiting domain-specific knowledge and prior information about structure. Ben is the recipient of an NSF Career Award, an Alfred P. Sloan Research Fellowship, and the 2012 SIAM/
Harper Reed is a hacker/engineer who builds paradigm-shifting tech and leads others to do the same. Harper loves using the enormity of the internet to bring people together, whether as CTO of Obama for America, CTO at Threadless.com, or on his own projects. Harper and his team created Dashboard, a site that connects volunteer teams and acts as an online component of the field office. Harper can often be found playing with new technology, looking for something to hack, or enjoying life in Chicago with his amazing wife. Currently Harper is focusing on defining the future of commerce... Read More.
Kim Rees is a founding partner of Periscopic, an award-winning information visualization firm. Their work has been featured in the MOMA, CommArts, PRINT, Adobe Success Stories, and others.
Kim is a prominent individual in the data visualization community. She has been featured in CommArts and the Huffington Post, and has presented at several industry events including Strata, Eyeo, Visualized, and OpenVis among others. She also runs the popular Portland Data Visualization Meetup. Kim received her BA in computer science from New York University.
Alex Rice is a cofounder and chief technology officer at HackerOne, which provides a platform that enables organizations to build strong relationships with a community of security experts. Alex is responsible for developing the HackerOne technology vision, driving engineering efforts, and counseling customers as they build world-class security programs. Previously, Alex worked at Facebook for over six years, where he founded the product security team, built one of the industry’s most successful security programs, and introduced new transport layer encryption used by more than a billion users. Alex also serves on the board of the Internet Bug Bounty, a nonprofit... Read More.
Henry Robinson is a software engineer at Cloudera. For the past few years, he has worked on Apache Impala, an SQL query engine for data stored in Apache Hadoop, and leads the scalability effort to bring Impala to clusters of thousands of nodes. Henry’s main interest is in distributed systems. He is a PMC member for the Apache ZooKeeper, Apache Flume, and Apache Impala open source projects.
John B. Rollins, Ph.D. is a data scientist in the IBM Analytics division of IBM. His background is in the fields of data mining, engineering, and econometrics in many industries. He holds seven patents, and has authored a best-selling engineering textbook and many technical papers. He holds doctoral degrees in economics and petroleum engineering from Texas A&M University.
Stephen Romanoff is a director in Capital One’s Technology organization. He leads teams in developing data management solutions for Capital One’s big data initiatives. Before joining Capital One, he was a consultant specializing in big data capabilities—development, architecture, and strategy—for numerous federal government agencies. He has degrees from Emory University and the University of Virginia.
Mike Rosenthal spent the last six years overseeing brand partnerships and digital strategy for the band OK Go before joining Mick Management in 2015. In his role as head of strategic marketing at Mick, Mike works with a roster of artists including Walk the Moon, Of Monsters and Men, Leon Bridges, and Childish Gambino in developing new approaches to artist development and partnership strategy.
Jacques Roy is a member of the IBM worldwide analytics platform technical team, specializing in big data streaming analytics. He has also worked in many technology areas including operating systems, databases, and application development. He is the author of multiple books, with the most recent being The Power of Now: Real-Time Analytics and IBM InfoSphere Streams. He is also a regular contributor to IBM Data magazine. Jacques has been a presenter at many conferences including IBM’s Information on Demand (IOD).
Karen Rubin has spent the past 10 years building products and managing product development teams. She is currently on the product team at Quantopian, building the world’s first algorithmic trading platform in the cloud. She is currently focused on a new IPython research platform that will allow quants to access curated financial data in an interactive research environment.
Before coming to Quantopian, Karen spent time working on the investing team at Matrix Partners, where she helped evaluate potential investments and supported portfolio companies. She also spent five years on the product team at HubSpot, where she was responsible for building... Read More.
Laurel Ruma is the director of talent for O’Reilly Media. Most recently, Laurel cochaired Where 2.0, OSCON Java, and Gov 2.0 Expo. She joined O’Reilly after working for five years at various IT analyst firms in the Boston area. Laurel is the coeditor of Open Government, published by O’Reilly.
Sandy Ryza is a senior data scientist at Clover Health. He was previously at Cloudera doing engineering and data science. He is an author of O’Reilly’s Advanced Analytics with Spark, as well as a Spark committer and member of the Hadoop project management committee. He graduated Phi Beta Kappa from Brown University.
Melissa Santos has over a decade of experience with all parts of the data pipeline, from ETLs to modeling. Her role as a data scientist at Big Cartel involves teaching both engineers and nontechnical people how to get the data they need. Melissa holds a PhD in applied math.
Rahul Saxena is the engineering lead at Saavn for Search and Recommendations. His team architects and manages search and recommendation algorithms. They work on technologies like Solr, Neo4j, and Mahout.
Peter Schlampp is passionate about designing products that change the way users live, work, and interact with their world. He experienced first-hand the utility and complexity of big data while building products to secure enterprise networks. Peter has led Product and Marketing teams at Solera Networks, IronPort Systems, and Cisco Systems.
Eric Schmidt is the product management lead for Cloud Dataflow on the Cloud engineering team at Google, where his primary role is to help shape the future of fully managed, large-scale data processing. Eric spends the majority of his time working with existing cloud customers and on-premises developers who are moving their MapReduce and related data processing workloads to the cloud. He led the announcement of Cloud Dataflow (at Google I/O’s 2014 keynote) with the development of a real-time sentiment analysis and results prediction framework for the 2014 World Cup. Eric has a deep passion for user interaction modeling, data... Read More.
Jim Scott is the director of enterprise strategy and architecture at MapR Technologies, Inc. Across his career, Jim has held positions running operations, engineering, architecture, and QA teams in the consumer packaged goods, digital advertising, digital mapping, chemical, and pharmaceutical industries. Jim has built systems that handle more than 50 billion transactions per day, and his work with high-throughput computing at Dow Chemical was a precursor to more standardized big data concepts like Hadoop. Jim is also the cofounder of the Chicago Hadoop Users Group (CHUG), where he has coordinated the Chicago Hadoop community for six years.
Yonik Seeley is the creator of Solr. He works at Cloudera integrating and leveraging “big search” technologies into the many components comprising the Cloudera Enterprise Data Hub (EDH). Yonik was previously chief open source architect and cofounder at LucidWorks.
Michael Segel has been working with Hadoop since 2009 at various companies as a solution architect, solving the tough challenges. He is currently globe-trotting as a principal architect with Segel & Associates, looking for the next challenging problem to solve. Michael spends his free time thinking about solutions as he walks his dogs around the River North neighborhood in Chicago. While the founder of CHUG (Chicago area Hadoop User Group), Michael is also in the process of starting a Big Data Anonymous work group for those recovering big data-holics.
Jonathan Seidman is a software engineer on the Partner Engineering team at Cloudera. Previously, he was a lead engineer on the Big Data team at Orbitz Worldwide, helping to build out the Hadoop clusters supporting the data storage and analysis needs of one of the most heavily trafficked sites on the internet. Jonathan is a cofounder of the Chicago Hadoop User Group and the Chicago Big Data meetup and a frequent speaker on Hadoop and big data at industry conferences such as Hadoop World, Strata, and OSCON. Jonathan is the coauthor of Hadoop Application Architectures from O’Reilly Media.
Evan Selinger is an associate professor of philosophy at Rochester Institute of Technology, where he is also affiliated with the Center for Media, Arts, Games, Interaction, and Creativity (MAGIC). He’s also a fellow at The Institute for Ethics and Emerging Technology, and serves on the Advisory Board of The Future of Privacy Forum. Evan’s research primarily addresses ethical issues concerning technology, science, the law, expertise, and sustainability.
A prolific academic author, Evan also cares deeply about public engagement, and regularly writes for popular magazines, newspapers, and blogs, including: Wired, The Atlantic, Slate, The Wall Street Journal, The Nation, Salon,... Read More.
Gwen Shapira is a system architect at Confluent, where she helps customers achieve success with their Apache Kafka implementation. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen currently specializes in building real-time reliable data-processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, the coauthor of Hadoop Application Architectures, and a frequent presenter at industry conferences. She is also a committer on Apache Kafka and Apache Sqoop. When Gwen isn’t coding or building data pipelines, you can find her pedaling her bike, exploring the... Read More.
Vin Sharma is the director of machine learning solutions in the Data Center group at Intel, where he focuses on autonomous driving and automated trading. Vin has helped build data center infrastructure software platforms—most recently the Trusted Analytics Platform—and has helped drive enterprise adoption of open source software like Linux, KVM, OpenStack, Hadoop and analytics for over 20 years. Before joining Intel, Vin held various engineering and management roles at HP for 15 years, building enterprise software products based on Linux, Java, XML, and other open source software.
Tomer Shiran is cofounder and CEO of Dremio. Previously, Tomer was the VP of product at MapR, where he was responsible for product strategy, roadmap, and new feature development. As a member of the executive team, he helped grow the company from 5 employees to over 300 employees and 700 enterprise customers. Prior to MapR, Tomer held numerous product management and engineering positions at Microsoft and IBM Research. He is the author of five US patents. Tomer holds an MS in electrical and computer engineering from Carnegie Mellon University and a BS in computer science from Technion, the... Read More.
Shiva is CEO/Co-Founder of Urban Engines, a startup focused on improving urban mobility. Prior to Urban Engines, from 2001 through 2010, Shiva was a Vice President and Distinguished Entrepreneur at Google, helping to build AdSense, Cloud Apps & ‘big data’ infrastructure such as Dremel/WebIQ, and research and development centers across the world. With a deep interest in open data, Shiva created Sitemaps.org to surface Web data with an industry-first collaboration between Google, Yahoo!, and Microsoft. Shiva has a Ph.D. in Computer Science from Stanford University, where he was awarded the Samuel Thesis Award.
Gary Short is a Data Solution Architect for Microsoft. He specialises in machine learning and “big data” on the Azure Platform, but has an interest in data science, in all forms, especially computational linguistics and social network analysis.
Hari Shreedharan is a software engineer at Cloudera, an Apache Flume committer/PMC member, and a Spark contributor. He is the author of the O’Reilly Media book Using Flume.
Rosaria Silipo is not only an expert in data mining, machine learning, reporting, and data warehousing, but has also become a recognized expert on the KNIME data mining engine, on which she has published three books: KNIME Beginner’s Luck, The KNIME Cookbook, and The KNIME Booklet for SAS Users.
Previously Dr. Silipo worked as a freelance data analyst for many companies throughout Europe. She has also led the SAS development group at Viseca (Zürich), implemented the speech-to-text and text-to-speech interfaces in C# at Spoken Translation (Berkeley, California), and developed a number of speech recognition... Read More.
Joseph Sirosh is the corporate vice president of Microsoft’s Data group, leading the database, big data, and machine-learning products, as well as a talented team of engineers, data scientists, and product leaders who are developing tools and services to transform data at scale into actionable intelligence. Joseph joined Microsoft from Amazon, where he was most recently the vice president for the Global Inventory Platform, responsible for the science and software behind Amazon’s supply chain and order fulfillment systems, as well as the central machine-learning group, which he built and led. Before joining Amazon, Joseph worked for Fair Isaac Corp. as... Read More.
As a member of the U.S. Government Tech Solutions team at DigitalGlobe, Ryan Smith leads cross-functional teams to build analytic solutions for various customers. These solutions focus on extracting insights from customer, commercial, and open data to support decision makers.
Scott Sokoloff transforms mountains of data on consumer behavior into actionable data-driven insights. His methodologies allow for the attribution of online activity to offline behavior and vice versa. He has worked with many industry giants including Microsoft, Capital One, Dominos Pizza, Burger King, Visa, PayPal, Forbes, Constant Contact and countless others combining best practices in analytics, econometrics, statistics, data science, and sales forecasting. The focus of his work is listening to what consumers are saying via their direct actions, to determine how they will behave in the future in order to maximize the profitability of decision making.
Offering a track... Read More.
Dima Spivak is a software engineer at Cloudera, where he works on test infrastructure. He is a committer and PMC member on the Apache HBase project.
Krishna Sridhar is a data scientist at Dato. He holds a PhD in computer science from the University of Wisconsin-Madison, where he worked on high-performance software for large-scale problems in mathematical optimization and data analysis. Krishna’s work has been used in applications such as healthcare, industrial production planning, and machine learning.
Dr. Jessica Stauth is Quantopian’s vice president of Quant Strategy. Jess holds a PhD from UC Berkeley in Biophysics. She worked as an equity quant analyst at the StarMine Corporation and as a director of quant product strategy for Thomson Reuters prior to joining Quantopian in August of 2013.
Julie Steele thinks in metaphors and finds beauty in the clear communication of ideas. She is particularly drawn to visual media as a way to understand and transmit information. Julie is coauthor of Beautiful Visualization (O’Reilly, 2010) and Designing Data Visualizations (O’Reilly, 2012).
Nathan Stephens recently joined RStudio as director of solutions engineering. His background is in applied analytics and consulting. He has experience building data science teams, creating innovative data products, analyzing big data, and architecting analytic platforms. He was an early adopter of R and has introduced it into many organizations. Nathan holds an MS in statistics from Brigham Young University.
Doug Stradley is the director of customer success at Trifacta. He and his team work with Enterprise customers around the world, helping them wrangle enormous, complex, and nasty mountains of data into usable information. Prior to Trifacta, Doug was the director of customer success at Informatica Cloud. With adoption as a focus, Doug has worked with many Fortune 500 companies to build productive relationships between humans and technology.
Brian Suda is a master informatician currently residing in Reykjavík, Iceland. Since first logging on in the mid-’90s, he has spent a good portion of each day connected to the internet. When he is not hacking on microformats or writing about web technologies, he enjoys taking kite aerial photography. His own little patch of internet can be found at Suda.co.uk, where many of his past projects, publications, interviews, and crazy ideas can be found.
Jagane Sundar is the CTO at WANdisco. Jagane has extensive big data, cloud, virtualization, and networking experience. He joined WANdisco through its acquisition of AltoStor, a Hadoop-as-a-service platform company. Previously, Jagane was founder and CEO of AltoScale, a Hadoop- and HBase-as-a-platform company acquired by VertiCloud. His experience with Hadoop began as director of Hadoop performance and operability at Yahoo. Jagane’s accomplishments include creating Livebackup, an open source project for KVM VM backup, developing a user mode TCP stack for Precision I/O, developing the NFS and PPP clients and parts of the TCP stack... Read More.
David Tabacco has worked extensively on the data strategy and the practical aspects of creating a data lake architecture to empower pharmaceutical data analytics. Over the course of this journey, David has focused on building and licensing big data tools that will enable the process of discovering, cataloging, enriching and governing data in the big data platform.
In previous roles, David led identity and access management initiatives such as single-sign-on and federation. Later, he was embedded in the clinical trials solution architecture team to understand and pair technology with business challenges.
David holds a BS in computer science from... Read More.
Matthew Tamayo-Rios is founder and CEO of Kryptnostic, a team of determined optimists united by the belief that individuals and organizations can safely leverage their data in the cloud. Previously, Matthew has worked at Microsoft on the OS Security team and at Palantir on the government side of the business. He studied mathematics and computer science at RPI and applied mathematics at the University of Washington. His initial foray into computer security was at the early age of nine, hacking his mother’s point-of-sale retail system to adjust the ice cream inventory.
Data architect, computational scientist, and technical leader. Andy is the CTO of Fashion Metric, where he is bringing his experience building smart, scalable data systems to the fashion industry. You will also find him leading the board of the NumFOCUS foundation. As a passionate advocate for open source scientific codes, Andy has been involved in the wider scientific Python community since 2006, contributing to numerous projects in the scientific stack.
Piotr Teterwak works on the toolkit development team at Dato. He received a BA in computer science from Dartmouth College, where he conducted work exploring the learning of convolutional deep neural nets with applications in computer vision.
William Theisinger is VP of engineering for YP.com and is responsible for the data collection, processing, warehousing, and reporting of both internal and external data for the company. Prior to YP.com, William founded a consulting company that specialized in data collection, processing, and warehousing for both large (Microsoft, AT&T Interactive) and small (Pricegrabber, Idealab) companies. William started his focus on data engineering while working for Goto.com (an Idealab company) and later for Overture and Yahoo, before returning to Idealab to concentrate on early start-up tech companies.
AnnMarie Thomas is an engineering and entrepreneurship professor at the University of St. Thomas, where she directs the Center for Engineering Education and the Playful Learning Lab. She was the founding executive director of the Maker Education nonprofit, and is the author of Making Makers: Kids, Tools, and the Future of Innovation. AnnMarie has an SB in ocean engineering from MIT, and MS and PhD degrees from Caltech.
Dr. Joy Thomas, chief data scientist at Apigee, joined the company through the acquisition of InsightsOne, which he co-founded in 2011. Dr. Thomas served as chief scientist at Purpleyogi/Stratify from its founding in 1999 and led the development of advanced mining, clustering, and classification algorithms that formed the basis of the Stratify Legal Discovery Service. After Stratify was acquired by Iron Mountain in 2007, he became chief scientist at Iron Mountain Digital, where he led advanced technology development until 2011. From 1990 to 1999, he was a research staff member at the IBM T.J. Watson Research Center, where he... Read More.
Kathleen Ting is currently a technical account manager at Cloudera, where she helps strategic customers deploy and use the Hadoop ecosystem in production. Kathleen has spoken on Hadoop, ZooKeeper, and Sqoop at many big data conferences, including Hadoop World, ApacheCon, and OSCON. She’s contributed to several projects in the open source community, is a committer and PMC Member on Sqoop, and is a coauthor of the Apache Sqoop Cookbook.
Ali Tore has more than 20 years of experience leading enterprise product development. Most recently, he cofounded and served as CPO and VP of analytics at Model N, a leading provider of revenue management solutions that went public in 2013. Previously, Ali was a product and program manager at NetDynamics (acquired by Sun Microsystems), which pioneered the first Java-based application server. He holds an undergraduate degree in industrial engineering and management science from Northwestern University and a graduate degree in management science and engineering from Stanford University.
Steven Totman is Cloudera’s big data subject-matter expert, helping companies monetize their big data assets using Cloudera’s Enterprise Data Hub. Steve works with over 180 customers worldwide and helps across verticals in architectures around data management tools, data models, and ethical data usage. Previously, Steve ran strategy for a mainframe-to-Hadoop company and drove product strategy at IBM for DataStage and Information Server after joining with the Ascential acquisition. He architected IBM’s Infosphere product suite and led the design and creation of governance and metadata products like Business Glossary and Metadata Workbench. Steve holds several patents in data integration and... Read More.
Florin Trandafir is the global IT program manager for BI and Analytics at Nokia. Florin has a solid background in analytics, having worked in various positions in the telecom and consultancy businesses. He started as a business intelligence consultant focusing on the Nokia operations business and went on to take further responsibilities in operational and service management for Analytics and Financial Applications in the network field. Later, in a business role, he led the Analytics area (focused on SAP technologies) for a major consultancy company in the European Nordic market.
Currently, his main responsibility is in the program management area for the Business Intelligence and... Read More.
Shivakumar Vaithyanathan is an IBM fellow and director, Watson Content Services. Prior to his current position he managed the Machine Learning Systems group at IBM Research, and prior to that he started and built the Search & Analytics Department at IBM Almaden, with a research focus ranging from Natural Language Processing to Entity Resolution and Machine Learning. Multiple technologies developed in this department ship with several IBM products, including IBM’s big data efforts. He also initiated and oversaw the build-out of IBM’s next generation Enterprise search technology that currently powers IBM’s external-facing www.ibm.com. His research is at... Read More.
Bryan Van de Ven is a software engineer at Continuum Analytics. Previously, Bryan worked at the Applied Research Labs, developing software for sonar feature detection and classification systems on US Naval submarine platforms, and Enthought, where he worked on problems in financial risk modeling and fluid mixing simulation. Bryan has also worked on an assortment of iOS projects as an independent consultant. Bryan is a core contributor of Bokeh and contributed to the Chaco visualization library. Bryan holds undergraduate degrees in computer science and mathematics from UT Austin and a master’s degree in physics from UCLA.
Victor Vazquez is a data scientist at Airbnb. His work currently focuses on search and the marketplace; he also has experience with international payments and compliance. He received his BA in economics from MIT.
Krish Venkataraman is Syncsort’s chief financial officer and chief operations officer. He has strategic and tactical expertise and demonstrated success across the industry spectrum – from M&A to corporate finance, investment banking, global corporate strategy, equity research, consulting, trading and exchanges, and payment systems.
Prior to joining Syncsort in March 2014, Krish served as chief financial officer and chief administrative officer of global information technology for NYSE Euronext, where he managed significant portions of capital and expenses for the S&P 500 company with $4 billion in global annual revenues. He helped drive the strategy, governance, financial reporting, and management of a workforce... Read More.
Ashish Verma is a managing director at Deloitte, where he leads the Big Data and IoT Analytics practice, building offerings and accelerators to enhance business processes and effectiveness. Ashish has more than 18 years of management consulting experience helping Fortune 100 companies build solutions that focus on addressing complex business problems related to realizing the value of information assets within an enterprise.
Ekaterina Volkova is a PhD candidate in finance at Cornell University. Among other topics, Ekaterina is interested in using financial data to track likely instances of insider trading.
Chris Wake is the head of business operations for Spire, the satellite-powered data company. He joined Spire as its first non-founder in early 2013, and has worked on many areas of its development from initial customer identification to expansion abroad. Prior to joining Spire, Wake spent time working in venture capital, and assisted a selection of early stage technology companies in scaling. He holds an MBA from the University of Oxford, and his work has been featured in Forbes, The Huffington Post, and Wired, among others.
Dean Wampler is the architect for fast data products at Lightbend, where he specializes in scalable, distributed big data and streaming systems using tools like Spark, Mesos, Akka, Cassandra, and Kafka (the SMACK stack). Dean is the author of Programming Scala and Functional Programming for Java Developers and the coauthor of Programming Hive, all from O’Reilly Media. He is a contributor to several open source projects and the co-organizer of several conferences around the world and several user groups in Chicago. Dean can be found on Twitter as @deanwampler.
Andrew Wang is a software engineer on the HDFS team at Cloudera. Previously, he was a graduate student in the AMPLab at the University of California, Berkeley, advised by Prof. Ion Stoica, where he worked on research related to in-memory caching and quality-of-service. In his spare time he enjoys going on bike rides, cooking, and playing guitar.
Peter Wang is the cofounder and CTO of Continuum Analytics, where he leads the product engineering team for the Anaconda platform and open source projects including Bokeh and Blaze. Peter has been developing commercial scientific computing and visualization software for over 15 years and has software design and development experience across a broad variety of areas, including 3D graphics, geophysics, financial risk modeling, large data simulation and visualization, and medical imaging. As a creator of the PyData conference, he also devotes time and energy to growing the Python data community by advocating, teaching, and speaking about Python at conferences... Read More.
With more than 15 years’ experience working with designers, engineers, and scientists, Tricia Wang has a particular interest in designing human-centered systems. Tricia advises organizations on integrating big data and what she calls “thick data”—data brought to light using digital-age ethnographic research methods that uncover emotions, stories, and meaning—to improve strategy, policy, products, and services. Organizations she has worked with include P&G, Nokia, GE, Kickstarter, the United Nations, and NASA. Tricia recently finished an expert-in-residency at IDEO, where she extended and amplified IDEO’s impact in design research. When not working with organizations, she spends the other half of... Read More.
Zuo Wang is a principal researcher at Wanda AI Technology Center. For the past few years, he has worked on large-scale distributed deep learning systems including PaddlePaddle, MXNet, and TensorFlow, and led the effort to apply deep learning to clothes classification, clothing fashion analysis, and cross-domain clothing similarity matching. Zuo’s main interest is in deep learning, computer vision, and distributed systems. He used to work on MicroStrategy, a high performance enterprise analytics platform, and Apache Impala, an SQL query engine for data stored in Apache Hadoop.
Daniel Weeks manages the Big Data Compute team at Netflix and is a Parquet committer. Prior to joining Netflix, Daniel focused on research in big data solutions and distributed systems.
Laurent Weichberger is in constant motion as the Big Data Bear and Sr. Technical Instructor for Datameer, Inc. Laurent has been teaching Java since 2000 and started his work in Big Data in 2012, when he worked for Hortonworks and Cloudera. He was the Director of Training at DataStax, and later became Director of Practice at Couchbase. More recently he spent the better half of 2015 working for Databricks writing and teaching about Spark, and he is now focused full time on promoting the wondrous Datameer software worldwide.
Patrick Wendell is a cofounder of Databricks as well as a founding committer and PMC member of Apache Spark. Patrick has acted as release manager for several Spark releases in addition to maintaining several subsystems of Spark’s core engine. At Databricks, Patrick directs the company’s maintenance and development of Spark.
Patrick holds an MS in computer science from UC Berkeley, where his research focused on low-latency scheduling for large-scale analytics workloads, and a BSE in computer science from Princeton University.
Ben Werther is the Founder and Executive Chairman of Platfora. Ben launched Platfora, and was the founding CEO for four years, with the goal of transforming how ‘citizen data scientists’ in every company make sense and drive action through direct and effortless use of big data. Before founding Platfora, Ben was vice president of products for DataStax, where he shaped the company’s enterprise and Hadoop strategy, and was also head of products at Greenplum through its acquisition by EMC. Ben has a B.S. in Computer Science from Monash University (Australia) and an M.S. in Computer Science from Stanford... Read More.
Alex White co-founded Next Big Sound in 2008, while in his last semester at Northwestern University. The analytics service measures daily music consumption and purchase decisions around the globe. From social to streaming to sales, Next Big Sound combines artist activity with context to help the modern music industry make decisions.
White and his co-founders have been featured in Fast Company (#1 most innovative company in the music industry, 2015), Forbes (30 under 30) in the music category three times, Billboard (10 best music companies), Bloomberg BusinessWeek (25 under 25), Entrepreneur Magazine’s 30 under 30 list in the New York... Read More.
Tom White has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He is the author of Hadoop: The Definitive Guide for O’Reilly. Previously he worked as an independent consultant specializing in Hadoop, and before that was co-founder and lead developer at Kizoom, a UK mobile applications startup. Tom has a bachelor’s degree in mathematics from the University of Cambridge, and a master’s degree in history and philosophy of science from the Universities of Leeds, UK, and Florence, Italy.
Thomas Wiecki is the lead data science researcher at Quantopian, where he uses probabilistic programming and machine learning to help build the world’s first crowdsourced hedge fund. Among other open source projects, he is involved in the development of PyMC—a probabilistic programming framework written in Python. Thomas holds a PhD from Brown University. A recognized international speaker, he has given talks at various conferences and meetups across the US, Europe, and Asia.
Edd Wilder-James is a technology analyst, writer, and entrepreneur based in California. He’s helping transform businesses with data as VP of strategy for Silicon Valley Data Science. Formerly Edd Dumbill, Edd was the founding program chair for the O’Reilly Strata conferences and chaired the Open Source Convention for six years. He was also the founding editor of the peer-reviewed journal Big Data. A startup veteran, Edd was the founder and creator of the Expectnation conference-management system and a cofounder of the Pharmalicensing.com online intellectual-property exchange. An advocate and contributor to open source software, Edd... Read More.
Cack Wilhelm is a principal at Scale Venture Partners, where she focuses on investments in early-stage software companies, with an eye toward those helping businesses better utilize data, automate workflows, incorporate AI, and build more resilient software. Looking further ahead, Cack is watching closely as platforms such as virtual reality and augmented reality take shape. Cack cut her teeth selling 11g databases at Oracle and Hadoop clusters at Cloudera in the months before Hadoop reached Version 1.0. Cack has since transferred that operational and go-to-market experience into helping Scale portfolio companies such as Treasure Data, Realm, and CircleCI. Cack was... Read More.
Josh Wills is director of data science at Cloudera, where he works with customers and engineers to develop Hadoop-based solutions across a wide range of industries. Prior to joining Cloudera, Josh was at Google where he worked on the ad auction system and then led the development of the analytics infrastructure used in Google+. He earned his bachelor’s degree in mathematics from Duke University and his master’s in operations research from the University of Texas-Austin.
Matt Winkler is a principal group program manager in the Data group at Microsoft, where he leads a program management team building services and tools for developers to build intelligent apps using cognitive APIs, the Bot Framework, and the Cortana Intelligence Suite. Matt has worked at Microsoft for the last 10 years as an evangelist and a program manager working on the .NET Framework, Visual Studio, and Azure Web Sites. As part of the Microsoft Big Data team, Matt led a PM team building HDInsight, Microsoft’s managed Hadoop and Spark service and Azure data lake analytics. Matt holds a... Read More.
Doug Wolfe was selected to serve as CIO of the CIA in 2013. In his role, he oversees the agency information technology vision and strategic direction, and is an advisor to the intelligence community CIO. Prior, he served as deputy director for acquisition, technology, and facilities at the Office of the Director of National Intelligence. Wolfe joined the CIA in 1984. He worked for 16 years as a part of the CIA component in the National Reconnaissance Office, and was involved in the launch and operations of multiple satellite systems. He partnered with the aerospace... Read More.
Jenn Wortman Vaughan is a Senior Researcher at Microsoft Research, New York City, where she studies algorithmic economics, machine learning, and social computing, with a recent focus on prediction markets and crowdsourcing. Jenn came to MSR in 2012 from UCLA, where she was an assistant professor in the computer science department. She completed her Ph.D. at the University of Pennsylvania in 2009, and subsequently spent a year as a Computing Innovation Fellow at Harvard. She is the recipient of Penn’s 2009 Rubinoff dissertation award for innovative applications of computer technology, a National Science Foundation CAREER award, and... Read More.
Ryan Wright serves as manager of data management for Kelley Blue Book. In this role, he oversees the company’s team of engineers and analysts who load and review the quality of potential data sources for Kelley Blue Book’s trusted vehicle value information. In addition, Wright monitors the performance of ongoing scheduled external data files into Kelley Blue Book’s enterprise databases.
As a member of Kelley Blue Book’s enterprise data warehouse team, Wright is charged with scaling and deploying business intelligence tool suites across Kelley Blue Book’s enterprise. Wright is directly involved with creating business intelligence solutions for Kelley Blue Book,... Read More.
Yihui Xie is an active R user and the author of several R packages, such as animation, formatR, Rd2roxygen, and knitr, among which the animation package won the 2009 John M. Chambers Statistical Software Award (ASA). He is also the author of the book Dynamic Documents with R and knitr. In 2006 he founded the “Capital of Statistics” (http://cos.name), which has grown into a large online community on statistics in China. He initiated the first Chinese R conference in 2008 and has been organizing R conferences in China since then. During his PhD training at Iowa State University,... Read More.
Reynold Xin is a cofounder and chief architect at Databricks as well as an Apache Spark PMC member and release manager for Spark’s 2.0 release. Prior to Databricks, Reynold was pursuing a PhD at the UC Berkeley AMPLab, where he worked on large-scale data processing.
Matt Yanchyshyn leads the AWS Technology Partner Solutions Architecture team at Amazon Web Services. He helps AWS partners architect secure and high-performance applications for the cloud. Matt has worked in the digital media and cloud computing industry for over a decade and has helped hundreds of customers bring compelling AWS-backed products to the market.
Fangjin Yang is a coauthor of the open source Druid project and a cofounder of Imply, a data analytics startup based in San Francisco. Previously, Fangjin held senior engineering positions at Metamarkets and Cisco Systems. Fangjin holds a BASc in electrical engineering and an MASc in computer engineering from the University of Waterloo, Canada.
Chuck Yarbrough is the senior director of solutions marketing and management at Pentaho, a leading big data analytics company that helps organizations engineer big data connections, blend data, and report and visualize all of their data. Chuck is responsible for creating and driving Pentaho solutions that leverage the Pentaho platform, enabling customers to implement big data solutions quicker and achieve greater ROI faster. Chuck has more than 20 years of experience helping organizations use technology to their advantage to ensure they can run, manage, and transform their business through better use of data. A lifelong participant in the data... Read More.
Reza Bosagh Zadeh is on the faculty at Stanford, where he teaches Distributed Algorithms and Optimization and Discrete Mathematics and Algorithms, and is the founder and CEO of Matroid. His work focuses on machine learning, distributed computing, and discrete applied mathematics. As part of his research, Reza built the machine-learning algorithms behind Twitter’s who-to-follow system, the first product to use machine learning at Twitter. Reza is the initial creator of the linear algebra package in Apache Spark and his work has been incorporated into industrial and academic cluster computing environments. Reza serves on the technical advisory board of Microsoft... Read More.
Ben Zaitlen is the technical lead of the Anaconda Cluster product at Continuum Analytics. Ben received undergraduate degrees in mathematics and physics from UC Santa Cruz and a master’s degree in physics from Indiana University. Prior to joining Continuum, he worked at the Biocomplexity Institute developing and supporting a multi-scale modeling environment for developmental biology. Ben is also passionate about electronics and has developed a number of embedded and wearable hardware projects.
Philip Zeyliger is a software engineer at Cloudera. He came to Cloudera from Google, where he worked on scalable storage for user-facing applications. Before that, he worked in finance at D.E. Shaw. Philip holds a bachelor’s degree in mathematics from Harvard University. His interests include systems and databases. He’s a committer on the Apache Avro project.
Owen Zhang is the chief product officer at DataRobot. Owen spent most of his career in the property and casualty insurance industry. Most recently Owen served as vice president of modeling of the newly formed AIG Science team.
After spending several years in IT building transactional systems for commercial insurance, Owen discovered his passion for machine learning and started building insurance underwriting, pricing, and claims models. Owen has a master’s degree in electrical engineering from the University of Toronto and a bachelor’s degree from the University of Science and Technology of China. Owen is currently ranked #1 on the... Read More.
Dr. Yan Zhang is a senior data scientist on the Algorithm & Data Science Team of the Data Group, Cloud & Enterprise, at Microsoft. She builds predictive analytics models and generalizes machine learning solutions on the Cloud machine learning platform. Her recent research includes cost prediction and fraud claim detection in the healthcare domain, predictive maintenance in IoT applications, customer segmentation, and text mining. Dr. Zhang received her Ph.D. in data mining from the Computer Science department at the University of Vermont. Before joining Microsoft, she was a research faculty member at Syracuse University. She is an author of 23 publications, including journal articles, conference papers, and blog posts. Her first... Read More.
Zhe Zhang is an Engineering Manager at LinkedIn where he leads the Core Big Data Services team. The team leverages open source technologies including Hadoop, Spark, TensorFlow, and beyond, to form the storage-compute engine of LinkedIn’s big data platform. Zhe is also a PMC member of Apache Hadoop, and author of HDFS erasure coding, a major feature for Hadoop 3.0. Before LinkedIn, Zhe worked at Cloudera and IBM’s T. J. Watson Research Center. Zhe has over 20 research publications and 5 US patents. While at IBM, he received the Research Accomplishment Award and the Outstanding Technology Achievement... Read More.
Alice Zheng manages the optimization team on Amazon’s Ad Platform. Alice specializes in research and development of machine-learning methods, tools, and applications. Outside of work, she is writing a book, Mastering Feature Engineering. Previously, Alice worked at GraphLab/Dato/Turi, where she led the machine-learning toolkits team and spearheaded user outreach, was a researcher in the Machine Learning group at Microsoft Research, Redmond, and was a postdoc at Carnegie Mellon University. Alice holds PhD and BA degrees in computer science and a BA in mathematics, all from UC Berkeley.
Siwei Zhu is a data scientist at Scribd focused on understanding how users engage with the product. Previously, he worked as a data scientist at Facebook.
Shivon Zilis is a venture capitalist and founding member of Bloomberg Beta, where she focuses on early-stage data and machine-intelligence investments. Shivon has led 12 investments since launch. One, Newsle, was acquired by LinkedIn; others include Context Relevant, Alation, and InfluxDB. She recently released a report on the current state of machine intelligence that analyzed thousands of companies and put forward predictions on where the industry is headed. Shivon’s previous experience includes building startups at Bloomberg Ventures, the firm’s incubator, and developing cloud core banking solutions for microfinance institutions at IBM. She is a C100 charter member and was... Read More.
Vice President of Tactical Engineering at Novus Partners
Monte Zweben is the CEO and co-founder of Splice Machine, provider of the Hadoop RDBMS. A SQL-on-Hadoop solution, Splice Machine has helped many companies scale real-time applications using commodity hardware without application rewrites. A technology industry veteran, Monte’s early career was spent with the NASA Ames Research Center as the deputy chief of the artificial intelligence branch, where he won the prestigious Space Act Award for his work on the Space Shuttle program. Monte then founded and was the chairman and CEO of Red Pepper Software, a leading supply chain optimization company. In 1996 it... Read More.
Margit Zwemer is the founder of data visualization company LiquidLandscape. She was formerly a data scientist at Kaggle and an algorithmic trader at Societe Generale.
Dave Zwieback has been working with large-scale mission-critical infrastructure and teams for almost two decades. Dave is the VP of engineering at Next Big Sound (acquired by Pandora Media, Inc.) and CTO of Lotus Outreach. He has previously worked with the adaptive learning startup Knewton, the quantitative investment management firm D.E. Shaw & Co., and the financial services behemoth Morgan Stanley. He also ran an infrastructure architecture consultancy for seven years. Dave is the author of Beyond Blame: Learning from Failure and Success from O’Reilly Media. He blogs at Mindweather.com.
©2015, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.