There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the data and infrastructure of the Googles and Facebooks of the world—petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of companies operate or rent clusters in the range of a few dozen nodes and analyze relatively small data sets of up to a few terabytes. Targeting these users fundamentally changes the way we should build analytics systems.
In this talk, Tim will present Tupleware, a new system developed at Brown University specifically aimed at the challenges faced by the typical user. Tupleware differs from other frameworks mainly in that it automatically compiles analytical workflows into highly efficient distributed programs instead of interpreting the workflows at run-time. Our initial experiments show that Tupleware is 30x to 300x faster than Spark and up to 6000x faster than Hadoop for common machine learning algorithms. Furthermore, Tupleware supports a wide range of programming languages (e.g., Python or Julia) without imposing any performance penalties and will soon be available as open source (http://tupleware.cs.brown.edu/).
Tim Kraska is an Assistant Professor in the Computer Science department at Brown University. Currently, his research focuses on Big Data management for machine learning and hybrid human/machine database systems. Before joining Brown, he spent two years as a postdoc in the AMPLab at UC Berkeley after receiving his PhD from ETH Zurich, where he worked on transaction management and stream processing. He was awarded a Swiss National Science Foundation Prospective Researcher Fellowship (2010), a DAAD Scholarship (2006), a University of Sydney Master of Information Technology Scholarship for outstanding achievement (2005), the University of Sydney Siemens Prize (2005), a VLDB best demo award (2011) and an ICDE best paper award (2013).