Time series analysis has become a central requirement for data science across many disciplines, including the IoT, finance and econometrics, advertising, public policy, and systems operations. Yet support for understanding and semantically manipulating time-ordered events is notably missing from many databases and analytics tools.
Two Sigma has extended the pandas and PySpark analytics stack to provide integrated support for transformations and analytics that understand time as a first-class construct. Leif Walsh offers an overview of Flint, Two Sigma’s open source time series extension to Spark, explains how it fits in with the Spark programming model, and lays out the roadmap for the future of pandas, PySpark, and Flint.
Leif Walsh is an engineering manager at Two Sigma, where he works on the company’s next-generation data analysis platform for distributed time series research and simulation. Leif’s background is in high-performance storage. Previously, he built fractal trees at Tokutek. He loves the Oxford comma, cooking, and playing with cats.