Time series data is becoming more prevalent across a widening range of industries as the volume of available data continues to grow [1]. Time series sensors are being integrated into devices and systems such as cell phones, environmental monitors, and the smart grid [4]. It has also been shown that shapes in images can be decomposed into time series data, which makes shape comparison invariant to rotation and scale and therefore easier. Meanwhile, the cost of sequencing a human genome continues to fall rapidly, shifting pressure onto the technologies that store and process these genomes, which can also be analyzed with time series techniques.
Although multi-dimensional index structures combined with today’s RDBMSs can handle time series data, these systems strain under high insertion rates and real-time query requirements as the data scales out. In response, many companies are employing HBase to handle the throughput and scale of rising data loads. Groups are also looking at techniques such as Keogh’s SAX technique [2] to search for patterns in time series data (e.g., openPDC and Hadoop). A later evolution of SAX, called iSAX, indexes time series data for low-latency queries. In this talk we introduce “Lumberyard”, a scalable index for low-latency fuzzy pattern search over time series data. Lumberyard is currently available [3] on GitHub as an Apache License 2.0 project and uses HBase for scale and iSAX for indexing and search.
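For readers new to SAX, the following is a minimal sketch of the basic transform as it is usually described: z-normalize the series, reduce it with Piecewise Aggregate Approximation (PAA), then map each segment mean to a letter using breakpoints that split a standard normal distribution into equal-probability regions. It is not taken from Lumberyard or openPDC; the function names (`znormalize`, `paa`, `breakpoints`, `sax_word`) and the parameters `segments` and `alphabet_size` are illustrative assumptions, and the sketch assumes the series length is a multiple of the segment count.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information; map it to zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment."""
    if len(series) % segments != 0:
        raise ValueError("series length must be a multiple of segments")
    width = len(series) // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def breakpoints(alphabet_size):
    """Cut points dividing a standard normal into equal-probability regions."""
    normal = NormalDist()
    return [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax_word(series, segments, alphabet_size):
    """Convert a numeric series into a SAX word over the letters a, b, c, ..."""
    cuts = breakpoints(alphabet_size)
    reduced = paa(znormalize(series), segments)
    # bisect_right returns how many cut points lie at or below the value,
    # which is the index of the region (and letter) the value falls into.
    return "".join(chr(ord("a") + bisect_right(cuts, v)) for v in reduced)


if __name__ == "__main__":
    print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], segments=4, alphabet_size=4))
```

Two series with similar shapes tend to produce the same or nearby words, which is what makes the representation usable for fuzzy pattern search. iSAX builds on this by letting each symbol carry its own cardinality so that words can be organized into an index.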
We’ll take a look at some of the indexing-at-scale issues that Lumberyard solves, and at some of the design issues involved in moving the iSAX index from a single-process, in-memory data structure to an HBase-persisted data structure. Given that Lumberyard is experimental, we’ll also look at the current performance numbers and where the code stands today. This talk should be approachable both for the novice, who will get ideas about the variety of places that hold time series data around them, and for the advanced algorithm enthusiast who enjoys a design talk.
Master’s Thesis: self-organizing mesh networks
Published in IAAI-09: TinyTermite: A Secure Routing Algorithm
Conceived, built, and led Hadoop integration for the openPDC project at TVA (smart grid). Led a small team that designed classification techniques for time series and MapReduce. Open source work at http://openpdc.codeplex.com
Now: Solutions Architect at Cloudera