Presented By O'Reilly and Cloudera
Make Data Work
5–7 May, 2015 • London, UK

SPARKTA: A real-time analytics platform based on Apache Spark

Oscar Méndez (Stratio), David Morales (Stratio)
13:45–14:25 Thursday, 7/05/2015
Hadoop & Beyond
Location: Buckingham Room - Palace Suite
Average rating: 3.09 (11 ratings)

Prerequisite Knowledge

Technical background in big data technologies, especially Apache Spark

Description

Nowadays, all kinds of businesses need to deal with real-time information in order to successfully deliver their core services. From social networks to security management, from real users to virtual processes, from high-level dashboards to IT monitoring… more and more sectors, consumers, and business processes require quick answers updated in real time.

Some initiatives have tried to solve this problem, but until now most of them were complex or obsolete while others were not open source. For that reason Stratio created SPARKTA: an open source and full-featured platform for real-time analytics, based on Apache Spark.

With absolutely no coding, you can simultaneously deploy several user-defined aggregation workflows and decide which rollups and dimensions are applied to the event stream in real time. Each workflow has its own aggregation policy, in which you select the inputs (Kafka, Flume, Twitter, etc.), the outputs (MongoDB, Cassandra, etc.), the event parser functions (decoding, enrichment, normalization), and the aggregation functions (time-based, geo-range, hierarchical counting, sum, max, min, count, sumsquares, etc.) that SPARKTA will execute.
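
To make the idea of rollups and dimensions concrete, here is a minimal sketch in plain Python. It is not SPARKTA's policy format or API: the POLICY layout, the field names, and the sample events are all assumptions made for illustration. It only shows how one event can feed several rollups, each keyed by a time bucket and a combination of dimension values.

```python
# Hypothetical policy: layout and names are illustrative, not SPARKTA's schema.
POLICY = {
    "rollups": [("hashtag",), ("hashtag", "country")],  # dimension combinations
    "granularity_seconds": 60,                          # time bucket size
    "field": "retweets",                                # numeric field to aggregate
}


def bucket_start(timestamp, granularity):
    """Truncate a Unix timestamp to the start of its time bucket."""
    return timestamp - (timestamp % granularity)


def aggregate(events, policy):
    """Return {(rollup, bucket, dimension values): stats} for a batch of events."""
    results = {}
    for event in events:
        bucket = bucket_start(event["timestamp"], policy["granularity_seconds"])
        value = event[policy["field"]]
        # Each event updates one entry per configured rollup.
        for rollup in policy["rollups"]:
            key = (rollup, bucket, tuple(event[dim] for dim in rollup))
            stats = results.get(key)
            if stats is None:
                results[key] = {"count": 1, "sum": value, "min": value, "max": value}
            else:
                stats["count"] += 1
                stats["sum"] += value
                stats["min"] = min(stats["min"], value)
                stats["max"] = max(stats["max"], value)
    return results


# Made-up sample events; the first two fall into the same one-minute bucket.
events = [
    {"timestamp": 1430998805, "hashtag": "spark", "country": "UK", "retweets": 3},
    {"timestamp": 1430998830, "hashtag": "spark", "country": "ES", "retweets": 5},
    {"timestamp": 1430998870, "hashtag": "spark", "country": "UK", "retweets": 1},
]
results = aggregate(events, POLICY)
```

In a streaming engine the same update would run continuously over micro-batches rather than over a fixed list, but the keying by (rollup, time bucket, dimension values) is the part the sketch is meant to show.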

Moreover, the query services layer lets you access the data easily, e.g., through time-range queries with automatic selection of the best rollup, or through ad hoc aggregation over a subset of the data.
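
The abstract does not say how the best rollup is chosen. One plausible heuristic, sketched below under that assumption, is to pick the coarsest precomputed time granularity whose bucket boundaries line up with the requested range, so that the query reads as few buckets as possible. The GRANULARITIES list and the function names are hypothetical.

```python
# Precomputed bucket sizes in seconds (day, hour, minute); an assumption for this sketch.
GRANULARITIES = [86400, 3600, 60]


def best_granularity(start, end, granularities=GRANULARITIES):
    """Pick the coarsest bucket size whose boundaries align with [start, end)."""
    if end <= start:
        raise ValueError("end must be after start")
    for granularity in sorted(granularities, reverse=True):
        if start % granularity == 0 and end % granularity == 0:
            return granularity
    # No precomputed rollup fits; a caller would have to fall back to
    # ad hoc aggregation over finer-grained data.
    return None


def bucket_starts(start, end, granularity):
    """List the bucket start times a query over [start, end) would read."""
    return list(range(start, end, granularity))
```

For example, a range covering two whole days resolves to the daily granularity and reads two buckets, while a range that starts on the half minute matches none of the assumed granularities and returns None.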

SPARKTA was also designed to be highly configurable and extensible, and since it is pure Spark, it also benefits from the entire Spark ecosystem.

Thanks to this technology, real-time analysis is readily available for every use case: SPARKTA is easy to deploy, but also fast, scalable, and fault-tolerant.

Oscar Méndez

Stratio

Oscar Méndez is co-founder and CEO of Paradigma Tecnológico and Stratio. Paradigma is a software solutions company whose clients are mostly enterprises and large Internet companies in Spain. Stratio uses best-of-breed big data technologies to serve clients worldwide.

David Morales

Stratio

Working as a big data architect at Stratio, David Morales has been involved in the inception and evolution of some modules included in the Stratio platform, especially those related to data visualization, real-time, streaming, and complex event processing.

Comments

David Morales
11/05/2015 14:49 BST

Here you have:

1) source code

https://github.com/Stratio/sparkta

2) documentation

http://docs.stratio.com/modules/sparkta/development/

3) Slides

http://www.slideshare.net/Stratio/strata-sparkta

Enjoy!

David Morales
4/05/2015 20:09 BST

We are working to release the project asap, maybe tomorrow.

The solution will be fully open sourced at our github account (github.com/stratio), so stay tuned.

Thanks

Yann Barraud
4/05/2015 14:35 BST

Hi,

Any pointers to the product, so that we can check it out before attending?

Cheers,
Yann