Containers offer significant value to businesses, including increased developer agility and the ability to move applications between on-premises servers and cloud instances and across data centers. Organizations have embarked on the journey to containerization with an emphasis on stateless workloads. Stateless applications are usually microservices or containerized applications that don’t “store” data. Web services, such as frontend UIs and simple, content-centric experiences, are often good candidates for stateless deployment, since HTTP is stateless by nature. A stateless workload has no dependency on local container storage. Stateful applications, on the other hand, are services that require backing storage, and keeping state is critical to running the service. Hadoop, Spark, and, to a lesser extent, databases such as Cassandra, MongoDB, Postgres, and MySQL are good examples. They require some form of persistent storage that will survive service restarts.
Anant Chintamaneni and Nanda Vijaydev highlight the key gaps and considerations based on a real-world implementation of big data cluster orchestration on Kubernetes. Several attributes of stateful, multiservice big data applications need to be considered. Hadoop and Spark are not exactly monolithic applications, but they come close: each consists of multiple cooperating services with dynamic APIs. Service startup and teardown ordering requirements, with different sets of services running on different hosts (nodes), result in tricky service interdependencies that impact scalability. There is also a large amount of configuration (that is, state), such as host names, IP addresses, ports, and service-specific settings, that needs to be maintained to run fault-tolerant clusters. Anant and Nanda detail the technical configurations and customizations required to run Hadoop distributions on Kubernetes and explore the gaps when comparing Hadoop on Kubernetes to the standard deployment of Hadoop on physical servers or virtual machines.
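To make the startup and teardown ordering concern concrete, here is a minimal Python sketch, not the speakers' implementation. It assumes a hypothetical dependency map between four Hadoop-style services (the edges in DEPENDS_ON are illustrative only) and uses the standard library's graphlib.TopologicalSorter (Python 3.9+) to derive an order in which each service starts only after the services it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map (an assumption for illustration, not taken from
# the session): each service maps to the set of services that must already be
# running before it can start.
DEPENDS_ON = {
    "namenode": set(),
    "datanode": {"namenode"},
    "resourcemanager": {"namenode"},
    "nodemanager": {"resourcemanager", "datanode"},
}


def startup_order(depends_on):
    """Return services ordered so every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def teardown_order(depends_on):
    """Return the reverse of the startup order, so dependents stop first."""
    return list(reversed(startup_order(depends_on)))


if __name__ == "__main__":
    print("start:", startup_order(DEPENDS_ON))
    print("stop: ", teardown_order(DEPENDS_ON))
```

The order among services with no mutual dependency (here, datanode and resourcemanager) is not fixed, which is one reason an orchestrator has to treat ordering as a constraint rather than a fixed script.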
Topics include:
Anant Chintamaneni is vice president of products at BlueData, where he is responsible for product management and focuses on helping enterprises deploy big data technologies such as Hadoop and Spark. Anant has more than 15 years’ experience in business intelligence, advanced analytics, and big data infrastructure. Previously, Anant led the product management team for Pivotal’s big data suite.
Nanda Vijaydev is the lead data scientist and head of solutions at BlueData (now HPE), where she leverages technologies like TensorFlow, H2O, and Spark to build solutions for enterprise machine learning and deep learning use cases. Nanda has more than 10 years of experience in data science and data management. Previously, she worked on data science projects in multiple industries as a principal solutions architect at Silicon Valley Data Science and served as director of solutions engineering at Karmasphere.
©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com