R leads the list of most popular data science languages, with a 49% share of the vote according to a recent survey. However, the language is by nature limited in scalability and parallelism, which restricts its wide deployment in enterprise-grade applications. Contemporary big data solutions are migrating from on-premises to the cloud, owing to the benefits of flexible scale-up and scale-out of resources, computational efficiency, and cost-effectiveness. To better leverage the advantages of cloud computing and smooth the move to the cloud, the community needs R packages and associated paradigms that allow data scientists and data engineers who use R to operationalize enterprise-grade pipelines for analytical solution development.
Le Zhang and Graham Williams demonstrate how to use R to architect enterprise-grade data analytics solutions and develop artificial intelligence applications on the Azure cloud. Le and Graham explore a real-world flight delay prediction scenario to illustrate how R is used to elastically deploy, manage, and deallocate a heterogeneous set of cloud resources, such as virtual machines, Spark clusters, and storage accounts, and to distribute on-demand, parallel, and scalable data analytics with cutting-edge machine learning technologies in the cloud. The R packages introduced greatly simplify the management and use of cloud resources for various big data tasks and therefore accelerate prototyping, experimenting with, and productizing data-driven solutions for enterprise use.
Le Zhang is a data scientist with Microsoft Artificial Intelligence and Research, where he applies cutting-edge machine learning and artificial intelligence technology to accelerate digital transformation for enterprises and startups. He has helped numerous corporations develop and build scalable, enterprise-grade advanced data analytics systems for a broad spectrum of application scenarios, such as manufacturing, predictive maintenance, financial services, ecommerce, and human resource analytics. Le specializes in cloud computing, big data technologies (Hadoop, Spark, Hive, etc.), and artificial intelligence development tools (CNTK, Keras, etc.) and is proficient in R and Python. Previously, Le worked at a semiconductor company developing an intelligent wafer defect recognition system using machine learning technology. Le enjoys sharing knowledge and learning from people and is a frequent speaker at industry and academic conferences and community meetups. He holds a PhD in computer engineering.
Graham Williams is director of data science at Microsoft, where he is responsible for the Asia-Pacific region. He is also an adjunct professor with the University of Canberra and the Australian National University and an international visiting professor with the Chinese Academy of Sciences. Graham has 30 years’ experience as a data scientist leading research and deployments in artificial intelligence, machine learning, data mining, and analytics. Previously, he was principal data scientist with the Australian Taxation Office and lead data scientist with the Australian Government’s Centre of Excellence in Analytics, where he assisted numerous government departments and Australian industry in creating and building data science capabilities. He has also worked on many projects focused on delivering data-driven solutions and applications using machine learning and artificial intelligence technologies. Graham has authored a number of books introducing data mining and machine learning using the R statistical software.
©2017, O'Reilly Media, Inc.