IoT devices, from cameras to advanced sensors, now emit tremendous amounts of data that stream into analytics environments. Healthcare providers and payers are collecting vast amounts of medical images and clinical, claims, and operational data. These are common big data cases in which both structured data (e.g., processed claims) and unstructured data (e.g., large stored images) accumulate. However, the massiveness and messiness of today's data environments make traditional ETL processes difficult.
David Huh details the end-to-end construction of a data pipeline that outputs machine learning models and meets the stringent efficiency demands of today’s big data context. He covers a number of big data and machine learning cases, including one where a team first spent two weeks building a data pipeline and machine learning model using Python and then built the same pipeline and model in two hours using Pentaho. David shares strategies for architecting a data pipeline in Pentaho Data Integration (PDI) to refine raw data for analysis. The pipeline employs a reusable process that extracts metadata from images and then passes that dynamic data into a pipeline through metadata injection, a process similar in concept to creating a template that can receive dynamic parameters in the ETL processes. Using the same PDI environment as an example, David explores cases of plug-in machine intelligence that extends machine learning capabilities, allowing seamless training, orchestration, and output of machine learning models, including one that predicts complications from surgery.
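The template idea behind metadata injection can be illustrated outside of PDI with a minimal Python sketch. This is not PDI's API or the pipeline presented in the session. The names `FieldSpec` and `build_pipeline`, the field names, and the sample records are all hypothetical. The sketch only shows the concept: one reusable pipeline definition whose field mappings are supplied at run time as metadata.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


# Hypothetical illustration of the "template plus injected metadata" idea.
# None of these names come from Pentaho Data Integration.
@dataclass(frozen=True)
class FieldSpec:
    source: str                    # field name in the raw record
    target: str                    # field name in the refined output
    cast: Callable[[str], object]  # conversion applied to the raw value


def build_pipeline(
    specs: List[FieldSpec],
) -> Callable[[Iterable[Dict[str, str]]], List[Dict[str, object]]]:
    """Return a concrete pipeline from the generic template.

    The template logic (select, rename, cast) is written once; the
    field-level details arrive as metadata in `specs`.
    """
    def run(rows: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
        return [{s.target: s.cast(row[s.source]) for s in specs} for row in rows]

    return run


# Metadata that might be extracted from image headers (assumed field names).
image_specs = [
    FieldSpec("Width", "width_px", int),
    FieldSpec("Height", "height_px", int),
    FieldSpec("Modality", "modality", str.upper),
]

# Metadata for a structured claims feed (assumed field names).
claim_specs = [
    FieldSpec("claim_id", "claim_id", str),
    FieldSpec("amount", "amount_usd", float),
]

# The same template yields two different pipelines.
refine_images = build_pipeline(image_specs)
refine_claims = build_pipeline(claim_specs)

image_rows = refine_images(
    [{"Width": "512", "Height": "256", "Modality": "ct", "Vendor": "acme"}]
)
claim_rows = refine_claims([{"claim_id": "A17", "amount": "120.50"}])
```

Here the unlisted `Vendor` field is dropped because the injected metadata does not mention it. Supporting a new source means supplying a new list of specs rather than writing a new pipeline.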
This session is sponsored by Hitachi Vantara.
Dave Huh is a data scientist in the Professional Services Group at Hitachi Vantara, where he works with healthcare and insurance companies to provide insights with advanced analytics. Dave is passionate about making analytics technologies accessible to the broader public.