Clustered data is all around us. The most common example is longitudinal clustering, where each individual instance of a phenomenon you wish to model has multiple associated measurements (e.g., modeling math test scores as a function of sleep factors when you have multiple measurements per student). Another common example is clustering due to a categorical variable (e.g., clusters representing the specific math teacher of a group of students). Clustering can also be hierarchical (e.g., a student cluster contained within a teacher cluster, which is itself contained within a school cluster). When modeling clustered data, you must account for cluster-level idiosyncrasies and any non-negligible random effects.
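To make the longitudinal example concrete, here is a minimal sketch in plain Python. It is an illustration under assumed numbers, not the MERF algorithm: the student counts, effect sizes, and the crude "subtract each student's mean residual" adjustment are all hypothetical choices made for this example. It simulates test scores that depend on sleep plus a per-student offset, fits a pooled line that ignores clustering, and then shows how much residual variance is left once a per-student offset is accounted for.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical setup: 30 students (clusters), 8 test scores each.
N_STUDENTS, N_TESTS = 30, 8
INTERCEPT, SLOPE = 60.0, 3.0      # assumed population-level (fixed) effects
SD_STUDENT, SD_NOISE = 8.0, 2.0   # assumed between-student and within-student spread

rows = []  # (student_id, hours_of_sleep, score)
for student in range(N_STUDENTS):
    offset = random.gauss(0.0, SD_STUDENT)  # random intercept for this student
    for _ in range(N_TESTS):
        sleep = random.uniform(5.0, 9.0)
        score = INTERCEPT + SLOPE * sleep + offset + random.gauss(0.0, SD_NOISE)
        rows.append((student, sleep, score))


def ols(xs, ys):
    """Return (intercept, slope) of a one-variable least-squares fit."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# 1) Pooled fit that ignores clustering entirely.
a, b = ols([r[1] for r in rows], [r[2] for r in rows])
pooled_resid = [(s, y - (a + b * x)) for s, x, y in rows]

# 2) Crude cluster adjustment: subtract each student's mean residual.
by_student = {}
for s, res in pooled_resid:
    by_student.setdefault(s, []).append(res)
student_offset = {s: mean(v) for s, v in by_student.items()}
adjusted_resid = [res - student_offset[s] for s, res in pooled_resid]

pooled_var = pvariance([res for _, res in pooled_resid])
adjusted_var = pvariance(adjusted_resid)
print(f"residual variance, pooled fit:      {pooled_var:.1f}")
print(f"residual variance, per-student fix: {adjusted_var:.1f}")
```

The gap between the two printed variances is the part of the error that belongs to the student rather than to the individual measurement. A mixed effects model estimates that cluster-level component jointly with the fixed effects instead of patching it in afterwards as this sketch does.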
The best way to attack this kind of data? Mixed effects models. Inspired by the models we have been building for clients, Manifold has developed an open source Python package that implements mixed effects random forests (MERF).
Sourav Dey explains how the MERF model marries the world of classical mixed effects modeling with modern machine learning algorithms and shows how it can be extended to work with other advanced modeling techniques like gradient boosting machines and deep learning. He also walks you through example use cases and demonstrates MERF performance on synthetic and real data.
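For readers who want the idea in symbols, a mixed effects random forest is commonly written in the following form. The notation is an assumption drawn from the standard mixed effects literature, not from the talk itself.

```latex
y_i = f(X_i) + Z_i b_i + \epsilon_i,
\qquad b_i \sim \mathcal{N}(0, D),
\qquad \epsilon_i \sim \mathcal{N}(0, R_i)
```

Here y_i is the vector of responses for cluster i, X_i holds the fixed effects covariates, Z_i holds the random effects covariates, b_i is that cluster's random effects vector, and \epsilon_i is the measurement noise. In a classical linear mixed effects model, f(X_i) is the linear term X_i \beta. In MERF, f is a random forest, and replacing f with another learner is what the extension to gradient boosting machines or neural networks amounts to.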
Sourav Dey is CTO at Manifold, an artificial intelligence engineering services firm with offices in Boston and Silicon Valley. Sourav leads the engineering team, focusing on work across client projects, developing platform technologies to make Manifold ML engineers more efficient, and communicating with business stakeholders. Prior to Manifold, Sourav led teams building data products across the technology stack, from smart thermostats and security cams at Google-Nest to wireless communication at Qualcomm. Sourav's career has always been at the intersection of math and computer science: he holds a PhD in signal processing from MIT and bachelor's degrees in math and CS, also from MIT.
©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com
Comments
Slides can be downloaded here: https://www.manifold.ai/2019strataSF
Hi,
Thanks for the great talk. Can you please share your slides?