Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Applications of Mixed Effect Random Forests

Sourav Dey (Manifold)
11:00am-11:40am Thursday, March 28, 2019

Who is this presentation for?

Practicing data scientists



Prerequisite knowledge

Audience members should have beginner-intermediate knowledge of data science and machine learning vocabulary and concepts.

What you'll learn

Attendees will learn:

* Use cases of mixed effects random forests
* Why MERF is more effective for clustered data than vanilla random forest
* The history of mixed effect modeling
* How to use our open source Python package


Clustered data is all around us. The most common example we see is longitudinal clustering, where each individual instance of a phenomenon you wish to model has multiple associated measurements. For example, say we want to model math test scores as a function of sleep factors, but we have multiple measurements per student. Another common example is clustering due to a categorical variable, such as clusters representing the specific math teacher of a group of students. Clustering can also be hierarchical: a student cluster is contained within a teacher cluster, which is in turn contained within a school cluster. When modeling clustered data, we want to account for any idiosyncrasies and non-negligible random effects by cluster.
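To make the student example concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the function names, the effect sizes, and the noise levels are all hypothetical choices for illustration. It simulates several test scores per student, fits a single pooled line that ignores the student clusters, and then splits the leftover error into a between-student part and a within-student part.

```python
import random
import statistics


def simulate_scores(n_students=50, n_tests=8, seed=0):
    """Simulate several math scores per student (longitudinal clusters)."""
    rng = random.Random(seed)
    rows = []
    for student in range(n_students):
        # Hypothetical student-specific offset: the cluster's random effect.
        student_effect = rng.gauss(0.0, 5.0)
        for _ in range(n_tests):
            sleep_hours = rng.uniform(5.0, 9.0)
            noise = rng.gauss(0.0, 2.0)
            # Assumed fixed effect: 5 points per hour of sleep.
            score = 40.0 + 5.0 * sleep_hours + student_effect + noise
            rows.append((student, sleep_hours, score))
    return rows


def pooled_fit(rows):
    """Least-squares line of score on sleep that ignores the clusters."""
    xs = [sleep for _, sleep, _ in rows]
    ys = [score for _, _, score in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_variance_split(rows):
    """Split pooled-model residual variance into between/within-student parts."""
    slope, intercept = pooled_fit(rows)
    residuals = {}
    for student, sleep, score in rows:
        predicted = intercept + slope * sleep
        residuals.setdefault(student, []).append(score - predicted)
    student_means = [statistics.fmean(r) for r in residuals.values()]
    between = statistics.pvariance(student_means)
    within = statistics.fmean([statistics.pvariance(r) for r in residuals.values()])
    return between, within


rows = simulate_scores()
between, within = residual_variance_split(rows)
print(f"between-student residual variance: {between:.1f}")
print(f"within-student residual variance:  {within:.1f}")
```

Under these assumed settings the between-student variance should dominate, which means the pooled model leaves a systematic per-student error behind. That leftover structure is what a mixed effect model aims to capture as a random effect per cluster.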

The best way to attack this kind of data? Mixed effect models. Inspired by the models we have been building for clients, we at Manifold have developed an open source Python package: Mixed Effects Random Forests (MERF).

In this talk, Sourav will explain how the MERF model marries the world of classical mixed effect modeling with modern machine learning algorithms, and how it can be extended to be used with other advanced modeling techniques like gradient boosting machines and deep learning. He will also walk through example use cases, and demonstrate MERF performance on synthetic and real data.
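For readers who want the idea in symbols, the following is a sketch in the notation commonly used in the mixed effects literature. The symbols are assumptions for illustration and do not come from this abstract. For cluster $i$, $y_i$ is the vector of responses, $X_i$ the fixed effect covariates, $Z_i$ the random effect covariates, and $b_i$ the cluster's random effect coefficients.

```latex
\begin{align}
  % Classical linear mixed effect model: the fixed part is linear in X_i.
  y_i &= X_i \beta + Z_i b_i + \varepsilon_i \\
  % Mixed effect random forest: the linear fixed part is replaced by a
  % general function f, learned by a random forest.
  y_i &= f(X_i) + Z_i b_i + \varepsilon_i,
  \qquad b_i \sim \mathcal{N}(0, D),
  \qquad \varepsilon_i \sim \mathcal{N}(0, R_i)
\end{align}
```

Read this way, the only change from the classical model is the fixed effect term, which is why $f$ could in principle be any other learner, such as a gradient boosting machine or a neural network.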

Sourav Dey


Sourav Dey is CTO at Manifold, an artificial intelligence engineering services firm with offices in Boston and Silicon Valley. Prior to Manifold, Sourav led teams to build data products across the technology stack, from smart thermostats and security cams (Google/Nest) to power grid forecasting (AutoGrid) to wireless communication chips (Qualcomm). He holds patents for his work, has been published in several IEEE journals, and has won numerous awards. He earned his PhD, MS, and BS degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology (MIT).
