Weakly Supervised Machine Learning at Industrial Scale

When: Thursday, June 20, 2019, 7:00 pm
Room: 32-G449 (Kiva)
Lecturer(s): Stephen Bach, Brown University

Labeling training data is one of the most costly bottlenecks in developing machine learning-based applications. Weak supervision, which uses sources of supervision that are cheaper but noisier than hand-labeled data, has the potential to relax this bottleneck, but it introduces new challenges in managing those sources. In this talk, I'll describe a new system, Snorkel DryBell, in production at Google for weakly supervised machine learning at industrial scale. Snorkel DryBell builds on the Snorkel framework (snorkel.stanford.edu) and extends it in three critical ways: flexible, template-based ingestion of diverse organizational knowledge; cross-feature production serving; and scalable, sampling-free execution. On three classification tasks at Google, we find that Snorkel DryBell creates classifiers of comparable quality to ones trained on tens of thousands of hand-labeled examples, converts non-servable organizational resources into servable models for an average 52% performance improvement, and scales to millions of training examples.

Stephen Bach is an assistant professor in the computer science department at Brown University. Previously, he was a visiting scholar at Google and a postdoctoral scholar in the computer science department at Stanford University, advised by Christopher Ré. He received his Ph.D. in computer science from the University of Maryland, where he was advised by Lise Getoor. His research focuses on statistical machine learning methods that exploit high-level knowledge such as rules and programs. Stephen's thesis on probabilistic soft logic was recognized with the University of Maryland's Larry S. Davis Doctoral Dissertation Award. His work on the Snorkel project for weakly supervised machine learning was recognized with a Best of VLDB 2018 selection.

This joint meeting of the Boston Chapter of the IEEE Computer Society and GBC/ACM will be held in MIT Room 32-G449 (the Kiva conference room on the 4th floor of the Stata Center, building 32 on MIT maps). You can find it on the MIT campus map.

This talk will be webcast on the MIT CSAIL YouTube channel http://www.youtube.com/channel/UCYs2iUgksAhgoidZwEAimmg/live beginning at 7:00 pm.