Thursday, June 20 • 11:00am - 11:25am
FEATURED SPEAKER: Re-Training ML Models without Data Auditing Is like Skydiving without Parachutes

Re-training deployed ML models is required to adapt to changing user patterns. But the real-world data pipelines that feed re-training are messy, with problems ranging from unexpected source changes and null values during ingestion to referential integrity violations across databases. The end result is a mix of low- and high-quality data that is detrimental to model accuracy, leading to unexpected bias as well as significant deviation errors. Additionally, debugging such issues becomes a ping-pong blame game between the ML and Data Engineering teams. This talk describes how we address the issue with our framework, called ML Contracts Escrow. A contract defines how model training needs to be handled depending on the quality of the data. The framework has two building blocks: a) automatically profiling the data pipeline, tracking operational properties, data quality, and configuration changes across all components of the pipeline, including the data sources; b) enforcing the contract, which schedules, alerts on, or defers the model training. The Data Engineering team is responsible for profiling the pipeline. The ML team focuses on the other half of the contract, which specifies how different data profiling scenarios need to be handled. The framework supports a range of contract actions, such as responding to an anomaly in job profiling by triggering a data circuit breaker, canceling re-training, or changing online training to an offline, manually triggered policy. The talk covers details of the framework in the context of handling real-world data pipeline issues.
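
The split of responsibilities described above can be illustrated with a minimal sketch. This is not the ML Contracts Escrow implementation, whose API is not given in this abstract. Every name here (`DataProfile`, `Contract`, `Action`, the field names, and the threshold values) is hypothetical and chosen only to show the idea: one side produces a data profile, the other side declares how each profiling scenario maps to scheduling, alerting on, or deferring re-training.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The three enforcement outcomes named in the abstract."""
    SCHEDULE = "schedule"  # data looks healthy: run re-training
    ALERT = "alert"        # borderline data: notify the owning teams
    DEFER = "defer"        # data too poor: postpone re-training


@dataclass(frozen=True)
class DataProfile:
    """Hypothetical profile, produced by the data pipeline side."""
    null_fraction: float          # share of null values seen at ingestion
    orphaned_key_fraction: float  # share of rows failing referential integrity
    schema_changed: bool          # an upstream source changed unexpectedly


@dataclass(frozen=True)
class Contract:
    """Hypothetical contract, written by the ML side (thresholds are assumptions)."""
    alert_null_fraction: float
    defer_null_fraction: float
    max_orphaned_key_fraction: float

    def decide(self, profile: DataProfile) -> Action:
        # Hard failures first: these defer re-training outright.
        if profile.schema_changed:
            return Action.DEFER
        if profile.orphaned_key_fraction > self.max_orphaned_key_fraction:
            return Action.DEFER
        if profile.null_fraction >= self.defer_null_fraction:
            return Action.DEFER
        # Soft degradation: flag it instead of acting automatically.
        if profile.null_fraction >= self.alert_null_fraction:
            return Action.ALERT
        return Action.SCHEDULE


contract = Contract(
    alert_null_fraction=0.02,
    defer_null_fraction=0.10,
    max_orphaned_key_fraction=0.01,
)
clean = DataProfile(null_fraction=0.001, orphaned_key_fraction=0.0, schema_changed=False)
print(contract.decide(clean).value)
```

Keeping the thresholds in the contract rather than in the profiler mirrors the division of labor in the abstract: the profile only reports what was observed, and the decision about what that means for training stays with the ML team.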

Sandeep Uttamchandani

Chief Data Architect, Intuit
Sandeep Uttamchandani is the Chief Data Architect & Head of Data Platform Engineering at Intuit. In his role, he owns all the aspects related to the Data for Analytics, ML, and the Product databases used by 4 million+ small businesses for financial accounting, payroll, and billions...

Thursday June 20, 2019 11:00am - 11:25am EDT
Main Stage