
Behavioral Testing of ML Models (Unit tests for machine learning)



Jay Alammar

How can we bring powerful software engineering techniques like unit testing to machine learning models?

Evaluating ML models with a single metric (like accuracy or F1-score) produces a low-resolution picture of model performance. Behavioral tests give a much higher-resolution view of a model's capabilities. By creating tests (small, targeted test sets, each probing one capability), we can compare models more precisely, or observe how performance on each capability changes after retraining or fine-tuning a model. We discuss the paper ‘Beyond Accuracy: Behavioral Testing of NLP Models with CheckList’, which was selected as the ACL 2020 Best Paper.
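To make the "low-resolution vs. higher-resolution" point concrete, here is a minimal Python sketch. The two toy models, the test sentences, and the helper names (`keyword_model`, `always_negative_model`, `accuracy`, `failure_rates`) are hypothetical illustrations, not taken from the paper or the CheckList library. The sketch pools every test case into one accuracy number, then scores the same cases as small per-capability test sets.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]
Suite = Dict[str, List[Tuple[str, str]]]

def keyword_model(text: str) -> str:
    """Hypothetical toy model: positive if any positive keyword appears; ignores negation."""
    return "positive" if any(w in text.lower() for w in ("good", "great", "love")) else "negative"

def always_negative_model(text: str) -> str:
    """Hypothetical toy model: predicts "negative" for every input."""
    return "negative"

# Small, targeted test sets, one per capability: (input, expected label).
SUITE: Suite = {
    "vocabulary": [
        ("The food was great.", "positive"),
        ("The service was awful.", "negative"),
        ("I love it.", "positive"),
    ],
    "negation": [
        ("The food was not good.", "negative"),
        ("I do not love this place.", "negative"),
    ],
}

def accuracy(model: Model, suite: Suite) -> float:
    """Single pooled metric over all test cases: the low-resolution view."""
    pooled = [case for group in suite.values() for case in group]
    return sum(model(text) == label for text, label in pooled) / len(pooled)

def failure_rates(model: Model, suite: Suite) -> Dict[str, float]:
    """Failure rate per capability: the higher-resolution view."""
    return {
        capability: sum(model(text) != label for text, label in cases) / len(cases)
        for capability, cases in suite.items()
    }

for name, model in [("keyword", keyword_model), ("always-negative", always_negative_model)]:
    print(name, accuracy(model, SUITE), failure_rates(model, SUITE))
# keyword 0.6 {'vocabulary': 0.0, 'negation': 1.0}
# always-negative 0.6 {'vocabulary': 0.6666666666666666, 'negation': 0.0}
```

Both toy models score 0.6 overall, yet one fails every negation case while the other fails two of the three vocabulary cases; only the per-capability breakdown shows this. Rerunning the same suite after retraining or fine-tuning gives a before/after comparison for each capability rather than a single number.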

Introduction (0:00)
Comparing models using capabilities (0:33)
Behavioral test of NLP models (3:06)
Test Type 1: Minimum Functionality Tests (4:22)
Test Type 2: Invariance Tests (7:04)
Test Type 3: Directional Expectation Tests (7:32)
Summary and Conclusion (10:00)
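The three test types listed above can be sketched in plain Python against a deliberately weak toy sentiment model. Everything here is an assumption for illustration: the lexicon "model", the templates, the perturbations, and the function names (`mft`, `inv`, `dir_not_more_positive`) are hypothetical and are not the API of the CheckList code. The MFT checks templated inputs against a fixed expected label; the INV test applies a perturbation that should not change the prediction (here, swapping a name); the DIR test applies a perturbation with a known expected direction (here, appending a negative sentence should not raise the positive score).

```python
import math
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical toy sentiment "model": a tiny lexicon with no notion of negation.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}

def positive_score(text: str) -> float:
    """Probability-like score that `text` is positive (sigmoid of a word count)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    raw = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 / (1 + math.exp(-raw))

def predict(text: str) -> str:
    return "positive" if positive_score(text) > 0.5 else "negative"

# Test type 1, Minimum Functionality Test (MFT): fill a template, every filled case has a known label.
def mft(template: str, fills: Dict[str, List[str]], expected: str) -> List[str]:
    keys = list(fills)
    cases = [template.format(**dict(zip(keys, combo))) for combo in product(*fills.values())]
    return [case for case in cases if predict(case) != expected]  # failing inputs

# Test type 2, Invariance test (INV): a label-preserving perturbation must not flip the prediction.
def inv(texts: List[str], perturb: Callable[[str], str]) -> List[Tuple[str, str]]:
    return [(t, perturb(t)) for t in texts if predict(t) != predict(perturb(t))]

# Test type 3, Directional Expectation test (DIR): the score may only move in the expected direction.
# Here, appending a negative phrase must not make the text look more positive.
def dir_not_more_positive(texts: List[str], perturb: Callable[[str], str]) -> List[Tuple[str, str]]:
    return [(t, perturb(t)) for t in texts if positive_score(perturb(t)) > positive_score(t)]

fills = {"thing": ["food", "movie"], "adj": ["good", "great"]}
print(mft("The {thing} was {adj}.", fills, "positive"))           # []
print(len(mft("The {thing} was not {adj}.", fills, "negative")))  # 4: the toy model ignores "not"
print(inv(["Alice said the food was great.", "Alice said the movie was bad."],
          lambda t: t.replace("Alice", "Bob")))                     # []
print(dir_not_more_positive(["The food was great.", "The movie was good."],
                            lambda t: t + " The service was terrible."))  # []
```

The negated MFT is the kind of failure a single accuracy number can hide: the toy model ignores "not", so every negated template is misclassified. The CheckList library ships its own tooling for generating templated test cases and perturbations; this sketch only mirrors the underlying idea.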

——

Paper: Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
https://www.aclweb.org/anthology/2020.acl-main.442/

Code:
https://github.com/marcotcr/checklist

——

Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Mailing List: http://eepurl.com/gl0BHL

More videos by Jay:
Language Processing with BERT: The 3 Minute Intro (Deep learning for NLP)
https://youtu.be/ioGry-89gqE

Explainable AI Cheat Sheet – Five Key Categories
https://www.youtube.com/watch?v=Yg3q5x7yDeM

The Narrated Transformer Language Model
https://youtu.be/-QH8fRhqFHM

Jay’s Visual Intro to AI
https://www.youtube.com/watch?v=mSTCzNgDJy4

How GPT-3 Works – Easily Explained with Animations
https://www.youtube.com/watch?v=MQnJZuBGmSQ