Jay Alammar
How can we empower machine learning models with powerful software engineering techniques like unit testing?
Evaluating ML models using a single metric (like accuracy or F1-score) produces a low-resolution picture of model performance. Behavioral tests can give us a much higher-resolution evaluation of a model’s capabilities. By creating tests (small, targeted test sets), we can better compare models or observe how performance changes after re-training or fine-tuning a model. We discuss the paper ‘Beyond Accuracy: Behavioral Testing of NLP Models with CheckList’, which was selected as the ACL 2020 Best Paper.
Introduction (0:00)
Comparing models using capabilities (0:33)
Behavioral test of NLP models (3:06)
Test Type 1: Minimum Functionality Tests (4:22)
Test Type 2: Invariance Tests (7:04)
Test Type 3: Directional Expectation Tests (7:32)
Summary and Conclusion (10:00)
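To make the three test types listed above concrete, here is a minimal sketch in plain Python. It does not use the CheckList library's API; the keyword-based `predict_positive_prob` model, the helper names, and the example sentences are all hypothetical stand-ins chosen for illustration. Each helper returns the cases that fail, so every capability gets its own failure rate instead of one aggregate score.

```python
# Hypothetical toy sentiment "model": a keyword counter, used only as a stand-in.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}


def predict_positive_prob(text: str) -> float:
    """Return a crude score in [0, 1]; above 0.5 means 'positive'."""
    words = text.lower().replace(".", "").replace(",", "").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return min(1.0, max(0.0, 0.5 + 0.25 * score))


def minimum_functionality_test(cases):
    """MFT: simple labeled examples that target one capability.
    A case fails when the predicted label differs from the expected one."""
    return [(text, expected) for text, expected in cases
            if (predict_positive_prob(text) > 0.5) != expected]


def invariance_test(pairs, tol=0.1):
    """INV: a label-preserving perturbation should leave the prediction
    (almost) unchanged. A pair fails when the score moves by more than tol."""
    return [(a, b) for a, b in pairs
            if abs(predict_positive_prob(a) - predict_positive_prob(b)) > tol]


def directional_test(pairs):
    """DIR: here, appending a negative phrase must not raise the positive score.
    A pair fails when the perturbed text scores higher than the original."""
    return [(a, b) for a, b in pairs
            if predict_positive_prob(b) > predict_positive_prob(a)]


# Capability: negation (MFT). The keyword model is expected to fail the last case.
mft_failures = minimum_functionality_test([
    ("I love this flight", True),
    ("This was a terrible flight", False),
    ("I do not love this flight", False),
])

# Capability: robustness to swapping a city name (INV).
inv_failures = invariance_test([
    ("The flight to Chicago was great", "The flight to Dallas was great"),
])

# Capability: adding a negative sentence should not improve sentiment (DIR).
dir_failures = directional_test([
    ("The service was good", "The service was good. I hate the seats"),
])

for name, failures, total in [("MFT", mft_failures, 3),
                              ("INV", inv_failures, 1),
                              ("DIR", dir_failures, 1)]:
    print(f"{name}: {len(failures)}/{total} failed")
```

With this toy model, the negation case in the MFT fails while the invariance and directional checks pass, which is exactly the kind of per-capability detail that a single accuracy number would hide.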
——
Paper: Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
https://www.aclweb.org/anthology/2020.acl-main.442/
Code:
https://github.com/marcotcr/checklist
——
Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Mailing List: http://eepurl.com/gl0BHL
More videos by Jay:
Language Processing with BERT: The 3 Minute Intro (Deep learning for NLP)
https://youtu.be/ioGry-89gqE
Explainable AI Cheat Sheet – Five Key Categories
https://www.youtube.com/watch?v=Yg3q5x7yDeM
The Narrated Transformer Language Model
https://youtu.be/-QH8fRhqFHM
Jay’s Visual Intro to AI
https://www.youtube.com/watch?v=mSTCzNgDJy4
How GPT-3 Works – Easily Explained with Animations
https://www.youtube.com/watch?v=MQnJZuBGmSQ
This is a very interesting approach that can be extended to vision models as well!
Great video, but using a small test set for QA should be done carefully, since over time the model can overfit to those datasets.
Outstanding as usual. Best of luck, man.
Really cool video Jay. Have you come across any equivalent approaches for tabular data?
Great master
This is a great topic! Thanks for presenting it so nicely! Well spoken and visualized! 💪