Behavioral Testing of ML Models (Unit tests for machine learning)

Jay Alammar

How can we apply powerful software engineering techniques like unit testing to machine learning models?

Evaluating ML models with a single metric (like accuracy or F1-score) produces a low-resolution picture of model performance. Behavioral tests can give us a much higher-resolution evaluation of a model’s capabilities. By creating tests (small, targeted test sets), we can compare models more precisely, or observe how model performance changes after re-training or fine-tuning a model. We discuss the paper ‘Beyond Accuracy: Behavioral Testing of NLP Models with CheckList’, which was selected as the ACL 2020 Best Paper.
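To make the contrast between one pooled metric and per-capability tests concrete, here is a minimal sketch in plain Python. It assumes a sentiment task and two hypothetical keyword-based toy models (`keyword_model`, `negation_aware_model`) standing in for a model before and after re-training; the suite, sentences, and helper names are illustrative assumptions, not part of the paper or of the CheckList library.

```python
from typing import Callable, Dict, List, Tuple

# Any classifier that maps a sentence to "pos" or "neg".
Model = Callable[[str], str]
# Hypothetical suite: capability name -> small targeted test set of (text, expected label).
Suite = Dict[str, List[Tuple[str, str]]]

SUITE: Suite = {
    "vocabulary: positive": [("I love this movie.", "pos"), ("What a great film.", "pos")],
    "vocabulary: negative": [("I hate this movie.", "neg"), ("What an awful film.", "neg")],
    "negation": [("I don't love this movie.", "neg"), ("This film is not bad.", "pos")],
}

POSITIVE = {"love", "great"}
NEGATIVE = {"hate", "awful", "bad"}


def tokens(text: str) -> List[str]:
    """Lower-case, drop full stops, split on whitespace."""
    return text.lower().replace(".", "").split()


def keyword_model(text: str) -> str:
    """Toy model A (hypothetical): positive if any positive word appears; ignores negation."""
    return "pos" if POSITIVE & set(tokens(text)) else "neg"


def negation_aware_model(text: str) -> str:
    """Toy model B (hypothetical): keyword score whose sign flips when a negation appears."""
    words = set(tokens(text))
    score = int(bool(POSITIVE & words)) - int(bool(NEGATIVE & words))
    if "not" in words or any(w.endswith("n't") for w in words):
        score = -score
    return "pos" if score > 0 else "neg"


def accuracy(model: Model, suite: Suite) -> float:
    """Single, low-resolution metric: accuracy over all cases pooled together."""
    cases = [case for test in suite.values() for case in test]
    return sum(model(x) == y for x, y in cases) / len(cases)


def failure_rates(model: Model, suite: Suite) -> Dict[str, float]:
    """Higher-resolution view: failure rate for each capability separately."""
    return {name: sum(model(x) != y for x, y in test) / len(test) for name, test in suite.items()}


for model in (keyword_model, negation_aware_model):
    print(model.__name__, round(accuracy(model, SUITE), 2), failure_rates(model, SUITE))
```

With these toy models, pooled accuracy is 0.67 for the first model, which hides that it fails every negation case; the per-capability breakdown shows exactly which capability the second model fixed.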

Introduction (0:00)
Comparing models using capabilities (0:33)
Behavioral testing of NLP models (3:06)
Test Type 1: Minimum Functionality Tests (4:22)
Test Type 2: Invariance Tests (7:04)
Test Type 3: Directional Expectation Tests (7:32)
Summary and Conclusion (10:00)
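The three test types in the chapter list can be sketched as follows. This is a toy illustration in plain Python, assuming a hypothetical lexicon-based sentiment scorer (`predict_proba`); the helpers `mft`, `inv`, and `dir_test` are illustrative names, not the API of the CheckList library. In this sketch the directional expectation is that appending a clearly negative sentence should not increase P(positive).

```python
from typing import Callable, List, Tuple

# Hypothetical lexicon-based sentiment scorer standing in for a real model.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "awful", "bad", "boring"}


def predict_proba(text: str) -> float:
    """Return a toy P(positive) in (0, 1) from word counts, with add-one smoothing."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos + 1) / (pos + neg + 2)


def label(text: str) -> str:
    return "pos" if predict_proba(text) > 0.5 else "neg"


def mft(cases: List[Tuple[str, str]]) -> List[str]:
    """Minimum Functionality Test: simple inputs with known labels; return the failures."""
    return [x for x, expected in cases if label(x) != expected]


def inv(texts: List[str], perturb: Callable[[str], str]) -> List[str]:
    """Invariance Test: a label-preserving perturbation must not change the prediction."""
    return [x for x in texts if label(perturb(x)) != label(x)]


def dir_test(texts: List[str], perturb: Callable[[str], str]) -> List[str]:
    """Directional Expectation Test: here, the perturbation must not raise P(positive)."""
    return [x for x in texts if predict_proba(perturb(x)) > predict_proba(x)]


print(mft([("The food was great.", "pos"),
           ("The food was awful.", "neg"),
           ("The service was not good.", "neg")]))
print(inv(["Mark said the movie was great.", "Mark found the plot boring."],
          lambda s: s.replace("Mark", "Maria")))
print(dir_test(["The food was great.", "The service was not good."],
               lambda s: s + " The ending was awful."))
```

In this run the MFT flags "The service was not good." because the toy scorer ignores negation, while the INV check (swapping a person's name) and the DIR check (appending a negative sentence) pass. Each function returns the failing inputs, which is the kind of targeted feedback a single accuracy number cannot give.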

——

Paper: Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
https://www.aclweb.org/anthology/2020.acl-main.442/

Code:
https://github.com/marcotcr/checklist

——

Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Mailing List: http://eepurl.com/gl0BHL

More videos by Jay:
Language Processing with BERT: The 3 Minute Intro (Deep learning for NLP)
https://youtu.be/ioGry-89gqE

Explainable AI Cheat Sheet – Five Key Categories
https://www.youtube.com/watch?v=Yg3q5x7yDeM

The Narrated Transformer Language Model
https://youtu.be/-QH8fRhqFHM

Jay’s Visual Intro to AI
https://www.youtube.com/watch?v=mSTCzNgDJy4

How GPT-3 Works – Easily Explained with Animations
https://www.youtube.com/watch?v=MQnJZuBGmSQ

</p> <p><a href="https://www.youtube.com/watch?v=Cse-3MM7mso">Source</a></p> </div> </article> </main> </div> </div> </div> </div> </div> </body> </html>