1littlecoder
In this Machine Learning Tutorial, we'll see a live demo of using OpenAI's recent CLIP model. As they explain: “CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3. We found CLIP matches the performance of the original ResNet50 on ImageNet “zero-shot” without using any of the original 1.28M labeled examples, overcoming several major challenges in computer vision.”
OpenAI’s Blog on CLIP: https://openai.com/blog/clip/
CLIP on Github: https://github.com/openai/CLIP
Zero-Shot Image Classification with CLIP on Colab: https://colab.research.google.com/github/openai/clip/blob/master/Interacting_with_CLIP.ipynb
Zero-Shot Text Classification with Hugging Face: https://www.youtube.com/watch?v=45rVF3t8OII
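For a quick feel of what the Colab notebook does, here is a minimal sketch of zero-shot image classification with CLIP, based on the usage example in the CLIP GitHub repository linked above (the image path and the candidate labels are placeholders to replace with your own):

```python
import torch
import clip
from PIL import Image

# Load the pretrained CLIP model and its image preprocessing pipeline
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image and candidate label descriptions
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
labels = ["a photo of a dog", "a photo of a cat", "a diagram"]
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    # CLIP scores the image against each text snippet;
    # softmax over the logits gives zero-shot class probabilities
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(dict(zip(labels, probs[0])))
```

Because the labels are plain natural-language descriptions, you can change them freely (e.g. "a photo of a golden retriever") without retraining anything, which is what the video experiments with.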
Great video, though it could be done even better. There is slight confusion where you directly change the label description. It would be great to explain the general working of the model first and then move on to experiments like changing the label descriptions. 😉