How to Make Your Images Talk: The AI that Captions Any Image

September 28, 2022 · Artis Modus

Pritish Mishra

HuggingFace Web App: https://bit.ly/3SDyOWt

Image captioning is the process of taking an image and generating a caption that accurately describes the scene. This is a difficult task for neural networks because the model has to understand what is in the image (computer vision) and then express it as a fluent sentence (natural language).

In this video, I walk through my complete approach to this problem. For visual understanding, we use Inception V3. For natural language understanding, we start with an RNN, but it fails to generalize well to unseen data, so we switch to a Transformer. And as you will see, the Transformer nails it!
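
To give a feel for how a decoder (RNN or Transformer) turns image features into a sentence, here is a minimal sketch of greedy decoding: start from a start token, repeatedly pick the highest-scoring next word, and stop at the end token. This is an illustration only, not the code from the video. The names `greedy_caption`, `toy_decoder`, the `[start]`/`[end]` tokens, and the feature values are all assumptions made up for this example; in a real setup the scores would come from the trained model and the features from Inception V3.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical special tokens; a real tokenizer may use different ones.
START, END = "[start]", "[end]"


def greedy_caption(
    image_features: Sequence[float],
    next_token_scores: Callable[[Sequence[float], List[str]], Dict[str, float]],
    max_len: int = 20,
) -> str:
    """Build a caption one token at a time, always taking the top-scoring token."""
    tokens = [START]
    for _ in range(max_len):
        # Ask the decoder for a score per candidate word, given the image
        # features and the tokens generated so far.
        scores = next_token_scores(image_features, tokens)
        best = max(scores, key=scores.get)
        if best == END:
            break
        tokens.append(best)
    # Drop the start token before joining the words.
    return " ".join(tokens[1:])


def toy_decoder(image_features, tokens):
    """Hypothetical stand-in for a trained decoder: it follows a fixed script."""
    script = ["a", "dog", "runs", END]
    step = len(tokens) - 1  # number of words generated so far
    target = script[min(step, len(script) - 1)]
    return {word: (1.0 if word == target else 0.0) for word in set(script)}


# Placeholder feature vector; real Inception V3 features are much larger.
caption = greedy_caption([0.1, 0.2, 0.3], toy_decoder)
print(caption)
```

Greedy decoding is the simplest choice; beam search is a common alternative that keeps several candidate captions at each step instead of one.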

Source Code:
Image Captioning with RNN: https://bit.ly/3SBPoGi
Image Captioning with Transformer: https://bit.ly/3HToJRC
Image Captioning (on MS COCO Dataset): https://bit.ly/40t2da9

🔗 Social Media 🔗
📱 Twitter: https://bit.ly/3aJWAeF
📝 LinkedIn: https://bit.ly/3aQGGiL
📂 GitHub: https://bit.ly/2QGLVYV

Timestamps:
00:00 Introduction
00:16 Quick overview of Image Captioning
01:08 The Model Architecture (RNN)
01:56 Getting the Image feature vectors using Inception V3
04:39 What is the Attention Mechanism doing?
05:10 Choosing the Dataset
05:56 Data Preprocessing
06:54 Training!!!
07:13 Checking the results
09:24 Overdramatic Transformer Introduction
10:25 Why I used COCO Dataset
11:12 Side-by-side result of RNN and Transformer
11:59 Deploying model to HuggingFace so anyone can use it!

#artificialintelligence #ai #deeplearning #machinelearning #transformer #transformers

Thank You,
Pritish Mishra

49 thoughts on “How to Make Your Images Talk: The AI that Captions Any Image”

@PritishMishra says:

September 28, 2022 at 8:04 pm

Here's how I created a search engine for books using GPT3: https://youtu.be/SXFP4nHAWN8
@letitiasaragi441 says:

January 13, 2024 at 5:49 am

Hi Pritish, nice video! The first source code for image captioning with RNN doesn't work (https://drive.google.com/file/u/0/d/1-yWKVUs_zlAcS4S1Epgk8pYORWjryra-/), could you maybe provide an updated link? We are really interested in the preprocessing for a similar project. Thanks beforehand!
@LeoPaulose-g6n says:

January 15, 2024 at 9:44 pm

Bro the Image Captioning with RNN source code is not available
@sasidharank372 says:

February 9, 2024 at 12:03 am

I have a problem in caption key and image signature can pls help me in it
@s.dharanashs.dharanash5991 says:

February 13, 2024 at 1:41 am

RNN file does not exist bro pls upload
@ayush1344 says:

March 2, 2024 at 1:59 pm

Brother this video is really great and i loved your explanation but i am a beginner in aiml and want to learn this in detail
Can you please create a detail video on this topic
@ayushjindal4981 says:

March 4, 2024 at 3:05 am

Hi Pritish, Is it possible to use your model's results using web API calls?
@vamshynaidu says:

March 5, 2024 at 1:52 am

you nailed it bro
@drafatkarim8631 says:

March 6, 2024 at 6:09 am

Nice video. How long does it take you to train the transformer model?
@sridharreddy5714 says:

March 8, 2024 at 6:14 am

i want to do the image captioning with unsupervised or semi supervised bro if you have any reference code or implemented code if you share
it will be helpful to me
@tharunjansi6286 says:

March 11, 2024 at 6:56 am

ModuleNotFoundError: No module named 'tensorflow' i got this error in hugging face while building app how to install tensorflow in hugging face @PritishMishra
@YforYou2596 says:

March 15, 2024 at 2:59 am

bro unable to get the dataset brooo
@EM-nr9hj says:

March 21, 2024 at 6:51 pm

Bro unable to get , Image caption using RNN. The link is not working. Can you please check.
@swetanayak2005 says:

March 22, 2024 at 2:26 pm

How to get the code
@witchergaming5796 says:

March 28, 2024 at 7:15 am

The RNN source code link is not working please provide a link
@GANGADHARTHOTAKURA says:

March 30, 2024 at 11:31 pm

Image captioning with RNN source code is not opening dude please upload 😊.
@GANGADHARTHOTAKURA says:

March 30, 2024 at 11:33 pm

DUDE Please re-upload the RNN SOURCE CODE.
@Vikramx123 says:

April 7, 2024 at 9:04 am

How can we do it for videos bro ??
@kailashbalasubramaniyam230 says:

April 8, 2024 at 7:10 am

good one buddy
@ghashianameen says:

April 13, 2024 at 4:38 am

bro can you help me in Video captioning project?
@swastiktyagi8246 says:

April 16, 2024 at 3:20 am

Can you share the link for the pretrained model (h5)? Please share it
@venkatavivek2895 says:

April 17, 2024 at 4:32 am

How to use the saved model weights model.h5 in another file to make inferences on new images
@BoloFofoPT says:

April 18, 2024 at 12:25 pm

Amazing video, where did you learn all of this? omg just saved me so much time. Life saver
@BoloFofoPT says:

April 19, 2024 at 1:56 am

Hi man can you help me out? What is the captions.txt file? is it the Flickr9k.token.txt?
@lukeshpraveen4763 says:

April 29, 2024 at 10:17 am

ur github link is saying that it is suspended
@AniKeth-wi7zb says:

May 1, 2024 at 6:29 am

Github link is not opening, it says that it was uploaded from a suspended account
@LinhHuynh-lr5bz says:

May 24, 2024 at 10:42 am

Link of Images Captioning with RNN was dead, Can you update it to help me. Thank you. From VietNam with love <3
@ujjawalagrawal says:

May 26, 2024 at 12:48 am

Very nice explanation
@satyamtiwari3839 says:

June 3, 2024 at 1:07 am

hey none of your links are working
@hugehammer says:

June 24, 2024 at 7:40 pm

Awesome Video bro !! You explained Image captioning in a simple and fun way.
@fung1459 says:

June 26, 2024 at 11:37 am

Your RNN file is showing Page Not Found , can you reupload the file
@dishadubey8568 says:

July 6, 2024 at 1:53 am

Hey, great lecture! Just need a help, the link for the google colab for image captioning with rnn isn't working. It would be great help if you'll provide a new link. Thankyou!!
@shafqatkhiraam7343 says:

August 8, 2024 at 9:06 am

Source code link with RNN not working😢😢
@aady392 says:

October 13, 2024 at 4:27 am

Hi Pritish, amazing tutorials. Thank you. While running the transformers colab book I am getting an error at:
----> 4 pred_caption = generate_caption(img_path)
TypeError: `x` and `y` must have the same dtype, got tf.uint8 != tf.float32.

Can you please help!
@pandoraowl7379 says:

November 18, 2024 at 6:43 am

bro your source code link is not working
@jigsaw841 says:

November 27, 2024 at 12:42 pm

Image captioning With RNN code isn’t available,could you please solve it :/
@shubhamilhe1452 says:

November 30, 2024 at 3:08 pm

that was fucking amazing
@AbhinavGavade says:

February 22, 2025 at 2:06 am

Can we use CNN + LSTM ? for better image feature extraction and structured answers rather than RNN ?
@mohammedabufarha4860 says:

March 22, 2025 at 3:47 pm

Absolute cinema
@atharvacreations3271 says:

March 23, 2025 at 3:15 am

i am finding difficulty to access codes through github link
@atharvacreations3271 says:

March 23, 2025 at 3:29 am

where can i get the codes from???
@JatinKumarPhogat says:

April 1, 2025 at 3:37 am

Bro i have one doubt
the second link that you have given is using transformer, i wanted to ask if it is trained on coco or flickr, also in the 3rd link that you have given (on MS COCO Dataset) have you used rnn or transformer … also which is the best??

Pls reply … thankyou ❤
@AbhishekKumarSingh-n2k says:

April 1, 2025 at 11:10 pm

Bro can you provide a drive link for your saved models because your LFS bandwidth is exceeded in GitHub. Please
@AbhishekKumarSingh-n2k says:

April 1, 2025 at 11:11 pm

Can anyone share the saved_models with me in a drive link?
@starslighten1100 says:

April 24, 2025 at 1:00 am

Pls update link for rnn
@saumyakesharwani5949 says:

May 8, 2025 at 1:30 am

very helpful …. Great work
@meetkorat05 says:

August 3, 2025 at 9:42 am

bro streamlit web page code send kardo please. anyone , who have the code for streamlit web page, please send me
@dilakshankamalathasan6070 says:

August 19, 2025 at 10:42 pm

no one is helping, how to label images and write image captions in a json file or a CSV file for image captioning, please help on that
@manishtiwari4213 says:

September 12, 2025 at 6:35 am

bro can you provide a github repository for it
