The AI Advantage
ChatGPT Prompt Engineering Course: https://aiadvantagecourse.com
Today we look at 100+ ChatGPT use cases as detailed in the Microsoft paper "The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)".
Links:
https://browse.arxiv.org/pdf/2309.17421.pdf
———————————————————————————
#aivision #chatgptvision #gpt4v
———————————————————————————
🔑 Free ChatGPT E-Book + Notion: https://myaiadvantage.com/newsletter
🤯 E-Book with 750+ ChatGPT Use Cases: https://myaiadvantage.com/ebook
💬 Discord: https://discord.gg/aiadvantage
🐦 Twitter: https://twitter.com/TheAIAdvantage
📸 Instagram: https://www.instagram.com/ai.advantage/
It doesn't tell you about people, though. It can't even describe people in an image, for "safety" reasons.
Great video, thank you!
Link to PDF does not work?
I sent the following to my dad along with the link to this video (he was a data systems engineer, analyst, and programmer for IBM in the '80s):
"If this LLM can do this this well… what about running it the other way and having it generate visualizations of data… That probably doesn't sound too exciting right now, but I have been obsessed with this idea that keeps coming into my brain through the ether…
What happens when you can have an immersive 3D representation of navigable data that you can explore in space with the augmented/virtual reality glasses that the top companies are coming out with now?
What happens when you can see all the information about a country… the hierarchy of a political system. Each individual in a particular political party… a list of their education and accomplishments… links to all of their interviews. Discussions and analysis of everything they have said. Access to a timeline of their history. What they said and when. A list of their lies and contradictions. A mind map or flowchart of every argument and counterargument, and a flowchart of the flow of logic of the arguments… You can do the same with companies… business processes… sports…
If people can experience information like that, it will change the world. Right now everyone holds a poor representation of all of this in their minds. What if it were clear and accessible, updated and updatable by anyone through blockchain-associated profiles… like a wiki-type model…
I want to find out who is working on this type of thing and go volunteer there."
Nice video. The only problem I see with the travel part is that you often have little or no internet connection, and GPT doesn't work without it. 😀
I think a lot of views are incoming here 😀
The food web one impressed you? That's so obvious. Who else could be the producers there? The rest are all animals.
Thank you, Human, for a brilliant podcast.
@5:00 It thought the water came from the water bottle because they are both clear liquids, but it failed to notice that the bottle was too full to have been used to fill the glass.
17:55 :| Is it weird that I do see a lighter shade of brown on the hair in the bottom image?
Am I a large language model??!! AAAaaaahhhhh
🎯 Key Takeaways for quick navigation:
00:00 📚 GPT-4 with vision opens up incredible possibilities, demonstrated in a Microsoft research paper.
01:09 📄 GPT-4 Vision can read, understand, and extract information from images, making tasks like receipt scanning and data extraction efficient.
03:31 👆 GPT-4 Vision can interpret gestures and pointing in images, enhancing its ability to understand context.
05:52 🔄 GPT-4 Vision can benefit from few-shot prompting, where providing multiple examples helps it understand patterns better.
10:48 🌐 GPT-4 Vision can seamlessly blend image recognition with translation, making it a powerful tool for travelers and language learners.
17:52 🖼️ ChatGPT's vision module can identify irregularities and errors in images, making it useful for tasks like identifying damaged objects or evaluating insurance claims.
19:16 🛡️ ChatGPT's vision module can be trained on specific image datasets, enabling businesses to analyze images related to their products or services with high accuracy.
21:09 🌐 ChatGPT's vision module paired with Bing search can provide accurate information about images, including locations and events, demonstrating its potential for web browsing tasks.
22:07 📽️ ChatGPT can transcribe video content from images, even when captions are not available, making it a valuable tool for analyzing short-form video content.
24:02 🔄 ChatGPT can self-reflect and self-correct its prompts and outputs, showing its potential for improving its own performance and capabilities.
Made with HARPA AI
The photo recognition was troubling at first, but then I tested it and ChatGPT responded that it could not help me with that because of its guidelines. Relieved but also disappointed. I did hear about another app that can use your photo to find other photos of you on the Internet, but I can't recall what it's called. "My current guidelines prohibit me from identifying real people based on images, even if they are famous or acting. I'm here to help with other questions or provide information on various topics. How can I assist you further?"
What is the URL to the paper?
🙏🫶
I cannot upload the pug image when DALL-E 3 is enabled in ChatGPT-4. Is anyone else having this problem? I have to enable uploads as a separate option, but that excludes DALL-E 3; for me the two options are mutually exclusive. And the ChatGPT-4 session with DALL-E 3 enabled doesn't recognize the ChatGPT-4 session where I uploaded the pug image!
🎯 Key Takeaways for quick navigation:
00:00 🤖 Researchers at Microsoft have pushed GPT-4 Vision to its limits, revealing over 100 use cases.
00:42 📸 GPT-4 Vision now accepts images as context, making it a powerful tool for image-related tasks.
01:09 💳 It can extract information from images, such as recognizing text on receipts for accounting.
02:21 📄 Templates can be used to extract structured data from images, like filling out forms.
03:17 👉 GPT-4 Vision can understand and respond to specific pointing or gestures in images.
04:13 🌆 It goes beyond basic image recognition, providing context and understanding of scenes.
05:38 🧩 "Few-shot prompting" technique involves providing multiple examples to help the model recognize patterns.
07:46 💉 It can analyze medical images, such as X-rays, and provide information about injuries.
08:14 🏥 GPT-4 Vision can suggest possible medical conditions based on visual clues in images.
09:38 😂 It can understand and explain jokes in images, recognizing humor.
10:48 🌿 GPT-4 Vision can identify components in complex images, like organisms in a food web.
12:29 🌍 It excels at translation, recognizing text in various languages within images.
13:25 📃 It can describe complex documents with diagrams and text, highlighting key contributions.
14:07 📊 GPT-4 Vision can reformat tables and images into various formats.
15:02 🖼️ It can identify abstract or low-resolution images and provide detailed descriptions.
16:12 😢 GPT-4 Vision can recognize and understand emotions in images, even complex ones.
17:37 🚫 It's not perfect and may struggle with tasks like spotting differences between images.
17:52 🧐 The model can identify irregularities in images and explain what's wrong with objects.
18:20 🚗 The model can simplify processes like evaluating damaged tires or insurance claims by analyzing images.
19:02 🛒 Businesses can train the model to analyze shopping carts from low-resolution images, improving inventory management.
19:30 🏥 The model performs well on medical image analysis, with rare mistakes, even though it's not a specialized medical model.
19:57 🍰 The model rates images accurately and can be used to evaluate visual content like art.
20:12 🤖 By adding visual capabilities, autonomous agents like AutoGPT can become much more effective at evaluating results.
20:41 🤖 Home robots could use visual capabilities to navigate and perform tasks in complex environments.
22:36 📺 The model can transcribe short-form video content based on image analysis, even without captions.
23:34 🧩 All the capabilities of different plugins (data analysis, Bing, DALL-E, GPT-4 Vision) may merge into one multimodal model in the future.
24:02 🔄 The model can self-reflect and correct itself when given the right circumstances, showing the potential for autonomous improvement.
25:12 🌅 The model can iterate and improve prompts by analyzing images and generating more accurate descriptions.
25:26 🤯 Despite imperfections, these models have the ability to understand images deeply and unlock powerful capabilities, including image generation and internet browsing.
Made with HARPA AI
I could use ChatGPT-4V at work to process incoming invoices, scan shipping orders that come in different formats and convert them straight into air waybills, and convert airline rate sheets into a machine-readable format. Could! Of course, for data protection reasons, I can't upload our invoices, shipping orders and rate sheets to ChatGPT. So I'm not putting myself out of work for now, unless someone can suggest a solution to the problem.
Is it possible to run ChatGPT-4V locally on our own computers?
🎯 Key Takeaways for quick navigation:
00:00 🤖 Researchers at Microsoft published a paper with over 100 use cases for ChatGPT Vision, showcasing its incredible capabilities in image understanding.
00:57 📷 ChatGPT Vision allows for context-based image prompting, enabling users to provide images or free-form descriptions to generate accurate results.
03:31 🖌️ The model supports various ways of pointing at objects within images, such as specifying coordinates, drawing boxes, or using arrows, making it highly versatile in image interaction.
07:32 🌍 ChatGPT Vision can recognize not only celebrities and landmarks but also food, which extends to providing detailed information about the recognized objects.
13:10 💬 It seamlessly blends translation with image recognition, making it an invaluable tool for travelers to understand foreign languages and cultures through images.
17:37 🖼️ ChatGPT's image recognition is not perfect but can identify irregularities, making it useful for spotting issues in various objects, from damaged tires to evaluating insurance claims.
18:47 🛒 Businesses can train ChatGPT on their product images to analyze shopping carts, simplifying the process of analyzing low-resolution images for items.
19:30 🏥 ChatGPT's image recognition performs impressively in medical examples, showing potential in medical applications.
21:51 🌐 ChatGPT's vision module can navigate the internet effectively, suggesting potential for improved web browsing capabilities.
24:02 🔄 ChatGPT has the ability to self-reflect and self-correct, improving its output based on feedback, hinting at its potential for autonomous creative tasks.
Made with HARPA AI
🎯 Key Takeaways for quick navigation:
00:00 🌐 Vision capabilities of GPT-4 in Microsoft's research paper.
– Introduction to the vision capabilities of GPT-4 and the 100+ use cases outlined in the Microsoft paper.
– Mention of the paper's length and a focus on the most interesting use cases.
00:28 🔄 Shift in Prompting Techniques for GPT.
– Change in GPT prompting with the introduction of image context.
– Illustration of how GPT can now use images to provide more accurate context.
01:09 💡 Practical Application: Receipt Analysis.
– Example of using GPT for accounting purposes by analyzing receipts.
– Emphasis on time savings for entrepreneurs.
02:07 🖼️ Image Recognition and Reasoning Capabilities.
– GPT-4's ability to recognize images and apply reasoning.
– Examples showing how GPT-4 can fill out templates and request specific output formats.
03:04 👉 The Power of Pointing in Image Analysis.
– The utility of pointing or highlighting specific areas in an image for analysis.
– Explanation of different methods to specify focus areas in images.
04:13 🧠 Deep Understanding of Context and Relationships.
– GPT-4's ability to understand the context and relationship between objects in an image.
– Examples illustrating GPT-4's deep understanding of images.
05:10 📈 Few-Shot Prompting Technique and Accuracy Improvement.
– Introduction of few-shot prompting for accuracy improvement.
– Demonstration of how providing multiple examples enhances GPT-4's performance.
06:07 🏫 Prompt Engineering Course Promotion.
– Promotion of a prompt engineering course.
– Mention of a money-back guarantee for the course.
06:21 👤 Recognition of Celebrities.
– GPT-4's capability to recognize celebrities accurately.
06:34 🧠 ChatGPT Vision's Remarkable Recognition Abilities
– ChatGPT Vision's ability to recognize and understand images.
– Recognizes celebrities, landmarks, and detailed information about various cuisines.
– Identifies a specific medical condition from an X-ray.
08:14 🚑 Implications for Medical Use Cases
– ChatGPT Vision's potential in medical diagnostics.
– Recognizes lung infections from CT scans.
– Raises concerns about self-diagnosis and implications for the medical profession.
08:56 📸 Advanced Image Processing Capabilities
– Detailed image processing and caption generation.
– Recognizes and captions individuals in images, demonstrating advanced image recognition.
– Potential future integration with other AI models for enhanced capabilities.
09:38 🤣 Understanding Humor in Images
– ChatGPT Vision's ability to understand humor in memes.
– Recognizes and explains humor in various meme formats.
10:06 🌿 Analyzing Ecological Illustrations
– ChatGPT Vision's competence in interpreting ecological illustrations.
– Identifies producers in a food web and explains photosynthesis.
10:48 🔍 Forensic Analysis and Surveillance Implications
– AI-powered surveillance and forensic analysis.
– Detects subtle clues and interprets visual information in a room setting.
– Raises privacy and ethical concerns regarding AI surveillance.
11:15 🏠 Interpreting Floor Plans
– Use of ChatGPT Vision to interpret floor plans.
– Answers specific queries about room locations in a floor plan.
11:44 📚 Analyzing Academic Papers with Visuals
– ChatGPT Vision's ability to analyze papers with diagrams and text.
– Highlights where current AI still falls short on complex tasks.
12:12 🌐 Multilingual Image Recognition and Translation
– ChatGPT Vision's multilingual capabilities in image recognition and translation.
– Seamless language translation combined with image recognition.
– Facilitates understanding of foreign texts and signs while traveling.
13:25 🌍 Travel and Translation Enhancements
– Advances in translation and cultural understanding for travelers.
– Can give personalized recommendations based on saved preferences.
– Identifies corresponding culture for accurate local language translations.
13:53 📊 Work Efficiency Tools
– Converting images of tables into different formats.
– Useful for work-related tasks, especially when learning new software.
– Potential for widgets aiding internet navigation and software usage.
14:34 🎥 Video Frame Analysis
– Ability to process video frames and understand sequence.
– Recognizes abstract concepts and emotions in images.
– Utilizes understanding of societal standards and norms.
16:12 🌅 Emotion and Aesthetic Perception
– Analyzes emotions conveyed in images and suggests possible human reactions.
– Understands content's potential emotional impact on viewers.
– Describes images in various tones (humorous, uneasy, etc.) based on prompts.
17:37 🕵️‍♂️ Spotting Irregularities and Training for Specific Tasks
– Identifies irregularities and damages in objects.
– Ability to be trained for specific tasks like insurance evaluations.
– Can analyze shopping items if trained on specific product data.
19:16 🩺 Medical and Artistic Applications
– Capability to analyze medical images, though not perfect.
– Accurately rates images for AI art creation.
– Demonstrates potential for specialized applications.
20:12 🤖 AutoGPT and Visual Capabilities
– Discussion on enhancing GPT's capabilities with visual inputs, leading to more effective autonomous agents and improved evaluation of results.
– Example of a home robot navigating using visual input.
– Anticipation of robots with 360-degree cameras navigating autonomously.
20:55 🌐 Web Browsing and Comparison with Real-world Tasks
– ChatGPT's ability to navigate the web and its potential to handle complex online tasks.
– Comparison between online shopping and analyzing new, complex data like X-rays.
– Insights into the current limitations of Bing's browsing model compared to ChatGPT's capabilities.
21:39 🔍 Advanced Search and Vision Capabilities
– Exploration of internet search capabilities with the integration of ChatGPT's vision module.
– Speculation on the potential behind-the-scenes capabilities of advanced search with vision.
22:07 📱 GUI Navigation and Video Transcription
– Use of ChatGPT for navigating mobile app interfaces and transcribing videos from images.
– Example of transcribing TikTok content based solely on visual cues.
– Ability to understand and interpret short-form video content.
23:19 🧩 Plugin Integration and Multimodal Models
– Future integration of various plugins into a single multimodal model.
– Discussion on the potential merging of advanced data analysis, image generation, and vision capabilities.
24:02 🤔 Self-Reflection and Correction
– ChatGPT's ability to self-reflect and correct its outputs through internal communication.
– Example of improving an image generation task by iteratively refining prompts.
– Emphasis on the iterative improvement and deep understanding of image-based inputs.
25:26 🚀 Conclusion: Potential and Improvement
– Recognition of the models' imperfections but emphasis on their ability to improve and refine outputs.
– Reflection on the profound understanding of images that these models can achieve.
– Highlighting the significance of adding image processing capabilities to ChatGPT.
Made with HARPA AI
Amazing. Truly amazing. Thanks for covering this for us!