AI Explained
In this video, I will not only show you how to get smarter results from GPT 4 yourself; I will also showcase SmartGPT, a system which I believe, with evidence, might help beat the state of the art on the MMLU benchmark.
This should serve as your ultimate guide to boosting the automatic technical performance of GPT 4, without even needing few-shot exemplars.
The video will cover papers published in the last 72 hours, like Automatically Discovered Chain of Thought, which beats even ‘Let’s think step by step’, and the approach that combines it all.
Yes, the video also touches on the OpenAI / DeepLearning.AI Prompt Engineering course, but the highlights come more from my own experiments using the MMLU benchmark, drawing on insights from the recent Boosting Theory of Mind and ‘Let’s Work This Out Step by Step’ work, and combining them with Reflexion and Dialog-Enabled Resolving Agents (DERA).
Prompt frameworks:
Answer: Let’s work this out in a step by step way to be sure we have the right answer
You are a researcher tasked with investigating the X response options provided. List the flaws and faulty logic of each answer option. Let’s work this out in a step by step way to be sure we have all the errors:
You are a resolver tasked with 1) finding which of the X answer options the researcher thought was best 2) improving that answer, and 3) Printing the improved answer in full. Let’s work this out in a step by step way to be sure we have the right answer:
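For anyone who wants to try the three prompts above programmatically, here is a rough sketch of how they might be chained. It assumes the official openai Python client (v1+); the model name, temperature, and the choice of three drafts for "X" are placeholders of mine, not settings confirmed in the video:

```python
# Minimal sketch of a generator -> researcher -> resolver chain.
# Assumes the `openai` package (v1+) and an API key in OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4"  # placeholder model name
N_DRAFTS = 3     # the "X" answer options; 3 is an arbitrary choice here


def ask(prompt: str, temperature: float = 0.7) -> str:
    """Send a single user prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model=MODEL,
        temperature=temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def smart_gpt(question: str) -> str:
    # 1) Generator: sample several step-by-step draft answers.
    drafts = [
        ask(f"{question}\nAnswer: Let's work this out in a step by step way "
            f"to be sure we have the right answer.")
        for _ in range(N_DRAFTS)
    ]
    numbered = "\n\n".join(f"Answer option {i + 1}:\n{d}" for i, d in enumerate(drafts))

    # 2) Researcher: critique every draft.
    critique = ask(
        f"Question: {question}\n\n{numbered}\n\n"
        f"You are a researcher tasked with investigating the {N_DRAFTS} response "
        f"options provided. List the flaws and faulty logic of each answer option. "
        f"Let's work this out in a step by step way to be sure we have all the errors:"
    )

    # 3) Resolver: pick the best draft, improve it, and print it in full.
    return ask(
        f"Question: {question}\n\n{numbered}\n\nResearcher critique:\n{critique}\n\n"
        f"You are a resolver tasked with 1) finding which of the {N_DRAFTS} answer "
        f"options the researcher thought was best, 2) improving that answer, and "
        f"3) printing the improved answer in full. Let's work this out in a step "
        f"by step way to be sure we have the right answer:"
    )


if __name__ == "__main__":
    print(smart_gpt("I left 5 clothes to dry out in the sun. It took them 5 hours "
                    "to dry completely. How long would it take to dry 30 clothes?"))
```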
Automatically Discovered Chain of Thought: https://arxiv.org/pdf/2305.02897.pdf
Karpathy Tweet: https://twitter.com/karpathy/status/1529288843207184384
Best prompt: Theory of Mind: https://arxiv.org/ftp/arxiv/papers/2304/2304.11490.pdf
Few Shot Improvements: https://sh-tsang.medium.com/review-gpt-3-language-models-are-few-shot-learners-ff3e63da944d
Dera Dialogue Paper: https://arxiv.org/pdf/2303.17071.pdf
MMLU: https://arxiv.org/pdf/2009.03300v3.pdf
GPT 4 Technical report: https://arxiv.org/pdf/2303.08774.pdf
Reflexion paper: https://arxiv.org/abs/2303.11366
Why AI is Smart and Stupid: https://www.youtube.com/watch?v=SvBR0OGT5VI&t=1s
Lennart Heim Video: https://www.youtube.com/watch?v=7EwAdTqGgWM&t=67s
https://www.patreon.com/AIExplained
I just found out that this prompt addition also works with Bing/edge chat image generation.
"a dslr photograph of a colossal swarm of birds descending on angkor wat, position the photographer on the ground looking up at the swarm, but also insure that the scenic ruins, roots, and trees are all in frame as well. Let's work this out in a step by step way to be sure we have the right answer."
—
Okay, let’s work on this together. First, we need to decide what kind of birds we want to use for the swarm. Do you have any preference? You can choose from these options:
Crows
Pigeons
Parrots
…
—
Then it continued asking me about where the photographer was positioned, how much of the sky was covered with parrots, the direction of the sun. — AND IT WORKED. I scream-laughed.
The breathing in this talk sounds really artificial.
Only a 25-question cap every three hours?! It is a flop.
Looks like OpenAI have listened to you, but instead of making SmartGPT they opted to make DumbGPT. I challenge you to show me the same results with the current public model used in ChatGPT.
I feel you'd have to be careful to avoid a "death by committee" effect, where this could kill creative outputs, but this is a very neat idea.
Your approach would be interesting to test with the upcoming Orca local model trained on GPT3+GPT4 reasoning.
The post-generation self-reflection stages make a lot of sense. As ChatGPT does not really know what its full output will be when it starts writing, it can stray and become inconsistent in its output.
But at the self-reflection stage, all the previous output is there as input context.
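In chat-API terms, the point is simply that the reflection request carries the whole draft back in as input. A minimal, hypothetical message list (OpenAI chat format; the strings here are placeholders, not anything from the video) might look like:

```python
# Illustration of why reflection sees more than generation did:
# the complete first draft is passed back in as context.
original_question = "How many 'r's are in the word 'strawberry'?"
draft_answer = "(the model's complete first attempt would go here)"

reflection_messages = [
    {"role": "user", "content": original_question},
    {"role": "assistant", "content": draft_answer},  # full draft now visible as input
    {"role": "user", "content": "Review your answer above. List any inconsistencies "
                                "or errors, then give a corrected answer."},
]
```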
Naah bro, just try this on ChatGPT 3.5:
Think out of the box:
Question: I left 5 clothes to dry out in the sun. It took them 5 hours to dry completely. How long would it take to dry 30 clothes?
Easy as that.
furthering the idea: train GPT4.5 using corrections of itself made by GPT4 …
– some things will always need longer reasoning, but something like the jugs question really does not.
– I bet OpenAI are already doing that…
omg, you put so much effort into this … how do you not have an API key? They should outright employ someone as passionate…
I don't think MMLU at 95% would mean AGI. What we MAINLY need for smarter AI nowadays is context size.
– No matter how smart it is, if it cannot consider all of the specification, and all of the (e.g.) "code base", it cannot do the job.
– There are some approaches to "searching" through text files for relevant stuff and then including only that in the context, but it's slow and not how humans think… and error prone, as it might turn out that something the model didn't realize was needed is now needed, so the process loops… Irrespective of how intelligent the model is, this will be so expensive to run that it might be cheaper to have a human do it. – And of course, realistically, it still makes many more mistakes for now.
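To make the "search the files, include only relevant chunks" idea concrete, here is a toy sketch of what such a step could look like. Real systems use embeddings and rerankers; this keyword-overlap score is my own simplification, only meant to illustrate the idea (and its crudeness):

```python
# Toy retrieval: score fixed-size chunks of .py files by keyword overlap with the
# question, then keep only the top few to put into the prompt context.
from pathlib import Path


def relevant_chunks(question: str, root: str, top_k: int = 5, chunk_lines: int = 40):
    terms = set(question.lower().split())
    scored = []
    for path in Path(root).rglob("*.py"):
        lines = path.read_text(errors="ignore").splitlines()
        for start in range(0, len(lines), chunk_lines):
            chunk = "\n".join(lines[start:start + chunk_lines])
            score = sum(term in chunk.lower() for term in terms)
            if score:
                scored.append((score, f"{path}:{start}", chunk))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]  # only these chunks go into the prompt, not the whole code base
```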
Bing doesn't always agree to the "let's work this out in a step-by-step way" prompt that's in the video's description. Have you found a solution for that?
brilliant! thank you!
Wow, super amazing! Is your code sample posted anywhere? Thanks
You essentially propose a form of ‘ensemble learning’. It’s been an ML research topic for decades and builds on the idea that an ensemble (or ‘council’) of models tends to perform better than a single model of the same capability.
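As a toy illustration of that ‘council’ idea, majority voting over several independently sampled answers can be done in a few lines. The sample_answer function below is a stand-in of mine for whatever temperature>0 model call you already have, not part of the video's system:

```python
# Toy "council" vote: sample several independent answers, keep the most common one.
from collections import Counter
import random


def sample_answer(question: str) -> str:
    # Stand-in for a real temperature>0 model call on `question`.
    return random.choice(["5 hours", "5 hours", "30 hours"])


def council_answer(question: str, n: int = 9) -> str:
    votes = Counter(sample_answer(question) for _ in range(n))
    return votes.most_common(1)[0][0]


print(council_answer("How long would it take to dry 30 clothes?"))
```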
Hello, loved your videos btw. I don't see any updates on SmartGPT now that we have GPT-4 access.
Wouldn't both 30 hours and 5 hours be correct, depending on how you read it? Or both could be false as well. There is no single right answer; it depends on how you read it, so it's not right or wrong in either case.
Will you make a video about the Custom Instructions they added to GPT4 very recently?
7:09
It seems your proposed architecture gives the same wrong answers to the dry-clothes and jug-measurement problems even when running on GPT-4…
Obviously the developers are using newer models, probably even ChatGPT 10.