Computerphile
AI image generators are massive, but how are they creating such interesting images? Dr Mike Pound explains what’s going on.
Thumbnail image partly created by DALL-E with the prompt: “Computerphile YouTube Video presenter Mike Pound Explains Diffusion AI methods thumbnail with green computer style title text on a black background with grey binary”
https://www.facebook.com/computerphile
https://twitter.com/computer_phile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: https://bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran’s Numberphile. More at http://www.bradyharan.com
I find the name of this channel quite interesting
Is this simplified explanation of the noise diffusion process true?
Theoretically, it’s like burying an ‘ice cream’ mosaic under hundreds of extra tesserae (the small rectangular slabs used to build a mosaic) and then asking a highly skilled artist to watch them being removed so that the original image is restored. During this process the artist learns how to understand and reinterpret the ‘ice cream’ image in other mosaics. The artist is trained this way on millions of other mosaics, so that eventually they can create entirely new ones based on the requests (or text prompts) of the person commissioning them.
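If it helps to make the analogy concrete, here is a minimal sketch of the training loop it describes: add a random amount of noise to a clean image and train a network to predict that noise. The tiny linear "network", the image size, and the noise schedule below are stand-ins of my own, not the actual DALL-E or Stable Diffusion code.

```python
# Sketch of diffusion training: the network learns to predict the noise
# that was mixed into a clean image (illustrative, not the real code).
import torch
import torch.nn as nn

T = 1000                                   # number of noising steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)      # noise schedule (assumed)
alphas_cumprod = torch.cumprod(1 - betas, dim=0)

model = nn.Sequential(                     # stand-in for a real U-Net
    nn.Flatten(), nn.Linear(3 * 32 * 32, 3 * 32 * 32))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(x0):                        # x0: clean images, shape (B, 3, 32, 32)
    t = torch.randint(0, T, (x0.shape[0],))         # random timestep per image
    eps = torch.randn_like(x0)                       # the noise that gets added
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps       # the "mosaic buried in tesserae"
    eps_pred = model(x_t).view_as(x0)                # network guesses the noise
    loss = ((eps_pred - eps) ** 2).mean()            # trained to predict the noise
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

loss = train_step(torch.rand(8, 3, 32, 32))          # one step on dummy images
```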
10:24 This is the part that boggles my mind intuitively. Wouldn't the second step just try to remove the noise and get back to the imperfect result produced by the first step? Presumably, when you imperfectly remove the noise the first time around and get back some vague shape that kind of resembles the true image, you also lose some of the actual information in that image, because you weren't removing the noise perfectly. So wouldn't the second step just remove noise to try to get back to that imperfect image (and do so imperfectly, losing even more actual information)? I just don't see how that would make the result better and better with each iteration, rather than worse and worse.
EDIT: Oh, but I guess the point is that this is used specifically when generating a new image from scratch, so there wasn't really any "true" image to begin with; all that matters is producing a crisp image that fits the prompt, so "losing information" about some true image doesn't matter because there was none to begin with? If that is the case, would my intuition above still hold if this approach were used to remove noise from an actual existing image, rather than to generate something new?
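Part of the answer is that sampling doesn't jump straight to the network's first guess: each iteration takes only a small step toward the estimated clean image and then re-injects a little fresh noise, so early mistakes can be revised on later steps. A rough sketch of that reverse loop, loosely following DDPM (function and variable names are my own, illustrative only):

```python
# Sketch of the reverse (sampling) loop: repeatedly estimate the noise,
# step only partway toward a cleaner image, and add back a little fresh
# noise so later iterations can correct earlier estimates.
import torch

@torch.no_grad()
def sample(model, shape, betas):
    alphas = 1 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                          # start from pure noise
    for t in reversed(range(len(betas))):
        eps_pred = model(x).view(shape)             # estimate the noise currently in x
        a_t, ab_t = alphas[t], alphas_cumprod[t]
        # small denoising step (the mean of p(x_{t-1} | x_t)),
        # not a full jump to the network's guess of the clean image
        x = (x - betas[t] / (1 - ab_t).sqrt() * eps_pred) / a_t.sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)   # re-inject fresh noise
    return x

# e.g. with the toy model and schedule from the training sketch above:
# img = sample(model, (1, 3, 32, 32), torch.linspace(1e-4, 0.02, 1000))
```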
You cannot subtract out random noise. Random noise minus random noise is just different random noise.
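A quick numerical check of that point (illustrative only): the difference of two independent noise samples is still noise, just with twice the variance.

```python
import numpy as np

a = np.random.randn(1_000_000)   # one random noise sample
b = np.random.randn(1_000_000)   # a different, independent noise sample
print(np.var(a - b))             # ~2.0: nothing cancels, it's still noise
```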
Can someone translate this?
Why Noise? Because it is something like a pixelated disintegration of the image that can be stored mathematically? So it's easier to compare its structure with other images?
How does it "know" what a frog looks like when I give it the text "frog"?
It uses a GPT-style transformer embedding of the prompt text.
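More concretely, the prompt is turned into a sequence of embedding vectors by a pretrained text encoder, and those vectors are what the denoising network is conditioned on. A minimal sketch using a CLIP text encoder, which is roughly what Stable Diffusion v1 does; the exact checkpoint name and shapes here are just to illustrate the idea:

```python
# The prompt becomes a sequence of embedding vectors via a pretrained
# text encoder; these vectors condition the denoiser at every step.
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer(["a photo of a frog"], padding=True, return_tensors="pt")
embedding = text_encoder(**tokens).last_hidden_state   # (1, num_tokens, 768)
print(embedding.shape)
```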
Can any of you AI losers tell me how AI taking the jobs of actors and scriptwriters is any different from AI art? Why do you hold the labour and monetary gains of one group of artists above those of another, often poorer, group of artists? Many commission artists right now rely on the money they make from their work to survive; it's a shame many of them will be replaced by AI, and equally shameful that you don't seem to care. The theft of labour for the gain of those who control an AI capable of creating the greatest soulless piece of garbage seems like a negative to me in both cases, AI art as well as AI acting and scriptwriting. Sorry to break the daydream, but we still live in a capitalist society, meaning people still need money to live, and even if we didn't, do you not believe the labourers whose work is used within the AI should be compensated? Why is it that within a socialist community I have found people willing to rob others of their livelihood and labour for their own greed? Doesn't seem very socialist to me…
My final opinion on AI art is that if you want to use it, it should have to be licensed, with a legally mandatory charge: a payment to each artist whose work is put into it, every single time it is run through an AI. Too expensive for anyone to afford, let alone be worth using, or too difficult to trace because too many people's work was scanned? That's just too bad; you'll have to either learn art yourself or buy from a person who actually creates through the labour the AI depends on to function. If we are to replace creatives with machines, then the creatives should be paid in full for their work so they can go on creating. And if those conditions cannot be met, then the use of AI art in the media, and the sale of AI art, should be prevented through legal means as well as through TOS. As for the use of AI art for one's own personal enjoyment, such as posters for your own room, a personal phone wallpaper, or anything kept to yourself or a small in-group like friends or family, that is essentially unavoidable, and while still damaging to the art community, the actual art community, it is damage that cannot really be stopped in any reasonable fashion. I also believe it should be legally mandatory for AI art to be labelled as AI art, and that it should not be allowed in areas such as journalism and other forms of media that are meant to report on reality; I have already found highly misleading AI clickbait photos in news articles (and we wonder why we've been seeing a massive spike in right-leaning conspiracy theorists and people detached from reality).
the whole "it injects an embedding from the input string" is a bit glossed over. so its just back to using a GAN, or what? the whole point is that it's generating images based on this input string, and it feels like you didn't talk about it at all how it does that
So, to sum it up in one word: PHOTOBASHING
(It would have been ethically better to have paid the artists instead of taking their work off the internet without permission to train these models)
This was a pretty awesome explanation! Thank you!
Computerphile, this is very good and intuitive 😊
This is how DALL-E works in a nutshell:
"Read user prompt. Decide it's against their arbitrary moral codex. Emit error."
Excellent vid btw. Explained something complex in a very easy way.
Amazing explanation. Thank you!
Hello, can DALL-E and MidJourney be considered DCGANs?
Here's a question. Let's assume we have a nicely trained diffusion model. Then we take an image I, add noise to it, and get I_noised. If we apply the diffusion network to I_noised, can we really get the original image I back, or just a realistic image similar to the training set? I guess we cannot get the same original image, because whenever you train on an image you add RANDOM Gaussian noise, so if you train on an image for 100 epochs you train with 100 different versions of I_noised. And I_noised is in fact almost pure multivariate Gaussian noise and carries essentially no information about the original image. The intermediate process can learn the statistics of going from t to t-1, but you cannot get the original image I_0 back from I_noised = I_T. So, in short, the reverse network learns how to convert the probability distribution of I_t to that of I_(t-1), so that finally I_1 is similar to the distribution of the original dataset, but it cannot reconstruct the specific original image.
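That matches the maths of the forward process: it has the closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, and for large t the weight on x_0 is essentially zero, so x_T keeps almost nothing of the specific original image and the reverse process can only sample something plausible. A small sketch with a typical linear schedule (the numbers are illustrative assumptions):

```python
# How much of the original image survives the forward noising process:
# by the final step the weight on x_0 is ~0.006, i.e. essentially nothing,
# so the exact original cannot be recovered from x_T, only a plausible sample.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # a typical linear schedule
alpha_bar = torch.cumprod(1 - betas, dim=0)

for t in [0, 100, 500, 999]:
    signal = alpha_bar[t].sqrt().item()         # weight on the original image x_0
    noise = (1 - alpha_bar[t]).sqrt().item()    # weight on the added noise
    print(f"t={t:4d}  signal weight={signal:.4f}  noise weight={noise:.4f}")
```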
All i got from this video is…NOISE
I think a lot of the fear artists have comes from the failure of sites like this to explain what the algorithms are doing in the initial stages, how they are trained, and what is actually happening at each step to alter the image file. So far I haven't seen a single site explain that stuff beyond "the AI is trained on many images", so artists take that as "it's stealing pieces of my art" when it isn't doing that at all. I'm honestly tired of explaining this stuff to angry idiots on social media.
Please explain that part better. Even lawyers and media companies don't seem to understand this aspect of these image generators, further feeding the fear of artists.
I love that he's doing all of this on 1980s printer paper. Proper geek
You did a great job explaining how the process works and provided visual examples. Nice work with this video.
I wish I could understand this. You must be a genius!
imagine if you had used dall-e to create visuals for this instead of just drawing the same "useful" box on paper 30 times?