Numenta
Michaelangelo Caporale presents a summary of two papers that apply self-attention to vision tasks. He first gives an overview of self-attention architectures and compares them with RNNs. He then dives into the attention mechanism used in each paper: the local attention method in “Stand-Alone Self-Attention in Vision Models” and the global attention method in “An Image is Worth 16×16 Words”. Lastly, the team discusses the inductive biases in these networks, their potential tradeoffs, and how networks with these mechanisms can learn efficiently from the data they are given.
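The core operation shared by both papers can be sketched as scaled dot-product self-attention. This is an illustrative numpy sketch, not code from either paper; the names, shapes, and random projections are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (n_tokens, d_model); w_q/w_k/w_v: (d_model, d_head)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Each token scores every other token; scale by sqrt(d_head).
    scores = q @ k.T / np.sqrt(k.shape[-1])    # (n_tokens, n_tokens)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 6, 8, 4
x = rng.normal(size=(n_tokens, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (6, 4) (6, 6)
```

The difference the talk highlights is which tokens attend to which: the "Stand-Alone Self-Attention" paper restricts the score matrix to a local pixel neighborhood, while ViT lets every patch token attend to every other one globally.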
Next, Lucas Souza gives a breakdown of Interactive Gibson (iGibson), a machine learning environment and benchmark that Numenta could potentially adopt. This simulation environment provides fully interactive scenes that allow researchers to train and evaluate agents on tasks such as object recognition and navigation.
“Stand-Alone Self-Attention in Vision Models” by Prajit Ramachandran, et al.: https://arxiv.org/abs/1906.05909
“An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale” by Alexey Dosovitskiy, et al.: https://arxiv.org/abs/2010.11929
iGibson website: http://svl.stanford.edu/igibson/
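The patch tokenization behind the title of “An Image is Worth 16×16 Words” (linked above) can be sketched as cutting an image into non-overlapping 16×16 patches and flattening each one into a "word" vector. This is a hedged illustration; the function name and shapes are assumptions, not the paper's reference code.

```python
import numpy as np

def image_to_patches(img, patch=16):
    """img: (H, W, C) with H and W divisible by `patch`.
    Returns (num_patches, patch * patch * C)."""
    h, w, c = img.shape
    # Split both spatial axes into (num_patches_per_axis, patch).
    img = img.reshape(h // patch, patch, w // patch, patch, c)
    # Group patch indices together, pixel offsets together.
    img = img.transpose(0, 2, 1, 3, 4)        # (H/p, W/p, p, p, C)
    return img.reshape(-1, patch * patch * c)  # one flat vector per patch

img = np.zeros((224, 224, 3))
tokens = image_to_patches(img)
print(tokens.shape)  # (196, 768): a 14x14 grid of 16x16x3 patches
```

Each of the 196 rows then plays the role of a token in the transformer, exactly as a word embedding would in NLP.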
0:00 Michaelangelo Caporale on Self-Attention in Neural Networks
1:09:30 Lucas Souza on iGibson Environment and Benchmark
– – – – –
Numenta is leading the new era of machine intelligence. Our deep experience in theoretical neuroscience research has led to tremendous discoveries on how the brain works. We have developed a framework called the Thousand Brains Theory of Intelligence that will be fundamental to advancing the state of artificial intelligence and machine learning. By applying this theory to existing deep learning systems, we are addressing today’s bottlenecks while enabling tomorrow’s applications.
Subscribe to our News Digest for the latest news about neuroscience and artificial intelligence:
https://tinyurl.com/NumentaNewsDigest
Subscribe to our Newsletter for the latest Numenta updates:
https://tinyurl.com/NumentaNewsletter
Our Social Media:
https://twitter.com/Numenta
https://www.facebook.com/OfficialNumenta
https://www.linkedin.com/company/numenta
Our Open Source Resources:
https://github.com/numenta
https://discourse.numenta.org/
Our Website:
https://numenta.com/
9:31 My reaction to Hawkins' question (which seems to be provided for the benefit of viewers who haven't seen the other video on GPT-3, with Omohundro): Yes, it works the way Omohundro described GPT-3 to you. If the sentence said "The animal didn't cross the street because it was too wide.", we'd expect "it" to refer to "the street," not "the animal," but it would do that in such a way (trained on $12M worth of training data/"general English corpus"/"all the words") that the relationships between "animal and it" and "street and it" are similar to some degree and then differentiated or "narrowed down" by the specific context provided by the final word of the sentence. …Like the earlier "Apple minus Fruit equals Computer" example.
12:09 "The animal didn't cross the street because it was too tired." (If you scroll down on the left-hand side, the theocratic view is represented with a bunch of dark orange lines going from "it" to "God." …Or, that's the way it would be if DeepMind were run by messianic evangelical Bible-thumpers who attribute all things to Marklar/God.) "The animal didn't cross the street because God was too tired." …God didn't give the animal sufficient God-mojo, so it didn't cross the street. Instead, it stopped in the middle of the street and was struck dead by Satan! ("Satan" is a car being driven by a lead-footed Uber driver with poor reflexes.) …All this human-language focus seems like an epic waste of time to me. We need animal intelligence that looks at reality, and feels reality with its own nerves…not something that learns that all humans have adopted irrational programming via inadequate language. We need robots, set free to explore. …The sooner the better.
23:48 Interesting. I think it'd be wise to make something that's just creatively linked brain modules. Don't even try to constrain it with evolution; just build a 3D engine with accurate physics into the brain, and have it already be populated with 3D data that, even if occluded, can be stripped of context and rotated. (I.e., videos of a person screwing a nut onto a bolt to secure a panel, and then all those populated neural connections being linked to a re-created 3D animation that exactly mirrors the video as "a different brain module.") The 3D module can strip away everything other than the nut, and rotate the nut in 3D along each axis. Then, the robot learns to pick up an actual 3D nut in a realistic physics simulation. You turn this robot baby loose in a basic workshop with nuts and bolts and panels (each of which has been trained into the new brain module) and see if it performs better than the babies. Why? Because that's how nature worked: it just kept "adding brain modules" and massively connecting them. This is like J. Storrs Hall's idea of a massively feedbacked, nested series of loops added to a robot. (Something like Boston Dynamics' robots, but with a massive neural net at the top that can then fire part of the loop back downward to contract muscles.) Then, just turn it loose and have it repeatedly forget the weakest connections while it's charging. Anything that it really weights heavily becomes a new memory area, permanently recorded, even if it's just in a baby-gibberish brain at first.
This sort of thing is coming. Might as well be first.
Cool. Always interesting.