
Connections between physics and deep learning



Center for Brains, Minds and Machines (CBMM)

Max Tegmark – MIT



22 thoughts on “Connections between physics and deep learning”
  1. Max is absolutely brilliant, and a scientist of the absolute highest caliber, but his categorization of the different tasks within machine learning is incorrect. Modeling a joint probability p(x, y) is categorically referred to as generative modeling, not unsupervised modeling, which is a different, though potentially overlapping, concept. Classification, correspondingly, is returning a class label for a given input; in standard notation, this is p(y|x). Prediction, or forecasting, is similarly p(x(t)|x(t-1), …, x(1)). Unsupervised learning, by contrast, does not have a conventional notation; it refers to a scheme where a class label y is not fed to the training system. The joint probability he wrote for unsupervised learning actually says nothing about the presence or absence of supervision, unless y is a label, in which case the formalism is just plain wrong.

    I say this because there are lots of students looking at the work of brilliant scientists like Max, and they owe it to the students to have consistent and correct formalism, given that the students may still be learning.
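
    To restate the notation in one place (x for the input, y for the label), a compact summary of the distinctions above:

    ```latex
    \begin{align*}
    \text{generative modeling:}    &\quad p(x, y) \\
    \text{classification:}         &\quad p(y \mid x) \\
    \text{prediction/forecasting:} &\quad p\big(x(t) \mid x(t-1), \dots, x(1)\big)
    \end{align*}
    % Unsupervised learning has no canonical formula of its own; it only means
    % that the labels y are never shown to the training procedure.
    ```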

  2. I personally see a similarity between physics and deep learning in the way the world is made of encapsulated layers of reality. For example, as shown in thermodynamics, the macroscopic layer doesn't need to know the position and velocity of every particle. It only needs to know certain computed features like temperature and pressure.
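
    A minimal sketch of that coarse-graining, assuming an ideal monatomic gas (the particle mass and counts are purely illustrative): the microstate is every velocity, while the macroscopic layer keeps only a couple of computed features.

    ```python
    import numpy as np

    # Coarse-graining sketch: the microstate is every particle velocity;
    # the macroscopic description keeps only computed features
    # (temperature, pressure). Assumes an ideal monatomic gas.
    k_B = 1.380649e-23            # Boltzmann constant, J/K
    m = 6.6e-26                   # particle mass, kg (roughly argon)
    N, volume = 100_000, 1e-3     # particle count, container volume in m^3

    rng = np.random.default_rng(0)
    velocities = rng.normal(scale=400.0, size=(N, 3))   # microstate, m/s

    mean_ke = 0.5 * m * np.mean(np.sum(velocities**2, axis=1))
    temperature = 2.0 * mean_ke / (3.0 * k_B)   # equipartition: <KE> = (3/2) k_B T
    pressure = N * k_B * temperature / volume   # ideal gas law

    print(f"T ≈ {temperature:.0f} K, P ≈ {pressure:.2e} Pa")
    ```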

  3. Max is a bit 'late', for neural networks' compression-bound nature has been known for quite a while:

    https://www.quora.com/How-are-hidden-Markov-models-related-to-deep-neural-networks/answer/Jordan-Bennett-9

    That said, we need to subsume larger problems, including Marcus Hutter's temporal-difference-aligned lemma, via hints from quantum mechanics, deep reinforcement learning (particularly the DeepMind flavour) and causal learning (i.e. uetorch):

    http://www.academia.edu/25733790/Causal_Neural_Paradox_Thought_Curvature_Quite_the_transient_naive_hypothesis

    A code sample that initializes the confluence of the temporal-difference regime around the causal horizon:

    https://github.com/JordanMicahBennett/God

  4. I think it's a fascinating summary of the tie between the power of neural networks / deep learning and the peculiar physics of our universe. The mystery of why they work so well may be resolved by seeing the resonant homology across the information-accumulating substrate of our universe, from the base simplicity of our physics to the constrained nature of the evolved and grown artifacts all around us. The data in our natural world is the product of a hierarchy of iterative algorithms, and the computational simplification embedded within a deep learning network is also a hierarchy of iteration. Since neural networks are symbolic abstractions of how the human cortex works, perhaps it should not be a surprise that the brain has evolved structures that are computationally tuned to tease apart the complexity of our world.

    When he says "efficient deep networks cannot be accurately approximated by shallow ones without efficiency loss," it reminds me of something I wrote in 2006: "Stephen Wolfram’s theory of computational equivalence suggests that simple, formulaic shortcuts for understanding evolution (and neural networks) may never be discovered. We can only run the iterative algorithm forward to see the results, and the various computational steps cannot be skipped. Thus, if we evolve a complex system, it is a black box defined by its interfaces. We cannot easily apply our design intuition to the improvement of its inner workings. We can’t even partition its subsystems without a serious effort at reverse-engineering." — from https://www.technologyreview.com/s/406033/technology-design-or-evolution/

  5. I am having difficulty understanding step 11 of the paper, in which Max goes from the Taylor series expansion form of the activation function to the multiplication approximator. Does anyone know of a more detailed explanation of this?
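
    My reading of that step, for what it's worth: Taylor-expand a smooth activation σ around 0 (it needs σ''(0) ≠ 0); in the four-term combination below, the constant and linear terms cancel and only the uv cross term survives, up to O(ε²) corrections. A quick numerical sketch, assuming σ = softplus (so σ''(0) = 1/4), which is not necessarily the activation used in the paper:

    ```python
    import numpy as np

    # Multiplication from a smooth nonlinearity via Taylor expansion:
    # sigma(a) + sigma(-a) - sigma(b) - sigma(-b), with a = eps*(u+v) and
    # b = eps*(u-v), cancels the constant and linear terms and leaves
    # 4*sigma''(0)*eps^2*u*v plus O(eps^4) corrections.
    def softplus(x):
        return np.log1p(np.exp(x))

    def approx_multiply(u, v, eps=1e-2, sigma=softplus, sigma_pp0=0.25):
        a, b = eps * (u + v), eps * (u - v)
        return (sigma(a) + sigma(-a) - sigma(b) - sigma(-b)) / (4.0 * sigma_pp0 * eps**2)

    u, v = 1.7, -2.3
    print(approx_multiply(u, v), u * v)   # ≈ -3.908 vs exact -3.91
    ```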

  6. The interleaving of linear evolution and non-linear functions is also how quantum mechanics works:
    1. The propagation step is perfectly linear, conservative, unitary, non-local and time-reversible. It is a continuous wave with complex amplitude specified by the Schrödinger equation. There is no loss of information. There are no localized particles in this step. There is no space in this step.
    2. The interaction step is discrete, non-linear, local and time-irreversible. It is a selection/generation/collapse of alternatives based on the Born Rule. There is a loss of information, as complex values are added and amplitudes squared to give non-negative real probabilities. The result is an interaction, the creation of space-time intervals from the previous interactions, identification of localized entities which might be called particles, and some outgoing waves that are correlated (entangled). Go to 1.

    Einstein complained that the non-locality of QM was "Spooky action at a distance", but in the Quantum Gravity upgrade, space is only created by interaction, so it becomes "Spooky distance at an action".
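
    The deep-learning side of that parallel is literally the same alternation; a minimal sketch with arbitrary weights (nothing here is taken from the talk):

    ```python
    import numpy as np

    # A feed-forward pass interleaves linear evolution (matrix multiplies)
    # with non-linear steps (pointwise activations), echoing the
    # propagate-then-interact alternation described above.
    rng = np.random.default_rng(1)
    sizes = [8, 16, 16, 4]
    weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]

    def forward(x):
        for W in weights[:-1]:
            x = np.tanh(W @ x)     # linear step, then non-linear step
        return weights[-1] @ x     # final linear read-out

    print(forward(rng.normal(size=sizes[0])))
    ```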

  7. In the beginning he describes classification as the probability of the pixel data given a label y, but then he shows a convnet classifying with the probability of a label given the pixel data (the usual formulation). Which is the correct way to look at this?
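
    Both views are legitimate, and Bayes' theorem is the bridge: a generative model of the pixels given the label, together with a prior over labels, determines the usual classifier. Presumably that is the point of writing it generatively first.

    ```latex
    % Bayes' theorem connecting the two views of classification:
    p(y \mid x) \;=\; \frac{p(x \mid y)\, p(y)}{\sum_{y'} p(x \mid y')\, p(y')}
    ```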

