huimin liao

"The Godfather of AI" Wins the Nobel Prize in Physics | What Magic Does the Neural Network Exactly Have?

The 2024 Nobel Prize in Physics has been announced, honoring two scientists from the field of artificial intelligence
On October 8th, Beijing time, the Royal Swedish Academy of Sciences announced the 2024 Nobel Prize in Physics. This year's award went to two scientists, Professor John J. Hopfield of Princeton University in the United States and Professor Geoffrey E. Hinton of the University of Toronto in Canada, "for foundational discoveries and inventions that enable machine learning with artificial neural networks". According to the official Nobel website, the two laureates used tools from physics to train artificial neural networks: John J. Hopfield created an associative memory that can store and reconstruct images and other types of patterns in data, while Geoffrey E. Hinton invented a method that can autonomously discover properties in data and thus perform tasks such as identifying specific elements in pictures [1].

Figure 1 The 2024 Nobel laureates in physics [2]

(1) John J. Hopfield [1]
John J. Hopfield was born in Chicago, Illinois, in 1933 and received his Ph.D. from Cornell University in the United States in 1958. He is currently a professor at Princeton University in the United States.
In 1986, John J. Hopfield co-founded the doctoral program in Computation and Neural Systems at the California Institute of Technology. A few years earlier, in 1982, he had proposed the associative-memory neural network now commonly known as the "Hopfield network". The Hopfield network draws on the physics used to describe the properties of matter: the state of the whole network is described by a quantity equivalent to the energy of a spin system, and the network is trained by finding values for the connections between nodes such that the stored images have low energy. When a distorted or incomplete image is fed into a Hopfield network, it works through the nodes and updates their values so that the network's energy decreases; step by step, the network therefore settles on the stored image most similar to the imperfect input.
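To make the energy picture concrete, here is a minimal Hopfield-network sketch in Python/NumPy (my own illustration; the function names and the tiny 6-bit pattern are invented for this example, they are not from the original work). Patterns are stored with a Hebbian rule, and a corrupted pattern is recovered by repeatedly flipping neurons so that the network energy can only decrease:

```python
import numpy as np

def train_hopfield(patterns):
    """Store bipolar (+1/-1) patterns with the Hebbian rule: W = sum_p x_p x_p^T."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)              # no self-connections
    return W / patterns.shape[0]

def energy(W, s):
    """Hopfield energy E = -1/2 s^T W s; asynchronous updates never increase it."""
    return -0.5 * s @ W @ s

def recall(W, s, sweeps=10):
    """Asynchronously flip neurons toward lower energy until the state settles."""
    s = s.copy()
    for _ in range(sweeps):
        for i in np.random.permutation(len(s)):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Store one 6-bit pattern and recover it from a corrupted copy.
stored = np.array([[1, -1, 1, 1, -1, -1]])
W = train_hopfield(stored)
noisy = np.array([1, 1, 1, 1, -1, -1])   # one bit flipped
print(energy(W, noisy), energy(W, recall(W, noisy)))
print(recall(W, noisy))                   # -> recovers the stored pattern
```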

(2) Geoffrey E. Hinton [1]
Geoffrey E. Hinton was born in London, England, in 1947 and received his Ph.D. from the University of Edinburgh in the United Kingdom in 1978. He is currently a professor at the University of Toronto in Canada.
Building on the Hopfield network, Hinton and his colleague Terrence Sejnowski used the tools of statistical physics in the mid-1980s to create a network based on a different method: the Boltzmann machine, which can learn to recognize characteristic elements in a given type of data. The machine is trained by feeding it examples that are very likely to arise when it is running. A trained Boltzmann machine can be used to classify images or to generate new examples of the type of pattern it was trained on. Hinton built further on this work, helping to trigger the explosive development of today's machine learning.
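As a rough illustration of the statistical-physics flavour of the Boltzmann machine (a sketch under my own simplifying assumptions, with hand-picked weights rather than learned ones), the snippet below defines the energy of a small network of stochastic binary units and runs Gibbs sampling, which visits low-energy configurations more often. The training step Hinton used, adjusting the weights so the machine's statistics match example data, is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(W, b, s):
    """Boltzmann-machine energy E(s) = -1/2 s^T W s - b^T s for binary units s in {0,1}."""
    return -0.5 * s @ W @ s - b @ s

def gibbs_sweep(W, b, s, T=1.0):
    """One sweep of Gibbs sampling: each unit turns on with probability
    sigmoid(net input / T), so low-energy configurations become more likely."""
    for i in range(len(s)):
        net = W[i] @ s + b[i]
        p_on = 1.0 / (1.0 + np.exp(-net / T))
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

# Tiny 4-unit machine with hand-set symmetric weights (illustrative values only).
W = np.array([[ 0,  2, -1,  0],
              [ 2,  0,  0, -1],
              [-1,  0,  0,  2],
              [ 0, -1,  2,  0]], dtype=float)
b = np.zeros(4)
s = rng.integers(0, 2, size=4).astype(float)
for _ in range(50):
    s = gibbs_sweep(W, b, s)
print(s, energy(W, b, s))   # samples concentrate on low-energy states
```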

From the perceptron to MLLMs (multimodal large language models): the evolution of artificial neural networks
The history of artificial neural networks can be traced back to 1957, when the American scientist Frank Rosenblatt proposed the perceptron [3]. The perceptron is a binary classification model that mimics the function of a biological neuron: it separates input data with a linear decision boundary and learns by adjusting its weights. Because the perceptron can only solve linearly separable problems, however, Marvin Minsky and Seymour Papert showed in 1969 that it cannot handle nonlinear problems such as XOR, and research in the field stagnated for a time [4].

Figure 2 Schematic diagram of the perceptron structure (Source: Synced)
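For readers who want to see the learning rule in code, here is a minimal perceptron sketch (the toy data and names are my own, not from the article). The weights are nudged only when an example is misclassified, and training converges precisely because this toy data is linearly separable:

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Rosenblatt's learning rule: on a mistake, nudge the weights toward the example.
    X: (n_samples, n_features); y: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Linearly separable toy data: the class is the sign of x0 - x1.
X = np.array([[2, 1], [1, 3], [3, 0], [0, 2]], dtype=float)
y = np.array([1, -1, 1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))   # -> [ 1. -1.  1. -1.]
```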

It was not until the 1980s that the backpropagation algorithm injected new vitality into artificial neural networks. Popularized in 1986 by David Rumelhart, Geoffrey Hinton (the 2024 Nobel laureate in physics), and Ronald Williams, backpropagation solved the problem of computing gradients in multilayer neural networks, making it practical to train deep networks [5]. This breakthrough drove the rise of deep learning and gradually established the central position of artificial neural networks in fields such as computer vision and natural language processing.

Figure 3 Schematic diagram of backpropagation (Source: deephub)
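Below is a minimal NumPy sketch of the backpropagation idea (illustrative only, not the 1986 formulation verbatim): the forward pass computes predictions, the backward pass pushes the output error through the layers with the chain rule, and the resulting gradients update the weights. XOR is used as the toy task because a single perceptron cannot solve it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn XOR, a nonlinear problem a single perceptron cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)   # hidden layer
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)   # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for step in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: chain rule from the squared-error loss back to each layer.
    d_out = (out - y) * out * (1 - out)   # error at the output pre-activation
    d_h = (d_out @ W2.T) * h * (1 - h)    # error propagated to the hidden layer

    # Gradient-descent update.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())   # -> should approach [0, 1, 1, 0]
```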

After the turn of the 21st century, the convolutional neural network (CNN) became the mainstream model in computer vision. In 2012, AlexNet, developed by Hinton's student Alex Krizhevsky, swept the ImageNet competition: its deep convolutional structure significantly improved image-classification accuracy and became an important milestone in the deep learning wave [6]. Deep learning then rapidly expanded into fields such as speech recognition and machine translation. Supported by larger datasets and stronger computing power, network structures kept getting deeper, the number of layers steadily grew, and architectures such as VGG and ResNet emerged one after another.

Figure 4 Schematic diagram of the AlexNet network structure (Source: CSDN@不如语冰)
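To show what "stacked convolution and pooling layers followed by a fully connected classifier" looks like in code, here is a heavily simplified AlexNet-style sketch. It assumes PyTorch (the article names no framework, so this is my choice) and uses far fewer filters and layers than the real AlexNet:

```python
import torch
import torch.nn as nn

class TinyAlexNet(nn.Module):
    """Much smaller than the real AlexNet, but the same pattern:
    convolution + ReLU + pooling stages, then a fully connected classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = TinyAlexNet()
logits = model(torch.randn(1, 3, 64, 64))   # one fake 64x64 RGB image
print(logits.shape)                          # -> torch.Size([1, 10])
```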

In recent years, the rise of multimodal large language models (MLLMs) has pushed artificial neural networks further still. An MLLM is not limited to a single modality such as text; it can combine images, audio, and other modalities for unified understanding and generation, allowing machines to handle complex and heterogeneous data environments far better [7]. For example, GPT-2, released by OpenAI in 2019, demonstrated strong language-generation capabilities, and the subsequent GPT-3 raised the parameter count to 175 billion while significantly improving the model's dialogue and reasoning abilities [8]. With the release of GPT-4 in 2023, the multimodal capabilities of MLLMs reached a new level: the model can process text and image inputs simultaneously, opening up more application scenarios for intelligent assistants and generative AI [9].

Figure 5 Schematic diagram of an MLLM (generated by GPT-4)

However, the surge in model size has also brought heavy consumption of computing resources and energy. Against this backdrop, researchers have begun to explore new hardware architectures and model-optimization techniques. Model-compression techniques such as quantization and distillation have emerged to reduce the computational burden of training and inference, while emerging hardware architectures such as in-memory computing offer new solutions for running large-scale neural networks [10]. As multimodal large language models continue to evolve, we can expect artificial neural networks to demonstrate even stronger intelligence on more complex tasks.
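As one concrete example of the compression techniques mentioned above, here is a tiny NumPy sketch of symmetric int8 post-training quantization (the values and function names are illustrative; distillation and the hardware details are out of scope):

```python
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 with a single per-tensor scale (symmetric quantization)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(0, 0.1, (4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())  # small, while storage drops 4x (fp32 -> int8)
```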

Model sizes and parameter counts have surged; in-memory computing empowers neural network accelerators
In the previous section, we briefly traced the evolution of artificial neural networks from the perceptron to MLLMs (multimodal large language models). With the continued development of machine learning and large-scale neural networks, and especially the breakthroughs of MLLMs across many tasks, model sizes and parameter counts have grown explosively. GPT-2 (2019), OpenAI's second-generation generative pre-trained Transformer, has 1.5 billion parameters; although comparatively small, it already showed strong language-generation ability. OpenAI's GPT-3 (2020) followed with 175 billion parameters and became one of the most complex natural language processing models of its time. Google then launched LaMDA (2021), a dialogue-focused model with 137 billion parameters, and PaLM (2022), whose 540 billion parameters made it one of the base models for multimodal and multi-task learning. In 2023, OpenAI released GPT-4, a further breakthrough for multimodal large language models, with a parameter count reported (unofficially) to be around 1.76 trillion. Compared with GPT-3, GPT-4 demonstrates stronger multimodal processing and can handle multiple data forms such as text and images [11]. In just a few years, parameter counts have leapt from billions to roughly a trillion, bringing with them enormous computing and hardware demands. As MLLMs keep scaling up, developing efficient hardware accelerators has become a top priority.

Figure 6 The development trend of MLLMs [12]

In-memory computing (computing-in-memory, CIM) is a new computing architecture widely regarded as a potentially revolutionary technology. Its key idea is to integrate storage and computation, effectively overcoming the bottleneck of the von Neumann architecture; combined with advanced packaging in the post-Moore era, new memory devices, and similar advances, it can deliver an order-of-magnitude improvement in computing energy efficiency. For MLLMs, in-memory computing can provide significant acceleration during both training and inference. Because the core of neural network training and inference is large-scale matrix multiplication and convolution, in-memory computing can perform multiply-accumulate operations directly inside the memory cells and excels at massively parallel computation. WTMemory Technology, a leading domestic company in the in-memory computing chip field, has mass-produced the WTM-8 in-memory computing chip, which supports complex functions such as AI image super-resolution, frame interpolation, and HDR recognition and detection, as well as the WTM-2101 in-memory computing chip, which already delivers functions such as voice recognition within edge-side computing-power budgets. Going forward, in-memory computing chips will bring more possibilities for accelerating MLLM training and inference, helping MLLM development reach a new level.
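The central point, that matrix-vector multiplication happens where the weights are stored, can be illustrated with a toy software model of an analog crossbar (my own simplified simulation, not a description of WTMemory's actual chips): weights become a limited set of conductance levels with some device noise, inputs become voltages, and each output is the weighted sum the array produces in parallel:

```python
import numpy as np

def crossbar_matvec(W, x, bits=4, noise=0.01, rng=np.random.default_rng(0)):
    """Toy model of an analog in-memory matrix-vector multiply.
    Weights are quantized to a few conductance levels and perturbed with
    device noise; the 'column currents' are the weighted sums y = G @ x."""
    levels = 2 ** bits - 1
    scale = np.abs(W).max() / levels
    G = np.round(W / scale) * scale                            # limited conductance levels
    G = G + rng.normal(0, noise * np.abs(W).max(), W.shape)    # device variation
    return G @ x                                               # currents summed per column

W = np.random.default_rng(1).normal(0, 1, (8, 8))
x = np.random.default_rng(2).normal(0, 1, 8)
print("exact   :", (W @ x).round(2))
print("crossbar:", crossbar_matvec(W, x).round(2))  # close, but with analog error
```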


References:

[1]The Nobel Prize in Physics 2024 - NobelPrize.org

[2]The 2024 Nobel Prize in Physics was awarded to two "Godfathers of AI", which is an important moment in the AI academic field _ Tencent News (qq.com)

[3]Rosenblatt, F. (1958). The Perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408.

[4]Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. MIT Press.

[5]Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.

[6]Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 1097-1105.

[7]Bommasani, R., Hudson, D. A., Adeli, E., & others. (2021). On the Opportunities and Risks of Foundation Models. arXiv preprint arXiv:2108.07258.

[8]Brown, T. B., Mann, B., Ryder, N., & others. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901.

[9]OpenAI. (2023). GPT-4 Technical Report. OpenAI.

[10]Cheng, J., & Wang, L. (2023). Computing-In-Memory for Efficient Large-Scale AI Models. IEEE Transactions on Neural Networks and Learning Systems.

[11]OpenAI GPT-4 Documentation.(https://openai.com/index/gpt-4-research/)

[12]A Survey on Multimodal Large Language Models.
(https://arxiv.org/abs/2306.13549)
