The First International Congress of Basic Science (ICBS)

Recently, the theory of infinite-width neural networks led to the first technology, muTransfer, for tuning enormous neural networks that are too expensive to train more than once. For example, this allowed us to tune the 6.7-billion-parameter version of GPT-3 using only 7% of its pretraining compute budget, and, with some asterisks, to get performance comparable to the original GPT-3 model with twice the parameter count. In this talk, I will explain the core insight behind this theory. In fact, this is an instance of what I call the *Optimal Scaling Thesis*, which connects infinite-size limits for general notions of “size” to the optimal design of large models in practice. I'll end with several concrete key mathematical research questions whose resolutions will have an incredible impact on the future of AI.

Date	Time	Local Time	Room	Session	Role	Topic
2023-07-23	16:00-16:45	2023-07-23,16:00-16:45	Chaoyang Kexie Blue Hall	07-23 Afternoon Basic Science Lectures	Speaker	The unreasonable effectiveness of mathematics in large scale deep learning

ICBS 2023
