2023-07-19 |
13:00-13:45 |
07-19 Afternoon TCIS Lecture Room 12 (A7 3F)
|
Speaker |
Convergence of computer vision and natural language processing
As humans, we achieve a range of intelligent capabilities, such as vision, language, and science, using a single neural organ, the cerebral cortex. The pre-training of cortical neurons for these different capabilities also relies heavily on a similar mechanism, predictive learning. These unified biological mechanisms have enabled human beings to adapt quickly and effectively to new environments and to acquire new capabilities without waiting for millions of years of biological evolution. In artificial intelligence, architectures and learning methods across domains are converging as well. The Transformer, which emerged in the field of NLP, is now replacing earlier domain-specific architectures in several fields, including computer vision, speech, and science. Generative pre-training, such as GPT, has also been shown to be very effective across NLP, vision, and speech. This talk will introduce the journey towards these convergences, as well as the representative works that have driven the trend. It will also present several representative research efforts by the speaker's team, including Swin Transformer V1/V2 and SimMIM.
|