The primary objective of many applications is to approximate or estimate a function from samples drawn from a probability distribution on the input space. Deep approximation expresses such a function as a composition of many layers of simple functions, which can be viewed as a sequence of nested feature extractors. The fundamental idea of deep learning networks is to turn these layers of compositions into layers of adjustable parameters that are tuned through a learning process, ultimately producing a high-quality approximation of the target function from the input data. In this presentation, we will delve into the mathematical theory behind this approach and explore the approximation rates of deep networks. We will also highlight the distinctions between this new theory and traditional approximation theory, and demonstrate how it can be leveraged to understand and design deep learning networks.
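As an illustrative sketch (the notation below is ours, not fixed by the abstract): a depth-$L$ network of this kind approximates a target function $f$ by composing affine maps $T_\ell$ with a fixed simple nonlinearity $\sigma$, and the learning process tunes the adjustable parameters $\{W_\ell, b_\ell\}$ by minimizing an empirical risk over samples $x_1,\dots,x_n$ drawn from the distribution on the input space, e.g.
\[
  f(x) \;\approx\; \hat f_\theta(x) \;=\; \bigl(T_L \circ \sigma \circ T_{L-1} \circ \cdots \circ \sigma \circ T_1\bigr)(x),
  \qquad T_\ell(x) = W_\ell x + b_\ell,
\]
\[
  \min_{\theta = \{W_\ell,\, b_\ell\}_{\ell=1}^{L}} \;
  \frac{1}{n} \sum_{i=1}^{n} \bigl( f(x_i) - \hat f_\theta(x_i) \bigr)^2 .
\]
The approximation rate then asks how fast the error decreases as the depth $L$ and the widths of the layers grow.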