Background:
This is my homework for CS 540, a proof based graduate course in deep learning theory. The homework was designed by Prof. Telgarsky [Link to original problem set] Many of the proofs for this homework are regarding universal approximation and the limits of using networks to universally approximate functions.
Overview
- Show when universal approximation works well and when it doesn't
- a) Show that for any loss, the best neural network performs as well as the best continuous functions
- b) Show that bounding approximation error by 1-norm is not enough, and that a uniform norm is required. i.e. two functions may be close on average (1-norm), but if they are not bounded by a uniform norm (max), they could yield wildy different losses due to distribution of the data
- c) Show that functions that change infinitely like sin(x) cannot be approximated to bounded error
- d) Show the equivalence between a wide shallow network, and deep narrow network by construction
- General statements of universal approximation use a number of nodes proportional to the Lipschitz constant of the approximated function. However, parts of the function that change less should use less nodes. This shows an adaptive construction for universal approximation.
- a) Show that bounded variation norm is less than Lipschitz norm
- b) Cases where bounded variation norm is much smaller than Lipschitz norm
- c) The adaptive construction which uses number of nodes proportional to BV-norm instead of Lipschitz constant
- d) Show that ReLU networks can also be used for the approximation
- Show that two data-points which are close together are hard to approximate. When first layer weights are initialized randomly, there is a high probability that to separate the two points, second layer weights need to be picked with very high norm(increasing complexity).
- Show various properties of approximation
- Approximation of functions is closed under single derivative and linear combination
- Approximation of functions is closed under multiple derivatives (induction)
- Monomials can be aproximated
- exp(x) can be approximated