By redefining the conventional notion of a layer, we present an alternative
view of finitely wide, fully trainable deep neural networks as stacked linear
models in feature spaces, leading to a kernel machine interpretation. Based on
this construction, we then propose a provably optimal modula