A key attribute that drives the unprecedented success of modern Recurrent
Neural Networks (RNNs) on learning tasks which involve sequential data, is
their ability to model intricate long-term temporal dependencies. However, a
well established measure of RNNs long-term memory capacity is lacking, and thus
formal understanding of the effect of depth on their a