Credit assignment in traditional recurrent neural networks usually involves back-propagating through a long chain of tied weight matrices. The length of this chain scales linearly with the number of time-steps as the same network is run at each time-step. This creates many problems, su