We owe a debt of gratitude to Max Bileschi, Roy Frostig, Zelda Mariet, Stan Bileschi, Mohammad Norouzi, Chris DuBois and Charles Sutton for reading the manuscript and providing valuable feedback.
For several plots, we reused experimental data originally produced by Naman Agarwal for other joint research.
We would like to thank Will Chen for invaluable advice on the presentation of the document.
We would also like to thank Rohan Anil for useful discussions.