and Meta proposed a new architecture called test-time training (TTT). Developed over a year and a half, TTT models promise to process more data than transformers while using much less compute power.