THE ALGORITHM IS SIMPLE

may 28 2025

Transformers are a skew form of intelligence, totally unlike the intelligence of animals, but still definitely a form of intelligence. I think one could even argue that, within their domain, modern LLMs are a superhuman general intelligence. Modern LLMs are so amazing, in fact, that we forget transformers are probably not THE optimal solution to intelligence.

John Carmack's notes bring up some good reminders to stay wary of Transformers. He explained the methodology behind modern-day LLMs well: throw-everything-in-a-blender. It's the brute force approach to general intelligence. This approach works very well, is world changing, and has lots of people working on it. However, this form of intelligence requires billions of dollars in funding and has led to a very secretive industry with little to no open research. I think it is worth going the complete opposite route: use the Bitter Lesson and see if we can reimagine a more optimal solution to intelligence (with, of course, our research aided by an LLM).

Let's say you lived in a society that had already created AGI, and even ASI. If you were to interact with those models, you could probably get a 2025 laptop, a 2025 GPU, a cheap USB camera, and some wheels, wire everything together, and have a working Star Wars droid like R2-D2.

What properties of LLMs still haven't emerged? What are some open problems?

1. Learning from a continuous stream of data, like humans and all animals.

2. Learning from watching others perform a task

3. Learning by interacting with the environment

4. Learning while doing a task (brains don't use backpropagation)

5. Learning with batch_size=1 without catastrophic forgetting

6. Episodic memory (context windows don't count).
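
To make items 1 and 5 concrete, here is a minimal toy sketch of what goes wrong when a model learns from a stream with batch_size=1. Everything in it is an assumption made for illustration: the two "tasks" (fit y = 2x, then fit y = -2x), the tiny linear model, the learning rate, and the helper names `sgd_step`, `mse`, and `run_stream`. The model is deliberately too small to hold both tasks, so this only shows the overwrite dynamic of naive online SGD, not how forgetting plays out in a real network.

```python
import random

def sgd_step(w, b, x, y, lr=0.1):
    """One online update (batch_size=1) for the model y_hat = w*x + b."""
    err = (w * x + b) - y
    return w - lr * err * x, b - lr * err

def mse(w, b, slope, xs):
    """Mean squared error of the model against the target y = slope * x."""
    return sum((w * x + b - slope * x) ** 2 for x in xs) / len(xs)

def run_stream(seed=0, steps=500):
    """Stream task A (y = 2x), then task B (y = -2x), one sample at a time."""
    rng = random.Random(seed)
    test_xs = [i / 10 for i in range(-10, 11)]
    w, b = 0.0, 0.0

    # Phase 1: the stream only contains task A.
    for _ in range(steps):
        x = rng.uniform(-1, 1)
        w, b = sgd_step(w, b, x, 2.0 * x)
    err_a_before = mse(w, b, 2.0, test_xs)

    # Phase 2: the stream switches to task B; task A is never replayed.
    for _ in range(steps):
        x = rng.uniform(-1, 1)
        w, b = sgd_step(w, b, x, -2.0 * x)
    err_a_after = mse(w, b, 2.0, test_xs)
    err_b_after = mse(w, b, -2.0, test_xs)

    return err_a_before, err_a_after, err_b_after

if __name__ == "__main__":
    before, after_a, after_b = run_stream()
    print(f"task A error after phase 1: {before:.6f}")
    print(f"task A error after phase 2: {after_a:.6f}")
    print(f"task B error after phase 2: {after_b:.6f}")
```

The error on task A is tiny after phase 1 and large after phase 2, because every update on the new stream overwrites what was learned on the old one. The usual workarounds (replaying old data, larger batches, freezing weights) are exactly what an animal-like learner doesn't seem to need.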

So, there are still a lot of interesting problems to be solved. At least 5 major pieces left.

We haven't solved Moravec's Paradox. Fruit flies have about 200k neurons and can navigate the environment incredibly well. They also run on about a microwatt of power, so The Algorithm should be a pretty simple algorithm.