Some interesting conversations about LLMs have reached my doorstep of late, and I’ve found myself wanting to have some essential/useful reading on hand to send to people in order to ground these conversations in the real.

So here are a few recommendations for reasoning about Large Language Models. I’m vaguely ordering these by accessibility, i.e. Talking about Large Language Models is the easiest to read, and they get more technical from there.

Talking about Large Language Models (Murray Shanahan, 2024) is a pretty good, though not especially deep, overview of why LLMs are so convincing.

If you still can’t shake the feeling that “the model knows me” (or “I know the model”), have a read of On the Dangers of Stochastic Parrots (Bender et al., 2021).

I include Attention Is All You Need (Vaswani et al., 2017) because almost every major advance in LLMs since 2017 traces back to this paper, but it’s not essential to reasoning about the characteristics of these models.

Stephen Wolfram has a pretty exhaustive explainer on how Large Language Models fundamentally work in What Is ChatGPT Doing … and Why Does It Work? (2023), but even if you skip past the technical stuff, there’s plenty in there covering how these models are (mathematically) predicting what comes next, not thinking.
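The core point Wolfram keeps returning to is that the model is, at each step, producing a probability distribution over possible next tokens and then picking from it. A deliberately simplified sketch of that idea, using a toy bigram word counter rather than anything resembling a real transformer (the corpus and names here are mine, purely illustrative):

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which words follow which in a tiny corpus,
# then "predict" by choosing the most frequent follower. Real LLMs learn a
# vastly richer function, but the output is the same kind of thing: a
# probability distribution over what comes next.
corpus = "the cat sat on the mat the cat ate the fish".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word and its probability."""
    counts = follows[word]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    return best, n / total

print(predict_next("the"))  # "the" is followed by cat(2), mat(1), fish(1)
```

Here predict_next("the") returns ("cat", 0.5): no understanding of cats or mats anywhere, just statistics over what has come next before, which is the intuition Wolfram’s piece builds out in full.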