Why the Same Prompt Gives You a Different Answer Every Time
LLMs do not simply pick the next word. They compute a probability distribution over possible next tokens, and a sampling step draws from it. Understanding temperature and sampling explains why AI sometimes feels creative and other times robotic.
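The sketch below is a minimal Python version of that sampling step. The model's raw scores (logits) for a few candidate tokens are divided by a temperature and turned into probabilities with softmax. Top-p (nucleus) filtering can optionally trim the unlikely tail, and then one token is drawn at random. The token list, the logit values, and the function names are hypothetical and exist only for illustration. Real inference libraries do this over vocabularies of tens of thousands of tokens and may order or combine these steps differently.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (greedy decoding is the limit as it approaches 0)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Nucleus (top-p) filtering: keep the smallest set of most likely tokens
    whose cumulative probability reaches top_p, zero out the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def sample_next_token(tokens, logits, temperature=1.0, top_p=1.0, rng=random):
    """Draw one token: temperature-scale, filter with top-p, then sample."""
    probs = softmax_with_temperature(logits, temperature)
    probs = top_p_filter(probs, top_p)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical next-token candidates and made-up logits, e.g. after "The sky is".
tokens = ["blue", "clear", "grey", "falling"]
logits = [3.0, 2.0, 1.0, -1.0]

# Lower temperature sharpens the distribution; higher temperature flattens it.
for t in (0.2, 1.0, 2.0):
    print(f"T={t}:", [round(p, 3) for p in softmax_with_temperature(logits, t)])

# Same "prompt" (same logits), five draws: the answers can differ on each run.
print([sample_next_token(tokens, logits, temperature=1.0) for _ in range(5)])
```

At low temperature almost all of the probability lands on the top candidate, so the output becomes predictable and repetitive. At high temperature the distribution flattens, so less likely tokens are chosen more often. Because the final draw uses an unseeded random generator, running the last line again with identical logits can produce a different sequence, which is the same-prompt, different-answer effect in miniature.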