How LLMs Work
Large Language Models are transformer-based neural networks trained to predict the next token in a sequence. Understanding the core mechanics helps you prompt more effectively, interpret outputs more accurately, and know when to trust -- or question -- a model's response.
Tokens, Not Words
LLMs process text as tokens, not words. A token is a subword unit produced by a byte-pair encoding (BPE) tokenizer; in English prose, one token is roughly 0.75 words (about four characters). Thinking in tokens explains why models sometimes split words oddly, why context windows are measured in tokens, and why code often costs more tokens than prose.
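To make the idea concrete, here is a minimal sketch of how BPE builds subword units by repeatedly merging the most frequent adjacent pair of symbols. This is an illustration only, not a real tokenizer -- production tokenizers (GPT-style BPE and similar) learn tens of thousands of merges from large corpora, and the function names here are hypothetical.

```python
# Toy BPE merge loop: start from characters, repeatedly fuse the most
# frequent adjacent pair into a new symbol. Illustrative only.
from collections import Counter

def most_frequent_pair(symbols):
    """Count adjacent symbol pairs and return the most common one."""
    pairs = Counter(zip(symbols, symbols[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(symbols, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged

# Start from individual characters and apply a few merge steps.
symbols = list("low lower lowest")
for _ in range(2):
    symbols = merge_pair(symbols, most_frequent_pair(symbols))
print(symbols)  # after two merges, "low" has become a single symbol
```

After enough merges, frequent strings become single tokens while rare words stay split into pieces -- which is exactly why a model can split an unusual word in a surprising place.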
Attention Mechanism
The transformer's self-attention mechanism lets every token attend to every other token in the context window. This all-pairs comparison is why LLMs can reason over long documents -- but also why the attention computation scales quadratically with context length.
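The all-pairs structure can be sketched in a few lines of NumPy. This is a simplified single-head version with no learned projection matrices, no masking, and no multi-head split -- just enough to show where the quadratic cost comes from.

```python
# Minimal single-head self-attention sketch (no learned weights, no masking).
# Real transformers project queries, keys, and values with learned matrices.
import numpy as np

def self_attention(x):
    """x: (seq_len, d) token embeddings. Every row attends to every row."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)   # (seq_len, seq_len): one score per token pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ x              # each output is a weighted mix of all tokens

x = np.random.default_rng(0).normal(size=(6, 4))   # 6 tokens, 4-dim embeddings
out = self_attention(x)
print(out.shape)   # (6, 4)
```

The `scores` matrix has one entry per pair of tokens, so its size -- and the work to compute it -- grows as the square of the sequence length.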
Temperature and Sampling
After computing a probability distribution over possible next tokens, the model samples from it. Temperature controls the sharpness of that distribution: the logits are divided by the temperature before the softmax, so 0 means always pick the most likely token (greedy, deterministic), 1 means sample in proportion to the model's probabilities, and values above 1 flatten the distribution toward more random choices.
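The sampling step can be sketched with only the standard library. The logit values below are hypothetical; real models produce logits over a vocabulary of tens of thousands of tokens.

```python
# Temperature-scaled sampling from raw logits, standard library only.
import math
import random

def sample(logits, temperature, rng=random):
    """Pick a token index; temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]    # sharpen (<1) or flatten (>1)
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]      # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    r, cum = rng.random(), 0.0
    for i, p in enumerate(probs):                 # inverse-CDF draw
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]      # hypothetical next-token logits
print(sample(logits, 0))      # greedy: always index 0
```

Note how low temperatures exaggerate the gap between logits (the division makes large values dominate the softmax), while high temperatures shrink it.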
Knowledge Cutoffs
LLMs know nothing about events after their training cutoff. They also have no real-time access unless given tools. Always verify time-sensitive facts with retrieval or web search rather than trusting training-data recall.
Hallucination
LLMs generate plausible-sounding text by predicting likely next tokens -- they are not retrieval systems and have no concept of "truth." This means they can confidently generate false facts, fake citations, or incorrect code. Always verify outputs for high-stakes decisions.