← Back to Field Notes
Part 1 of the AI Systems series

AI as a System (Part 1): AI is a Stack

This is Part 1 of the AI as a System series.
See the full series here.

A clear mental model for understanding how modern AI actually works

Unless you’ve been keeping up with AI closely, trying to move beyond ChatGPT can feel overwhelming.

There’s ChatGPT, Cursor, GitHub Copilot, Grok, Claude, and more. All of these behave differently. Some answer questions, help plan vacations, or rewrite emails so you sound less annoyed at your coworker. Others autocomplete code or even generate entire applications.

So what’s actually different?

AI is Not a Single Product

Most people think of AI as a single product, like ChatGPT. That’s similar to thinking Chrome is the internet. It captures the interface, but not the system behind it.

AI isn’t a single thing. It’s a stack of systems working together.

AI Agents

An AI agent is a system that can take a goal and actually do something to achieve it, instead of just responding.

This is different from tools like ChatGPT, which primarily expose a model. You can think of an AI model as a smart advisor that answers your questions. You can think of the AI agent as an assistant that takes a task and handles it for you.

Instead of generating a single answer, an agent understands a goal, breaks it down into steps, and uses available tools - such as apps, APIs, or files - to take action. Along the way, it makes decisions, adjusts when something doesn’t work, and continues until the task is completed or blocked.

The AI Stack

At a high level, AI systems are built from four layers: the model, tools, instructions, and memory. Each layer plays a different role, and understanding those roles makes the whole system easier to reason about.

Using AI is like baking a cake. You need a recipe, equipment (oven, mixer, etc.), and instructions (mix the ingredients, pour into a pan, bake for 30 minutes). The recipe is the model, the equipment is the tools, and the instructions are, well, the instructions. Memory is what carries across each bake. It’s your notes, past experiences, and preferences (like knowing the oven runs hot), remembering that the last cake needed less sugar, or having a favorite recipe you always come back to. It allows you to adjust, improve, and stay consistent over time, instead of starting from scratch every time you bake.

Layer 1: Models

Models are the brain of the agent - the actual AI systems doing the work.

These are typically LLMs (Large Language Models). Examples include GPT from OpenAI, Claude from Anthropic, and Gemini from Google.

Models do not think in the way people do, even though that word - “think” - is often used. What they actually do is predict the next word or phrase based on patterns learned from large datasets using neural networks. In practical terms, AI is highly advanced pattern prediction.

A model’s responsibility is to interpret what you want to do, turn it into subtasks, reason through the process, and determine how to proceed.

  • OpenAI models (e.g., GPT) → Strong general reasoning, writing, and broad task support
  • Anthropic models (e.g., Claude) → Clear explanations, long-form writing, and structured thinking
  • Google models (e.g., Gemini) → Multimodal tasks and integration with Google ecosystem
  • xAI models (e.g., Grok) → Real-time information and conversational responses

Layer 2: Tools

A tool is any outside function or system an agent can call on to perform actions, such as APIs, databases, or web or internal services.

You typically don’t interact with models directly. Instead, you use tools and interfaces that wrap them.

Tools are what allow agents to take action beyond generating text. They connect the model to the outside world.

Examples include:

  • APIs (calendar, email, ticketing systems)
  • Databases and file systems
  • Search and retrieval systems
  • Internal company services

Layer 3: Instructions (The Behavior)

Instructions define the agent’s role and boundaries.

Instructions are sometimes called system prompts. Prompt engineering is the practice of designing these instructions effectively.

Layer 4: Memory

Memory is what allows an agent to carry context across steps and over time. It includes things like prior interactions, stored data, preferences, and past decisions. Instead of starting fresh each time, memory enables the agent to stay consistent, make better decisions, and improve outcomes based on what it already knows.

There are different levels of memory:

  • Short-term - This holds recent context, like the last few sentences.
  • Long-term - This persists across sessions. This is how tools like ChatGPT “remember” past interactions and improve future responses. For example, if you frequently have ChatGPT rewrite emails for you, it will typically provide 3-4 different options. If you tell ChatGPT “I’ll go with option 3,” it will remember that writing style for later. It can also retain details from the conversation, which may include information about your job (assuming it’s a work email), so be careful with what information you provide. I tend to replace key words with “company” or “person A”. For example:
Re-write this email:
"Hi person a! Here is the information you requested about (tool) that is being used by (company) even though it's an annoying system that doesn't work well. Let me know if you have any questions!"

This way, ChatGPT can re-write this to be more professional:

Hi [Person A],

Here is the information you requested regarding (tool), which is currently in use at (company). Please let me know if you have any questions.

Real-World Example of How the Layers Work Together

Consider this task: “Order more printer ink before we run out.”

  • Instructions define the agent’s role and constraints (e.g., preferred vendors, budget limits)
  • Memory provides context (previous orders, preferred vendors, printer model, typical usage)
  • The model interprets the request and determines what information is needed (printer type, current inventory)
  • Tools check inventory levels and identify the correct ink cartridge
  • The model evaluates options based on price, availability, and vendor preferences
  • Tools place the order through a purchasing system or website
  • Memory is updated with the new order and usage pattern
  • The model confirms: “Ink has been ordered and will arrive on Thursday.”

Interfaces

Interfaces are how users interact with AI systems. They wrap models and expose tools, but are not part of the underlying system itself.

ChatGPT is a consumer-facing interface built on top of a model, such as GPT-4. It is a web app with a chat interface that uses the model to make it useful for every-day tasks. It has extra features on top of the model, such as memory, analysis, web searching, etc.

Common interface types include:

  • Chat-based tools → Best for thinking, writing, and exploring ideas
    • ChatGPT → A general-purpose AI assistant. Best for writing, research, problem-solving, and broad tasks across domains.
    • Claude → A conversational AI tool focused on clarity and long-form reasoning. Best for writing, analysis, and structured explanations.
    • Google Gemini → Integrated across Google products. Best for search-connected tasks and workflows within Google’s ecosystem.
  • IDE-integrated tools → Best for assisting while you code
    • GitHub Copilot → An in-editor coding assistant. Best for real-time code suggestions, autocomplete, and speeding up development inside existing IDEs.
  • AI-first editors/IDEs → Best for building software with AI deeply embedded in the workflow
    • Cursor → An AI-first code editor. Best for writing, editing, and navigating code directly within a development environment.
    • Windsurf → A full AI-powered development environment. Best for end-to-end coding workflows, from generation to iteration within a single tool.

Which AI should I use?

Instead of asking “which AI should I use,” it’s more useful to think in terms of the system.

The underlying models are often similar across tools. What really changes is the interface and environment—how you interact with the model, what context it has access to, and what actions it can take.

A chat interface is optimized for thinking and writing. An IDE integration is optimized for coding workflows. An AI-first editor is built for end-to-end development with AI embedded throughout.

The better question isn’t which AI is “best,” but which combination of model and interface best fits the task.

Systems Thinking Insight

If you come from a cloud, DevOps, or systems background, this structure should feel familiar.

AI systems increasingly resemble distributed systems, service layers, and orchestration pipelines. The difference is that instead of services calling APIs directly, models reason about those interactions.

What’s Next

Now that the structure of the stack is clear, the next step is to look at each layer in more detail.