Action tags: making an LLM actually do things

An agent that can only talk is a chatbot. An agent that can do things is a product. Here’s how we get the second without giving the model arbitrary code execution.


The two extremes

On one end of the spectrum: pure chat. The model talks; the user does. This is safe and easy to debug but limited — the model can recommend you mark a task done, but you have to go do it yourself.

On the other end: arbitrary tool use. You hand the model a function-calling interface (or worse, a code interpreter), and it can invoke anything. This is powerful and a nightmare to operate. Every call is a potential bug, a potential security issue, and a potential reason your user’s data ends up somewhere it shouldn’t.

The pattern we use sits in the middle. We call it action tags.


The shape

An action tag is a tightly formatted bracketed string the model emits as part of its natural-language response. The user never sees it directly — a post-processor strips it before display. The dispatcher recognizes a closed set of tag types, validates each parameter, and applies the corresponding mutation through the app’s normal data layer.

Examples from iForgetalot:

[MARK_DONE id:"abc123" type:"step"]
[CREATE_TASK title:"File 2025 taxes" category:"Wealth" priority:"high"]
[SET_REMINDER id:"task-xyz" datetime:"2026-04-15T09:00"]
[ADD_STEPS id:"task-xyz" steps:"Gather W-2s|Open TurboTax|Enter income"]
[NAVIGATE target:"goal" id:"goal-789"]

The model is taught these via the system prompt with examples. When it wants to take an action, it includes the tag inline with its response. The dispatcher parses, validates, and executes.
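The parse-strip-dispatch loop is small enough to sketch in a few lines. This is an illustrative Python sketch, not iForgetalot's actual parser — the regexes and function names are assumptions:

```python
import re

# Illustrative regexes: one for the tag envelope, one for key:"value" pairs.
TAG_RE = re.compile(r'\[([A-Z_]+)((?:\s+\w+:"[^"]*")*)\]')
PARAM_RE = re.compile(r'(\w+):"([^"]*)"')

def extract_actions(response: str) -> tuple[str, list[tuple[str, dict]]]:
    """Return the user-visible text (tags stripped) and the parsed actions."""
    actions = []
    for match in TAG_RE.finditer(response):
        name, params_blob = match.group(1), match.group(2)
        actions.append((name, dict(PARAM_RE.findall(params_blob))))
    clean_text = TAG_RE.sub("", response).strip()
    return clean_text, actions

text, actions = extract_actions(
    'Done! [MARK_DONE id:"abc123" type:"step"] Anything else?'
)
# actions == [("MARK_DONE", {"id": "abc123", "type": "step"})]
```

The point is that the text the user reads and the actions the app takes come out of one pass over the model's raw output, so there is no separate tool-call channel to keep in sync.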


Why this works in practice

  • Closed set of operations. The dispatcher only recognizes ~15 tag types. The model can’t invoke anything not on the list, no matter what it emits.
  • Parameter validation. Every parameter is typed (id is a UUID, datetime is ISO-8601, category is an enum). Bad values fail loudly at parse time, not silently in production.
  • Observability is free. Every emitted tag is one log line. You can answer “what did the agent do today” with grep.
  • Bounded blast radius. Even if the model hallucinates a tag, it can only do what’s in the action set. No file system access, no network calls, no shell.
  • Easy to extend. Adding a new action is one entry in the dispatcher and one example in the prompt. No tool-calling framework, no schema dance.
  • Debug locally. The same tag emitted by a local quantized model and a cloud frontier model produces the same effect. The behavior is in your code, not the model.
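The first two points — a closed action set and typed parameters — boil down to a dispatch table keyed by tag name. A minimal sketch, assuming hypothetical action specs, category values, and handler shapes (none of these are the real iForgetalot data layer):

```python
from datetime import datetime

CATEGORIES = {"Wealth", "Health", "Home"}  # illustrative enum values

def parse_iso(v: str) -> str:
    datetime.fromisoformat(v)  # raises ValueError on malformed datetimes
    return v

def parse_enum(v: str) -> str:
    if v not in CATEGORIES:
        raise ValueError(f"unknown category: {v}")
    return v

# One registry entry per action: a validator per parameter.
REGISTRY = {
    "CREATE_TASK": {"title": str, "category": parse_enum, "priority": str},
    "SET_REMINDER": {"id": str, "datetime": parse_iso},
}

def dispatch(name, params, handlers):
    spec = REGISTRY.get(name)
    if spec is None:
        # Closed set: anything not in the registry is rejected outright.
        raise ValueError(f"unknown action: {name}")
    # Validate known parameters; hallucinated extras are dropped.
    validated = {k: spec[k](v) for k, v in params.items() if k in spec}
    missing = set(spec) - set(validated)
    if missing:
        raise ValueError(f"{name} missing params: {missing}")
    return handlers[name](**validated)

dispatch("SET_REMINDER",
         {"id": "task-xyz", "datetime": "2026-04-15T09:00"},
         {"SET_REMINDER": lambda id, datetime: f"reminder:{id}"})
# -> "reminder:task-xyz"
```

Adding an action really is one registry entry plus one handler, and every failure mode surfaces as a loud `ValueError` at the boundary rather than a silent bad write.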

The tradeoffs

Action tags are not free.

  • The model needs to learn the syntax. Smaller models occasionally drop a quote or invent a parameter; we handle that with a tolerant parser plus a retry pass for common errors.
  • You give up some flexibility versus full function calling. New capabilities require code changes, not just prompt updates. We see that as a feature.
  • Long parameter lists get unwieldy. We cap each tag at five parameters and break complex actions into sequences.

When to use this

Action tags shine when:

  • Your agent acts on a known, finite domain (tasks, calendar events, notes, expenses)
  • You need the same behavior across multiple model sizes and providers (local + cloud + future swaps)
  • Safety and auditability matter more than maximum flexibility
  • You want a system you can still operate when the model vendor breaks function-calling syntax in a minor version

If you’re building an agent that has to do open-ended things — research the web, write arbitrary code, control a browser — function calling or a code interpreter probably wins. But for the vast majority of vertical AI products, action tags are the path of least operational regret.


We design and ship action-tag systems as part of Agentic System Builds. Talk to us if you’re considering this pattern for a product.
