What is an agentic workflow?
Standard AI: you ask, the AI responds. One exchange. An agentic workflow is different: you give a goal, the AI plans steps, calls tools, evaluates results, and loops until done, running multiple actions without you directing each one.
The loop is what makes it "agentic": plan → act → observe → plan again. Where a chatbot answers a question, an agentic workflow executes a process (pulling data, writing documents, checking results, revising) from a single goal input.
The word "agentic" comes from agency: the capacity to act independently and purposefully. An agentic system has agency: it doesn't wait for your next instruction at every step. IBM defines it as AI that can "accomplish a specific goal with limited supervision." Anthropic draws a sharper line: in an agentic system, the LLM dynamically directs its own processes and tool usage rather than following predefined code paths.
Three tiers of AI automation
Not all "AI automation" is the same. There are three meaningfully different tiers, and only the third is truly agentic.
Tier 1: Traditional automation
Rule-based. If X then Y. Robotic process automation (RPA) sits here. No LLM, no reasoning: just scripted logic that breaks when reality doesn't match the script. Reliable for simple, repetitive tasks; brittle for anything that requires judgement.
Tier 2: Non-agentic AI
An LLM generates a response from a prompt. One call, one output. Useful for drafting, summarising, or classifying, but the model doesn't take actions, use tools, or remember anything between calls. Your ChatGPT conversation is Tier 2.
Tier 3: Agentic workflow
An agent uses an LLM to plan, calls tools to act, reflects on results, and loops until the goal is met. It can adapt mid-task: if one tool fails, it substitutes another. IBM documented an example where a web search API failed mid-task and the agent automatically switched to a Wikipedia search tool and completed the job. The process is dynamic, not scripted.
The agent loop
Every agentic workflow runs the same cycle: plan the next step, act by calling a tool, observe the result, then plan again with that new information, repeating until the goal is met or a step limit stops it.
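A minimal sketch of that loop, with the model and the tool stubbed out as plain functions (every name here — `plan`, `search_tool`, `run_agent` — is hypothetical; a real agent would replace `plan` with an LLM API call):

```python
# Minimal plan -> act -> observe loop. plan() and search_tool() are
# stand-ins for a real model call and a real web search API.

def plan(goal, observations):
    """Decide the next action. Returns ("done", result) when the goal is met."""
    if any("42" in o for o in observations):
        return ("done", "The answer is 42")
    return ("search", goal)  # otherwise, act: call the search tool

def search_tool(query):
    """Stubbed tool call; a real agent would hit a search API here."""
    return f"search results for {query!r}: ... 42 ..."

def run_agent(goal, max_steps=5):
    observations = []
    for _ in range(max_steps):          # hard step limit guards against infinite loops
        action, arg = plan(goal, observations)
        if action == "done":
            return arg                  # goal met: exit the loop
        observations.append(search_tool(arg))  # observe the tool result
    raise RuntimeError("step limit reached")
```

The control flow — not the stubbed functions — is the point: the model decides, your code executes, and the loop feeds results back in.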
What "tools" means
An agent's "tools" are functions it can call: web search, API requests, code execution, file reading/writing, email sending, database queries. The agent decides which tool to call based on what it needs to do next. You define what tools are available; the agent chooses when to use them. When an LLM selects and calls a tool, it's called function calling. This is how an agent reaches beyond its training data and interacts with the real world. LLMs by themselves can't directly interact with external tools or databases in real time. Only agents can.
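In code, a tool set is often just a registry of named functions the model can pick from. A sketch of that dispatch pattern, with the model's function-calling step stubbed (all names here are hypothetical):

```python
# A tool is a named function the model can request by name.
# choose_tool() stands in for the model's function-calling step.

TOOLS = {
    "web_search": lambda query: f"results for {query}",
    "read_file": lambda path: f"contents of {path}",
}

def choose_tool(task):
    """Stubbed model decision: which tool, with what argument."""
    return ("web_search", task) if "find" in task else ("read_file", task)

def dispatch(task):
    name, arg = choose_tool(task)   # the model picks the tool...
    return TOOLS[name](arg)         # ...your code actually executes it
```

Note the division of labour: you define what's in `TOOLS`; the model only ever returns a name and an argument.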
Memory: how agents remember
A plain LLM has no memory between calls: each conversation starts blank. Memory is what turns a stateless model into a persistent, learning agent. Agentic frameworks use three distinct memory types, and understanding them matters when you're deciding what your agent can actually do.
Short-term memory
Conversation history and intermediate results held in the context window during a single task run. The agent can refer back to what it already tried, what tools returned, and what decisions it made. Ends when the task ends; nothing carries over to the next run. Also called in-context memory.
Long-term memory (persistent)
Information stored externally and retrieved in future sessions. A client brief an agent processed last month can be recalled for a new task today. Also called persistent memory. This is what lets an agent personalise responses, build on past work, and avoid relearning the same context each time it runs.
External / vector memory
Large knowledge stores queried by semantic similarity: a vector database holds embeddings of your documents, past outputs, or client data. The agent queries it like a search engine, pulling only the relevant chunks into context. Common storage options: vector stores (Pinecone, Weaviate) for unstructured content, key/value stores (Redis) for fast structured lookups, and knowledge graphs (Neo4j) for complex relational data where connections between facts matter.
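The retrieval step reduces to "rank stored embeddings by similarity to the query embedding". A toy sketch with hand-made three-dimensional vectors (real systems use an embedding model and a store like Pinecone or Weaviate; these documents and numbers are illustrative only):

```python
import math

# Toy "vector store": (text, embedding) pairs with hand-made vectors.
STORE = [
    ("client brief: Acme rebrand", [0.9, 0.1, 0.0]),
    ("invoice template",           [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def query(embedding, top_k=1):
    """Return the top_k stored documents most similar to the query embedding."""
    ranked = sorted(STORE, key=lambda d: cosine(embedding, d[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

Only the `top_k` matches are pulled into the agent's context, which is what keeps large knowledge stores usable within a finite context window.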
Standard AI vs agentic workflow
| | Standard AI prompt | Agentic workflow |
|---|---|---|
| Input | Your question | Your goal |
| Steps | One response | Multiple planned steps |
| Tool use | No | Yes: web, APIs, files, databases |
| Memory | None between calls | Short-term + optional persistent |
| Adapts to failure | No | Yes: substitutes tools, retries steps |
| Human direction | Every exchange | Set goal, review output |
| Best for | Answering questions | Executing multi-step processes |
Five common workflow patterns
Most production agentic systems are built from a small set of composable patterns, and you can combine them. Anthropic's engineering team, which works with hundreds of production agent deployments, found that the most effective systems use the simplest pattern that gets the job done, not the most sophisticated one available.
Prompt chaining
Each LLM call feeds its output into the next. Good for tasks with a fixed sequence of steps where each step is an easier problem than tackling the whole. Each step can include a programmatic check before proceeding. Example: research topic → write outline → check outline meets criteria → write draft.
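The example sequence can be sketched as a straight pipeline with a gate between steps (the "LLM" calls here are stubbed string transforms; all function names are hypothetical):

```python
# Prompt chain: each step's output feeds the next, with a programmatic
# check before the expensive drafting step.

def research(topic):  return f"notes on {topic}"
def outline(notes):   return f"outline from {notes}"
def check(out):       return out.startswith("outline")  # gate: does the outline meet criteria?
def draft(out):       return f"draft based on {out}"

def run_chain(topic):
    notes = research(topic)
    out = outline(notes)
    if not check(out):              # fail fast instead of drafting from a bad outline
        raise ValueError("outline failed check")
    return draft(out)
```

The check between steps is what distinguishes a chain from simply concatenating prompts: a bad intermediate output stops the pipeline early.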
Routing
An initial LLM classifies the input, then routes it to a specialised downstream path. Allows each path to be optimised for its type of input without one path's requirements degrading another. Example: a support triage agent that routes billing questions, technical bugs, and feature requests to three separate specialised workflows.
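The triage example maps naturally to a classifier plus a handler table (the classifier here is a stubbed keyword check standing in for an LLM classification call; all names are hypothetical):

```python
# Routing: classify the input, then hand it to a specialised path.

def classify(ticket):
    """Stand-in for an LLM classification call."""
    if "invoice" in ticket: return "billing"
    if "error" in ticket:   return "technical"
    return "feature"

HANDLERS = {
    "billing":   lambda t: f"billing team: {t}",
    "technical": lambda t: f"bug triage: {t}",
    "feature":   lambda t: f"product backlog: {t}",
}

def route(ticket):
    return HANDLERS[classify(ticket)](ticket)
```

Each handler can use its own prompt, model, and tools without affecting the others — that isolation is the point of the pattern.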
Parallelisation
Multiple agents work on independent sub-tasks simultaneously, then their outputs are combined. Two variants: sectioning (breaking a task into parallel independent parts) and voting (running the same task multiple times, then aggregating results for higher confidence). Useful when sub-tasks don't depend on each other: auditing five client accounts in parallel rather than one at a time.
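The sectioning variant is plain fan-out/fan-in. A sketch using a thread pool, with the per-account audit stubbed (in a real system each `audit` call would be an agent run with its own tools):

```python
from concurrent.futures import ThreadPoolExecutor

# Sectioning: independent sub-tasks run in parallel, outputs combined.
def audit(account):
    """Stubbed per-account audit; a real version would call tools/APIs."""
    return f"{account}: ok"

def audit_all(accounts):
    with ThreadPoolExecutor() as pool:
        # map() preserves input order, so results line up with accounts
        return list(pool.map(audit, accounts))
```

This only works because the sub-tasks are independent; if account B's audit needed account A's result, you'd be back to a sequential chain.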
Orchestrator-workers
A central "orchestrator" LLM dynamically breaks down a complex task and delegates sub-tasks to specialist "worker" LLMs. The orchestrator synthesises their outputs. Unlike parallelisation, the sub-tasks aren't predefined; the orchestrator determines them based on the specific input. Best for open-ended tasks where the exact steps aren't known upfront: complex research, multi-file code changes, or analysing information from multiple sources.
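What distinguishes this pattern in code is that the list of sub-tasks is computed at runtime rather than written in advance. A sketch with both the orchestrator's planning step and the workers stubbed (all names hypothetical):

```python
# Orchestrator-workers: the orchestrator decides the sub-tasks for this
# specific input, delegates to workers, then synthesises the results.

def find_sources(task):
    """Stubbed orchestrator planning step: decide the sub-tasks."""
    return [f"source-{i}" for i in range(1, 3)]

def worker(source):
    """Stubbed specialist worker."""
    return f"summary of {source}"

def orchestrate(task):
    subtasks = find_sources(task)            # sub-tasks are NOT predefined
    results = [worker(s) for s in subtasks]  # delegate each to a worker
    return " | ".join(results)               # synthesise into one output
```

Compare with the parallelisation pattern, where you (the developer) fix the sub-tasks; here the orchestrator model fixes them per input.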
Evaluator-optimiser
One LLM generates output; a second LLM critiques it and provides feedback; the first revises. The loop runs until the evaluator is satisfied or a step limit is hit. Works well when LLM responses demonstrably improve given structured critique: writing, translation, complex analysis. Analogous to the iterative editing process a human writer uses to reach a polished final draft.
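The generate–critique–revise cycle can be sketched as a bounded loop over two stubbed model calls (`generate` and `evaluate` are hypothetical stand-ins for two separate LLM calls):

```python
# Evaluator-optimiser: generate, critique, revise, until the evaluator
# is satisfied or the round limit is hit.

def generate(prompt, feedback=None):
    """Stubbed generator; incorporates feedback if given."""
    return (prompt + " (revised)") if feedback else prompt

def evaluate(text):
    """Stubbed evaluator: None means satisfied, else a critique."""
    return None if text.endswith("(revised)") else "needs revision"

def refine(prompt, max_rounds=3):
    text = generate(prompt)
    for _ in range(max_rounds):
        feedback = evaluate(text)
        if feedback is None:
            return text                      # evaluator satisfied
        text = generate(prompt, feedback)    # revise using the critique
    return text                              # round limit reached
```

The `max_rounds` cap matters: without it, a generator and evaluator that never converge would loop (and bill) forever.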
Multi-agent systems
A single agent has one context window: it can only hold so much information at once and can only use one set of tools at a time. Multi-agent systems solve this by running multiple specialised agents in parallel or in sequence, each with its own role, tools, and memory slice. Tasks that would overflow a single agent's context become tractable when split across agents.
IBM describes two main multi-agent architectures:
Vertical: conductor + workers
A conductor agent (typically a more capable LLM) oversees the task, handles planning, and delegates to simpler specialist worker agents. Workers handle specific subtasks: data retrieval, code execution, email sending, document writing. Fast for sequential workflows with clear handoffs, but the conductor is a single point of failure and a potential bottleneck.
Horizontal: peer agents
Agents operate as equals, each contributing domain expertise. A research agent, a writing agent, and a fact-checker might collaborate on a content piece, each reviewing and passing work back to the others. More resilient than vertical because there's no single bottleneck, but slower and harder to coordinate cleanly.
Coordination and risk
The orchestration layer handles handoffs, passing task state between agents, monitoring progress, managing failures. As more agents work in series, the risk of cascading errors grows: a flawed output from one agent can corrupt every downstream agent that depends on it. IBM notes that multi-agent systems can also produce traffic jams, bottlenecks, and resource conflicts. Guardrails, human checkpoints, and maximum step limits are not optional: they're the architecture.
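The simplest of those guardrails — a maximum step limit enforced by the orchestration layer rather than trusted to the agents — can be sketched as (names hypothetical):

```python
# Hard step limit: whatever the agents decide, the orchestration layer
# refuses to execute past max_steps.

def run_with_limit(steps, max_steps=10):
    executed = []
    for i, step in enumerate(steps):
        if i >= max_steps:
            raise RuntimeError(f"aborted: exceeded {max_steps} steps")
        executed.append(step())   # each step is a callable unit of work
    return executed
```

The limit lives outside the agents on purpose: a guardrail an agent can override is not a guardrail.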
Orchestration frameworks
You don't build agentic workflows from scratch. These frameworks handle the plumbing (LLM calls, tool definitions, memory management, agent coordination) so you focus on the logic that's specific to your use case.
| Framework | Best for | Style |
|---|---|---|
| LangChain | Chaining LLM calls with tools and retrieval | Code-first (Python / JS) |
| LangGraph | Stateful multi-agent graphs with cycles and branching | Code-first (Python) |
| CrewAI | Role-based multi-agent teams with defined tasks | Code-first (Python) |
| AutoGen | Conversational multi-agent collaboration (Microsoft) | Code-first (Python) |
| n8n | Visual workflow builder with AI node support | Low-code / visual |
Anthropic's engineering team recommends starting with direct LLM API calls before adopting a framework. Frameworks add abstraction layers that can hide what's happening under the hood and make debugging harder. Start simple, graduate to a framework once you understand the shape of your problem.
Agency use cases
Weekly client reporting
Agent pulls data from Google Ads and GA4, writes a plain-English summary, flags anomalies, and drops a draft for account manager review. That replaces 2 hours of manual work per client per week.
Competitive monitoring
Agent searches competitor mentions across news and social, categorises by sentiment and relevance, and produces a weekly briefing without a human touching it until review. No dashboard to check, no news tabs to maintain.
Post-launch QA
Agent crawls every page of a newly launched site, checks for broken links, missing meta tags, slow images, and console errors, then produces a prioritised fix list before the account manager's morning coffee.
Content production pipeline
Brief goes in: agent researches the topic via web search, drafts a structure, writes a first draft, self-critiques for tone and accuracy (evaluator-optimiser pattern), and flags any claims it couldn't verify with a source. A human reviews the flagged items. The whole draft takes minutes, not a day.
Lead research and enrichment
Agent takes a prospect list, queries company websites and news sources for each name, and returns a scored profile (company size, recent news, tech stack signals, open roles) ready to push into your CRM. Replaces 20–30 minutes of manual pre-call research per prospect.
Proposal and SOW generation
Agent reads discovery call notes, retrieves relevant case studies from your knowledge base via vector search, maps the project scope to your service tiers, and assembles a first-draft proposal document. The account manager edits the judgement calls and signs off; the agent handles the assembly and research retrieval.
When not to use agentic workflows
Anthropic's engineering team is direct about this: "we recommend finding the simplest solution possible, and only increasing complexity when needed." Agentic workflows trade latency and cost for capability. That trade isn't always worth making.
Simple, deterministic tasks
If a single LLM call (or a simple rule) reliably does the job, adding an agent loop is unnecessary complexity and cost. Draft a one-paragraph summary? One call. Research and produce a 10-source briefing? Agent. The question to ask: does this task actually require planning and tool use, or does it just require good prompting?
Latency-sensitive operations
Each loop iteration means another LLM call. Multi-agent systems multiply that by the number of agents. A task that needs an answer in under two seconds can't wait for three rounds of planning and reflection. Use direct LLM calls for real-time, user-facing responses.
High cost per iteration
LLM API calls cost tokens. A multi-agent workflow that runs 20 tool calls and reflection passes to produce a report costs 20× the token budget of a single call. For high-volume, low-margin tasks (think bulk email personalisation at scale) the economics can make agents impractical.
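A back-of-envelope cost check makes the trade concrete (the token counts and price below are illustrative placeholders, not real pricing):

```python
# Rough per-task cost: calls x tokens per call x price per 1k tokens.
def task_cost(calls, tokens_per_call, usd_per_1k_tokens):
    return calls * tokens_per_call * usd_per_1k_tokens / 1000

single = task_cost(1, 2000, 0.01)    # one direct LLM call
agentic = task_cost(20, 2000, 0.01)  # a 20-call agent loop: 20x the spend
```

Run this with your actual volumes before committing a high-frequency task to an agent loop.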
Compounding errors in long pipelines
Each step in an agentic workflow can introduce error, and errors compound. In a 10-step pipeline, a wrong assumption at step 3 propagates through steps 4–10. For consequential outputs (client-facing documents, financial data, legal text) the autonomous nature of agents increases the blast radius of mistakes. Human review checkpoints within the workflow, not just at the end, are the mitigation.
Poorly specified goals
Agents optimise for the goal they're given. If the goal is underspecified (say, "increase engagement" instead of "increase replies from decision-makers") the agent will find ways to technically satisfy it that you didn't intend. IBM calls this reward hacking. Precise goal definition isn't optional; it's the most important design decision in any agentic system.
Frequently Asked Questions
Is agentic AI the same as AI automation?
What tools enable agentic workflows?
How much human oversight is needed for an agentic workflow?
What's the difference between an agentic workflow and a chatbot?
Can agentic workflows replace account managers?
Related Terms
- Retrieval-Augmented Generation: An AI technique where the model searches your own documents or data before generating a response, so answers are grounded in your specific information, not just the model's training.
- AI Agent: An AI system that can perceive its environment, make decisions, and take actions autonomously to achieve a goal. Unlike a chatbot that just responds, an agent acts.
- Human-in-the-Loop: An AI system design where a human reviews, validates, or approves AI outputs at key decision points, rather than letting the AI act fully autonomously.
Put it into practice
Sagely helps agencies manage clients without the chaos: branded portals, approval workflows, and structured communication in one place.
Also in the Handbook
- Client Portal
- Retrieval-Augmented Generation
- AI Agent
- Human-in-the-Loop
- Content Approval Workflow
- Net Promoter Score
- Model Context Protocol
- Prompt Engineering
- Website Project Delivery
- Scope of Work
- Statement of Work
- Change Order
- Resource Allocation
- Project Charter
- Capacity Planning
- Discovery Call