From One-Off Agents to Workflows That Run Without You

William Warne

Software Engineer | Fractional CTO | Founder

Introduction

Many teams have adopted an AI agent for a single job: review this PR, draft this doc, suggest a fix. The next step—workflows that run without you, with clear handoffs and optional human approval—feels out of reach. It isn’t. What’s missing is a sense of progression: knowing where you are, what blocks the next stage, and what unblocks it. This post maps that path in tool-independent terms, so you can level up regardless of which agent or platform you use.

Where We Are

Past: Work was manual or scripted. Then came one-off LLM prompts (explain, draft, suggest). First “agent” tools did one task per invocation—synchronous, human-triggered every time. The ceiling was one session, one task.

Present: Multi-step workflows exist: pipelines with clear steps and handoffs (e.g. discover → qualify → enrich → deliver). Orchestrator-plus-specialist designs are common in research and among early adopters. Workflow-as-code and workflow optimization (e.g. learning which steps matter for which task) are emerging. Long-running, background execution is possible in some systems (durable state, webhooks, no timeout) but not yet mainstream. Production-grade practice—decomposition, single-responsibility agents, tool-first design, deployment and governance—is still early-adopter territory. Most teams sit at single-task or defined multi-step with a human at the start, and human-in-the-loop at approval points remains the norm.

Future: Automated workflow design and optimization will reduce manual wiring. Durable, event-driven runs and stronger observability and governance will spread. The “humans steer, agents execute” model may become default for some orgs.

The Progression Chain

You can place yourself and your team on this chain. Each stage is a clear step up.

  1. No agent use — All work is manual or scripted; no LLM agents.
  2. Ad-hoc single-task use — Occasional one-off prompts or single agent calls (explain, draft, suggest).
  3. Repeated single-task patterns — The same kind of task run often (e.g. “review this PR,” “generate this doc”) but each run is separate and human-initiated.
  4. Defined multi-step workflow (human-triggered) — A pipeline of steps with handoffs (e.g. discover → qualify → enrich); a human starts the run; it may be synchronous or batch.
  5. Long-running / background workflow — Same as (4) but async: triggered by event or schedule, state persisted, result via webhook or polling; human only at approval points or on failure.
  6. Production-grade multi-agent — Multiple agents, clear orchestration, tool-first design, deployment and observability, governance; may include background and event-driven runs.
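The jump from stage 4 to stage 5 is mostly about who starts the run and who waits for it. A minimal sketch, using the discover → qualify → enrich example from the chain above; every function here is an illustrative stand-in, not a real framework:

```python
# Hypothetical sketch: the same pipeline at stage 4 (human-triggered,
# synchronous) vs stage 5 (event-triggered, async). All step bodies are
# illustrative stand-ins for real agent or tool calls.

def discover(query: str) -> list[str]:
    return [f"lead-{i}" for i in range(3)]  # stand-in for a search step

def qualify(leads: list[str]) -> list[str]:
    return [l for l in leads if l.endswith(("0", "2"))]  # stand-in filter

def enrich(leads: list[str]) -> list[dict]:
    return [{"lead": l, "score": 0.9} for l in leads]

# Stage 4: a human starts the run and waits for the result.
def run_pipeline(query: str) -> list[dict]:
    return enrich(qualify(discover(query)))

# Stage 5: the same steps, but started by an event and reporting back
# via a callback (a webhook stand-in) instead of a waiting human.
def on_event(event: dict, deliver) -> None:
    result = run_pipeline(event["query"])
    deliver(result)  # e.g. POST to a webhook URL in a real system

if __name__ == "__main__":
    run_pipeline("new signups")           # stage 4: caller blocks
    on_event({"query": "new signups"}, print)  # stage 5: result pushed out
```

Note the pipeline logic is identical in both stages; only the trigger and the result path change, which is why the stage 4 → 5 unblocks below focus on triggers, durability, and delivery rather than on the steps themselves.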

Sticking Points and How to Progress

Single-task → Multi-step workflow

What blocks you: No clear step boundaries; no handoff contract (inputs/outputs); state only in one session; “one big prompt” doesn’t scale; trust (“what did the agent actually do?”).

What unblocks: (1) Decompose into steps with explicit inputs and outputs per step. (2) Give each step one responsibility (single-responsibility agents). (3) Use handoff artifacts (e.g. structured payload) between steps. (4) Add validation at step boundaries (schema, checks). (5) Keep prompts and config outside the workflow logic so you can change copy without changing code.
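Points (1) through (4) can be sketched in a few lines. This is a minimal illustration, not a framework: the `Handoff` payload shape, the field names, and the `qualify` step are all assumptions made for the example.

```python
# A handoff artifact between single-responsibility steps, with
# validation at the step boundary. Field names are illustrative.
from dataclasses import dataclass, asdict

@dataclass
class Handoff:
    step: str   # which step produced this payload
    data: dict  # that step's single output

def validate(payload: Handoff, required_keys: set[str]) -> Handoff:
    """Boundary check: fail fast if a step emitted an incomplete payload."""
    missing = required_keys - payload.data.keys()
    if missing:
        raise ValueError(f"{payload.step} handoff missing: {missing}")
    return payload

def qualify(prev: Handoff) -> Handoff:
    # Single responsibility: decide fit. Consumes 'lead', emits 'qualified'.
    lead = prev.data["lead"]
    return Handoff("qualify", {"lead": lead, "qualified": "acme" in lead})

# Wiring: validate at each boundary before the next step runs.
start = Handoff("discover", {"lead": "acme-inc"})
out = qualify(validate(start, {"lead"}))
print(asdict(out))
# {'step': 'qualify', 'data': {'lead': 'acme-inc', 'qualified': True}}
```

The point is that each step trusts its input only after the boundary check, which also answers the trust question above: the handoff artifacts are an audit trail of what each step actually did.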

Multi-step (human-triggered) → Long-running / background

What blocks you: Session timeouts; no persistent state across runs; a human has to be there to start; no way to get results when the run is long; unclear failure recovery and retries.

What unblocks: (1) Introduce durable execution (persistent state, checkpointing, replay). (2) Define triggers (event, schedule, API) so runs can start without a human in the loop. (3) Define result delivery (webhook, polling, queue) so humans don’t block on the run. (4) Put human-in-the-loop only at approval steps or escalation. (5) Design idempotency and retries for external calls (e.g. deliver once).

Any stage → Production-grade

What blocks you: Reliability gaps, weak observability, security concerns, drift in prompts and config, and open questions about deployment and scaling.

What unblocks: (1) Tool-first design: define interfaces (ports) before choosing providers (adapters). (2) Single-responsibility agents and pure-function-style tool calls where possible. (3) Externalized, versioned prompt and config. (4) Clean separation between workflow logic and tool servers. (5) Containerized, environment-aware deployment; Responsible AI and governance considered up front. (6) Keep it simple so you can maintain and debug it.
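Points (1) through (3) can be sketched with a protocol-style port, a swappable adapter, and prompt text kept outside the workflow logic. The names (`SummarizerPort`, `FakeSummarizer`, the prompt key) and the config shape are assumptions for the example, not a recommended API.

```python
# Tool-first design: define the port, then plug in adapters.
from typing import Protocol

class SummarizerPort(Protocol):
    """Port: the workflow depends on this interface, not on a vendor."""
    def summarize(self, text: str, prompt: str) -> str: ...

class FakeSummarizer:
    """Adapter: a stub provider; swap in a real LLM client later."""
    def summarize(self, text: str, prompt: str) -> str:
        return f"{prompt}: {text[:20]}..."

# Externalized, versioned prompt config: edit copy without touching code.
PROMPTS = {"summarize_v2": "Summarize for an exec"}

def review_step(doc: str, summarizer: SummarizerPort) -> str:
    # Workflow logic knows the port and the prompt key, nothing else.
    return summarizer.summarize(doc, PROMPTS["summarize_v2"])

print(review_step("Quarterly metrics show steady growth.", FakeSummarizer()))
# Summarize for an exec: Quarterly metrics sh...
```

Because the workflow sees only the port, a vendor swap or a prompt revision is a one-line change in the adapter or the config, which is what keeps drift out of the workflow logic itself.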

Conclusion

Progression from one-off agents to workflows that run without you is a matter of clear steps, handoffs, durable state, triggers, and human-in-the-loop only where it matters. The same ideas apply whether you use open-source runtimes, vendor platforms, or your own orchestration. Start by drawing one step boundary in your current setup—one clear input and output—then add another. For a prioritised list of which agentic workflows to implement first (PR review, code + draft PR, task breakdown, and more), see Which agentic workflows every team should implement.

Context as Code — Structure and methodology so your repo is a system of record agents can use. Packs and docs that support progression toward harness-style engineering.