Building AI Agents in Ruby with the Anthropic SDK
Build AI agents in Ruby on Rails using the official Anthropic SDK. The agent loop, tool design, tool_runner, streaming, authorization, and production patterns.

An AI agent is a language model that takes actions rather than only producing text. You hand it a goal and a set of tools (functions it is allowed to call). It decides which to call, your code runs them, the model reads the results, and the cycle continues until the task is done. That loop of deciding, acting, and observing is the whole difference between an agent and a single prompt. A support agent that looks up a customer’s invoices and drafts a reply, or an internal tool that pulls data from three systems to answer a question, is an agent in this sense.
Rails is an excellent place to build an agent, and in some ways a better one than the alternatives, because the hard parts of a production agent are authorization, background processing, observability, and clean domain logic, not the API calls themselves. A mature Rails app already has all of those. The agent layer, done right, is a thin adapter on top of code you already trust.
The official Anthropic Ruby SDK ships with streaming, connection pooling, and a tool runner that handles the agent loop for you. This post covers what an agent actually is, how to structure one in Rails, how to design tools the model can use reliably, and the production concerns that separate a demo from something you can put in front of users.
What an Agent Actually Is
Strip away the marketing and an agent is simple. In Anthropic’s words, “agents are typically just LLMs using tools based on environmental feedback in a loop” (Building Effective Agents). That is the whole idea. The model receives a goal, decides whether it needs to call a tool, you execute the tool and feed the result back, and the loop repeats until the model decides it is done.
The same article draws a distinction worth internalizing before you write any code. “Workflows are systems where LLMs and tools are orchestrated through predefined code paths,” while “agents … are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.” Workflows are predictable and consistent; agents are flexible at the cost of higher latency, higher token spend, and the potential for compounding errors. The most consequential decision you will make is which of these you actually need, and the honest answer is usually “less agent than you think.”
Anthropic’s own guidance is to find “the simplest solution possible, and only increasing complexity when needed,” adding complexity “only when it demonstrably improves outcomes.” For many features, a single well-prompted model call with good context beats an autonomous agent on every axis that matters. Reach for a true loop only when the task is open-ended enough that you genuinely cannot predict the steps in advance.
The Minimal Agent Loop in Ruby
Start with the official gem. Add it to your Gemfile:
# Gemfile
gem "anthropic"
The client is threadsafe and maintains its own connection pool, so create it once and reuse it. In Rails, an initializer is the natural home:
# config/initializers/anthropic.rb
ANTHROPIC = Anthropic::Client.new(
  api_key: ENV.fetch("ANTHROPIC_API_KEY")
)
A single model call looks like this:
message = ANTHROPIC.messages.create(
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Summarize Q1 in one sentence." }]
)
puts message.content
That is not yet an agent, because there is no loop and no tools. The loop is what makes it agentic, and conceptually it is just this: send the conversation to the model, check whether the model wants to use a tool, run the tool if so, append the result to the conversation, and repeat until the model stops asking for tools. Written out by hand, the loop is only a dozen lines, and it is worth seeing it in full once before you let the SDK hide it, because understanding what is under the abstraction is the difference between debugging an agent in ten minutes and debugging it in two days.
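def run_agent(client:, tools:, messages:, model: "claude-sonnet-4-6")
  loop do
    response = client.messages.create(
      model: model,
      max_tokens: 1024,
      tools: tools.map(&:definition),
      messages: messages
    )

    # The model is done when it stops asking to use tools.
    # to_s keeps the check valid whether stop_reason is a String or a Symbol.
    break response if response.stop_reason.to_s != "tool_use"

    # Keep the assistant turn (including its tool_use blocks) in the history.
    messages << { role: "assistant", content: response.content }
    # Every tool_use block must be answered by a tool_result with the same id.
    # execute_tool is a helper you supply: it finds the tool named block.name
    # and calls it with block.input.
    tool_results = response.content
      .select { |block| block.type.to_s == "tool_use" }
      .map do |block|
        {
          type: "tool_result",
          tool_use_id: block.id,
          content: execute_tool(tools, block).to_json
        }
      end
    messages << { role: "user", content: tool_results }
  end
end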
This is the core of every agent. Everything else is refinement: better tools, streaming, error handling, observability, and guardrails. The model drives, your code executes, and ground truth from each tool result flows back so the model can assess its own progress.
Designing Tools the Model Can Actually Use
The lesson that catches most engineers off guard: you will spend more time on your tools than on your prompts. When Anthropic built their own coding agent, they spent more time optimizing the tools than the overall prompt. A tool definition is an interface, and the model is the consumer of that interface. A confusing tool produces a confused agent.
Anthropic frames this as the agent-computer interface, or ACI: “Think about how much effort goes into human-computer interfaces (HCI), and plan to invest just as much effort in creating good agent-computer interfaces (ACI).” A tool definition should read like a docstring written for a competent new engineer with no other context: what it does, when to use it, what each parameter means, and where the edges are.
The official SDK lets you define tools as Ruby classes, with a typed input schema, which is a natural fit for Rails. Here is a real one:
class LookupInvoicesInput < Anthropic::BaseModel
  required :customer_id, Integer
  optional :status, Anthropic::InputSchema::EnumOf[:draft, :open, :paid, :overdue]
  optional :limit, Integer
end

class LookupInvoices < Anthropic::BaseTool
  description <<~TEXT
    Look up invoices for a single customer. Use this when the user asks about
    a specific customer's billing, outstanding balance, or payment history.
    Returns at most `limit` invoices (default 20), newest first. Does not
    search across customers; call once per customer.
  TEXT

  input_schema LookupInvoicesInput

  def call(input)
    scope = Invoice.where(customer_id: input.customer_id)
    scope = scope.where(status: input.status) if input.status
    scope.order(created_at: :desc)
         .limit(input.limit || 20)
         .as_json(only: %i[id number status amount_cents due_on])
  end
end
Several choices here are deliberate.
The description tells the model when to use the tool, not just what it does, and it explicitly states a boundary (“does not search across customers”). Models make mistakes at exactly these boundaries, so naming them in the description prevents whole classes of error. This is the agent equivalent of a technique Anthropic used in their own work, where switching a tool to require absolute file paths rather than relative ones eliminated a recurring model mistake. Make the right usage the obvious usage.
The input schema is typed and uses an enum for status, which means the model cannot invent a status value that your code does not handle. Constrain the inputs so that it is hard to make a mistake. This is the software equivalent of mistake-proofing: the best way to handle a bad input is to make it unrepresentable.
The return value is a deliberately narrow projection, not the full ActiveRecord object. This matters for two reasons. First, every field you return is tokens the model has to read and you have to pay for, so returning columns the task does not need is pure waste. Second, your database rows often contain fields you do not want in the model’s context at all. Be intentional about what crosses the boundary.
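The narrow projection can be seen in isolation with plain hashes. This is a minimal sketch, not part of the SDK: `INVOICE_FIELDS` and `project_for_model` are hypothetical names, and the extra columns in `full_row` are invented to stand in for fields you would not want the model to read.

```ruby
require "json"

# Hypothetical allow-list: the only fields permitted to reach the model.
INVOICE_FIELDS = %i[id number status amount_cents due_on].freeze

# Reduce a full record (a plain Hash here) to the allow-listed fields.
def project_for_model(record, fields: INVOICE_FIELDS)
  record.slice(*fields)
end

# Invented example row, including two columns that should stay private.
full_row = {
  id: 17,
  number: "INV-0017",
  status: "open",
  amount_cents: 12_500,
  due_on: "2026-03-01",
  internal_notes: "customer disputed late fee",
  payment_token: "tok_example"
}

narrow = project_for_model(full_row)
puts JSON.generate(narrow)
```

An allow-list fails safe: a column added to the table later stays out of the model's context until someone deliberately adds it to the list.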
A good rule: start with a few thoughtful tools that target specific high-impact tasks, not a sprawling library of thin wrappers around every endpoint you have. A handful of well-designed tools that compose well beats fifty that overlap and confuse.
Let the SDK Run the Loop: the Tool Runner
Once your tools are classes, the SDK can run the entire agent loop for you. The tool_runner calls the model, executes any tools the model requests, feeds the results back, and continues until the model produces a final answer, all without you hand-writing the loop shown earlier.
runner = ANTHROPIC.beta.messages.tool_runner(
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What does customer 4471 still owe?" }],
  tools: [LookupInvoices.new]
)

runner.each_message do |message|
  # Each turn of the conversation streams through here:
  # assistant tool-use requests, your tool results, and the final answer.
  Rails.logger.info(message.content)
end
This is the right default for most agents, because the loop logic is identical across every agent and there is no value in reimplementing it. Write the loop by hand only when you need to do something the runner does not, such as injecting a human approval step in the middle, enforcing a custom stopping condition, or persisting state between turns in a specific way. Even then, understanding the manual loop shown earlier in this post is what lets you decide.
One production note: the tool runner lives under the beta.messages namespace, as the official auto_looping_tools example shows. Anything under beta can move between releases, so pin your version and read the changelog before upgrading.
Choosing the Right Model for Each Step
Not every step of an agent needs your most capable model, and using one everywhere is how token bills balloon. Anthropic’s lineup splits by job. Haiku is the fast, inexpensive model; Sonnet is the balanced workhorse; Opus is the most capable for hard reasoning. A common and effective pattern is to route by difficulty: send the cheap, high-volume classification or extraction steps to Haiku, run the main agent loop on Sonnet, and reserve Opus for the rare step that needs deep reasoning.
This maps directly onto the routing workflow that Anthropic recommends: classify the input first, then direct it to the right model. In a Rails agent, that often looks like a cheap first call that decides what kind of request this is, followed by a more capable call (or a full agent loop) only for the requests that warrant it. The cheapest agent step is the one you never make.
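The routing step can be sketched as a lookup from a difficulty tier to a model. Everything here is illustrative: `MODEL_FOR_TIER`, `classify`, and `model_for` are hypothetical names, the Haiku and Opus entries are placeholders to replace with real model IDs, and the keyword-based `classify` only stands in for the cheap first model call so the sketch runs offline.

```ruby
# Hypothetical tier-to-model table. Only the Sonnet ID appears in this post;
# the other two are placeholders, not real model IDs.
MODEL_FOR_TIER = {
  simple:   "YOUR_HAIKU_MODEL_ID",
  standard: "claude-sonnet-4-6",
  hard:     "YOUR_OPUS_MODEL_ID"
}.freeze

# Stand-in for the cheap classification call. A real router would ask a
# small model to return one of the tier labels instead of matching keywords.
def classify(request_text)
  case request_text
  when /\b(reconcile|investigate|audit)\b/i then :hard
  when /\b(status|balance|due date)\b/i then :simple
  else :standard
  end
end

# Pick the model for a request; fetch raises if classify returns an unknown tier.
def model_for(request_text)
  MODEL_FOR_TIER.fetch(classify(request_text))
end

puts model_for("What is the balance on invoice 12?")
puts model_for("Draft a reply to this customer")
```

Using `fetch` rather than `[]` means an unexpected tier label raises immediately instead of silently passing `nil` as the model.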
Streaming for Responsive Interfaces
If your agent talks to a user in real time, you want to stream tokens as they are generated rather than making the user wait for the whole response. The SDK supports server-sent events, and the streaming helpers make it ergonomic:
stream = ANTHROPIC.messages.stream(
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Draft a payment reminder email." }]
)

stream.text.each do |chunk|
  # Push each chunk to the browser over a Turbo Stream or ActionCable channel.
  broadcast_chunk(chunk)
end
In a Rails app, this pairs with Turbo Streams or ActionCable: each text chunk becomes a broadcast, and the user watches the response appear. The streaming interface also exposes accumulation helpers and event-level access when you need to react to specific events rather than just the text, which is useful when you want to show the user “calling tool: looking up invoices” as it happens.
Run Agents in the Background
A real agent loop can run for many turns, and each turn is a network round trip to the model. That can easily exceed the time budget of a web request, and tying up a Puma worker for thirty seconds while an agent thinks is a good way to exhaust your connection pool under load. Agents belong in background jobs.
Enqueue the agent run, stream results back over a channel, and let your existing job infrastructure handle retries and concurrency.
class AgentRunJob < ApplicationJob
  queue_as :agents

  def perform(conversation_id)
    conversation = Conversation.find(conversation_id)

    runner = ANTHROPIC.beta.messages.tool_runner(
      model: "claude-sonnet-4-6",
      max_tokens: 2048,
      messages: conversation.to_messages,
      tools: conversation.permitted_tools
    )

    runner.each_message do |message|
      conversation.append!(message)
      conversation.broadcast_latest
    end
  end
end
If you are on Rails 8 with Solid Queue, this fits the default stack with no extra infrastructure. The agent becomes just another job, with all the retry, monitoring, and concurrency control you already have. I have written separately about scheduling and operating Solid Queue, and everything there applies directly to agent workloads.
Authorization
This is the part that decides whether an agent is safe to ship. When an agent calls a tool, whose permissions apply? An agent that can read any customer’s invoices because it runs as a privileged service account is a data breach waiting to happen. The model can be steered by its input, and if a user can influence the prompt, they can influence which tools the agent tries to call.
Tools must execute with the permissions of the user they act for, never with ambient service-account access. In Rails, this means the authorization layer you already have - Pundit policies, scoped queries, the current account or tenant - must apply inside your tools exactly as it does in your controllers. The agent layer is a thin adapter; the authorization lives in the domain, where it always did.
class LookupInvoices < Anthropic::BaseTool
  def initialize(current_user:)
    @current_user = current_user
    super()
  end

  def call(input)
    # Scope through the same policy the rest of the app uses.
    # The agent can only ever see what this user could see.
    scope = InvoicePolicy::Scope.new(@current_user, Invoice).resolve
    scope.where(customer_id: input.customer_id)
         .order(created_at: :desc)
         .limit(input.limit || 20)
         .as_json(only: %i[id number status amount_cents due_on])
  end
end
Instantiate your tools per-request with the current user, and let your existing policies do the work. This is why a mature Rails monolith works well for agents: the scoping, policies, and tenant isolation already exist and are tested. You are reusing security, not building it.
The same caution extends to write actions. An agent that can issue refunds or send emails should treat those as deliberate, gated operations - ideally with a human approval checkpoint for anything irreversible, not as just another tool call the model makes on a whim. Read-only by default, writes behind explicit confirmation, is the right starting posture.
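One way to picture a write that sits behind explicit confirmation is a wrapper that refuses to act until a human has approved the exact arguments. This is a sketch under assumptions: `GatedWriteTool`, `PendingApproval`, and the in-memory approvals list are all hypothetical, and a real app would persist approvals and tie them to a user.

```ruby
# Hypothetical wrapper around a write action. It runs the action only when
# the exact arguments have been approved; otherwise it returns a pending marker.
class GatedWriteTool
  PendingApproval = Struct.new(:tool_name, :arguments, keyword_init: true)

  def initialize(name:, approvals:, &action)
    @name = name
    @approvals = approvals # list of argument hashes a human has approved
    @action = action
  end

  def call(arguments)
    if @approvals.include?(arguments)
      @action.call(arguments)
    else
      PendingApproval.new(tool_name: @name, arguments: arguments)
    end
  end
end

approved = []
refunds_issued = []

refund = GatedWriteTool.new(name: "issue_refund", approvals: approved) do |args|
  refunds_issued << args
  "refund issued"
end

request = { invoice_id: 17, amount_cents: 5_000 }
first = refund.call(request)  # not yet approved: nothing is written
approved << request           # a human approves these exact arguments
second = refund.call(request) # now the action runs
```

Approving the arguments rather than the tool means a later call with a different amount or invoice goes back to pending.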
Error Handling and Retries
The SDK raises a typed hierarchy of errors, all descending from Anthropic::Errors::APIError, which lets you handle each failure mode deliberately:
begin
  message = ANTHROPIC.messages.create(
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    messages: messages
  )
rescue Anthropic::Errors::RateLimitError
  # HTTP 429: back off and retry, or shed load.
  raise
rescue Anthropic::Errors::APIConnectionError => e
  # Network problem reaching the API.
  Rails.logger.error("Anthropic unreachable: #{e.cause}")
  raise
rescue Anthropic::Errors::APIStatusError => e
  Rails.logger.error("Anthropic returned #{e.status}")
  raise
end
The SDK already retries certain failures for you: by default it retries twice, with a short exponential backoff, on connection errors, request timeouts, 409 conflicts, 429 rate limits, and 5xx errors. You can tune this per client or per request with the max_retries option, and set it to zero when you want to handle retries entirely in your own job layer. Requests time out after ten minutes by default, which is generous for a single call but worth lowering for interactive paths where you would rather fail fast.
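If you do set max_retries to zero and move retries into the job layer, the delay schedule becomes your responsibility. A minimal sketch of a capped exponential backoff follows; `backoff_seconds`, its base, and its cap are illustrative assumptions, not SDK settings.

```ruby
# Hypothetical helper: seconds to wait before retry number `attempt` (0-based).
# The delay doubles each attempt and never exceeds `cap`.
def backoff_seconds(attempt, base: 2, cap: 60)
  [base * (2**attempt), cap].min
end

schedule = (0..5).map { |attempt| backoff_seconds(attempt) }
puts schedule.inspect
```

Active Job's `retry_on` accepts a callable for its `wait:` option, so a function like this can supply the delay there. Adding random jitter on top would spread out retries from jobs that failed at the same moment.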
For agents specifically, there is a second class of error beyond HTTP failures: the model does something you did not expect, like calling a tool with arguments that fail validation or looping without converging. Always set a maximum iteration count as a stopping condition, even when using the tool runner, so a confused agent fails loudly instead of running up a bill. Treat your tool code defensively, validate inputs, and return a clear error string to the model when something is wrong rather than raising, because a well-worded error in the tool result often lets the model correct itself on the next turn.
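The iteration cap can be sketched independently of the SDK. In this hypothetical helper, the block stands in for one turn (a model call plus any tool executions) and returns `:continue` while the agent still wants to use tools; `run_bounded` and `IterationLimitExceeded` are names invented for the example.

```ruby
class IterationLimitExceeded < StandardError; end

# Runs the block once per turn until it returns something other than
# :continue, and raises if that has not happened within max_iterations turns.
def run_bounded(max_iterations:)
  max_iterations.times do |turn|
    result = yield(turn)
    return result unless result == :continue
  end
  raise IterationLimitExceeded, "agent did not finish within #{max_iterations} turns"
end

# A run that converges on its third turn.
answer = run_bounded(max_iterations: 5) do |turn|
  turn < 2 ? :continue : "done after #{turn + 1} turns"
end

# A run that never converges fails loudly instead of looping forever.
failure =
  begin
    run_bounded(max_iterations: 3) { :continue }
  rescue IterationLimitExceeded => e
    e.message
  end
```

Raising a dedicated error class lets the job layer treat a runaway agent differently from an HTTP failure, for example by not retrying it.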
Observability: Make the Agent’s Thinking Visible
Anthropic’s three core principles for building agents are to “maintain simplicity in your agent’s design,” to “prioritize transparency by explicitly showing the agent’s planning steps,” and to “carefully craft your agent-computer interface (ACI) through thorough tool documentation and testing.” Transparency is the one most teams skip, and it is how you debug. An agent that fails silently is nearly impossible to diagnose; an agent that logs every tool call, every argument, and every result is straightforward.
Log each tool invocation with the tool name, the arguments, the user on whose behalf it ran, and the result. In practice this log becomes three things at once: your debugging trace, your audit trail, and your cost-attribution record. Capture token usage from each response too, because that is how you understand and control spend. The model returns usage figures on every message; persist them against the conversation so you can see which agents and which users are expensive.
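Accumulating usage per conversation can be sketched as a small ledger. The `Usage` struct below is a stand-in for the usage object on each response, with the `input_tokens` and `output_tokens` fields the Messages API reports; `UsageLedger` is a hypothetical name, and a real app would persist these totals rather than hold them in memory.

```ruby
# Stand-in for the usage figures returned with each message.
Usage = Struct.new(:input_tokens, :output_tokens, keyword_init: true)

# Hypothetical in-memory ledger: one instance per conversation.
class UsageLedger
  attr_reader :input_tokens, :output_tokens, :calls

  def initialize
    @input_tokens = 0
    @output_tokens = 0
    @calls = 0
  end

  # Add one response's usage to the running totals.
  def record(usage)
    @input_tokens += usage.input_tokens
    @output_tokens += usage.output_tokens
    @calls += 1
    self
  end

  def total_tokens
    input_tokens + output_tokens
  end
end

ledger = UsageLedger.new
ledger.record(Usage.new(input_tokens: 812, output_tokens: 96))
ledger.record(Usage.new(input_tokens: 1_240, output_tokens: 310))
puts "#{ledger.calls} calls, #{ledger.total_tokens} tokens"
```

Keeping input and output counts separate matters because the two are typically priced differently.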
A simple wrapper around tool execution gives you this for free:
def execute_tool(tool, block)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = tool.call(block.input)

  AgentToolCall.create!(
    tool_name: block.name,
    arguments: block.input,
    user_id: Current.user&.id,
    duration_ms: ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round
  )

  result
rescue => e
  Rails.logger.error("Tool #{block.name} failed: #{e.message}")
  "Error: #{e.message}" # Hand a usable error back to the model.
end
Patterns and When to Use Them
Anthropic’s catalog of agentic patterns, from Building Effective Agents, maps onto Rails work. The article describes routing as a pattern that “classifies an input and directs it to a specialized followup task,” and orchestrator-workers as one where “a central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results.” Here is the short version, with the Rails-shaped use case for each.
| Pattern | What it is | Good Rails use case |
|---|---|---|
| Single augmented call | One model call with tools, retrieval, or memory | Most features; try this first |
| Prompt chaining | Output of one call feeds the next, with checks between | Generate then validate then refine a document |
| Routing | Classify the input, send it to a specialized path | Triage support tickets to the right handler and model |
| Parallelization | Run subtasks or votes concurrently, aggregate results | Run guardrail checks alongside the main response |
| Orchestrator-workers | A lead model delegates dynamic subtasks to workers | Multi-step research or multi-record changes |
| Evaluator-optimizer | One model generates, another critiques, in a loop | Iterative drafting against clear quality criteria |
| Autonomous agent | The model drives a tool loop until done | Open-ended tasks where steps cannot be predicted |
The progression is deliberate. Start at the top. Move down only when a simpler pattern demonstrably falls short, because every step down costs latency, tokens, and a little more unpredictability.
When Not to Use an Agent
Agents are not the right tool when the task has a predictable structure. If you can write down the steps in advance, use a workflow instead: cheaper, faster, and easier to test and debug. Reach for an agent only when the steps vary based on the model’s intermediate findings.
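A workflow in this sense can be as plain as a fixed chain of steps with a check between them. In the sketch below each lambda stands in for one model call, and `run_chain`, `GateFailed`, and the step names are hypothetical; the point is only that the order of steps lives in your code, not in the model.

```ruby
class GateFailed < StandardError; end

# Feed the input through each step in order, checking every output against
# the gate before passing it on. The sequence is fixed in code.
def run_chain(input, steps:, gate:)
  steps.reduce(input) do |current, step|
    output = step.call(current)
    raise GateFailed, "rejected output: #{output.inspect}" unless gate.call(output)
    output
  end
end

# Stand-ins for model calls.
draft     = ->(topic) { "DRAFT: #{topic}" }
finalize  = ->(text) { text.sub("DRAFT", "FINAL") }
not_blank = ->(text) { !text.strip.empty? }

result = run_chain("payment reminder", steps: [draft, finalize], gate: not_blank)
puts result
```

Because the steps are ordinary callables, each one can be tested on its own with fixed inputs, which is much harder to do with a free-running loop.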
Be cautious about agents with write access. Every write action an agent can take is an action it can take incorrectly at scale. Audit agents thoroughly before granting write permissions, and prefer requiring explicit human confirmation for anything irreversible - sending emails, issuing refunds, modifying records in bulk.
Watch your context window. Agent loops accumulate conversation history, and long-running agents can hit context limits or generate surprisingly large token counts. Set max_tokens conservatively on individual calls and cap iterations at a reasonable maximum.
The Bottom Line
Start with the official anthropic gem and a single model call. Confirm the simplest version works before adding a loop. Define a small number of carefully described tools as Ruby classes with typed, constrained inputs, and spend real effort on those descriptions - they are the interface the model actually uses. Let the tool runner own the loop unless you have a specific reason not to, and always cap iterations. Run agents in background jobs. Scope every tool through your existing authorization layer. Log every tool call; that one habit gives you debugging, auditing, and cost attribution.
An agent is a thin, model-driven layer over the domain logic, authorization, and infrastructure you already have. The teams that ship are the ones who kept it simple, got the tool descriptions right, and reused what was already there. Prompt cleverness is rarely what makes the difference. Rails, as it happens, is very good at the boring foundations.
Need help designing or building an AI agent on top of your Rails application? I work with teams on agent architecture, tool design, and the authorization and observability patterns that make an agent safe to ship. Reach out at nikita.sinenko at gmail.com.
Further Reading
- Solid Queue in Rails 8: Install, Migrate, and Deploy - the background job layer to run agents on
- Service Objects Are Not an Architecture - the domain layer your tools should call into
- Rails Monoliths Encode Organizational Assumptions - why the monolith already has the security an agent reuses
- PostgreSQL Optimization in Rails: Cut Query Times by 95% - keeping the queries behind your tools fast
- Odoo API Integration in 2026: JSON-2, Webhooks, Dashboards - giving an agent real business data to act on
- Anthropic: Building Effective Agents - the source for the workflow/agent distinction and the agentic patterns
- anthropic-sdk-ruby on GitHub - the official gem, including the auto_looping_tools examples referenced above
Anthropic product details, SDK methods, and model names in this post reflect the official documentation and the anthropic-sdk-ruby source at the time of writing. The tool_runner lives under the beta namespace, so verify the current method signatures before building.