"Building an OpenAI Harness (1/3) — Understanding Responses API, Tools, and the Agents SDK as an Operational Stack"

May 18, 2026

If you understand OpenAI only as "a model API," it becomes hard to see where the real agent system should be assembled. In practice, Responses API, tools, function calling, remote MCP, and the Agents SDK look like one connected surface, but they play different roles. Once you separate those roles, it becomes much easier to see where the harness should stay thin and where it needs structure.

Key Takeaways

The Responses API is not just a text-generation endpoint. It is the base operational surface for responses, tool calls, and conversation-state linkage.
Tools expand the model's action space through built-in tools, custom function calling, and remote MCP, but the decision of when a tool should be available is still a harness design problem.
The Agents SDK does not replace model calls. It sits above them as a layer for turn management, handoffs, guardrails, tracing, and session orchestration.
In practice, it is often best to keep Responses API at the bottom, attach a narrow tool surface and verification rules above it, and bring in the Agents SDK only when orchestration complexity genuinely appears.
So the central OpenAI harness question is not "which model should we use?" but rather "which operational responsibility belongs on which layer?"

1. Why OpenAI should be read as layers, not just a product

From a harness perspective, the OpenAI surface is not one thing. It spans at least five distinct responsibilities:

response generation
conversation-state linkage
tool calling
file, web, and external-system access
multi-agent execution flow management

If you collapse those into one mental bucket, implementations bloat quickly. If you separate them into layers, it becomes much clearer what should be delegated to the API and what should remain your responsibility.

This is a useful split:

Layer	Primary role
`Responses API`	input, response generation, tool calls, multi-turn linkage
`Tools`	web search, file search, function calling, remote MCP
Your harness	approvals, policy, retries, verification, logging, cost control
`Agents SDK`	orchestration across turns and agents

The point of this post is not to list features. It is to read them as operational responsibilities.

2. The Responses API is really a runtime boundary

In the current official OpenAI docs, the center of gravity is the Responses API. It handles not just visible output, but also tool use and follow-up turn linkage in the same overall flow.

Three details matter most in practice:

A response can include output items and tool calls, not just a plain string.
Multi-turn linkage can be maintained through mechanisms such as previous_response_id.
The same request surface exposes control points like tools, parallel_tool_calls, and max_tool_calls.

That means that as soon as you adopt the Responses API, you are already touching part of an agent loop. It is therefore better understood as the runtime boundary of a minimal harness, not merely a completion-style endpoint.
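A minimal sketch of that loop may make the boundary easier to see. It assumes responses are plain dicts shaped like Responses API output items (`function_call` and `message`), whereas the official SDK returns objects with attributes. `create_response`, `fake_create_response`, and `lookup_order` are hypothetical names introduced for illustration; in a real harness `create_response` would wrap the actual client call.

```python
import json
from typing import Any, Callable


def run_thin_loop(
    create_response: Callable[..., dict[str, Any]],
    user_input: str,
    tools: list[dict[str, Any]],
    handlers: dict[str, Callable[..., Any]],
    max_turns: int = 4,
) -> str:
    """Drive a small tool loop and return the final text answer."""
    next_input: Any = user_input
    previous_response_id = None
    for _ in range(max_turns):
        response = create_response(
            input=next_input,
            tools=tools,
            parallel_tool_calls=False,  # control points named in the text
            max_tool_calls=3,
            previous_response_id=previous_response_id,  # multi-turn linkage
        )
        previous_response_id = response["id"]
        calls = [i for i in response["output"] if i["type"] == "function_call"]
        if not calls:
            # No tool requested: join the text parts of the message items.
            return "".join(
                part["text"]
                for item in response["output"] if item["type"] == "message"
                for part in item["content"] if part["type"] == "output_text"
            )
        # Run each requested function and send the results back as input.
        next_input = [
            {
                "type": "function_call_output",
                "call_id": call["call_id"],
                "output": json.dumps(
                    handlers[call["name"]](**json.loads(call["arguments"]))
                ),
            }
            for call in calls
        ]
    raise RuntimeError("tool loop exceeded max_turns")


def fake_create_response(**request: Any) -> dict[str, Any]:
    """Hypothetical stand-in for the API so the sketch runs offline."""
    if request["previous_response_id"] is None:
        return {"id": "resp_1", "output": [{
            "type": "function_call", "call_id": "call_1",
            "name": "lookup_order", "arguments": '{"order_id": "A17"}',
        }]}
    status = json.loads(request["input"][0]["output"])["status"]
    return {"id": "resp_2", "output": [{
        "type": "message",
        "content": [{"type": "output_text", "text": f"Order A17 is {status}."}],
    }]}


answer = run_thin_loop(
    fake_create_response,
    "Where is order A17?",
    tools=[{"type": "function", "name": "lookup_order"}],
    handlers={"lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"}},
)
```

Injecting the client as a callable keeps the loop testable without network access, and the `max_turns` ceiling stays in the harness rather than in the model's hands.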

3. Built-in tools are an action surface, not a feature checklist

The official tools guide now presents built-in tools, function calling, and remote MCP inside the same tool surface. That matters because it assumes the model may need to move from answering into acting.

Some of the most operationally meaningful tools are:

web_search: for freshness-sensitive or externally verifiable work
file_search: for uploaded files or vector-store retrieval
function calling: for your own code and internal systems
mcp: for external tool servers exposed through a standard interface

But the practical question is not "should we enable everything?" In most systems, the right answer is no.

Enabling web_search by default for non-freshness tasks adds cost and variability.
Search tools that dump long raw results pollute the next turn's context.
Loosely defined functions make tool choice less reliable.
Too many MCP servers increase connectivity and confusion at the same time.

Tools are therefore not just a capability surface. They are also a blast-radius surface.
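One way to keep that blast radius small is to decide tool availability per task kind and to trim raw tool output before it reaches the next turn. The sketch below is illustrative only: the task kinds, the `vs_example` vector store ID, and the helper names are assumptions, and the tool dicts follow the `web_search` and `file_search` tool types as I understand them from the tools guide.

```python
from typing import Any

# Hypothetical task kinds mapped to the tools each one may use.
TOOL_POLICY: dict[str, list[dict[str, Any]]] = {
    "summarize_local_doc": [],  # no freshness need, so no tools
    "answer_from_uploads": [
        {"type": "file_search", "vector_store_ids": ["vs_example"]},
    ],
    "check_current_facts": [{"type": "web_search"}],
}


def tools_for(task_kind: str) -> list[dict[str, Any]]:
    """Return the tools allowed for a task; unknown tasks get none."""
    return list(TOOL_POLICY.get(task_kind, []))


def trim_tool_result(text: str, max_chars: int = 500) -> str:
    """Cap raw tool output so it does not flood the next turn's context."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + " [truncated]"
```

The default-deny lookup means a new task kind gets no tools until someone adds it to the policy on purpose.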

4. Function calling is boundary design

Many teams think of function calling as "the model can call a function." That is true but incomplete. In a harness, function calling is the boundary between model judgment and system action.

Good function schemas usually have these traits:

names make roles obvious
parameters stay narrow and predictable
different risk levels are not mixed together
outputs return only the minimum structure needed for the next decision

For example, "search docs" and "edit docs" should not usually live behind one vague tool abstraction. What feels convenient to a human often becomes ambiguous to a model.

That is why harness quality is often more sensitive to schema clarity than to the sheer number of functions.
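As a sketch of what split-by-risk schemas could look like, the two definitions below separate a read-only search from a write action. The function names, parameters, and the `RISK` registry are hypothetical; the dict layout follows the function tool format as I understand it, with strict mode assumed to require every property to be listed under `required`.

```python
from typing import Any

SEARCH_DOCS: dict[str, Any] = {
    "type": "function",
    "name": "search_docs",
    "description": "Search internal docs and return matching titles. Read-only.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
        "additionalProperties": False,
    },
    "strict": True,
}

EDIT_DOC: dict[str, Any] = {
    "type": "function",
    "name": "edit_doc",
    "description": "Replace the body of one document. Write action.",
    "parameters": {
        "type": "object",
        "properties": {
            "doc_id": {"type": "string"},
            "new_text": {"type": "string"},
        },
        "required": ["doc_id", "new_text"],
        "additionalProperties": False,
    },
    "strict": True,
}

# Hypothetical risk registry kept by the harness, not by the model.
RISK: dict[str, str] = {"search_docs": "read", "edit_doc": "write"}


def requires_approval(function_name: str) -> bool:
    """Treat anything that is not explicitly read-only as needing approval."""
    return RISK.get(function_name, "write") != "read"
```

Keeping the risk level outside the schema lets the harness gate execution even when the model picks the right function for the wrong reason.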

5. Remote MCP improves connectivity, not design

It is significant that remote MCP now appears in the same official tool surface. In OpenAI-based harnesses, external tool connectivity is increasingly becoming a standard part of the platform story rather than an ad hoc side channel.

But that does not mean MCP designs the harness for you.

You still need to decide:

which servers should be exposed
which ones should remain read-only
which results should be summarized before returning
which actions should require explicit approval
which traces and audit records should be stored

MCP lowers friction for extension. It does not remove the need for operational architecture.
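Those decisions can be written down as data before any server is wired in. The sketch below assumes the remote MCP tool accepts `server_label`, `server_url`, `allowed_tools`, and `require_approval` fields, which matches my reading of the tools guide but should be checked against the current reference. The server URLs, tool names, and the `mcp_tool` helper are hypothetical.

```python
from typing import Any


def mcp_tool(
    server_label: str,
    server_url: str,
    allowed_tools: list[str],
    read_only: bool,
) -> dict[str, Any]:
    """Build one MCP tool entry with an explicit exposure decision."""
    return {
        "type": "mcp",
        "server_label": server_label,
        "server_url": server_url,
        # Expose only the named tools, not everything the server offers.
        "allowed_tools": allowed_tools,
        # Read-only servers may run unattended; anything else needs approval.
        "require_approval": "never" if read_only else "always",
    }


# Hypothetical servers, used only for illustration.
docs_server = mcp_tool(
    "docs", "https://mcp.example.com/docs", ["search_pages"], read_only=True
)
tickets_server = mcp_tool(
    "tickets", "https://mcp.example.com/tickets", ["create_ticket"], read_only=False
)
```

Summarization of results and audit storage would still sit in your own harness; this fragment only covers which servers are exposed and which actions wait for approval.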

6. The Agents SDK is best understood as a higher-level loop manager

If you read the official OpenAI documentation together with the SDK docs, the Agents SDK looks much less like a replacement for the Responses API and much more like an orchestration layer above it.

The recurring concepts are:

tools
handoffs
sessions
guardrails
tracing
streaming

That combination makes the role fairly clear. The Agents SDK is about managing agent execution flows, not just sending prompts.

This framing usually reduces confusion:

Question	Better matching layer
Which tools may the model use in this request	`Responses API` + tools
How is this conversation linked to the next turn	`Responses API`
Can this agent hand off to a specialist agent	`Agents SDK`
Where do traces and run flow live	`Agents SDK` or your observability layer
Where should pre/post guardrails sit	your harness plus SDK support

So the SDK is not the default starting point. It is an upper structure for systems that have already become orchestration-heavy.

7. When the Responses API is enough, and when the SDK helps

The answer depends on the team, but the pattern is usually straightforward.

Cases where the Responses API is often enough

a single agent or a thin loop is sufficient
the tool surface is small and clear
conversation continuity is relatively simple
request-by-request handling matters more than handoffs

Cases where the Agents SDK becomes attractive

specialist agents need to hand work off to one another
the same session and handoff patterns recur across workflows
tracing and guardrails need to be reused consistently
multi-step execution should be managed as one run unit

The key point is this:

The SDK is not "the advanced way to use OpenAI." It is a way to reduce orchestration chaos once operational complexity reaches a certain level.

8. What your side still has to own

Even as OpenAI provides broader runtime surfaces, the core harness responsibilities do not disappear. Your system still needs to own:

which tools are allowed for which jobs
whether freshness checks are required
how external search output is summarized and verified
when to retry versus stop
how cost and tool-call ceilings are enforced
where traces, audit records, and approvals live

In other words, OpenAI increasingly offers strong agent runtime parts, but the product-level harness is still your design.
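A small budget object is one way to make the retry, stop, and ceiling decisions explicit. This is a sketch under assumed limits, not a recommended configuration: the `RunBudget` class, its default values, and the per-call cost figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunBudget:
    """Harness-side ceilings for one run; defaults are illustrative."""

    max_tool_calls: int = 5
    max_retries: int = 2
    max_cost_usd: float = 0.50
    tool_calls: int = 0
    retries: int = 0
    cost_usd: float = 0.0

    def record_tool_call(self, cost_usd: float) -> None:
        """Count one tool call and add its estimated cost."""
        self.tool_calls += 1
        self.cost_usd += cost_usd

    def should_stop(self) -> bool:
        """Stop once either the call ceiling or the cost ceiling is hit."""
        return (
            self.tool_calls >= self.max_tool_calls
            or self.cost_usd >= self.max_cost_usd
        )

    def may_retry(self, error_is_transient: bool) -> bool:
        """Allow a retry only for transient errors and within the limit."""
        if not error_is_transient or self.retries >= self.max_retries:
            return False
        self.retries += 1
        return True


budget = RunBudget(max_tool_calls=2)
budget.record_tool_call(cost_usd=0.01)
budget.record_tool_call(cost_usd=0.01)
stop_now = budget.should_stop()
```

Because the budget lives in your code, the same ceilings apply whether the loop runs directly on the Responses API or inside a higher-level orchestration layer.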

9. Practical checklist before wiring an OpenAI harness

Is this task just response generation, or does it really need a tool loop?
Does it require freshness, or is local context enough?
How should built-in tools and custom functions be separated?
Are function schemas split by responsibility and risk level?
Is remote MCP actually needed, or are internal functions enough?
How much multi-turn state should remain in the API versus your own artifacts?
Has orchestration complexity grown enough to justify the SDK?

10. The more clearly you see the layers, the simpler the harness gets

As the OpenAI surface expands, it can look more confusing to newcomers. But from a harness perspective, the structure is actually fairly manageable.

Responses API is the base runtime boundary
tools expand the action surface
function calling and MCP connect external systems
the Agents SDK manages more complex loops

This framing also removes the pressure to adopt everything at once. Good harnesses usually start smaller:

build a thin loop with the Responses API
attach only the tools that are necessary
keep verification and approvals in your own harness
bring in the Agents SDK only when complexity warrants it

That is why the real OpenAI harness question is not model choice. It is this:

Which operational responsibility belongs on which layer?

Part 2 moves that same question to Claude, where the separation between CLAUDE.md, skills, hooks, and permissions becomes even more concrete.

References

OpenAI Docs, Responses API
https://platform.openai.com/docs/api-reference/responses/create?api-mode=responses
OpenAI Docs, Using tools
https://platform.openai.com/docs/guides/tools?api-mode=responses
OpenAI Docs, Conversation state
https://platform.openai.com/docs/guides/conversation-state?api-mode=responses
OpenAI Docs, Agents SDK guide
https://platform.openai.com/docs/guides/agents-sdk
OpenAI Agents SDK Docs, Agents
https://openai.github.io/openai-agents-python/agents/
docs/blog_series_하네스엔지니어링_총괄_design.md
sources/260518_하네스엔지니어링_15장_블로그활용노트.md

This is Part 1/3 of the OpenAI and Claude Harnesses series. Suggested next reading: building a Claude harness, then comparing OpenAI and Claude harness design philosophies.

Series overview: Harness Engineering Series Guide


MaJu Tech Notes