"Building an OpenAI Harness (1/3) — Understanding Responses API, Tools, and the Agents SDK as an Operational Stack"
If you understand OpenAI only as "a model API," it becomes hard to see where the real agent system should be assembled. In practice, the Responses API, tools, function calling, remote MCP, and the Agents SDK look like one connected surface, but they play different roles. Once you separate those roles, it becomes much easier to see where the harness should stay thin and where it needs structure.
Key Takeaways
- The Responses API is not just a text-generation endpoint. It is the base operational surface for responses, tool calls, and conversation-state linkage.
- Tools expand the model's action space through built-in tools, custom function calling, and remote MCP, but the decision of when a tool should be available is still a harness design problem.
- The Agents SDK does not replace model calls. It sits above them as a layer for turn management, handoffs, guardrails, tracing, and session orchestration.
- In practice, it is often best to keep the Responses API at the bottom, attach a narrow tool surface and verification rules above it, and bring in the Agents SDK only when orchestration complexity genuinely appears.
- So the central OpenAI harness question is not "which model should we use?" but rather "which operational responsibility belongs on which layer?"
1. Why OpenAI should be read as layers, not just a product
From a harness perspective, the OpenAI surface is not one thing.
- response generation
- conversation-state linkage
- tool calling
- file, web, and external-system access
- multi-agent execution flow management
If you collapse those into one mental bucket, implementations bloat quickly. If you separate them into layers, it becomes much clearer what should be delegated to the API and what should remain your responsibility.
This is a useful split:
| Layer | Primary role |
|---|---|
| Responses API | input, response generation, tool calls, multi-turn linkage |
| Tools | web search, file search, function calling, remote MCP |
| Your harness | approvals, policy, retries, verification, logging, cost control |
| Agents SDK | orchestration across turns and agents |
The point of this post is not to list features. It is to read them as operational responsibilities.
2. The Responses API is really a runtime boundary
In the current official OpenAI docs, the center of gravity is the Responses API. It handles not just visible output, but also tool use and follow-up turn linkage in the same overall flow.
Three details matter most in practice:
- A response can include output items and tool calls, not just a plain string.
- Multi-turn linkage can be maintained through mechanisms such as `previous_response_id`.
- The same request surface exposes control points like `tools`, `parallel_tool_calls`, and `max_tool_calls`.
That means that as soon as you adopt the Responses API, you are already touching part of an agent loop. It is therefore better understood as the runtime boundary of a minimal harness, not merely a completion-style endpoint.
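As a rough illustration of those control points, the sketch below assembles the keyword arguments for one request. The helper name `build_request`, the placeholder model name, and the response id `resp_123` are assumptions made for this example; only the parameter names come from the text above.

```python
from typing import Any, Optional


def build_request(
    user_input: str,
    tools: list[dict[str, Any]],
    previous_response_id: Optional[str] = None,
    max_tool_calls: int = 4,
) -> dict[str, Any]:
    """Assemble keyword arguments for one Responses API call."""
    request: dict[str, Any] = {
        "model": "gpt-4.1",  # placeholder model name, swap in your own
        "input": user_input,
        "tools": tools,
        "parallel_tool_calls": False,  # one call at a time is easier to audit
        "max_tool_calls": max_tool_calls,  # ceiling on tool calls per response
    }
    if previous_response_id is not None:
        # Link this turn to the prior one instead of resending the transcript.
        request["previous_response_id"] = previous_response_id
    return request


first = build_request("Summarize the open incidents.", tools=[])
follow_up = build_request(
    "Which one is oldest?", tools=[], previous_response_id="resp_123"
)

# With the official Python SDK, the dict would be sent roughly like this:
#   from openai import OpenAI
#   response = OpenAI().responses.create(**first)
```

Keeping request assembly in one small function gives the harness a single place to enforce defaults such as the tool-call ceiling.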
3. Built-in tools are an action surface, not a feature checklist
The official tools guide now presents built-in tools, function calling, and remote MCP inside the same tool surface. That matters because it assumes the model may need to move from answering into acting.
Some of the most operationally meaningful tools are:
- `web_search`: for freshness-sensitive or externally verifiable work
- `file_search`: for uploaded files or vector-store retrieval
- `function` calling: for your own code and internal systems
- `mcp`: for external tool servers exposed through a standard interface
But the practical question is not "should we enable everything?" In most systems, the right answer is no.
- Enabling `web_search` by default for non-freshness tasks adds cost and variability.
- Search tools that dump long raw results pollute the next turn's context.
- Loosely defined functions make tool choice less reliable.
- Too many MCP servers increase connectivity and confusion at the same time.
Tools are therefore not just a capability surface. They are also a blast-radius surface.
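One way to keep that blast radius small is to decide tool exposure per task rather than globally, and to clip tool output before it reaches the next turn. The following is a minimal sketch under assumptions: the task names, the vector store id `vs_example`, and both helper functions are hypothetical, not part of any OpenAI API.

```python
from typing import Any

# Hypothetical task categories mapped to the tools each one may see.
TOOLS_BY_TASK: dict[str, list[dict[str, Any]]] = {
    "draft_reply": [],  # pure generation, no tools
    "news_lookup": [{"type": "web_search"}],
    "kb_question": [{"type": "file_search", "vector_store_ids": ["vs_example"]}],
}


def tools_for(task: str) -> list[dict[str, Any]]:
    """Return the allowlisted tools for a task; unknown tasks get none."""
    return list(TOOLS_BY_TASK.get(task, []))


def clip_tool_output(text: str, limit: int = 500) -> str:
    """Truncate long raw tool output so it does not flood the next turn."""
    if len(text) <= limit:
        return text
    return text[:limit] + " [truncated]"
```

Defaulting unknown tasks to an empty tool list means a new task type has to be granted tools explicitly instead of inheriting everything.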
4. Function calling is boundary design
Many teams think of function calling as "the model can call a function." That is true but incomplete. In a harness, function calling is the boundary between model judgment and system action.
Good function schemas usually have these traits:
- names make roles obvious
- parameters stay narrow and predictable
- different risk levels are not mixed together
- outputs return only the minimum structure needed for the next decision
For example, "search docs" and "edit docs" should not usually live behind one vague tool abstraction. What feels convenient to a human often becomes ambiguous to a model.
That is why harness quality is often more sensitive to schema clarity than to the sheer number of functions.
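To make the "search docs" versus "edit docs" split concrete, here is a sketch of two narrow function tool definitions plus a harness-side risk table. The tool names, parameters, and the `requires_approval` helper are illustrative assumptions; the risk table lives in your harness and is never sent to the model.

```python
from typing import Any

# Read-only tool: narrow parameters, obvious name.
SEARCH_DOCS: dict[str, Any] = {
    "type": "function",
    "name": "search_docs",
    "description": "Read-only search over the documentation index.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
        "additionalProperties": False,
    },
    "strict": True,
}

# Write tool: kept separate so its risk level is never mixed with search.
EDIT_DOC: dict[str, Any] = {
    "type": "function",
    "name": "edit_doc",
    "description": "Replace the body of one document.",
    "parameters": {
        "type": "object",
        "properties": {
            "doc_id": {"type": "string"},
            "new_text": {"type": "string"},
        },
        "required": ["doc_id", "new_text"],
        "additionalProperties": False,
    },
    "strict": True,
}

# Harness-side metadata (hypothetical): which tools only read, which write.
RISK_LEVEL = {"search_docs": "read", "edit_doc": "write"}


def requires_approval(tool_name: str) -> bool:
    """Fail closed: anything not known to be read-only needs approval."""
    return RISK_LEVEL.get(tool_name) != "read"
```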
5. Remote MCP improves connectivity, not design
It is significant that remote MCP now appears in the same official tool surface. In OpenAI-based harnesses, external tool connectivity is increasingly becoming a standard part of the platform story rather than an ad hoc side channel.
But that does not mean MCP designs the harness for you.
You still need to decide:
- which servers should be exposed
- which ones should remain read-only
- which results should be summarized before returning
- which actions should require explicit approval
- which traces and audit records should be stored
MCP lowers friction for extension. It does not remove the need for operational architecture.
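Several of those decisions can be written down directly in the tool configuration. The sketch below builds remote MCP tool entries with an explicit allowlist and an approval setting; the server labels, URLs, and tool names are hypothetical, and tying approval to a `read_only` flag is one possible policy rather than a recommendation from the docs.

```python
from typing import Any


def mcp_tool(
    server_label: str,
    server_url: str,
    allowed_tools: list[str],
    read_only: bool,
) -> dict[str, Any]:
    """Build one remote MCP tool entry with a narrow, explicit surface."""
    return {
        "type": "mcp",
        "server_label": server_label,
        "server_url": server_url,
        "allowed_tools": allowed_tools,  # expose only what the job needs
        # Assumed policy: read-only servers skip approval, others always ask.
        "require_approval": "never" if read_only else "always",
    }


docs_server = mcp_tool(
    "docs", "https://mcp.example.com/docs", ["search"], read_only=True
)
tickets_server = mcp_tool(
    "tickets", "https://mcp.example.com/tickets", ["create_ticket"], read_only=False
)
```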
6. The Agents SDK is best understood as a higher-level loop manager
If you read the official OpenAI documentation together with the SDK docs, the Agents SDK looks much less like a replacement for the Responses API and much more like an orchestration layer above it.
The recurring concepts are:
- tools
- handoffs
- sessions
- guardrails
- tracing
- streaming
That combination makes the role fairly clear. The Agents SDK is about managing agent execution flows, not just sending prompts.
This framing usually reduces confusion:
| Question | Better matching layer |
|---|---|
| Which tools may the model use in this request | Responses API + tools |
| How is this conversation linked to the next turn | Responses API |
| Can this agent hand off to a specialist agent | Agents SDK |
| Where do traces and run flow live | Agents SDK or your observability layer |
| Where should pre/post guardrails sit | your harness plus SDK support |
So the SDK is not the default starting point. It is an upper structure for systems that have already become orchestration-heavy.
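For orientation, a handoff in the Python Agents SDK might look roughly like the sketch below. It assumes the `openai-agents` package is installed and an API key is configured; the agent names and instructions are invented for illustration, and the imports are deferred so the definitions load even where the SDK is absent.

```python
def build_triage_agent():
    """Build a triage agent that can hand off to a billing specialist."""
    # Imported lazily so this module loads without the SDK installed
    # (pip install openai-agents).
    from agents import Agent

    billing = Agent(
        name="Billing specialist",
        instructions="Answer billing questions only.",
    )
    return Agent(
        name="Triage",
        instructions="Route billing questions to the billing specialist.",
        handoffs=[billing],
    )


def run_once(question: str) -> str:
    """Run one question through the triage agent (needs OPENAI_API_KEY)."""
    from agents import Runner

    result = Runner.run_sync(build_triage_agent(), question)
    return str(result.final_output)
```

Note how little of this concerns a single model call: the code describes which agent may pass work to which, which is the orchestration concern the table assigns to the SDK.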
7. When the Responses API is enough, and when the SDK helps
The answer depends on the team, but the pattern is usually straightforward.
Cases where the Responses API is often enough
- a single agent or a thin loop is sufficient
- the tool surface is small and clear
- conversation continuity is relatively simple
- request-by-request handling matters more than handoffs
Cases where the Agents SDK becomes attractive
- specialist agents need to hand work between each other
- session and handoff behavior repeat often
- tracing and guardrails need to be reused consistently
- multi-step execution should be managed as one run unit
The key point is this:
The SDK is not "the advanced way to use OpenAI." It is a way to reduce orchestration chaos once operational complexity reaches a certain level.
8. What your side still has to own
Even as OpenAI provides broader runtime surfaces, the core harness responsibilities do not disappear. Your system still needs to own:
- which tools are allowed for which jobs
- whether freshness checks are required
- how external search output is summarized and verified
- when to retry versus stop
- how cost and tool-call ceilings are enforced
- where traces, audit records, and approvals live
In other words, OpenAI increasingly offers strong agent runtime parts, but the product-level harness is still your design.
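A thin loop that owns some of these responsibilities can be sketched in a few dozen lines. Everything here is an assumption for illustration: responses are modeled as plain dicts (the real SDK returns objects), the model call is injected as `create` so the loop can run against a stub, and `get_status` is a made-up tool.

```python
import json
from typing import Any, Callable


def run_loop(
    create: Callable[..., dict[str, Any]],
    user_input: str,
    tools: dict[str, Callable[..., Any]],
    max_steps: int = 3,
) -> str:
    """Call the model until it stops requesting tools or the ceiling is hit."""
    pending: Any = user_input
    previous_id = None
    for _ in range(max_steps):
        response = create(input=pending, previous_response_id=previous_id)
        previous_id = response["id"]
        calls = [i for i in response["output"] if i["type"] == "function_call"]
        if not calls:
            return response["output_text"]
        pending = []
        for call in calls:
            if call["name"] not in tools:
                # Allowlist enforced by the harness, not by the model.
                result: Any = {"error": "tool not allowed"}
            else:
                result = tools[call["name"]](**json.loads(call["arguments"]))
            pending.append(
                {
                    "type": "function_call_output",
                    "call_id": call["call_id"],
                    "output": json.dumps(result),
                }
            )
    raise RuntimeError("tool-call ceiling reached")


def fake_create(input: Any, previous_response_id: Any = None) -> dict[str, Any]:
    """Stub model: asks for one tool call, then answers from its result."""
    if isinstance(input, str):
        call = {
            "type": "function_call",
            "name": "get_status",
            "arguments": '{"service": "api"}',
            "call_id": "call_1",
        }
        return {"id": "resp_1", "output": [call], "output_text": ""}
    status = json.loads(input[0]["output"])["status"]
    return {"id": "resp_2", "output": [], "output_text": "api is " + status}


answer = run_loop(fake_create, "status?", {"get_status": lambda service: {"status": "up"}})
```

The loop stops in two ways: the model returns no further tool calls, or the step ceiling raises. Where to log, when to retry, and who approves a write remain decisions for the surrounding harness.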
9. Practical checklist before wiring an OpenAI harness
- Is this task just response generation, or does it really need a tool loop?
- Does it require freshness, or is local context enough?
- How should built-in tools and custom functions be separated?
- Are function schemas split by responsibility and risk level?
- Is remote MCP actually needed, or are internal functions enough?
- How much multi-turn state should remain in the API versus your own artifacts?
- Has orchestration complexity grown enough to justify the SDK?
10. The more clearly you see the layers, the simpler the harness gets
As the OpenAI surface expands, it can look more confusing to newcomers. But from a harness perspective, the structure is actually fairly manageable.
- the Responses API is the base runtime boundary
- tools expand the action surface
- function calling and MCP connect external systems
- the Agents SDK manages more complex loops
This framing also removes the pressure to adopt everything at once. Good harnesses usually start smaller:
- build a thin loop with the Responses API
- attach only the tools that are necessary
- keep verification and approvals in your own harness
- bring in the Agents SDK only when complexity warrants it
That is why the real OpenAI harness question is not model choice. It is this:
Which operational responsibility belongs on which layer?
Part 2 moves that same question to Claude, where the separation between CLAUDE.md, skills, hooks, and permissions becomes even more concrete.
References
- OpenAI Docs, Responses API: https://platform.openai.com/docs/api-reference/responses/create?api-mode=responses
- OpenAI Docs, Using tools: https://platform.openai.com/docs/guides/tools?api-mode=responses
- OpenAI Docs, Conversation state: https://platform.openai.com/docs/guides/conversation-state?api-mode=responses
- OpenAI Docs, Agents SDK guide: https://platform.openai.com/docs/guides/agents-sdk
- OpenAI Agents SDK Docs, Agents: https://openai.github.io/openai-agents-python/agents/
This is Part 1/3 of the OpenAI and Claude Harnesses series. Suggested next reading: building a Claude harness, then comparing OpenAI and Claude harness design philosophies.
Series overview: Harness Engineering Series Guide