Add GPT 5.5 support without modifying the system prompt by youknowriad · Pull Request #3328 · Automattic/studio

youknowriad · 2026-05-04T11:31:12Z

Related issues

Sibling: #3244 (the larger, system-prompt-rewriting alternative).

This PR lands GPT 5.5 support without affecting Claude, which remains the default. It adds the alternative agent runtime but does not change the system prompt.

The trade-off is that our current system prompt doesn't produce great sites with GPT 5.5. We can improve that, but I'd prefer to do it in a separate PR.

Proposed Changes

Two-runtime dispatch in apps/cli/ai/agent.ts (pickRuntime + cross-family resume guard).
New apps/cli/ai/runtimes/{anthropic,openai}/. The Anthropic runtime is a pure relocation of the SDK setup that today lives inline in agent.ts. The OpenAI runtime uses @mariozechner/pi-agent-core for the tool loop and @mariozechner/pi-ai's OpenAI Chat Completions provider, talking to the wpcom AI proxy.
tools/common/ai/models.ts — AI_MODELS is now an array of {id, label, family}; gpt-5.5 is listed under family: 'openai'.
apps/cli/ai/providers.ts — defineProvider shape, OPENAI_* env vars on the wpcom path. Anthropic env vars are wire-identical to trunk (ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_CUSTOM_HEADERS, CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS, CLAUDE_CODE_MAX_RETRIES); the constant rename WPCOM_AI_FEATURE_HEADER → WPCOM_AI_FEATURE_HEADER_ANTHROPIC keeps the same 'studio-assistant-anthropic' value.
apps/cli/ai/tools.ts split one-tool-per-file under apps/cli/ai/tools/; the OpenAI runtime imports specific factories by name. Matches the convention noted in feedback_studio_tools_one_per_file.
New hand-rolled Skill and AskUserQuestion tools for the OpenAI runtime (apps/cli/ai/tools/{skill,ask-user-question}.ts); Anthropic continues to use the SDK preset's built-ins via the PreToolUse hook.
apps/ui model picker plumbing + cross-family confirm dialog (apps/ui/src/components/session-view/composer/family-switch-confirm-dialog.tsx).
apps/cli/ai/eval-runner.ts — STUDIO_EVAL_MODEL env override so probes can target GPT directly.
@mariozechner/pi-{agent-core,ai,coding-agent,tui}@0.70.2 pinned to exact versions per CLAUDE.md.
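As a rough illustration of the two-runtime dispatch and the cross-family resume guard listed above, here is a minimal sketch. Only `AI_MODELS`, `pickRuntime`, the `{id, label, family}` shape, and the `gpt-5.5` entry come from this PR description. The Claude model id, the `familyOf` and `assertResumable` helpers, and the choice to throw on a family mismatch are assumptions made for the example, and the real code in apps/cli/ai/agent.ts may differ.

```typescript
// Hypothetical sketch; not the code from apps/cli/ai/agent.ts.
type ModelFamily = 'anthropic' | 'openai';

interface AiModel {
  id: string;
  label: string;
  family: ModelFamily;
}

// Only the gpt-5.5 entry is named in the PR; the Claude entry is a placeholder.
const AI_MODELS: AiModel[] = [
  { id: 'claude-placeholder', label: 'Claude (placeholder)', family: 'anthropic' },
  { id: 'gpt-5.5', label: 'GPT 5.5', family: 'openai' },
];

// Look up the family for a model id, failing loudly on unknown ids.
function familyOf(modelId: string): ModelFamily {
  const model = AI_MODELS.find((candidate) => candidate.id === modelId);
  if (!model) {
    throw new Error(`Unknown model id: ${modelId}`);
  }
  return model.family;
}

// Dispatch: the model's family decides which runtime handles the turn.
function pickRuntime(modelId: string): ModelFamily {
  return familyOf(modelId);
}

// Assumed guard: refuse to resume a session that was recorded under a
// different family, since the two runtimes keep different session state.
function assertResumable(sessionModelId: string, requestedModelId: string): void {
  const sessionFamily = familyOf(sessionModelId);
  const requestedFamily = familyOf(requestedModelId);
  if (sessionFamily !== requestedFamily) {
    throw new Error(
      `Cannot resume a ${sessionFamily} session with a ${requestedFamily} model`
    );
  }
}
```

Keeping the family on the model entry means the dispatcher and the guard both read from one table, so adding a model does not require touching either function.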

What did NOT change

apps/cli/ai/system-prompt.ts is byte-identical to trunk. Verified: git diff trunk -- apps/cli/ai/system-prompt.ts is empty.

Open risk — verified by GPT probe
Trunk's prompt at line 144 says "Run the `site-spec` skill … FIRST." With the new `Skill` tool registered for GPT but the prompt unchanged, GPT could either call `Skill('site-spec')`, improvise a discovery question, or skip discovery. The probe (Testing Instructions below) shows GPT calling `Skill('site-spec')` and following the runbook correctly — i.e. the best of the three outcomes, no prompt rewrite needed.

Testing Instructions

Run studio code.
Switch the model to GPT 5.5 using /model.
Try some basic prompts.

🤖 Generated with Claude Code

Slimmed-down alternative to #3244. Adds two-runtime dispatch (Anthropic SDK + pi-agent-core for OpenAI), the gpt-5.5 model entry, hand-rolled Skill / AskUserQuestion tools for the OpenAI path, and the apps/ui model picker plumbing — but leaves apps/cli/ai/system-prompt.ts byte-identical to trunk so Claude's behavior is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Block themes don't auto-load style.css the way classic themes do, so without an explicit wp_enqueue_scripts hook the editor renders styled (via the existing add_editor_style rule) but the frontend renders unstyled. Claude infers the hook from WordPress priors; GPT 5.5 follows the literal rules and skips it. One-line addition next to the editor-styles rule so the pair reads naturally together. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The full version restated the WordPress mechanic and the failure mode; the rule alone is enough to land the behavior. Mirrors the terseness of the editor-styles line right above it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

wpmobilebot · 2026-05-04T12:38:02Z

📊 Performance Test Results

Comparing 759c995 vs trunk

app-size

Metric	trunk	`759c995`	Diff	Change
App Size (Mac)	1557.92 MB	1653.22 MB	+95.30 MB	🔴 6.1%

site-editor

Metric	trunk	`759c995`	Diff	Change
load	1536 ms	1513 ms	-23 ms	⚪ 0.0%

site-startup

Metric	trunk	`759c995`	Diff	Change
siteCreation	8088 ms	8080 ms	-8 ms	⚪ 0.0%
siteStartup	4932 ms	4945 ms	+13 ms	⚪ 0.0%

Results are median values from multiple test runs.

Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff)

youknowriad · 2026-05-04T12:48:43Z

This is not perfect, but for me it is ready to land.

It doesn't impact Claude models (our current production models).
It ships an initial version of the GPT 5.5 model and a pi-based harness.

…rmed headers

Two cleanups in the OpenAI runtime:

- Synthesized assistant SDKMessages tagged the runtime literal 'openai' in `message.model`. Nothing reads that field today, but it diverges from the Anthropic SDK's behavior (it carries the real id) and would silently mislead any future consumer that does read it. Thread the configured model id through `translateEvent` and the two factories.
- `parseHeaderEnv` swallowed JSON.parse failures and a non-object payload silently. STUDIO_OPENAI_DEFAULT_HEADERS is produced by Studio, so the only realistic failure modes are bugs in the producer or a manual override, both worth surfacing, since the consequence (missing X-WPCOM-AI-Feature) shows up downstream as an opaque 401.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
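To make the second cleanup concrete, here is a minimal sketch of a header parser that surfaces malformed input instead of swallowing it. The name `parseHeaderEnv` and the variable STUDIO_OPENAI_DEFAULT_HEADERS come from the commit message. The signature, the error wording, the decision to throw rather than log, treating an unset variable as "no extra headers", and the `example-feature` value are all assumptions for illustration.

```typescript
// Hypothetical sketch; the real parseHeaderEnv may differ in shape and behavior.
function parseHeaderEnv(raw: string | undefined): Record<string, string> {
  // Assumption: an unset or empty variable means "no extra headers".
  if (raw === undefined || raw === '') {
    return {};
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`STUDIO_OPENAI_DEFAULT_HEADERS is not valid JSON: ${reason}`);
  }

  // JSON.parse also accepts arrays, strings, numbers, and null; reject them.
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error('STUDIO_OPENAI_DEFAULT_HEADERS must be a JSON object');
  }

  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(parsed as Record<string, unknown>)) {
    if (typeof value !== 'string') {
      throw new Error(`Header "${name}" must be a string`);
    }
    headers[name] = value;
  }
  return headers;
}

// Usage with a made-up header value:
const exampleHeaders = parseHeaderEnv('{"X-WPCOM-AI-Feature":"example-feature"}');
console.log(Object.keys(exampleHeaders));
```

Failing at parse time puts the error next to its cause, rather than leaving it to appear later as an unexplained 401 from the proxy.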

The previous read used `inputValue()` (one-shot DOM read) and `expect(string).toBe(...)` (synchronous, no retry), so the assertion fired before the SITE_EVENTS.UPDATED round-trip from the CLI _events subprocess back into renderer Redux had landed. Switch to Playwright's auto-waiting `toHaveValue` matcher. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
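The race described above can be reproduced without Playwright. The sketch below uses a hypothetical `waitForValue` polling helper (not a Playwright API) and a simulated field whose value lands after a short delay. It shows why a one-shot read sees the stale value while a retrying check sees the updated one. Playwright's `toHaveValue` matcher retries in a broadly similar way until its timeout expires. The field values and timings are made up.

```typescript
// Hypothetical helper for illustration; Playwright's own retry logic is internal.
async function waitForValue<T>(
  read: () => T,
  expected: T,
  timeoutMs = 1000,
  intervalMs = 10
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const actual = read();
    if (actual === expected) {
      return;
    }
    if (Date.now() >= deadline) {
      throw new Error(`Timed out: expected ${String(expected)}, last saw ${String(actual)}`);
    }
    // Yield to the event loop so the pending update can land, then re-read.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

async function demo(): Promise<string> {
  // Simulated field: the new value arrives asynchronously, like an event round-trip.
  let fieldValue = 'old';
  setTimeout(() => {
    fieldValue = 'new';
  }, 50);

  // One-shot read: taken before the update has landed, so it is stale.
  const oneShot = fieldValue;

  // Retrying check: keeps polling until the update lands or the timeout expires.
  await waitForValue(() => fieldValue, 'new');

  return `${oneShot} -> ${fieldValue}`;
}
```

Calling `demo()` resolves once the delayed update has been observed. Asserting on `oneShot` alone would fail for the same reason the original test did.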

Previous attempt assumed the dropdown would self-update once Redux caught up, but the Edit dialog seeds its dropdown from useState(selectedSite.phpVersion) at mount time and never resyncs on later prop changes. So once the dialog mounts with stale Redux state, the dropdown is locked to the stale value indefinitely. The Settings tab body, on the other hand, binds the displayed PHP version directly to selectedSite — it flips as soon as the SITE_EVENTS.UPDATED round-trip (CLI _events socket → main → renderer Redux) lands. Wait on that before reopening the Edit dialog. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

youknowriad added the Proof of Concept label May 4, 2026

github-actions Bot assigned youknowriad May 4, 2026

youknowriad removed the Proof of Concept label May 4, 2026

youknowriad marked this pull request as ready for review May 4, 2026 11:35

youknowriad and others added 2 commits May 4, 2026 13:12

This was referenced May 4, 2026

Add OpenAI/GPT model support to the Studio code agent #3244

Closed

Unify Anthropic and OpenAI runtimes on pi-agent-core #3246

Closed

youknowriad requested review from Poliuk and epeicher May 4, 2026 12:48

youknowriad and others added 4 commits May 4, 2026 14:29

Merge remote-tracking branch 'origin/trunk' into add-gpt-runtime-minimal

91cd1b5

youknowriad merged commit 872fcba into trunk May 4, 2026
10 checks passed

youknowriad deleted the add-gpt-runtime-minimal branch May 4, 2026 17:49

This was referenced May 4, 2026

CLI agent: align pi-tui with other pi-* deps and restore annotation tools #3336

Merged

Unify CLI agent on a single pi-based runtime #3337

Merged

youknowriad merged 7 commits into trunk from add-gpt-runtime-minimal

2 participants
