
Add GPT 5.5 support without modifying the system prompt #3328

Merged
youknowriad merged 7 commits into trunk from add-gpt-runtime-minimal
May 4, 2026


@youknowriad
Contributor

@youknowriad youknowriad commented May 4, 2026

Related issues

Sibling: #3244 (the larger, system-prompt-rewriting alternative).

This PR aims to land GPT 5.5 support without affecting Claude, which remains the default. It adds the alternative agent runtime but leaves the system prompt unchanged.

The catch is that our current system prompt doesn't produce great sites with GPT 5.5. We can improve that, but I'd prefer to do so in a separate PR.

Proposed Changes

  • Two-runtime dispatch in apps/cli/ai/agent.ts (pickRuntime + cross-family resume guard).
  • New apps/cli/ai/runtimes/{anthropic,openai}/. The Anthropic runtime is a pure relocation of the SDK setup that today lives inline in agent.ts. The OpenAI runtime uses @mariozechner/pi-agent-core for the tool loop and @mariozechner/pi-ai's OpenAI Chat Completions provider, talking to the wpcom AI proxy.
  • tools/common/ai/models.ts: AI_MODELS is now an array of {id, label, family}; gpt-5.5 is listed under family: 'openai'.
  • apps/cli/ai/providers.ts: defineProvider shape, plus OPENAI_* env vars on the wpcom path. Anthropic env vars are wire-identical to trunk (ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_CUSTOM_HEADERS, CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS, CLAUDE_CODE_MAX_RETRIES); the constant rename WPCOM_AI_FEATURE_HEADER → WPCOM_AI_FEATURE_HEADER_ANTHROPIC keeps the same 'studio-assistant-anthropic' value.
  • apps/cli/ai/tools.ts split one-tool-per-file under apps/cli/ai/tools/; the OpenAI runtime imports specific factories by name. Matches the convention noted in feedback_studio_tools_one_per_file.
  • New hand-rolled Skill and AskUserQuestion tools for the OpenAI runtime (apps/cli/ai/tools/{skill,ask-user-question}.ts); Anthropic continues to use the SDK preset's built-ins via the PreToolUse hook.
  • apps/ui model picker plumbing + cross-family confirm dialog (apps/ui/src/components/session-view/composer/family-switch-confirm-dialog.tsx).
  • apps/cli/ai/eval-runner.ts: STUDIO_EVAL_MODEL env override so probes can target GPT directly.
  • @mariozechner/pi-{agent-core,ai,coding-agent,tui}@0.70.2 pinned to exact versions per CLAUDE.md.
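
To make the dispatch idea concrete, here is a minimal sketch of how a family-tagged model list could drive runtime selection and a cross-family resume guard. Only the {id, label, family} shape, the gpt-5.5 entry, and the name pickRuntime come from the description above. The Claude entry, the labels, the helper names familyOf and assertCanResume, and the choice to throw on a family mismatch are all assumptions for illustration, not the code in this PR.

```typescript
// Hypothetical sketch: shapes and helpers are assumed, not copied from the PR.
type ModelFamily = 'anthropic' | 'openai';

interface AiModel {
  id: string;
  label: string;
  family: ModelFamily;
}

// Assumed entries; the real list lives in tools/common/ai/models.ts.
const AI_MODELS: readonly AiModel[] = [
  { id: 'example-claude-model', label: 'Claude (example)', family: 'anthropic' },
  { id: 'gpt-5.5', label: 'GPT 5.5', family: 'openai' },
];

// Look up the family for a model id, failing loudly on unknown ids.
function familyOf(modelId: string): ModelFamily {
  const model = AI_MODELS.find((m) => m.id === modelId);
  if (!model) {
    throw new Error(`Unknown model id: ${modelId}`);
  }
  return model.family;
}

// Dispatch: the runtime is chosen purely from the model's family.
function pickRuntime(modelId: string): ModelFamily {
  return familyOf(modelId);
}

// Assumed guard behavior: a session started under one family is not
// resumed under the other, because the two runtimes keep different state.
function assertCanResume(sessionModelId: string, requestedModelId: string): void {
  const from = familyOf(sessionModelId);
  const to = familyOf(requestedModelId);
  if (from !== to) {
    throw new Error(`Cannot resume a ${from} session with a ${to} model`);
  }
}
```

Keeping the family on the model entry means the CLI and the UI picker can both derive the runtime from one list instead of matching on id prefixes.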

What did NOT change

  • apps/cli/ai/system-prompt.ts is byte-identical to trunk. Verified: git diff trunk -- apps/cli/ai/system-prompt.ts is empty.

Open risk — verified by GPT probe
Trunk's prompt at line 144 says "Run the `site-spec` skill … FIRST." With the new `Skill` tool registered for GPT but the prompt unchanged, GPT could either call `Skill('site-spec')`, improvise a discovery question, or skip discovery. The probe (Testing Instructions below) shows GPT calling `Skill('site-spec')` and following the runbook correctly — i.e. the best of the three outcomes, no prompt rewrite needed.

Testing Instructions

  • Run studio code.
  • Switch models to GPT 5.5 using /model.
  • Try some basic prompts.

🤖 Generated with Claude Code

Slimmed-down alternative to #3244. Adds two-runtime dispatch (Anthropic
SDK + pi-agent-core for OpenAI), the gpt-5.5 model entry, hand-rolled
Skill / AskUserQuestion tools for the OpenAI path, and the apps/ui
model picker plumbing — but leaves apps/cli/ai/system-prompt.ts
byte-identical to trunk so Claude's behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@youknowriad youknowriad marked this pull request as ready for review May 4, 2026 11:35
youknowriad and others added 2 commits May 4, 2026 13:12
Block themes don't auto-load style.css the way classic themes do, so
without an explicit wp_enqueue_scripts hook the editor renders styled
(via the existing add_editor_style rule) but the frontend renders
unstyled. Claude infers the hook from WordPress priors; GPT 5.5
follows the literal rules and skips it. One-line addition next to the
editor-styles rule so the pair reads naturally together.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The full version restated the WordPress mechanic and the failure mode;
the rule alone is enough to land the behavior. Mirrors the terseness
of the editor-styles line right above it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wpmobilebot
Collaborator

wpmobilebot commented May 4, 2026

📊 Performance Test Results

Comparing 759c995 vs trunk

app-size

| Metric | trunk | 759c995 | Diff | Change |
|---|---|---|---|---|
| App Size (Mac) | 1557.92 MB | 1653.22 MB | +95.30 MB | 🔴 6.1% |

site-editor

| Metric | trunk | 759c995 | Diff | Change |
|---|---|---|---|---|
| load | 1536 ms | 1513 ms | 23 ms | ⚪ 0.0% |

site-startup

| Metric | trunk | 759c995 | Diff | Change |
|---|---|---|---|---|
| siteCreation | 8088 ms | 8080 ms | 8 ms | ⚪ 0.0% |
| siteStartup | 4932 ms | 4945 ms | +13 ms | ⚪ 0.0% |

Results are median values from multiple test runs.

Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff)

@youknowriad
Contributor Author

This is not perfect, but as far as I'm concerned it is ready to land:

  • It doesn't affect the Claude models (our current production models).
  • It ships an initial version of GPT 5.5 support and the pi-based harness.

@youknowriad youknowriad requested review from Poliuk and epeicher May 4, 2026 12:48
youknowriad and others added 4 commits May 4, 2026 14:29
…rmed headers

Two cleanups in the OpenAI runtime:

- Synthesized assistant SDKMessages tagged the runtime literal 'openai'
  in `message.model`. Nothing reads that field today, but it diverges
  from the Anthropic SDK's behavior (it carries the real id) and would
  silently mislead any future consumer that does read it. Thread the
  configured model id through `translateEvent` and the two factories.
- `parseHeaderEnv` swallowed JSON.parse failures and a non-object
  payload silently. STUDIO_OPENAI_DEFAULT_HEADERS is produced by
  Studio, so the only realistic failure modes are bugs in the producer
  or a manual override — both worth surfacing, since the consequence
  (missing X-WPCOM-AI-Feature) shows up downstream as an opaque 401.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
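
As an illustration of the second cleanup, a header parser that surfaces malformed input instead of swallowing it might look like the sketch below. The name parseHeaderEnv and the STUDIO_OPENAI_DEFAULT_HEADERS variable come from the commit message. The signature, the choice to throw rather than log, the empty result for an unset variable, and the string-value check are assumptions.

```typescript
// Hypothetical sketch: the real parseHeaderEnv signature is not shown in the PR.
function parseHeaderEnv(raw: string | undefined): Record<string, string> {
  // Assumption: an unset or empty variable simply means "no extra headers".
  if (raw === undefined || raw === '') {
    return {};
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (error) {
    // Surface the parse failure instead of silently returning {}.
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`STUDIO_OPENAI_DEFAULT_HEADERS is not valid JSON: ${reason}`);
  }

  // JSON.parse also accepts arrays, numbers, strings, and null; reject those.
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error('STUDIO_OPENAI_DEFAULT_HEADERS must be a JSON object');
  }

  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(parsed as Record<string, unknown>)) {
    // Extra assumption: header values must be strings.
    if (typeof value !== 'string') {
      throw new Error(`Header "${name}" must have a string value`);
    }
    headers[name] = value;
  }
  return headers;
}
```

Failing at parse time points at the producer of the variable, rather than leaving the missing X-WPCOM-AI-Feature header to show up later as an opaque 401.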
The previous read used `inputValue()` (one-shot DOM read) and
`expect(string).toBe(...)` (synchronous, no retry), so the assertion
fired before the SITE_EVENTS.UPDATED round-trip from the CLI _events
subprocess back into renderer Redux had landed. Switch to Playwright's
auto-waiting `toHaveValue` matcher.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
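
The race described in this commit can be shown without Playwright. The sketch below is a simplified, synchronous stand-in: a fake field only reports its updated value after a few reads, a one-shot check misses it, and a retrying check catches it. All names and the PHP version strings are hypothetical, and Playwright's toHaveValue retries until a timeout rather than for a fixed number of attempts.

```typescript
// Hypothetical simulation; none of these helpers exist in the PR or in Playwright.

// A fake field whose value changes only after `readsUntilUpdate` reads,
// standing in for state that arrives after an asynchronous round-trip.
function makeLaggingField(initial: string, updated: string, readsUntilUpdate: number) {
  let reads = 0;
  return {
    read(): string {
      reads += 1;
      return reads > readsUntilUpdate ? updated : initial;
    },
  };
}

// One-shot check: read once, compare once (like inputValue() plus toBe()).
function checkOnce(read: () => string, expected: string): boolean {
  return read() === expected;
}

// Retrying check: keep re-reading until the value matches or attempts run out
// (the idea behind an auto-waiting matcher).
function checkWithRetry(read: () => string, expected: string, maxAttempts: number): boolean {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (read() === expected) {
      return true;
    }
  }
  return false;
}

// Usage: the update lands on the fourth read.
const staleField = makeLaggingField('8.2', '8.3', 3);
const oneShotResult = checkOnce(staleField.read, '8.3'); // reads too early

const freshField = makeLaggingField('8.2', '8.3', 3);
const retryResult = checkWithRetry(freshField.read, '8.3', 10); // keeps reading
```

Here oneShotResult is false and retryResult is true, which mirrors an assertion that fires before the update lands versus one that waits for it.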
Previous attempt assumed the dropdown would self-update once Redux
caught up, but the Edit dialog seeds its dropdown from
useState(selectedSite.phpVersion) at mount time and never resyncs on
later prop changes. So once the dialog mounts with stale Redux state,
the dropdown is locked to the stale value indefinitely.

The Settings tab body, on the other hand, binds the displayed PHP
version directly to selectedSite — it flips as soon as the
SITE_EVENTS.UPDATED round-trip (CLI _events socket → main → renderer
Redux) lands. Wait on that before reopening the Edit dialog.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@youknowriad youknowriad merged commit 872fcba into trunk May 4, 2026
10 checks passed
@youknowriad youknowriad deleted the add-gpt-runtime-minimal branch May 4, 2026 17:49