Design Spec · Draft v4

Image-gen MCP Connector

2026-05-11 · Kyle · branch feat-image-gen-mcp

TL;DR

Add the gateway's image-gen MCP server as a regular row in Settings → Connectors, authenticated by the user's CoreSpeed JWT through the existing local credential broker — the same path X already uses. The only behavioral divergence from X is that image-gen needs no per-provider OAuth dance (a CoreSpeed identity is the credential), encoded structurally by omitting connectURL from the catalog entry. Inline base64 images render directly in the tool-call view and in agent markdown — no disk caching, no broker response transform, no new settings tab.

Code added

  • 1 entry in connector-catalog.json
  • Local-disable per-id boolean + one early-return per call site in ConnectorStore
  • account.onSignInStateChanged reactive wiring
  • ConnectorTile UI gating for no-OAuth rows
  • Path-prefix derivation from mcpURL.path
  • Bumped broker URLSession timeouts
  • MarkdownSanitizer data:-URI allowlist
  • ToolCallView .image ContentBlock branch
  • Cherry-pick 3c5f560 for the icon asset

Closed PR surface dropped

  • InlineImageCache + base64→file conversion (~455 LOC)
  • ImageGenSaveImagesTransform + broker response-transform plumbing
  • New "Built-in Tools" settings tab + MCPServersSettingsPane (217 LOC)
  • New .gatewayApiKey InjectionPolicy case
  • ResponseTransformMode enum + buffered/streaming split
  • ConnectorEligibility, StateChangeMulticast, ConnectorDiagnostics (~500 LOC)
  • Per-route gatewayApiKeySession and bespoke timeout split
  • Image-cache regex pre-passes in MarkdownSanitizer / StreamingMarkdownText

Broker logic untouched. CredentialBroker, BrokerRoute, the JWT refresh path, and the 401-retry mechanism are unchanged. The only broker-level edit replaces URLSession.shared with a broker-owned URLSession configured with 300s (per-request idle) / 600s (total resource) timeouts, so slow image-gen models (Imagen, Gemini-3 4K, gpt-image-2 with C2PA) don't trip the default 60s request timeout.

Why we're doing this

  1. Image generation belongs alongside other gateway tools. Linear, Notion, X, Apify all live in connector-catalog.json and surface through one consistent UI. Image-gen is just another gateway MCP — making it a connector row keeps the mental model intact.
  2. The closed PR over-rotated on a non-problem. Converting inline base64 to disk files (the InlineImageCache path) added 455 LOC of file lifecycle plus markdown-sanitizer cleanup just to avoid putting base64 in the rendered text. Direct base64 decode into NSImage at render time avoids the file lifecycle entirely.
  3. JWT auth + broker refresh already exists. The X connector demonstrated the full pattern: localhost broker route, X-Broker-Auth gating, JWT injection, 401-retry, automatic token refresh. Image-gen needs zero new auth machinery.
  4. Path convention is moving. The gateway is deprecating /<id>/mcp in favor of /mcp/<id>. Image-gen ships at /mcp/image-gen from day one. Deriving the broker prefix from mcpURL.path lets the catalog URL own the convention; X migrates later with a one-line catalog change.

Before / After at a glance

Concern | Closed PR (rejected) | This spec
Surface area | 5244 LOC across 33 files | ~120 LOC across 7 files + 1 icon
Where image-gen appears | New Built-in Tools tab in Settings | Existing Connectors list, sibling row to X
Image rendering | Broker transforms base64 → file, agent emits file:// markdown links | Inline base64 → NSImage(data:) at render time, no disk I/O
Auth injection | New .gatewayApiKey InjectionPolicy with x-api-key header | Existing .coreSpeedJwt InjectionPolicy
Response transport | ResponseTransformMode enum split (buffered JSON vs streaming SSE) | Unchanged buffered path; image-gen returns application/json
Markdown sanitizer | Regex pre-passes to strip image-cache references | Allow data:image/... URIs; strip everything else
Disconnect surface | Server-side DELETE (X-style OAuth revocation) | Local opt-out in UserDefaults; metered tool requires a narrower off-switch than sign-out
Broker path scheme | Hardcoded /<id>/mcp | Derived from mcpURL.path (/mcp/image-gen for new, /x/mcp for legacy X)
URLSession timeouts | Per-route bespoke gatewayApiKeySession | One shared 300s/600s session for all .coreSpeedJwt routes

Settings → Connectors (new row)

Image-gen lives in the existing connector list. The row behavior depends on CoreSpeed sign-in state and on whether the catalog entry carries a connectURL.

Connectors (row mockups)

Signed in, image-gen enabled (default)

  • Image generation · CoreSpeed identity
    Generate and edit images. Charged per request.
    Status: Connected · Button: Disconnect
  • X (Twitter) · CoreSpeed identity
    Status: Connected · Button: Disconnect
  • Linear · OAuth 2.0
    Status: Connected · Button: Disconnect

Signed in, image-gen locally disabled

  • Image generation · CoreSpeed identity
    Generate and edit images. Charged per request.
    Status: Disconnected · Button: Connect

Signed out, image-gen needs sign-in

  • Image generation · CoreSpeed identity
    Generate and edit images. Charged per request.
    Button: Sign in to CoreSpeed
  • X (Twitter) · CoreSpeed identity
    Status: Disconnected · Button: Connect

Row behaviors: same buttons, different semantics

X's Disconnect calls the gateway's DELETE endpoint and revokes per-user X credentials server-side; reconnecting re-runs the OAuth dance at twitter.com. Image-gen's Disconnect is a local opt-out. There is no server-side state to revoke (JWT identity is per-request), so the choice is persisted in UserDefaults and survives app restarts. This matters because image-gen is charged per request: users need an off switch narrower than "sign out of CoreSpeed entirely."
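
The three row states above can be sketched as a pure function of sign-in state, the local opt-out flag, and whether the catalog entry carries a connectURL. This is a minimal sketch, not the app's code: RowPresentation, rowPresentation, and the serverConnected parameter (standing in for the gateway-reconciled status of OAuth rows) are hypothetical names introduced for illustration.

```swift
import Foundation

/// What a connector row shows. Illustrative names, not the app's real types.
enum RowPresentation: Equatable {
    case connected      // status "Connected", Disconnect button
    case disconnected   // status "Disconnected", Connect button
    case needsSignIn    // "Sign in to CoreSpeed", both buttons suppressed
}

/// Resolves the row state. `serverConnected` is an assumed input that stands in
/// for whatever the gateway reports for rows that use per-provider OAuth.
func rowPresentation(isSignedIn: Bool,
                     locallyDisabled: Bool,
                     hasConnectURL: Bool,
                     serverConnected: Bool = false) -> RowPresentation {
    if hasConnectURL {
        // X-style rows: status is reconciled against the server.
        return serverConnected ? .connected : .disconnected
    }
    // Image-gen-style rows: the CoreSpeed identity is the credential.
    guard isSignedIn else { return .needsSignIn }
    return locallyDisabled ? .disconnected : .connected
}
```

Keeping the decision in one pure function makes the "local opt-out survives sign-out" rule easy to test without UI.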

Auth flow — same broker pattern as X

Agents talk to 127.0.0.1, not to gateway.ai.corespeed.io. The broker rewrites the request, injects a fresh JWT, and forwards it. Refresh and 401-retry are transparent to the agent.

Request lifecycle
[Diagram: request lifecycle]

  • Agent (Claude / Codex) MCP client → CredentialBroker at 127.0.0.1:<port>/mcp/image-gen, gated by X-Broker-Auth
  • CredentialBroker → CoreSpeedAccount (Keycloak JWT + refresh): accessToken()
  • CredentialBroker → AI Gateway at gateway.ai.corespeed.io/mcp/image-gen, with Bearer <JWT>
  • AI Gateway identity middleware: verify(JWT) (JWKS verify · resolve sub → userId)
  • Response: application/json (multi-MB base64 OK)
  • On 401: forceRefresh JWT, retry once, transparent to agent
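
The "retry once on 401" step can be modeled in a few lines. The sketch below is a simplified, synchronous stand-in for the broker's existing behavior, not its actual code; forwardWithRetry and its three closure parameters are hypothetical names, and the real path is presumably asynchronous.

```swift
import Foundation

/// Simplified model of "inject JWT, retry once on 401".
/// `currentToken`, `forceRefresh`, and `send` are hypothetical stand-ins for the
/// account's token accessor, its forced refresh, and the upstream request.
/// `send` returns the upstream HTTP status code.
func forwardWithRetry(currentToken: () -> String,
                      forceRefresh: () -> String,
                      send: (_ bearer: String) -> Int) -> Int {
    let first = send(currentToken())
    guard first == 401 else { return first }
    // Exactly one retry with a freshly refreshed token.
    // A second 401 is returned to the caller unchanged.
    return send(forceRefresh())
}
```

The agent only ever observes the final status, which is what makes the refresh transparent.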

The mapping inside the broker is unchanged: 127.0.0.1:PORT/<pathPrefix><suffix> → <upstreamBase><suffix>. With pathPrefix derived from mcpURL.path, prefix and upstream-path agree by construction, so the suffix forwards cleanly even when query strings or future sub-paths appear.
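
As a rough illustration of that mapping, the sketch below strips the prefix from the localhost path and appends the remainder to the upstream base. The function name upstreamURL is hypothetical, and the boundary check (so that a longer sibling path does not match) is an assumption added here, not something the broker is confirmed to do.

```swift
import Foundation

/// Maps a localhost request path (with optional query) onto the upstream URL,
/// or returns nil when the path does not belong to this route. Illustrative only.
func upstreamURL(forLocalPath localPath: String,
                 pathPrefix: String,
                 upstreamBase: String) -> String? {
    guard !pathPrefix.isEmpty, pathPrefix != "/",
          localPath.hasPrefix(pathPrefix) else { return nil }
    let suffix = String(localPath.dropFirst(pathPrefix.count))
    // Assumed boundary rule: "/mcp/image-gen2" must not match "/mcp/image-gen".
    guard suffix.isEmpty || suffix.hasPrefix("/") || suffix.hasPrefix("?") else {
        return nil
    }
    return upstreamBase + suffix
}
```

Because the prefix and the upstream path are the same string, the suffix (query string or sub-path) carries over unchanged.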

Path convention — derive from mcpURL.path

Today's hardcoded "/\(entry.id)/mcp" only works because every .corespeed connector happens to mount at /<id>/mcp upstream. Image-gen breaks that assumption: it ships at /mcp/image-gen. Promote the catalog URL to source of truth.

Before

BrokerRoute(
  pathPrefix: "/\(entry.id)/mcp",    // hardcoded
  upstreamBase: entry.mcpURL,
  injection: .coreSpeedJwt,
  ...
)

// X:        /x/mcp           → gateway/x/mcp           ✓
// image-gen /image-gen/mcp   → gateway/mcp/image-gen   ✗

After

BrokerRoute(
  pathPrefix: entry.mcpURL.path,        // catalog-driven
  upstreamBase: entry.mcpURL,
  injection: .coreSpeedJwt,
  ...
)

// X:        /x/mcp           → gateway/x/mcp           ✓
// image-gen /mcp/image-gen   → gateway/mcp/image-gen   ✓
// X migration later: change mcpURL only.

ConnectorStore.coreSpeedConfigForSession matches: the localhost URL it emits becomes http://127.0.0.1:\(port)\(entry.mcpURL.path) instead of .../\(entry.id)/mcp. Skip any .corespeed entry whose mcpURL.path is empty or "/" so a malformed catalog can never produce a route that matches every request.
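
A minimal sketch of that derivation and guard, assuming a free function rather than the real ConnectorStore method; localBrokerURL is a hypothetical name and the port value is supplied by the caller.

```swift
import Foundation

/// Returns the localhost URL a session config would carry, or nil when the
/// catalog URL has no usable path. `localBrokerURL` is a hypothetical helper.
func localBrokerURL(mcpURL: URL, port: Int) -> String? {
    let path = mcpURL.path
    // An empty or root path would match every request, so skip the entry.
    guard !path.isEmpty, path != "/" else { return nil }
    return "http://127.0.0.1:\(port)\(path)"
}
```

Returning nil leaves the caller free to log a warning and move on to the next catalog entry.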

Key code snippets

Catalog entry

{
  "id": "image-gen",
  "displayName": "Image generation",
  "description": "Generate and edit images. Charged per request.",
  "iconAsset": "mcp-image-gen",
  "mcpURL": "https://gateway.ai.corespeed.io/mcp/image-gen",
  "auth": "corespeed"
}

No connectURL — absence is the structural signal that this connector skips the per-provider OAuth dance. Staging URL substitution is already handled generically by ConnectorCatalog.loadFromBundle().

New optional field: description. ConnectorCatalogEntry gains an optional description: String?. ConnectorTile renders it as a second sub-line below the existing name+cred line, never replacing the auth-method label. Image-gen uses this slot to surface the cost model explicitly ("Charged per request") so the user knows why the Disconnect button matters. Other catalog entries (X, Linear, …) leave description nil; their rows stay single-line and unchanged.
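
For illustration, the two optional fields could decode as follows. CatalogEntrySketch is a hypothetical stand-in; the real ConnectorCatalogEntry may use different types (for example an enum for auth), and isIdentityOnly is a name invented here for the "no connectURL" signal.

```swift
import Foundation

/// Illustrative shape only; the real ConnectorCatalogEntry may differ.
struct CatalogEntrySketch: Decodable {
    let id: String
    let displayName: String
    let description: String?   // new optional field; nil for X, Linear, ...
    let iconAsset: String
    let mcpURL: URL
    let connectURL: URL?       // absent for image-gen
    let auth: String

    /// True when the row skips the per-provider OAuth dance.
    var isIdentityOnly: Bool { auth == "corespeed" && connectURL == nil }
}

// The catalog entry from this spec, decoded with the synthesized initializer.
let sampleJSON = Data("""
{
  "id": "image-gen",
  "displayName": "Image generation",
  "description": "Generate and edit images. Charged per request.",
  "iconAsset": "mcp-image-gen",
  "mcpURL": "https://gateway.ai.corespeed.io/mcp/image-gen",
  "auth": "corespeed"
}
""".utf8)
let sampleEntry = try! JSONDecoder().decode(CatalogEntrySketch.self, from: sampleJSON)
```

Swift's synthesized Decodable treats a missing key for an Optional property as nil, so existing entries without description or connectURL keep decoding unchanged.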

ConnectorStore — per-id boolean + one early-return per call site

// Per-id UserDefaults boolean. Idiomatic, KVO-friendly.
private static let disabledKeyPrefix = "connector."   // suffix: ".disabled"

func isDisabled(id: String) -> Bool {
  UserDefaults.standard.bool(forKey: "\(Self.disabledKeyPrefix)\(id).disabled")
}

func setDisabled(id: String, _ disabled: Bool) {
  UserDefaults.standard.set(disabled, forKey: "\(Self.disabledKeyPrefix)\(id).disabled")
  onStateChanged.fire()
}

// hydrate() — one ternary, every .corespeed entry
case .corespeed:
  let locallyDisabled = isDisabled(id: entry.id)
  if entry.connectURL == nil {
    statuses[entry.id] = (account.isSignedIn && !locallyDisabled)
      ? .connected(expiresAt: nil)
      : .disconnected
  } else {
    statuses[entry.id] = .disconnected   // reconciled async
  }

// connect(id:) — clear flag, early-return for local-only
setDisabled(id: id, false)
guard entry.connectURL != nil else {
  statuses[id] = account.isSignedIn ? .connected(expiresAt: nil) : .disconnected
  return
}
// existing OAuth dance below for entries with connectURL

// disconnect(id:) — set flag, early-return for local-only
setDisabled(id: id, true)
statuses[id] = .disconnected
guard entry.connectURL != nil else { return }
// existing server-side DELETE below

Restart hint comes free from the existing pane footer. ConnectorsSettingsPane's footer already reads "Changes take effect for new sessions; existing sessions keep their current tools until reloaded." No per-row hint is needed because the panel-level copy handles it. The closed branch had to add a per-row hint because it lived in a separate Built-in Tools tab without this footer.

Reactive status from sign-in

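// In ConnectorStore.init, after hydrate():
account.onSignInStateChanged = { [weak self] in
  guard let self else { return }
  // Read through self.account so the closure does not retain the account that owns it.
  let signedIn = self.account.isSignedIn
  for entry in self.catalog.connectors
    where entry.auth == .corespeed && entry.connectURL == nil {
    // Honor the local opt-out, matching hydrate(): signing in must not re-enable a disabled row.
    self.statuses[entry.id] = (signedIn && !self.isDisabled(id: entry.id))
      ? .connected(expiresAt: nil)
      : .disconnected
  }
}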

Broker URLSession timeout bump

private static let upstreamSession: URLSession = {
  let config = URLSessionConfiguration.default
  config.timeoutIntervalForRequest  = 300   // 5 min per-chunk idle
  config.timeoutIntervalForResource = 600   // 10 min total wall-clock
  return URLSession(configuration: config)
}()

Replaces every URLSession.shared.data(for:) call inside the broker's .coreSpeedJwt path. Fast routes (X) are unaffected — bumped timeouts just delay failure detection on the slow tail.

Image rendering surfaces

MarkdownSanitizer — allow data: URIs

The sanitizer currently strips every image to prevent untrusted markdown from issuing tracking-pixel / exfiltration requests. Loosen the predicate to inline data:image/... URIs only — they carry their bytes inline and make zero network requests. HTTPS images stay stripped (preserves the original security guarantee).

mutating func visitImage(_ image: Image) -> Markup? {
  guard let source = image.source,
        source.hasPrefix("data:image/") else {
    return nil
  }
  return image
}
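
The guard's contract can be pinned down with a Foundation-only stand-in that a regression test could exercise without the markdown library. allowsImageSource is a hypothetical name; the prefix match is case-sensitive, which errs on the side of stripping.

```swift
import Foundation

/// Foundation-only stand-in for the visitImage guard:
/// true only for inline image data URIs, false for everything else.
func allowsImageSource(_ source: String?) -> Bool {
    guard let source = source else { return false }
    return source.hasPrefix("data:image/")
}
```

Remote URLs, non-image data URIs, and images without a source are all rejected, so no case introduces a network fetch.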

ToolCallView — render .image ContentBlocks

Agents typically don't re-emit multi-megabyte base64 inline. The primary surface where users see generated images is the tool-call result panel. Today's toolCallContentView only handles .text; add an .image branch:

case .content(let block):
  if case .text(let t) = block {
    // existing
  } else if case .image(let img) = block {
    if let data = Data(base64Encoded: img.data),
       let nsImage = NSImage(data: data) {
      Image(nsImage: nsImage)
        .resizable()
        .scaledToFit()
        .frame(maxHeight: 400)
        .cornerRadius(8)
    }
  }

No InlineImageCache, no disk writes, no broker response transform. NSImage(data:) decodes PNG / JPEG / WebP / GIF natively on macOS 15+. The base64 stays in memory for the message's lifetime; SwiftUI caches the rendered image.
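
The base64 step in the .image branch could be factored into a small helper so the failure case is explicit and testable. This is a sketch under assumptions: decodeInlineImageData is a hypothetical name, and tolerating embedded line breaks via .ignoreUnknownCharacters is a choice made here, not something the spec requires.

```swift
import Foundation

/// Decodes an image block's base64 payload. Returns nil for malformed or empty
/// input so the caller can skip rendering (and log) instead of showing garbage.
func decodeInlineImageData(_ base64: String) -> Data? {
    // .ignoreUnknownCharacters tolerates line breaks that some encoders insert.
    guard let data = Data(base64Encoded: base64, options: .ignoreUnknownCharacters),
          !data.isEmpty else {
        return nil
    }
    return data
}
```

The returned Data would then be handed to NSImage(data:), which remains the point where an unsupported image format is detected.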

Acceptance criteria

  1. Image-gen appears in Settings → Connectors

    With the placeholder icon and displayName "Image generation".

  2. Signed in + enabled (default) shows "Connected" + Disconnect

    Status dot is green. Disconnect button visible.

  3. Disconnect is a local opt-out, no network

    Clicking Disconnect sets the connector.image-gen.disabled UserDefaults flag; the row transitions to "Disconnected" + Connect. No browser, no DELETE call. Clicking Connect clears the flag immediately.

  4. Signed-out row shows "Sign in to CoreSpeed"

    Both buttons suppressed. Status dot gray. The local-disable flag persists across the signed-out window and is honored on next sign-in.

  5. Session spawn honors enabled state

    With image-gen enabled, MCPServerConfig URL is http://127.0.0.1:<brokerPort>/mcp/image-gen + X-Broker-Auth. With image-gen locally disabled, no config is emitted. X still produces .../x/mcp — both derived from mcpURL.path.

  6. generate_image succeeds end-to-end

    Agent invokes the tool; broker injects JWT; gateway validates and forwards; agent receives application/json with inline base64.

  7. Generated image renders inline in the tool-call view

    At a reasonable max-height (~400pt), scaled-to-fit, with corner radius.

  8. Markdown data: images render

    If the agent emits ![](data:image/png;base64,...) in its response text, the renderer displays it.

  9. HTTPS images stay stripped

    If the agent emits https://...png markdown, the renderer still drops it — preserving the original security guarantee.

  10. JWT expiry mid-call is transparent

    A 401 during a long image-gen call triggers the broker's existing refresh-and-retry path. The agent sees a successful response.

  11. Sign-out flips status reactively

    Without restarting Sarea. The row transitions to "Sign in to CoreSpeed" within one observation tick.

Scope estimate

Component | Lines | Where
Catalog entry | ~7 | Sources/Resources/connector-catalog.json
ConnectorStore local-disable boolean + early-returns | ~25 | Sources/Connectors/ConnectorStore.swift
onSignInStateChanged reactive wiring | ~10 | Sources/Connectors/ConnectorStore.swift
ConnectorTile UI gating | ~10 | Sources/Views/Connectors/ConnectorTile.swift
Path-prefix derivation from mcpURL.path | ~6 | Sources/Stores/AppStore.swift, Sources/Connectors/ConnectorStore.swift
Broker URLSession timeout bump | ~6 | Sources/Brokering/CredentialBroker.swift
MarkdownSanitizer data-URI allowlist | ~5 | Sources/Utilities/MarkdownSanitizer.swift
ToolCallView .image branch | ~20 | Sources/Views/Chat/ToolCallView.swift
Icon asset (cherry-pick 3c5f560) | ~30 | Sources/Assets.xcassets/mcp-image-gen.imageset/
Total | ~120 | ~44× smaller than the closed PR's 5244

Non-goals & risks

Non-goals

Risks

  • Large MCP responses over stdio. A 3 MB base64 PNG goes broker → agent stdio in one JSON-RPC line. LineReader (Claude) and CodexLineReader handle arbitrarily large lines per CLAUDE.md, but stress-test once during implementation. If a framing issue appears, it's a codec fix, not a broker fix.
  • Empty mcpURL.path would match every request. Skip .corespeed entries whose mcpURL.path is empty or "/" at route-build time and log a warning. ~3 lines.
  • Markdown sanitizer regression surface. Changing visitImage from "strip all" to "allow data:" alters the sanitizer's contract. Add a unit test asserting that ![](https://attacker/track) is still stripped and ![](data:image/png;base64,AAAA) passes through.
  • Image decode failures fail silently. Malformed base64 returns nil from Data(base64Encoded:), and an unsupported image format returns nil from NSImage(data:); either way the tool-call view shows nothing for that block. Acceptable in v1: log to console. Surfacing a "broken image" placeholder is a follow-up if it ever happens in practice.
  • Future: live-session credential swap. Out of scope here, but worth tracking. Token rotation only affects future sessions because subprocess env is frozen at spawn. If the gateway moves to per-request credential renegotiation we revisit. Today's broker-side refresh covers JWT expiry.