Validation Report · Complete

Memory Candidate Validation

2026-05-14 · Docker-only candidate tests · branch feat-memory

Summary

Docker-only validation supports the v1 recommendation: use Mem0 behind a CoreSpeed Memory Service, expose a CoreSpeed-owned MCP wrapper, and keep Graphiti as a phase 2 challenger for organization and temporal memory.

Passed (validated)

  • Mem0 self-hosted API started in Docker after temp-only image/env fixes.
  • Add/search/list/update/get/delete passed with real memory IDs using a Dockerized OpenAI-compatible mock.
  • User A and User B stayed isolated through filters.user_id and list paths.
  • CoreSpeed-style MCP wrapper exposed the intended tools over Streamable HTTP.
  • Disabled mode returned a controlled MCP error.

Not proven (follow-up)

  • Real extraction quality, deduplication, and semantic ranking with the intended model backbone.
  • Production ownership checks for update/delete by memory id.
  • Sessionful MCP client behavior beyond stateless Streamable HTTP requests.
  • Graphiti write/read path, blocked by the mock lacking OpenAI /v1/responses.

Environment

Docker: Docker version 29.4.0, build 9d7ad9f
Compose: Docker Compose version v5.1.2
Real model key: No real key was used. A Docker-only OpenAI-compatible mock was published on host port 8894.
Temp workspace: /private/tmp/sarea-memory-spike
Network policy: Sandboxed curl could not reach Docker-published localhost ports; HTTP probes required escalated local network access.

Mem0 Results

Mem0 was run from a temp checkout at /private/tmp/sarea-memory-spike/vendors/mem0. The bundled Docker setup needed temp-only fixes: dashboard build moved from node:20-alpine to node:22-alpine, pnpm@9.15.9 was pinned, and explicit Postgres environment values were added so blank example values did not override server defaults.

Probe | Result | Notes
--- | --- | ---
Startup | pass | API reached /docs after mock credentials and Postgres env fixes.
Add | pass | User A and B writes produced real UUID memory IDs.
Search/list | pass | POST /search passed with filters.user_id; top-level user_id is not accepted by current Mem0 search.
User isolation | pass | User A search returned only A; User B search returned only B; deleting A did not delete B.
Update/delete | pass | Update, get-after-update, delete, and list-after-delete passed against a real user A memory id.
Metadata | pass | user_id, agent_id, run_id, session metadata, source, candidate, and update marker were preserved.
Extraction quality | mocked | The mock proves extraction wiring and persistence path, not production quality.

Isolation evidence. The tested REST paths support the v1 personal-memory requirement when the CoreSpeed service always supplies filters.user_id from the authenticated token subject.
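
A minimal sketch of that rule, assuming a hypothetical build_search_payload helper inside the CoreSpeed service. Only filters.user_id is confirmed by the tested paths; the "query" field name and the helper itself are illustrative assumptions.

```python
from typing import Any


def build_search_payload(query: str, token_subject: str) -> dict[str, Any]:
    """Build a scoped body for POST /search (hypothetical helper).

    The user id always comes from the authenticated token subject, never
    from agent-supplied arguments. The "query" key is an assumption.
    """
    if not token_subject:
        raise ValueError("authenticated token subject is required")
    return {
        "query": query,
        # Scoping goes under filters.user_id; a top-level user_id was not
        # accepted by the Mem0 search tested in this spike.
        "filters": {"user_id": token_subject},
    }
```

Refusing an empty subject keeps an unauthenticated request from turning into an unscoped search.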

MCP Wrapper Results

A throwaway CoreSpeed-style wrapper ran at /private/tmp/sarea-memory-spike/wrapper, backed by the Docker-only Mem0 and mock OpenAI stack.

Probe | Result | Notes
--- | --- | ---
Health | pass | GET /health returned memoryEnabled: true, Mem0 URL, and the fixed spike user id.
Initialize | pass | Stateless MCP initialize returned HTTP 200 with Streamable HTTP SSE framing.
Tool discovery | pass | Exposed search_memory, remember, update_memory, delete_memory, and list_memory.
Remember/search/list | pass | The wrapper wrote and found a memory with provenance metadata.
Update/delete | pass | Parsed the returned id, updated the text, deleted it, and verified the list was empty.
Disabled mode | pass | MEMORY_ENABLED=false returned HTTP 403 with JSON-RPC error -32000 and message Memory is disabled.
JSON-only Accept | fail (expected) | The MCP SDK rejects Accept: application/json alone with HTTP 406.

Integration requirement. Claude/Codex clients must send Accept: application/json, text/event-stream and parse data: events. Successful responses are SSE-framed, not bare JSON.
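
The client-side parsing this implies can be sketched as follows. The helper name parse_sse_data is hypothetical, and the sketch assumes each event carries one JSON-RPC message in its data: lines; it was not part of the spike.

```python
import json

# Accept value the wrapper required; application/json alone got HTTP 406.
MCP_ACCEPT = "application/json, text/event-stream"


def parse_sse_data(body: str) -> list[dict]:
    """Collect the JSON payloads carried in the data: lines of an SSE body."""
    messages: list[dict] = []
    data_lines: list[str] = []
    for line in body.splitlines():
        if line.startswith("data:"):
            value = line[len("data:"):]
            # SSE drops a single leading space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        elif line == "" and data_lines:
            # A blank line ends the event; multi-line data joins with "\n".
            messages.append(json.loads("\n".join(data_lines)))
            data_lines = []
    if data_lines:
        # Tolerate a final event that is not followed by a blank line.
        messages.append(json.loads("\n".join(data_lines)))
    return messages
```

Lines such as event: or id: are ignored here, which is enough to recover the JSON-RPC result from a single-response stream.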
Production blocker to fix. The PoC update/delete tools accepted only a memory id and called Mem0 without an explicit ownership check. Production must verify the id belongs to the authenticated user before mutation.

Graphiti Challenger

Graphiti was run from /private/tmp/sarea-memory-spike/vendors/graphiti using its FalkorDB Docker compose path. Host ports were patched only because local containers already owned 6379 and 3000.

Probe | Result | Notes
--- | --- | ---
Startup | pass | FalkorDB and Graphiti MCP containers reported healthy.
Health | pass | GET /health returned {"status":"healthy","service":"graphiti-mcp"}.
MCP discovery | pass | Required an mcp-session-id from initialize for subsequent tools/list.
Tools | observed | Graph-oriented tools included add_memory, search_nodes, search_memory_facts, get_episodes, clear_graph, and get_status.
Write/read | blocked | Background processing failed because current Graphiti uses OpenAI /v1/responses, which the mock did not implement.
Scope isolation | wrapper needed | Natural isolation uses group_id / group_ids; Sarea would need sanitized derived groups and ownership checks.

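
The sessionful discovery step could look roughly like this on the client side. Both helper names are hypothetical, and the Accept value is assumed to match what the CoreSpeed wrapper required; the spike did not record Graphiti's exact header requirements beyond mcp-session-id.

```python
import json


def session_id_from_headers(headers: dict[str, str]) -> str:
    """Find mcp-session-id in the initialize response headers.

    HTTP header names are case-insensitive, so compare in lower case.
    """
    for name, value in headers.items():
        if name.lower() == "mcp-session-id":
            return value
    raise KeyError("initialize response carried no mcp-session-id")


def tools_list_request(
    session_id: str, request_id: int = 2
) -> tuple[dict[str, str], bytes]:
    """Build headers and body for a tools/list call within a session."""
    headers = {
        "Content-Type": "application/json",
        # Assumption: same Accept value the wrapper probes needed.
        "Accept": "application/json, text/event-stream",
        "Mcp-Session-Id": session_id,
    }
    body = json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}
    )
    return headers, body.encode("utf-8")
```

A stateless client that skips the first helper would have no session id to send, which is the behavioral difference from the Mem0 wrapper noted in the table.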
Concrete integration hazard. A hyphenated group id such as sarea-user-a triggered a FalkorDB RediSearch syntax error; a non-hyphenated sarea_user_a returned a clean empty result.
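
A sketch of one way to derive a safe group id, under stated assumptions: only the hyphen was observed to fail, so restricting output to letters, digits, and underscores is a conservative guess, and derive_group_id is a hypothetical helper. The hash suffix is added because plain replacement would map user-a and user_a to the same group.

```python
import hashlib
import re


def derive_group_id(user_id: str, prefix: str = "sarea") -> str:
    """Derive a group id containing only letters, digits, and underscores."""
    # Replace every character outside the assumed-safe set.
    safe = re.sub(r"[^A-Za-z0-9_]", "_", user_id)
    # Short digest of the original id keeps distinct users in distinct
    # groups even when their sanitized forms collide.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}_{safe}_{digest}"
```

The derived id should be computed by the wrapper from the authenticated user, so agents never supply group_id directly.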

Graphiti remains interesting for later org memory because it models episodes, entity nodes, relationship facts, temporal metadata, and invalidated facts. That extra power is also why it is too heavy for the first personal-memory release.

Recommendation

Proceed with Mem0 for v1. Keep Graphiti as a v2 challenger after gateway-backed validation with a Responses-compatible model provider and an explicit Sarea wrapper design.

  1. Mem0 meets the v1 shape

    It passed scoped REST CRUD and search/list paths with real memory ids in Docker.

  2. The CoreSpeed wrapper model works

    The PoC exposed the intended tools, preserved provenance, and enforced disabled mode.

  3. Self-hosting is viable

    Mem0 started from source with temp-only Docker/env fixes; no hosted memory dependency is required for v1.

  4. Graphiti should wait

    It is operationally heavier, graph-shaped, sessionful over MCP, and needs more scope-safety work.

Open Risks

Real model quality untested. Run a gateway-backed validation for extraction, deduplication, and retrieval ranking before production rollout.
Mem0 search shape. Current Mem0 requires filters.user_id for scoped search. Do not rely on top-level user_id for search.
Dashboard healthcheck noise. The Mem0 dashboard was externally reachable, but Compose still reported its internal healthcheck as unhealthy.
Ownership enforcement. Production update/delete must verify ownership before mutating by id. This is a service-wrapper responsibility, not something agents can be trusted to supply.

Evidence

Report source: docs/superpowers/spikes/2026-05-14-memory-candidate-validation.md
Spec source: docs/superpowers/specs/2026-05-13-memory-system-design.md
Spike artifacts: /private/tmp/sarea-memory-spike/results
Final report commit: 9b8d2b7 docs: finalize memory validation recommendation
Cleanup: Spike containers were stopped; artifacts remain under /private/tmp/sarea-memory-spike for inspection.