GriSearch Feature Expansion

Competitive gap analysis → multi-phase build plan + critical fix · 5x adversarial-tested

Created: s200 (2026-04-06)
Phases: 0–8 + Critical Fix
Est. Effort: 21–30 sessions
Pressure Tests: 5 rounds, 69 findings

Executive Summary

Build Progress: Phases 0, 1, 3, 5, CF complete — Phase 6 News + Product done (s214), Phase 2 built (testing)
| Phase | Theme | Effort | Status |
|---|---|---|---|
| 0 | Context + Config Surface | 2–3 sessions | Complete (s204) |
| 1 | Polish & Quick Wins + JS Extract | 2–3 sessions | Complete (s205) |
| 2 | Voice Output (TTS) | 2–3 sessions | 2A–2D built, browser testing pending |
| 3 | Agentic Deep Research | 4–6 sessions | Complete (s205–s206) |
| CF | Follow-Up Query Resolution (Critical Fix) | 2–3 sessions | Complete (s210–s211) |
| 4 | Visual & Rich Content | 2–3 sessions | 4B+4C done, 4A in progress (s212) |
| 5 | Advanced Organization | 3–5 sessions | Complete (s213) |
| 6 | Specialty Search | 1–2 each | 6C News + 6A Product done (s214) |
| 7 | Tabular Data & Spreadsheets | 3–5 sessions | Not started |
| 8 | Context Intelligence | 3–5 sessions | Not started |

Build order: Phase 0 → 1 → 3 → 2 → CF (Critical Fix) → 4 → 5 → 6. CF jumps the queue — follow-up queries are fundamentally broken without query resolution.

Guiding Principles

1. Don't chase parity for parity's sake. Only build features that serve the actual research workflow.
2. Preserve the speed advantage. Quick mode must stay under 3s.
3. Build on what's unique. Group context, personalization, multi-provider diversity are moats.
4. Incremental value delivery. Every phase ships something usable.
Multi-Provider Search Architecture

GriSearch sends every query to three independent search providers simultaneously, merges and deduplicates results, then re-ranks the unified pool. This is the same multi-retrieval pattern used by Perplexity, Google AI Mode, and ChatGPT search.

| Provider | Strength | Index | Latency |
|---|---|---|---|
| Brave | Lowest latency, strong keyword precision, independent 30B+ page index | Own | ~670ms |
| Exa | Semantic understanding, spam filtering, high-signal authoritative content | Own | ~2s |
| Parallel | Strong accuracy-to-cost ratio, independent ranking perspective | Own | ~5–14s |

Pipeline

User Query
  → async fan-out:
      Brave    (~670ms · keyword-strong)
      Exa      (~2s · semantic search)
      Parallel (~5–14s · independent rank)
  → Merge + Dedup → Re-Rank → Ranked Results
  → LLM Synthesis → Answer + Citations

Why Three Providers?

| Benefit | Mechanism |
|---|---|
| Better recall | Three indexes catch what one misses |
| Better precision | Cross-provider agreement filters noise |
| Resilience | If one API goes down, the other two still work |
| Speed | Async fan-out = as fast as the fastest provider (with timeouts) |
| No vendor lock-in | Can swap providers without rewriting the system |
| Quality signal | Dedup overlap acts as an implicit relevance vote |

Benchmark data (2025–2026) shows the top four search APIs are statistically indistinguishable on quality when used individually. The winning strategy is to run multiple providers and let the combination outperform any single one.

Already Built (Pre-Plan)

GriSearch core was built across s197–s200 before this expansion plan was created.

s201: Plan page, /plans index, inbox replay fix, config schema (38 keys), settings cleanup, thread archival + health logging, defaults unified.
s204: Phase 0 complete (0A-0D). Per-mode context scaling, XML-tagged exchanges, full config surface + tuning panel, metrics logging + rolling averages, limit warnings, mode badges, Opus option, table rendering, JS extraction to static file, validation hooks.
s205: Phase 1 complete (1A-1D). Phase 3A shipped. Progress indicators, export, result previews, thread context menu, project creation. Research mode: Sonnet planner, Haiku quality gate, multi-step loop, cost controls, Sonnet synthesis.
s206: Phase 3B-3D shipped. GraySearch → GriSearch rebrand. Research timeline table, stop & summarize, structured report cards, "Dig deeper" buttons, first-use explainer. Persistent research data. Cloudflare Pages deploy. Citation table UI.
s208: Citation numbering fix. Collapsible sources list. Spine crash root cause fixed (importlib.reload memory leak). Health logging added.
s213: Phase 5 complete (all 4 sub-phases in 1 session). 5A: cross-thread corpus search. 5B: research notebook (pins, collections, export, 9 endpoints). 5C: desktop DnD (thread moves, reorder persistence). 5D: branched conversations. Also: research retry/rephrase/escalation, synthesis failure caching, "Retry Search" button, XSS fix, notebook info panel, "New Search" pill.
s214: Phase 6C News Mode + Phase 6A Product Research complete. Brave News API with freshness params, news query detection + auto-suggest, _NEWS_SYSTEM prompt. Product mode with review site boosting (11 domains), enriched query expansion, _PRODUCT_SYSTEM Sonnet synthesis with comparison tables. 15 new config schema keys. Two new mode pills (orange News, teal Product).

Table of Contents

  1. Executive Summary
  2. Multi-Provider Search Architecture
  3. Already Built (Pre-Plan)
  4. Phase 0A: Per-Mode Context Scaling
  5. Phase 0B: Context Format Upgrade
  6. Phase 0C: Unified Config Surface
  7. Phase 0D: Config UI (Tuning Panel)
  8. Phase 1A: Observability & Cleanup
  9. Phase 1B: Export / Report Generation
  10. Phase 1C: Search Progress Enhancement
  11. Phase 1D: Search Result Previews
  12. Phase 1E: Extract JS to Static File
  13. Phase 2: Voice Output (TTS)
  14. Phase 3A: Research Agent Architecture
  15. Phase 3B: Progress Streaming
  16. Phase 3C: Report Generation
  17. Phase 3D: UI Integration
  18. Research Retry & Recovery
  19. Critical Fix: Follow-Up Query Resolution
  20. Phase 4: Visual & Rich Content
  21. Phase 5: Advanced Organization
  22. Phase 6: Specialty Search
  23. Phase 7: Tabular Data & Spreadsheet Intelligence
  24. Phase 8: Context Intelligence
  25. Adversarial Review Record
Phase 0A: Per-Mode Context Scaling

Replace the single max_exchanges=5 / 600-char truncation with a per-mode strategy. Current usage is 1.6–8.2% of the 200K context window.

| Mode | max_exchanges | answer_truncation | Token Budget |
|---|---|---|---|
| Quick | 5 | 800 chars | ~1,000 tokens |
| Deep+Summary | 10 | 2,000 chars | ~5,000 tokens |
| Deep+Full | 20 | 4,000 chars | ~10,000 tokens |
| Research | 30 | No truncation | ~15,000 tokens |
  • Refactor get_thread_context() to accept max_exchanges + max_answer_chars params (s204)
  • Route handler passes mode-appropriate limits from cfg (s204)
  • Add input token logging: synthesis + expand_query (s204)
  • Quick mode query unchanged (<600 extra chars, preserves <3s target)
  • Per-search metrics logging -- rolling 20/mode to search_metrics.json (s204)
  • Rolling averages in tuning panel (muted orange, per applicable control) (s204)
  • Graceful limit handling -- amber inline warnings when limits hit (s204)
  • Opus model option + Basic/Advanced tier toggle + descriptions (s204)
  • Mode-colored labels in tuning panel matching inline badge colors (s204)
  • Modified-from-default indicator (green *) on changed values (s204)
  • Per-field tradeoff descriptions with click-to-expand (s204)
  • Averages expanded to cover all 38 config fields (s204)
  • group_context_chars metric added to all pipelines (s204)
Round 2 C-1: Token budget estimates measure conversation context ONLY. Full prompt = system (~200 tok) + user context (~750) + group context (~500-700) + search passages (up to ~10,000) + conversation context. Research synthesis could reach 30,000+ tokens ($0.10-0.50). Token+cost logging must ship before expanding limits.
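A minimal sketch of the per-mode limit routing described above, assuming a simple exchange dict shape; the real get_thread_context() signature, config plumbing, and (post-0B) XML output format differ.

```python
# Per-mode context limits from the table above. Values mirror the plan;
# the dict shape and function signature are assumptions for the sketch.
MODE_LIMITS = {
    "quick":        {"max_exchanges": 5,  "max_answer_chars": 800},
    "deep_summary": {"max_exchanges": 10, "max_answer_chars": 2000},
    "deep_full":    {"max_exchanges": 20, "max_answer_chars": 4000},
    "research":     {"max_exchanges": 30, "max_answer_chars": None},  # no truncation
}

def get_thread_context(exchanges, mode="quick"):
    """Render the most recent exchanges under the mode's budget."""
    limits = MODE_LIMITS[mode]
    recent = exchanges[-limits["max_exchanges"]:]
    cap = limits["max_answer_chars"]
    parts = []
    for ex in recent:
        answer = ex["answer"] if cap is None else ex["answer"][:cap]
        parts.append(f"User: {ex['query']}\nAssistant: {answer}")
    return "\n\n".join(parts)
```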
Phase 0B: Conversation Context Format Upgrade

Upgrade from plain User:/Assistant: to XML-tagged exchanges with mode and citations.

  • XML-tagged exchanges with mode attribute (s204)
  • Include exchange mode tag (quick vs deep calibration) (s204)
  • Include citation URLs in <sources> block, top 5 per exchange (s204)
  • Mode badge on each response (color-coded top + bottom with token counts) (s204)
  • Improved thread title generation (few-shot prompt, answer-rejection guard) (s204)
  • Research mode button placeholder (disabled, Phase 3) (s204)
  • Color-coded mode selector buttons (s204)
  • Markdown renderer: tables, ### headings, --- dividers, tighter spacing (s204)
<exchange n="1" mode="quick">
<query>Best espresso machine under $500?</query>
<answer>The Breville Barista Express...</answer>
<sources><url>https://example.com/review</url></sources>
</exchange>
Phase 0C: Unified Config Surface ("Sliders")

Centralize all tunable limits. Code defaults in git-tracked settings.yaml. Browser-written overrides in .gitignored config/grisearch_tuning.yaml. Runtime merges both, tuning takes precedence. Config snapshot pattern prevents mid-search TOCTOU races.

| Group | Keys | Examples |
|---|---|---|
| Context Limits | 8 | max_exchanges, max_answer_chars per mode |
| Models | 6 | synthesis model per mode, planner, quality gate |
| Token/Cost | 7 | max_tokens per mode, cost ceiling, Brave rate limit |
| Research Agent | 4 | max_rounds, sub_questions, wall time, concurrency |
| Search Providers | 8 | timeouts, max results, max extract pages |
| Auto-Brief | 4 | exchanges/thread, truncation (normal vs research) |
| Thread Health | 1 | size warning threshold (KB) |
  • Add all 38 schema keys to settings.yaml under grisearch: (s204)
  • Create config/grisearch_tuning.yaml (.gitignored) for browser overrides (s204)
  • Update _get_settings(): merge defaults + tuning overrides
  • Remove dead Settings() no-arg call from _get_settings()
  • Build GRISEARCH_CONFIG_SCHEMA (38 keys, 8 groups) as single source of truth
  • Config snapshot: pipelines call _get_settings() once, pass cfg downstream (s204)
  • All cfg.get() fallbacks reference _default() from schema
  • Replace Path(__file__).parent.parent with env var (s204)
  • Hide config keys for unbuilt features until they ship (s204, R3-13)
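The defaults-plus-overrides merge and the snapshot pattern can be sketched like this; dicts stand in for the two YAML files, and the real loader also validates types against the schema.

```python
import copy

# Dicts stand in for the two layers: git-tracked defaults (settings.yaml)
# overlaid by .gitignored overrides (config/grisearch_tuning.yaml).
def merge_config(defaults, tuning):
    """Overlay tuning on defaults; unknown override keys are dropped."""
    cfg = copy.deepcopy(defaults)
    cfg.update({k: v for k, v in tuning.items() if k in defaults})
    return cfg

# Snapshot pattern: a pipeline calls this once at search start and passes
# `cfg` downstream, so a mid-search tuning save can't change its limits
# (the TOCTOU race named above).
```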
Phase 0D: Config UI (Tuning Panel)

In-browser config editor on the GriSearch page. Gear icon opens settings panel. Both API endpoints behind web auth. Cost previews labeled as estimates with tooltip caveat.

| Config Type | Control | Example |
|---|---|---|
| Integer limits | Slider + stepper | 5 [---o-----] 30 |
| Cost ceilings | Stepper ($0.05) | $0.50 [-] [+] |
| Model selection | Dropdown | [claude-haiku-4-5 v] |
| 0 = unlimited | Toggle + stepper | [x] Limit: 4000 |
  • GET /api/search/config returns config + schema metadata (s204)
  • POST /api/search/config validates + merges overrides into tuning YAML (s204)
  • POST /api/search/config/reset clears all overrides (s204)
  • Config schema with type/range validation (s201)
  • Grouped sections, auto-generated from schema (s204)
  • Live cost/token impact preview (deferred — averages in tuning panel serve this need)
  • Instant apply -- no restart needed (s201)
  • "Reset all" button + modified values highlighted green (s204)
  • Tuning panel via hammer icon in header (s204)
  • JS extracted to static/js/search.js (no more {{}} escaping) (s204)
  • PostToolUse hook for rendered JS validation on views/*.py (s204)
  • Pre-restart validation: scripts/validate_views.py (s204)
Phase 1A: Observability & Cleanup
  • Add log.info for model/mode in synthesize() (s204)
  • Per-search cost logging: input_tokens, output_tokens, model, cost (s204)
  • Fix Dict[tuple, Any] type annotation (s204)
  • Fix _REPO_ROOT = Path(__file__).parent.parent (s204)

Thread Health Monitoring

  • Log file size + exchange count on every save_exchange()
  • Primary: exchange count color dot (green <10, yellow 10-20, red >20) (s205)
  • Thread list shows exchange count indicator per thread (s205)
  • MCP get_stack_status includes thread health summary (deferred — operational tooling)

Thread Archival

  • Archival runs inside save_exchange() (atomic, no race conditions)
  • After N exchanges (configurable, default 20), move older to archive
  • load_thread_full() for complete history (deferred — no threads near archive threshold)

Roadmap: Per-exchange storage (solution C) if archival proves insufficient.
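The archival-in-save_exchange() design can be sketched as below, assuming the thread is one JSON document with "exchanges" and "archived" lists; field names are assumptions.

```python
# Archival inside save_exchange(): the move happens in the same call that
# persists the new exchange, so there is no separate job to race against.
def save_exchange(thread, exchange, archive_after=20):
    """Append an exchange; move overflow beyond the limit to the archive."""
    thread.setdefault("exchanges", []).append(exchange)
    live = thread["exchanges"]
    if len(live) > archive_after:
        overflow = len(live) - archive_after
        thread.setdefault("archived", []).extend(live[:overflow])
        thread["exchanges"] = live[overflow:]
    return thread
```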

Phase 1B: Export / Report Generation

Markdown Export (MVP)

  • GET /api/search/thread/{id}/export?format=md (s205)
  • Title as H1, exchanges as H2, citations as footnotes (s205)
  • Export button on thread bar + mobile share sheet (s205)
  • File named {title}_{date}.md (s205)

HTML Export (stretch)

  • Same endpoint with format=html, print-friendly (deferred — Markdown covers the need)

Platform UX: Desktop: browser download. Mobile (Safari): navigator.share() with fallback.

Phase 1C: Search Progress Enhancement
  • During expanding: yield sub-queries as detail line (s205)
  • During reading: yield URLs, show unique domains (s205)
  • During searching: show providers ("Searching Brave + Exa...") (s205)
  • Elapsed time display (running timer, 500ms update) (s205)
Phase 1D: Search Result Previews
  • Preview cards: favicon + title + domain + date + snippet (2-line clamp) (s205)
  • Cards collapse to compact chips on synthesis start (s205)
  • Mobile: 44px min-height, vertical stack (s205)
Phase 1E: Extract JS to Static File

Prerequisite for Phase 2+. views/search.py = 1,113 lines of double-brace-escaped JS in Python template strings.

  • Extract search JS into static/js/search.js (s204)
  • PostToolUse validation hook + validate_views.py (s204)
  • Extract remaining JS from willy.py, pages.py, dashboard.py (deferred — S-14)
  • Migrate createScriptProcessor → AudioWorkletNode (deferred — Phase 2 prerequisite)
Phase 2: Voice Output (TTS)

Complete the voice loop. SSE with base64 audio chunks (proven tunnel-compatible).

2A. TTS Provider

  • Evaluate: Deepgram Aura, ElevenLabs, OpenAI TTS, Cartesia (s208)
  • Criteria: <500ms TTFB, natural voice, <$0.01/search (s208)
  • Build lib/tts.py — Deepgram Aura-2 REST streaming (s208)

2B. Streaming Pipeline

  • SSE audio_start/chunk/done events (base64 MP3) (s208)
  • Web Audio API decode + queue playback (s208)
  • End-to-end test: SSE pipeline streams audio chunks (s208)
  • Tap/click interrupt: toggle, indicator click, new search (s208)
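The SSE framing for the audio events above can be sketched like this; event names (audio_start/chunk/done, the truncated flag) come from the plan, while payload field names are assumptions.

```python
import base64
import json

def sse(event, data):
    """Frame one SSE event as text."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

async def stream_tts_audio(mp3_chunks):
    """Wrap a TTS byte stream in audio_start / audio_chunk / audio_done."""
    yield sse("audio_start", {"format": "mp3"})
    async for chunk in mp3_chunks:
        # base64 keeps binary audio safe inside the text-only SSE channel
        yield sse("audio_chunk", {"b64": base64.b64encode(chunk).decode("ascii")})
    yield sse("audio_done", {"truncated": False})
```

On the browser side, each audio_chunk is base64-decoded and handed to Web Audio for decode-and-queue playback.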

2C. Voice Flow

  • "Voice mode" toggle (speaker button, green active state) (s208)
  • Auto-listen after TTS finishes: mic activates after ducking delay, AUTO toggle button (s212)
  • Audio ducking: _gsTTSPlaying gate on mic start + audio processor, configurable delay (500ms default, 800ms for Bluetooth) (s212)
  • Bluetooth auto-detect via enumerateDevices(), extends ducking delay (s212)
  • Voice preferences: localStorage persistence for voice mode, auto-listen, ducking delay (s212)

2D. Smart TTS

  • Strip citations/URLs/markdown before TTS (_strip_for_tts) (s208)
  • Truncate long answers at sentence boundary (4000 char cap) (s208)
  • Mode-aware TTS length: Quick/Deep+S full (4000), Deep+F/Research first ~2 paragraphs (1500) (s212)
  • Table-to-prose conversion (_table_to_prose) for natural reading (s212)
  • Code block stripping, inline code cleanup, Sources: line removal (s212)
  • Dangling preposition cleanup after URL removal (s212)
  • Truncation indicator: audio_done.truncated flag + muted UI notice (s212)
  • Live browser test + voice quality tuning (deferred to tonight)
Phase 3A: Research Agent Architecture

Multi-step autonomous research via non-streaming wrappers over existing search functions.

User query → [Planner/Sonnet] → [Quality Gate/Haiku]
  → [Research Loop] → [Synthesizer] → [Structured Report]

Search Wrappers

  • search_and_summarize(): consumes async generator, returns dict (s205)

Cost Control (Round 2 C-2)

  • Shared Brave rate limiter (asyncio.Semaphore)
  • Per-research cost ceiling (default $0.50) (s205)
  • Hard cap: research_max_brave_calls (default 20) (s205)
  • Cost estimate shown before research starts
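The cost controls above might look like the sketch below; the class and field names are illustrative, not the actual implementation, though the defaults ($0.50 ceiling, 20 Brave calls) mirror the plan.

```python
import asyncio

class ResearchBudget:
    """Per-research spend tracker + shared Brave rate limiter (sketch)."""
    def __init__(self, ceiling_usd=0.50, max_brave_calls=20, concurrency=2):
        self.spent = 0.0
        self.ceiling = ceiling_usd
        self.brave_calls = 0
        self.max_brave_calls = max_brave_calls
        # Shared limiter: at most `concurrency` Brave requests in flight
        self.brave_sem = asyncio.Semaphore(concurrency)

    def can_continue(self):
        """Checked by the research loop before starting each round."""
        return self.spent < self.ceiling and self.brave_calls < self.max_brave_calls

    def record(self, cost_usd, brave_calls=0):
        self.spent += cost_usd
        self.brave_calls += brave_calls
```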

Lifecycle (Round 2 R-4)

  • Cancellation flag via asyncio.Event (s206)
  • Concurrent limit config: research_concurrent_limit (s205)
  • Cancellation on tab close (beforeunload)
  • Persist partial findings to disk

Agent Loop

  • SSE streaming (non-blocking via async generator) (s205)
  • Sonnet planner + Haiku quality gate (s205)
  • Max 5 rounds, 5 min wall time, 3-8 sub-questions (s205)
  • Per-sub-question mode selection (quick vs deep_summary) (s205)
  • Structured scratchpad per sub-question (s205)
Phase 3B: Progress Streaming
  • SSE research_progress event (step, total, sub_question, status) (s205)
  • Vertical timeline with status indicators (pending/spinner/check/fail/skipped) (s206)
  • Running timer + step counter in panel header (s206)
  • "Stop and summarize" button + POST /api/search/research/cancel (s206)
  • "Also consider..." redirect input (mid-research constraint injection)
Phase 3C: Report Generation
  • Sonnet final synthesis from all findings (s205)
  • Structured report: Summary, Findings, Open Questions, Sources (s205)
  • Saved as thread with mode: "research" (s205)
  • Auto-export to group directory
  • Brief weighting config: 4 exchanges/thread, 800-char truncation (s205)

Roadmap: Separate brief section (C) after evaluating real output.

Phase 3D: Research UI Integration
  • Fourth mode pill: "Research" (green, #10b981) (s205)
  • First-use explainer via localStorage (s206)
  • Full-width report card (.gs-report, 95% width, green border) (s206)
  • "Dig deeper" button on subsection headings (switches to Deep+Full) (s206)
Enhancement: Research Retry & Recovery (s213)

Comprehensive failure recovery for the research agent and all search modes. Previously, failed sub-questions were silently skipped and synthesis failures lost all findings.

Sub-Question Retry / Rephrase / Escalation

  • _rephrase_sub_question() — Haiku rephrases failed sub-questions from a different angle
  • _execute_research_step() — 3-tier recovery: retry → rephrase+retry → mode escalation (quick→deep_summary), up to 4 attempts per sub-question
  • UI: retry/rephrase/escalation badges on research timeline steps, "N attempts exhausted" on final failure

Research Synthesis Failure Recovery

  • cache_research_findings() — caches findings on synthesis failure so sub-question work isn't lost
  • retry_research_synthesis() — retries final synthesis from cached findings (skips all sub-question searches)
  • /api/search/retry auto-detects research findings cache and routes to correct retry function
  • UI: "Retry Synthesis" button with "Uses cached findings (skips sub-question searches)" hint

Non-Research Retry

  • "Retry Search" button on Quick/Deep/Deep+Full failures (re-submits same query, same mode)
Critical Fix: Follow-Up Query Resolution

Discovered: s210 (2026-04-07). Follow-up queries in threads produce garbage search results because query expansion has no access to conversation history. Every major AI platform rewrites follow-ups before searching — GriSearch did not. Effort: 2–3 sessions. Pressure tested: 2 rounds, 28 findings, all resolved.

CF.0 Discovery & Analysis (s210)

  • Identified bug: Exchange 5 in Iran war thread returned FedEx/K-pop results for "Deliver updates since the last review"
  • Root cause analysis: 7 blind spots across 4 search modes where conversation_context is in scope but not forwarded to query expansion
  • Mapped full query flow: expand_query(), _research_plan(), get_thread_context(), all 4 search modes, SSE endpoint
  • Confirmed synthesis receives context (answer referenced prior briefing) but search queries were decontextualized

CF.1 Industry Research (s210)

  • Researched 7 platforms: ChatGPT, Claude, Perplexity, Gemini, Grok, Copilot, OpenAI Deep Research
  • Reviewed academic SOTA: CHIQ history enhancement, conversational query reformulation, RAG multi-turn patterns
  • Key finding: every platform rewrites follow-ups; only Perplexity Pro and Copilot show the rewrite to users
  • Key finding: Google/Elastic use original + rewritten in fan-out (never fully replace)
  • Key finding: CHIQ topic switch detection is academic SOTA for preventing stale context pollution
  • Identified 4 gaps plan must address: query fan-out, topic switch, rolling summarization, error accumulation

CF.2 Plan Design & Adversarial Testing (s210)

  • Designed 5-phase fix: query resolution, context summarization, SSE events + frontend, research mode, cost tracking
  • 16-issue walkthrough with RG — each issue presented, discussed, agreed or modified
  • Round 4 adversarial: 15 findings (1 critical, 4 high, 6 medium, 4 low)
  • Round 5 adversarial: 13 findings (1 critical, 4 high, 5 medium, 3 low)
  • All 28 findings resolved and incorporated into final plan

CF.3 Query Resolution (backend)

  • New config keys: synthesis_model_resolution (Sonnet default), enable_follow_up_resolution, resolution exchange/char limits
  • Add skip_resolution param to all 3 search functions + search_and_summarize()
  • New resolve_follow_up(): structured JSON return with topic_switch detection, robust 5-step JSON parsing fallback chain
  • Resolution-specific context truncation (5 exchanges, 800 chars — independent of per-mode synthesis limits)
  • Wire into search_deep_full using search_query variable pattern (original bug trigger)
  • Wire into search_quick and search_deep_summary
  • Search fan-out: [original, resolved] + expand(resolved) — original as safety net per Google/Elastic pattern
  • Pass resolved query to synthesize() (not raw original)
  • Similarity check: Jaccard of lowercased word sets, suppress display if >0.85
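The similarity gate from the last bullet is small enough to sketch directly; this mirrors the plan's description (Jaccard of lowercased word sets, suppress the display above 0.85), with function names assumed.

```python
def jaccard(a, b):
    """Jaccard overlap of lowercased word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def should_show_resolution(original, resolved, threshold=0.85):
    """Hide the 'INTERPRETED AS' block when the rewrite barely differs."""
    return jaccard(original, resolved) <= threshold
```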

CF.4 Context Summarization (summary-beyond-window)

  • New ensure_context_summary() async function (keeps get_thread_context() synchronous)
  • Summary cache in thread JSON, keyed by max_exchanges, hash-based invalidation (handles archival, deletion, mode switches)
  • Config schema caps: Quick max=5, Deep+S max=10, Deep+F and Research uncapped
  • All modes: older exchanges summarized beyond window (never dropped)
  • get_thread_context() reads cached summary, prepends <context_summary> block
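A sketch of the hash-based invalidation: the cache key fingerprints the exchanges that fall outside the window, so archival, deletion, or a max_exchanges change forces a rebuild. Assumes exchanges are dicts with a query field; the real cache likely fingerprints more.

```python
import hashlib
import json

def summary_cache_key(exchanges, max_exchanges):
    """Fingerprint the exchanges that fall outside the context window."""
    older = exchanges[:-max_exchanges] if max_exchanges else exchanges
    blob = json.dumps([ex.get("query", "") for ex in older], sort_keys=True)
    digest = hashlib.sha256(blob.encode()).hexdigest()[:16]
    # Keying on max_exchanges too means a mode switch invalidates the cache
    return f"{max_exchanges}:{digest}"
```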

CF.5 SSE Events + Frontend

  • New query_resolved SSE event + emerald green .gs-msg-resolved block ("INTERPRETED AS")
  • New topic_switch_detected SSE event + amber .gs-msg-switch prompt
  • Topic switch UX: "Continue in this thread" / "Start new search" buttons
  • force_continue data flow: _gsForceResume flag → POST body → skip_resolution
  • Clean fetch abort on topic switch (prevent SSE race condition)
  • save_exchange(): persist resolved_query field (omit when empty)
  • Thread history: render green block for stored resolved_query with backward compat guard

CF.6 Research Mode + Metrics

  • Pass search_query to _research_plan() directly (no conversation_context needed — resolved query is self-contained)
  • Sub-loop: skip_resolution=True via search_and_summarize()
  • Final synthesis: compact preamble ("follow-up, resolved to: ..."), not full history
  • Resolution cost tracking: resolution_input_tokens, resolution_output_tokens, resolution_cost_usd in metrics + answer meta line

CF.7 Verification

  • Live test: Iran war thread follow-up ("Deliver updates since the last review")
  • Live test: topic switch detection ("What's the weather in San Diego" in Iran thread)
  • Live test: similar query suppression (no green block for already-specific queries)
  • Live test: force_continue flow
  • Live test: summary caching across modes

Roadmap (not in this build): Embedding-based topic switch detection (cosine similarity). Error accumulation monitoring (resolution quality tracking over long threads).

Full spec: ~/.claude/plans/grisearch-follow-up-context-fix.md

Phase 4: Visual & Rich Content

4A. Image Search

  • Brave Image Search API: _search_brave_images() + ImageResult model (s212)
  • Parallel image search in Deep+Summary and Deep+Full pipelines (s212)
  • SSE image_results event with thumbnail grid (3-col desktop, 2-col mobile) (s212)
  • Click-to-expand: full image overlay + source link + dimensions (s212)
  • Collapsible IMAGES header (s212)
  • Image upload for reverse search → moved to Phase 9 (Document & Image Upload System)
  • Storage lifecycle → moved to Phase 9 (two-tier: ephemeral 7-day + persistent indefinite)

4B. Rich Result Cards

  • Query-type detection: weather, quick fact, comparison, timeline via regex patterns (s212)
  • Synthesis format hints: type-specific prompt suffix guides structured output (s212)
  • Comparison: "X vs Y" triggers table-formatted synthesis (existing table renderer handles it) (s212)
  • Timeline: "history of" triggers chronological list; vertical timeline CSS renderer with date dots (s212)
  • Quick facts: bold answer lead, compact format hint (s212)
  • Weather: structured conditions + forecast hint (s212)
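An illustrative version of the regex query-type detection in 4B; the real pattern set is presumably richer, and these regexes and names are assumptions.

```python
import re

# First match wins; the winning type selects a synthesis format hint.
_QUERY_TYPES = [
    ("comparison", re.compile(r"\b(vs\.?|versus|compared to)\b", re.I)),
    ("timeline",   re.compile(r"\b(history of|timeline of)\b", re.I)),
    ("weather",    re.compile(r"\b(weather|forecast|temperature)\b", re.I)),
]

def detect_query_type(query):
    for qtype, pattern in _QUERY_TYPES:
        if pattern.search(query):
            return qtype
    return None  # no hint; synthesis uses the default format
```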

4C. Location-Aware

  • search_location + search_country_code config keys in tuning panel (str type) (s212)
  • "Near me" / "nearby" / "local" detection via regex, replaces with configured location (s212)
  • Brave country param wired to all 3 search pipelines (quick, deep_summary, deep_full) (s212)
  • Location badge in header (muted, auto-hidden when empty) (s212)
  • Badge updates on tuning save (s212)
  • Dynamic browser geolocation: "Use Location" toggle in mode row, detects on new thread, Nominatim reverse geocode, badge updates (s212)
Phase 5: Advanced Organization

5A. Cross-Thread Search Complete (s213)

  • GET /api/search/corpus — full-text search across all threads + archives, title/query/answer scoring (3x/2x/1x) + recency boost, group filter
  • Lazy-build corpus index, persisted to corpus_index.json (5-min TTL), invalidated on save/delete, ~0.4ms cached load
  • History panel search input with 300ms debounce, results replace thread list, click loads thread + scrolls to exchange with highlight
  • "Search within this group" filter via group_id API param
  • Race condition protection (sequence counter), XSS fix (javascript: URL blocking in markdown links)
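The field-weighted corpus scoring (title 3x / query 2x / answer 1x plus recency boost) might be sketched as below; the field names and boost curve are assumptions.

```python
import time

FIELD_WEIGHTS = {"title": 3.0, "query": 2.0, "answer": 1.0}

def score_thread(terms, thread, now=None):
    """Weighted term hits (title 3x / query 2x / answer 1x) + recency boost."""
    now = now or time.time()
    text = {f: thread.get(f, "").lower() for f in FIELD_WEIGHTS}
    score = sum(w for t in terms for f, w in FIELD_WEIGHTS.items()
                if t.lower() in text[f])
    age_days = max(0.0, (now - thread.get("updated_utc", now)) / 86400)
    return score * (1.0 + 1.0 / (1.0 + age_days))  # newer threads score higher
```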

5B. Research Notebook Complete (s213)

  • Pin button (★) on every answer bubble — pin/unpin toggle, duplicate prevention, yellow highlight
  • Pin data layer: pinned.json with thread_id, exchange_index, query, answer_snippet, collection_id
  • Collections CRUD: create, delete, rename, list with pin counts, assign pins to collections
  • Notebook section in history panel (yellow theme, collapsible collections, unsorted section)
  • Collection Markdown export — download with collection name, notes, each pin as section
  • 9 new API endpoints: pin, unpin, list pins, assign pin, list/create/delete/rename collections, export
  • Auto-suggest pins after Deep+Full/Research (deferred)

5C. Drag-and-Drop Thread Organization Complete (s213)

  • Desktop: HTML5 DnD with drag handles + drop targets on group headers (UA-gated, mobile excluded)
  • Drop targets: group headers glow blue on dragover, "Recent" = unassign from group
  • Thread reorder within groups (thread_order in groups.json) + POST /api/search/groups/{id}/reorder
  • Group reorder (group_order in groups.json) + POST /api/search/groups/reorder
  • History panel respects both order fields (fallback to updated_utc/most-recent)
  • Mobile: context menu flow preserved (no DnD)

Reuses existing api_search_group_assign for moves — no new backend for basic group assignment. Sort order APIs are new. Desktop-only initially; mobile DnD (long-press or polyfill) evaluated after desktop ships.

5D. Branched Conversations Complete (s213)

  • "Branch here" button on each exchange in thread replay — creates new thread with exchanges up to branch point
  • Branch metadata (branched_from) + group inheritance + corpus index invalidation
  • Branch does NOT bump brief counter — copied exchanges already counted
  • Cyan branch icon on branched threads in history panel
Phase 6: Specialty Search

6C. News Mode Complete (s214)

  • Brave News API (_search_brave_news) + dual-source (News + Web)
  • News query detection with auto-freshness hints (pd/pw/pm)
  • Recency-first sorting + news source boost
  • _NEWS_SYSTEM synthesis prompt (what changed, attribution)
  • Orange News mode pill (#f97316) + mode auto-suggest
  • 8 config schema keys (model, tokens, freshness, etc.)
  • Timeline rendering (deferred to 4B)
  • "Follow this topic" (deferred to 6D)

6A. Product Research Complete (s214)

  • _detect_product_query with patterns + false-positive exclusions
  • Enriched query expansion targeting review sites (Wirecutter, RTINGS, Reddit)
  • _PRODUCT_SYSTEM synthesis: recommendation, comparison table, pros/cons
  • Review domain boost (11 review sites scored higher)
  • Teal Product pill (#14b8a6) + mode auto-suggest
  • 7 config keys (Sonnet model, 3000 tokens, 8 extract pages)

6B. Academic/Technical

  • Semantic Scholar API + citation scoring

6D. Recurring Search

  • "Watch this" (max 5, Quick only, cost estimate)
  • 6-hour re-run + URL dedup + unread badges
Phase 7: Tabular Data & Spreadsheet Intelligence

Theme: Accept, analyze, transform, and export structured data across all modes. Dedicated Data mode for analysis-heavy workflows. Effort: 3–5 sessions.

7A. Planning & Requirements

  • Competitive analysis (ChatGPT, Gemini, Copilot tabular UX)
  • Catalog RG's actual tabular workflows from ChatGPT history
  • Formula scope ranking by usage
  • Adversarial review of spec

7B. Tabular Input (all modes)

  • Paste detection (TSV/CSV) with table preview
  • Context injection as fenced CSV block
  • File upload: CSV (client-side) + Excel (openpyxl)
  • Size limits (~5K rows / 500KB)

7C. Tabular Output & Export (all modes)

  • CSV download button on each rendered table
  • Copy table as TSV to clipboard
  • Excel export (.xlsx via openpyxl)
  • Multi-table support + "Download all"

7D. Formula Generation

  • Synthesis prompt for formula requests (Excel vs Sheets toggle)
  • Monospace code blocks with copy button + explanation
  • All major categories: lookup, conditional, financial, array, text, date
  • Optional formula validation (verify output)

7E. Data Mode (dedicated)

  • "Data" mode pill — no web search, direct Claude analysis
  • Specialized analysis prompt (stats, insights, suggest visualizations)
  • Multi-turn analysis with table context carried in thread
  • Computed columns: generates formula AND fills values

7F. Future (not building yet)

Chart generation, Google Sheets integration, SQL-like queries, pivot table builder, data persistence across sessions.

Phase 8: Context Intelligence (Active + Passive Learning)

Theme: Make GriSearch progressively smarter about user preferences and research patterns. Active interviews + passive extraction + enhanced auto-briefs. Effort: 3–5 sessions.

8A. Planning & Requirements

  • Audit current context injection chain (user, project, thread, conversation)
  • Catalog preference types (source, format, domain, constraint, fact)
  • Review ChatGPT memory system (learn from their mistakes)
  • Adversarial review of spec

8B. Passive Preference Extraction (all modes)

  • Post-synthesis Haiku extraction: 0-3 new preferences per exchange
  • Category tagging: format, source, domain, constraint, fact
  • Dedup + merge against existing search_preferences.md
  • Staleness handling: timestamp entries, replace contradictions
  • Transparency: extracted prefs visible/editable in Preferences panel
  • Kill switch in tuning panel (on for Deep modes, off for Quick)

8C. Active Context Interview (triggered)

  • Trigger: button in Project Notes + thread context menu + proactive suggestion
  • 3-phase flow: confirm existing → expand with probes → identify gaps
  • Questions displayed inline (conversation area, not modal)
  • Output: updated notes + extracted preferences + user review
  • Persist interview state for resume across sessions
  • Re-interview suggestion after 10+ new exchanges

8D. Thread-Level Context

  • Per-thread notes field (editable via context menu)
  • Thread auto-brief: full trajectory summary (not just recent)
  • Thread context injected into synthesis alongside project context

8E. Enhanced Auto-Brief

  • Dual-output: findings + preferences + open questions
  • Cross-project pattern extraction to global search_preferences.md

8F. Future (not building yet)

Preference confidence scoring, conflict detection, onboarding interview, preference analytics dashboard.

Phase 9: Document & Image Upload System

Theme: Persistent, organized, searchable uploads that survive across sessions and threads. The #1 pain point with ChatGPT is upload amnesia — documents tied to a single conversation and forgotten next session. Effort: 3–4 sessions. Dependencies: Phase 4A (image search UI), Phase 5A (cross-thread search).

9A. Storage Architecture

  • Two-tier retention: ephemeral (7-day auto-clean) + persistent (indefinite, user-managed)
  • Metadata sidecar JSON (filename, tags, extracted text path, thread associations)
  • Per-user quotas: 10MB/file, 500MB/user persistent. Configurable per deployment.
  • Multi-tenant: per-user isolation + shared team library (Data/system/search/shared_uploads/)
  • Shared library: publish from personal, read-only refs, content-hash dedup, configurable quota
  • Admin storage dashboard: GET /api/search/admin/storage (per-user usage summary)
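The two-tier layout, sidecar, and quota checks could be sketched as below. The directory layout, sidecar field names, and limits mirror the bullets above, but the function itself is hypothetical, not the deployed code.

```python
import hashlib
import json
import time
from pathlib import Path

MAX_FILE_BYTES = 10 * 1024 * 1024    # 10MB/file quota (configurable per deployment)
MAX_USER_BYTES = 500 * 1024 * 1024   # 500MB/user persistent quota

def save_upload(user_dir: Path, filename: str, data: bytes,
                tags=None, tier="ephemeral") -> Path:
    """Write an upload plus its metadata sidecar JSON, enforcing quotas."""
    if len(data) > MAX_FILE_BYTES:
        raise ValueError("file exceeds per-file quota")
    used = sum(f.stat().st_size for f in user_dir.rglob("*") if f.is_file())
    if used + len(data) > MAX_USER_BYTES:
        raise ValueError("user storage quota exhausted")
    dest = user_dir / tier / filename          # two-tier: ephemeral/ or persistent/
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    sidecar = {
        "filename": filename,
        "tier": tier,
        "tags": tags or [],
        "sha256": hashlib.sha256(data).hexdigest(),  # content hash for dedup
        "uploaded_at": time.time(),
        "threads": [],                               # thread associations (9E)
    }
    (dest.parent / (dest.name + ".meta.json")).write_text(json.dumps(sidecar))
    return dest
```

Promoting an ephemeral file to persistent would then be a move between the two subdirectories plus a sidecar update.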

9B. Text Extraction & Indexing

  • PDF text extraction (PyMuPDF)
  • Image OCR/description (Claude vision API)
  • Extracted text stored alongside uploads, indexed for cross-thread search

9C. Upload UI

  • Upload button (paperclip icon in input row) + drag-and-drop
  • Post-upload choice: "Use for this search" vs "Save to library"
  • Inline preview: PDF first page + page count, image thumbnail

9D. Document Library

  • Slide-out library panel (same pattern as history/preferences)
  • Search within library by filename, extracted text, tags
  • "Reference this" button — injects document context into next search
  • Auto-tag on upload via Haiku
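Library search over the sidecar metadata can stay very simple. A sketch, assuming entries mirror the 9A sidecar structure (field names are assumptions):

```python
def search_library(entries, term):
    """Case-insensitive search across filename, extracted text, and tags.

    `entries` is a list of sidecar-style dicts; field names are illustrative.
    """
    t = term.lower()
    return [
        e for e in entries
        if t in e["filename"].lower()
        or t in e.get("extracted_text", "").lower()
        or any(t in tag.lower() for tag in e.get("tags", []))
    ]
```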

9E. Cross-Session Reference

  • @-mention documents: @suntsu-contract what are the termination clauses?
  • Auto-detect document references in queries, inject extracted text as <document_context>
  • Thread association tracking (usage history in library)
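A minimal sketch of @-mention resolution, assuming document slugs map to their extracted text. The mention regex, the lookup structure, and the de-hyphenation of the cleaned query are illustrative assumptions; only the <document_context> wrapper comes from the plan.

```python
import re

MENTION_RE = re.compile(r"@([\w-]+)")

def resolve_mentions(query: str, library: dict):
    """Return (clean_query, context_block) for any @-mentioned documents.

    `library` maps slug -> extracted text, e.g. {"suntsu-contract": "..."}.
    """
    blocks = []
    for slug in MENTION_RE.findall(query):
        if slug in library:
            blocks.append(
                f'<document_context name="{slug}">\n{library[slug]}\n</document_context>'
            )
    # Strip the @ and de-hyphenate so the query reads naturally downstream.
    clean = MENTION_RE.sub(lambda m: m.group(1).replace("-", " "), query)
    return clean, "\n".join(blocks)
```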

9F. Image Upload for Search

  • Reverse image search: upload → Claude vision describes → description enriches search
  • Mobile camera capture (accept="image/*" capture="environment")
  • Ephemeral by default, "Save to library" promotes to persistent

9G. Cleanup & Maintenance

  • Spine startup task: auto-clean ephemeral files older than 7 days
  • Storage quota enforcement on upload
  • Orphan cleanup (extracted text without matching upload)
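The Spine startup cleanup task might look like the following sketch. The ephemeral/ subdirectory layout is an assumption carried over from the 9A sketch; orphan and quota handling would hang off the same walk.

```python
import time
from pathlib import Path

EPHEMERAL_TTL = 7 * 24 * 3600  # 7-day retention for ephemeral uploads

def clean_ephemeral(root: Path, now=None) -> int:
    """Delete ephemeral uploads older than 7 days; return count removed."""
    now = now or time.time()
    removed = 0
    for f in (root / "ephemeral").glob("*"):
        if f.is_file() and now - f.stat().st_mtime > EPHEMERAL_TTL:
            f.unlink()   # sidecar .meta.json files age out the same way
            removed += 1
    return removed
```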

Open questions:

  • Document versioning (replace vs coexist)
  • Large document chunking (semantic chunk selection for 50+ page contracts)
  • Shared library moderation at 30 users
  • Extraction cost at scale (local OCR fallback vs Claude vision)
  • Cross-deployment portability

Adversarial Review Record

Round 1 (s200) — 15 findings

Initial confidence: MEDIUM. All addressed.

| ID | Severity | Finding | Resolution |
|----|----------|---------|------------|
| C-1 | Critical | 1A is a non-issue | Reduced to logging + cleanup |
| C-2 | Critical | Can't reuse search generators | Non-streaming wrappers in 3A |
| C-3 | Critical | WebSocket TTS won't work through tunnel | Switched to SSE-first |
| R-1 | Risk | Research blocks uvicorn worker | Background create_task |
| R-2 | Risk | Inline JS at breaking point | Added 1E: JS extraction |
| R-3 | Risk | Recurring search unbounded cost | Cap 5 watches, Quick only |
| R-4 | Risk | Image upload lifecycle missing | Path, retention, max size defined |
| R-5 | Risk | Haiku planner poor quality | Sonnet + quality gate |
| G-1:4 | Gap | API degradation, Dict, index, ducking | All addressed in respective phases |
| Q-1:3 | Question | Phase 6 order, export UX, build order | All resolved |

Post-Round-1 confidence: HIGH

Round 2 (s200/s201) — 12 findings

All addressed in s201 review with RG.

| ID | Severity | Finding | Resolution |
|----|----------|---------|------------|
| C-1 | Critical | Token budget undercounts | Full budget + cost logging first |
| C-2 | Critical | No research cost cap | Semaphore, ceiling, confirmation |
| R-1 | Risk | createScriptProcessor deprecated | Migrate in 1E |
| R-2 | Risk | SSE audio may buffer | 200-500ms chunks, tunnel test |
| R-3 | Risk | Thread files unbounded | Monitoring (A) + archival (B), C roadmapped |
| R-4 | Risk | Research has no lifecycle | Registry, cancel, persist partial |
| G-1:4 | Gap | BT latency, model deprecation, JS risk, Path fix | All addressed |
| Q-1:2 | Question | Brief weighting, branch counter | A+B weighting, skip counter on branch |

Post-Round-2 confidence: HIGH

Round 3 (s201) — 14 findings

Post-0C/0D additions. All addressed in s201.

| ID | Severity | Finding | Resolution |
|----|----------|---------|------------|
| R3-1 | Critical | settings.yaml git-tracked; browser writes = merge conflicts | Separate .gitignored tuning file |
| R3-2 | Critical | No config caching; TOCTOU race mid-search | Config snapshot pattern per pipeline |
| R3-3 | Risk | _get_settings() is dead, broken code | Remove dead Settings() call |
| R3-4 | Risk | Archival race with save_exchange() | Archival inside save (atomic) |
| R3-5 | Risk | Defaults scattered across code + schema | Schema dict as single source of truth |
| R3-6 | Risk | Cost preview impossible to compute accurately | Label as estimates with caveat tooltip |
| R3-7 | Gap | Config API needs auth | Behind web auth middleware |
| R3-8 | Gap | File KB poor proxy for context usage | Primary metric: exchange count |
| R3-9 | Gap | Cost ceiling slider unbounded | Schema max=$5.00 |
| R3-10 | Gap | A+B brief weighting undefined | Defined inline (4 exch, 800 char) |
| R3-11 | Gap | AudioWorklet migration underscoped | Worklet file + MIME + extra time noted |
| R3-12 | Question | Config change hits in-flight search | Covered by R3-2 snapshot |
| R3-13 | Question | Model keys for unbuilt features confusing | Hide until feature ships |
| R3-14 | Question | Effort estimate unchanged after Phase 0 doubled | Revised: 19-27 sessions total |

Post-Round-3 confidence: HIGH

Round 4 (s210) — Critical Fix, Pass 1 — 15 findings

Adversarial review of the follow-up query resolution plan. All addressed.

| ID | Severity | Finding | Resolution |
|----|----------|---------|------------|
| R4-C1 | Critical | Haiku too weak for query resolution (Quick/Deep+S default) | Dedicated synthesis_model_resolution config, default Sonnet |
| R4-C2 | Critical | Raw query to synthesis creates semantic mismatch with resolved search results | Pass resolved query to synthesize() |
| R4-H1 | High | Token explosion in Research resolution (30 exchanges, unlimited chars = 22K tokens) | Resolution-specific truncation: 5 exchanges, 800 chars |
| R4-H2 | High | Research sub-loop accidentally triggers resolution on sub-questions | skip_resolution=True flag on sub-loop calls |
| R4-H3 | High | No handling of unrelated topics in existing threads | Topic switch prompt + user choice UX (continue/new thread) |
| R4-M1 | Medium | No suppression of "INTERPRETED AS" for similar queries | Jaccard similarity >0.85 suppresses display |
| R4-M2 | Medium | expand_query() doesn't need full conversation_context | No changes to expand_query — resolved query is sufficient |
| R4-M3 | Medium | Research final synthesis doesn't need full history | Compact preamble only |
| R4-M4 | Medium | Quick mode latency concern (~500-800ms) | Accept: correct results > fast garbage |
| R4-M5 | Medium | Need resolved_query capture pattern in pages.py | Initialize alongside collectors, use or None |
| R4-L1 | Low | Config toggle needed | enable_follow_up_resolution boolean |
| R4-L2 | Low | Resolution cost not tracked in metrics | Added to metrics dict + answer meta line |
| R4-L3 | Low | Old threads missing resolved_query field | Simple if guard in history rendering |
| R4-L4 | Low | save_exchange signature underspecified | resolved_query: str = "", omit when empty |
| R4-M6 | Medium | Rolling summarization needed for long threads | Summary-beyond-window for all modes with per-mode hard caps |

Post-Round-4 confidence: HIGH
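The R4-M1 suppression rule, as pinned down in R5-M3 (Jaccard similarity of lowercased word sets, threshold 0.85), can be sketched as below; the function name is illustrative.

```python
def should_show_interpretation(raw: str, resolved: str, threshold: float = 0.85) -> bool:
    """Hide the "INTERPRETED AS" banner when queries are near-identical.

    Jaccard similarity of lowercased word sets; a score above the 0.85
    threshold means resolution changed little, so the banner is suppressed.
    """
    a, b = set(raw.lower().split()), set(resolved.lower().split())
    if not a or not b:
        return True
    jaccard = len(a & b) / len(a | b)
    return jaccard <= threshold
```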

Round 5 (s210) — Critical Fix, Pass 2 — 13 findings

Second adversarial pass after incorporating Round 4 fixes. Found subtle interaction effects. All addressed.

| ID | Severity | Finding | Resolution |
|----|----------|---------|------------|
| R5-C1 | Critical | get_thread_context() is sync; adding async LLM call inside crashes | Split: ensure_context_summary() async + get_thread_context() stays sync |
| R5-H1 | High | LLM JSON output parsing has no fallback for malformed responses | 5-step fallback: strip fences, json.loads, regex extract, validate keys, default |
| R5-H2 | High | _gsGroupId doesn't exist in JS frontend | Just clear _gsThreadId; group derived server-side from thread |
| R5-H3 | High | Summary cache breaks when exchanges archived | Hash-based invalidation (overflow query strings + timestamps) |
| R5-H4 | High | force_continue has no frontend-to-backend plumbing | Full data flow: _gsForceResume → POST body → skip_resolution |
| R5-M1 | Medium | Topic switch resubmit SSE abort race condition | Explicit abort + null controller before re-enabling UI |
| R5-M2 | Medium | search_and_summarize() doesn't accept skip_resolution | Add param, forward to pipeline call |
| R5-M3 | Medium | Similarity check definition vague | Jaccard of lowercased word sets, threshold 0.85 |
| R5-M4 | Medium | Research planner gets full context it doesn't need | Pass search_query directly, no conversation_context param |
| R5-M5 | Medium | Dual cache split: resolution (5) and synthesis (20) different overflow | Cache keyed by max_exchanges |
| R5-M6 | Medium | Must use search_query variable after resolution in all calls | Explicit variable pattern documented in plan |
| R5-L1 | Low | Build step 4 could crash if research tested before step 8 | Add skip_resolution param to all functions in step 1 |
| R5-L2 | Low | Concurrent thread access race on save + summary | Accepted: _gsSearching UI guard prevents in normal use |

Post-Round-5 confidence: HIGH
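The R5-H1 five-step fallback (strip fences, json.loads, regex extract, validate keys, default) could be sketched as follows; the expected keys are an assumption for illustration.

```python
import json
import re

def parse_resolution(raw: str, default: dict) -> dict:
    """Parse an LLM JSON response with a five-step fallback chain:
    1) strip code fences, 2) json.loads, 3) regex-extract a {...} block,
    4) validate expected keys, 5) fall back to a safe default.
    """
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())  # step 1
    candidates = [text]
    m = re.search(r"\{.*\}", text, re.DOTALL)                    # step 3
    if m:
        candidates.append(m.group(0))
    for cand in candidates:
        try:
            obj = json.loads(cand)                               # step 2
        except (json.JSONDecodeError, TypeError):
            continue
        if isinstance(obj, dict) and set(default) <= set(obj):   # step 4
            return obj
    return default                                               # step 5
```

Returning the caller-supplied default (e.g. the raw query with no topic switch) keeps a malformed resolution from ever failing the search.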