GriSearch Feature Expansion

Competitive gap analysis → multi-phase build plan + critical fix · 5× adversarial-tested

Created: s200 (2026-04-06)
Phases: 0–9 + Critical Fix
Est. Effort: 21–30 sessions
Pressure Tests: 5 rounds, 69 findings

Executive Summary

Build Progress: Phases 0, 1, 3, CF complete — Phase 2 built (testing), Phase 4 in progress (s212)

| Phase | Theme | Effort | Status |
|---|---|---|---|
| 0 | Context + Config Surface | 2–3 sessions | Complete (s204) |
| 1 | Polish & Quick Wins + JS Extract | 2–3 sessions | Complete (s205) |
| 2 | Voice Output (TTS) | 2–3 sessions | 2A–2D built, browser testing pending |
| 3 | Agentic Deep Research | 4–6 sessions | Complete (s205–s206) |
| CF | Follow-Up Query Resolution (Critical Fix) | 2–3 sessions | Complete (s210–s211) |
| 4 | Visual & Rich Content | 2–3 sessions | 4B+4C done, 4A in progress (s212) |
| 5 | Advanced Organization | 3–5 sessions | Not Started |
| 6 | Specialty Search | 1–2 sessions each | Not Started |
| 7 | Tabular Data & Spreadsheets | 3–5 sessions | Not Started |
| 8 | Context Intelligence | 3–5 sessions | Not Started |

Build order: Phase 0 → 1 → 3 → 2 → CF (Critical Fix) → 4 → 5 → 6. CF jumps the queue — follow-up queries are fundamentally broken without query resolution.

Design Principles

1. Don't chase parity for parity's sake. Only build features that serve the actual research workflow.
2. Preserve the speed advantage. Quick mode must stay under 3s.
3. Build on what's unique. Group context, personalization, multi-provider diversity are moats.
4. Incremental value delivery. Every phase ships something usable.
Multi-Provider Search Architecture

GriSearch sends every query to three independent search providers simultaneously, merges and deduplicates results, then re-ranks the unified pool. This is the same multi-retrieval pattern used by Perplexity, Google AI Mode, and ChatGPT search.

| Provider | Strength | Index | Latency |
|---|---|---|---|
| Brave | Fastest latency, strong keyword precision, independent 30B+ page index | Own | ~670ms |
| Exa | Semantic understanding, spam filtering, high-signal authoritative content | Own | ~2s |
| Parallel | Strong accuracy-to-cost ratio, independent ranking perspective | Own | ~5–14s |

Pipeline

User Query
  └─ ASYNC FAN-OUT:
       • Brave — ~670ms · keyword-strong
       • Exa — ~2s · semantic search
       • Parallel — ~5–14s · independent rank
Merge + Dedup → Re-Rank → Ranked Results → LLM Synthesis → Answer + Citations
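A minimal runnable sketch of the fan-out, merge, and dedup steps, using stub providers in place of the real Brave/Exa/Parallel clients (all function names and result fields here are illustrative, not GriSearch's actual code):

```python
import asyncio

# Stub providers standing in for the real Brave/Exa/Parallel API clients.
async def brave(q):
    return [{"url": "https://a.com", "title": "A"}]

async def exa(q):
    return [{"url": "https://a.com", "title": "A"},
            {"url": "https://b.com", "title": "B"}]

async def parallel(q):
    await asyncio.sleep(10)  # simulates a slow provider
    return [{"url": "https://c.com", "title": "C"}]

async def fan_out(query, providers, timeout=3.0):
    """Query all providers concurrently; a provider that times out or
    errors contributes nothing instead of failing the whole search."""
    async def guarded(provider):
        try:
            return await asyncio.wait_for(provider(query), timeout)
        except Exception:
            return []
    pools = await asyncio.gather(*(guarded(p) for p in providers))
    seen, merged = set(), []
    for pool in pools:  # merge + dedup by URL
        for r in pool:
            if r["url"] not in seen:
                seen.add(r["url"])
                merged.append(r)
    return merged

# With a 0.5s timeout the slow provider is dropped; the search still answers.
results = asyncio.run(fan_out("espresso", [brave, exa, parallel], timeout=0.5))
```

Dedup here is by exact URL; the real pipeline presumably normalizes URLs and treats cross-provider overlap as the relevance vote described below before re-ranking.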

Why Three Providers?

| Benefit | Mechanism |
|---|---|
| Better recall | Three indexes catch what one misses |
| Better precision | Cross-provider agreement filters noise |
| Resilience | If one API goes down, the other two still work |
| Speed | Async fan-out = as fast as the fastest provider (with timeouts) |
| No vendor lock-in | Can swap providers without rewriting the system |
| Quality signal | Dedup overlap acts as an implicit relevance vote |

Benchmark data (2025–2026) shows the top four search APIs are statistically indistinguishable on quality when used individually. The winning strategy is to run multiple providers and let the combination outperform any single one.

Already Built (Pre-Plan)

GriSearch core was built across s197–s200 before this expansion plan was created.

s201: Plan page, /plans index, inbox replay fix, config schema (38 keys), settings cleanup, thread archival + health logging, defaults unified.
s204: Phase 0 complete (0A-0D). Per-mode context scaling, XML-tagged exchanges, full config surface + tuning panel, metrics logging + rolling averages, limit warnings, mode badges, Opus option, table rendering, JS extraction to static file, validation hooks.
s205: Phase 1 complete (1A-1D). Phase 3A shipped. Progress indicators, export, result previews, thread context menu, project creation. Research mode: Sonnet planner, Haiku quality gate, multi-step loop, cost controls, Sonnet synthesis.
s206: Phase 3B-3D shipped. GraySearch → GriSearch rebrand. Research timeline table, stop & summarize, structured report cards, "Dig deeper" buttons, first-use explainer. Persistent research data. Cloudflare Pages deploy. Citation table UI.
s208: Citation numbering fix. Collapsible sources list. Spine crash root cause fixed (importlib.reload memory leak). Health logging added.

Table of Contents

  1. Executive Summary
  2. Multi-Provider Search Architecture
  3. Already Built (Pre-Plan)
  4. Phase 0A: Per-Mode Context Scaling
  5. Phase 0B: Context Format Upgrade
  6. Phase 0C: Unified Config Surface
  7. Phase 0D: Config UI (Tuning Panel)
  8. Phase 1A: Observability & Cleanup
  9. Phase 1B: Export / Report Generation
  10. Phase 1C: Search Progress Enhancement
  11. Phase 1D: Search Result Previews
  12. Phase 1E: Extract JS to Static File
  13. Phase 2: Voice Output (TTS)
  14. Phase 3A: Research Agent Architecture
  15. Phase 3B: Progress Streaming
  16. Phase 3C: Report Generation
  17. Phase 3D: UI Integration
  18. Critical Fix: Follow-Up Query Resolution
  19. Phase 4: Visual & Rich Content
  20. Phase 5: Advanced Organization
  21. Phase 6: Specialty Search
  22. Phase 7: Tabular Data & Spreadsheet Intelligence
  23. Phase 8: Context Intelligence
  24. Phase 9: Document & Image Upload System
  25. Adversarial Review Record
Phase 0A: Per-Mode Context Scaling

Replace the single max_exchanges=5 / 600-char truncation with a per-mode strategy. Current usage is only 1.6–8.2% of the 200K context window.

| Mode | max_exchanges | answer_truncation | Token Budget |
|---|---|---|---|
| Quick | 5 | 800 chars | ~1,000 tokens |
| Deep+Summary | 10 | 2,000 chars | ~5,000 tokens |
| Deep+Full | 20 | 4,000 chars | ~10,000 tokens |
| Research | 30 | No truncation | ~15,000 tokens |
  • Refactor get_thread_context() to accept max_exchanges + max_answer_chars params (s204)
  • Route handler passes mode-appropriate limits from cfg (s204)
  • Add input token logging: synthesis + expand_query (s204)
  • Quick mode query unchanged (<600 extra chars, preserves <3s target)
  • Per-search metrics logging -- rolling 20/mode to search_metrics.json (s204)
  • Rolling averages in tuning panel (muted orange, per applicable control) (s204)
  • Graceful limit handling -- amber inline warnings when limits hit (s204)
  • Opus model option + Basic/Advanced tier toggle + descriptions (s204)
  • Mode-colored labels in tuning panel matching inline badge colors (s204)
  • Modified-from-default indicator (green *) on changed values (s204)
  • Per-field tradeoff descriptions with click-to-expand (s204)
  • Averages expanded to cover all 38 config fields (s204)
  • group_context_chars metric added to all pipelines (s204)
Round 2 C-1: Token budget estimates measure conversation context ONLY. Full prompt = system (~200 tok) + user context (~750) + group context (~500-700) + search passages (up to ~10,000) + conversation context. Research synthesis could reach 30,000+ tokens ($0.10-0.50). Token+cost logging must ship before expanding limits.
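A simplified sketch of the per-mode truncation in the table above. The real get_thread_context() takes max_exchanges / max_answer_chars as parameters routed from cfg; the lookup table, field names, and formatting here are illustrative only:

```python
# Per-mode limits from the table above: (max_exchanges, max_answer_chars).
MODE_LIMITS = {
    "quick":        (5, 800),
    "deep_summary": (10, 2000),
    "deep_full":    (20, 4000),
    "research":     (30, None),   # None = no truncation
}

def get_thread_context(exchanges, mode):
    """Render the last N exchanges, truncating answers per mode."""
    max_exchanges, max_answer_chars = MODE_LIMITS[mode]
    lines = []
    for ex in exchanges[-max_exchanges:]:
        answer = ex["answer"]
        if max_answer_chars and len(answer) > max_answer_chars:
            answer = answer[:max_answer_chars] + "…"
        lines.append(f"User: {ex['query']}\nAssistant: {answer}")
    return "\n\n".join(lines)
```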
Phase 0B: Conversation Context Format Upgrade

Upgrade from plain User:/Assistant: to XML-tagged exchanges with mode and citations.

  • XML-tagged exchanges with mode attribute (s204)
  • Include exchange mode tag (quick vs deep calibration) (s204)
  • Include citation URLs in <sources> block, top 5 per exchange (s204)
  • Mode badge on each response (color-coded top + bottom with token counts) (s204)
  • Improved thread title generation (few-shot prompt, answer-rejection guard) (s204)
  • Research mode button placeholder (disabled, Phase 3) (s204)
  • Color-coded mode selector buttons (s204)
  • Markdown renderer: tables, ### headings, --- dividers, tighter spacing (s204)
<exchange n="1" mode="quick">
<query>Best espresso machine under $500?</query>
<answer>The Breville Barista Express...</answer>
<sources><url>https://example.com/review</url></sources>
</exchange>
Phase 0C: Unified Config Surface ("Sliders")

Centralize all tunable limits. Code defaults in git-tracked settings.yaml. Browser-written overrides in .gitignored config/grisearch_tuning.yaml. Runtime merges both, tuning takes precedence. Config snapshot pattern prevents mid-search TOCTOU races.

| Group | Keys | Examples |
|---|---|---|
| Context Limits | 8 | max_exchanges, max_answer_chars per mode |
| Models | 6 | synthesis model per mode, planner, quality gate |
| Token/Cost | 7 | max_tokens per mode, cost ceiling, Brave rate limit |
| Research Agent | 4 | max_rounds, sub_questions, wall time, concurrency |
| Search Providers | 8 | timeouts, max results, max extract pages |
| Auto-Brief | 4 | exchanges/thread, truncation (normal vs research) |
| Thread Health | 1 | size warning threshold (KB) |
  • Add all 38 schema keys to settings.yaml under grisearch: (s204)
  • Create config/grisearch_tuning.yaml (.gitignored) for browser overrides (s204)
  • Update _get_settings(): merge defaults + tuning overrides
  • Remove dead Settings() no-arg call from _get_settings()
  • Build GRISEARCH_CONFIG_SCHEMA (38 keys, 8 groups) as single source of truth
  • Config snapshot: pipelines call _get_settings() once, pass cfg downstream (s204)
  • All cfg.get() fallbacks reference _default() from schema
  • Replace Path(__file__).parent.parent with env var (s204)
  • Hide config keys for unbuilt features until they ship (s204, R3-13)
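The defaults-plus-overrides merge and the snapshot pattern can be sketched in a few lines. The real code reads settings.yaml and config/grisearch_tuning.yaml from disk; the key names and in-memory dicts here are illustrative:

```python
# Git-tracked schema defaults (stand-in for settings.yaml); key names assumed.
SCHEMA_DEFAULTS = {"quick_max_exchanges": 5, "research_cost_ceiling_usd": 0.50}

def merged_settings(defaults, tuning_overrides):
    """Merge code defaults with browser-written overrides; tuning wins.
    Unknown keys in the tuning file are ignored rather than injected."""
    cfg = dict(defaults)
    cfg.update({k: v for k, v in tuning_overrides.items() if k in defaults})
    return cfg

# Config snapshot pattern: take the merged dict ONCE at the start of a
# search and pass it downstream, so a mid-search tuning save cannot race
# the pipeline (the TOCTOU hazard from R3-2).
cfg = merged_settings(SCHEMA_DEFAULTS, {"research_cost_ceiling_usd": 1.00})
```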
Phase 0D: Config UI (Tuning Panel)

In-browser config editor on the GriSearch page. Gear icon opens settings panel. Both API endpoints behind web auth. Cost previews labeled as estimates with tooltip caveat.

| Config Type | Control | Example |
|---|---|---|
| Integer limits | Slider + stepper | 5 [---o-----] 30 |
| Cost ceilings | Stepper ($0.05) | $0.50 [-] [+] |
| Model selection | Dropdown | [claude-haiku-4-5 v] |
| 0 = unlimited | Toggle + stepper | [x] Limit: 4000 |
  • GET /api/search/config returns config + schema metadata (s204)
  • POST /api/search/config validates + merges overrides into tuning YAML (s204)
  • POST /api/search/config/reset clears all overrides (s204)
  • Config schema with type/range validation (s201)
  • Grouped sections, auto-generated from schema (s204)
  • Live cost/token impact preview (deferred — averages in tuning panel serve this need)
  • Instant apply -- no restart needed (s201)
  • "Reset all" button + modified values highlighted green (s204)
  • Tuning panel via hammer icon in header (s204)
  • JS extracted to static/js/search.js (no more {{}} escaping) (s204)
  • PostToolUse hook for rendered JS validation on views/*.py (s204)
  • Pre-restart validation: scripts/validate_views.py (s204)
Phase 1A: Observability & Cleanup
  • Add log.info for model/mode in synthesize() (s204)
  • Per-search cost logging: input_tokens, output_tokens, model, cost (s204)
  • Fix Dict[tuple, Any] type annotation (s204)
  • Fix _REPO_ROOT = Path(__file__).parent.parent (s204)

Thread Health Monitoring

  • Log file size + exchange count on every save_exchange()
  • Primary: exchange count color dot (green <10, yellow 10-20, red >20) (s205)
  • Thread list shows exchange count indicator per thread (s205)
  • MCP get_stack_status includes thread health summary (deferred — operational tooling)

Thread Archival

  • Archival runs inside save_exchange() (atomic, no race conditions)
  • After N exchanges (configurable, default 20), move older to archive
  • load_thread_full() for complete history (deferred — no threads near archive threshold)

Roadmap: Per-exchange storage (solution C) if archival proves insufficient.
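The "archival inside save_exchange()" design above can be sketched as follows. This is an in-memory illustration of the atomicity argument, not the actual persistence code; field names and the archive_after default are assumptions from the plan:

```python
ARCHIVE_AFTER = 20  # configurable default per the plan

def save_exchange(thread, exchange, archive_after=ARCHIVE_AFTER):
    """Append the new exchange, then archive older ones in the same call.
    Doing both in one writer means there is no separate archival job that
    could race a concurrent save."""
    thread.setdefault("exchanges", []).append(exchange)
    thread.setdefault("archive", [])
    excess = len(thread["exchanges"]) - archive_after
    if excess > 0:
        thread["archive"].extend(thread["exchanges"][:excess])
        thread["exchanges"] = thread["exchanges"][excess:]
    return thread
```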

Phase 1B: Export / Report Generation

Markdown Export (MVP)

  • GET /api/search/thread/{id}/export?format=md (s205)
  • Title as H1, exchanges as H2, citations as footnotes (s205)
  • Export button on thread bar + mobile share sheet (s205)
  • File named {title}_{date}.md (s205)
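A hypothetical renderer matching the export layout above (title as H1, each exchange as H2, citations as Markdown footnotes, filename as {title}_{date}.md). Field names on the exchange dicts are assumptions:

```python
from datetime import date

def export_markdown(title, exchanges):
    """Render a thread to Markdown; returns (document, filename)."""
    parts, notes = [f"# {title}"], []
    for ex in exchanges:
        parts.append(f"## {ex['query']}")
        body = ex["answer"]
        for url in ex.get("sources", []):
            notes.append(url)
            body += f" [^{len(notes)}]"  # footnote marker per citation
        parts.append(body)
    parts += [f"[^{n}]: {u}" for n, u in enumerate(notes, 1)]
    filename = f"{title.replace(' ', '_')}_{date.today().isoformat()}.md"
    return "\n\n".join(parts), filename
```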

HTML Export (stretch)

  • Same endpoint with format=html, print-friendly (deferred — Markdown covers the need)

Platform UX: Desktop: browser download. Mobile (Safari): navigator.share() with fallback.

Phase 1C: Search Progress Enhancement
  • During expanding: yield sub-queries as detail line (s205)
  • During reading: yield URLs, show unique domains (s205)
  • During searching: show providers ("Searching Brave + Exa...") (s205)
  • Elapsed time display (running timer, 500ms update) (s205)
Phase 1D: Search Result Previews
  • Preview cards: favicon + title + domain + date + snippet (2-line clamp) (s205)
  • Cards collapse to compact chips on synthesis start (s205)
  • Mobile: 44px min-height, vertical stack (s205)
Phase 1E: Extract JS to Static File

Prerequisite for Phase 2+. views/search.py = 1,113 lines of double-brace-escaped JS in Python template strings.

  • Extract search JS into static/js/search.js (s204)
  • PostToolUse validation hook + validate_views.py (s204)
  • Extract remaining JS from willy.py, pages.py, dashboard.py (deferred — S-14)
  • Migrate createScriptProcessor → AudioWorkletNode (deferred — Phase 2 prerequisite)
Phase 2: Voice Output (TTS)

Complete the voice loop. SSE with base64 audio chunks (proven tunnel-compatible).

2A. TTS Provider

  • Evaluate: Deepgram Aura, ElevenLabs, OpenAI TTS, Cartesia (s208)
  • Criteria: <500ms TTFB, natural voice, <$0.01/search (s208)
  • Build lib/tts.py — Deepgram Aura-2 REST streaming (s208)

2B. Streaming Pipeline

  • SSE audio_start/chunk/done events (base64 MP3) (s208)
  • Web Audio API decode + queue playback (s208)
  • End-to-end test: SSE pipeline streams audio chunks (s208)
  • Tap/click interrupt: toggle, indicator click, new search (s208)
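The SSE event shapes above can be sketched like this. Event names follow the plan (audio_start / audio_chunk / audio_done, including the truncated flag from 2D); the payload field names ("mime", "b64") and chunk size are assumptions:

```python
import base64
import json

def sse(event, data):
    """Format one Server-Sent Event frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def audio_events(mp3_bytes, chunk_size=8192):
    """Yield an SSE stream carrying base64-encoded MP3 chunks."""
    yield sse("audio_start", {"mime": "audio/mpeg"})
    for i in range(0, len(mp3_bytes), chunk_size):
        chunk = mp3_bytes[i:i + chunk_size]
        yield sse("audio_chunk", {"b64": base64.b64encode(chunk).decode()})
    yield sse("audio_done", {"truncated": False})
```

On the browser side, each audio_chunk is base64-decoded and fed to the Web Audio API queue; base64 costs ~33% extra bytes but survives tunnels that mangle binary frames, which is why the plan calls SSE "proven tunnel-compatible" relative to WebSockets.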

2C. Voice Flow

  • "Voice mode" toggle (speaker button, green active state) (s208)
  • Auto-listen after TTS finishes: mic activates after ducking delay, AUTO toggle button (s212)
  • Audio ducking: _gsTTSPlaying gate on mic start + audio processor, configurable delay (500ms default, 800ms for Bluetooth) (s212)
  • Bluetooth auto-detect via enumerateDevices(), extends ducking delay (s212)
  • Voice preferences: localStorage persistence for voice mode, auto-listen, ducking delay (s212)

2D. Smart TTS

  • Strip citations/URLs/markdown before TTS (_strip_for_tts) (s208)
  • Truncate long answers at sentence boundary (4000 char cap) (s208)
  • Mode-aware TTS length: Quick/Deep+S full (4000), Deep+F/Research first ~2 paragraphs (1500) (s212)
  • Table-to-prose conversion (_table_to_prose) for natural reading (s212)
  • Code block stripping, inline code cleanup, Sources: line removal (s212)
  • Dangling preposition cleanup after URL removal (s212)
  • Truncation indicator: audio_done.truncated flag + muted UI notice (s212)
  • Live browser test + voice quality tuning (deferred to tonight)
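An illustrative re-creation of the 2D cleanup steps listed above (citation stripping, URL removal, markdown removal, Sources-line removal, sentence-boundary truncation). The real _strip_for_tts may differ in patterns and ordering:

```python
import re

def strip_for_tts(text, cap=4000):
    """Clean synthesized Markdown into speakable prose."""
    text = re.sub(r"```.*?```", "", text, flags=re.S)     # code blocks
    text = re.sub(r"\[\d+\]", "", text)                   # [1]-style citations
    text = re.sub(r"https?://\S+", "", text)              # bare URLs
    text = re.sub(r"[*_#`]+", "", text)                   # markdown markup
    text = re.sub(r"^Sources:.*$", "", text, flags=re.M)  # Sources: line
    text = re.sub(r"\s+([.,])", r"\1", text).strip()      # dangling punctuation
    if len(text) > cap:                                   # cut at sentence end
        text = text[:cap].rsplit(". ", 1)[0] + "."
    return text
```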
Phase 3A: Research Agent Architecture

Multi-step autonomous research via non-streaming wrappers over existing search functions.

User query → [Planner/Sonnet] → [Quality Gate/Haiku]
  → [Research Loop] → [Synthesizer] → [Structured Report]

Search Wrappers

  • search_and_summarize(): consumes async generator, returns dict (s205)

Cost Control (Round 2 C-2)

  • Shared Brave rate limiter (asyncio.Semaphore)
  • Per-research cost ceiling (default $0.50) (s205)
  • Hard cap: research_max_brave_calls (default 20) (s205)
  • Cost estimate shown before research starts
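The two cost guards above, a shared Brave rate limiter plus a running per-research ceiling, can be sketched together. Class and attribute names are illustrative; the defaults mirror the plan ($0.50 ceiling, 20 Brave calls):

```python
import asyncio

class CostCeiling:
    """Per-research budget: running dollar total plus a hard call cap."""
    _brave_sem = None  # shared Brave rate limiter, created lazily in a loop

    def __init__(self, ceiling_usd=0.50, max_brave_calls=20):
        self.ceiling, self.max_calls = ceiling_usd, max_brave_calls
        self.spent, self.calls = 0.0, 0

    def charge(self, usd):
        self.spent += usd
        if self.spent > self.ceiling:
            raise RuntimeError("research cost ceiling hit — stop and summarize")

    async def brave_call(self, coro):
        if self.calls >= self.max_calls:
            raise RuntimeError("research_max_brave_calls reached")
        self.calls += 1
        if CostCeiling._brave_sem is None:
            CostCeiling._brave_sem = asyncio.Semaphore(1)
        async with CostCeiling._brave_sem:  # serialize Brave requests
            return await coro
```

Because the semaphore is a class attribute, concurrent researches share one Brave queue; the ceiling object is per-research, so one runaway loop cannot drain the whole budget.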

Lifecycle (Round 2 R-4)

  • Cancellation flag via asyncio.Event (s206)
  • Concurrent limit config: research_concurrent_limit (s205)
  • Cancellation on tab close (beforeunload)
  • Persist partial findings to disk

Agent Loop

  • SSE streaming (non-blocking via async generator) (s205)
  • Sonnet planner + Haiku quality gate (s205)
  • Max 5 rounds, 5 min wall time, 3-8 sub-questions (s205)
  • Per-sub-question mode selection (quick vs deep_summary) (s205)
  • Structured scratchpad per sub-question (s205)
Phase 3B: Progress Streaming
  • SSE research_progress event (step, total, sub_question, status) (s205)
  • Vertical timeline with status indicators (pending/spinner/check/fail/skipped) (s206)
  • Running timer + step counter in panel header (s206)
  • "Stop and summarize" button + POST /api/search/research/cancel (s206)
  • "Also consider..." redirect input (mid-research constraint injection)
Phase 3C: Report Generation
  • Sonnet final synthesis from all findings (s205)
  • Structured report: Summary, Findings, Open Questions, Sources (s205)
  • Saved as thread with mode: "research" (s205)
  • Auto-export to group directory
  • Brief weighting config: 4 exchanges/thread, 800-char truncation (s205)

Roadmap: Separate brief section (C) after evaluating real output.

Phase 3D: Research UI Integration
  • Fourth mode pill: "Research" (green, #10b981) (s205)
  • First-use explainer via localStorage (s206)
  • Full-width report card (.gs-report, 95% width, green border) (s206)
  • "Dig deeper" button on subsection headings (switches to Deep+Full) (s206)
Critical Fix: Follow-Up Query Resolution

Discovered: s210 (2026-04-07). Follow-up queries in threads produce garbage search results because query expansion has no access to conversation history. Every major AI platform rewrites follow-ups before searching — GriSearch did not. Effort: 2–3 sessions. Pressure tested: 2 rounds, 28 findings, all resolved.

CF.0 Discovery & Analysis (s210)

  • Identified bug: Exchange 5 in Iran war thread returned FedEx/K-pop results for "Deliver updates since the last review"
  • Root cause analysis: 7 blind spots across 4 search modes where conversation_context is in scope but not forwarded to query expansion
  • Mapped full query flow: expand_query(), _research_plan(), get_thread_context(), all 4 search modes, SSE endpoint
  • Confirmed synthesis receives context (answer referenced prior briefing) but search queries were decontextualized

CF.1 Industry Research (s210)

  • Researched 7 platforms: ChatGPT, Claude, Perplexity, Gemini, Grok, Copilot, OpenAI Deep Research
  • Reviewed academic SOTA: CHIQ history enhancement, conversational query reformulation, RAG multi-turn patterns
  • Key finding: every platform rewrites follow-ups; only Perplexity Pro and Copilot show the rewrite to users
  • Key finding: Google/Elastic use original + rewritten in fan-out (never fully replace)
  • Key finding: CHIQ topic switch detection is academic SOTA for preventing stale context pollution
  • Identified 4 gaps plan must address: query fan-out, topic switch, rolling summarization, error accumulation

CF.2 Plan Design & Adversarial Testing (s210)

  • Designed 5-phase fix: query resolution, context summarization, SSE events + frontend, research mode, cost tracking
  • 16-issue walkthrough with RG — each issue presented, discussed, agreed or modified
  • Round 4 adversarial: 15 findings (1 critical, 4 high, 6 medium, 4 low)
  • Round 5 adversarial: 13 findings (1 critical, 4 high, 5 medium, 3 low)
  • All 28 findings resolved and incorporated into final plan

CF.3 Query Resolution (backend)

  • New config keys: synthesis_model_resolution (Sonnet default), enable_follow_up_resolution, resolution exchange/char limits
  • Add skip_resolution param to all 3 search functions + search_and_summarize()
  • New resolve_follow_up(): structured JSON return with topic_switch detection, robust 5-step JSON parsing fallback chain
  • Resolution-specific context truncation (5 exchanges, 800 chars — independent of per-mode synthesis limits)
  • Wire into search_deep_full using search_query variable pattern (original bug trigger)
  • Wire into search_quick and search_deep_summary
  • Search fan-out: [original, resolved] + expand(resolved) — original as safety net per Google/Elastic pattern
  • Pass resolved query to synthesize() (not raw original)
  • Similarity check: Jaccard of lowercased word sets, suppress display if >0.85
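The similarity gate in the last bullet is simple enough to show directly: Jaccard overlap of lowercased word sets, suppressing the "INTERPRETED AS" display when the resolved query is nearly identical to what the user typed. Function names are illustrative:

```python
def jaccard(a, b):
    """Jaccard similarity of two strings' lowercased word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def should_show_resolution(original, resolved, threshold=0.85):
    """Show the green block only when resolution meaningfully changed the query."""
    return jaccard(original, resolved) <= threshold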

CF.4 Context Summarization (summary-beyond-window)

  • New ensure_context_summary() async function (keeps get_thread_context() synchronous)
  • Summary cache in thread JSON, keyed by max_exchanges, hash-based invalidation (handles archival, deletion, mode switches)
  • Config schema caps: Quick max=5, Deep+S max=10, Deep+F and Research uncapped
  • All modes: older exchanges summarized beyond window (never dropped)
  • get_thread_context() reads cached summary, prepends <context_summary> block
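The hash-keyed cache invalidation above can be sketched as follows: hash the exchanges that fall beyond the window, so archival, deletion, or a mode switch (different max_exchanges) naturally forces a re-summarize. Field names and the hash input are illustrative:

```python
import hashlib
import json

def _window_hash(exchanges, max_exchanges):
    """Fingerprint the exchanges older than the context window."""
    older = exchanges[:-max_exchanges]
    blob = json.dumps([e["query"] for e in older], sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()[:16]

def cached_summary(thread, max_exchanges, summarize):
    """Return the cached summary for this window size, recomputing on miss."""
    exchanges = thread["exchanges"]
    if len(exchanges) <= max_exchanges:
        return ""  # everything fits in the window; nothing to summarize
    key = str(max_exchanges)  # cache keyed by window size (mode switches)
    h = _window_hash(exchanges, max_exchanges)
    cache = thread.setdefault("summary_cache", {})
    entry = cache.get(key)
    if entry and entry["hash"] == h:
        return entry["summary"]  # cache hit: beyond-window set unchanged
    summary = summarize(exchanges[:-max_exchanges])
    cache[key] = {"hash": h, "summary": summary}
    return summary
```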

CF.5 SSE Events + Frontend

  • New query_resolved SSE event + emerald green .gs-msg-resolved block ("INTERPRETED AS")
  • New topic_switch_detected SSE event + amber .gs-msg-switch prompt
  • Topic switch UX: "Continue in this thread" / "Start new search" buttons
  • force_continue data flow: _gsForceResume flag → POST body → skip_resolution
  • Clean fetch abort on topic switch (prevent SSE race condition)
  • save_exchange(): persist resolved_query field (omit when empty)
  • Thread history: render green block for stored resolved_query with backward compat guard

CF.6 Research Mode + Metrics

  • Pass search_query to _research_plan() directly (no conversation_context needed — resolved query is self-contained)
  • Sub-loop: skip_resolution=True via search_and_summarize()
  • Final synthesis: compact preamble ("follow-up, resolved to: ..."), not full history
  • Resolution cost tracking: resolution_input_tokens, resolution_output_tokens, resolution_cost_usd in metrics + answer meta line

CF.7 Verification

  • Live test: Iran war thread follow-up ("Deliver updates since the last review")
  • Live test: topic switch detection ("What's the weather in San Diego" in Iran thread)
  • Live test: similar query suppression (no green block for already-specific queries)
  • Live test: force_continue flow
  • Live test: summary caching across modes

Roadmap (not in this build): Embedding-based topic switch detection (cosine similarity). Error accumulation monitoring (resolution quality tracking over long threads).

Full spec: ~/.claude/plans/grisearch-follow-up-context-fix.md

Phase 4: Visual & Rich Content

4A. Image Search

  • Brave Image Search API: _search_brave_images() + ImageResult model (s212)
  • Parallel image search in Deep+Summary and Deep+Full pipelines (s212)
  • SSE image_results event with thumbnail grid (3-col desktop, 2-col mobile) (s212)
  • Click-to-expand: full image overlay + source link + dimensions (s212)
  • Collapsible IMAGES header (s212)
  • Image upload for reverse search → moved to Phase 9 (Document & Image Upload System)
  • Storage lifecycle → moved to Phase 9 (two-tier: ephemeral 7-day + persistent indefinite)

4B. Rich Result Cards

  • Query-type detection: weather, quick fact, comparison, timeline via regex patterns (s212)
  • Synthesis format hints: type-specific prompt suffix guides structured output (s212)
  • Comparison: "X vs Y" triggers table-formatted synthesis (existing table renderer handles it) (s212)
  • Timeline: "history of" triggers chronological list; vertical timeline CSS renderer with date dots (s212)
  • Quick facts: bold answer lead, compact format hint (s212)
  • Weather: structured conditions + forecast hint (s212)
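An illustrative version of the regex query-type detection above; the actual patterns (and any priority ordering) in GriSearch may differ:

```python
import re

# Checked in order; first match wins. Patterns are examples, not the real set.
QUERY_TYPES = [
    ("weather",    re.compile(r"\b(weather|forecast|temperature)\b", re.I)),
    ("comparison", re.compile(r"\b(vs\.?|versus|compared? to)\b", re.I)),
    ("timeline",   re.compile(r"\b(history of|timeline of|evolution of)\b", re.I)),
    ("quick_fact", re.compile(r"^(who|what|when|where|how (tall|old|far|many))\b", re.I)),
]

def detect_query_type(query):
    """Return a format-hint label for the synthesis prompt, or None."""
    for name, pattern in QUERY_TYPES:
        if pattern.search(query):
            return name
    return None
```

The detected label only selects a prompt suffix (e.g. "format as a comparison table"); synthesis and the existing renderers do the rest, which is why 4B needed no new backend.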

4C. Location-Aware

  • search_location + search_country_code config keys in tuning panel (str type) (s212)
  • "Near me" / "nearby" / "local" detection via regex, replaces with configured location (s212)
  • Brave country param wired to all 3 search pipelines (quick, deep_summary, deep_full) (s212)
  • Location badge in header (muted, auto-hidden when empty) (s212)
  • Badge updates on tuning save (s212)
  • Dynamic browser geolocation: "Use Location" toggle in mode row, detects on new thread, Nominatim reverse geocode, badge updates (s212)
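The "near me" rewrite above is a one-regex substitution; this sketch assumes the configured search_location is a plain string and leaves the query untouched when no location is set:

```python
import re

LOCAL_RE = re.compile(r"\b(near me|nearby|local)\b", re.I)

def localize_query(query, location):
    """Replace local-intent phrases with the configured location string."""
    if location and LOCAL_RE.search(query):
        return LOCAL_RE.sub(f"in {location}", query)
    return query
```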
Phase 5: Advanced Organization

5A. Cross-Thread Search

  • GET /api/search/corpus with scoring
  • Lazy-load index, JSON persist, survives restarts
  • "Search within this group" filter

5B. Research Notebook

  • Pin answers + named collections
  • Collection export + auto-suggest pins

5C. Drag-and-Drop Thread Organization

  • Desktop: HTML5 DnD with drag handles + drop targets on group headers
  • Drop targets: group headers glow on dragover, "Recent" = unassign
  • Thread reorder within groups (thread_order in groups.json)
  • Group reorder (group_order in groups.json)
  • Backend: POST /api/search/groups/{id}/reorder + POST /api/search/groups/reorder
  • Mobile: keep context menu flow (no DnD) — evaluate touch DnD as follow-on

Reuses existing api_search_group_assign for moves — no new backend for basic group assignment. Sort order APIs are new. Desktop-only initially; mobile DnD (long-press or polyfill) evaluated after desktop ships.

5D. Branched Conversations

  • "Branch here" on any exchange
  • Branch metadata + group inheritance
  • Branch does NOT bump brief counter
Phase 6: Specialty Search

6A. Product Research

  • Product query detection + review site enrichment
  • Comparison synthesis (pros/cons/price/verdict)

6B. Academic/Technical

  • Semantic Scholar API + citation scoring

6C. News Mode

  • Brave News API + recency-first sorting
  • Timeline rendering + "Follow this topic"

6D. Recurring Search

  • "Watch this" (max 5, Quick only, cost estimate)
  • 6-hour re-run + URL dedup + unread badges
Phase 7: Tabular Data & Spreadsheet Intelligence

Theme: Accept, analyze, transform, and export structured data across all modes. Dedicated Data mode for analysis-heavy workflows. Effort: 3–5 sessions.

7A. Planning & Requirements

  • Competitive analysis (ChatGPT, Gemini, Copilot tabular UX)
  • Catalog RG's actual tabular workflows from ChatGPT history
  • Formula scope ranking by usage
  • Adversarial review of spec

7B. Tabular Input (all modes)

  • Paste detection (TSV/CSV) with table preview
  • Context injection as fenced CSV block
  • File upload: CSV (client-side) + Excel (openpyxl)
  • Size limits (~5K rows / 500KB)
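Paste detection can lean on the stdlib csv.Sniffer, restricted to tab and comma since those are what spreadsheet paste produces. A sketch (the consistency checks and thresholds are assumptions):

```python
import csv
import io

def detect_table(pasted, min_rows=2):
    """Return parsed rows if the pasted text looks like TSV/CSV, else None."""
    try:
        dialect = csv.Sniffer().sniff(pasted, delimiters="\t,")
    except csv.Error:
        return None  # no consistent delimiter found: treat as plain text
    rows = list(csv.reader(io.StringIO(pasted), dialect))
    # Require a consistent multi-column shape before showing a table preview.
    if len(rows) < min_rows or len(rows[0]) < 2:
        return None
    if any(len(r) != len(rows[0]) for r in rows):
        return None
    return rows
```

On a positive detection the UI would show the table preview and inject the data as a fenced CSV block; a None result falls through to the normal query path.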

7C. Tabular Output & Export (all modes)

  • CSV download button on each rendered table
  • Copy table as TSV to clipboard
  • Excel export (.xlsx via openpyxl)
  • Multi-table support + "Download all"

7D. Formula Generation

  • Synthesis prompt for formula requests (Excel vs Sheets toggle)
  • Monospace code blocks with copy button + explanation
  • All major categories: lookup, conditional, financial, array, text, date
  • Optional formula validation (verify output)

7E. Data Mode (dedicated)

  • "Data" mode pill — no web search, direct Claude analysis
  • Specialized analysis prompt (stats, insights, suggest visualizations)
  • Multi-turn analysis with table context carried in thread
  • Computed columns: generates formula AND fills values

7F. Future (not building yet)

Chart generation, Google Sheets integration, SQL-like queries, pivot table builder, data persistence across sessions.

Phase 8: Context Intelligence (Active + Passive Learning)

Theme: Make GriSearch progressively smarter about user preferences and research patterns. Active interviews + passive extraction + enhanced auto-briefs. Effort: 3–5 sessions.

8A. Planning & Requirements

  • Audit current context injection chain (user, project, thread, conversation)
  • Catalog preference types (source, format, domain, constraint, fact)
  • Review ChatGPT memory system (learn from their mistakes)
  • Adversarial review of spec

8B. Passive Preference Extraction (all modes)

  • Post-synthesis Haiku extraction: 0-3 new preferences per exchange
  • Category tagging: format, source, domain, constraint, fact
  • Dedup + merge against existing search_preferences.md
  • Staleness handling: timestamp entries, replace contradictions
  • Transparency: extracted prefs visible/editable in Preferences panel
  • Kill switch in tuning panel (on for Deep modes, off for Quick)

8C. Active Context Interview (triggered)

  • Trigger: button in Project Notes + thread context menu + proactive suggestion
  • 3-phase flow: confirm existing → expand with probes → identify gaps
  • Questions displayed inline (conversation area, not modal)
  • Output: updated notes + extracted preferences + user review
  • Persist interview state for resume across sessions
  • Re-interview suggestion after 10+ new exchanges

8D. Thread-Level Context

  • Per-thread notes field (editable via context menu)
  • Thread auto-brief: full trajectory summary (not just recent)
  • Thread context injected into synthesis alongside project context

8E. Enhanced Auto-Brief

  • Dual-output: findings + preferences + open questions
  • Cross-project pattern extraction to global search_preferences.md

8F. Future (not building yet)

Preference confidence scoring, conflict detection, onboarding interview, preference analytics dashboard.

Phase 9: Document & Image Upload System

Theme: Persistent, organized, searchable uploads that survive across sessions and threads. The #1 pain point with ChatGPT is upload amnesia — documents tied to a single conversation and forgotten next session. Effort: 3–4 sessions. Dependencies: Phase 4A (image search UI), Phase 5A (cross-thread search).

9A. Storage Architecture

  • Two-tier retention: ephemeral (7-day auto-clean) + persistent (indefinite, user-managed)
  • Metadata sidecar JSON (filename, tags, extracted text path, thread associations)
  • Per-user quotas: 10MB/file, 500MB/user persistent. Configurable per deployment.
  • Multi-tenant: per-user isolation + shared team library (Data/system/search/shared_uploads/)
  • Shared library: publish from personal, read-only refs, content-hash dedup, configurable quota
  • Admin storage dashboard: GET /api/search/admin/storage (per-user usage summary)

9B. Text Extraction & Indexing

  • PDF text extraction (PyMuPDF)
  • Image OCR/description (Claude vision API)
  • Extracted text stored alongside uploads, indexed for cross-thread search
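The metadata-sidecar layout from 9A/9B can be sketched with the stdlib alone (extraction itself would come from PyMuPDF or Claude vision). Directory names, the content-hash dedup key, and the metadata fields are assumptions, not the final schema:

```python
import hashlib
import json
from pathlib import Path

def store_upload(root: Path, filename: str, data: bytes, extracted_text: str,
                 tier: str = "ephemeral"):
    """Write an upload plus its extracted text and metadata sidecar.
    Content hash in the path gives dedup for free (same bytes, same dir)."""
    digest = hashlib.sha256(data).hexdigest()[:16]
    base = root / tier / digest
    base.mkdir(parents=True, exist_ok=True)
    (base / filename).write_bytes(data)
    (base / "extracted.txt").write_text(extracted_text)
    meta = {"filename": filename, "tier": tier, "bytes": len(data),
            "tags": [], "threads": [],
            "text_path": str(base / "extracted.txt")}
    (base / "meta.json").write_text(json.dumps(meta, indent=2))
    return base
```

Promotion from ephemeral to persistent is then just a directory move plus a meta.json update, and the extracted.txt files are what the Phase 5A cross-thread index would crawl.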

9C. Upload UI

  • Upload button (paperclip icon in input row) + drag-and-drop
  • Post-upload choice: "Use for this search" vs "Save to library"
  • Inline preview: PDF first page + page count, image thumbnail

9D. Document Library

  • Slide-out library panel (same pattern as history/preferences)
  • Search within library by filename, extracted text, tags
  • "Reference this" button — injects document context into next search
  • Auto-tag on upload via Haiku

9E. Cross-Session Reference

  • @-mention documents: @suntsu-contract what are the termination clauses?
  • Auto-detect document references in queries, inject extracted text as <document_context>
  • Thread association tracking (usage history in library)

9F. Image Upload for Search

  • Reverse image search: upload → Claude vision describes → description enriches search
  • Mobile camera capture (accept="image/*" capture="environment")
  • Ephemeral by default, "Save to library" promotes to persistent

9G. Cleanup & Maintenance

  • Spine startup task: auto-clean ephemeral files older than 7 days
  • Storage quota enforcement on upload
  • Orphan cleanup (extracted text without matching upload)

Open questions: Document versioning (replace vs coexist), large document chunking (semantic chunk selection for 50+ page contracts), shared library moderation at 30 users, extraction cost at scale (local OCR fallback vs Claude vision), cross-deployment portability.

Adversarial Review Record

Round 1 (s200) — 15 findings

Initial confidence: MEDIUM. All addressed.

| ID | Severity | Finding | Resolution |
|---|---|---|---|
| C-1 | Critical | 1A is a non-issue | Reduced to logging + cleanup |
| C-2 | Critical | Can't reuse search generators | Non-streaming wrappers in 3A |
| C-3 | Critical | WebSocket TTS won't work through tunnel | Switched to SSE-first |
| R-1 | Risk | Research blocks uvicorn worker | Background create_task |
| R-2 | Risk | Inline JS at breaking point | Added 1E: JS extraction |
| R-3 | Risk | Recurring search unbounded cost | Cap 5 watches, Quick only |
| R-4 | Risk | Image upload lifecycle missing | Path, retention, max size defined |
| R-5 | Risk | Haiku planner poor quality | Sonnet + quality gate |
| G-1:4 | Gap | API degradation, Dict, index, ducking | All addressed in respective phases |
| Q-1:3 | Question | Phase 6 order, export UX, build order | All resolved |

Post-Round-1 confidence: HIGH

Round 2 (s200/s201) — 12 findings

All addressed in s201 review with RG.

| ID | Severity | Finding | Resolution |
|---|---|---|---|
| C-1 | Critical | Token budget undercounts | Full budget + cost logging first |
| C-2 | Critical | No research cost cap | Semaphore, ceiling, confirmation |
| R-1 | Risk | createScriptProcessor deprecated | Migrate in 1E |
| R-2 | Risk | SSE audio may buffer | 200-500ms chunks, tunnel test |
| R-3 | Risk | Thread files unbounded | Monitoring (A) + archival (B), C roadmapped |
| R-4 | Risk | Research no lifecycle | Registry, cancel, persist partial |
| G-1:4 | Gap | BT latency, model deprecation, JS risk, Path fix | All addressed |
| Q-1:2 | Question | Brief weighting, branch counter | A+B weighting, skip counter on branch |

Post-Round-2 confidence: HIGH

Round 3 (s201) — 14 findings

Post-0C/0D additions. All addressed in s201.

| ID | Severity | Finding | Resolution |
|---|---|---|---|
| R3-1 | Critical | settings.yaml git-tracked; browser writes = merge conflicts | Separate .gitignored tuning file |
| R3-2 | Critical | No config caching; TOCTOU race mid-search | Config snapshot pattern per pipeline |
| R3-3 | Risk | _get_settings() dead broken code | Remove dead Settings() call |
| R3-4 | Risk | Archival race with save_exchange() | Archival inside save (atomic) |
| R3-5 | Risk | Defaults scattered across code + schema | Schema dict as single source of truth |
| R3-6 | Risk | Cost preview impossible to compute accurately | Label as estimates with caveat tooltip |
| R3-7 | Gap | Config API needs auth | Behind web auth middleware |
| R3-8 | Gap | File KB poor proxy for context usage | Primary metric: exchange count |
| R3-9 | Gap | Cost ceiling slider unbounded | Schema max=$5.00 |
| R3-10 | Gap | A+B brief weighting undefined | Defined inline (4 exch, 800 char) |
| R3-11 | Gap | AudioWorklet migration underscoped | Worklet file + MIME + extra time noted |
| R3-12 | Question | Config change hits in-flight search | Covered by R3-2 snapshot |
| R3-13 | Question | Model keys for unbuilt features confusing | Hide until feature ships |
| R3-14 | Question | Effort unchanged after Phase 0 doubled | Revised: 19-27 sessions total |

Post-Round-3 confidence: HIGH
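The R3-2 snapshot pattern (which also closes R3-12) means each pipeline copies the mutable settings once at start, so a config write landing mid-search cannot change behavior halfway through. A minimal sketch, assuming a dict-backed settings store; the names are hypothetical:

```python
import copy
from types import MappingProxyType

# Hypothetical sketch of the R3-2 fix: an immutable per-pipeline snapshot
# of live settings eliminates the TOCTOU race with concurrent config writes.
LIVE_SETTINGS = {"providers": ["brave", "exa", "parallel"], "quick_timeout_s": 3}

def snapshot_settings() -> MappingProxyType:
    # deep-copy, then wrap read-only: later writes to LIVE_SETTINGS
    # (and accidental in-pipeline mutation) are both ruled out
    return MappingProxyType(copy.deepcopy(LIVE_SETTINGS))

def run_pipeline() -> list[str]:
    cfg = snapshot_settings()
    LIVE_SETTINGS["providers"] = ["brave"]  # simulated mid-search config write
    return list(cfg["providers"])           # snapshot still sees all three

providers_seen = run_pipeline()
```

The next search picks up the new settings by taking a fresh snapshot, so config changes apply cleanly at pipeline boundaries.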

Round 4 (s210) — Critical Fix, Pass 1 — 15 findings

Adversarial review of the follow-up query resolution plan. All addressed.

| ID | Severity | Finding | Resolution |
| --- | --- | --- | --- |
| R4-C1 | Critical | Haiku too weak for query resolution (Quick/Deep+S default) | Dedicated synthesis_model_resolution config, default Sonnet |
| R4-C2 | Critical | Raw query to synthesis creates semantic mismatch with resolved search results | Pass resolved query to synthesize() |
| R4-H1 | High | Token explosion in Research resolution (30 exchanges, unlimited chars = 22K tokens) | Resolution-specific truncation: 5 exchanges, 800 chars |
| R4-H2 | High | Research sub-loop accidentally triggers resolution on sub-questions | skip_resolution=True flag on sub-loop calls |
| R4-H3 | High | No handling of unrelated topics in existing threads | Topic switch prompt + user choice UX (continue/new thread) |
| R4-M1 | Medium | No suppression of "INTERPRETED AS" for similar queries | Jaccard similarity >0.85 suppresses display |
| R4-M2 | Medium | expand_query() doesn't need full conversation_context | No changes to expand_query — resolved query is sufficient |
| R4-M3 | Medium | Research final synthesis doesn't need full history | Compact preamble only |
| R4-M4 | Medium | Quick mode latency concern (~500-800ms) | Accept: correct results > fast garbage |
| R4-M5 | Medium | Need resolved_query capture pattern in pages.py | Initialize alongside collectors, use or None |
| R4-L1 | Low | Config toggle needed | enable_follow_up_resolution boolean |
| R4-L2 | Low | Resolution cost not tracked in metrics | Added to metrics dict + answer meta line |
| R4-L3 | Low | Old threads missing resolved_query field | Simple if guard in history rendering |
| R4-L4 | Low | save_exchange signature underspecified | resolved_query: str = "", omit when empty |
| R4-M6 | Medium | Rolling summarization needed for long threads | Summary-beyond-window for all modes with per-mode hard caps |

Post-Round-4 confidence: HIGH
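The R4-M1 suppression check, as pinned down later in Round 5 (R5-M3: Jaccard similarity of lowercased word sets, threshold 0.85), can be sketched as below. The function names are illustrative, not the codebase's:

```python
# Sketch of the R4-M1/R5-M3 check: when the resolved query is nearly
# identical to the raw query, hide the "INTERPRETED AS" banner.
def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0  # two empty queries count as identical
    return len(wa & wb) / len(wa | wb)

def suppress_interpreted_as(raw: str, resolved: str, threshold: float = 0.85) -> bool:
    return jaccard(raw, resolved) > threshold

# a genuinely rewritten follow-up keeps the banner visible
shown = not suppress_interpreted_as("how big is it", "how big is the Brave search index")
# a case-only difference scores 1.0 and is suppressed
identical = suppress_interpreted_as("Brave index size", "brave index size")
```

Word-set Jaccard is cheap and order-insensitive, which fits a display-only heuristic where a false positive merely hides a redundant banner.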

Round 5 (s210) — Critical Fix, Pass 2 — 13 findings

Second adversarial pass after incorporating Round 4 fixes. Found subtle interaction effects. All addressed.

| ID | Severity | Finding | Resolution |
| --- | --- | --- | --- |
| R5-C1 | Critical | get_thread_context() is sync; adding async LLM call inside crashes | Split: ensure_context_summary() async + get_thread_context() stays sync |
| R5-H1 | High | LLM JSON output parsing has no fallback for malformed responses | 5-step fallback: strip fences, json.loads, regex extract, validate keys, default |
| R5-H2 | High | _gsGroupId doesn't exist in JS frontend | Just clear _gsThreadId; group derived server-side from thread |
| R5-H3 | High | Summary cache breaks when exchanges archived | Hash-based invalidation (overflow query strings + timestamps) |
| R5-H4 | High | force_continue has no frontend-to-backend plumbing | Full data flow: _gsForceResume → POST body → skip_resolution |
| R5-M1 | Medium | Topic switch resubmit SSE abort race condition | Explicit abort + null controller before re-enabling UI |
| R5-M2 | Medium | search_and_summarize() doesn't accept skip_resolution | Add param, forward to pipeline call |
| R5-M3 | Medium | Similarity check definition vague | Jaccard of lowercased word sets, threshold 0.85 |
| R5-M4 | Medium | Research planner gets full context it doesn't need | Pass search_query directly, no conversation_context param |
| R5-M5 | Medium | Dual cache split: resolution (5) and synthesis (20) different overflow | Cache keyed by max_exchanges |
| R5-M6 | Medium | Must use search_query variable after resolution in all calls | Explicit variable pattern documented in plan |
| R5-L1 | Low | Build step 4 could crash if research tested before step 8 | Add skip_resolution param to all functions in step 1 |
| R5-L2 | Low | Concurrent thread access race on save + summary | Accepted: _gsSearching UI guard prevents in normal use |

Post-Round-5 confidence: HIGH
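R5-H1's five-step fallback for parsing LLM JSON output (strip fences, json.loads, regex-extract, validate keys, default) might look like the sketch below. The key names (`resolved_query`, `topic_switch`) are assumptions about the resolution payload, not the actual schema:

```python
import json
import re

# Sketch of the R5-H1 fallback chain: 1) strip markdown code fences,
# 2) try json.loads, 3) regex-extract the first {...} object and retry,
# 4) validate expected keys, 5) fall back to a safe default.
DEFAULT = {"resolved_query": "", "topic_switch": False}
REQUIRED_KEYS = set(DEFAULT)

def parse_resolution(raw: str) -> dict:
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())  # step 1
    candidates = [text]
    match = re.search(r"\{.*\}", text, re.DOTALL)                # step 3 candidate
    if match:
        candidates.append(match.group(0))
    for candidate in candidates:
        try:
            obj = json.loads(candidate)                          # steps 2-3
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and REQUIRED_KEYS <= set(obj):  # step 4
            return obj
    return dict(DEFAULT)                                         # step 5

ok = parse_resolution('```json\n{"resolved_query": "brave index size", "topic_switch": false}\n```')
bad = parse_resolution("Sure! Here is the JSON you asked for.")
```

Returning a copy of the default rather than raising keeps the pipeline on the raw-query path when the model misbehaves, which is the safe degradation for a resolution step.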