GriSearch Feature Expansion

Competitive gap analysis → 7-phase build plan + critical fix · 5x adversarial-tested

Created: s200 (2026-04-06)
Phases: 0–10 + Critical Fix
Est. Effort: 21–30 sessions
Pressure Tests: 5 rounds, 69 findings

Executive Summary

Build Progress: Phases 0–3, 5, CF complete — Phase 6: News + Product + Academic done. Phase 7: 7A–7D complete (s218–s219), 7E dropped. Remaining: 6D (watches), 8–10.

| Phase | Theme | Effort | Status |
| --- | --- | --- | --- |
| 0 | Context + Config Surface | 2–3 sessions | Complete (s204) |
| 1 | Polish & Quick Wins + JS Extract | 2–3 sessions | Complete (s205) |
| 2 | Voice Output (TTS) | 2–3 sessions | 2A–2E built (s208–s218) |
| 3 | Agentic Deep Research | 4–6 sessions | Complete (s205–s206) |
| CF | Follow-Up Query Resolution (Critical Fix) | 2–3 sessions | Complete (s210–s211) |
| 4 | Visual & Rich Content | 2–3 sessions | 4B+4C done, 4A in progress (s212) |
| 5 | Advanced Organization | 3–5 sessions | Complete (s213) |
| 6 | Specialty Search | 1–2 each | 6A–6C done (s214–s218), 6D remaining |
| 7 | Tabular Data & Spreadsheets | 3–5 sessions | 7A–7D complete (s218–s219), 7E dropped |
| 8 | Context Intelligence | 3–5 sessions | Not Started |
| 9 | Document & Image Upload | 3–4 sessions | Not Started |
| 10 | Auto-Mode Classification | TBD | Not Started |

Build order: Phase 0 → 1 → 3 → 2 → CF (Critical Fix) → 4 → 5 → 6. CF jumps the queue — follow-up queries are fundamentally broken without query resolution.

Design Principles

1. Don't chase parity for parity's sake. Only build features that serve the actual research workflow.
2. Preserve the speed advantage. Quick mode must stay under 3s.
3. Build on what's unique. Group context, personalization, and multi-provider diversity are moats.
4. Incremental value delivery. Every phase ships something usable.

Multi-Provider Search Architecture

GriSearch sends every query to three independent search providers simultaneously, merges and deduplicates results, then re-ranks the unified pool. This is the same multi-retrieval pattern used by Perplexity, Google AI Mode, and ChatGPT search.

| Provider | Strength | Index | Latency |
| --- | --- | --- | --- |
| Brave | Fastest latency, strong keyword precision, independent 30B+ page index | Own | ~670ms |
| Exa | Semantic understanding, spam filtering, high-signal authoritative content | Own | ~2s |
| Parallel | Strong accuracy-to-cost ratio, independent ranking perspective | Own | ~5–14s |

Pipeline

User Query
   │ async fan-out
   ├── Brave    (~670ms · keyword-strong)
   ├── Exa      (~2s · semantic search)
   └── Parallel (~5–14s · independent rank)
   ↓
Merge + Dedup → Re-Rank → Ranked Results → LLM Synthesis → Answer + Citations
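The fan-out, timeout, and dedup steps can be sketched with asyncio. This is an illustrative stand-in, not the actual GriSearch code: the provider stubs, result shape, and timeout value are assumptions.

```python
import asyncio

# Hypothetical per-provider stubs; real calls hit the Brave/Exa/Parallel APIs.
async def _brave(q):
    return [{"url": "https://a.com", "title": "A"}]

async def _exa(q):
    return [{"url": "https://a.com", "title": "A"},
            {"url": "https://b.com", "title": "B"}]

async def _parallel(q):
    await asyncio.sleep(0.5)  # simulate the slow provider
    return [{"url": "https://c.com", "title": "C"}]

async def fan_out(query, timeout=0.1):
    """Fan out to all providers; keep whatever returns within the timeout."""
    tasks = [asyncio.create_task(p(query)) for p in (_brave, _exa, _parallel)]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for t in pending:
        t.cancel()  # the slow provider is dropped, not awaited
    merged, seen = [], set()
    for t in done:
        if t.exception():
            continue  # a failed provider degrades the search, never breaks it
        for r in t.result():
            if r["url"] not in seen:  # dedup by URL before re-ranking
                seen.add(r["url"])
                merged.append(r)
    return merged

results = asyncio.run(fan_out("espresso machines"))
```

The timeout bound is what makes the pipeline "as fast as the fastest provider": completed results are merged immediately and stragglers are cancelled.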

Why Three Providers?

| Benefit | Mechanism |
| --- | --- |
| Better recall | Three indexes catch what one misses |
| Better precision | Cross-provider agreement filters noise |
| Resilience | If one API goes down, the other two still work |
| Speed | Async fan-out = as fast as the fastest provider (with timeouts) |
| No vendor lock-in | Can swap providers without rewriting the system |
| Quality signal | Dedup overlap acts as an implicit relevance vote |

Benchmark data (2025–2026) shows the top four search APIs are statistically indistinguishable on quality when used individually. The winning strategy is to combine multiple providers and let the ensemble outperform any single one.

Already Built (Pre-Plan)

GriSearch core was built across s197–s200 before this expansion plan was created.

s201: Plan page, /plans index, inbox replay fix, config schema (38 keys), settings cleanup, thread archival + health logging, defaults unified.
s204: Phase 0 complete (0A-0D). Per-mode context scaling, XML-tagged exchanges, full config surface + tuning panel, metrics logging + rolling averages, limit warnings, mode badges, Opus option, table rendering, JS extraction to static file, validation hooks.
s205: Phase 1 complete (1A-1D). Phase 3A shipped. Progress indicators, export, result previews, thread context menu, project creation. Research mode: Sonnet planner, Haiku quality gate, multi-step loop, cost controls, Sonnet synthesis.
s206: Phase 3B-3D shipped. GraySearch → GriSearch rebrand. Research timeline table, stop & summarize, structured report cards, "Dig deeper" buttons, first-use explainer. Persistent research data. Cloudflare Pages deploy. Citation table UI.
s208: Citation numbering fix. Collapsible sources list. Spine crash root cause fixed (importlib.reload memory leak). Health logging added.
s213: Phase 5 complete (all 4 sub-phases in 1 session). 5A: cross-thread corpus search. 5B: research notebook (pins, collections, export, 9 endpoints). 5C: desktop DnD (thread moves, reorder persistence). 5D: branched conversations. Also: research retry/rephrase/escalation, synthesis failure caching, "Retry Search" button, XSS fix, notebook info panel, "New Search" pill.
s214: Phase 6C News Mode + Phase 6A Product Research complete. Brave News API with freshness params, news query detection + auto-suggest, _NEWS_SYSTEM prompt. Product mode with review site boosting (11 domains), enriched query expansion, _PRODUCT_SYSTEM Sonnet synthesis with comparison tables. 15 new config schema keys. Two new mode pills (orange News, teal Product).

Table of Contents

  1. Executive Summary
  2. Multi-Provider Search Architecture
  3. Already Built (Pre-Plan)
  4. Phase 0A: Per-Mode Context Scaling
  5. Phase 0B: Context Format Upgrade
  6. Phase 0C: Unified Config Surface
  7. Phase 0D: Config UI (Tuning Panel)
  8. Phase 1A: Observability & Cleanup
  9. Phase 1B: Export / Report Generation
  10. Phase 1C: Search Progress Enhancement
  11. Phase 1D: Search Result Previews
  12. Phase 1E: Extract JS to Static File
  13. Phase 2: Voice Output (TTS)
  14. Phase 3A: Research Agent Architecture
  15. Phase 3B: Progress Streaming
  16. Phase 3C: Report Generation
  17. Phase 3D: UI Integration
  18. Research Retry & Recovery
  19. Critical Fix: Follow-Up Query Resolution
  20. Phase 4: Visual & Rich Content
  21. Phase 5: Advanced Organization
  22. Phase 6: Specialty Search
  23. Phase 7: Tabular Data & Spreadsheet Intelligence
  24. Phase 8: Context Intelligence
  25. Phase 9: Document & Image Upload
  26. Phase 10: Auto-Mode Classification
  27. Brainstorm: Mode Architecture Rethink
  28. Adversarial Review Record
Phase 0A: Per-Mode Context Scaling

Replace the single max_exchanges=5 / 600 char truncation with per-mode strategy. Current usage is 1.6–8.2% of the 200K context window.

| Mode | max_exchanges | answer_truncation | Token Budget |
| --- | --- | --- | --- |
| Quick | 5 | 800 chars | ~1,000 tokens |
| Deep+Summary | 10 | 2,000 chars | ~5,000 tokens |
| Deep+Full | 20 | 4,000 chars | ~10,000 tokens |
| Research | 30 | No truncation | ~15,000 tokens |
  • Refactor get_thread_context() to accept max_exchanges + max_answer_chars params (s204)
  • Route handler passes mode-appropriate limits from cfg (s204)
  • Add input token logging: synthesis + expand_query (s204)
  • Quick mode query unchanged (<600 extra chars, preserves <3s target)
  • Per-search metrics logging -- rolling 20/mode to search_metrics.json (s204)
  • Rolling averages in tuning panel (muted orange, per applicable control) (s204)
  • Graceful limit handling -- amber inline warnings when limits hit (s204)
  • Opus model option + Basic/Advanced tier toggle + descriptions (s204)
  • Mode-colored labels in tuning panel matching inline badge colors (s204)
  • Modified-from-default indicator (green *) on changed values (s204)
  • Per-field tradeoff descriptions with click-to-expand (s204)
  • Averages expanded to cover all 38 config fields (s204)
  • group_context_chars metric added to all pipelines (s204)
Round 2 C-1: Token budget estimates measure conversation context ONLY. Full prompt = system (~200 tok) + user context (~750) + group context (~500-700) + search passages (up to ~10,000) + conversation context. Research synthesis could reach 30,000+ tokens ($0.10-0.50). Token+cost logging must ship before expanding limits.
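The per-mode table above can be sketched as a limits map plus a truncating context builder. Function, dict, and field names here are hypothetical; the real logic lives in get_thread_context() and the route handlers.

```python
# Hypothetical per-mode limits mirroring the table above.
MODE_LIMITS = {
    "quick":        {"max_exchanges": 5,  "max_answer_chars": 800},
    "deep_summary": {"max_exchanges": 10, "max_answer_chars": 2000},
    "deep_full":    {"max_exchanges": 20, "max_answer_chars": 4000},
    "research":     {"max_exchanges": 30, "max_answer_chars": None},  # no truncation
}

def build_context(exchanges, mode):
    """Keep only the most recent exchanges, truncating answers per mode."""
    limits = MODE_LIMITS[mode]
    recent = exchanges[-limits["max_exchanges"]:]
    cap = limits["max_answer_chars"]
    out = []
    for ex in recent:
        answer = ex["answer"] if cap is None else ex["answer"][:cap]
        out.append((ex["query"], answer))
    return out

history = [{"query": f"q{i}", "answer": "x" * 3000} for i in range(12)]
ctx = build_context(history, "quick")  # 5 exchanges, 800-char answers
```

The route handler would pick the mode key from the request and pass the matching limits down, keeping Quick mode's context small enough to preserve the <3s target.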
Phase 0B: Conversation Context Format Upgrade

Upgrade from plain User:/Assistant: to XML-tagged exchanges with mode and citations.

  • XML-tagged exchanges with mode attribute (s204)
  • Include exchange mode tag (quick vs deep calibration) (s204)
  • Include citation URLs in <sources> block, top 5 per exchange (s204)
  • Mode badge on each response (color-coded top + bottom with token counts) (s204)
  • Improved thread title generation (few-shot prompt, answer-rejection guard) (s204)
  • Research mode button placeholder (disabled, Phase 3) (s204)
  • Color-coded mode selector buttons (s204)
  • Markdown renderer: tables, ### headings, --- dividers, tighter spacing (s204)
<exchange n="1" mode="quick">
<query>Best espresso machine under $500?</query>
<answer>The Breville Barista Express...</answer>
<sources><url>https://example.com/review</url></sources>
</exchange>
Phase 0C: Unified Config Surface ("Sliders")

Centralize all tunable limits. Code defaults in git-tracked settings.yaml. Browser-written overrides in .gitignored config/grisearch_tuning.yaml. Runtime merges both, tuning takes precedence. Config snapshot pattern prevents mid-search TOCTOU races.

| Group | Keys | Examples |
| --- | --- | --- |
| Context Limits | 8 | max_exchanges, max_answer_chars per mode |
| Models | 6 | synthesis model per mode, planner, quality gate |
| Token/Cost | 7 | max_tokens per mode, cost ceiling, Brave rate limit |
| Research Agent | 4 | max_rounds, sub_questions, wall time, concurrency |
| Search Providers | 8 | timeouts, max results, max extract pages |
| Auto-Brief | 4 | exchanges/thread, truncation (normal vs research) |
| Thread Health | 1 | size warning threshold (KB) |
  • Add all 38 schema keys to settings.yaml under grisearch: (s204)
  • Create config/grisearch_tuning.yaml (.gitignored) for browser overrides (s204)
  • Update _get_settings(): merge defaults + tuning overrides
  • Remove dead Settings() no-arg call from _get_settings()
  • Build GRISEARCH_CONFIG_SCHEMA (38 keys, 8 groups) as single source of truth
  • Config snapshot: pipelines call _get_settings() once, pass cfg downstream (s204)
  • All cfg.get() fallbacks reference _default() from schema
  • Replace Path(__file__).parent.parent with env var (s204)
  • Hide config keys for unbuilt features until they ship (s204, R3-13)
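The defaults-plus-overrides merge can be sketched as follows. The dicts stand in for parsed settings.yaml and grisearch_tuning.yaml; the key names are examples from the schema, and the merge function is a hypothetical sketch of what _get_settings() does.

```python
# Stand-in for the schema defaults loaded from settings.yaml.
DEFAULTS = {"quick_max_exchanges": 5, "research_cost_ceiling_usd": 0.50}

def merge_config(defaults, tuning):
    """Tuning overrides win; unknown tuning keys are dropped (schema-validated upstream)."""
    cfg = dict(defaults)
    cfg.update({k: v for k, v in tuning.items() if k in defaults})
    return cfg

# Stand-in for browser-written overrides from grisearch_tuning.yaml.
cfg = merge_config(DEFAULTS, {"research_cost_ceiling_usd": 1.00, "bogus_key": 1})
```

The config-snapshot pattern follows from this: a pipeline calls the merge once at the start of a search and passes the resulting dict downstream, so a mid-search tuning save cannot change limits under it (the TOCTOU race noted above).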
Phase 0D: Config UI (Tuning Panel)

In-browser config editor on the GriSearch page. Gear icon opens settings panel. Both API endpoints behind web auth. Cost previews labeled as estimates with tooltip caveat.

| Config Type | Control | Example |
| --- | --- | --- |
| Integer limits | Slider + stepper | 5 [---o-----] 30 |
| Cost ceilings | Stepper ($0.05) | $0.50 [-] [+] |
| Model selection | Dropdown | [claude-haiku-4-5 v] |
| 0 = unlimited | Toggle + stepper | [x] Limit: 4000 |
  • GET /api/search/config returns config + schema metadata (s204)
  • POST /api/search/config validates + merges overrides into tuning YAML (s204)
  • POST /api/search/config/reset clears all overrides (s204)
  • Config schema with type/range validation (s201)
  • Grouped sections, auto-generated from schema (s204)
  • Live cost/token impact preview (deferred — averages in tuning panel serve this need)
  • Instant apply -- no restart needed (s201)
  • "Reset all" button + modified values highlighted green (s204)
  • Tuning panel via hammer icon in header (s204)
  • JS extracted to static/js/search.js (no more {{}} escaping) (s204)
  • PostToolUse hook for rendered JS validation on views/*.py (s204)
  • Pre-restart validation: scripts/validate_views.py (s204)
Phase 1A: Observability & Cleanup
  • Add log.info for model/mode in synthesize() (s204)
  • Per-search cost logging: input_tokens, output_tokens, model, cost (s204)
  • Fix Dict[tuple, Any] type annotation (s204)
  • Fix _REPO_ROOT = Path(__file__).parent.parent (s204)

Thread Health Monitoring

  • Log file size + exchange count on every save_exchange()
  • Primary: exchange count color dot (green <10, yellow 10-20, red >20) (s205)
  • Thread list shows exchange count indicator per thread (s205)
  • MCP get_stack_status includes thread health summary (deferred — operational tooling)

Thread Archival

  • Archival runs inside save_exchange() (atomic, no race conditions)
  • After N exchanges (configurable, default 20), move older to archive
  • load_thread_full() for complete history (deferred — no threads near archive threshold)

Roadmap: Per-exchange storage (solution C) if archival proves insufficient.

Phase 1B: Export / Report Generation

Markdown Export (MVP)

  • GET /api/search/thread/{id}/export?format=md (s205)
  • Title as H1, exchanges as H2, citations as footnotes (s205)
  • Export button on thread bar + mobile share sheet (s205)
  • File named {title}_{date}.md (s205)

HTML Export Complete (s218)

  • Same endpoint with format=html, print-friendly styling, tables, code blocks (s218)
  • HTML export button alongside MD export in thread bar (s218)

Platform UX: Desktop: browser download. Mobile (Safari): navigator.share() with fallback.

Phase 1C: Search Progress Enhancement
  • During expanding: yield sub-queries as detail line (s205)
  • During reading: yield URLs, show unique domains (s205)
  • During searching: show providers ("Searching Brave + Exa...") (s205)
  • Elapsed time display (running timer, 500ms update) (s205)
  • Stage-specific icons (search, expand, read, synthesize) + progress bar (s218)
Phase 1D: Search Result Previews
  • Preview cards: favicon + title + domain + date + snippet (2-line clamp) (s205)
  • Cards collapse to compact chips on synthesis start (s205)
  • Mobile: 44px min-height, vertical stack (s205)
  • Click-to-expand: tap source card to show full snippet + "Visit" link (s218)
Phase 1E: Extract JS to Static File

Prerequisite for Phase 2+. views/search.py = 1,113 lines of double-brace-escaped JS in Python template strings.

  • Extract search JS into static/js/search.js (s204)
  • PostToolUse validation hook + validate_views.py (s204)
  • Extract remaining JS from willy.py, pages.py, dashboard.py (deferred — S-14)
  • Migrate createScriptProcessorAudioWorkletNode (deferred — Phase 2 prerequisite)
Phase 2: Voice Output (TTS)

Complete the voice loop. SSE with base64 audio chunks (proven tunnel-compatible).

2A. TTS Provider

  • Evaluate: Deepgram Aura, ElevenLabs, OpenAI TTS, Cartesia (s208)
  • Criteria: <500ms TTFB, natural voice, <$0.01/search (s208)
  • Build lib/tts.py — Deepgram Aura-2 REST streaming (s208)

2B. Streaming Pipeline

  • SSE audio_start/chunk/done events (base64 MP3) (s208)
  • Web Audio API decode + queue playback (s208)
  • End-to-end test: SSE pipeline streams audio chunks (s208)
  • Tap/click interrupt: toggle, indicator click, new search (s208)

2C. Voice Flow

  • "Voice mode" toggle (speaker button, green active state) (s208)
  • Auto-listen after TTS finishes: mic activates after ducking delay, AUTO toggle button (s212)
  • Audio ducking: _gsTTSPlaying gate on mic start + audio processor, configurable delay (500ms default, 800ms for Bluetooth) (s212)
  • Bluetooth auto-detect via enumerateDevices(), extends ducking delay (s212)
  • Voice preferences: localStorage persistence for voice mode, auto-listen, ducking delay (s212)

2D. Smart TTS

  • Strip citations/URLs/markdown before TTS (_strip_for_tts) (s208)
  • Truncate long answers at sentence boundary (4000 char cap) (s208)
  • Mode-aware TTS length: Quick/Deep+S full (4000), Deep+F/Research first ~2 paragraphs (1500) (s212)
  • Table-to-prose conversion (_table_to_prose) for natural reading (s212)
  • Code block stripping, inline code cleanup, Sources: line removal (s212)
  • Dangling preposition cleanup after URL removal (s212)
  • Truncation indicator: audio_done.truncated flag + muted UI notice (s212)
  • Live browser test: News + Product TTS confirmed working (s218)
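The 2D cleanup pass can be sketched as a chain of regex substitutions plus a sentence-boundary truncation. This is a rough illustration only; the shipped _strip_for_tts() and its companions (_table_to_prose, mode-aware caps) almost certainly differ in detail.

```python
import re

def strip_for_tts(text, cap=4000):
    """Illustrative TTS cleanup: drop citations, URLs, code, and markdown, then cap length."""
    text = re.sub(r"\[\d+\]", "", text)                # [1]-style citation markers
    text = re.sub(r"https?://\S+", "", text)           # bare URLs
    text = re.sub(r"```.*?```", "", text, flags=re.S)  # fenced code blocks
    text = re.sub(r"[*_#`]", "", text)                 # markdown punctuation
    text = re.sub(r"\s{2,}", " ", text).strip()
    if len(text) > cap:                                # truncate at a sentence boundary
        cut = text.rfind(". ", 0, cap)
        text = text[: cut + 1] if cut != -1 else text[:cap]
    return text

spoken = strip_for_tts("**Best pick** is the Breville [1]. See https://example.com/review")
```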

2E. Voice Input Polish (s218)

  • Silence detection: 4s threshold (up from 1.5s), resets on interim results (not just finals)
  • 5-second visual countdown before auto-send (mic button shows 5…4…3…2…1, color shift cyan → yellow → red)
  • Tap mic during countdown to cancel (keeps text in input for editing)
  • TTS voice selector: 12 Deepgram Aura-2 voices, persisted to localStorage, passed through to backend
  • AUTO barge-in: interrupt TTS by speaking (shipped but needs tuning — pinned)
Phase 3A: Research Agent Architecture

Multi-step autonomous research via non-streaming wrappers over existing search functions.

User query → [Planner/Sonnet] → [Quality Gate/Haiku]
  → [Research Loop] → [Synthesizer] → [Structured Report]

Search Wrappers

  • search_and_summarize(): consumes async generator, returns dict (s205)

Cost Control (Round 2 C-2)

  • Shared Brave rate limiter (asyncio.Semaphore)
  • Per-research cost ceiling (default $0.50) (s205)
  • Hard cap: research_max_brave_calls (default 20) (s205)
  • Cost estimate shown before research starts
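The cost controls above can be sketched as one small budget object shared across the research loop. Class and method names are hypothetical; the defaults mirror the config keys (cost ceiling $0.50, max 20 Brave calls, concurrency limit via asyncio.Semaphore).

```python
import asyncio

class ResearchBudget:
    """Hypothetical sketch of the per-research cost guards."""

    def __init__(self, cost_ceiling_usd=0.50, max_brave_calls=20, concurrency=3):
        self.cost = 0.0
        self.ceiling = cost_ceiling_usd
        self.brave_calls = 0
        self.max_brave_calls = max_brave_calls
        self.sem = asyncio.Semaphore(concurrency)  # shared Brave rate limiter

    def charge(self, usd):
        """Accumulate spend; abort the loop once the ceiling is reached."""
        self.cost += usd
        if self.cost >= self.ceiling:
            raise RuntimeError("research cost ceiling hit")

    def count_brave_call(self):
        """Hard cap mirroring research_max_brave_calls."""
        self.brave_calls += 1
        if self.brave_calls > self.max_brave_calls:
            raise RuntimeError("research_max_brave_calls exceeded")

budget = ResearchBudget()
budget.charge(0.12)
```

Each sub-question search would acquire budget.sem before calling Brave and charge the budget after each LLM call, so a runaway research loop stops on whichever guard trips first.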

Lifecycle (Round 2 R-4)

  • Cancellation flag via asyncio.Event (s206)
  • Concurrent limit config: research_concurrent_limit (s205)
  • Cancellation on tab close (beforeunload)
  • Persist partial findings to disk

Agent Loop

  • SSE streaming (non-blocking via async generator) (s205)
  • Sonnet planner + Haiku quality gate (s205)
  • Max 5 rounds, 5 min wall time, 3-8 sub-questions (s205)
  • Per-sub-question mode selection (quick vs deep_summary) (s205)
  • Structured scratchpad per sub-question (s205)
Phase 3B: Progress Streaming
  • SSE research_progress event (step, total, sub_question, status) (s205)
  • Vertical timeline with status indicators (pending/spinner/check/fail/skipped) (s206)
  • Running timer + step counter in panel header (s206)
  • "Stop and summarize" button + POST /api/search/research/cancel (s206)
  • "Also consider..." redirect input (mid-research constraint injection)
Phase 3C: Report Generation
  • Sonnet final synthesis from all findings (s205)
  • Structured report: Summary, Findings, Open Questions, Sources (s205)
  • Saved as thread with mode: "research" (s205)
  • Auto-export to group directory
  • Brief weighting config: 4 exchanges/thread, 800-char truncation (s205)

Roadmap: Separate brief section (C) after evaluating real output.

Phase 3D: Research UI Integration
  • Fourth mode pill: "Research" (green, #10b981) (s205)
  • First-use explainer via localStorage (s206)
  • Full-width report card (.gs-report, 95% width, green border) (s206)
  • "Dig deeper" button on subsection headings (switches to Deep+Full) (s206)
Enhancement: Research Retry & Recovery (s213)

Comprehensive failure recovery for the research agent and all search modes. Previously, failed sub-questions were silently skipped and synthesis failures lost all findings.

Sub-Question Retry / Rephrase / Escalation

  • _rephrase_sub_question() — Haiku rephrases failed sub-questions from a different angle
  • _execute_research_step() — 3-tier recovery: retry → rephrase+retry → mode escalation (quick→deep_summary), up to 4 attempts per sub-question
  • UI: retry/rephrase/escalation badges on research timeline steps, "N attempts exhausted" on final failure

Research Synthesis Failure Recovery

  • cache_research_findings() — caches findings on synthesis failure so sub-question work isn't lost
  • retry_research_synthesis() — retries final synthesis from cached findings (skips all sub-question searches)
  • /api/search/retry auto-detects research findings cache and routes to correct retry function
  • UI: "Retry Synthesis" button with "Uses cached findings (skips sub-question searches)" hint

Non-Research Retry

  • "Retry Search" button on Quick/Deep/Deep+Full failures (re-submits same query, same mode)
Critical Fix: Follow-Up Query Resolution

Discovered: s210 (2026-04-07). Follow-up queries in threads produce garbage search results because query expansion has no access to conversation history. Every major AI platform rewrites follow-ups before searching — GriSearch did not. Effort: 2–3 sessions. Pressure tested: 2 rounds, 28 findings, all resolved.

CF.0 Discovery & Analysis (s210)

  • Identified bug: Exchange 5 in Iran war thread returned FedEx/K-pop results for "Deliver updates since the last review"
  • Root cause analysis: 7 blind spots across 4 search modes where conversation_context is in scope but not forwarded to query expansion
  • Mapped full query flow: expand_query(), _research_plan(), get_thread_context(), all 4 search modes, SSE endpoint
  • Confirmed synthesis receives context (answer referenced prior briefing) but search queries were decontextualized

CF.1 Industry Research (s210)

  • Researched 7 platforms: ChatGPT, Claude, Perplexity, Gemini, Grok, Copilot, OpenAI Deep Research
  • Reviewed academic SOTA: CHIQ history enhancement, conversational query reformulation, RAG multi-turn patterns
  • Key finding: every platform rewrites follow-ups; only Perplexity Pro and Copilot show the rewrite to users
  • Key finding: Google/Elastic use original + rewritten in fan-out (never fully replace)
  • Key finding: CHIQ topic switch detection is academic SOTA for preventing stale context pollution
  • Identified 4 gaps plan must address: query fan-out, topic switch, rolling summarization, error accumulation

CF.2 Plan Design & Adversarial Testing (s210)

  • Designed 5-phase fix: query resolution, context summarization, SSE events + frontend, research mode, cost tracking
  • 16-issue walkthrough with RG — each issue presented, discussed, agreed or modified
  • Round 4 adversarial: 15 findings (1 critical, 4 high, 6 medium, 4 low)
  • Round 5 adversarial: 13 findings (1 critical, 4 high, 5 medium, 3 low)
  • All 28 findings resolved and incorporated into final plan

CF.3 Query Resolution (backend)

  • New config keys: synthesis_model_resolution (Sonnet default), enable_follow_up_resolution, resolution exchange/char limits
  • Add skip_resolution param to all 3 search functions + search_and_summarize()
  • New resolve_follow_up(): structured JSON return with topic_switch detection, robust 5-step JSON parsing fallback chain
  • Resolution-specific context truncation (5 exchanges, 800 chars — independent of per-mode synthesis limits)
  • Wire into search_deep_full using search_query variable pattern (original bug trigger)
  • Wire into search_quick and search_deep_summary
  • Search fan-out: [original, resolved] + expand(resolved) — original as safety net per Google/Elastic pattern
  • Pass resolved query to synthesize() (not raw original)
  • Similarity check: Jaccard of lowercased word sets, suppress display if >0.85
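The similarity check in the last bullet is simple enough to show directly. This sketch implements the stated rule (Jaccard over lowercased word sets, suppress the "INTERPRETED AS" display above 0.85); the function names are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity over lowercased word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def show_resolution(original, resolved, threshold=0.85):
    """Show the green 'INTERPRETED AS' block only when the rewrite meaningfully differs."""
    return jaccard(original, resolved) <= threshold
```

A near-identical rewrite (e.g. only capitalization changed) scores 1.0 and is suppressed, while a genuinely contextualized follow-up rewrite scores well below the threshold and is shown.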

CF.4 Context Summarization (summary-beyond-window)

  • New ensure_context_summary() async function (keeps get_thread_context() synchronous)
  • Summary cache in thread JSON, keyed by max_exchanges, hash-based invalidation (handles archival, deletion, mode switches)
  • Config schema caps: Quick max=5, Deep+S max=10, Deep+F and Research uncapped
  • All modes: older exchanges summarized beyond window (never dropped)
  • get_thread_context() reads cached summary, prepends <context_summary> block
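One way to key the summary cache as described: hash only the exchanges that fall beyond the context window, so archival, deletion, or a mode switch (different max_exchanges) invalidates the cached summary automatically. The helper below is a hypothetical sketch, not the shipped ensure_context_summary().

```python
import hashlib
import json

def summary_cache_key(exchanges, max_exchanges):
    """Key = window size + hash of the exchanges the summary covers."""
    covered = exchanges[:-max_exchanges] if len(exchanges) > max_exchanges else []
    digest = hashlib.sha256(
        json.dumps([(e["query"], e["answer"]) for e in covered]).encode()
    ).hexdigest()[:16]
    return f"{max_exchanges}:{digest}"

history = [{"query": f"q{i}", "answer": f"a{i}"} for i in range(8)]
key_a = summary_cache_key(history, 5)
# Editing an exchange inside the window leaves the summarized portion untouched:
key_b = summary_cache_key(history[:-1] + [{"query": "q7", "answer": "edited"}], 5)
```

Because the key ignores in-window exchanges, a cached summary survives normal conversation growth and is only rebuilt when the summarized prefix itself changes.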

CF.5 SSE Events + Frontend

  • New query_resolved SSE event + emerald green .gs-msg-resolved block ("INTERPRETED AS")
  • New topic_switch_detected SSE event + amber .gs-msg-switch prompt
  • Topic switch UX: "Continue in this thread" / "Start new search" buttons
  • force_continue data flow: _gsForceResume flag → POST body → skip_resolution
  • Clean fetch abort on topic switch (prevent SSE race condition)
  • save_exchange(): persist resolved_query field (omit when empty)
  • Thread history: render green block for stored resolved_query with backward compat guard

CF.6 Research Mode + Metrics

  • Pass search_query to _research_plan() directly (no conversation_context needed — resolved query is self-contained)
  • Sub-loop: skip_resolution=True via search_and_summarize()
  • Final synthesis: compact preamble ("follow-up, resolved to: ..."), not full history
  • Resolution cost tracking: resolution_input_tokens, resolution_output_tokens, resolution_cost_usd in metrics + answer meta line

CF.7 Verification

  • Live test: Iran war thread follow-up ("Deliver updates since the last review")
  • Live test: topic switch detection ("What's the weather in San Diego" in Iran thread)
  • Live test: similar query suppression (no green block for already-specific queries)
  • Live test: force_continue flow
  • Live test: summary caching across modes

Roadmap (not in this build): Embedding-based topic switch detection (cosine similarity). Error accumulation monitoring (resolution quality tracking over long threads).

Full spec: ~/.claude/plans/grisearch-follow-up-context-fix.md

Phase 4: Visual & Rich Content

4A. Image Search

  • Brave Image Search API: _search_brave_images() + ImageResult model (s212)
  • Parallel image search in Deep+Summary and Deep+Full pipelines (s212)
  • SSE image_results event with thumbnail grid (3-col desktop, 2-col mobile) (s212)
  • Click-to-expand: full image overlay + source link + dimensions (s212)
  • Collapsible IMAGES header (s212)
  • Image upload for reverse search → moved to Phase 9 (Document & Image Upload System)
  • Storage lifecycle → moved to Phase 9 (two-tier: ephemeral 7-day + persistent indefinite)

4B. Rich Result Cards

  • Query-type detection: weather, quick fact, comparison, timeline via regex patterns (s212)
  • Synthesis format hints: type-specific prompt suffix guides structured output (s212)
  • Comparison: "X vs Y" triggers table-formatted synthesis (existing table renderer handles it) (s212)
  • Timeline: "history of" triggers chronological list; vertical timeline CSS renderer with date dots (s212)
  • Quick facts: bold answer lead, compact format hint (s212)
  • Weather: structured conditions + forecast hint (s212)

4C. Location-Aware

  • search_location + search_country_code config keys in tuning panel (str type) (s212)
  • "Near me" / "nearby" / "local" detection via regex, replaces with configured location (s212)
  • Brave country param wired to all 3 search pipelines (quick, deep_summary, deep_full) (s212)
  • Location badge in header (muted, auto-hidden when empty) (s212)
  • Badge updates on tuning save (s212)
  • Dynamic browser geolocation: "Use Location" toggle in mode row, detects on new thread, Nominatim reverse geocode, badge updates (s212)
Phase 5: Advanced Organization

5A. Cross-Thread Search Complete (s213)

  • GET /api/search/corpus — full-text search across all threads + archives, title/query/answer scoring (3x/2x/1x) + recency boost, group filter
  • Lazy-build corpus index, persisted to corpus_index.json (5-min TTL), invalidated on save/delete, ~0.4ms cached load
  • History panel search input with 300ms debounce, results replace thread list, click loads thread + scrolls to exchange with highlight
  • "Search within this group" filter via group_id API param
  • Race condition protection (sequence counter), XSS fix (javascript: URL blocking in markdown links)
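The 5A scoring rule (title 3x, query 2x, answer 1x, plus a recency boost) can be sketched as below. The thread shape and the 1.5x/7-day boost values are assumptions; the shipped corpus index scoring may weight recency differently.

```python
import time

def score_thread(thread, term, now=None):
    """Weighted substring scoring: title 3x, queries 2x, answers 1x, recency boost."""
    now = now or time.time()
    t = term.lower()
    score = 0.0
    score += 3.0 * thread["title"].lower().count(t)
    score += 2.0 * sum(q.lower().count(t) for q in thread["queries"])
    score += 1.0 * sum(a.lower().count(t) for a in thread["answers"])
    age_days = (now - thread["updated_utc"]) / 86400
    return score * (1.5 if age_days < 7 else 1.0)  # hypothetical recency boost

thread = {"title": "Espresso research",
          "queries": ["best espresso machine"],
          "answers": ["The Breville..."],
          "updated_utc": time.time()}
s = score_thread(thread, "espresso")
```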

5B. Research Notebook Complete (s213)

  • Pin button (★) on every answer bubble — pin/unpin toggle, duplicate prevention, yellow highlight
  • Pin data layer: pinned.json with thread_id, exchange_index, query, answer_snippet, collection_id
  • Collections CRUD: create, delete, rename, list with pin counts, assign pins to collections
  • Notebook section in history panel (yellow theme, collapsible collections, unsorted section)
  • Collection Markdown export — download with collection name, notes, each pin as section
  • 9 new API endpoints: pin, unpin, list pins, assign pin, list/create/delete/rename collections, export
  • Auto-suggest pins after Deep+Full/Research (deferred)

5C. Drag-and-Drop Thread Organization Complete (s213)

  • Desktop: HTML5 DnD with drag handles + drop targets on group headers (UA-gated, mobile excluded)
  • Drop targets: group headers glow blue on dragover, "Recent" = unassign from group
  • Thread reorder within groups (thread_order in groups.json) + POST /api/search/groups/{id}/reorder
  • Group reorder (group_order in groups.json) + POST /api/search/groups/reorder
  • History panel respects both order fields (fallback to updated_utc/most-recent)
  • Mobile: context menu flow preserved (no DnD)

Reuses existing api_search_group_assign for moves — no new backend for basic group assignment. Sort order APIs are new. Desktop-only initially; mobile DnD (long-press or polyfill) evaluated after desktop ships.

5D. Branched Conversations Complete (s213)

  • "Branch here" button on each exchange in thread replay — creates new thread with exchanges up to branch point
  • Branch metadata (branched_from) + group inheritance + corpus index invalidation
  • Branch does NOT bump brief counter — copied exchanges already counted
  • Cyan branch icon on branched threads in history panel
Phase 6: Specialty Search

6C. News Mode Complete (s214)

  • Brave News API (_search_brave_news) + dual-source (News + Web)
  • News query detection with auto-freshness hints (pd/pw/pm)
  • Recency-first sorting + news source boost
  • _NEWS_SYSTEM synthesis prompt (what changed, attribution)
  • Orange News mode pill (#f97316) + mode auto-suggest
  • 8 config schema keys (model, tokens, freshness, etc.)
  • Timeline rendering (deferred to 4B)
  • "Follow this topic" (deferred to 6D)
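The freshness-hint detection maps recency language in the query to Brave's freshness codes (pd = past day, pw = past week, pm = past month). The patterns below are illustrative assumptions; the shipped detection rules may differ.

```python
import re

# Hypothetical recency patterns -> Brave freshness codes.
_FRESHNESS_HINTS = [
    (re.compile(r"\b(today|breaking|right now)\b", re.I), "pd"),  # past day
    (re.compile(r"\b(this week|recent(ly)?)\b", re.I),    "pw"),  # past week
    (re.compile(r"\b(this month|latest)\b", re.I),        "pm"),  # past month
]

def news_freshness(query):
    """Return a freshness code when the query signals recency, else None."""
    for pattern, code in _FRESHNESS_HINTS:
        if pattern.search(query):
            return code
    return None
```

The returned code would be passed as the Brave News API freshness parameter; a None result leaves the search unconstrained.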

6A. Product Research Complete (s214)

  • _detect_product_query with patterns + false-positive exclusions
  • Enriched query expansion targeting review sites (Wirecutter, RTINGS, Reddit)
  • _PRODUCT_SYSTEM synthesis: recommendation, comparison table, pros/cons
  • Review domain boost (11 review sites scored higher)
  • Teal Product pill (#14b8a6) + mode auto-suggest
  • 7 config keys (Sonnet model, 3000 tokens, 8 extract pages)

6B. Academic/Technical Mode Complete (s218)

  • Academic search pipeline (Brave + Exa, scholarly query expansion, 16-domain academic boost)
  • Structured synthesis prompt (key findings, notable papers, methodology, open questions)
  • Query detection, mode auto-suggest, 8 config schema keys, purple pill
  • Semantic Scholar API integration (citation graph, paper-level metadata)

6D. Recurring Search / Watch Topics

Effort: 2–3 sessions. No external dependencies. Builds on existing search_and_summarize() non-streaming wrapper.

  • Watch data model + storage (max 5 active, Quick mode only, cost estimate on creation)
  • Background scheduler (asyncio.create_task pattern, 6-hour default interval, configurable)
  • URL dedup + diff-focused synthesis (skip if no new results)
  • "Watch this topic" button + watches panel (slide-out) + unread badges
  • CRUD API (/api/search/watches: create, list, toggle, delete, force re-run)
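The scheduler's core is a due check the background task runs on each tick. Field names (last_run, interval_s, active) are hypothetical; the 6-hour default mirrors the plan above.

```python
import time

def due_watches(watches, now=None, default_interval_s=6 * 3600):
    """Return active watches whose interval has elapsed since their last run."""
    now = now or time.time()
    return [
        w for w in watches
        if w.get("active", True)
        and now - w.get("last_run", 0) >= w.get("interval_s", default_interval_s)
    ]

watches = [
    {"id": 1, "query": "ceasefire updates", "last_run": time.time() - 7 * 3600},
    {"id": 2, "query": "gpu prices", "last_run": time.time() - 3600},
    {"id": 3, "query": "paused topic", "last_run": 0, "active": False},
]
ready = due_watches(watches)  # only watch 1 is due
```

A long-lived asyncio.create_task loop would call this periodically, run each due watch through search_and_summarize(), and skip synthesis when URL dedup finds no new results.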
Phase 7: Tabular Data & Spreadsheet Intelligence

Theme: Accept, analyze, transform, and export structured data across all modes. Actual: 2 sessions (s218 foundation + s219 upload/formula).

7A. Planning & Requirements Complete (s219)

  • Competitive analysis: ChatGPT (Code Interpreter, no native export), Gemini (no code exec in chat, strong in Sheets), Copilot (Python in Excel) (s219)
  • Scope decision: CSV upload + formula toggle = build. Excel/openpyxl = defer. Data mode = drop (no code execution = not competitive) (s219)

7B. Tabular Input (all modes) Complete (s219)

  • Paste detection (TSV/CSV) with table preview below input (s218)
  • Context injection as fenced CSV block (<data> tag in query) (s218)
  • File upload: CSV/TSV/TXT via file picker + drag-drop, 500KB guard, client-side FileReader (s219)
  • Size limits: 500KB file guard, 5000+ row warning in preview (s219)
  • Excel upload (openpyxl) — deferred (paste from Excel already works)
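The paste-detection heuristic ships as client-side JS; the same idea is sketched here in Python. The rule (treat a paste as tabular when every non-empty line has the same nonzero delimiter count) and the two-row minimum are assumptions.

```python
def looks_tabular(text, min_rows=2):
    """Return 'tsv'/'csv' when a pasted block looks like consistent delimited rows."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) < min_rows:
        return None
    for delim in ("\t", ","):  # tabs first: Excel/Sheets paste as TSV
        counts = [ln.count(delim) for ln in lines]
        if counts[0] >= 1 and all(c == counts[0] for c in counts):
            return "tsv" if delim == "\t" else "csv"
    return None
```

On a positive detection the UI would show the table preview and inject the block as fenced CSV inside the <data> tag described above.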

7C. Tabular Output & Export (all modes) Complete (s218)

  • Copy table as TSV to clipboard (per-table Copy button) (s218)
  • CSV download button on each rendered table (s218)
  • Excel export (.xlsx via openpyxl) — deferred (CSV covers 90%)
  • Multi-table "Download all" — deferred (per-table export works)

7D. Formula Generation Complete (s219)

  • Fenced code blocks with language label + copy button (all modes) (s218)
  • Excel vs Sheets toggle button, <formula_platform> context injection with platform-specific syntax hints (s219)
  • Formula validation (verify output) — deferred

7E. Data Mode Dropped (s219)

Without a code execution sandbox, Data mode would be “Claude without web search” — not competitive vs ChatGPT Code Interpreter. GriSearch’s moat is search, not computation. Revisit if lightweight code execution becomes available.

7F. Future (not building yet)

Chart generation, Google Sheets integration, SQL-like queries, pivot table builder, data persistence across sessions.

Phase 8: Context Intelligence (Active + Passive Learning)

Theme: Make GriSearch progressively smarter about user preferences and research patterns. Active interviews + passive extraction + enhanced auto-briefs. Effort: 3–5 sessions.

8A. Planning & Requirements

  • Audit current context injection chain (user, project, thread, conversation)
  • Catalog preference types (source, format, domain, constraint, fact)
  • Review ChatGPT memory system (learn from their mistakes)
  • Adversarial review of spec

8B. Passive Preference Extraction (all modes)

  • Post-synthesis Haiku extraction: 0-3 new preferences per exchange
  • Category tagging: format, source, domain, constraint, fact
  • Dedup + merge against existing search_preferences.md
  • Staleness handling: timestamp entries, replace contradictions
  • Transparency: extracted prefs visible/editable in Preferences panel
  • Kill switch in tuning panel (on for Deep modes, off for Quick)
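The dedup + merge + contradiction-replacement steps could look like the sketch below. Everything here is an assumption for illustration: the dict shape, the word-overlap heuristic for detecting a contradicting update, and the 0.5 threshold are not from the spec, which only requires dedup, timestamps, and replacement of contradictions.

```python
from datetime import date

def merge_preferences(existing: list[dict], extracted: list[dict]) -> list[dict]:
    """Merge newly extracted preferences into search_preferences entries.

    Each preference is a dict like {"category": "source", "text": "..."}.
    Exact duplicates are dropped; a new entry with high word overlap in the
    same category is treated as a contradiction/update and replaces the old
    one in place; everything else is appended with a timestamp.
    """
    def words(p):
        return set(p["text"].lower().split())
    merged = list(existing)
    for new in extracted:
        new = {**new, "updated": date.today().isoformat()}
        handled = False
        for i, old in enumerate(merged):
            if old["category"] != new["category"]:
                continue
            if old["text"] == new["text"]:
                handled = True  # exact duplicate: keep the existing entry
                break
            overlap = len(words(old) & words(new)) / max(len(words(old) | words(new)), 1)
            if overlap > 0.5:
                merged[i] = new  # likely update of the same preference: replace
                handled = True
                break
        if not handled:
            merged.append(new)
    return merged
```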

8C. Active Context Interview (triggered)

  • Trigger: button in Project Notes + thread context menu + proactive suggestion
  • 3-phase flow: confirm existing → expand with probes → identify gaps
  • Questions displayed inline (conversation area, not modal)
  • Output: updated notes + extracted preferences + user review
  • Persist interview state for resume across sessions
  • Re-interview suggestion after 10+ new exchanges

8D. Thread-Level Context

  • Per-thread notes field (editable via context menu)
  • Thread auto-brief: full trajectory summary (not just recent)
  • Thread context injected into synthesis alongside project context

8E. Enhanced Auto-Brief

  • Dual-output: findings + preferences + open questions
  • Cross-project pattern extraction to global search_preferences.md

8F. Future (not building yet)

Preference confidence scoring, conflict detection, onboarding interview, preference analytics dashboard.

Phase 9: Document & Image Upload System

Theme: Persistent, organized, searchable uploads that survive across sessions and threads. The #1 pain point with ChatGPT is upload amnesia — documents tied to a single conversation and forgotten next session. Effort: 3–4 sessions. Dependencies: Phase 4A (image search UI), Phase 5A (cross-thread search).

9A. Storage Architecture

  • Two-tier retention: ephemeral (7-day auto-clean) + persistent (indefinite, user-managed)
  • Metadata sidecar JSON (filename, tags, extracted text path, thread associations)
  • Per-user quotas: 10MB/file, 500MB/user persistent. Configurable per deployment.
  • Multi-tenant: per-user isolation + shared team library (Data/system/search/shared_uploads/)
  • Shared library: publish from personal, read-only refs, content-hash dedup, configurable quota
  • Admin storage dashboard: GET /api/search/admin/storage (per-user usage summary)

9B. Text Extraction & Indexing

  • PDF text extraction (PyMuPDF)
  • Image OCR/description (Claude vision API)
  • Extracted text stored alongside uploads, indexed for cross-thread search
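The "indexed for cross-thread search" step could start as simply as an inverted index over the extracted text. This is a sketch of the idea, not the production store; the class and method names are invented, and ranking is just matched-term count.

```python
import re
from collections import defaultdict

class UploadIndex:
    """Minimal in-memory inverted index over extracted upload text."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of upload IDs

    def add(self, upload_id: str, text: str) -> None:
        """Index the distinct terms of one document's extracted text."""
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            self._postings[term].add(upload_id)

    def search(self, query: str) -> list[str]:
        """Return upload IDs ranked by how many query terms they contain."""
        scores = defaultdict(int)
        for term in re.findall(r"[a-z0-9]+", query.lower()):
            for uid in self._postings.get(term, ()):
                scores[uid] += 1
        return sorted(scores, key=lambda u: (-scores[u], u))
```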

9C. Upload UI

  • Upload button (paperclip icon in input row) + drag-and-drop
  • Post-upload choice: "Use for this search" vs "Save to library"
  • Inline preview: PDF first page + page count, image thumbnail

9D. Document Library

  • Slide-out library panel (same pattern as history/preferences)
  • Search within library by filename, extracted text, tags
  • "Reference this" button — injects document context into next search
  • Auto-tag on upload via Haiku

9E. Cross-Session Reference

  • @-mention documents: @suntsu-contract what are the termination clauses?
  • Auto-detect document references in queries, inject extracted text as <document_context>
  • Thread association tracking (usage history in library)
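A sketch of @-mention resolution, assuming a `library` mapping of document slugs to extracted text; the regex, function name, and the choice to strip resolved mentions from the query are illustrative.

```python
import re

MENTION_RE = re.compile(r"@([\w-]+)")

def resolve_mentions(query: str, library: dict[str, str]) -> tuple[str, str]:
    """Strip known @-mentions from the query and build a <document_context> block.

    Unknown mentions are left in the query untouched.
    """
    blocks = []

    def repl(m):
        slug = m.group(1)
        if slug in library:
            blocks.append(f'<document name="{slug}">\n{library[slug]}\n</document>')
            return ""          # remove the resolved mention from the query text
        return m.group(0)      # keep unknown mentions as-is

    cleaned = MENTION_RE.sub(repl, query).strip()
    context = ""
    if blocks:
        context = "<document_context>\n" + "\n".join(blocks) + "\n</document_context>"
    return cleaned, context
```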

9F. Image Upload for Search

  • Reverse image search: upload → Claude vision describes → description enriches search
  • Mobile camera capture (accept="image/*" capture="environment")
  • Ephemeral by default, "Save to library" promotes to persistent

9G. Cleanup & Maintenance

  • Spine startup task: auto-clean ephemeral files older than 7 days
  • Storage quota enforcement on upload
  • Orphan cleanup (extracted text without matching upload)

Open questions: Document versioning (replace vs coexist), large document chunking (semantic chunk selection for 50+ page contracts), shared library moderation at 30 users, extraction cost at scale (local OCR fallback vs Claude vision), cross-deployment portability.

Phase 10: Auto-Mode Classification

Theme: Intelligent query routing — classify the user's intent and auto-select the best search mode. Effort: 1–2 sessions. Roadmap: T1-28.

10A. Query Classifier

  • Rule-based classifier: keyword patterns, question structure, temporal markers (news), product/price/review/buy signals (product), multi-source cues (research)
  • Extend existing gsAutoSuggestMode (s214) into a full classifier that runs automatically
  • Confidence scoring: only auto-route when classification confidence is high, fall back to Quick for ambiguous queries
  • Override: user can still manually select a mode to override auto-classification
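A rule-based classifier with a confidence gate could be as small as the sketch below. The signal patterns and the threshold of two matched signals are illustrative assumptions, not the actual gsAutoSuggestMode rules; the fall-back-to-Quick behavior is from the spec.

```python
import re

# Illustrative signal lists per mode; the real patterns live in gsAutoSuggestMode.
SIGNALS = {
    "news": [r"\blatest\b", r"\btoday\b", r"\bthis week\b", r"\bbreaking\b", r"\b20\d\d\b"],
    "product": [r"\bbest\b", r"\breview\b", r"\bprice\b", r"\bvs\.?\b", r"\bbuy\b"],
    "academic": [r"\bpaper\b", r"\bstudy\b", r"\bpeer.reviewed\b", r"\bcitation\b"],
    "research": [r"\bcompare\b.+\band\b", r"\bpros and cons\b", r"\bdeep dive\b"],
}

def classify(query: str, threshold: int = 2) -> tuple[str, int]:
    """Return (mode, score); fall back to Quick below the confidence threshold."""
    q = query.lower()
    scores = {mode: sum(bool(re.search(p, q)) for p in pats)
              for mode, pats in SIGNALS.items()}
    mode, score = max(scores.items(), key=lambda kv: kv[1])
    return (mode, score) if score >= threshold else ("quick", score)
```

Counting matched signals rather than taking the first hit is what makes the confidence gate possible: one stray keyword is not enough to route away from Quick.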

10B. Auto Mode UI

  • "Auto" pill in mode selector (replaces manual mode selection as default)
  • Show detected mode badge on search results (e.g. "Auto → News")
  • Persist Auto preference to localStorage

10C. Haiku Upgrade Path (optional)

  • If rule-based accuracy is insufficient, add lightweight Haiku classification call
  • Latency budget: <500ms added to search time
  • Cache frequent query patterns to avoid repeat classification calls
Brainstorm: Mode Architecture Rethink

Status: Paused. All new mode development on hold pending this session. Current modes ship as-is. This session rethinks the entire approach before building more.

Layer 1: Search Providers (tools)

| Provider | API | Strengths | Used By |
|---|---|---|---|
| Brave Web | Brave Search | Fast, broad, reliable. Backbone of every mode. | All modes |
| Brave News | Brave News | Recency-filtered, freshness params (pd/pw/pm) | News only |
| Brave Images | Brave Images | Visual results for any query | All except Quick |
| Exa | Neural search | Semantic relevance, returns full text (no extraction needed) | Deep+S, Deep+F, Product, Academic |
| Parallel | Parallel AI | Independent aggregation, different source pool | Deep+Full only |

Layer 2: Source Classifications

| Source Type | Examples | Boost Applied |
|---|---|---|
| General web | Wikipedia, blogs, forums | None (baseline) |
| News outlets | Reuters, AP, NYT, BBC | +0.3 in News mode (brave_news source tag) |
| Academic / scholarly | arxiv, PubMed, Nature, IEEE, Springer, JSTOR | +0.3 in Academic mode (16 domains) |
| Review sites | Wirecutter, RTINGS, CNET, Reddit, Amazon | +0.25 in Product mode (12 domains) |
| Technical / docs | MDN, Stack Overflow, GitHub, official docs | None (no dedicated mode yet) |
| Government / legal | regulations.gov, CourtListener, .gov sites | None (no dedicated mode yet) |
| Social / forums | Reddit, HN, X, Quora | Partial (Reddit boosted in Product) |
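Applying one of the boost values above to a merged result pool is a small re-ranking step. A sketch, assuming results carry `url` and `score` keys (names illustrative) and matching by domain suffix so subdomains also qualify:

```python
from urllib.parse import urlparse

# A subset of the 16 scholarly domains, for illustration.
ACADEMIC_DOMAINS = {"arxiv.org", "pubmed.ncbi.nlm.nih.gov", "nature.com", "ieee.org"}

def apply_domain_boost(results: list[dict], domains: set[str], boost: float) -> list[dict]:
    """Re-rank a merged result pool by adding `boost` to matching domains."""
    def host(url):
        return urlparse(url).netloc.lower().removeprefix("www.")
    def matches(url):
        h = host(url)
        return any(h == d or h.endswith("." + d) for d in domains)
    boosted = [{**r, "score": r["score"] + (boost if matches(r["url"]) else 0.0)}
               for r in results]
    return sorted(boosted, key=lambda r: -r["score"])
```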

Layer 3: Pipeline Behaviors

| Behavior | Quick | Deep+S | Deep+F | Research | News | Product | Academic |
|---|---|---|---|---|---|---|---|
| Query expansion (Haiku) | | 2 | 4 | via subs | | 2 + 5 hardcoded | 2 + 3 hardcoded |
| Page extraction | | 5 pages | 10 pages | via subs | 5 pages | 8 pages | 6 pages |
| Domain boost | | | | | news sources | review sites | scholarly domains |
| Synthesis model | Haiku | Haiku | Haiku* | Sonnet | Haiku | Sonnet | Sonnet |
| Structured output | | | | Report | Briefing | Table + pros/cons | Papers + methodology |
| Follow-up resolution | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

Observations

  • Modes are mostly combinations of the same 3 dimensions: provider mix, domain boost, and synthesis prompt
  • Deep+Summary and Deep+Full differ only in scale (more providers, more extraction, more tokens) — not in kind
  • News is the only mode with a unique provider (Brave News API)
  • Product, Academic, and any future specialty mode share the same pipeline (Brave + Exa + expand) — only the boost list and prompt differ
  • 8 pills already feels dense on mobile. Adding more specialty modes (legal, technical, social) doesn't scale as discrete pills

Questions for Brainstorm

  • Should domain boosts be composable layers instead of mode-locked? (e.g., "deep search + academic boost" vs "academic mode")
  • Could a single adaptive mode replace Quick / Deep+S / Deep+F by scaling effort based on query complexity?
  • Are specialty modes (News, Product, Academic) better as boost presets applied on top of a depth slider?
  • Should auto-classification drive the boost layer transparently, with manual override available?
  • What's the right UX: fewer pills + smarter routing, or keep pills but collapse behind a "more" menu?
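One way to make the "composable layers" question concrete: represent depth and boost as independent values that compose, instead of discrete modes. This is a brainstorm sketch under the decomposition proposed above; every name, default, and field here is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SearchConfig:
    """Hypothetical decomposition: depth and boost become independent layers."""
    providers: tuple[str, ...] = ("brave",)
    pages: int = 0
    synthesis_model: str = "haiku"
    boost_domains: frozenset[str] = frozenset()
    boost: float = 0.0

# Depth presets replacing the Quick / Deep+S / Deep+F pills.
DEPTH = {
    "quick": SearchConfig(),
    "deep": SearchConfig(providers=("brave", "exa"), pages=5),
    "full": SearchConfig(providers=("brave", "exa", "parallel"), pages=10,
                         synthesis_model="sonnet"),
}

def with_boost(base: SearchConfig, domains: frozenset[str], boost: float) -> SearchConfig:
    """Layer a specialty boost preset on top of any depth setting."""
    return SearchConfig(base.providers, base.pages, base.synthesis_model, domains, boost)

# "Academic mode" becomes "deep + scholarly boost" instead of a discrete pill:
academic_deep = with_boost(DEPTH["deep"],
                           frozenset({"arxiv.org", "pubmed.ncbi.nlm.nih.gov"}), 0.3)
```

Under this shape, auto-classification (Phase 10) would select a boost layer rather than a whole mode, and the pill problem reduces to one depth control plus optional boost chips.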

Blocked items: 6D (Recurring Search), Phase 10 (Auto-Mode), any new specialty modes. Resume after this brainstorm concludes.

Adversarial Review Record

Round 1 (s200) — 15 findings

Initial confidence: MEDIUM. All addressed.

| ID | Severity | Finding | Resolution |
|---|---|---|---|
| C-1 | Critical | 1A is a non-issue | Reduced to logging + cleanup |
| C-2 | Critical | Can't reuse search generators | Non-streaming wrappers in 3A |
| C-3 | Critical | WebSocket TTS won't work through tunnel | Switched to SSE-first |
| R-1 | Risk | Research blocks uvicorn worker | Background create_task |
| R-2 | Risk | Inline JS at breaking point | Added 1E: JS extraction |
| R-3 | Risk | Recurring search unbounded cost | Cap 5 watches, Quick only |
| R-4 | Risk | Image upload lifecycle missing | Path, retention, max size defined |
| R-5 | Risk | Haiku planner poor quality | Sonnet + quality gate |
| G-1:4 | Gap | API degradation, Dict, index, ducking | All addressed in respective phases |
| Q-1:3 | Question | Phase 6 order, export UX, build order | All resolved |

Post-Round-1 confidence: HIGH

Round 2 (s200/s201) — 12 findings

All addressed in s201 review with RG.

| ID | Severity | Finding | Resolution |
|---|---|---|---|
| C-1 | Critical | Token budget undercounts | Full budget + cost logging first |
| C-2 | Critical | No research cost cap | Semaphore, ceiling, confirmation |
| R-1 | Risk | createScriptProcessor deprecated | Migrate in 1E |
| R-2 | Risk | SSE audio may buffer | 200-500ms chunks, tunnel test |
| R-3 | Risk | Thread files unbounded | Monitoring (A) + archival (B), C roadmapped |
| R-4 | Risk | Research no lifecycle | Registry, cancel, persist partial |
| G-1:4 | Gap | BT latency, model deprecation, JS risk, Path fix | All addressed |
| Q-1:2 | Question | Brief weighting, branch counter | A+B weighting, skip counter on branch |

Post-Round-2 confidence: HIGH

Round 3 (s201) — 14 findings

Post-0C/0D additions. All addressed in s201.

| ID | Severity | Finding | Resolution |
|---|---|---|---|
| R3-1 | Critical | settings.yaml git-tracked; browser writes = merge conflicts | Separate .gitignored tuning file |
| R3-2 | Critical | No config caching; TOCTOU race mid-search | Config snapshot pattern per pipeline |
| R3-3 | Risk | _get_settings() dead broken code | Remove dead Settings() call |
| R3-4 | Risk | Archival race with save_exchange() | Archival inside save (atomic) |
| R3-5 | Risk | Defaults scattered across code + schema | Schema dict as single source of truth |
| R3-6 | Risk | Cost preview impossible to compute accurately | Label as estimates with caveat tooltip |
| R3-7 | Gap | Config API needs auth | Behind web auth middleware |
| R3-8 | Gap | File KB poor proxy for context usage | Primary metric: exchange count |
| R3-9 | Gap | Cost ceiling slider unbounded | Schema max=$5.00 |
| R3-10 | Gap | A+B brief weighting undefined | Defined inline (4 exch, 800 char) |
| R3-11 | Gap | AudioWorklet migration underscoped | Worklet file + MIME + extra time noted |
| R3-12 | Question | Config change hits in-flight search | Covered by R3-2 snapshot |
| R3-13 | Question | Model keys for unbuilt features confusing | Hide until feature ships |
| R3-14 | Question | Effort unchanged after Phase 0 doubled | Revised: 19-27 sessions total |

Post-Round-3 confidence: HIGH

Round 4 (s210) — Critical Fix, Pass 1 — 15 findings

Adversarial review of the follow-up query resolution plan. All addressed.

| ID | Severity | Finding | Resolution |
|---|---|---|---|
| R4-C1 | Critical | Haiku too weak for query resolution (Quick/Deep+S default) | Dedicated synthesis_model_resolution config, default Sonnet |
| R4-C2 | Critical | Raw query to synthesis creates semantic mismatch with resolved search results | Pass resolved query to synthesize() |
| R4-H1 | High | Token explosion in Research resolution (30 exchanges, unlimited chars = 22K tokens) | Resolution-specific truncation: 5 exchanges, 800 chars |
| R4-H2 | High | Research sub-loop accidentally triggers resolution on sub-questions | skip_resolution=True flag on sub-loop calls |
| R4-H3 | High | No handling of unrelated topics in existing threads | Topic switch prompt + user choice UX (continue/new thread) |
| R4-M1 | Medium | No suppression of "INTERPRETED AS" for similar queries | Jaccard similarity >0.85 suppresses display |
| R4-M2 | Medium | expand_query() doesn't need full conversation_context | No changes to expand_query — resolved query is sufficient |
| R4-M3 | Medium | Research final synthesis doesn't need full history | Compact preamble only |
| R4-M4 | Medium | Quick mode latency concern (~500-800ms) | Accept: correct results > fast garbage |
| R4-M5 | Medium | Need resolved_query capture pattern in pages.py | Initialize alongside collectors, use or None |
| R4-M6 | Medium | Rolling summarization needed for long threads | Summary-beyond-window for all modes with per-mode hard caps |
| R4-L1 | Low | Config toggle needed | enable_follow_up_resolution boolean |
| R4-L2 | Low | Resolution cost not tracked in metrics | Added to metrics dict + answer meta line |
| R4-L3 | Low | Old threads missing resolved_query field | Simple if guard in history rendering |
| R4-L4 | Low | save_exchange signature underspecified | resolved_query: str = "", omit when empty |

Post-Round-4 confidence: HIGH

Round 5 (s210) — Critical Fix, Pass 2 — 13 findings

Second adversarial pass after incorporating Round 4 fixes. Found subtle interaction effects. All addressed.

| ID | Severity | Finding | Resolution |
|---|---|---|---|
| R5-C1 | Critical | get_thread_context() is sync; adding async LLM call inside crashes | Split: ensure_context_summary() async + get_thread_context() stays sync |
| R5-H1 | High | LLM JSON output parsing has no fallback for malformed responses | 5-step fallback: strip fences, json.loads, regex extract, validate keys, default |
| R5-H2 | High | _gsGroupId doesn't exist in JS frontend | Just clear _gsThreadId; group derived server-side from thread |
| R5-H3 | High | Summary cache breaks when exchanges archived | Hash-based invalidation (overflow query strings + timestamps) |
| R5-H4 | High | force_continue has no frontend-to-backend plumbing | Full data flow: _gsForceResume → POST body → skip_resolution |
| R5-M1 | Medium | Topic switch resubmit SSE abort race condition | Explicit abort + null controller before re-enabling UI |
| R5-M2 | Medium | search_and_summarize() doesn't accept skip_resolution | Add param, forward to pipeline call |
| R5-M3 | Medium | Similarity check definition vague | Jaccard of lowercased word sets, threshold 0.85 |
| R5-M4 | Medium | Research planner gets full context it doesn't need | Pass search_query directly, no conversation_context param |
| R5-M5 | Medium | Dual cache split: resolution (5) and synthesis (20) different overflow | Cache keyed by max_exchanges |
| R5-M6 | Medium | Must use search_query variable after resolution in all calls | Explicit variable pattern documented in plan |
| R5-L1 | Low | Build step 4 could crash if research tested before step 8 | Add skip_resolution param to all functions in step 1 |
| R5-L2 | Low | Concurrent thread access race on save + summary | Accepted: _gsSearching UI guard prevents in normal use |

Post-Round-5 confidence: HIGH