GraySearch Feature Expansion

Competitive gap analysis → 9-phase build plan · 3x adversarial-tested

Created: s200 (2026-04-06)
Phases: 0 – 8 (9 phases)
Est. Effort: 19 – 27 sessions
Pressure Tests: 3 rounds, 41 findings

Executive Summary

Build Progress: Phases 0 + 1 complete, Phase 3 started (s205)
| Phase | Theme | Effort | Status |
|---|---|---|---|
| 0 | Context + Config Surface | 2–3 sessions | Complete (s204) |
| 1 | Polish & Quick Wins + JS Extract | 2–3 sessions | Complete (s205) |
| 2 | Voice Output (TTS) | 2–3 sessions | Not Started |
| 3 | Agentic Deep Research | 4–6 sessions | In Progress (s205) |
| 4 | Visual & Rich Content | 2–3 sessions | Not Started |
| 5 | Advanced Organization | 3–5 sessions | Not Started |
| 6 | Specialty Search | 1–2 sessions each | Not Started |
| 7 | Tabular Data & Spreadsheets | 3–5 sessions | Not Started |
| 8 | Context Intelligence | 3–5 sessions | Not Started |

Build order: Phase 0 → 1 → 3 → 2 → 4 → 5 → 6. Phase 3 comes before Phase 2 because the research agent closes the highest-value competitive gap.

Guiding Principles

1. Don't chase parity for parity's sake. Only build features that serve the actual research workflow.
2. Preserve the speed advantage. Quick mode must stay under 3s.
3. Build on what's unique. Group context, personalization, multi-provider diversity are moats.
4. Incremental value delivery. Every phase ships something usable.
Multi-Provider Search Architecture

GraySearch sends every query to three independent search providers simultaneously, merges and deduplicates results, then re-ranks the unified pool. This is the same multi-retrieval pattern used by Perplexity, Google AI Mode, and ChatGPT search.

| Provider | Strength | Index | Latency |
|---|---|---|---|
| Brave | Fastest latency, strong keyword precision, independent 30B+ page index | Own | ~670ms |
| Exa | Semantic understanding, spam filtering, high-signal authoritative content | Own | ~2s |
| Parallel | Strong accuracy-to-cost ratio, independent ranking perspective | Own | ~5–14s |

Pipeline

```
User Query
  └─ ASYNC FAN-OUT
       ├─ Brave     ~670ms · keyword-strong
       ├─ Exa       ~2s    · semantic search
       └─ Parallel  ~5–14s · independent rank
  → Merge + Dedup
  → Re-Rank → Ranked Results
  → LLM Synthesis → Answer + Citations
```
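The fan-out step can be sketched with asyncio. The provider functions, result shapes, and timeouts below are illustrative stand-ins, not the real client code:

```python
import asyncio

# Stand-in provider calls -- the real versions hit the Brave, Exa, and
# Parallel HTTP APIs. Result shapes here are invented for illustration.
async def search_brave(q: str) -> list[dict]:
    return [{"url": "https://a.com", "provider": "brave"}]

async def search_exa(q: str) -> list[dict]:
    return [{"url": "https://a.com", "provider": "exa"}]

async def search_parallel(q: str) -> list[dict]:
    return [{"url": "https://b.com", "provider": "parallel"}]

# (coroutine, timeout in seconds) -- rough figures from the latency table
PROVIDERS = [(search_brave, 2.0), (search_exa, 5.0), (search_parallel, 15.0)]

async def fan_out(query: str) -> list[dict]:
    """Query all providers concurrently with per-provider timeouts, so a
    slow or failing provider degrades results instead of blocking the search."""
    async def guarded(fn, timeout):
        try:
            return await asyncio.wait_for(fn(query), timeout)
        except Exception:
            return []  # timeout or provider error: drop this pool, don't fail
    pools = await asyncio.gather(*(guarded(fn, t) for fn, t in PROVIDERS))
    return [r for pool in pools for r in pool]  # merged pool, ready for dedup

results = asyncio.run(fan_out("best espresso machine"))
```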

Why Three Providers?

| Benefit | Mechanism |
|---|---|
| Better recall | Three indexes catch what one misses |
| Better precision | Cross-provider agreement filters noise |
| Resilience | If one API goes down, the other two still work |
| Speed | Async fan-out = as fast as the fastest provider (with timeouts) |
| No vendor lock-in | Can swap providers without rewriting the system |
| Quality signal | Dedup overlap acts as an implicit relevance vote |
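A minimal sketch of the merge/dedup vote (essentially reciprocal-rank fusion with an agreement bonus). The scoring weights are invented for illustration; the real re-ranker may weigh things differently:

```python
from urllib.parse import urlsplit

def merge_and_dedup(pools: dict[str, list[dict]]) -> list[dict]:
    """Merge per-provider result pools. A URL returned by several
    independent providers gets an agreement boost -- the implicit
    relevance vote described above. Weights are illustrative."""
    seen: dict[str, dict] = {}
    for provider, results in pools.items():
        for rank, r in enumerate(results):
            # normalize the URL (drop fragment) so near-duplicates collapse
            key = urlsplit(r["url"])._replace(fragment="").geturl()
            entry = seen.setdefault(key, {**r, "providers": set(), "score": 0.0})
            entry["providers"].add(provider)
            entry["score"] += 1.0 / (rank + 1)          # reciprocal-rank term
    for e in seen.values():
        e["score"] += 0.5 * (len(e["providers"]) - 1)   # cross-provider bonus
    return sorted(seen.values(), key=lambda e: -e["score"])

merged = merge_and_dedup({
    "brave":    [{"url": "https://a.com/x", "title": "X"}],
    "exa":      [{"url": "https://a.com/x", "title": "X"},
                 {"url": "https://b.com/y", "title": "Y"}],
    "parallel": [{"url": "https://b.com/y", "title": "Y"}],
})
```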

Benchmark data (2025–2026) shows the top four search APIs are statistically indistinguishable on quality when evaluated individually. The winning strategy is to combine multiple providers and let the ensemble outperform any single one.

Already Built (Pre-Plan)

GraySearch core was built across s197–s200 before this expansion plan was created.

s201: Plan page, /plans index, inbox replay fix, config schema (38 keys), settings cleanup, thread archival + health logging, defaults unified.
s204: Phase 0 complete (0A-0D). Per-mode context scaling, XML-tagged exchanges, full config surface + tuning panel, metrics logging + rolling averages, limit warnings, mode badges, Opus option, table rendering, JS extraction to static file, validation hooks.

Table of Contents

  1. Executive Summary
  2. Multi-Provider Search Architecture
  3. Already Built (Pre-Plan)
  4. Phase 0A: Per-Mode Context Scaling
  5. Phase 0B: Context Format Upgrade
  6. Phase 0C: Unified Config Surface
  7. Phase 0D: Config UI (Tuning Panel)
  8. Phase 1A: Observability & Cleanup
  9. Phase 1B: Export / Report Generation
  10. Phase 1C: Search Progress Enhancement
  11. Phase 1D: Search Result Previews
  12. Phase 1E: Extract JS to Static File
  13. Phase 2: Voice Output (TTS)
  14. Phase 3A: Research Agent Architecture
  15. Phase 3B: Progress Streaming
  16. Phase 3C: Report Generation
  17. Phase 3D: UI Integration
  18. Phase 4: Visual & Rich Content
  19. Phase 5: Advanced Organization
  20. Phase 6: Specialty Search
  21. Phase 7: Tabular Data & Spreadsheet Intelligence
  22. Phase 8: Context Intelligence
  23. Adversarial Review Record
Phase 0A: Per-Mode Context Scaling

Replace the single max_exchanges=5 / 600-char truncation with a per-mode strategy. Current usage is only 1.6–8.2% of the 200K context window.

| Mode | max_exchanges | answer_truncation | Token Budget |
|---|---|---|---|
| Quick | 5 | 800 chars | ~1,000 tokens |
| Deep+Summary | 10 | 2,000 chars | ~5,000 tokens |
| Deep+Full | 20 | 4,000 chars | ~10,000 tokens |
| Research | 30 | No truncation | ~15,000 tokens |
  • Refactor get_thread_context() to accept max_exchanges + max_answer_chars params (s204)
  • Route handler passes mode-appropriate limits from cfg (s204)
  • Add input token logging: synthesis + expand_query (s204)
  • Quick mode query unchanged (<600 extra chars, preserves <3s target)
  • Per-search metrics logging -- rolling 20/mode to search_metrics.json (s204)
  • Rolling averages in tuning panel (muted orange, per applicable control) (s204)
  • Graceful limit handling -- amber inline warnings when limits hit (s204)
  • Opus model option + Basic/Advanced tier toggle + descriptions (s204)
  • Mode-colored labels in tuning panel matching inline badge colors (s204)
  • Modified-from-default indicator (green *) on changed values (s204)
  • Per-field tradeoff descriptions with click-to-expand (s204)
  • Averages expanded to cover all 38 config fields (s204)
  • group_context_chars metric added to all pipelines (s204)
Round 2 C-1: Token budget estimates measure conversation context ONLY. Full prompt = system (~200 tok) + user context (~750) + group context (~500-700) + search passages (up to ~10,000) + conversation context. Research synthesis could reach 30,000+ tokens ($0.10-0.50). Token+cost logging must ship before expanding limits.
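A back-of-envelope estimator for the full prompt budget C-1 describes. The component sizes mirror the note above, but the 4-chars-per-token ratio is a crude assumption, not a measured value:

```python
CHARS_PER_TOKEN = 4  # crude heuristic, not a measured ratio

FIXED_TOKENS = {
    "system": 200,         # system prompt (~200 tok, per the note above)
    "user_context": 750,   # user context
    "group_context": 600,  # midpoint of the ~500-700 range
}

def estimate_prompt_tokens(conversation_chars: int, passage_chars: int = 0) -> int:
    """Sum every prompt component, not just conversation context."""
    return (sum(FIXED_TOKENS.values())
            + passage_chars // CHARS_PER_TOKEN
            + conversation_chars // CHARS_PER_TOKEN)

# Research mode: ~60K chars of untruncated exchanges plus ~10K tokens of
# search passages lands well above any single per-mode budget in the table.
tokens = estimate_prompt_tokens(conversation_chars=60_000, passage_chars=40_000)
```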
Phase 0B: Conversation Context Format Upgrade

Upgrade from plain User:/Assistant: to XML-tagged exchanges with mode and citations.

  • XML-tagged exchanges with mode attribute (s204)
  • Include exchange mode tag (quick vs deep calibration) (s204)
  • Include citation URLs in <sources> block, top 5 per exchange (s204)
  • Mode badge on each response (color-coded top + bottom with token counts) (s204)
  • Improved thread title generation (few-shot prompt, answer-rejection guard) (s204)
  • Research mode button placeholder (disabled, Phase 3) (s204)
  • Color-coded mode selector buttons (s204)
  • Markdown renderer: tables, ### headings, --- dividers, tighter spacing (s204)
Example context block:

```xml
<exchange n="1" mode="quick">
<query>Best espresso machine under $500?</query>
<answer>The Breville Barista Express...</answer>
<sources><url>https://example.com/review</url></sources>
</exchange>
```
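A small formatter matching that shape. This is a sketch only; the real builder lives in the thread-context code, and the helper name is hypothetical:

```python
from html import escape

def format_exchange(n: int, mode: str, query: str, answer: str,
                    urls: list[str]) -> str:
    """Render one exchange in the XML-tagged format shown above."""
    sources = "".join(f"<url>{escape(u)}</url>" for u in urls[:5])  # top 5
    return (f'<exchange n="{n}" mode="{mode}">\n'
            f"<query>{escape(query)}</query>\n"
            f"<answer>{escape(answer)}</answer>\n"
            f"<sources>{sources}</sources>\n"
            f"</exchange>")

xml = format_exchange(1, "quick", "Best espresso machine under $500?",
                      "The Breville Barista Express...",
                      ["https://example.com/review"])
```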
Phase 0C: Unified Config Surface ("Sliders")

Centralize all tunable limits. Code defaults in git-tracked settings.yaml. Browser-written overrides in .gitignored config/graysearch_tuning.yaml. Runtime merges both, tuning takes precedence. Config snapshot pattern prevents mid-search TOCTOU races.
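The merge can be a plain deep-merge where tuning wins. A sketch under the assumption that both YAML files have already been parsed into dicts (the key names are an illustrative subset):

```python
import copy

DEFAULTS = {  # git-tracked settings.yaml values (illustrative subset)
    "graysearch": {"quick": {"max_exchanges": 5, "max_answer_chars": 800},
                   "research": {"max_rounds": 5}},
}

def merged_config(defaults: dict, tuning: dict) -> dict:
    """Deep-merge browser-written tuning overrides onto code defaults;
    tuning wins. Returns a fresh dict so a pipeline can snapshot it once
    and ignore later edits (the mid-search TOCTOU guard above)."""
    out = copy.deepcopy(defaults)
    def merge(dst: dict, src: dict) -> None:
        for k, v in src.items():
            if isinstance(v, dict) and isinstance(dst.get(k), dict):
                merge(dst[k], v)      # recurse into nested groups
            else:
                dst[k] = v            # override wins at the leaf
    merge(out, tuning)
    return out

tuning = {"graysearch": {"quick": {"max_exchanges": 8}}}
cfg = merged_config(DEFAULTS, tuning)  # snapshot passed downstream once
```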

| Group | Keys | Examples |
|---|---|---|
| Context Limits | 8 | max_exchanges, max_answer_chars per mode |
| Models | 6 | synthesis model per mode, planner, quality gate |
| Token/Cost | 7 | max_tokens per mode, cost ceiling, Brave rate limit |
| Research Agent | 4 | max_rounds, sub_questions, wall time, concurrency |
| Search Providers | 8 | timeouts, max results, max extract pages |
| Auto-Brief | 4 | exchanges/thread, truncation (normal vs research) |
| Thread Health | 1 | size warning threshold (KB) |
  • Add all 38 schema keys to settings.yaml under graysearch: (s204)
  • Create config/graysearch_tuning.yaml (.gitignored) for browser overrides (s204)
  • Update _get_settings(): merge defaults + tuning overrides
  • Remove dead Settings() no-arg call from _get_settings()
  • Build GRAYSEARCH_CONFIG_SCHEMA (38 keys, 8 groups) as single source of truth
  • Config snapshot: pipelines call _get_settings() once, pass cfg downstream (s204)
  • All cfg.get() fallbacks reference _default() from schema
  • Replace Path(__file__).parent.parent with env var (s204)
  • Hide config keys for unbuilt features until they ship (s204, R3-13)
Phase 0D: Config UI (Tuning Panel)

In-browser config editor on the GraySearch page. Gear icon opens settings panel. Both API endpoints behind web auth. Cost previews labeled as estimates with tooltip caveat.

| Config Type | Control | Example |
|---|---|---|
| Integer limits | Slider + stepper | 5 [---o-----] 30 |
| Cost ceilings | Stepper ($0.05) | $0.50 [-] [+] |
| Model selection | Dropdown | [claude-haiku-4-5 v] |
| 0 = unlimited | Toggle + stepper | [x] Limit: 4000 |
  • GET /api/search/config returns config + schema metadata (s204)
  • POST /api/search/config validates + merges overrides into tuning YAML (s204)
  • POST /api/search/config/reset clears all overrides (s204)
  • Config schema with type/range validation (s201)
  • Grouped sections, auto-generated from schema (s204)
  • Live cost/token impact preview (deferred — averages in tuning panel serve this need)
  • Instant apply -- no restart needed (s201)
  • "Reset all" button + modified values highlighted green (s204)
  • Tuning panel via hammer icon in header (s204)
  • JS extracted to static/js/search.js (no more {{}} escaping) (s204)
  • PostToolUse hook for rendered JS validation on views/*.py (s204)
  • Pre-restart validation: scripts/validate_views.py (s204)
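The type/range validation the POST endpoint performs might look like the following; the schema slice and key names are invented for illustration, not copied from GRAYSEARCH_CONFIG_SCHEMA:

```python
# Hypothetical slice of the schema: each key carries type and range
# metadata used by both validation and the auto-generated UI controls.
SCHEMA = {
    "quick_max_exchanges": {"type": int, "min": 1, "max": 30, "default": 5},
    "research_cost_ceiling": {"type": float, "min": 0.05, "max": 5.00,
                              "default": 0.50},
    "synthesis_model": {"type": str,
                        "choices": ["claude-haiku-4-5", "claude-sonnet-4-5"],
                        "default": "claude-haiku-4-5"},
}

def validate(key: str, value):
    """Return (ok, error) for one override, mirroring what a config
    POST handler would check before merging into the tuning YAML."""
    spec = SCHEMA.get(key)
    if spec is None:
        return False, f"unknown key: {key}"
    if not isinstance(value, spec["type"]):
        return False, f"{key}: expected {spec['type'].__name__}"
    if "choices" in spec and value not in spec["choices"]:
        return False, f"{key}: not an allowed choice"
    if "min" in spec and not (spec["min"] <= value <= spec["max"]):
        return False, f"{key}: out of range"
    return True, None
```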
Phase 1A: Observability & Cleanup
  • Add log.info for model/mode in synthesize() (s204)
  • Per-search cost logging: input_tokens, output_tokens, model, cost (s204)
  • Fix Dict[tuple, Any] type annotation (s204)
  • Fix _REPO_ROOT = Path(__file__).parent.parent (s204)

Thread Health Monitoring

  • Log file size + exchange count on every save_exchange()
  • Primary: exchange count color dot (green <10, yellow 10-20, red >20) (s205)
  • Thread list shows exchange count indicator per thread (s205)
  • MCP get_stack_status includes thread health summary (deferred — operational tooling)
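The color-dot thresholds above map to a trivial function (a sketch; the real UI code may differ):

```python
def health_color(exchange_count: int) -> str:
    """Thread health dot: green <10, yellow 10-20, red >20 exchanges."""
    if exchange_count < 10:
        return "green"
    return "yellow" if exchange_count <= 20 else "red"
```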

Thread Archival

  • Archival runs inside save_exchange() (atomic, no race conditions)
  • After N exchanges (configurable, default 20), move older to archive
  • load_thread_full() for complete history (deferred — no threads near archive threshold)

Roadmap: Per-exchange storage (solution C) if archival proves insufficient.

Phase 1B: Export / Report Generation

Markdown Export (MVP)

  • GET /api/search/thread/{id}/export?format=md (s205)
  • Title as H1, exchanges as H2, citations as footnotes (s205)
  • Export button on thread bar + mobile share sheet (s205)
  • File named {title}_{date}.md (s205)
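A sketch of the Markdown renderer, assuming a simple thread dict (title, exchanges with query/answer/sources). To keep the example short, footnote entries are appended at the end rather than wired to inline markers:

```python
from datetime import date

def export_markdown(thread: dict) -> tuple[str, str]:
    """Render a thread as Markdown (title as H1, exchanges as H2,
    citations as footnotes) and build the {title}_{date}.md filename."""
    lines = [f"# {thread['title']}", ""]
    notes: list[str] = []
    for ex in thread["exchanges"]:
        lines += [f"## {ex['query']}", "", ex["answer"], ""]
        for url in ex.get("sources", []):
            notes.append(f"[^{len(notes) + 1}]: {url}")
    lines += notes
    safe_title = thread["title"].replace(" ", "_")
    return "\n".join(lines), f"{safe_title}_{date.today().isoformat()}.md"

md, fname = export_markdown({
    "title": "Espresso Research",
    "exchanges": [{"query": "Best machine under $500?",
                   "answer": "The Breville Barista Express...",
                   "sources": ["https://example.com/review"]}],
})
```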

HTML Export (stretch)

  • Same endpoint with format=html, print-friendly (deferred — Markdown covers the need)

Platform UX: Desktop: browser download. Mobile (Safari): navigator.share() with fallback.

Phase 1C: Search Progress Enhancement
  • During expanding: yield sub-queries as detail line (s205)
  • During reading: yield URLs, show unique domains (s205)
  • During searching: show providers ("Searching Brave + Exa...") (s205)
  • Elapsed time display (running timer, 500ms update) (s205)
Phase 1D: Search Result Previews
  • Preview cards: favicon + title + domain + date + snippet (2-line clamp) (s205)
  • Cards collapse to compact chips on synthesis start (s205)
  • Mobile: 44px min-height, vertical stack (s205)
Phase 1E: Extract JS to Static File

Prerequisite for Phase 2+. views/search.py held 1,113 lines of double-brace-escaped JS embedded in Python template strings.

  • Extract search JS into static/js/search.js (s204)
  • PostToolUse validation hook + validate_views.py (s204)
  • Extract remaining JS from willy.py, pages.py, dashboard.py (deferred — S-14)
  • Migrate createScriptProcessor → AudioWorkletNode (deferred — Phase 2 prerequisite)
Phase 2: Voice Output (TTS)

Complete the voice loop. SSE with base64 audio chunks (proven tunnel-compatible).

2A. TTS Provider

  • Evaluate: Deepgram Aura, ElevenLabs, OpenAI TTS
  • Criteria: <500ms TTFB, natural voice, <$0.01/search
  • Build lib/tts.py with streaming audio

2B. Streaming Pipeline

  • SSE audio_chunk events (base64, OGG/Opus, 200-500ms)
  • Web Audio API playback with decode + queue
  • Tap/click interrupt + tunnel latency test
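The SSE framing for audio can be sketched as an async generator. Text-only frames with base64 payloads are what survived the tunnel where WebSockets failed (Round 1 C-3); the fake TTS source and chunk contents below are placeholders:

```python
import asyncio
import base64
import json

async def audio_chunk_events(chunks):
    """Wrap raw OGG/Opus chunks as SSE `audio_chunk` events with
    base64 payloads, ready to write to an SSE response."""
    async for chunk in chunks:
        payload = json.dumps({"audio": base64.b64encode(chunk).decode("ascii")})
        yield f"event: audio_chunk\ndata: {payload}\n\n"

async def _demo():
    async def fake_tts():  # stand-in for the streaming TTS provider
        for c in (b"OggS...frame1", b"OggS...frame2"):  # ~200-500ms each
            yield c
    return [e async for e in audio_chunk_events(fake_tts())]

events = asyncio.run(_demo())
```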

2C. Voice Flow

  • "Voice mode" toggle + auto-listen
  • Audio ducking: configurable delay, Bluetooth auto-detect
  • Voice preferences in search_preferences.md

2D. Smart TTS

  • Skip citations/URLs, strip markdown
  • Long answers: TTS first 2-3 paragraphs, ask to continue
Phase 3A: Research Agent Architecture

Multi-step autonomous research via non-streaming wrappers over existing search functions.

User query → [Planner/Sonnet] → [Quality Gate/Haiku]
  → [Research Loop] → [Synthesizer] → [Structured Report]

Search Wrappers

  • search_and_summarize(): consumes async generator, returns dict (s205)

Cost Control (Round 2 C-2)

  • Shared Brave rate limiter (asyncio.Semaphore)
  • Per-research cost ceiling (default $0.50) (s205)
  • Hard cap: research_max_brave_calls (default 20) (s205)
  • Cost estimate shown before research starts
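One way to structure those controls; the names and defaults mirror the bullets above, but the ledger class itself is hypothetical:

```python
import asyncio

BRAVE_SEMAPHORE = asyncio.Semaphore(1)  # shared Brave rate limiter

class CostLedger:
    """Accumulates spend for one research run and enforces both the
    cost ceiling and the hard Brave-call cap (defaults from the plan)."""
    def __init__(self, ceiling: float = 0.50, max_brave_calls: int = 20):
        self.ceiling = ceiling
        self.max_brave_calls = max_brave_calls
        self.spent = 0.0
        self.brave_calls = 0

    def charge(self, cost: float, brave_call: bool = False) -> bool:
        """Record a cost; return False when the run must stop."""
        self.spent += cost
        self.brave_calls += brave_call
        return (self.spent <= self.ceiling
                and self.brave_calls <= self.max_brave_calls)
```

Each Brave request would be wrapped in `async with BRAVE_SEMAPHORE:` so concurrent sub-questions cannot exceed the provider's rate limit.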

Lifecycle (Round 2 R-4)

  • Cancellation flag via asyncio.Event (s206)
  • Concurrent limit config: research_concurrent_limit (s205)
  • Cancellation on tab close (beforeunload)
  • Persist partial findings to disk

Agent Loop

  • SSE streaming (non-blocking via async generator) (s205)
  • Sonnet planner + Haiku quality gate (s205)
  • Max 5 rounds, 5 min wall time, 3-8 sub-questions (s205)
  • Per-sub-question mode selection (quick vs deep_summary) (s205)
  • Structured scratchpad per sub-question (s205)
Phase 3B: Progress Streaming
  • SSE research_progress event (step, total, sub_question, status) (s205)
  • Vertical timeline with status indicators (pending/spinner/check/fail/skipped) (s206)
  • Running timer + step counter in panel header (s206)
  • "Stop and summarize" button + POST /api/search/research/cancel (s206)
  • "Also consider..." redirect input (mid-research constraint injection)
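The research_progress frame is ordinary SSE; the field names follow the bullet above, while the helper itself is a sketch:

```python
import json

def research_progress(step: int, total: int, sub_question: str,
                      status: str) -> str:
    """One SSE `research_progress` frame driving the timeline UI.
    status: pending | spinner | check | fail | skipped."""
    data = json.dumps({"step": step, "total": total,
                       "sub_question": sub_question, "status": status})
    return f"event: research_progress\ndata: {data}\n\n"

frame = research_progress(2, 5, "What drives espresso machine prices?",
                          "spinner")
```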
Phase 3C: Report Generation
  • Sonnet final synthesis from all findings (s205)
  • Structured report: Summary, Findings, Open Questions, Sources (s205)
  • Saved as thread with mode: "research" (s205)
  • Auto-export to group directory
  • Brief weighting config: 4 exchanges/thread, 800-char truncation (s205)

Roadmap: Separate brief section (C) after evaluating real output.

Phase 3D: Research UI Integration
  • Fourth mode pill: "Research" (green, #10b981) (s205)
  • First-use explainer via localStorage (s206)
  • Full-width report card (.gs-report, 95% width, green border) (s206)
  • "Dig deeper" button on subsection headings (switches to Deep+Full) (s206)
Phase 4: Visual & Rich Content

4A. Image Search

  • Brave Image API + grid rendering
  • Image upload for reverse search (Claude vision)
  • Storage: 7-day retention, 10MB max

4B. Rich Result Cards

  • Weather, quick facts, comparison tables, timelines

4C. Location-Aware

  • Location preference + browser geolocation
  • "Near me" detection triggers location injection
Phase 5: Advanced Organization

5A. Cross-Thread Search

  • GET /api/search/corpus with scoring
  • Lazy-load index, JSON persist, survives restarts
  • "Search within this group" filter

5B. Research Notebook

  • Pin answers + named collections
  • Collection export + auto-suggest pins

5C. Drag-and-Drop Thread Organization

  • Desktop: HTML5 DnD with drag handles + drop targets on group headers
  • Drop targets: group headers glow on dragover, "Recent" = unassign
  • Thread reorder within groups (thread_order in groups.json)
  • Group reorder (group_order in groups.json)
  • Backend: POST /api/search/groups/{id}/reorder + POST /api/search/groups/reorder
  • Mobile: keep context menu flow (no DnD) — evaluate touch DnD as follow-on

Reuses existing api_search_group_assign for moves — no new backend for basic group assignment. Sort order APIs are new. Desktop-only initially; mobile DnD (long-press or polyfill) evaluated after desktop ships.

5D. Branched Conversations

  • "Branch here" on any exchange
  • Branch metadata + group inheritance
  • Branch does NOT bump brief counter
Phase 6: Specialty Search

6A. Product Research

  • Product query detection + review site enrichment
  • Comparison synthesis (pros/cons/price/verdict)

6B. Academic/Technical

  • Semantic Scholar API + citation scoring

6C. News Mode

  • Brave News API + recency-first sorting
  • Timeline rendering + "Follow this topic"

6D. Recurring Search

  • "Watch this" (max 5, Quick only, cost estimate)
  • 6-hour re-run + URL dedup + unread badges
Phase 7: Tabular Data & Spreadsheet Intelligence

Theme: Accept, analyze, transform, and export structured data across all modes. Dedicated Data mode for analysis-heavy workflows. Effort: 3–5 sessions.

7A. Planning & Requirements

  • Competitive analysis (ChatGPT, Gemini, Copilot tabular UX)
  • Catalog RG's actual tabular workflows from ChatGPT history
  • Formula scope ranking by usage
  • Adversarial review of spec

7B. Tabular Input (all modes)

  • Paste detection (TSV/CSV) with table preview
  • Context injection as fenced CSV block
  • File upload: CSV (client-side) + Excel (openpyxl)
  • Size limits (~5K rows / 500KB)
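Paste detection can lean on the stdlib csv.Sniffer, restricted to tab and comma; size limits would be enforced before parsing. A sketch:

```python
import csv
import io

def detect_table(pasted: str):
    """Guess whether pasted text is TSV/CSV and parse it for the
    preview. Returns rows, or None when the text isn't tabular."""
    sample = pasted[:2048]
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters="\t,")
    except csv.Error:
        return None  # no consistent delimiter: treat as plain text
    rows = list(csv.reader(io.StringIO(pasted), dialect))
    # require at least 2 rows x 2 columns to call it a table
    return rows if len(rows) > 1 and len(rows[0]) > 1 else None

rows = detect_table("name\tprice\nGaggia\t449\nBreville\t699")
```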

7C. Tabular Output & Export (all modes)

  • CSV download button on each rendered table
  • Copy table as TSV to clipboard
  • Excel export (.xlsx via openpyxl)
  • Multi-table support + "Download all"

7D. Formula Generation

  • Synthesis prompt for formula requests (Excel vs Sheets toggle)
  • Monospace code blocks with copy button + explanation
  • All major categories: lookup, conditional, financial, array, text, date
  • Optional formula validation (verify output)

7E. Data Mode (dedicated)

  • "Data" mode pill — no web search, direct Claude analysis
  • Specialized analysis prompt (stats, insights, suggest visualizations)
  • Multi-turn analysis with table context carried in thread
  • Computed columns: generates formula AND fills values

7F. Future (not building yet)

Chart generation, Google Sheets integration, SQL-like queries, pivot table builder, data persistence across sessions.

Phase 8: Context Intelligence (Active + Passive Learning)

Theme: Make GraySearch progressively smarter about user preferences and research patterns. Active interviews + passive extraction + enhanced auto-briefs. Effort: 3–5 sessions.

8A. Planning & Requirements

  • Audit current context injection chain (user, project, thread, conversation)
  • Catalog preference types (source, format, domain, constraint, fact)
  • Review ChatGPT memory system (learn from their mistakes)
  • Adversarial review of spec

8B. Passive Preference Extraction (all modes)

  • Post-synthesis Haiku extraction: 0-3 new preferences per exchange
  • Category tagging: format, source, domain, constraint, fact
  • Dedup + merge against existing search_preferences.md
  • Staleness handling: timestamp entries, replace contradictions
  • Transparency: extracted prefs visible/editable in Preferences panel
  • Kill switch in tuning panel (on for Deep modes, off for Quick)
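A sketch of the dedup/merge step, assuming extracted preferences arrive as dicts with category/key/value fields; "contradiction" is simplified here to same category+key with a different value, which is looser than a real implementation would be:

```python
from datetime import datetime, timezone

def merge_preferences(existing: list[dict],
                      extracted: list[dict]) -> list[dict]:
    """Merge newly extracted preferences into the stored list.
    Exact duplicates are dropped; a new value for an existing
    category+key replaces the old entry and refreshes its timestamp."""
    now = datetime.now(timezone.utc).isoformat()
    merged = {(p["category"], p["key"]): p for p in existing}
    for p in extracted:
        slot = (p["category"], p["key"])
        old = merged.get(slot)
        if old and old["value"] == p["value"]:
            continue                          # exact dup: keep old timestamp
        merged[slot] = {**p, "updated": now}  # new or contradicting: replace
    return list(merged.values())

prefs = merge_preferences(
    [{"category": "format", "key": "tables",
      "value": "prefers tables", "updated": "2026-01-01"}],
    [{"category": "format", "key": "tables", "value": "prefers bullet lists"},
     {"category": "source", "key": "reddit", "value": "avoid reddit"}],
)
```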

8C. Active Context Interview (triggered)

  • Trigger: button in Project Notes + thread context menu + proactive suggestion
  • 3-phase flow: confirm existing → expand with probes → identify gaps
  • Questions displayed inline (conversation area, not modal)
  • Output: updated notes + extracted preferences + user review
  • Persist interview state for resume across sessions
  • Re-interview suggestion after 10+ new exchanges

8D. Thread-Level Context

  • Per-thread notes field (editable via context menu)
  • Thread auto-brief: full trajectory summary (not just recent)
  • Thread context injected into synthesis alongside project context

8E. Enhanced Auto-Brief

  • Dual-output: findings + preferences + open questions
  • Cross-project pattern extraction to global search_preferences.md

8F. Future (not building yet)

Preference confidence scoring, conflict detection, onboarding interview, preference analytics dashboard.

Adversarial Review Record

Round 1 (s200) — 15 findings

Initial confidence: MEDIUM. All addressed.

| ID | Severity | Finding | Resolution |
|---|---|---|---|
| C-1 | Critical | 1A is a non-issue | Reduced to logging + cleanup |
| C-2 | Critical | Can't reuse search generators | Non-streaming wrappers in 3A |
| C-3 | Critical | WebSocket TTS won't work through tunnel | Switched to SSE-first |
| R-1 | Risk | Research blocks uvicorn worker | Background create_task |
| R-2 | Risk | Inline JS at breaking point | Added 1E: JS extraction |
| R-3 | Risk | Recurring search unbounded cost | Cap 5 watches, Quick only |
| R-4 | Risk | Image upload lifecycle missing | Path, retention, max size defined |
| R-5 | Risk | Haiku planner poor quality | Sonnet + quality gate |
| G-1:4 | Gap | API degradation, Dict, index, ducking | All addressed in respective phases |
| Q-1:3 | Question | Phase 6 order, export UX, build order | All resolved |

Post-Round-1 confidence: HIGH

Round 2 (s200/s201) — 12 findings

All addressed in s201 review with RG.

| ID | Severity | Finding | Resolution |
|---|---|---|---|
| C-1 | Critical | Token budget undercounts | Full budget + cost logging first |
| C-2 | Critical | No research cost cap | Semaphore, ceiling, confirmation |
| R-1 | Risk | createScriptProcessor deprecated | Migrate in 1E |
| R-2 | Risk | SSE audio may buffer | 200-500ms chunks, tunnel test |
| R-3 | Risk | Thread files unbounded | Monitoring (A) + archival (B), C roadmapped |
| R-4 | Risk | Research no lifecycle | Registry, cancel, persist partial |
| G-1:4 | Gap | BT latency, model deprecation, JS risk, Path fix | All addressed |
| Q-1:2 | Question | Brief weighting, branch counter | A+B weighting, skip counter on branch |

Post-Round-2 confidence: HIGH

Round 3 (s201) — 14 findings

Post-0C/0D additions. All addressed in s201.

| ID | Severity | Finding | Resolution |
|---|---|---|---|
| R3-1 | Critical | settings.yaml git-tracked; browser writes = merge conflicts | Separate .gitignored tuning file |
| R3-2 | Critical | No config caching; TOCTOU race mid-search | Config snapshot pattern per pipeline |
| R3-3 | Risk | _get_settings() dead broken code | Remove dead Settings() call |
| R3-4 | Risk | Archival race with save_exchange() | Archival inside save (atomic) |
| R3-5 | Risk | Defaults scattered across code + schema | Schema dict as single source of truth |
| R3-6 | Risk | Cost preview impossible to compute accurately | Label as estimates with caveat tooltip |
| R3-7 | Gap | Config API needs auth | Behind web auth middleware |
| R3-8 | Gap | File KB poor proxy for context usage | Primary metric: exchange count |
| R3-9 | Gap | Cost ceiling slider unbounded | Schema max=$5.00 |
| R3-10 | Gap | A+B brief weighting undefined | Defined inline (4 exch, 800 char) |
| R3-11 | Gap | AudioWorklet migration underscoped | Worklet file + MIME + extra time noted |
| R3-12 | Question | Config change hits in-flight search | Covered by R3-2 snapshot |
| R3-13 | Question | Model keys for unbuilt features confusing | Hide until feature ships |
| R3-14 | Question | Effort unchanged after Phase 0 doubled | Revised: 19-27 sessions total |

Post-Round-3 confidence: HIGH