Competitive gap analysis → 7-phase build plan + critical fix · 5x adversarial-tested
| Phase | Theme | Effort | Status |
|---|---|---|---|
| 0 | Context + Config Surface | 2–3 sessions | Complete (s204) |
| 1 | Polish & Quick Wins + JS Extract | 2–3 sessions | Complete (s205) |
| 2 | Voice Output (TTS) | 2–3 sessions | 2A-2D built, browser testing pending |
| 3 | Agentic Deep Research | 4–6 sessions | Complete (s205–s206) |
| CF | Follow-Up Query Resolution (Critical Fix) | 2–3 sessions | Complete (s210–s211) |
| 4 | Visual & Rich Content | 2–3 sessions | 4B+4C done, 4A in progress (s212) |
| 5 | Advanced Organization | 3–5 sessions | Not Started |
| 6 | Specialty Search | 1–2 each | Not Started |
| 7 | Tabular Data & Spreadsheets | 3–5 sessions | Not Started |
| 8 | Context Intelligence | 3–5 sessions | Not Started |
Build order: Phase 0 → 1 → 3 → 2 → CF (Critical Fix) → 4 → 5 → 6. CF jumps the queue — follow-up queries are fundamentally broken without query resolution.
GriSearch sends every query to three independent search providers simultaneously, merges and deduplicates results, then re-ranks the unified pool. This is the same multi-retrieval pattern used by Perplexity, Google AI Mode, and ChatGPT search.
| Provider | Strength | Index | Latency |
|---|---|---|---|
| Brave | Fastest latency, strong keyword precision, independent 30B+ page index | Own | ~670ms |
| Exa | Semantic understanding, spam filtering, high-signal authoritative content | Own | ~2s |
| Parallel | Strong accuracy-to-cost ratio, independent ranking perspective | Own | ~5–14s |
| Benefit | Mechanism |
|---|---|
| Better recall | Three indexes catch what one misses |
| Better precision | Cross-provider agreement filters noise |
| Resilience | If one API goes down, the other two still work |
| Speed | Async fan-out = as fast as the fastest provider (with timeouts) |
| No vendor lock-in | Can swap providers without rewriting the system |
| Quality signal | Dedup overlap acts as an implicit relevance vote |
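The fan-out, merge, and dedup flow above can be sketched with asyncio. The provider stubs, result fields, and 2 s timeout below are illustrative assumptions, not GriSearch's actual code:

```python
import asyncio

# Hypothetical provider stubs — the real system calls the Brave, Exa,
# and Parallel APIs; latencies here are only simulated.
async def fetch_brave(q):
    await asyncio.sleep(0.05)
    return [{"url": "https://a.com", "title": "A"}]

async def fetch_exa(q):
    await asyncio.sleep(0.10)
    return [{"url": "https://b.com", "title": "B"}]

async def fetch_parallel(q):
    await asyncio.sleep(0.20)
    return [{"url": "https://a.com", "title": "A2"}]

async def fan_out(query: str, timeout: float = 2.0) -> list[dict]:
    """Query all providers concurrently; drop any that time out or fail."""
    tasks = [fetch_brave(query), fetch_exa(query), fetch_parallel(query)]
    results = await asyncio.gather(
        *(asyncio.wait_for(t, timeout) for t in tasks),
        return_exceptions=True,  # one provider failing must not sink the rest
    )
    seen, merged = set(), []
    for batch in results:
        if isinstance(batch, Exception):
            continue  # timeout/API error: the other providers still answer
        for r in batch:
            if r["url"] not in seen:  # dedup doubles as an implicit relevance vote
                seen.add(r["url"])
                merged.append(r)
    return merged

merged = asyncio.run(fan_out("espresso machines"))
```

The `return_exceptions=True` flag is what makes the resilience row in the table hold: a failed provider degrades recall but never the request.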
Benchmark data (2025–2026) shows the top four search APIs are statistically indistinguishable on quality when used individually. The winning strategy is to combine multiple providers and let the ensemble outperform any single one.
GriSearch core was built across s197–s200 before this expansion plan was created.
s201: Plan page, /plans index, inbox replay fix, config schema (38 keys), settings cleanup, thread archival + health logging, defaults unified.
s204: Phase 0 complete (0A-0D). Per-mode context scaling, XML-tagged exchanges, full config surface + tuning panel, metrics logging + rolling averages, limit warnings, mode badges, Opus option, table rendering, JS extraction to static file, validation hooks.
s205: Phase 1 complete (1A-1D). Phase 3A shipped. Progress indicators, export, result previews, thread context menu, project creation. Research mode: Sonnet planner, Haiku quality gate, multi-step loop, cost controls, Sonnet synthesis.
s206: Phase 3B-3D shipped. GraySearch → GriSearch rebrand. Research timeline table, stop & summarize, structured report cards, "Dig deeper" buttons, first-use explainer. Persistent research data. Cloudflare Pages deploy. Citation table UI.
s208: Citation numbering fix. Collapsible sources list. Spine crash root cause fixed (importlib.reload memory leak). Health logging added.
Replace the single max_exchanges=5 / 600-char truncation with a per-mode strategy. Current usage is 1.6–8.2% of the 200K context window.
| Mode | max_exchanges | answer_truncation | Token Budget |
|---|---|---|---|
| Quick | 5 | 800 chars | ~1,000 tokens |
| Deep+Summary | 10 | 2,000 chars | ~5,000 tokens |
| Deep+Full | 20 | 4,000 chars | ~10,000 tokens |
| Research | 30 | No truncation | ~15,000 tokens |
- get_thread_context() to accept max_exchanges + max_answer_chars params (s204)
- Upgrade from plain User:/Assistant: to XML-tagged exchanges with mode and citations.
```xml
<exchange n="1" mode="quick">
  <query>Best espresso machine under $500?</query>
  <answer>The Breville Barista Express...</answer>
  <sources><url>https://example.com/review</url></sources>
</exchange>
```
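A minimal serializer for this format might look like the following sketch; the function name and signature are assumptions, and GriSearch's real serializer may differ:

```python
from html import escape

def format_exchange(n: int, mode: str, query: str,
                    answer: str, sources: list[str]) -> str:
    """Render one prior exchange as an XML-tagged block.
    (Illustrative sketch — field names follow the example above.)"""
    urls = "".join(f"<url>{escape(u)}</url>" for u in sources)
    return (
        f'<exchange n="{n}" mode="{escape(mode)}">'
        f"<query>{escape(query)}</query>"
        f"<answer>{escape(answer)}</answer>"
        f"<sources>{urls}</sources>"
        f"</exchange>"
    )
```

Escaping user text keeps a stray `<` in a query or answer from breaking the tag structure the model is asked to parse.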
Centralize all tunable limits. Code defaults live in git-tracked settings.yaml; browser-written overrides live in .gitignored config/grisearch_tuning.yaml. Runtime merges both, with tuning taking precedence. The config-snapshot pattern prevents mid-search TOCTOU races.
| Group | Keys | Examples |
|---|---|---|
| Context Limits | 8 | max_exchanges, max_answer_chars per mode |
| Models | 6 | synthesis model per mode, planner, quality gate |
| Token/Cost | 7 | max_tokens per mode, cost ceiling, Brave rate limit |
| Research Agent | 4 | max_rounds, sub_questions, wall time, concurrency |
| Search Providers | 8 | timeouts, max results, max extract pages |
| Auto-Brief | 4 | exchanges/thread, truncation (normal vs research) |
| Thread Health | 1 | size warning threshold (KB) |
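The defaults-plus-overrides merge and the snapshot pattern can be illustrated as follows; the key names are examples and the real _get_settings() may differ:

```python
import copy

def merge_settings(defaults: dict, tuning: dict) -> dict:
    """Merge git-tracked defaults with browser-written tuning overrides.
    Tuning wins. The deep copy acts as a per-search snapshot, so a config
    write landing mid-search cannot change behavior (the TOCTOU guard)."""
    cfg = copy.deepcopy(defaults)
    cfg.update(tuning)  # overrides take precedence
    return cfg

defaults = {"max_exchanges": 5, "max_answer_chars": 800}   # settings.yaml
tuning = {"max_exchanges": 10}          # config/grisearch_tuning.yaml
cfg = merge_settings(defaults, tuning)  # snapshot passed down the pipeline
```

The pipeline then reads only from `cfg`, never from the live settings objects.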
- settings.yaml under grisearch: (s204)
- config/grisearch_tuning.yaml (.gitignored) for browser overrides (s204)
- _get_settings(): merge defaults + tuning overrides
- Remove dead Settings() no-arg call from _get_settings()
- GRISEARCH_CONFIG_SCHEMA (38 keys, 8 groups) as single source of truth
- Call _get_settings() once, pass cfg downstream (s204)
- cfg.get() fallbacks reference _default() from schema
- Path(__file__).parent.parent with env var (s204)

In-browser config editor on the GriSearch page. Gear icon opens settings panel. Both API endpoints behind web auth. Cost previews labeled as estimates with tooltip caveat.
| Config Type | Control | Example |
|---|---|---|
| Integer limits | Slider + stepper | 5 [---o-----] 30 |
| Cost ceilings | Stepper ($0.05) | $0.50 [-] [+] |
| Model selection | Dropdown | [claude-haiku-4-5 v] |
| 0 = unlimited | Toggle + stepper | [x] Limit: 4000 |
- GET /api/search/config returns config + schema metadata (s204)
- POST /api/search/config validates + merges overrides into tuning YAML (s204)
- POST /api/search/config/reset clears all overrides (s204)
- log.info for model/mode in synthesize() (s204)
- Dict[tuple, Any] type annotation (s204)
- _REPO_ROOT = Path(__file__).parent.parent (s204)
- save_exchange()
- get_stack_status includes thread health summary (deferred — operational tooling)
- Archival inside save_exchange() (atomic, no race conditions)
- load_thread_full() for complete history (deferred — no threads near archive threshold)

Roadmap: Per-exchange storage (solution C) if archival proves insufficient.
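The validation that POST /api/search/config must do before writing the tuning YAML could look like this sketch. The schema slice and helper name are assumptions; only the 38-key GRISEARCH_CONFIG_SCHEMA idea comes from the plan:

```python
# Hypothetical slice of GRISEARCH_CONFIG_SCHEMA (the real one has 38 keys).
SCHEMA = {
    "max_exchanges": {"type": int,   "min": 1,   "max": 50,   "default": 5},
    "cost_ceiling":  {"type": float, "min": 0.0, "max": 5.00, "default": 0.50},
}

def validate_overrides(overrides: dict) -> dict:
    """Keep only known keys with correctly typed, in-range values;
    silently drop everything else before the merge into the tuning file."""
    clean = {}
    for key, value in overrides.items():
        spec = SCHEMA.get(key)
        if spec is None or not isinstance(value, spec["type"]):
            continue  # unknown key or wrong type
        if spec["min"] <= value <= spec["max"]:
            clean[key] = value  # e.g. cost ceiling capped at $5.00 (R3-9)
    return clean
```

Rejecting out-of-range values server-side keeps the schema max (R3-9) enforceable even if the browser UI is bypassed.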
- GET /api/search/thread/{id}/export?format=md (s205)
- Filename: {title}_{date}.md (s205)
- format=html, print-friendly (deferred — Markdown covers the need)

Platform UX: Desktop: browser download. Mobile (Safari): navigator.share() with fallback.
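The {title}_{date}.md naming could be implemented roughly as follows; the sanitization rules here are an assumption:

```python
import re
from datetime import date

def export_filename(title: str, on: date) -> str:
    """Build the {title}_{date}.md download name from a thread title.
    (Sketch — GriSearch's actual slug rules are not specified in the plan.)"""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").lower()
    return f"{slug}_{on.isoformat()}.md"
```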
- expanding: yield sub-queries as detail line (s205)
- reading: yield URLs, show unique domains (s205)
- searching: show providers ("Searching Brave + Exa...") (s205)

Prerequisite for Phase 2+. views/search.py = 1,113 lines of double-brace-escaped JS in Python template strings.
- static/js/search.js (s204)
- validate_views.py (s204)
- createScriptProcessor → AudioWorkletNode (deferred — Phase 2 prerequisite)

Complete the voice loop. SSE with base64 audio chunks (proven tunnel-compatible).
- lib/tts.py — Deepgram Aura-2 REST streaming (s208)
- audio_start/chunk/done events (base64 MP3) (s208)
- _gsTTSPlaying gate on mic start + audio processor, configurable delay (500ms default, 800ms for Bluetooth) (s212)
- Bluetooth detection via enumerateDevices(), extends ducking delay (s212)
- Markdown stripped for speech (_strip_for_tts) (s208)
- Tables converted to prose (_table_to_prose) for natural reading (s212)
- audio_done.truncated flag + muted UI notice (s212)

Multi-step autonomous research via non-streaming wrappers over existing search functions.
User query → [Planner/Sonnet] → [Quality Gate/Haiku] → [Research Loop] → [Synthesizer] → [Structured Report]
- search_and_summarize(): consumes async generator, returns dict (s205)
- Concurrency cap (asyncio.Semaphore)
- research_max_brave_calls (default 20) (s205)
- Stop signal via asyncio.Event (s206)
- research_concurrent_limit (s205)
- Navigation warning (beforeunload)
- research_progress event (step, total, sub_question, status) (s205)
- POST /api/search/research/cancel (s206)
- mode: "research" (s205)
- Roadmap: Separate brief section (C) after evaluating real output.
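The cost and concurrency controls above combine naturally in a bounded loop. This sketch uses illustrative names (research_loop, search_fn); only the Semaphore, the call cap, and the Event-based stop come from the plan:

```python
import asyncio

async def research_loop(sub_questions, search_fn,
                        concurrent_limit=3, max_calls=20):
    """Bounded fan-out over sub-questions: a Semaphore caps concurrency,
    a shared counter caps total provider calls, and an asyncio.Event
    lets 'stop & summarize' end the loop early."""
    sem = asyncio.Semaphore(concurrent_limit)
    stop = asyncio.Event()
    calls = 0
    results = []

    async def run(q):
        nonlocal calls
        async with sem:
            if stop.is_set():
                return
            if calls >= max_calls:
                stop.set()  # budget exhausted: fall through to synthesis
                return
            calls += 1
            results.append(await search_fn(q))

    await asyncio.gather(*(run(q) for q in sub_questions))
    return results

async def fake_search(q):
    """Stand-in for the non-streaming search_and_summarize() wrapper."""
    await asyncio.sleep(0)
    return {"q": q, "answer": "..."}

out = asyncio.run(research_loop(["a", "b", "c"], fake_search, max_calls=2))
```

A cancel endpoint only needs to call `stop.set()` on the registered loop; in-flight sub-questions finish, new ones are skipped.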
- Structured report cards (.gs-report, 95% width, green border) (s206)

Discovered: s210 (2026-04-07). Follow-up queries in threads produce garbage search results because query expansion has no access to conversation history. Every major AI platform rewrites follow-ups before searching — GriSearch did not. Effort: 2–3 sessions. Pressure tested: 2 rounds, 28 findings, all resolved.
- Bug: conversation_context is in scope but not forwarded to query expansion
- Affected surfaces: expand_query(), _research_plan(), get_thread_context(), all 4 search modes, SSE endpoint
- New config keys: synthesis_model_resolution (Sonnet default), enable_follow_up_resolution, resolution exchange/char limits
- skip_resolution param added to all 3 search functions + search_and_summarize()
- resolve_follow_up(): structured JSON return with topic_switch detection, robust 5-step JSON parsing fallback chain
- search_deep_full uses the search_query variable pattern (original bug trigger)
- Same pattern applied to search_quick and search_deep_summary
- Search with [original, resolved] + expand(resolved) — original as safety net per Google/Elastic pattern
- Resolved query passed to synthesize() (not the raw original)
- ensure_context_summary() async function (keeps get_thread_context() synchronous)
- Summary cache keyed by max_exchanges, hash-based invalidation (handles archival, deletion, mode switches)
- get_thread_context() reads cached summary, prepends <context_summary> block
- query_resolved SSE event + emerald green .gs-msg-resolved block ("INTERPRETED AS")
- topic_switch_detected SSE event + amber .gs-msg-switch prompt
- force_continue data flow: _gsForceResume flag → POST body → skip_resolution
- save_exchange(): persist resolved_query field (omit when empty)
- History rendering of resolved_query with backward-compat guard
- Pass search_query to _research_plan() directly (no conversation_context needed — resolved query is self-contained)
- Research sub-loop sets skip_resolution=True via search_and_summarize()
- resolution_input_tokens, resolution_output_tokens, resolution_cost_usd in metrics + answer meta line

Roadmap (not in this build): Embedding-based topic switch detection (cosine similarity). Error accumulation monitoring (resolution quality tracking over long threads).
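A plausible shape for the 5-step JSON parsing fallback chain in resolve_follow_up(); the step details beyond "strip fences → parse → regex extract → validate keys → default" are assumptions:

```python
import json
import re

DEFAULT = {"resolved_query": "", "topic_switch": False}

def parse_resolution(raw: str, original_query: str) -> dict:
    """Robustly parse an LLM's JSON resolution output (sketch).
    Falls back to the original query, which is always a safe search."""
    # Step 1: strip markdown code fences the model may wrap output in.
    text = re.sub(r"^```(?:json)?|```$", "", raw.strip(), flags=re.M).strip()
    parsed = None
    try:
        parsed = json.loads(text)                    # Step 2: direct parse
    except json.JSONDecodeError:
        m = re.search(r"\{.*\}", text, flags=re.S)   # Step 3: embedded object
        if m:
            try:
                parsed = json.loads(m.group(0))
            except json.JSONDecodeError:
                parsed = None
    if isinstance(parsed, dict) and "resolved_query" in parsed:  # Step 4: keys
        return {"resolved_query": parsed["resolved_query"],
                "topic_switch": bool(parsed.get("topic_switch", False))}
    # Step 5: safe default — search the user's literal query.
    return {**DEFAULT, "resolved_query": original_query}
```

Because the last step degrades to the original query, a malformed model response can never make results worse than pre-fix behavior (R5-H1).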
Full spec: ~/.claude/plans/grisearch-follow-up-context-fix.md
- _search_brave_images() + ImageResult model (s212)
- image_results event with thumbnail grid (3-col desktop, 2-col mobile) (s212)
- search_location + search_country_code config keys in tuning panel (str type) (s212)
- country param wired to all 3 search pipelines (quick, deep_summary, deep_full) (s212)
- GET /api/search/corpus with scoring
- Thread reorder (thread_order in groups.json)
- Group reorder (group_order in groups.json)
- POST /api/search/groups/{id}/reorder + POST /api/search/groups/reorder

Reuses existing api_search_group_assign for moves — no new backend for basic group assignment. Sort order APIs are new. Desktop-only initially; mobile DnD (long-press or polyfill) evaluated after desktop ships.
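Persisting thread_order might look like this sketch; the groups.json layout shown is an assumption, with only the key names taken from the plan:

```python
import json
from pathlib import Path

def reorder_threads(groups_path: Path, group_id: str,
                    new_order: list[str]) -> None:
    """Write the new thread_order for one group back to groups.json.
    (Sketch — the real file layout and any locking are not in the plan.)"""
    data = json.loads(groups_path.read_text())
    data[group_id]["thread_order"] = new_order
    groups_path.write_text(json.dumps(data, indent=2))
```

The reorder endpoints would validate that `new_order` is a permutation of the group's existing threads before writing.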
Theme: Accept, analyze, transform, and export structured data across all modes. Dedicated Data mode for analysis-heavy workflows. Effort: 3–5 sessions.
- XLSX import (openpyxl)
- XLSX export (openpyxl)
- Roadmap: Chart generation, Google Sheets integration, SQL-like queries, pivot table builder, data persistence across sessions.
Theme: Make GriSearch progressively smarter about user preferences and research patterns. Active interviews + passive extraction + enhanced auto-briefs. Effort: 3–5 sessions.
- Active interview results written to search_preferences.md
- Passive extraction appends to search_preferences.md
- Roadmap: Preference confidence scoring, conflict detection, onboarding interview, preference analytics dashboard.
Theme: Persistent, organized, searchable uploads that survive across sessions and threads. The #1 pain point with ChatGPT is upload amnesia — documents tied to a single conversation and forgotten next session. Effort: 3–4 sessions. Dependencies: Phase 4A (image search UI), Phase 5A (cross-thread search).
- Shared uploads stored under Data/system/search/shared_uploads/
- GET /api/search/admin/storage (per-user usage summary)
- @mention retrieval, e.g. "@suntsu-contract what are the termination clauses?" — the stored document is injected as a <document_context> block
- Mobile camera capture (accept="image/*" capture="environment")

Open questions: Document versioning (replace vs coexist), large document chunking (semantic chunk selection for 50+ page contracts), shared library moderation at 30 users, extraction cost at scale (local OCR fallback vs Claude vision), cross-deployment portability.
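The @mention → document-context flow could be sketched as follows; the library shape and function name are hypothetical:

```python
def build_document_context(mention: str, library: dict) -> str:
    """Resolve a @mention against the persistent upload library and wrap
    the extracted text in a document_context block for the search prompt.
    (Sketch — `library` maps slugs to extracted text; names are assumed.)"""
    doc = library.get(mention.lstrip("@"))
    if doc is None:
        return ""  # unknown mention: search proceeds without document context
    return f'<document_context name="{mention}">\n{doc}\n</document_context>'

library = {"suntsu-contract": "Termination: either party may ..."}
block = build_document_context("@suntsu-contract", library)
```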
Initial confidence: MEDIUM. All addressed.
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| C-1 | Critical | 1A is a non-issue | Reduced to logging + cleanup |
| C-2 | Critical | Can't reuse search generators | Non-streaming wrappers in 3A |
| C-3 | Critical | WebSocket TTS won't work through tunnel | Switched to SSE-first |
| R-1 | Risk | Research blocks uvicorn worker | Background create_task |
| R-2 | Risk | Inline JS at breaking point | Added 1E: JS extraction |
| R-3 | Risk | Recurring search unbounded cost | Cap 5 watches, Quick only |
| R-4 | Risk | Image upload lifecycle missing | Path, retention, max size defined |
| R-5 | Risk | Haiku planner poor quality | Sonnet + quality gate |
| G-1:4 | Gap | API degradation, Dict, index, ducking | All addressed in respective phases |
| Q-1:3 | Question | Phase 6 order, export UX, build order | All resolved |
Post-Round-1 confidence: HIGH
All addressed in s201 review with RG.
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| C-1 | Critical | Token budget undercounts | Full budget + cost logging first |
| C-2 | Critical | No research cost cap | Semaphore, ceiling, confirmation |
| R-1 | Risk | createScriptProcessor deprecated | Migrate in 1E |
| R-2 | Risk | SSE audio may buffer | 200-500ms chunks, tunnel test |
| R-3 | Risk | Thread files unbounded | Monitoring (A) + archival (B), C roadmapped |
| R-4 | Risk | Research no lifecycle | Registry, cancel, persist partial |
| G-1:4 | Gap | BT latency, model deprecation, JS risk, Path fix | All addressed |
| Q-1:2 | Question | Brief weighting, branch counter | A+B weighting, skip counter on branch |
Post-Round-2 confidence: HIGH
Post-0C/0D additions. All addressed in s201.
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| R3-1 | Critical | settings.yaml git-tracked; browser writes = merge conflicts | Separate .gitignored tuning file |
| R3-2 | Critical | No config caching; TOCTOU race mid-search | Config snapshot pattern per pipeline |
| R3-3 | Risk | _get_settings() dead broken code | Remove dead Settings() call |
| R3-4 | Risk | Archival race with save_exchange() | Archival inside save (atomic) |
| R3-5 | Risk | Defaults scattered across code + schema | Schema dict as single source of truth |
| R3-6 | Risk | Cost preview impossible to compute accurately | Label as estimates with caveat tooltip |
| R3-7 | Gap | Config API needs auth | Behind web auth middleware |
| R3-8 | Gap | File KB poor proxy for context usage | Primary metric: exchange count |
| R3-9 | Gap | Cost ceiling slider unbounded | Schema max=$5.00 |
| R3-10 | Gap | A+B brief weighting undefined | Defined inline (4 exch, 800 char) |
| R3-11 | Gap | AudioWorklet migration underscoped | Worklet file + MIME + extra time noted |
| R3-12 | Question | Config change hits in-flight search | Covered by R3-2 snapshot |
| R3-13 | Question | Model keys for unbuilt features confusing | Hide until feature ships |
| R3-14 | Question | Effort unchanged after Phase 0 doubled | Revised: 19-27 sessions total |
Post-Round-3 confidence: HIGH
Adversarial review of the follow-up query resolution plan. All addressed.
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| R4-C1 | Critical | Haiku too weak for query resolution (Quick/Deep+S default) | Dedicated synthesis_model_resolution config, default Sonnet |
| R4-C2 | Critical | Raw query to synthesis creates semantic mismatch with resolved search results | Pass resolved query to synthesize() |
| R4-H1 | High | Token explosion in Research resolution (30 exchanges, unlimited chars = 22K tokens) | Resolution-specific truncation: 5 exchanges, 800 chars |
| R4-H2 | High | Research sub-loop accidentally triggers resolution on sub-questions | skip_resolution=True flag on sub-loop calls |
| R4-H3 | High | No handling of unrelated topics in existing threads | Topic switch prompt + user choice UX (continue/new thread) |
| R4-M1 | Medium | No suppression of "INTERPRETED AS" for similar queries | Jaccard similarity >0.85 suppresses display |
| R4-M2 | Medium | expand_query() doesn't need full conversation_context | No changes to expand_query — resolved query is sufficient |
| R4-M3 | Medium | Research final synthesis doesn't need full history | Compact preamble only |
| R4-M4 | Medium | Quick mode latency concern (~500-800ms) | Accept: correct results > fast garbage |
| R4-M5 | Medium | Need resolved_query capture pattern in pages.py | Initialize alongside collectors, use or None |
| R4-L1 | Low | Config toggle needed | enable_follow_up_resolution boolean |
| R4-L2 | Low | Resolution cost not tracked in metrics | Added to metrics dict + answer meta line |
| R4-L3 | Low | Old threads missing resolved_query field | Simple if guard in history rendering |
| R4-L4 | Low | save_exchange signature underspecified | resolved_query: str = "", omit when empty |
| R4-M6 | Medium | Rolling summarization needed for long threads | Summary-beyond-window for all modes with per-mode hard caps |
Post-Round-4 confidence: HIGH
Second adversarial pass after incorporating Round 4 fixes. Found subtle interaction effects. All addressed.
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| R5-C1 | Critical | get_thread_context() is sync; adding async LLM call inside crashes | Split: ensure_context_summary() async + get_thread_context() stays sync |
| R5-H1 | High | LLM JSON output parsing has no fallback for malformed responses | 5-step fallback: strip fences, json.loads, regex extract, validate keys, default |
| R5-H2 | High | _gsGroupId doesn't exist in JS frontend | Just clear _gsThreadId; group derived server-side from thread |
| R5-H3 | High | Summary cache breaks when exchanges archived | Hash-based invalidation (overflow query strings + timestamps) |
| R5-H4 | High | force_continue has no frontend-to-backend plumbing | Full data flow: _gsForceResume → POST body → skip_resolution |
| R5-M1 | Medium | Topic switch resubmit SSE abort race condition | Explicit abort + null controller before re-enabling UI |
| R5-M2 | Medium | search_and_summarize() doesn't accept skip_resolution | Add param, forward to pipeline call |
| R5-M3 | Medium | Similarity check definition vague | Jaccard of lowercased word sets, threshold 0.85 |
| R5-M4 | Medium | Research planner gets full context it doesn't need | Pass search_query directly, no conversation_context param |
| R5-M5 | Medium | Dual cache split: resolution (5) and synthesis (20) different overflow | Cache keyed by max_exchanges |
| R5-M6 | Medium | Must use search_query variable after resolution in all calls | Explicit variable pattern documented in plan |
| R5-L1 | Low | Build step 4 could crash if research tested before step 8 | Add skip_resolution param to all functions in step 1 |
| R5-L2 | Low | Concurrent thread access race on save + summary | Accepted: _gsSearching UI guard prevents in normal use |
Post-Round-5 confidence: HIGH
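The R5-M3 similarity check is small enough to show outright. This follows the definition in the table (Jaccard of lowercased word sets, threshold 0.85); the helper names are illustrative:

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of lowercased word sets (R5-M3)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0  # two empty queries are trivially identical
    return len(wa & wb) / len(wa | wb)

def show_interpreted_as(original: str, resolved: str,
                        threshold: float = 0.85) -> bool:
    """Suppress the 'INTERPRETED AS' banner when the resolved query is
    near-identical to the original (R4-M1)."""
    return jaccard(original, resolved) <= threshold
```

At similarity > 0.85 the banner would only restate the user's own words, so hiding it reduces noise without hiding real rewrites.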