Author

Ali Arbab

Project

01 / StockSaathi

Status

Live in production


§ 01

Pitch

StockSaathi

Behavioural-finance simulator that lets users replay any market crisis on real Yahoo data.

A paper-trading platform that pairs the entire NSE universe and AMFI mutual-fund catalogue with an AI coach designed never to give buy or sell advice. A Time Travel mode replays real Indian market crises — demonetisation, COVID, Adani-Hindenburg — day by day, so a virtual portfolio's held vs panic-sold lines diverge over actual historical closes. Live at stocksaathi.co.in.

Crash replay · COVID March 2020

Nifty 50 · Mar 2 – Apr 13, 2020 · ₹1,00,000 portfolio

Held: ₹73,261 (-26.7%) · Panic-sold on day 3: ₹94,560 (-5.4%) · Cost of panic: -₹21,299 (holding finished 22.5% below panic-selling)
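The two lines in the replay reduce to simple arithmetic over a close series. Below is a minimal Python sketch, assuming the portfolio is a single index position bought at the first close and that the panic-sold line sits in cash (earning nothing) after selling at the close of day 3. `crash_replay` and its parameters are illustrative names, not taken from the StockSaathi source.

```python
def crash_replay(closes, initial_paise=10_000_000, panic_day=3):
    """Value a portfolio fully invested at closes[0], two ways.

    held:  stays invested until the last close.
    panic: sells everything at the close of `panic_day`, then holds cash
           (no interest assumed) for the rest of the window.
    All amounts are integer paise (₹1,00,000 = 1,00,00,000 paise).
    """
    if not 0 < panic_day < len(closes):
        raise ValueError("panic_day must fall inside the window")
    base = closes[0]
    held = round(initial_paise * closes[-1] / base)
    panic = round(initial_paise * closes[panic_day] / base)
    return {
        "held_paise": held,
        "panic_paise": panic,
        # Same sign convention as the demo: negative means holding ended lower.
        "cost_of_panic_paise": held - panic,
    }
```

Whether panic-selling "wins" depends entirely on where the window ends; extend the same close series past the recovery and the sign of the cost of panic flips.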

§ 02

Access

Production
stocksaathi.co.in
Live on Vercel · 2,686 NSE equities + ETFs · India
Source
github.com/Ali-Arbab/StockSaathi
Vanilla ES Modules · Python · Supabase · Multi-provider LLM

§ 03

Stack

Vanilla ES Modules
Vercel Python (stdlib-only)
Supabase + RLS + RPCs
Cloudflare Workers (front-door)
Fly.io failover
Gemini 2.5 Flash/Pro (Vertex)
Cerebras Llama 3.3 70B
Anthropic Claude Sonnet 4.5
Yahoo + Tickertape + AMFI
Service Worker + IndexedDB

§ 04

Origin

The hardest part of an AI investing tool turned out to be writing down everything the AI is NOT allowed to say.

Problem

Indian high-school curricula teach the mechanics of money (compound interest, GST, simple budgeting) and skip behavioural finance entirely. Teenagers leave school with formulas but no instinct for how their own brain will sabotage them once real rupees are involved. The market-education material that does exist is written for adults, in English, by full-time investors, and it reads like job training rather than learning.

Why me

I'm 17. I've watched the kids around me get their first taste of investing through whatever influencer's reel showed up that morning. The advice ranges from technically wrong to actively harmful. I'm one peer group away from the audience that needs this — not a finance professional translating down, but someone who started where they're starting now.

Learned

Coaches that give buy/sell advice teach dependence; coaches that ask 'what's your thesis on this?' teach reasoning. I rewrote the system prompt seven times before the AI stopped trying to be a tip service.

Four-tier failover, zero pip dependencies, all money in paise.

A Cloudflare Worker bound to stocksaathi.co.in/* intercepts every request, tries Vercel first, and falls back to a regional Fly.io origin on a 5xx or timeout. A failure also trips a KV-backed 30-second circuit breaker, so for the next 30 seconds requests skip Vercel instead of every user paying a 9-second retry. Two Edge JS handlers (/api/chat, /api/ai) are inline-bundled into the Worker, so they keep responding even if Vercel and Fly are both down. A Cloudflare Pages deployment mirrors the static tree as the third fallback.
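The breaker logic is language-agnostic, so here is a Python sketch of it. The real Worker is JavaScript and keeps breaker state in Workers KV; in this sketch a plain dict stands in (so state is per-process), and `CircuitBreaker` / `fetch_with_failover` are hypothetical names. The point is that one failure marks the origin "skip" for 30 seconds, so later requests go straight to the fallback.

```python
import time


class CircuitBreaker:
    """Skip a failing origin for `cooldown_s` seconds after it fails."""

    def __init__(self, cooldown_s=30.0, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.open_until = {}  # origin name -> clock time when it may be retried

    def available(self, origin):
        return self.clock() >= self.open_until.get(origin, float("-inf"))

    def trip(self, origin):
        self.open_until[origin] = self.clock() + self.cooldown_s


def fetch_with_failover(origins, breaker):
    """Try (name, fetch) pairs in order; fetch() raises on a 5xx or timeout."""
    for name, fetch in origins:
        if not breaker.available(name):
            continue  # breaker open: don't make this user wait on a known-bad origin
        try:
            return name, fetch()
        except Exception:
            breaker.trip(name)
    raise RuntimeError("all origins failed")
```

Injecting `clock` keeps the breaker testable without sleeping; in a Worker the equivalent is writing an expiry timestamp to KV and comparing it with `Date.now()`.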

The Python serverless tier has zero pip dependencies: every endpoint is a single stdlib-only BaseHTTPRequestHandler subclass in its own file. Deploys skip the pip install step entirely, and cold starts stay small because there are no third-party packages to import. The Fly backup origin reuses these handlers unchanged through a 162-line FastAPI shim that imports each handler and replays the BaseHTTPRequestHandler protocol into a Starlette response. A pytest parity test fails CI if a new handler lands in app/api/ without a matching route in the shim.
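A minimal sketch of both halves: a stdlib-only endpoint in the shape Vercel's Python runtime expects (a `BaseHTTPRequestHandler` subclass named `handler`), and a way to drive such a handler in-process by feeding it raw request bytes. The real shim uses FastAPI and Starlette; `_FakeSocket` and `replay` here are hypothetical, stdlib-only stand-ins for the same idea.

```python
import io
import json
from http.server import BaseHTTPRequestHandler


class handler(BaseHTTPRequestHandler):
    """One endpoint per file, stdlib only."""

    def do_GET(self):
        body = json.dumps({"ok": True, "path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep output quiet


class _FakeSocket:
    """Just enough of a socket for StreamRequestHandler: reads the request
    from a buffer and collects everything the handler writes."""

    def __init__(self, raw_request):
        self._rfile = io.BytesIO(raw_request)
        self.sent = bytearray()

    def makefile(self, mode, *args, **kwargs):
        return self._rfile  # only called for reading ('rb')

    def sendall(self, data):
        self.sent += data


def replay(handler_cls, raw_request):
    """Run a handler in-process and split its raw HTTP response."""
    sock = _FakeSocket(raw_request)
    handler_cls(sock, ("127.0.0.1", 0), None)  # __init__ runs setup/handle/finish
    head, _, body = bytes(sock.sent).partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("latin-1").split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(status_line.split()[1]), headers, body
```

Because the handler only ever sees bytes in and bytes out, the same file can serve Vercel, a shim, and a test suite without modification.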

Every mutating write goes through one of 16 SECURITY DEFINER PL/pgSQL RPCs; the client never executes a raw UPDATE on portfolios. The apply_trade RPC reads the user from auth.uid() itself rather than trusting a client-supplied id, takes a FOR UPDATE row lock, validates the trade, updates cash, holdings, and the transactions ledger in a single database transaction, and returns a JSON envelope. Idempotency keys are UNIQUE(user_id, key), so a network-retried POST short-circuits and returns {ok:true, idempotent:true} instead of double-spending.

-- apply_trade locks the row, then conditionally updates only if cash suffices.
PERFORM 1 FROM public.portfolios WHERE user_id = v_user_id FOR UPDATE;
UPDATE public.portfolios
   SET cash_paise = cash_paise - v_value
 WHERE user_id = v_user_id AND cash_paise >= v_value;
IF NOT FOUND THEN RAISE EXCEPTION 'insufficient_cash'; END IF;
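The same contract, sketched with Python's built-in sqlite3 under a simplified two-table schema (column names partly borrowed from the text, the rest hypothetical). SQLite has no FOR UPDATE, so the sketch leans on its single-writer locking instead; what it illustrates is the idempotency short-circuit, the conditional decrement, and money as integer paise parsed without floats.

```python
import sqlite3
from decimal import Decimal, ROUND_HALF_UP

SCHEMA = """
CREATE TABLE portfolios (user_id TEXT PRIMARY KEY, cash_paise INTEGER NOT NULL);
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY,
    user_id TEXT NOT NULL,
    idempotency_key TEXT NOT NULL,
    symbol TEXT NOT NULL,
    value_paise INTEGER NOT NULL,
    UNIQUE (user_id, idempotency_key)
);
"""


class InsufficientCash(Exception):
    pass


def rupees_to_paise(rupees: str) -> int:
    """Parse a rupee amount exactly; money never passes through float."""
    return int((Decimal(rupees) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def apply_buy(conn, user_id, symbol, value_paise, idem_key):
    try:
        with conn:  # one transaction: commit on success, roll back on any exception
            # Ledger row first, so a retried request hits the UNIQUE
            # constraint before cash is touched.
            conn.execute(
                "INSERT INTO transactions (user_id, idempotency_key, symbol, value_paise)"
                " VALUES (?, ?, ?, ?)",
                (user_id, idem_key, symbol, value_paise),
            )
            cur = conn.execute(
                "UPDATE portfolios SET cash_paise = cash_paise - ?"
                " WHERE user_id = ? AND cash_paise >= ?",
                (value_paise, user_id, value_paise),
            )
            if cur.rowcount == 0:
                raise InsufficientCash("insufficient_cash")  # rolls back the ledger row
    except sqlite3.IntegrityError:
        return {"ok": True, "idempotent": True}
    return {"ok": True, "idempotent": False}
```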

Money is stored as integer paise (bigint) end-to-end, so portfolio totals never drift through floating-point rounding; numeric(18,6) is used only for fractional mutual-fund units. Instrument fundamentals come through a 4-tier merge. Tickertape is tried first because its dividend yield and P/E (TTM) are self-consistent and fresher than Yahoo's consumer-page-derived numbers; Yahoo v10 and v7 fill gaps; Yahoo v8/chart is the anonymous last resort. A +-joined provenance string is written to fundamentals_cache.source, so each cached row records which providers supplied it.
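A field-wise priority merge with a provenance string can be sketched as below. The tier labels (`tickertape`, `yahoo_v10`, `yahoo_v7`, `yahoo_chart`) and the field names are hypothetical; the sketch assumes each provider returns a dict and may raise on failure.

```python
def merge_fundamentals(symbol, tiers):
    """Merge provider results in priority order; the first tier to supply a
    field wins it.

    `tiers` is a list of (name, fetch) pairs; fetch(symbol) returns a dict
    whose missing fields are absent or None, and may raise on failure.
    """
    merged, used = {}, []
    for name, fetch in tiers:
        try:
            data = fetch(symbol) or {}
        except Exception:
            continue  # a failing provider simply drops out of the merge
        contributed = False
        for field, value in data.items():
            if value is not None and field not in merged:
                merged[field] = value
                contributed = True
        if contributed:
            used.append(name)  # only tiers that actually won a field count
    merged["source"] = "+".join(used)  # e.g. "tickertape+yahoo_v10"
    return merged
```

Recording only the tiers that won at least one field keeps the provenance string honest: a provider that answered but was fully shadowed by a higher tier doesn't appear.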

§ 06

AI coach

Deterministic where stakes are high, generative where they aren't.

The coach is built around two tracks that never trade jobs. Track 1 runs nine deterministic bias detectors against every BUY/SELL: panic-sell, FOMO, concentration, sector concentration, disposition effect (Shefrin & Statman 1985), anchoring, churning, pump-chase, and overtrading. Each detector is a pure function returning {bias, severity, evidence} | null, with explicit numeric thresholds tuned against real NSE volatility. Panic-sell, for example, fires on a ≥5% drop over 3 sessions OR a ≥3% intraday drop, when the position has been held fewer than 21 days and is down more than 2%. The orchestrator then picks a pre-written reflection template and attaches a warning level plus a historical analog (“In the last 12 dips of ≥10% on the Nifty, prices recovered to their prior high in a median of 22 trading days”). The LLM is optional flavour.
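The panic-sell thresholds quoted above translate directly into a pure function. This Python sketch assumes "over 3 sessions" means the close three sessions ago versus the latest close; the severity bands and the parameter names are hypothetical, since the text doesn't document them.

```python
def detect_panic_sell(side, closes, intraday_change_pct, held_days, position_pnl_pct):
    """Pure panic-sell check. Returns {bias, severity, evidence} or None.

    closes: recent daily closes, oldest first (last = current session).
    intraday_change_pct: today's move in percent (negative = down).
    position_pnl_pct: unrealised P&L on the position in percent.
    """
    if side != "SELL":
        return None
    if held_days >= 21 or position_pnl_pct >= -2.0:
        return None  # only young positions sitting at more than a 2% loss
    # Assumption: drop over 3 sessions = close three sessions ago vs latest close.
    drop_3s = (closes[-4] - closes[-1]) / closes[-4] * 100 if len(closes) >= 4 else 0.0
    drop_intraday = -intraday_change_pct
    if drop_3s < 5.0 and drop_intraday < 3.0:
        return None
    # Hypothetical severity bands.
    severity = "high" if drop_3s >= 10.0 or drop_intraday >= 6.0 else "medium"
    return {
        "bias": "panic_sell",
        "severity": severity,
        "evidence": {
            "drop_3_sessions_pct": round(drop_3s, 2),
            "intraday_drop_pct": round(drop_intraday, 2),
            "held_days": held_days,
            "position_pnl_pct": position_pnl_pct,
        },
    }
```

Keeping each detector free of I/O is what makes the thresholds auditable: the same inputs always produce the same verdict, and a table of cases can pin them down.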

Track 2 is the conversational chat surface (“Saathi”): a tool-use loop with five OpenAI-compatible function tools, executed in parallel via Promise.all, with each tool response capped at 4,000 characters before it is fed back to the model. The crypto tool bakes the warning into its own response (note: "India: crypto gains taxed at 30% + 1% TDS per trade since 2022"), so the guardrail lands in the model's context whether or not the prompt remembers to ask for it.
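The parallel tool step can be sketched in Python with `asyncio.gather` standing in for `Promise.all`. The three tools below are hypothetical stubs for three of the five; the cap and the crypto note come from the text. Results come back in call order and use the OpenAI-style `role: "tool"` message shape.

```python
import asyncio
import json

MAX_TOOL_CHARS = 4000
CRYPTO_NOTE = "India: crypto gains taxed at 30% + 1% TDS per trade since 2022"


# Hypothetical stand-ins; real tools would call market-data APIs.
async def get_quote(symbol):
    return {"symbol": symbol, "price": None}


async def get_crypto_price(coin):
    # The guardrail travels inside the tool result itself.
    return {"coin": coin, "price_inr": None, "note": CRYPTO_NOTE}


async def search_funds(query):
    return {"query": query, "results": ["..."] * 2000}  # deliberately oversized


TOOLS = {"get_quote": get_quote, "get_crypto_price": get_crypto_price, "search_funds": search_funds}


async def run_tool_calls(calls):
    """Execute every tool call from one model turn concurrently."""

    async def run_one(call):
        result = await TOOLS[call["name"]](**call["arguments"])
        text = json.dumps(result)
        # Hard cap before re-feeding; a truncated payload may end mid-JSON.
        return {"role": "tool", "tool_call_id": call["id"], "content": text[:MAX_TOOL_CHARS]}

    return await asyncio.gather(*(run_one(c) for c in calls))
```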

The proxy at /api/chat is a multi-provider fallback chain: Gemini 2.5 Flash-Lite leads (the only Gemini model that is truly non-thinking in json_object mode), followed by Flash, Pro, Cerebras Llama 3.3 70B, and finally OpenAI. Vertex AI is preferred when GEMINI_VERTEX_PROJECT is set, with the region pinned to asia-south1 so latency stays low for Indian users and data stays in-region. SEBI guardrails run in three independent layers: a pre-LLM regex, in-prompt rules, and a post-LLM scanner that checks the reflection, historical_context, and suggested_q fields for 25 forbidden phrases.
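A Python sketch of the ordered fallback plus the Vertex preference. The chain order and the `GEMINI_VERTEX_PROJECT` / asia-south1 behaviour come from the text; the `callers` mapping, the target-dict shape, and the non-Gemini identifiers are illustrative assumptions.

```python
import os


def gemini_target(model):
    """Prefer Vertex AI, pinned to asia-south1, when a project is configured."""
    project = os.environ.get("GEMINI_VERTEX_PROJECT")
    if project:
        return {"backend": "vertex", "project": project, "location": "asia-south1", "model": model}
    return {"backend": "gemini-api", "model": model}


# Order from the text; identifiers are illustrative.
CHAIN = ["gemini-2.5-flash-lite", "gemini-2.5-flash", "gemini-2.5-pro",
         "cerebras/llama-3.3-70b", "openai"]


def chat_with_fallback(messages, callers, chain=CHAIN):
    """callers maps a chain name to a function(messages) -> reply that raises on failure."""
    errors = []
    for name in chain:
        call = callers.get(name)
        if call is None:
            continue  # provider not configured (e.g. missing API key)
        try:
            return name, call(messages)
        except Exception as exc:  # timeout, 5xx, quota, malformed JSON...
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the reply makes it easy to log which tier actually answered.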

# SEBI-SAFE GUARDRAILS (ABSOLUTE)
- You can state a current price. That's public info.
- You CANNOT say: "should buy", "should sell", "recommend", "target price",
  "guaranteed", "sure shot", "will go up", "will crash".
- You CANNOT predict future prices, returns, or outcomes.
- If the user asks "should I buy/sell X?" → redirect to a reasoning framework
  (business health, valuation, drawdown tolerance, portfolio fit).
  Do not answer yes/no.
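The post-LLM layer can be as small as one compiled regex over the scanned fields. This Python sketch uses only the eight phrases quoted in the prompt excerpt (the production list has 25); `scan_output` is a hypothetical name, and treating `suggested_q` as either a string or a list is an assumption.

```python
import re

# The eight phrases quoted in the prompt excerpt; the production list has 25.
FORBIDDEN = ["should buy", "should sell", "recommend", "target price",
             "guaranteed", "sure shot", "will go up", "will crash"]
SCANNED_FIELDS = ("reflection", "historical_context", "suggested_q")
# Word boundary on the left only, so "recommended" and "recommendation" are caught too.
_PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(p) for p in FORBIDDEN) + ")", re.IGNORECASE)


def scan_output(payload):
    """Return (field, phrase) hits in the coach's JSON output; empty list = clean."""
    hits = []
    for field in SCANNED_FIELDS:
        text = payload.get(field) or ""
        if isinstance(text, list):  # suggested questions may arrive as a list
            text = " ".join(text)
        hits.extend((field, m.group(0).lower()) for m in _PATTERN.finditer(text))
    return hits
```

Any hit means the response is discarded or replaced; the scanner never tries to rewrite the model's sentence.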

Conversation memory lives client-side in IndexedDB, where the coach_messages store tags every turn with sessionId and surface. A power user can paste their own Anthropic key in Settings; their requests then go straight from the browser to api.anthropic.com with the anthropic-dangerous-direct-browser-access: true header, so their chat never passes through StockSaathi's servers.

§ 07

Time travel

The LLM picks the story. Yahoo data is the truth.

Pick (or invent) a market crisis; watch a ₹1,00,000 portfolio split into a held line and a panic-sold-on-day-3 line over real historical closes. Three curated scenarios ship — COVID March 2020, GFC 2008, demonetisation 2016 — with educator-tone narration. Anything else routes through the custom-crash generator, a three-phase grounded pipeline:

Phase A: pick dates and ticker. Gemini Flash, primed with a colloquial-to-formal mapping table, resolves loose phrasings to a specific Indian market event: “the soap guy scam” → Nirav Modi / PNB, “demon” → demonetisation, “yes guy” → YES Bank moratorium, “the short seller thing” → Adani-Hindenburg.
Phase B: fetch real historical data. No LLM is involved; prices come from Yahoo via /api/ai?op=history. The primary symbol is fetched in parallel with up to three companion sector indices, so Phase C can narrate the move relative to its sector.
Phase C: narrate over real numbers. The LLM returns JSON with a title, key moments, and recovery days. Every numeric field is then overwritten with the real Yahoo value before persistence: the LLM's job is the story, the numbers are facts.

// Overwrite any hallucinated numbers with the REAL ones. The LLM's
// numbers are a sanity cross-check; the real-data numbers are truth.
meta.startIndex = Math.round(startIdx * 100) / 100;
meta.troughIndex = Math.round(troughIdx * 100) / 100;
meta.endIndex = Math.round(endIdx * 100) / 100;
meta.troughDay = troughDayIdx;
meta.indexDrop = Math.round(realDropPct * 10) / 10;
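The overwrite assumes the real values were already computed from Phase B's closes. A Python sketch of that computation, assuming `indexDrop` is a positive start-to-trough percentage (the excerpt doesn't show its sign); `real_crash_metrics` and `ground_meta` are hypothetical names.

```python
def real_crash_metrics(closes):
    """Compute the facts Phase C must not invent, from real closes (oldest first)."""
    trough_day = min(range(len(closes)), key=closes.__getitem__)  # first lowest close
    start, trough, end = closes[0], closes[trough_day], closes[-1]
    # Python's round() breaks exact ties to even, so a last digit can occasionally
    # differ from JS Math.round(x * 100) / 100.
    return {
        "startIndex": round(start, 2),
        "troughIndex": round(trough, 2),
        "endIndex": round(end, 2),
        "troughDay": trough_day,
        "indexDrop": round((start - trough) / start * 100, 1),
    }


def ground_meta(llm_meta, closes):
    """Keep the LLM's story fields; real data overwrites every numeric field."""
    return {**llm_meta, **real_crash_metrics(closes)}
```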

Scenarios are shareable across users by URL. The cache key is the SHA-256 of an aggressively normalised prompt: split camelCase and letter/digit runs, lowercase, turn non-alphanumerics into spaces, tokenise, dedupe, sort, hash. So "Adani Hindenburg 2023", "hindenburg 2023 adani", and "AdaniHindenburg2023" collapse to one cache row and one shareable URL, and a single LLM call's cost is amortised across every user who follows the link.
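A Python sketch of the normalise-then-hash key. The camelCase split is an assumption about the real normaliser: without it, "AdaniHindenburg2023" would tokenise to `adanihindenburg` + `2023` and miss the other two spellings. Function names are hypothetical.

```python
import hashlib
import re


def normalise_prompt(prompt):
    s = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", prompt)                 # AdaniHindenburg -> Adani Hindenburg
    s = re.sub(r"(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])", " ", s)  # Hindenburg2023 -> Hindenburg 2023
    s = re.sub(r"[^a-z0-9]+", " ", s.lower())                       # lowercase, non-alnum -> space
    return " ".join(sorted(set(s.split())))                         # tokenise, dedupe, sort


def scenario_cache_key(prompt):
    return hashlib.sha256(normalise_prompt(prompt).encode("utf-8")).hexdigest()
```

One trade-off of this scheme: anything outside `[a-z0-9]`, such as a prompt typed in Devanagari, is stripped before hashing, so it would need its own handling.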

§ 08

Universe


2,686 NSE equities and 13,969 mutual funds (fund data: AMFI India), refreshed before market open.

The instrument universe is built daily by two pure-Node (zero deps) scripts. The equity build fetches the NSE master CSV, 17 NIFTY index constituent CSVs, and the NSE ETF API, warming a per-host cookie jar and sending browser-like Sec-Fetch-* headers because both NSE and NiftyIndices return 403 for anything else. A 27-value sector taxonomy is derived through a layered pipeline: NSE's industry tag, then sectoral overlays, then a 100-line keyword regex for the long tail of ~1,500 small-caps. NIFTY index membership is packed into a 5-bit field per instrument, which drives cap-bucket and risk-tier classification.
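The text doesn't say which indices get the five bits, so the layout below is a hypothetical one, sketched in Python with `enum.IntFlag`. It shows how a packed membership field can drive a cap bucket that roughly mirrors SEBI's rank-based split (top 100 large, 101–250 mid, the rest small).

```python
from enum import IntFlag


class IndexMembership(IntFlag):
    """Hypothetical 5-bit layout; the real bit assignment isn't documented."""
    NIFTY_50 = 1
    NIFTY_NEXT_50 = 2
    NIFTY_MIDCAP_150 = 4
    NIFTY_SMALLCAP_250 = 8
    NIFTY_MICROCAP_250 = 16


def cap_bucket(flags):
    """Nifty 50 + Next 50 ~ top 100 (large); Midcap 150 ~ ranks 101-250 (mid)."""
    if flags & (IndexMembership.NIFTY_50 | IndexMembership.NIFTY_NEXT_50):
        return "large"
    if flags & IndexMembership.NIFTY_MIDCAP_150:
        return "mid"
    if flags & (IndexMembership.NIFTY_SMALLCAP_250 | IndexMembership.NIFTY_MICROCAP_250):
        return "small"
    return "unclassified"
```

Serialised as a plain integer (always below 32), the field costs a couple of characters per instrument in the shipped JSON.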

The MF build parses AMFI's proprietary semicolon-delimited NAVAll.txt (~17,000 raw rows, ~14,000 unique scheme codes), extracts AMC + category + plan + option from interleaved category headers, rolls 47 SEBI sub-categories into 7 buckets (Equity / Debt / Hybrid / Index / Solution / Commodity / FoF), and maps each fund to a benchmark. Output ships as content-addressed immutable JSON with a Brotli-q11 sidecar. A vercel.json rewrite swaps to .br when Accept-Encoding contains br.
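The interleaved-header parse can be sketched in Python, assuming the layout NAVAll.txt is commonly seen with: a `Scheme Code;...` header, category lines such as `Open Ended Schemes(...)`, bare fund-house lines, then rows of `code;isin;isin;name;nav;date`. The plan/option detection from scheme names is a hypothetical heuristic, and `parse_navall` is an illustrative name.

```python
import re

_CATEGORY = re.compile(r"^[A-Za-z ]+Schemes\s*\((.+)\)$")


def _to_float(text):
    try:
        return float(text)
    except ValueError:
        return None  # non-numeric NAV fields become None


def parse_navall(text):
    """Yield one dict per scheme row, tagged with the most recent category
    header and fund-house line seen above it."""
    category = amc = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Scheme Code;"):
            continue
        parts = [p.strip() for p in line.split(";")]
        if len(parts) >= 6 and parts[0].isdigit():
            code, isin, _isin_reinvest, name, nav, date = parts[:6]
            lower = name.lower()
            yield {
                "scheme_code": code,
                "isin": isin if isin not in ("", "-") else None,
                "name": name,
                "amc": amc,
                "category": category,
                "plan": "direct" if "direct" in lower else "regular",
                "option": "growth" if "growth" in lower
                          else ("idcw" if "idcw" in lower or "dividend" in lower else None),
                "nav": _to_float(nav),
                "date": date,
            }
            continue
        m = _CATEGORY.match(line)
        if m:
            category = m.group(1).strip()
        elif ";" not in line:
            amc = line  # bare line between sections: the fund house
```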

Live-quote caching is market-hours-aware: is_market_open_ist() drives both the Supabase TTL and the edge Cache-Control value, 5 seconds while NSE is open (Mon–Fri 09:15–15:30 IST) and 300 seconds while it is closed. Eight cron windows a day refresh fundamentals, instruments, and MF NAVs. Each cron stops itself at 50 seconds, safely under Vercel's 60-second maxDuration cap, and the next firing resumes from the following paginated ?offset=.
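A stdlib sketch of what `is_market_open_ist()` might look like, using the session hours and TTLs from the text. The holiday set is deliberately an empty placeholder (no dates are invented here), and the exact Cache-Control string is an assumption.

```python
from datetime import datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # India observes no DST, so a fixed offset is exact
OPEN, CLOSE = time(9, 15), time(15, 30)
NSE_HOLIDAYS = frozenset()  # placeholder: load the year's dates from NSE's published list


def is_market_open_ist(now=None, holidays=NSE_HOLIDAYS):
    """True during the NSE cash session; `now` must be timezone-aware if given."""
    if now is None:
        now = datetime.now(timezone.utc)
    local = now.astimezone(IST)
    if local.weekday() >= 5 or local.date() in holidays:  # 5 = Saturday, 6 = Sunday
        return False
    return OPEN <= local.time() < CLOSE  # 15:30 itself counts as closed


def quote_ttl_seconds(now=None):
    """One TTL feeds both the Supabase cache row and the edge Cache-Control header."""
    ttl = 5 if is_market_open_ist(now) else 300
    return ttl, f"public, s-maxage={ttl}"
```

Deriving both TTLs from one function is what keeps the database cache and the CDN from disagreeing about freshness.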

§ 09

Polish

800ms-debounced hottest-200 LRU cache
The persisted-quote cache (ss.quotes.v3) debounces writes by 800 ms and persists only the 200 hottest symbols by last-access timestamp. A one-shot legacy migration deletes pre-split Reliance values from older keys, so no one sees stale 2024-era prices after a service-worker refresh.
Server-time anchor via performance.now()
The market-status badge can't be spoofed by changing the system clock. The anchor is a pair: the performance.now() reading at the midpoint of the sync request, and the server's epoch milliseconds. serverNow() adds the monotonic time elapsed since that midpoint, so on a Sunday, setting your laptop to 9:30 AM IST on a weekday cannot fake 'Market Open'.
Intervention modal with 3-second read delay
Before a panic-sell, the user sees historical recovery analogs. The 'Sell anyway' button counts down 3-2-1 before it enables. By design, neither Escape nor an overlay click dismisses the modal; Escape shakes the Hold button instead.
Idempotency-key-based atomic apply_trade RPC
transactions.idempotency_key is UNIQUE(user_id, key); a deterministic key generated once per click is reused across retries. The PL/pgSQL function row-locks the portfolio, conditionally decrements cash, upserts holdings, and inserts the ledger row, all in one database transaction.
Chunked render with RAF yield
The MF browser paints 13,969 cards in 200-card chunks with requestAnimationFrame yields between batches; 'Tab not responding' never fires. Two cooperating IntersectionObservers (200% rootMargin to hydrate, 600% to dehydrate back to skeleton) keep the DOM bounded.
Limit-order matcher with market-closed guard
After-market orders (AMOs) were getting ghost fills: the matcher ran every 12 s regardless of market state and trivially 'filled' them against stale after-hours closes. The fix is a guard before any matcher pass: if (!marketStatus().open) return { skipped: 'market_closed' };
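The server-time anchor is essentially a one-shot NTP-style sync. In this Python sketch, `time.monotonic()` plays the role of `performance.now()`; `ServerClock` and the `fetch_server_ms` callable (for example, a tiny time endpoint) are hypothetical names. The wall clock is never read after the sync, which is why editing it changes nothing.

```python
import time


class ServerClock:
    """Anchor = (monotonic reading at the sync midpoint, server epoch ms)."""

    def __init__(self, fetch_server_ms, monotonic=time.monotonic):
        self._fetch = fetch_server_ms
        self._mono = monotonic
        self.sync()

    def sync(self):
        t0 = self._mono()
        server_ms = self._fetch()
        t1 = self._mono()
        # Assume the server stamped its reply halfway through the round trip.
        self._anchor = ((t0 + t1) / 2, server_ms)

    def now_ms(self):
        mid, server_ms = self._anchor
        return server_ms + (self._mono() - mid) * 1000  # monotonic delta only
```

The midpoint assumption bounds the error by half the round-trip time, which is far below the one-minute granularity a market-open badge needs.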

§ 10

Honest limits

Yahoo's NSE data is officially 15-minute delayed. The “LIVE” badge flips to “DELAYED” via the staleAgeMinutes flag when Yahoo's own quote timestamp is more than 5 minutes old during market hours. The product calls itself a paper-trading simulator, never a real-time tick feed.
The coach never gives buy/sell advice. The decision is structural, not stylistic — the prompt forbids it, the output filter blocks 25 forbidden phrases, the deterministic detectors carry the regulatorily sensitive output.
SELL-side limit orders don't reserve quantity. The schema comment is candid: “multi-order users can oversell. For the pitch scale this is acceptable.”
NSE 2026 holiday list is hardcoded. NSE doesn't expose a public holiday API; the file flags itself for annual update.
No real money. No real trades, no real advice — the legal posture that lets a teen-targeted app sidestep India's investor-suitability regulations.

§ 11

Numbers

2,686 · NSE equities + ETFs
13,969 · AMFI mutual funds
4-tier · upstream failover
4-tier · fundamentals fallback
9 · deterministic bias detectors
3 · SEBI guardrail layers
8 · cron windows daily
5s / 300s · market-open / closed TTL
30s · circuit-breaker window
0 · pip dependencies
16 · atomic SECURITY DEFINER RPCs
paise · all money as bigint