About Sentinel

A quant intelligence platform for public‑company monitoring.

Sentinel detects material changes across 25 US large-cap companies using 21 public data sources and 13 quantitative analysis modules. A live report updates as new data lands; a 06:00 UTC snapshot freezes each weekday for durable permalinks. Every number carries signal provenance and a confidence interval.

25 Companies

21 Data sources

13 Models

Live · updated continuously

Recently shipped

What's new under the hood.

The last few weeks brought three substantial upgrades: the Policy Trade Tracker's noise floor dropped ~70% via formal statistical gating, the report view became a live canonical URL with date navigation, and the dashboard became a command surface rather than an email archive.

Phase 1 · noise filters

Policy Tracker statistical hardening

GDELT tightened to attribution-first (WH + Truth Social + headline-attribution only). SHA-256 content-hash dedup across sources. Ticker confidence floor 50 → 75 with a hardcoded ETF whitelist; lower-confidence extractions land in a separate audit table. Retry-on-failure replaced permanent tombstoning. Beta-corrected return z-scores (OLS residualized against SPY over a 20-day baseline). Benjamini-Hochberg FDR at q=0.10 across every (ticker × signal) pair. Wallet pre-positioning gates raised to $25k minimum + 3 overlapping tokens + MiniLM cosine similarity ≥ 0.35. Null-hypothesis verdict shipped as a first-class output.

Phase 2 · novel accuracy

Event-study CAR + synthetic control + adversarial narrative

Canonical finance event-study methodology: OLS market-model fit on a 100-trading-day estimation window, abnormal returns summed over the [-1, +3] event window, formal t-stat and p-value per (statement, ticker). Sector-peer synthetic control adds an orthogonal second layer — we flag moves that are idiosyncratic, not sector-wide. Pre-registered stance predictions are stored at ticker-extraction time and resolved against CAR after 5 trading days — a 90-day calibration curve (Brier score, hit-rate by conviction bucket) gates email promotion. Polymarket pre-statement leakage scoring and Polymarket↔Kalshi cross-venue arb divergence added as independent signals. Narratives now run through a three-LLM adversarial-consensus workflow (Sonnet pro-signal, GPT-4o-mini coincidence-advocate, Haiku arbiter) and publish only when the arbiter sides with the suspicion case AND at least three independent FDR-passed signals are present.

Track B · live report + dashboard

Canonical /report URL, date-scrubbing, dashboard overhaul

The old "latest report" resolver — which silently rolled back to the least-fresh of five signal tables — was replaced by a job-watermark model. Four canonical daily jobs UPSERT a completion marker; the report watermark is MIN(completed_for_date) across them, so the page never shows a half-rendered day. /report is now a single canonical URL that always resolves to the current watermark; /report/<date> is the historical permalink. A topbar date picker (←/→ keyboard, bounded by earliest snapshot + watermark) handles navigation, and a partial-coverage banner names which signals are missing on weekends or pipeline-failure days. The dashboard was rewritten to five rows: system-status strip (per-job freshness + tracker LLM budget), 30-day consensus-anomaly column chart, watchlist strip, a priority-sorted 24h signal feed, and the demoted email archive.
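The watermark rule can be sketched in a few lines. This is a minimal illustration, not Sentinel's code: the job names and the in-memory dict are hypothetical stand-ins for the four canonical jobs and their completion-marker rows.

```python
from datetime import date
from typing import Optional

# Hypothetical job names; the text says only that four canonical daily jobs exist.
CANONICAL_JOBS = ("scoring", "anomaly", "bayesian", "tracker")


def report_watermark(completed: dict[str, date]) -> Optional[date]:
    """Return the latest date for which every canonical job has completed."""
    if any(job not in completed for job in CANONICAL_JOBS):
        return None  # a job has never completed, so no day is fully rendered
    # MIN across jobs: the slowest job decides what the report may show.
    return min(completed[job] for job in CANONICAL_JOBS)


markers = {
    "scoring": date(2025, 1, 15),
    "anomaly": date(2025, 1, 15),
    "bayesian": date(2025, 1, 14),  # lagging job holds the watermark back
    "tracker": date(2025, 1, 15),
}
watermark = report_watermark(markers)
```

Taking the minimum rather than the maximum is what prevents a half-rendered day: the page advances only once the last job has caught up.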

Testing + hardening

Exhaustive test coverage

~200 tests across 12 files cover the new math (event study, synthetic control, FDR, content-hash, prediction resolver), watermark DB integration, schema assertions, Dagster import smoke, Flask route surface, and a migration-sequence harness gated on a throwaway Postgres. CI runs pytest with coverage on every PR.

Methodology

The math, in plain English.

Every score in the report is computed by one of thirteen quantitative modules. Each has a canonical academic basis, a pre-specified threshold, and a citation we can point to when asked "why does this number say that?" This section explains each module three ways: what it is in plain English, what the formal definition looks like, and where it appears in the product.

Baseline

1. Cross-sectional z-score

The workhorse of anomaly detection. A z-score measures how many standard deviations a value sits above or below the historical mean for that metric, for example "today's reading is 2.3σ above its 20-day average, which is noticeably unusual but not extreme." A reading beyond ±2σ is roughly a 5% probability event under a normal distribution; beyond ±3σ is ~0.3%.

z = (x − μ) / σ where μ, σ are the mean and sample stdev over a 20-day rolling baseline

Used as the first-pass signal on volume spikes, return shocks, options flow, dark-pool share, short-interest deltas. Caveat: raw z-scores are sensitive to market-wide moves (a Fed day drives every stock's z simultaneously), so for the policy tracker we layer two stricter variants on top.
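A minimal sketch of the formula above, assuming the caller supplies the 20-day baseline as a plain list. The function name and the flat-baseline guard are illustrative choices, not taken from the Sentinel codebase.

```python
from statistics import mean, stdev


def rolling_z(history: list[float], today: float) -> float:
    """z-score of today's value against a baseline window (sample stdev)."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        return 0.0  # flat baseline: no meaningful z-score (assumed convention)
    return (today - mu) / sigma


baseline = [10.0, 12.0] * 10        # 20 days with mean 11
z = rolling_z(baseline, 14.0)       # today's reading sits about 2.9 sigma above the mean
```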

Market-model residualized

2. Beta-corrected return z-score

A z-score that first subtracts out the piece of today's return explained by the overall market. If SPY is up 1% and a high-beta stock is up 1.5%, most of that move is "beta" (riding the market), not "alpha" (stock-specific news). We fit a linear regression over a 20-day baseline and measure how far today's return is from what the model predicted.

r̂ᵢ,t = α + β · r_SPY,t (OLS, 20-day window)
abnormal_return = rᵢ,t − r̂ᵢ,t
z_beta = abnormal_return / stdev(baseline_residuals)

Replaces the raw return z-score in the Policy Trade Tracker. Kills most Fed-day and sector-wide false positives. Falls back to raw z when SPY data is unavailable.
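The three lines above can be sketched with a hand-rolled one-regressor OLS. The helper names and the toy return series are assumptions made for illustration; the real module may differ.

```python
from statistics import mean, stdev


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """One-regressor OLS of y on x; returns (alpha, beta)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta


def beta_corrected_z(stock_base, spy_base, stock_today, spy_today) -> float:
    """Abnormal return today, scaled by the stdev of baseline residuals."""
    alpha, beta = ols_fit(spy_base, stock_base)
    residuals = [r - (alpha + beta * m) for r, m in zip(stock_base, spy_base)]
    abnormal = stock_today - (alpha + beta * spy_today)
    return abnormal / stdev(residuals)


# Toy 20-day baseline: a beta-1.5 stock plus small noise orthogonal to SPY.
spy = [0.01, -0.01] * 10
noise = [0.002, 0.002, -0.002, -0.002] * 5
stock = [1.5 * m + e for m, e in zip(spy, noise)]

z_market = beta_corrected_z(stock, spy, stock_today=0.015, spy_today=0.01)
z_idio = beta_corrected_z(stock, spy, stock_today=0.025, spy_today=0.01)
```

In the toy data a +1.5% move on a +1% SPY day scores near zero because beta explains all of it, while +2.5% on the same day scores well above 4.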

Canonical finance methodology

3. Event-study CAR (cumulative abnormal return)

The formal version of "did this event move the stock?" We fit a market model over a long estimation window (100 trading days, weeks before the event). We then predict what the stock should have done over the event window ([-1, +3] trading days) given how the market moved, and sum up the deviations. This is the technique used by SEC enforcement and forensic-finance academic papers.

Estimation window: rᵢ = α + β · r_m + ε on [T−120, T−21]
Event window [T−1, T+3]: CAR = Σₜ (rᵢ,t − (α + β · r_m,t))
t-stat = CAR / √(L · σ²_ε) where L is event-window length
p-value = 2 · (1 − Φ(|t|)) two-sided normal approx (valid for n_est ≥ 60)

Every (Trump statement × exposed ticker) pair gets a CAR, t-stat, and p-value. Rows with p ≤ 0.10 enter the Benjamini-Hochberg FDR step below. See sentinel/forensic/event_study.py.
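The estimation and event-window formulas above can be sketched as follows. This is an illustration under stated assumptions, not the contents of event_study.py: the function names and toy data are invented, and the residual variance uses an n − 2 denominator for the two fitted parameters.

```python
from math import sqrt
from statistics import NormalDist, mean


def market_model(stock, market):
    """Fit r_i = alpha + beta * r_m by OLS; return (alpha, beta, residual variance)."""
    mx, my = mean(market), mean(stock)
    sxx = sum((m - mx) ** 2 for m in market)
    beta = sum((m - mx) * (s - my) for m, s in zip(market, stock)) / sxx
    alpha = my - beta * mx
    resid = [s - (alpha + beta * m) for s, m in zip(stock, market)]
    sigma2 = sum(e * e for e in resid) / (len(resid) - 2)  # two fitted parameters
    return alpha, beta, sigma2


def event_study(est_stock, est_market, evt_stock, evt_market):
    """Return (CAR, t-stat, two-sided normal p-value) for one event."""
    alpha, beta, sigma2 = market_model(est_stock, est_market)
    car = sum(s - (alpha + beta * m) for s, m in zip(evt_stock, evt_market))
    t_stat = car / sqrt(len(evt_stock) * sigma2)   # L = event-window length
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return car, t_stat, p_value


# Toy data: 100 estimation days, then a 5-day [-1, +3] event window.
est_market = [0.01, -0.01] * 50
est_noise = [0.002, 0.002, -0.002, -0.002] * 25
est_stock = [1.2 * m + e for m, e in zip(est_market, est_noise)]
evt_market = [0.0] * 5
evt_stock = [0.0, 0.03, 0.01, 0.0, 0.0]

car, t_stat, p_value = event_study(est_stock, est_market, evt_stock, evt_market)
```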

Counterfactual

4. Sector-peer synthetic control

Event-study CAR removes the market-wide move, but sector rotations can still inflate it: a tariff announcement lifts every oil name. Synthetic control asks a sharper question: "Did this ticker move idiosyncratically, or did its entire sector move with it?" We compute CAR for ≥10 sector peers (with no Trump exposure) and compare the treated ticker's CAR to the peer distribution. A treated CAR that sits two or more peer standard deviations away from the peer mean is a strong idiosyncratic-move signal.

synth_CAR = mean(peer_CARs)
deviation = treated_CAR − synth_CAR
z_synth = deviation / stdev(peer_CARs)
p_synth = 2 · (1 − Φ(|z_synth|))

Equal-weight peer average instead of full Abadie-Diamond-Hainmueller QP — a simplification we accept because the peer pool is small and the refinement is ~5–10% in practice. See sentinel/forensic/synthetic_control.py.
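A minimal sketch of the equal-weight version described above, assuming peer CARs have already been computed. The function name and the ten peer values are illustrative only.

```python
from statistics import NormalDist, mean, stdev


def synthetic_control(treated_car: float, peer_cars: list[float]):
    """Compare a treated ticker's CAR with an equal-weight peer average."""
    synth_car = mean(peer_cars)
    deviation = treated_car - synth_car
    z_synth = deviation / stdev(peer_cars)
    p_synth = 2 * (1 - NormalDist().cdf(abs(z_synth)))
    return synth_car, z_synth, p_synth


# Ten hypothetical sector peers that all drifted up by about 1%.
peers = [0.01, 0.02, 0.00, 0.01, 0.02, 0.00, 0.01, 0.02, 0.00, 0.01]
synth_car, z_synth, p_synth = synthetic_control(0.04, peers)
```

Here the treated ticker's 4% CAR sits more than three peer standard deviations above the 1% peer mean, so the move would read as idiosyncratic rather than sector-wide.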

Multiple-testing correction

5. Benjamini-Hochberg false discovery rate

If we run 100 statistical tests and accept anything with p < 0.05, we expect 5 false positives by pure chance. Benjamini-Hochberg fixes this: it sorts all p-values ascending, and for each one asks "is this p-value small enough to keep, given how many tests we ran?" The result is a set where the expected proportion of false positives among the "passed" group is bounded by q (we use q = 0.10).

Sort p-values ascending: p_(1) ≤ p_(2) ≤ … ≤ p_(n)
Find largest k such that p_(k) ≤ (k / n) · q
Reject H₀ for tests 1..k (pass FDR)

Applied across every (ticker × signal) pair within a single Trump statement. A 2-of-N consensus-count gate runs in parallel, so a row passes if EITHER it clears BH-FDR OR at least two independent signals fire. Stops a single statement from producing 12 spurious "suspicious" flags.
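The step-up procedure and the parallel consensus gate can be sketched as below. The function names are hypothetical; only q = 0.10 and the two-signal rule come from the text.

```python
def benjamini_hochberg(p_values: list[float], q: float = 0.10) -> list[bool]:
    """Return a pass/fail flag per p-value, in the original input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / n * q:
            k = rank  # remember the largest rank that satisfies the bound
    passed = [False] * n
    for idx in order[:k]:
        passed[idx] = True  # everything ranked at or below k passes
    return passed


def row_passes(bh_pass: bool, signals_fired: int) -> bool:
    """A row passes if it clears BH-FDR or at least two signals fire."""
    return bh_pass or signals_fired >= 2


flags = benjamini_hochberg([0.001, 0.30, 0.012, 0.04, 0.50])
```

Because the procedure keeps the largest qualifying rank, a p-value can pass even when a smaller one misses its own threshold: with two tests, 0.06 fails the rank-1 bound of 0.05 but both pass once 0.09 clears the rank-2 bound of 0.10.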

Probabilistic conviction

6. Bayesian conviction score

Classical statistics asks "how unlikely is this if nothing's going on?" — p-values. Bayesian inference asks the reverse: "given what I've seen, how likely is it that something is going on?" We start with a prior probability of "material change" for each company, then update using log-likelihood ratios from each signal. The output is a posterior probability in [0, 1] plus a breakdown of which signals contributed how much.

posterior = prior · ∏ᵢ LR(signalᵢ) / (prior · ∏ᵢ LR(signalᵢ) + (1 − prior))
where LR(signalᵢ) = P(signalᵢ | H₁) / P(signalᵢ | H₀)

Computed daily per company and stored in bayesian_scores. The UI shows posterior probability + a contribution table so you can see which signals drove the number up or down. See sentinel/bayesian_score.py.
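As a worked illustration of the update rule above, the sketch below uses a hypothetical 10% prior and three invented likelihood ratios. The per-signal log-LR is what a contribution table would display.

```python
from math import log, prod


def posterior(prior: float, likelihood_ratios: dict[str, float]) -> float:
    """Posterior probability of H1 after multiplying in each signal's LR."""
    weighted = prior * prod(likelihood_ratios.values())
    return weighted / (weighted + (1 - prior))


def contributions(likelihood_ratios: dict[str, float]) -> dict[str, float]:
    """Log-likelihood ratio per signal: positive pushes toward H1, negative away."""
    return {name: log(lr) for name, lr in likelihood_ratios.items()}


# Hypothetical inputs: two signals favour H1, one argues against it.
lrs = {"volume": 3.0, "options": 2.0, "sentiment": 0.5}
post = posterior(0.10, lrs)      # 0.3 / (0.3 + 0.9) = 0.25
contrib = contributions(lrs)
```

With no signals the product is 1 and the posterior equals the prior, which matches the "barely moved off prior" reading described later.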

State-space smoothing

7. Kalman-filtered health score

A company's "true" underlying health is hidden — we see only noisy daily signals (price, sentiment, filings, flows). The Kalman filter models health as an unobserved state that drifts over time and uses every new observation to refine its best estimate, along with a confidence interval that widens when data is sparse. Think of it as a moving average that trusts stable stretches more than spiky ones.

State: xₜ = F · xₜ₋₁ + wₜ (process model, drift + noise)
Observation: zₜ = H · xₜ + vₜ (what we actually measure)
Estimate updated via Kalman gain: x̂ₜ = F · x̂ₜ₋₁ + K · (zₜ − H · F · x̂ₜ₋₁)
Confidence interval: 95% = x̂ₜ ± 1.96 · √Pₜ

Produces health_score (0-100) + confidence_lower / confidence_upper. A widening CI signals degraded input coverage — you can see when the score is getting less trustworthy. See sentinel/kalman_filter.py.
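A scalar sketch of one predict-and-update cycle, assuming F = H = 1 and made-up noise variances q and r. The handling of missing observations (predict only, so the variance grows) is one common convention and is assumed here, not confirmed by the source.

```python
from math import sqrt
from typing import Optional


def kalman_step(x_prev: float, p_prev: float, z: Optional[float],
                q: float = 1.0, r: float = 4.0,
                f: float = 1.0, h: float = 1.0) -> tuple[float, float]:
    """One scalar Kalman cycle; returns (state estimate, variance)."""
    # Predict: propagate the state and inflate its variance by process noise.
    x_pred = f * x_prev
    p_pred = f * p_prev * f + q
    if z is None:
        return x_pred, p_pred  # no observation today: the interval widens
    # Update: blend prediction and observation using the Kalman gain.
    k = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + k * (z - h * x_pred)
    p_new = (1 - k * h) * p_pred
    return x_new, p_new


def ci95(x: float, p: float) -> tuple[float, float]:
    """95% interval from the state variance."""
    half = 1.96 * sqrt(p)
    return x - half, x + half


x_obs, p_obs = kalman_step(50.0, 4.0, 60.0)    # observation pulls the estimate up
x_gap, p_gap = kalman_step(50.0, 4.0, None)    # missing day: variance grows
```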

Consensus detector

8. 4-layer anomaly ensemble with 2/4 consensus

Any single anomaly detector generates false positives. We run four orthogonal ones and only flag a "consensus anomaly" when at least two of the four agree. The four layers look at different things: Mod-Z (robust z-score on the median), ECOD (empirical CDF outlier score), CUSUM (cumulative-sum drift detection), and COPOD (copula-based tail probability). Each has a different failure mode, so two-of-four consensus dramatically cuts false positives.

Mod-Z: 0.6745 · (x − median) / MAD > 3.5
ECOD: outlier score from marginal ECDFs, threshold ~0.98
CUSUM: Sₜ = max(0, Sₜ₋₁ + xₜ − μ − k); alarm at Sₜ > h
COPOD: tail probability from empirical copula, threshold ~0.98
consensus_flag = (count of layers triggered ≥ 2)

The dashboard calendar + heatmap fire on consensus_flag, not individual layers. Per CLAUDE.md this reduces false positives "dramatically" vs single-detector approaches. See sentinel/anomaly_detection.py.
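Two of the four layers and the consensus rule can be sketched with the standard library. ECOD and COPOD are passed in as precomputed booleans because their scoring is not reproduced here. The absolute value in the Mod-Z test and all parameter values are assumptions.

```python
from statistics import median


def mod_z_flag(history: list[float], x: float, threshold: float = 3.5) -> bool:
    """Modified z-score on the median; flags moves in either direction (assumed)."""
    med = median(history)
    mad = median([abs(v - med) for v in history])
    if mad == 0:
        return False  # degenerate baseline: no flag
    return abs(0.6745 * (x - med) / mad) > threshold


def cusum_flag(series: list[float], mu: float, k: float, h: float) -> bool:
    """One-sided upper CUSUM: alarm when the cumulative drift exceeds h."""
    s = 0.0
    for x in series:
        s = max(0.0, s + x - mu - k)
        if s > h:
            return True
    return False


def consensus_flag(layer_flags: dict[str, bool]) -> bool:
    """Consensus anomaly when at least two layers trigger."""
    return sum(layer_flags.values()) >= 2


layers = {
    "mod_z": mod_z_flag([float(v) for v in range(1, 11)], 30.0),
    "cusum": cusum_flag([0.2, 0.1, 1.5, 1.6, 1.4], mu=0.0, k=0.5, h=2.0),
    "ecod": False,   # placeholder for a precomputed ECOD flag
    "copod": False,  # placeholder for a precomputed COPOD flag
}
is_consensus = consensus_flag(layers)
```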

Structural break detection

9. PELT change-point detection

Anomaly detectors flag individual spikes; change-point detectors flag regime shifts, moments where the time series' statistical properties change permanently. PELT (Pruned Exact Linear Time) searches for the set of breakpoints that minimizes total within-segment cost plus a penalty per breakpoint. When PELT finds a breakpoint 15 days into a 60-day window, something structurally different happened that day.

argmin_{τ} Σᵢ cost(segmentᵢ) + β · |τ|
where τ is the set of breakpoints, cost is per-segment RSS, β is the penalty

Run on health-score / sentiment / volume series; stores breakpoints in change_points. Used to flag "this isn't just a bad day — it's a different regime." See sentinel/change_point.py using the ruptures library.

Hidden-state classification

10. Gaussian HMM regime detection

Markets alternate between "calm" and "stressed" regimes with predictable differences in volatility and auto-correlation. A 3-state Gaussian Hidden Markov Model classifies each day by inferring which regime the observed returns are most consistent with. Knowing the regime matters — many signals carry different information in calm vs. stressed markets.

Hidden state Sₜ ∈ {calm, transitional, stressed}
Observation: rₜ | Sₜ ~ Normal(μ_{Sₜ}, σ²_{Sₜ})
Transition: P(Sₜ | Sₜ₋₁) fit via Baum-Welch (EM)
Most-likely state via Viterbi

Outputs regime_states.regime + regime_probability per company per day. The IC-weighted score uses it to switch signal weights conditional on the regime. See sentinel/regime_detection.py.

Adaptive ensemble

11. 30-day rolling IC-weighted blending

Rather than giving every signal equal weight, we measure each signal's information coefficient (IC) — its empirical correlation with future returns over a 30-day rolling window. Signals that have been predictive recently get more weight; signals that have gone stale get less. The composite becomes self-tuning.

ICᵢ,t = corr(signalᵢ[t−30 : t], returns[t−29 : t+1])
weightᵢ,t = max(0, ICᵢ,t) / Σⱼ max(0, ICⱼ,t)
composite_t = Σᵢ weightᵢ,t · normalize(signalᵢ,t)

Produces the main Sentinel Score displayed on the report card. Weights are visible in the "signal contributions" sub-panel so you can see what's currently driving the number. See sentinel/ic_weighted_score.py + sentinel/composite_score.py.
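The weighting rule can be sketched as follows, assuming each signal's IC is a Pearson correlation and that signals are already normalized. The signal names, IC values, and the all-zero fallback are hypothetical.

```python
from statistics import mean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation, used here as the information coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ic_weights(ics: dict[str, float]) -> dict[str, float]:
    """Clip negative ICs to zero, then normalize so the weights sum to 1."""
    clipped = {name: max(0.0, ic) for name, ic in ics.items()}
    total = sum(clipped.values())
    if total == 0:
        return {name: 0.0 for name in ics}  # nothing predictive: all weights zero
    return {name: value / total for name, value in clipped.items()}


def composite(weights: dict[str, float], normalized: dict[str, float]) -> float:
    """IC-weighted sum of normalized signal values."""
    return sum(weights[name] * normalized[name] for name in weights)


weights = ic_weights({"volume": 0.3, "sentiment": 0.1, "options": -0.2})
score = composite(weights, {"volume": 1.0, "sentiment": -1.0, "options": 2.0})
```

A signal with negative recent IC gets zero weight rather than a negative one, so a stale signal is muted instead of being inverted.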

Lead-lag testing

12. Granger causality

Granger causality asks whether past values of series X help predict future values of series Y beyond what Y's own history already predicts. A positive result doesn't prove causation in the philosophical sense but it rules out the easy alternative ("Y was always going to do this anyway"). We use it to test whether pre-statement Polymarket price drift helps predict post-statement equity moves — i.e., whether someone knew.

Unrestricted model: yₜ = α + Σ βᵢ · yₜ₋ᵢ + Σ γⱼ · xₜ₋ⱼ + εₜ
Restricted model: yₜ = α + Σ βᵢ · yₜ₋ᵢ + εₜ
F-test: H₀ is γⱼ = 0 for every lag j; reject H₀ when the joint F-statistic is significant

Signal-by-signal validation in signal_ic_weights. Also powers the Policy Tracker's "leakage" tile — Polymarket price in the 6–48h pre-statement window as a Granger predictor of post-statement CAR.
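For intuition, here is a one-lag sketch of the restricted-versus-unrestricted comparison. It computes only the F-statistic (compare it with an F(1, n − 3) critical value, roughly 4 at the 5% level for samples of this size). It uses the Frisch-Waugh-Lovell identity to avoid a matrix solve, and the synthetic series are invented for illustration.

```python
from math import cos, sin
from statistics import mean


def _residuals(target: list[float], regressor: list[float]) -> list[float]:
    """Residuals from OLS of target on an intercept plus one regressor."""
    mt, mr = mean(target), mean(regressor)
    srr = sum((r - mr) ** 2 for r in regressor)
    slope = sum((r - mr) * (t - mt) for r, t in zip(regressor, target)) / srr
    return [(t - mt) - slope * (r - mr) for t, r in zip(target, regressor)]


def granger_f_lag1(x: list[float], y: list[float]) -> float:
    """F-statistic for H0: gamma = 0 in y_t = a + b*y_{t-1} + gamma*x_{t-1} + e_t."""
    y_now, y_lag, x_lag = y[1:], y[:-1], x[:-1]
    e_r = _residuals(y_now, y_lag)   # restricted-model residuals
    e_x = _residuals(x_lag, y_lag)   # x_{t-1} with intercept and y_{t-1} partialled out
    rss_r = sum(e * e for e in e_r)
    cross = sum(a * b for a, b in zip(e_r, e_x))
    rss_u = rss_r - cross ** 2 / sum(e * e for e in e_x)  # Frisch-Waugh-Lovell identity
    dof = len(y_now) - 3             # observations minus three fitted parameters
    return (rss_r - rss_u) / (rss_u / dof)


# Synthetic example in which y follows x with a one-step lag plus small noise.
x = [sin(1.7 * t) for t in range(60)]
y = [0.0] + [0.8 * x[t - 1] + 0.05 * cos(2.3 * t) for t in range(1, 60)]
f_stat = granger_f_lag1(x, y)
```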

Narrative synthesis

13. Adversarial-consensus LLM narrative

A single LLM summarizing the data is prone to one-sided storytelling. We run the narrative synthesis three ways: Sonnet 4.6 argues the signal case ("this is suspicious, here's why"), GPT-4o-mini argues the coincidence case ("this could easily be random"), and Haiku 4.5 reads both arguments + the raw data and picks a winner as strict JSON. The published narrative is only the signal side when the arbiter picks it AND at least three independent FDR-passed rows back it; otherwise the coincidence narrative wins (with a "likely clean" banner).

signal_md, cost_s = Sonnet(system=pro-signal, data)
coincidence_md, cost_c = GPT-4o-mini(system=coincidence, data)
{verdict, reasoning}, cost_a = Haiku(system=arbiter, signal_md, coincidence_md, data)
publish_signal ⇔ verdict == "signal" ∧ fdr_rows ≥ 3

All four artefacts (signal argument, coincidence argument, verdict, arbiter reasoning) are persisted in statement_narratives for audit. Monthly spend cap $80, scoped to tracker-only models so unrelated Sentinel LLM spend can't starve the tracker. See sentinel/jobs/trump_trade_correlation.py::_adversarial_synthesis.
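The publish rule reduces to a small gate. The sketch below assumes the arbiter returns a JSON object with the verdict and reasoning keys shown in the pseudocode; the function name is hypothetical.

```python
import json


def publish_side(arbiter_json: str, fdr_rows: int) -> str:
    """Return which narrative to publish: 'signal' or 'coincidence'."""
    payload = json.loads(arbiter_json)  # strict JSON from the arbiter
    if payload["verdict"] == "signal" and fdr_rows >= 3:
        return "signal"
    return "coincidence"  # default to the 'likely clean' narrative


arbiter_output = '{"verdict": "signal", "reasoning": "three independent rows agree"}'
side_strong = publish_side(arbiter_output, fdr_rows=3)
side_weak = publish_side(arbiter_output, fdr_rows=2)
```

Both conditions must hold, so an arbiter verdict of "signal" backed by only two FDR-passed rows still publishes the coincidence narrative.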

Honest framing. None of these methods "know" whether something suspicious happened; they measure how unlikely a pattern is under a specified null hypothesis. A fishy-score of 80 doesn't mean "insider trading"; it means "if the null were true, we'd see a pattern this extreme only rarely." All verdict language uses hedge vocabulary ("consistent with", "suggestive of"), never "proves" or "confirms". A pre-registered prediction ledger resolves each call against market outcomes so the tracker's own accuracy is auditable: after 90 days of resolutions we will publish a Brier score and calibration plot.

Data sources

Twenty-one public feeds, all free or already-paid.

Every signal you see is built from public or open-licensed feeds — government filings, exchange data, news APIs, social firehoses. No scraping of private accounts, no PII beyond what's already in SEC / FEC filings, no commercially-restricted vendor data is republished. A short list of paid feeds we use internally for sanity-checking (prime-broker borrow data, on-chain wallet entity labels) does not appear in the published report — those vendors don't permit redistribution and we honour that.

Market + price

yfinance (daily OHLCV, 52-week extremes, market cap, splits, dividends) · Polygon (options chains, IV / skew, contract-level volume — commercial license) · FINRA ATS (weekly off-exchange / dark-pool settlement volume per ticker) · Finnhub (sell-side analyst estimates, price targets, earnings calendar — free tier) · Kalshi (CFTC-regulated event-contract probabilities) · Polymarket (Gamma API + CLOB price-history for event-market positioning + wallet activity) · FRED (macro series — yields, dollar, VIX) · OpenFIGI (security identifier mapping)

Filings + disclosures

SEC EDGAR (10-K / 10-Q / 8-K narrative + Form 4 insider trades + 13D/13G activist filings + 13F institutional holdings + S-1/S-3 registrations) · SEC Litigation Releases (enforcement actions) · Senate Financial Disclosures (member trades with STOCK-Act late-filing flag) · FEC (campaign-donor rollups across Trump-aligned committees, used only for the policy tracker) · Federal Register (executive orders, proclamations, agency rule-makings)

News + sentiment

GDELT DOC 2.0 (global news sentiment + tone + theme tags + CAMEO event categories) · Reddit via PRAW (cashtag + company-name mentions across investing subs) · StockTwits (cashtag stream) · Bluesky Jetstream (firehose-derived company-name mentions) · Wikipedia pageview API (attention proxy) · certificate-transparency logs (crt.sh — used to detect upcoming product launches via novel subdomain registrations) · White House RSS + truthbrush (Truth Social public archive, used only for the policy tracker)

Regulatory + labour + corporate

FDA OpenFDA (drug + device enforcement actions, recalls, adverse events) · EPA ECHO (environmental enforcement records) · FTC RSS (consumer-protection actions) · FDIC (bank financial reports — sector context only) · USAspending.gov (federal contract awards, trailing 365d) · Senate LDA (lobbying disclosures, trailing 180d) · USPTO via Google Patents BigQuery (patent grants + citations as R&D activity proxy) · Adzuna (job-posting flow as a hiring signal) · ICIJ Offshore Leaks (named-entity matches for the policy tracker only)

Every row in the database carries the source it came from plus an ingested_at timestamp. The dashboard's system-status strip surfaces per-source freshness dots so you can see at a glance which feeds are late. Forward-only sources (Polymarket snapshots, Bluesky, Kalshi, Wikipedia revisions, analyst estimates) only accumulate from the day the collector was deployed — historical depth grows with time. Paid vendor feeds with commercial-use restrictions (Ortex prime-broker borrow data, Nansen / Arkham wallet labels) are used internally for cross-checks but never republished in the report.

How to read it

Interpreting each panel.

Every score on the report is built from one of the thirteen modules above. Here's how to read each one as you scroll through a company card.

Top of card

Health Score (0–100) + confidence interval

A Kalman-filtered composite of all signals for that company. Higher is healthier. The bracketed range underneath is the 95% confidence interval; it widens when input coverage is poor (weekends, missing feeds), so you can see when the score is less trustworthy. Treat scores in 60–100 as positive, 40–60 as neutral, and 0–40 as elevated risk.

Composite under Sentinel's IC-weighted blender; CI from Kalman variance. A widening CI without a score change means data quality dropped, not that anything happened.

Bayesian Signal Contributions

Posterior probability + waterfall of log-likelihood ratios

Reads "given today's evidence, how likely is something material?" The bars left of zero argue the bearish case, right of zero the bullish — the net direction tells you which side won. A posterior of ≥65% with three or more contributing signals is strong conviction; 50–55% with one bar is "barely moved off prior" and shouldn't be acted on alone.

Each bar is the log-likelihood ratio of one signal under H₁ (material change) vs H₀ (status quo). The bar's length, not its colour, is what matters.

Anomaly Detection

6-layer ensemble + consensus flag

Six independent detectors run on different aspects of the day's data: Mod-Z and ECOD on sentiment, CUSUM on residual drift, COPOD on tail probability, Vol-z and Rtn-z on price-action shocks. Consensus fires when ≥2 of the original four (Mod-Z, ECOD, CUSUM, COPOD) agree — that's the high-precision signal. A 1/6 single-layer trigger by itself is normal day-to-day noise.

Consensus rows surface in the dashboard signal feed and the report's "Unusual Today" section. Single-layer rows are never user-facing.

Signal Conflict (Dempster-Shafer)

Conflict factor K

A K-value answers "do my bullish and bearish signals agree on direction?" K = 0 means perfect agreement; K > 0.3 flags meaningful disagreement (the Bayesian posterior loses some reliability); K > 0.5 is direct contradiction (treat the score as soft and read the underlying signals manually).
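For two evidence sources, the conflict factor can be sketched as below. The three-way frame (bull, bear, either) and the mass values are assumptions, since the page does not spell out the frame of discernment. The 0.3 and 0.5 bands are the ones quoted above.

```python
def conflict_k(m1: dict[str, float], m2: dict[str, float]) -> float:
    """Dempster-Shafer conflict: total mass the two sources put on disjoint sets."""
    # 'either' overlaps both directions, so only bull-vs-bear pairs conflict.
    return m1["bull"] * m2["bear"] + m1["bear"] * m2["bull"]


def conflict_band(k: float) -> str:
    """Map K onto the reading bands described in the text."""
    if k > 0.5:
        return "contradiction"
    if k > 0.3:
        return "disagreement"
    return "agreement"


# Hypothetical mass assignments from two signal groups.
bullish_source = {"bull": 0.7, "bear": 0.1, "either": 0.2}
bearish_source = {"bull": 0.1, "bear": 0.6, "either": 0.3}
k = conflict_k(bullish_source, bearish_source)   # 0.7*0.6 + 0.1*0.1 = 0.43
band = conflict_band(k)
```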

Computed once we have ≥2 signals firing in opposite directions. The dashboard scatter plots K vs Bayesian posterior so high-conviction-but-conflicted names stand out.

Granger Causality

p-values + significant-after-BH flag

For each (signal → returns) pair we ask "do past values of the signal help predict next-day returns beyond what past returns alone do?" A p-value < 0.05 after Benjamini-Hochberg correction is the bar — anything else is "no evidence". This panel is where you find which signals have been predictive recently for THIS company.

Re-fit weekly on a 90-day window. Useful for sceptics who want to know if the IC-weighted blender's recent weight-shifts reflect genuine signal-decay or noise.

Cross-Sectional Position

Percentile rank vs the 25-name universe

Every metric ranked against the other 24 tracked companies on the same day. 100% = best in the universe; 50% = median; 0% = worst. This is a relative-value lens — good for "is the slowdown idiosyncratic or sector-wide?". A composite percentile of 90% but a sentiment percentile of 20% is a contrarian signal worth investigating.

All ranks are recomputed daily so the universe is always current. Pinned company shows on top.

Options-Implied Signals

Put/Call · IV · Skew · Term Structure · Volume

P/C > 1.2 = put-heavy (defensive flow); P/C < 0.8 = call-heavy (speculative flow). 30-day IV high vs the company's history = priced-in event risk. Positive 25-delta skew (puts > calls implied vol) = downside hedging in size. Term-structure inversion (front-month IV > longer-dated) = imminent-catalyst bet.

Polygon-sourced. The "unusual" badge fires when daily option volume exceeds 1.5× the 20-day mean — forward-only, doesn't backfill historical bars.

Polymarket vs Sentinel divergence

|Δ| ≥ 15 percentage points

For event-tied markets where we can compute a Sentinel probability (earnings beat, regulatory approval, named-event resolution), we display markets where the crowd's price diverges from our estimate by ≥15pp. Negative Δ = Sentinel is more bullish than the market; positive Δ = Sentinel is more bearish. Either side is a calibration-question, not necessarily an alpha-question.

Polymarket Gamma API for crowd price; Sentinel probability from the IC-weighted score mapped to a Brier-calibrated band.

Universal interpretation rule. No single panel is a buy or sell trigger. The report is built so that two or more independent panels firing in agreement is what you treat as signal — that's the entire point of the consensus gate, the FDR correction, and the multi-signal Bayesian aggregation. If only one panel says something unusual, treat it as a question, not an answer.

Stack

Boring, well-instrumented, Railway-hosted.

Backend

Python 3.12 · Flask 3 · Gunicorn (2 workers) · psycopg2 · tenacity retries on every external API

Orchestration

Dagster 1.9 (55+ scheduled jobs, UTC-only) · correlation every 30 min · scoring jobs 00:15–06:00 UTC · watermark UPSERT on completion

Database

Postgres on Neon · 50+ tables · parameterised SQL only · TIMESTAMPTZ everywhere · UNIQUE(company_id, date) on per-company daily tables

Frontend

Jinja2 templates · Tailwind CDN · Plotly.js · vanilla JS · Okabe-Ito colour palette (colour-blind safe)

LLMs

Anthropic Claude (Sonnet 4.6 primary, Haiku 4.5 extraction) · OpenAI GPT-4o-mini (fallback + adversarial skeptic) · ephemeral prompt caching · tracker spend capped $80/month

Delivery

SendGrid (daily email, 06:00 UTC) · Stripe (subscription billing + webhooks) · Sentry (error tracking, web + worker)

Hosting

Railway.app · two services (web + dagster-daemon) · ~$15/mo · ruff + pytest-cov on every PR

Honest limitations

What Sentinel is not.

Not investment advice. Not a recommendation engine. Not a short-selling tool. Not deanonymization. The tracker's "fishy score" is a statistical measure of pattern unusualness — not a claim of wrongdoing. We publish the methodology so readers can weigh the signal themselves. Data coverage is thin for forward-only sources (Polymarket / Bluesky / Kalshi only accumulate from the day the collector was deployed), and the adversarial-consensus narrative publishes only when arbiter+FDR gates agree — so many statements will show "null signal" rather than prose. A 90-day prediction-ledger calibration window is required before we promote any policy-tracker output to the daily email.

Get started

The live report is one click away.

Subscribers land on /report — the canonical live URL, always current. The dashboard at /dashboard is the command surface: system-status strip, 30-day anomaly chart, watchlist, priority-sorted signal feed. Pricing at /pricing; the 14-day trial covers up to 10 companies with full model access.

Start free trial · Open the live report →

Questions? william@sentinelofficial.co.uk