Methodology

How four AIs reach monthly ETF consensus through structured debate

Full disclosure of agent roles, round structure, weight calibration, and data sources. The same inputs must yield the same conclusion; that reproducibility is our single source of trust.

1. The Four AIs · Role Definitions

🌍

Macro Analyst

Aggregates 13 macro indicators (ISM PMI, unemployment, yield curve, HY spread, etc.) into a regime score, updated monthly on a +1.7 / -1.7 scale.

📈

Momentum Tracker

Monitors the 3-month and 12-month moving averages and the RSI of 12 US sector ETFs. Issues an auto-trim recommendation on overheating signals (e.g., 17 consecutive up-days).
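As a minimal sketch of the overheating signal, the up-day streak can be counted from a list of daily closes. The function names and the idea of treating 17 as a configurable threshold are assumptions for illustration; the page only gives 17 consecutive up-days as an example signal.

```python
from typing import Sequence


def consecutive_up_days(closes: Sequence[float]) -> int:
    """Count how many of the most recent sessions closed above the prior close."""
    streak = 0
    # Walk backwards from the latest close until the streak breaks.
    for i in range(len(closes) - 1, 0, -1):
        if closes[i] > closes[i - 1]:
            streak += 1
        else:
            break
    return streak


def is_overheated(closes: Sequence[float], max_streak: int = 17) -> bool:
    """Flag an ETF once its current up-day streak reaches the threshold.

    max_streak=17 mirrors the example on this page; it is an assumed default.
    """
    return consecutive_up_days(closes) >= max_streak


# Illustrative closes, not real market data.
closes = [100.0, 99.0, 100.5, 101.0, 102.3]
print(consecutive_up_days(closes))  # 3
print(is_overheated(closes))        # False
```

A flagged ETF would then feed the auto-trim recommendation; the moving-average and RSI checks are left out of this sketch.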

🛡️

Risk Manager

Tracks the VIX, HY spread, and Fed dissent count. Enforces a defensive allocation (XLV / ITA) when a volatility hedge is needed.

🔍

Verifier

Scores the other 3 AIs on 3-month accuracy and historical miss frequency. AIs below 70% accuracy are automatically downweighted.

2. 5-Round Debate Structure

Round 1 · Performance Review

What happens: Decompose each ETF's actual return vs prediction. Label as hit / sideways / miss.

Output example: SOXX +12.3% strong hit / SMH cut, then +9~10% rally: partial miss.
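The hit / sideways / miss labeling could be sketched as below. The page does not publish its thresholds, so the 2-point sideways band and the direction-only comparison are assumptions, and `label_outcome` is a hypothetical name.

```python
def label_outcome(predicted_pct: float, actual_pct: float,
                  sideways_band: float = 2.0) -> str:
    """Label a prediction against the realized return.

    Assumptions (not from the methodology page):
    - a realized move smaller than `sideways_band` percent counts as sideways;
    - otherwise only the direction of the call is compared.
    """
    if abs(actual_pct) < sideways_band:
        return "sideways"
    same_direction = (predicted_pct > 0) == (actual_pct > 0)
    return "hit" if same_direction else "miss"


# Illustrative inputs only.
print(label_outcome(8.0, 12.3))   # hit
print(label_outcome(-5.0, 9.5))   # miss
print(label_outcome(3.0, 0.8))    # sideways
```

Finer grades such as "strong hit" or "partial miss" would need extra bands on top of this three-way split.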

Round 2 · Macro Inflection Check

What happens: The Macro Analyst presents the 13-indicator score and its month-over-month change. The other 3 AIs challenge or reinforce it.

Output example: aggregate -1.7pt, no regime shift → late-cycle maintained.

Round 3 · Sector Momentum Mapping

What happens: Map current momentum score + up-day count to 12 sector ETFs. Identify overheated / cheap candidates.

Output example: semis 17 days up → no new chase buys.

Round 4 · Weight Consensus

What happens: Each AI proposes its desired weights. Where proposals differ by more than 3pp, the AI with the weaker argument concedes. The outcome is decided by strength of logic, not by majority vote.

Output example: SOXX 40 → 38% (2pp preemptive trim).
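One way to find which ETFs need a Round 4 debate is to measure the spread between the highest and lowest proposal per ticker. This is a sketch under that assumption: the page does not say whether the 3pp gap is measured between pairs of AIs or across all four, and the proposal numbers below are purely illustrative.

```python
from typing import Dict, List

# AI name -> {ETF ticker -> proposed weight in percent}
Proposals = Dict[str, Dict[str, float]]


def contested_etfs(proposals: Proposals, max_gap_pp: float = 3.0) -> List[str]:
    """Return tickers whose proposed weights differ by more than max_gap_pp."""
    tickers = sorted({t for weights in proposals.values() for t in weights})
    contested = []
    for ticker in tickers:
        # An AI that omits a ticker is treated as proposing 0% (assumption).
        values = [weights.get(ticker, 0.0) for weights in proposals.values()]
        if max(values) - min(values) > max_gap_pp:
            contested.append(ticker)
    return contested


proposals = {
    "macro":    {"SOXX": 40.0, "XLV": 20.0},
    "momentum": {"SOXX": 38.0, "XLV": 21.0},
    "risk":     {"SOXX": 35.0, "XLV": 22.0},
}
print(contested_etfs(proposals))  # ['SOXX']
```

Deciding which argument is weaker is a judgment made in the debate itself, so the sketch stops at flagging the disagreement.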

Round 5 · Verification & New Rules

What happens: Verifier codifies future check criteria for this month's decision. Adds to rulebook.

Output example: new rule '+15% over → auto-trim 5pp'.
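A codified rule of this kind might be stored as a small record that can be replayed against later months. The `TrimRule` class and its field names are hypothetical, and the page does not state the window over which the +15% return is measured, so the sketch simply takes a return figure as input.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrimRule:
    """A rulebook entry: trim an ETF's weight once its return passes a trigger."""
    name: str
    trigger_return_pct: float  # return above which the rule fires
    trim_pp: float             # percentage points removed from the weight

    def apply(self, weight_pct: float, return_pct: float) -> float:
        """Return the weight after the rule; unchanged if the trigger is not hit."""
        if return_pct > self.trigger_return_pct:
            return max(weight_pct - self.trim_pp, 0.0)
        return weight_pct


rulebook: List[TrimRule] = []
rulebook.append(TrimRule(name="+15% over -> auto-trim 5pp",
                         trigger_return_pct=15.0, trim_pp=5.0))

# Illustrative weights and returns only.
print(rulebook[0].apply(weight_pct=40.0, return_pct=16.2))  # 35.0
print(rulebook[0].apply(weight_pct=40.0, return_pct=12.3))  # 40.0
```

Keeping rules as data rather than prose makes it possible to rerun the same rulebook on the same inputs and get the same result.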

3. Weight Calibration

Base weights: 4 AIs equal (25% each)
3-month accuracy adjustment: Verifier scores monthly → below 70% = -5pp, above 90% = +5pp
Sector constraints: single ETF ≤ 50%, single sector ≤ 60%
Monthly weight change: within ±3pp is treated as the statistically justified range; anything beyond that requires extra evidence.
Auto-trim: a gain above the +15% threshold triggers a 5pp trim (rule added 2026-05)
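The calibration rules above can be sketched in a few lines. Two things here are assumptions rather than stated method: unscored AIs (the Verifier itself) keep the base weight, and the adjusted weights are rescaled so they sum to 100%. All function names and sample numbers are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def calibrate_ai_weights(accuracy: Dict[str, Optional[float]],
                         base_pct: float = 25.0,
                         step_pp: float = 5.0) -> Dict[str, float]:
    """Apply the -5pp / +5pp accuracy adjustment to equal base weights."""
    adjusted = {}
    for ai, acc in accuracy.items():
        weight = base_pct
        if acc is not None:          # None = not scored, keeps base weight
            if acc < 0.70:
                weight -= step_pp
            elif acc > 0.90:
                weight += step_pp
        adjusted[ai] = weight
    total = sum(adjusted.values())
    # Assumption: rescale so the four weights still sum to 100%.
    return {ai: round(w * 100.0 / total, 2) for ai, w in adjusted.items()}


def constraint_violations(etf_weights: Dict[str, float],
                          sector_of: Dict[str, str],
                          max_etf_pct: float = 50.0,
                          max_sector_pct: float = 60.0) -> List[str]:
    """List breaches of the single-ETF and single-sector caps."""
    problems = []
    sector_totals: Dict[str, float] = defaultdict(float)
    for ticker, weight in etf_weights.items():
        if weight > max_etf_pct:
            problems.append(f"{ticker} exceeds single-ETF cap")
        sector_totals[sector_of[ticker]] += weight
    for sector, total in sector_totals.items():
        if total > max_sector_pct:
            problems.append(f"{sector} exceeds single-sector cap")
    return problems


ai_weights = calibrate_ai_weights(
    {"macro": 0.92, "momentum": 0.65, "risk": 0.80, "verifier": None})
print(ai_weights)

violations = constraint_violations(
    {"SOXX": 38.0, "SMH": 25.0, "XLV": 37.0},
    {"SOXX": "semis", "SMH": "semis", "XLV": "health"})
print(violations)  # ['semis exceeds single-sector cap']
```

In this sample the +5pp and -5pp adjustments cancel, so the rescaling step leaves the weights at 30 / 20 / 25 / 25.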

4. Data Sources · License

Indicator	Source	License / Cadence	Series / Access
ISM PMI	Institute for Supply Management	Public release (monthly)	FRED MANEMP series
Unemployment	BLS · U.S. Bureau of Labor Statistics	Public data (monthly)	FRED UNRATE series
Yield Curve	FRED · Federal Reserve Bank of St. Louis	Public data (FRED)	FRED T10Y2Y series
Fed Dissent	FOMC Statements + Bloomberg/Reuters	Public + media citation	Manual entry with citation
VIX	CBOE (via Yahoo Finance)	Index data (daily close)	yfinance ^VIX
HY Spread	ICE BofA US High Yield Master II OAS	Public data (FRED)	FRED BAMLH0A0HYM2
ETF Prices	iShares / State Street / Invesco	Public data (daily)	yfinance + Supabase etf_prices

⚠ All sources are public data. We only re-process and re-interpret. Data accuracy is the source institution's responsibility.

5. Limits and publishing our misses

This methodology is a simulation based on historical patterns and does not guarantee future returns. AI consensus has its own *collective bias*: all four can misread the same data the same way.

When misses occur, we log them in the Missed Museum with 4 columns: decision basis → actual result → what we missed → rule added. Reproducible reasoning, not raw accuracy, is our asset.
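A Missed Museum entry could be modeled as a four-field record matching those columns. The class name, field names, and sample values are hypothetical; the sample strings are placeholders, not an actual logged miss.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class MissEntry:
    """One Missed Museum row, in the column order used on this page."""
    decision_basis: str
    actual_result: str
    what_we_missed: str
    rule_added: str

    def as_row(self) -> str:
        """Render the entry as a single arrow-separated line."""
        return " → ".join(asdict(self).values())


# Placeholder values for illustration only.
entry = MissEntry(
    decision_basis="<why the call was made>",
    actual_result="<what the ETF actually did>",
    what_we_missed="<the signal that was misread>",
    rule_added="<new rulebook entry, if any>",
)
print(entry.as_row())
```

Freezing the record keeps logged misses immutable once written, which fits the goal of reproducible reasoning.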
