How four AIs reach monthly ETF consensus through structured debate
Full disclosure of agent roles, round structure, weight calibration, and data sources. The same inputs must yield the same conclusion; that reproducibility is our single source of trust.
1. The Four AIs · Role Definitions
Macro Analyst
Aggregates 13 macro indicators (ISM PMI, unemployment, yield curve, HY spread, etc.) into a regime score, updated monthly on a +1.7 / -1.7 scale.
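The methodology does not publish the aggregation formula. A minimal sketch, assuming each indicator is turned into a signed z-score against its own history, the z-scores are averaged, and the result is clipped to the ±1.7 range. The direction flags, the z-score normalization, and the name `regime_score` are illustrative assumptions.

```python
from statistics import mean

# Assumption: the "+1.7 / -1.7 scale" is read as a clip to [-1.7, +1.7].
SCALE_LIMIT = 1.7


def regime_score(indicators: dict[str, tuple[float, float, float, int]]) -> float:
    """Hypothetical regime score.

    Each entry maps an indicator name to (current value, historical mean,
    historical std, direction), where direction is +1 if a higher value is
    risk-on (e.g. ISM PMI) and -1 if a higher value is risk-off (e.g. HY spread).
    """
    if not indicators:
        raise ValueError("at least one indicator is required")
    signed_z = []
    for name, (value, hist_mean, hist_std, direction) in indicators.items():
        if hist_std <= 0:
            raise ValueError(f"{name}: historical std must be positive")
        signed_z.append(direction * (value - hist_mean) / hist_std)
    # Average the signed z-scores, then clip to the published scale.
    return max(-SCALE_LIMIT, min(SCALE_LIMIT, mean(signed_z)))
```

With three illustrative inputs, `regime_score({"ISM PMI": (48.0, 52.0, 4.0, 1), "Unemployment": (4.2, 5.0, 1.6, -1), "HY spread": (3.5, 5.0, 1.5, -1)})` averages signed z-scores of about -1.0, +0.5 and +1.0 to roughly +0.17 (values chosen for the example only).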
Momentum Tracker
Monitors the 3-month and 12-month moving averages plus the RSI of 12 US sector ETFs, and issues an auto-trim recommendation on overheating signals (e.g., 17 consecutive up-days).
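A rough sketch of the Momentum Tracker's checks, assuming daily closes for the RSI and the up-day streak and monthly closes for the 3- and 12-month averages. The 17-day streak comes from the example above; the 14-period RSI with simple averaging and the RSI ≥ 70 overheating cutoff are conventional choices assumed here, and all function names are hypothetical.

```python
def simple_moving_average(closes: list[float], window: int) -> float:
    """Mean of the last `window` closes."""
    if len(closes) < window:
        raise ValueError("not enough closes for this window")
    return sum(closes[-window:]) / window


def above_trend(monthly_closes: list[float]) -> bool:
    """True if the latest monthly close is above both its 3- and 12-month averages."""
    last = monthly_closes[-1]
    return (last > simple_moving_average(monthly_closes, 3)
            and last > simple_moving_average(monthly_closes, 12))


def rsi(daily_closes: list[float], period: int = 14) -> float:
    """RSI from simple average gains/losses over the last `period` changes (not Wilder smoothing)."""
    if len(daily_closes) < period + 1:
        raise ValueError("need at least period + 1 closes")
    window = daily_closes[-(period + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if gains == 0 and losses == 0:
        return 50.0  # flat window: neutral reading
    if losses == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + gains / losses)


def consecutive_up_days(daily_closes: list[float]) -> int:
    """Length of the current streak of closes above the previous close."""
    streak = 0
    for i in range(len(daily_closes) - 1, 0, -1):
        if daily_closes[i] > daily_closes[i - 1]:
            streak += 1
        else:
            break
    return streak


def is_overheated(daily_closes: list[float], streak_limit: int = 17,
                  rsi_limit: float = 70.0) -> bool:
    """Hypothetical overheating flag that would trigger an auto-trim recommendation."""
    return consecutive_up_days(daily_closes) >= streak_limit or rsi(daily_closes) >= rsi_limit
```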
Risk Manager
Tracks the VIX, the HY spread, and the Fed dissent count, and enforces a defensive allocation (XLV / ITA) when a volatility hedge is needed.
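The trigger logic is not published; one possible shape is a vote across the three tracked signals. Every threshold below, including the two-of-three rule, is a hypothetical placeholder rather than the calibrated value, and `needs_volatility_hedge` is an illustrative name.

```python
from dataclasses import dataclass

DEFENSIVE_ETFS = ("XLV", "ITA")  # defensive sleeve named in the methodology


@dataclass(frozen=True)
class RiskThresholds:
    # Hypothetical placeholder thresholds, not the calibrated values.
    vix: float = 25.0            # VIX level
    hy_spread_pct: float = 5.0   # HY option-adjusted spread, in percent
    fed_dissents: int = 2        # dissenting votes at the latest FOMC meeting
    min_signals: int = 2         # how many signals must fire to enforce the hedge


def needs_volatility_hedge(vix: float, hy_spread_pct: float, fed_dissents: int,
                           limits: RiskThresholds = RiskThresholds()) -> bool:
    """True when enough risk signals fire to enforce the defensive sleeve."""
    signals = [
        vix >= limits.vix,
        hy_spread_pct >= limits.hy_spread_pct,
        fed_dissents >= limits.fed_dissents,
    ]
    return sum(signals) >= limits.min_signals
```

When it returns `True`, the Risk Manager would shift weight toward `DEFENSIVE_ETFS`; the size of that shift is not specified in the methodology.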
Verifier
Scores the other 3 AIs on 3-month accuracy and historical miss frequency. Any AI below 70% accuracy is automatically downweighted.
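A minimal sketch of the Verifier's scorecard, assuming each AI's calls over the trailing 3 months have already been labeled hit / sideways / miss. How a sideways call counts toward accuracy is not stated; half credit is an assumption here, as is the name `verifier_report`.

```python
# Assumed credit per label; the methodology does not say how "sideways" is scored.
LABEL_CREDIT = {"hit": 1.0, "sideways": 0.5, "miss": 0.0}
DOWNWEIGHT_BELOW = 0.70  # 70% accuracy threshold from the methodology


def verifier_report(labels: list[str]) -> dict:
    """Score one AI's trailing 3-month calls; `labels` holds one label per call."""
    if not labels:
        raise ValueError("no calls to score")
    unknown = set(labels) - set(LABEL_CREDIT)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    accuracy = sum(LABEL_CREDIT[label] for label in labels) / len(labels)
    miss_rate = labels.count("miss") / len(labels)
    return {
        "accuracy": accuracy,
        "miss_rate": miss_rate,
        "downweight": accuracy < DOWNWEIGHT_BELOW,
    }
```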
2. 5-Round Debate Structure
Round 1 · Last month's review
What happens: Decompose each ETF's actual return against the prediction and label it hit / sideways / miss.
Output example: SOXX +12.3%, strong hit / SMH was cut, then rallied +9~10%: partial miss.
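The labeling step can be sketched as a direction check with a dead band. Here a call is a direction (+1 for adding or holding, -1 for cutting), and a move smaller than the band counts as sideways. The ±1% band and the name `label_outcome` are assumptions, and the sketch does not grade strong hits or partial misses.

```python
def label_outcome(direction: int, actual_return_pct: float, band_pct: float = 1.0) -> str:
    """Label a monthly call as hit / sideways / miss.

    direction: +1 if the call expected the ETF to rise (add or hold),
    -1 if it expected a decline (cut).
    band_pct: hypothetical dead band treated as "sideways".
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    if abs(actual_return_pct) < band_pct:
        return "sideways"
    # Hit when the realized move has the same sign as the call.
    return "hit" if (actual_return_pct > 0) == (direction > 0) else "miss"
```

`label_outcome(1, 12.3)` returns `"hit"`, while a cut followed by a +9.5% rally, `label_outcome(-1, 9.5)`, returns `"miss"`.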
Round 2 · Macro regime
What happens: The Macro AI presents its 13-indicator score and the month-over-month change; the other 3 AIs challenge or reinforce it.
Output example: aggregate -1.7pt, no regime shift → late-cycle view maintained.
Round 3 · Momentum map
What happens: Map the current momentum score and up-day count onto the 12 sector ETFs, and identify overheated / cheap candidates.
Output example: semis up 17 consecutive days → no new chase buys.
Round 4 · Weight negotiation
What happens: Each AI proposes its desired weights. If proposals differ by more than 3pp, the AI with the weaker argument concedes. Decisions follow strength of reasoning, not majority vote.
Output example: SOXX 40% → 38% (2pp preemptive trim).
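The concession rule leaves some details open. One reading, sketched below: while the remaining proposals for an ETF span more than 3pp, the AI with the weakest argument drops out, and the survivors' proposals are averaged. The numeric argument-strength scores, the averaging step, and `resolve_weight` are all assumptions; the methodology does not say how argument strength is measured.

```python
def resolve_weight(proposals: dict[str, float], strength: dict[str, float],
                   max_gap_pp: float = 3.0) -> float:
    """Resolve one ETF's target weight (%) from each AI's proposal.

    proposals: AI name -> proposed weight in percent.
    strength: AI name -> hypothetical argument-strength score (higher = stronger).
    """
    if not proposals:
        raise ValueError("no proposals")
    if set(proposals) != set(strength):
        raise ValueError("every proposer needs an argument-strength score")
    remaining = dict(proposals)
    # The weaker argument concedes until the remaining proposals agree within max_gap_pp.
    while len(remaining) > 1 and max(remaining.values()) - min(remaining.values()) > max_gap_pp:
        weakest = min(remaining, key=lambda agent: strength[agent])
        del remaining[weakest]
    return sum(remaining.values()) / len(remaining)
```

Because proposals are removed by argument strength rather than counted, two weak AIs that agree cannot outvote one strong dissenter once the gap exceeds 3pp.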
Round 5 · Rule codification
What happens: The Verifier codifies the criteria for checking this month's decision in future months and adds them to the rulebook.
Output example: new rule "gain above +15% → auto-trim 5pp".
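One way to keep codified rules checkable is to store each rule as data with its trigger and action. The sketch encodes the example rule; measuring the gain since the last rebalance, the `Rule` fields, and `apply_rulebook` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    added: str                          # month the rule entered the rulebook
    triggers: Callable[[float], bool]   # predicate on the ETF's gain in percent
    trim_pp: float                      # weight cut in percentage points when triggered


# Assumption: "gain" is measured since the last rebalance.
RULEBOOK = [
    Rule(name="auto-trim above +15%", added="2026-05",
         triggers=lambda gain_pct: gain_pct > 15.0, trim_pp=5.0),
]


def apply_rulebook(weight_pct: float, gain_pct: float,
                   rulebook: list[Rule] = RULEBOOK) -> tuple[float, list[str]]:
    """Apply every triggered rule in order; return the new weight and the rules that fired."""
    fired = []
    for rule in rulebook:
        if rule.triggers(gain_pct):
            weight_pct = max(0.0, weight_pct - rule.trim_pp)
            fired.append(rule.name)
    return weight_pct, fired
```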
3. Weight Calibration
- Base weights: 4 AIs equal (25% each)
- 3-month accuracy adjustment: Verifier scores monthly → below 70% = -5pp, above 90% = +5pp
- Concentration limits: single ETF ≤ 50%, single sector ≤ 60%
- Monthly weight changes within ±3pp are treated as the statistically justified range; larger changes require additional evidence.
- Auto-trim: an ETF that gains more than +15% is trimmed by 5pp (rule added 2026-05)
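The calibration bullets can be sketched in two parts: per-AI weights adjusted by 3-month accuracy, and a constraint check on a proposed ETF allocation. Renormalizing the AI weights back to 100% after the ±5pp adjustment is an assumption (the methodology does not say), and `agent_weights` / `check_allocation` are hypothetical names.

```python
BASE_WEIGHT = 0.25     # 4 AIs, equal base weight
ACCURACY_STEP = 0.05   # ±5pp adjustment


def agent_weights(accuracy: dict[str, float]) -> dict[str, float]:
    """Adjust each AI's base weight by its 3-month accuracy (0..1), then renormalize (assumed)."""
    raw = {}
    for agent, acc in accuracy.items():
        weight = BASE_WEIGHT
        if acc < 0.70:
            weight -= ACCURACY_STEP
        elif acc > 0.90:
            weight += ACCURACY_STEP
        raw[agent] = weight
    total = sum(raw.values())
    return {agent: weight / total for agent, weight in raw.items()}


def check_allocation(target: dict[str, float], previous: dict[str, float],
                     sector_of: dict[str, str], max_etf: float = 50.0,
                     max_sector: float = 60.0, max_change_pp: float = 3.0) -> list[str]:
    """List constraint violations for a proposed allocation in percent (empty list = passes)."""
    problems = []
    for etf in sorted(set(target) | set(previous)):
        weight = target.get(etf, 0.0)
        if weight > max_etf:
            problems.append(f"{etf}: {weight:.1f}% exceeds the {max_etf:.0f}% single-ETF cap")
        change = weight - previous.get(etf, 0.0)
        if abs(change) > max_change_pp:
            problems.append(f"{etf}: {change:+.1f}pp change needs extra evidence")
    sector_totals = {}
    for etf, weight in target.items():
        sector_totals[sector_of[etf]] = sector_totals.get(sector_of[etf], 0.0) + weight
    for sector, total in sorted(sector_totals.items()):
        if total > max_sector:
            problems.append(f"{sector}: {total:.1f}% exceeds the {max_sector:.0f}% single-sector cap")
    return problems
```

The ±3pp check flags a move rather than blocking it, matching the rule that larger changes need extra evidence instead of being forbidden outright.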
4. Data Sources · License
| Indicator | Source | License / Cadence | Access |
|---|---|---|---|
| ISM PMI | Institute for Supply Management | Public release (monthly) | FRED MANEMP series |
| Unemployment | BLS (U.S. Bureau of Labor Statistics) | Public data (monthly) | FRED UNRATE series |
| Yield Curve | FRED (Federal Reserve Bank of St. Louis) | Public data (FRED) | FRED T10Y2Y series |
| Fed Dissent | FOMC statements + Bloomberg/Reuters | Public + media citation | Manual entry with citation |
| VIX | CBOE (via Yahoo Finance) | Index data (daily close) | yfinance ^VIX |
| HY Spread | ICE BofA US High Yield Master II OAS | Public data (FRED) | FRED BAMLH0A0HYM2 |
| ETF Prices | iShares / State Street / Invesco | Public data (daily) | yfinance + Supabase etf_prices |
⚠ All sources are public data. We only reprocess and reinterpret them; data accuracy remains the responsibility of the source institution.
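For the FRED-hosted series in the table, a stdlib-only fetch could look like the sketch below. It assumes FRED's public `fredgraph.csv` download URL and its two-column date/value layout, with missing observations written as `.` or left blank. The function names are hypothetical, and VIX and ETF prices (yfinance, Supabase) are not covered.

```python
import csv
import io
import urllib.request

# Assumed public download endpoint for a single FRED series as CSV.
FRED_CSV_URL = "https://fred.stlouisfed.org/graph/fredgraph.csv?id={series_id}"


def parse_fred_csv(text: str) -> list[tuple[str, float]]:
    """Parse a two-column (date, value) CSV, skipping missing values ('.' or blank)."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row; column names are not relied on
    rows = []
    for record in reader:
        if len(record) < 2:
            continue
        date, value = record[0], record[1].strip()
        if value in ("", "."):
            continue
        rows.append((date, float(value)))
    return rows


def fetch_fred_series(series_id: str, timeout: float = 30.0) -> list[tuple[str, float]]:
    """Download one series (e.g. "UNRATE", "T10Y2Y", "BAMLH0A0HYM2") and parse it."""
    url = FRED_CSV_URL.format(series_id=series_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_fred_csv(resp.read().decode("utf-8"))
```

For example, `fetch_fred_series("T10Y2Y")[-1]` would return the latest (date, value) pair for the 10-year minus 2-year Treasury spread.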
5. Limits and *publishing our misses*
This methodology is a simulation based on historical patterns and does not guarantee future returns. AI consensus carries its own *collective bias*: all four AIs can misread the same data in the same way.
When misses occur, we log them on the Missed Museum with 4 columns: decision basis → actual result → what we missed → rule added. Reproducible reasoning, not raw accuracy, is our asset.
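The 4-column log maps naturally onto a small record type. A sketch, assuming entries are rendered as Markdown table rows; `MissedMuseumEntry` and its field names are hypothetical, mirroring the four columns.

```python
from dataclasses import astuple, dataclass

MUSEUM_HEADER = "| Decision basis | Actual result | What we missed | Rule added |\n|---|---|---|---|"


@dataclass(frozen=True)
class MissedMuseumEntry:
    decision_basis: str
    actual_result: str
    what_we_missed: str
    rule_added: str

    def to_markdown_row(self) -> str:
        """Render the entry as one Markdown table row, escaping literal pipes."""
        cells = (cell.replace("|", "\\|") for cell in astuple(self))
        return "| " + " | ".join(cells) + " |"
```

Keeping `rule_added` as its own field makes it straightforward to cross-check each logged miss against the rulebook.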