One official call per model
Official picks are not averaged with repeated stability runs or private smoke tests.
Public benchmark console
One-shot LLM market decisions, frozen before the deadline and scored after the fixed horizon. The public site tracks official model picks, market benchmark leaderboards, methodology, and audit hashes for reproducible AI market evaluation.
The site distinguishes collected decisions from scored performance. Picks and entry prices are public; realized returns remain absent until the exit date resolves.
The briefing, prompt, universe, and market context are hashed before any model call.
Four valid official one-shot picks are published.
Entry side is visible for selected options, S&P 500, and cash.
Performance unlocks after 2026-06-10 prices are collected and scored.
Round metadata, options, official submissions, and hashes are mirrored into Supabase for the site.
Pending pages show decisions and artifacts without implying realized alpha.
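The pre-call hashing step described above can be illustrated with a minimal sketch. It assumes SHA-256 over a canonical JSON encoding; the site does not state its hash algorithm or serialization, and the function names `artifact_hash` and `freeze_round` and the example field values are hypothetical.

```python
import hashlib
import json


def artifact_hash(artifact: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON encoding.

    Assumption: sorted keys and compact separators make the encoding
    stable, so the same content always yields the same digest.
    """
    canonical = json.dumps(
        artifact, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def freeze_round(briefing: dict, prompt: dict, universe: dict,
                 market_context: dict) -> dict:
    """Hash each input artifact before any model is called (hypothetical helper)."""
    return {
        "briefing": artifact_hash(briefing),
        "prompt": artifact_hash(prompt),
        "universe": artifact_hash(universe),
        "market_context": artifact_hash(market_context),
    }


# Illustrative placeholder inputs, not the site's real artifacts.
hashes = freeze_round(
    briefing={"round": 1},
    prompt={"text": "Pick one option."},
    universe={"options": ["SEMICONDUCTORS", "S&P 500", "CASH"]},
    market_context={"as_of": "placeholder"},
)
```

Anyone holding the original artifacts can recompute the digests and compare them with the published ones; a single changed character in any artifact produces a different digest.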
All four official Round 1 models selected the same option. Performance remains unpublished while the round is pending.
| Model | Model ID | Provider | Pick | Confidence | Status |
|---|---|---|---|---|---|
| Claude Opus 4.7 | anthropic-claude-opus-4-7 | Anthropic | SEMICONDUCTORS | 0.58 | Pending |
| Gemini 3.1 Pro | google-gemini-3-1-pro | Google | SEMICONDUCTORS | 0.60 | Pending |
| GPT-5.5 | openai-gpt-5-5 | OpenAI | SEMICONDUCTORS | 0.34 | Pending |
| Grok 4.3 | xai-grok-4-3 | xAI | SEMICONDUCTORS | 0.55 | Pending |

Rationale (Claude Opus 4.7): Semiconductors show dominant momentum with AI capex tailwinds, while credit and volatility conditions remain benign.
The official leaderboard appears here after exit prices are available and the run is scored.
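The gating between collected decisions and scored performance might look like the sketch below. It assumes a simple price return and an excess return over the S&P 500; the site's actual scoring formula is not given here, and `realized_return` and `excess_return` are hypothetical names. Only the exit date 2026-06-10 comes from the page.

```python
from datetime import date
from typing import Optional

EXIT_DATE = date(2026, 6, 10)  # exit date stated on the page


def realized_return(entry_price: float, exit_price: Optional[float],
                    today: date) -> Optional[float]:
    """Return the simple price return, or None while the round is pending.

    Assumption: a round stays pending until the exit date has arrived
    and an exit price has been collected.
    """
    if today < EXIT_DATE or exit_price is None:
        return None
    return exit_price / entry_price - 1.0


def excess_return(pick: Optional[float],
                  benchmark: Optional[float]) -> Optional[float]:
    """Pick return minus benchmark return (assumed S&P 500), None if either is pending."""
    if pick is None or benchmark is None:
        return None
    return pick - benchmark
```

Returning `None` rather than a placeholder number keeps a pending page from displaying anything that could be mistaken for realized performance.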