CapitalBench

The benchmark for AI capital allocation

AI models get the same market brief and choose their own portfolios. We track how those portfolios perform in the real market.

Compare the models against each other, see how they invest and take risk, and check their results against the real market.

Read the CapitalBench Manifesto

Model behavior patterns: How do AI models allocate differently?

6 models

GPT-5.6 Sol (OpenAI): Early sample. Highest risk-taking, Most consensus-aligned, Technology tilt.

GPT-5.5 (OpenAI): Aggressive upside hunter. Lowest turnover, Technology tilt, International tilt, Binary results.

Grok 4.5 (xAI): Early sample. Technology tilt, Risk 84.5/100, Top holding 33.3%.

Grok 4.3 (xAI): High-conviction concentrator. Binary results, Often different from peers, Risk 79.3/100, Top holding 37.5%.

Gemini 3.1 Pro (Google): High-conviction concentrator. Most concentrated, Most distinctive, Binary results, Often different from peers.

Claude Fable 5 (Anthropic): Balanced allocator. Balanced profile, Risk 73.4/100, Top holding 27.4%, Tech tilt 28.4%.

Benchmark results

Which models are performing best?

Monthly and weekly tracks stay separate. Higher benchmark score is better.

Claude Opus 4.8 (Anthropic) · 8/8 scored rounds: 0.5
Grok 4.3 (xAI) · 8/8 scored rounds: 0.2
Claude Opus 4.7 (Anthropic) · 8/8 scored rounds: -0.4
Gemini 3.1 Pro (Google) · 8/8 scored rounds: -9.9
GPT-5.5 (OpenAI) · 8/8 scored rounds: -15.6
S&P 500 · 8/8 scored rounds: -2.6
Max possible (hindsight ceiling, not a model portfolio): 100.0

8 shared resolved rounds · 5 equal-run models ranked · Qualified at 3+ shared rounds · Newest included round: CB-2026-06-09-1M

Return context

Average Return Details

Average portfolio return across the same finished rounds.

Claude Opus 4.8: 0.11%
Grok 4.3: 0.03%
Claude Opus 4.7: -0.08%
Gemini 3.1 Pro: -2.07%
GPT-5.5: -3.25%
S&P 500: -0.55%
Max possible: 20.81%

Fairness rule: every ranked model completed every included round. A missed round is excluded from this set for everyone.
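The fairness rule amounts to a set intersection over completed rounds. A minimal sketch, assuming each model's completed rounds are available as a set of round IDs; the function name, model names, and round IDs below are hypothetical, not part of the CapitalBench API:

```python
from typing import Dict, Set


def shared_rounds(completed: Dict[str, Set[str]]) -> Set[str]:
    """Return the rounds that every model in the roster completed."""
    round_sets = list(completed.values())
    if not round_sets:
        return set()
    # A round missed by any one model drops out for everyone.
    return set.intersection(*round_sets)


# Hypothetical roster and round IDs, for illustration only.
completed = {
    "model_a": {"r1", "r2", "r3", "r4"},
    "model_b": {"r1", "r2", "r4"},  # missed r3
    "model_c": {"r1", "r2", "r3", "r4"},
}

common = shared_rounds(completed)
# The page states a 3+ shared-round threshold for this comparison set.
qualified = len(common) >= 3
```

Here `r3` is excluded for all three models because `model_b` missed it, which leaves three shared rounds and meets the threshold.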

Monthly and weekly are separate comparison tracks. Scores are never mixed across horizons.

Latest official results

What happened in the latest scored rounds?

Finished monthly and weekly rounds scored against real market returns.

Monthly official result (1 of 11), scored Jul 9

Same-window returns, ranked after final prices.

Claude Opus 4.8 (Anthropic): 3.10%
GPT-5.5 (OpenAI): 2.76%
Claude Opus 4.7 (Anthropic): 2.60%
Grok 4.3 (xAI): 2.23%
Claude Fable 5 (Anthropic): 1.58%
Gemini 3.1 Pro (Google): 0.18%
S&P 500 (benchmark): 2.25%
Max possible (XBI): 25.16%

Portfolio context

Shows each model's saved portfolio weights.

Model portfolios

Ranked in the same order as the chart.

Claude Opus 4.8 (Anthropic): Healthcare (XLV) 30%, Dividend (SCHD) 20%, Value (IWD) 20%, Defense (ITA) 15%, T-Bills (BIL) 15%

GPT-5.5 (OpenAI): Healthcare (XLV) 30%, Energy (XLE) 25%, Financials (XLF) 20%, Defense (ITA) 15%, Equal-Weight S&P 500 (RSP) 10%

Claude Opus 4.7 (Anthropic): Healthcare (XLV) 30%, Dividend (SCHD) 25%, Low Vol (SPLV) 20%, Staples (XLP) 15%, Defense (ITA) 10%

Grok 4.3 (xAI): Healthcare (XLV) 40%, Energy (XLE) 30%, Defense (ITA) 30%

Claude Fable 5 (Anthropic): Healthcare (XLV) 30%, Energy (XLE) 25%, Low Vol (SPLV) 15%, Staples (XLP) 15%, Value (IWD) 15%

Gemini 3.1 Pro (Google): T-Bills (BIL) 30%, Healthcare (XLV) 25%, Gold (IAU) 20%, Energy (XLE) 15%, Staples (XLP) 10%

Reference points

Not model portfolios.

S&P 500 (benchmark): benchmark return over the same scoring window.

Max possible (XBI): 100% Biotechnology (XBI), the hindsight ceiling.

Official scored round

Monthly result scored Jul 9

Audit ID: CB-2026-06-09-1M

Scored: Jul 9 · Window: Jun 9 to Jul 9 · Models: 6 · Asset choices: 70 · Leader: Claude Opus 4.8 · Horizon: Monthly

AI positioning

What are AI models doing right now?

Live allocations before the next official score.

As of July 10, 2026:

Current risk appetite: 89.5/100 (Aggressive, growth-led risk seeking). Combined monthly + weekly pulse from CB-2026-07-10-1M and CB-2026-07-10-1W.

Consensus allocation: 38.8% Semiconductors (SMH), average live weight.

Risk shift: +7.4, change vs Jul 9 portfolios.

Model agreement: Tight, 3.0 point dispersion.

Unscored portfolios: 79.2/100. Separate read across every open portfolio before official scoring.

Largest current allocations

Semiconductors (SMH) 38.8%, Taiwan Equities (EWT) 16.6%, Financials Sector (XLF) 11.3%, Technology Sector (XLK) 10.3%, Cybersecurity (CIBR) 5.9%, Biotechnology (XBI) 4.4%

Regime mix

Growth and technology 62.2%, International equity 19.7%, Broad and cyclical equity 16.3%, Real assets and inflation 1.9%

Benchmark insights

What do the latest AI decisions suggest?

Fresh signals from positioning, risk, and scored results.

Current Positioning · As of Jul 10

Latest live portfolios · 2 live rounds · 16 models · Live portfolios

Live AI portfolios are concentrated in Semiconductors (SMH)

Across the newest live weekly and monthly portfolios, Semiconductors (SMH) is the largest aggregate allocation at +38.75%.

Aggregate allocation averages the newest live model portfolios before final scores are known.

Medium confidence · Math: deterministic · Data through Jul 10, 2026

Aggregate Live Allocation: +38.8%
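The aggregate allocation described above can be sketched as a per-asset average of portfolio weights. The live weights are not listed on this page, so the example below uses the six Jul 9 scored portfolios as sample input. Equal weighting per model and the function name `average_weights` are assumptions:

```python
from collections import defaultdict
from typing import Dict


def average_weights(portfolios: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average each asset's weight across portfolios (a missing asset counts as 0)."""
    if not portfolios:
        return {}
    totals: Dict[str, float] = defaultdict(float)
    for weights in portfolios.values():
        for ticker, weight in weights.items():
            totals[ticker] += weight
    count = len(portfolios)
    return {ticker: total / count for ticker, total in totals.items()}


# Jul 9 scored portfolios from this page, weights in percent.
portfolios = {
    "Claude Opus 4.8": {"XLV": 30, "SCHD": 20, "IWD": 20, "ITA": 15, "BIL": 15},
    "GPT-5.5": {"XLV": 30, "XLE": 25, "XLF": 20, "ITA": 15, "RSP": 10},
    "Claude Opus 4.7": {"XLV": 30, "SCHD": 25, "SPLV": 20, "XLP": 15, "ITA": 10},
    "Grok 4.3": {"XLV": 40, "XLE": 30, "ITA": 30},
    "Claude Fable 5": {"XLV": 30, "XLE": 25, "SPLV": 15, "XLP": 15, "IWD": 15},
    "Gemini 3.1 Pro": {"BIL": 30, "XLV": 25, "IAU": 20, "XLE": 15, "XLP": 10},
}

consensus = average_weights(portfolios)
largest = max(consensus, key=consensus.get)  # ticker with the largest average weight
```

On this sample, Healthcare (XLV) is the largest average allocation at 185/6, about 30.8%, and the averaged weights still sum to 100%.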

Market Environment · As of Jul 9

Weekly market environments · 12 resolved rounds · 2 models · Ready sample

Weekly model leadership changes with the S&P 500 environment

Claude Opus 4.8 leads down environments at -2.70% across 8 tests; GPT-5.5 leads up environments at +2.11% across 4 tests.

Market environments group resolved rounds by the S&P 500 return over the same weekly or monthly window. Models are compared only on shared rounds; high confidence requires at least six observations and stable leadership.

Medium confidence · Math: deterministic · Data through Jul 9, 2026

Down Leader Average Return: -2.70%
Down Shared Rounds: 8
Down Leader Stability: 0.88
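The environment grouping can be sketched as follows. This is an illustration under stated assumptions, not the CapitalBench implementation: the up/down split at a zero S&P 500 return, the record layout, and all returns below are hypothetical, and the stability metric is not reproduced because the page does not define it.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def leaders_by_environment(rounds: List[dict]) -> Dict[str, Tuple[str, float, int]]:
    """Group shared rounds by S&P 500 direction and find the best average return.

    Each round is {"sp500": float, "models": {name: return}} and is assumed
    to include every model in the roster (shared rounds only).
    Returns {environment: (leader, leader_average_return, round_count)}.
    """
    sums = {"up": defaultdict(float), "down": defaultdict(float)}
    counts = {"up": 0, "down": 0}
    for record in rounds:
        env = "up" if record["sp500"] >= 0 else "down"  # assumed split at zero
        counts[env] += 1
        for name, ret in record["models"].items():
            sums[env][name] += ret

    result = {}
    for env in ("up", "down"):
        if counts[env] == 0:
            continue
        averages = {name: total / counts[env] for name, total in sums[env].items()}
        leader = max(averages, key=averages.get)
        result[env] = (leader, averages[leader], counts[env])
    return result


# Hypothetical returns in percent, for illustration only.
rounds = [
    {"sp500": 1.2, "models": {"model_a": 0.5, "model_b": 2.0}},
    {"sp500": -2.0, "models": {"model_a": -1.0, "model_b": -3.0}},
    {"sp500": -1.0, "models": {"model_a": -0.5, "model_b": -2.0}},
]

leaders = leaders_by_environment(rounds)
```

A leader in a down environment can still have a negative average return, as in the insight above. It only needs to lose less than the other models on the same rounds.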

Risk Regime · As of Jul 10

Latest live portfolios · 2 live rounds · 16 models · Live portfolios

Live AI risk posture is aggressive

The newest live portfolios have a deterministic risk-taking score of 89.5 out of 100.

Risk-taking score is allocation-based, not performance-based: higher means more weight in growth, momentum, cyclical, and higher-risk assets.

Medium confidence · Math: deterministic · Data through Jul 10, 2026

Live Risk Taking Score: 89.5/100

Trust and proof

How does CapitalBench keep scores comparable?

Same brief, same choices, frozen portfolios, real prices.

Step 1: Same report. Every model reads the same market report.
Step 2: Same choices. Every model chooses from the same 70 assets.
Step 3: First portfolio locks. Each model's saved portfolio is frozen before results are known.
Step 4: Fixed wait window. The frozen portfolio sits untouched for 7 days or 1 month.
Step 5: Prices score it. Real ending prices decide which model did best.
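
Steps 3 to 5 can be sketched as a weighted return over a frozen portfolio. This is a minimal illustration assuming a buy-and-hold price return with no rebalancing, dividends, or fees; the page does not specify those details. The weights are Grok 4.3's Jul 9 portfolio from this page; the prices and the function name are hypothetical:

```python
from typing import Dict


def portfolio_return(
    weights: Dict[str, float],
    start_prices: Dict[str, float],
    end_prices: Dict[str, float],
) -> float:
    """Return of a frozen portfolio held untouched from start to end prices."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(
        weight * (end_prices[ticker] / start_prices[ticker] - 1.0)
        for ticker, weight in weights.items()
    )


# Weights frozen before the window opens (Grok 4.3, Jul 9 monthly round).
frozen = {"XLV": 0.40, "XLE": 0.30, "ITA": 0.30}

# Hypothetical start and end prices, for illustration only.
start = {"XLV": 100.0, "XLE": 50.0, "ITA": 200.0}
end = {"XLV": 102.0, "XLE": 51.0, "ITA": 206.0}

result = portfolio_return(frozen, start, end)  # 0.4*2% + 0.3*2% + 0.3*3% = 2.3%
```

Because the weights are fixed before any ending price is known, the only inputs that vary after the lock are the prices themselves.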

Evidence context

What evidence supports each score?

Equal-run rules, score scale, and audit context for monthly and weekly tracks.

Monthly benchmark: More established

Evidence level: More established. Monthly evidence has enough completed rounds for stronger pattern reads, while still needing ongoing live validation.

Monthly evidence: 11 resolved rounds / 53 model results. Current threshold met at 3+ rounds.

Equal-run comparison: 5 models on the same 8 rounds. Ranked models are compared only on rounds every model in the roster completed.

Protocol: Mixed protocol. Completed history includes 10 portfolio, 1 single-pick, and 0 unlabelled rounds.

Score scale: Oracle-relative. 100 means matching the hindsight best asset in the same scored window.

Baselines shown: S&P 500, Cash, Oracle, AI consensus portfolio. Practical references are shown beside the impossible hindsight ceiling when available.

Use this as benchmark evidence, not an investable strategy result. More resolved rounds are needed before making strong performance claims.

Weekly benchmark: More established

Evidence level: More established. Weekly evidence has enough completed rounds for stronger pattern reads, while still needing ongoing live validation.

Weekly evidence: 25 resolved rounds / 128 model results. Current threshold met at 6+ rounds.

Equal-run comparison: 5 models on the same 23 rounds. Ranked models are compared only on rounds every model in the roster completed.

Protocol: Portfolio-only. Completed rounds use constrained multi-asset portfolios.

Score scale: Oracle-relative. 100 means matching the hindsight best asset in the same scored window.

Baselines shown: S&P 500, Cash, Oracle, AI consensus portfolio. Practical references are shown beside the impossible hindsight ceiling when available.

Use this as benchmark evidence, not an investable strategy result. More resolved rounds are needed before making strong performance claims.
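
The oracle-relative scale can be illustrated with the monthly averages shown on this page. The formula below, 100 times the portfolio return divided by the hindsight best return, is an assumption: the page states only that 100 means matching the hindsight best asset. Whether the official score is computed per round and then averaged is also not stated, so treat this as a sketch rather than the scoring rule:

```python
def oracle_relative_score(portfolio_return: float, oracle_return: float) -> float:
    """Scale a return so that the hindsight best asset scores 100.

    Assumed formula (not confirmed by the page): 100 * portfolio / oracle.
    """
    if oracle_return <= 0:
        raise ValueError("oracle return must be positive for this scale")
    return 100.0 * portfolio_return / oracle_return


# Average monthly returns shown on this page, in percent.
average_returns = {
    "Claude Opus 4.8": 0.11,
    "Grok 4.3": 0.03,
    "Claude Opus 4.7": -0.08,
    "Gemini 3.1 Pro": -2.07,
    "GPT-5.5": -3.25,
    "S&P 500": -0.55,
}
MAX_POSSIBLE = 20.81  # average hindsight best return, in percent

scores = {
    name: round(oracle_relative_score(ret, MAX_POSSIBLE), 1)
    for name, ret in average_returns.items()
}
```

Under this assumption the sketch reproduces the displayed monthly scores for five of the six rows (0.5, -0.4, -9.9, -15.6, and -2.6 for the S&P 500). Grok 4.3 comes out at 0.1 rather than the displayed 0.2, which may reflect rounding in the displayed return or per-round averaging.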

Benchmark universe

What can models choose from?

The active roster, asset menu, horizons, and open rounds.

Models: 8

Asset choices: 70

Round lengths: 2

Open rounds: 25

Protocol: Single-turn, non-agentic calls

Audit packet

How can you verify the benchmark?

Round packets expose the report, prompt, portfolios, prices, hashes, and result status.

Monthly results scored: Jul 9, Jul 8, Jul 6, Jul 2, Jul 1, Jun 29 (two rounds), Jun 26, Jun 24, Jun 17, Jun 10.

Weekly results scored: Jul 9, Jul 8, Jul 7, Jul 6, Jul 2 (two rounds), Jul 1, Jun 30, Jun 29, Jun 25, Jun 24, Jun 23, Jun 22, Jun 18 (two rounds), Jun 16, Jun 15, Jun 12, Jun 9, Jun 8, Jun 5 (two rounds), Jun 4, Jun 2, May 29.