Market Noise

A single month can be dominated by random market movement, macro shocks, liquidity shifts, earnings surprises, sector-specific reversals, or events not represented in the model-facing briefing. A strong or weak round does not prove durable investing skill by itself.

Small Samples

Cumulative results become more informative as more resolved rounds are added. Early leaderboards should be read as benchmark observations, not as statistically settled rankings. Participation counts matter because models may enter the benchmark at different times, so two cumulative results can rest on different numbers of rounds.
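
As a rough illustration of why early leaderboards are noisy, the sketch below computes the standard error of a mean monthly return. It assumes rounds are independent and share a constant standard deviation, which real markets only approximate. The function name and the 4-percentage-point figure are illustrative assumptions, not CapitalBench values.

```python
import math


def standard_error(monthly_stdev: float, rounds: int) -> float:
    """Standard error of the mean monthly return after `rounds` rounds.

    Assumes independent rounds with a constant standard deviation.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    return monthly_stdev / math.sqrt(rounds)


# Assumed, illustrative monthly standard deviation in percentage points.
ASSUMED_MONTHLY_STDEV = 4.0

if __name__ == "__main__":
    for rounds in (1, 4, 16, 36):
        se = standard_error(ASSUMED_MONTHLY_STDEV, rounds)
        print(f"{rounds:>2} rounds: standard error ~ {se:.2f} points")
```

Under these assumptions the uncertainty shrinks only with the square root of the round count: quadrupling the number of resolved rounds halves the standard error. A model with fewer resolved rounds therefore carries a wider uncertainty band than one that has participated longer.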

Model And Provider Changes

Hosted model behavior can change over time even when a public model name appears stable. CapitalBench records round timing, run metadata, provider, model ID, and official run ID, but it cannot guarantee that a hosted model will behave identically in a later rerun.
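
The kind of record described above might be sketched as follows. The class, field names, and example values are hypothetical and do not reflect CapitalBench's actual schema. The sketch only illustrates that a matching provider and model ID identifies the label a run was made under, not the behavior behind it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical run metadata; field names are assumptions."""
    round_id: str
    started_at: datetime
    provider: str
    model_id: str
    official_run_id: str


def same_named_model(a: RunRecord, b: RunRecord) -> bool:
    """True when two runs share a provider and model ID.

    A match means the public label is the same. It does not show that
    the hosted model behaved identically in both runs.
    """
    return (a.provider, a.model_id) == (b.provider, b.model_id)


# Illustrative placeholder values only.
official = RunRecord(
    round_id="round-001",
    started_at=datetime(2025, 1, 2, tzinfo=timezone.utc),
    provider="example-provider",
    model_id="example-model",
    official_run_id="run-a",
)
rerun = RunRecord(
    round_id="round-001",
    started_at=datetime(2025, 6, 2, tzinfo=timezone.utc),
    provider="example-provider",
    model_id="example-model",
    official_run_id="run-b",
)
```

Here `same_named_model(official, rerun)` holds, yet the two records differ in timing and run ID, which is why a later rerun cannot stand in for the official run.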

Universe Limits

The universe of allocation options is deliberately constrained. It uses broad public exposures such as cash, ETFs, sectors, factors, bonds, commodities, and technology themes. It does not represent all tradable assets, single-name stock selection, options trading, leverage, transaction costs, taxes, or investor-specific constraints.

Prompt Sensitivity

LLM outputs can be sensitive to prompt wording, schema requirements, system instructions, model routing, and provider-side changes. CapitalBench reduces those degrees of freedom by freezing inputs and publishing audit hashes, but no single benchmark captures all useful model behavior.
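
A minimal sketch of how an audit hash over frozen inputs can work is shown below. It assumes the inputs are serialized as canonical JSON and hashed with SHA-256. The document does not specify CapitalBench's actual serialization or hash algorithm, and the function and field names here are hypothetical.

```python
import hashlib
import json


def audit_hash(frozen_inputs: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON serialization.

    Sorting keys and fixing separators makes the digest independent of
    dictionary ordering, so only a change in content changes the hash.
    """
    canonical = json.dumps(
        frozen_inputs,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative placeholder inputs; the field names are assumptions.
briefing = {
    "system_instructions": "Allocate across the listed options.",
    "schema_version": "1",
}
reworded = dict(briefing, system_instructions="Allocate among the listed options.")

if __name__ == "__main__":
    print(audit_hash(briefing))
    print(audit_hash(reworded))
```

Anyone holding the same frozen inputs can recompute the digest and compare it with the published value. Even a one-word change to the prompt yields a different hash, which makes silent edits to the inputs detectable.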

Not Financial Advice

Benchmark results should not be used as trading signals or portfolio allocation advice. The public pages are research artifacts for comparing model decisions under a strict protocol.