The Resolution Problem
The finding: Every major prediction market controversy is a resolution failure. Not a bad prediction. Not market manipulation. Not a smart-money edge. Resolution.
Scott Alexander’s January 2026 survey of prediction market scandals (“Mantic Monday: The Monkey’s Paw Curls”) covers seven incidents. All seven are variations of the same problem: the real-world outcome was determined, but nobody could agree on what the market’s question actually asked.
“You just can’t specify every possible state of the world beforehand.” — Scott Alexander
The Case Study Archive
Zelensky Suit (Polymarket, 2025)
Market: Will Zelensky wear a business suit at a specific diplomatic meeting?
What happened: $100M+ in volume. Zelensky wore a suit — by most reasonable readings. The market resolved NO on a technicality about the exact definition of “business suit” per the market’s wording. Hundreds of users who correctly predicted the real-world outcome received nothing.
Why it matters: The market worked perfectly as an information aggregation tool. It failed as a financial product. The information was correct; the settlement was wrong.
Ukraine Minerals Deal (Polymarket, 2025)
Market: Will Ukraine sign a minerals deal with the US by [date]?
What happened: The UMA oracle came within two minutes of the deadline of resolving the market incorrectly. A concentrated position attempted to manipulate the oracle vote to flip the resolution.
Why it matters: Oracle centralization means resolution is a single point of failure. The entire notional value of the market was exposed to a vote-buying attack that cost a fraction of the market’s value.
Oscar Viewership (Polymarket, 2025)
Market: Will the Oscar ceremony achieve a specific viewership threshold?
What happened: The market resolved on preliminary NYT viewership numbers, which were subsequently revised upward. The preliminary numbers showed the market resolving NO; the final numbers showed YES. The resolution was technically correct per the market’s terms (preliminary, not final numbers) but informationally wrong.
Why it matters: Prediction markets are supposed to aggregate information. Resolving on preliminary data that gets corrected undermines the entire purpose — the “prediction” was right (the event exceeded the threshold) but the market said otherwise.
Venezuela Invasion (Polymarket, 2024)
Market: Will Venezuela invade Guyana by [date]?
What happened: Venezuelan armed forces moved into disputed territory. The UMA oracle debated for days over whether this constituted an “invasion” per the market’s definition. The ambiguity in the question wording created a $10M+ resolution dispute.
Why it matters: Geopolitical events rarely have clean edges. A question that seemed unambiguous at creation (“will there be an invasion?”) turned out to be deeply ambiguous at resolution.
Russian Town Capture via Fake Map (Manifold, 2025)
Market: Will Russian forces capture [specific town] by [date]?
What happened: A user submitted a manipulated ISW map as resolution evidence showing Russian capture of the town. The oracle (a community vote on Manifold) initially accepted the fake evidence. The resolution had to be reversed after the forgery was discovered.
Why it matters: Community oracle systems are vulnerable to misinformation submitted as resolution evidence. Human reviewers are not equipped to detect all forms of manipulation, especially in fast-moving conflict tracking markets.
The Structural Analysis
These five cases aren’t random failures. They represent three categories of resolution problem:
| Category | Example | Root cause |
|---|---|---|
| Specification ambiguity | Zelensky suit, Venezuela invasion | Real-world events don’t fit legal definitions. No question wording can anticipate every edge case. |
| Oracle manipulation | Ukraine minerals, Russian fake map | Resolution votes are cheaper to buy than the market is worth. |
| Data quality | Oscar viewership | Preliminary data sources used for resolution contain measurement error. |
Why Traditional Solutions Don’t Work
Better question wording: It is impossible to specify all edge cases in advance. The Venezuela market’s authors were not being careless — “invasion” is simply ambiguous when applied to real military movements.
Larger oracle / more voters: Adding voters doesn’t help if the manipulation cost scales proportionally. A larger oracle is still vulnerable — the attacker just needs to buy more votes.
Human expert panels: Slow, expensive, and still subject to ambiguous interpretation. Expert panels debate the same ambiguities lawyers debate.
The AI Resolution Argument
LLMs don’t solve prediction ambiguity — nothing does. But they offer two structural advantages over existing oracle designs:
1. Exhaustive edge case enumeration. An LLM can generate pages of interpretive edge cases for a proposed market question before the market opens. A human writer might anticipate 3–5 edge cases; an LLM might identify 50. Many resolution disputes could be prevented at market creation by stress-testing the question wording.
2. Consistent application of rulesets. Once a ruleset is established (“invasion requires crossing an internationally recognized border with armed forces in formation”), an LLM applies it consistently. Human oracles apply it inconsistently — different voters read the same ruleset differently.
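As a sketch of the pre-launch stress test described in point 1: the function below only assembles an audit prompt for an LLM. The prompt wording, the model call that would consume it, and the downstream review step are all illustrative assumptions, not an actual platform pipeline.

```python
def build_stress_test_prompt(question: str, n_cases: int = 50) -> str:
    """Build an LLM prompt asking for interpretive edge cases of a
    proposed market question. Hypothetical sketch: the real prompt
    and model are design choices, not documented here."""
    return (
        "You are auditing a prediction market question before launch.\n"
        f"Question: {question!r}\n"
        f"List {n_cases} distinct real-world scenarios where reasonable "
        "people could disagree on whether this question resolves YES or NO. "
        "For each scenario, name the ambiguous term and state both readings."
    )


prompt = build_stress_test_prompt("Will Venezuela invade Guyana by 2024-12-31?")
# The prompt would then go to an LLM; the ambiguous terms it surfaces
# ("invade", disputed-territory movements, proxy forces) get pinned
# down in the resolution rules before the market opens.
```

The output of such a pass is a checklist for the market author, not a resolution verdict — the goal is catching the Venezuela-style ambiguity before any money is at stake.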
Scott Alexander’s take (Jan 2026): “You can train them on good rulesets, and they’re tolerant enough of tedium to print out pages and pages of every possible edge case without going crazy.”
This is not a claim that AI resolution is perfect. It’s a claim that AI resolution is more consistent and more resistant to manipulation than vote-based oracles.
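The consistency argument can be made concrete: encode the ruleset as explicit predicates over structured evidence, and resolution becomes deterministic — the same facts always yield the same answer. The `Evidence` fields below are a hypothetical encoding of the invasion ruleset quoted above, not any market's actual rules.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    crossed_recognized_border: bool
    armed_forces_in_formation: bool


def resolves_yes(e: Evidence) -> bool:
    # Ruleset from the text: "invasion requires crossing an
    # internationally recognized border with armed forces in formation".
    # Both clauses must hold; a mechanical rule over structured
    # evidence cannot drift between voters.
    return e.crossed_recognized_border and e.armed_forces_in_formation


# Movement into *disputed* territory (the Venezuela case) fails the
# recognized-border clause, so this rule resolves NO, every time.
verdict = resolves_yes(Evidence(crossed_recognized_border=False,
                                armed_forces_in_formation=True))
```

The hard part, of course, is mapping messy reports onto those boolean fields — which is exactly where the LLM's consistent reading of the ruleset is claimed to beat a heterogeneous voter pool.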
The Oracle Risk Layer
Beyond LLM resolution, the oracle itself needs structural protection. The Ukraine minerals incident shows what happens when oracle integrity is weak:
UMA’s vulnerability: UMA’s oracle uses a governance token vote for disputed resolutions. The governance token’s market cap is small relative to the open interest of large prediction markets, so a well-capitalized attacker can buy enough governance tokens to flip a resolution. This attack has been documented.
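The attack economics can be sketched with hypothetical numbers (the $200M / $50M figures below are illustrative, not actual UMA or Polymarket data, and the model deliberately ignores slippage, token resale value, and reputational cost):

```python
def oracle_attack_profit(market_notional: float,
                         gov_token_mcap: float,
                         vote_share_needed: float = 0.51) -> float:
    """Rough profitability of a vote-buying attack: buy a majority of
    the governance tokens, flip the resolution, collect on the market.
    Simplified model with hypothetical parameters."""
    attack_cost = gov_token_mcap * vote_share_needed
    return market_notional - attack_cost


# If $200M of open interest settles via a token with a $50M market cap,
# a majority of votes costs ~$25.5M against a payoff bounded by the
# attacker's position in the market: positive expectancy at scale.
profit = oracle_attack_profit(market_notional=200e6, gov_token_mcap=50e6)
```

The inequality to watch is simply `gov_token_mcap * vote_share_needed < attacker_payoff` — a secure oracle keeps the left side larger than any position the market allows.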
The walk-away test (from Augur’s post-mortems): A robust oracle should survive if the development team disappears. It should not rely on trusted parties. It should be economically secured, not socially secured.
Chronomancy’s design principle: The AI resolution layer is a first-pass mechanism, not a trusted oracle. Disputed resolutions fall back to a staking-based appeals layer (FREEZE module) where the economic stakes are aligned — resolution stakers have capital at risk proportional to the market’s notional value.
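A minimal sketch of stake-scaled appeals, assuming a hypothetical stake ratio and reward multiplier (neither is a published Chronomancy parameter — the point is only that disputing a large market requires risking proportionally large capital, and wrong challenges are slashed):

```python
def required_appeal_stake(notional: float, ratio: float = 0.10) -> float:
    """Stake required to dispute a first-pass AI resolution.
    The 10% ratio is a hypothetical parameter: stake scales with
    the market's notional value, so flipping a big market means
    putting big capital at risk."""
    return notional * ratio


def settle_dispute(staker_correct: bool, stake: float) -> float:
    # Aligned incentives: a correct challenge is rewarded, an
    # incorrect one loses the full stake. (1.5x is also hypothetical.)
    return stake * 1.5 if staker_correct else 0.0
```

Under this shape, the Ukraine-minerals attack becomes unattractive: overturning a large market's resolution requires staking a fixed fraction of its notional, which is forfeited when the challenge fails.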
Conclusion
The resolution problem is prediction markets’ Achilles’ heel. The market mechanism (aggregating probability estimates from many participants) is mathematically sound and empirically useful. The execution mechanism (determining what happened in the real world and mapping it to a binary YES/NO) is broken in ways that can’t be fixed by writing better questions.
AI-based resolution doesn’t solve ambiguity. It provides:
- Better pre-resolution stress testing (catch ambiguities before markets open)
- More consistent interpretation of established rulesets
- Resistance to the social manipulation that plagues vote-based oracles
The platforms that get resolution right will be the ones that capture institutional capital — which currently sits on the sidelines specifically because resolution is unreliable.
Related:
- Freeze — dispute resolution and FREEZE mechanics
- Identity & Sybil Resistance — anti-manipulation at the oracle layer
- Anti-Induction — the deeper alignment problem in prediction markets
- Competitive Landscape — how competitors handle resolution