Your script already has the tools to improve this ...

Question

Your script already has the tools to improve this without adding a new gate; you just need to reuse the same signals you’re using for totals inside the sets engine, for all tours.[1][2]

Here are the minimal tweaks in line with what the code already does.

1. Use existing tour‑calibrated set priors more strongly

You already have per‑tour P(3‑set) priors in SETSMARKETCFG.TOUR and setsMarketPriorBo3 / setsHoldPriorBo3.[1]

To improve real‑world 2‑vs‑3 accuracy:

Slightly increase the weight of those priors in the sets blend, especially on WTA, CH, ITF.
That means: in buildSetsMarketEngine, where you compute p3Raw, p3Market, p3Hold, and then p3Final via SETSMARKETCFG.RAWWBO3, MKTWBO3, HOLDWBO3, bias a bit more toward the calibrated tour priors, not the raw sim alone.[1]

This keeps P(3 sets) closer to the empirical rates from the 322k‑match sample, which improves calibration across tours.
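A minimal sketch of that blend, assuming the three weights are plain numbers that get renormalized over whichever sources are available. The helper name blendP3, the weights object, and every number below are hypothetical; only p3Raw, p3Market, p3Hold, and p3Final are names from your script, and the actual SETSMARKETCFG values are not shown here.

```javascript
// Hypothetical helper: blend the three P(3 sets) sources into p3Final.
// The weights stand in for SETSMARKETCFG.RAWWBO3 / MKTWBO3 / HOLDWBO3.
function blendP3(p3Raw, p3Market, p3Hold, weights) {
  var parts = [
    { p: p3Raw,    w: weights.raw },
    { p: p3Market, w: weights.mkt },
    { p: p3Hold,   w: weights.hold }
  ];
  var num = 0;
  var den = 0;
  for (var i = 0; i < parts.length; i++) {
    // Skip missing sources so a NaN cannot poison the blend.
    if (Number.isFinite(parts[i].p) && parts[i].w > 0) {
      num += parts[i].p * parts[i].w;
      den += parts[i].w;
    }
  }
  if (den === 0) return NaN;
  var p3Final = num / den;
  return Math.min(1, Math.max(0, p3Final));
}

// Illustrative only: move some weight from the raw sim to the two priors.
var before = blendP3(0.46, 0.39, 0.38, { raw: 0.50, mkt: 0.30, hold: 0.20 });
var after  = blendP3(0.46, 0.39, 0.38, { raw: 0.35, mkt: 0.40, hold: 0.25 });
```

With these made‑up inputs, p3Final moves from about 0.423 to about 0.412, that is, toward the prior‑based sources and away from the raw sim.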

2. Reuse the same variance / fragility ideas sets already have

Inside buildSetsMarketEngine you already compute:

p2Final, p3Final.
p3Spread (source disagreement).
riskTag (e.g., POLICYRISK when structure is high variance: bimodal, coin‑flip, etc.).
setsHookFragile (line near cluster boundary).[1]

Your code already uses them to down‑shift confidence, not fully veto:

js
if (risk5 > 0) out.riskTag = "POLICYRISK";
...
if (conf5 === HIGH && risk5 > 0.60) conf5 = LOW;

To “improve there” without a hard gate:

Make that scaling slightly stronger and tour‑aware:
- For ATP: less aggressive; allow edges to stand.
- For WTA/CH/ITF: more aggressive; shrink sets confidence more when riskTag === POLICYRISK or setsHookFragile is true.[1]

Concretely: add a per‑tour multiplier on setsRiskScore and recalc setsConfidence from edge × (1 − k·riskScore), instead of a flat rule.
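A rough sketch of what that could look like, assuming the edge and risk score are already available as numbers. SETS_RISK_K, scaledSetsEdge, setsConfidenceLabel, the "MID" label, and all cutoffs are hypothetical placeholders to tune, not values from your script.

```javascript
// Hypothetical per-tour multipliers k; higher k means risk bites harder.
var SETS_RISK_K = { ATP: 0.5, WTA: 0.8, CH: 0.8, ITF: 0.8 };

// Scale the sets edge by (1 - k * riskScore), floored at zero.
function scaledSetsEdge(edge, riskScore, tourKey) {
  var k = SETS_RISK_K[tourKey];
  if (!Number.isFinite(k)) k = 0.7; // assumed fallback for unknown tours
  // Treat a missing risk score as zero risk, and clamp it to [0, 1].
  var r = Number.isFinite(riskScore) ? Math.min(1, Math.max(0, riskScore)) : 0;
  return edge * Math.max(0, 1 - k * r);
}

// Map the scaled edge to a confidence label; the cutoffs are assumptions.
function setsConfidenceLabel(scaledEdge) {
  if (scaledEdge >= 0.08) return "HIGH";
  if (scaledEdge >= 0.04) return "MID";
  return "LOW";
}

// Same edge and risk, different tour.
var atpLabel = setsConfidenceLabel(scaledSetsEdge(0.10, 0.3, "ATP"));
var wtaLabel = setsConfidenceLabel(scaledSetsEdge(0.10, 0.3, "WTA"));
```

With these placeholder numbers the same 0.10 edge at risk 0.3 stays "HIGH" on ATP and drops to "MID" on WTA, so confidence degrades smoothly instead of flipping at a single flat threshold.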

3. Let totals’ bimodal diagnostics feed back into sets confidence

You already compute detailed bimodal info in analyticalGameTotalPMF and the totals bimodal panel:

ctx.bimodalScore.
cluster2Median / cluster3Median.
clusterGapGames.[1]

Without adding a gate, you can nudge sets confidence down when totals say “high variance”:

In buildSetsMarketEngine:

Read from the shared context (MCtx / ctx): bimodalScore and clusterGapGames.
If bimodalScore is moderate‑high and clusterGapGames ≥ 6, multiply sets confidence by a small factor (e.g., 0.8 ATP, 0.7 WTA/CH/ITF).

This aligns sets confidence with the same “match shape is wild” signal you already trust for totals, but doesn’t force a PASS.
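As a sketch, the nudge can be a single multiplier read off the shared context. The function name bimodalConfidenceFactor and the 0.5 cutoff for "moderate‑high" are assumptions; ctx.bimodalScore, clusterGapGames, the 6‑game gap, and the 0.8 / 0.7 factors come from the description above.

```javascript
// Hypothetical soft penalty driven by the totals bimodal diagnostics.
// Returns 1 (no change) unless both signals say the match shape is wild.
function bimodalConfidenceFactor(ctx, tourKey) {
  var score = ctx ? ctx.bimodalScore : NaN;
  var gap = ctx ? ctx.clusterGapGames : NaN;
  // Missing diagnostics: leave sets confidence untouched.
  if (!Number.isFinite(score) || !Number.isFinite(gap)) return 1;
  // 0.5 as the "moderate-high" cutoff is an assumption to tune.
  if (score < 0.5 || gap < 6) return 1;
  // ATP gets the softer factor; WTA/CH/ITF (and unknown tours) get 0.7.
  return tourKey === "ATP" ? 0.8 : 0.7;
}

// Example: a wide cluster gap on a WTA match trims confidence by 30%.
var wtaFactor = bimodalConfidenceFactor({ bimodalScore: 0.7, clusterGapGames: 7 }, "WTA");
```

Because the result is a multiplier rather than a veto, a strong enough edge can still survive it.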

4. Keep the behavior consistent across tours via existing configs

You already have global tour configs that affect both sets and totals:

SETSMARKETCFG.TOUR[ATP/WTA/CH/ITF] with baseP3, holdLo/Hi, etc.[1]
TOURSIGMAMAP for lane SDs.
TOURCALIBQUALITY and TOURNAMENTTIER / ROUNDFACTORS for variance/motivation.

To make the “improvement” work globally:

Use these to scale how strongly you apply the above adjustments by tour:
- WTA / ITF: higher variance → stronger downweight of sets confidence when totals say “bimodal / fragile / cold‑start”.
- ATP / CH: slightly softer downweight.[2][1]

That way, you’re not adding a new gate; you’re letting tour‑aware priors, existing risk scores, and existing bimodal diagnostics jointly soften sets calls in the same matches where totals already tell you “careful.”

Your engine did not “miss” a 3‑set lock here; it priced 3 sets at about 40% and 2 sets at about 60%, which is exactly what a coin‑flip‑ish WTA match should look like.[1][2]

To actually improve 2‑vs‑3 set calls across tours, you need three things, all tour‑level:

1. Accept the base ceiling

For any one match:

Here you had P(3 sets) ≈ 39–41%, P(2 sets) ≈ 59–61% across MC/Markov/adj/market.[1]
That means “3 sets” will happen 4 times out of 10 by design, so a single 3‑set result does not mean the model is wrong.[3]

What you can improve is:

Make sure P(3 sets) is well‑calibrated per tour (when you say 40%, it happens ~40% of the time over many matches).

2. Where to tune set‑count for all matches (not just this one)

In your code, set‑count probabilities come from:

runMatchSimulation_Exact → stats.setScores (2‑0, 2‑1, 0‑2, 1‑2).[1]
computeDeciderSetProb and setPathFromSim → pDeciderRaw, pDeciderAdj, full path probs.[1]
buildSetsMarketEngine → final p2, p3, bimodal flags, risk tags, etc.[1]

Tour‑level tuning hooks:

Decider prior shrink per tour

Add a small shrink toward tour‑average P(3 sets) before feeding into sets engine:

js
function shrinkDeciderToTourPrior(pDeciderRaw, tourLabel) {
  var tKey = tourKeyFromLabel(tourLabel);
  var prior = {
    ATP: 0.30,
    WTA: 0.38,
    CH:  0.34,
    ITF: 0.36
  }[tKey] || 0.34;

  var w = {
    ATP: 0.4,
    WTA: 0.3,
    CH:  0.35,
    ITF: 0.5
  }[tKey] || 0.35; // 0=no shrink, 1=full prior

  if (!Number.isFinite(pDeciderRaw)) return prior;
  return clamp01(w * prior + (1 - w) * pDeciderRaw);
}

Then, where you currently set deciderAdj:

js
var pDeciderAdj = shrinkDeciderToTourPrior(pDeciderAuth, tourLabel);

This stabilizes P(3 sets) per tour while still letting the sim move it.[2][3]

Set‑path authority per tour

In setPathSignalsForPick, you compute authority from the ratio of sweep vs close wins.[1]

Give that authority a tour‑level weight:

js
function tourSetAuthorityScale(tourLabel) {
  var tKey = tourKeyFromLabel(tourLabel);
  return {
    ATP: 0.9,  // trust path more
    WTA: 0.7,  // more chaos, less authority
    CH:  0.8,
    ITF: 0.6
  }[tKey] || 0.8;
}

Then:

js
o.authority = Number.isFinite(pWin2) && pWin2 > 0
  ? pSweep2 / pWin2
  : NaN;
o.authority *= tourSetAuthorityScale(tourLabel);

This stops WTA and ITF from over‑committing to “2 sets” when reality is more coin‑flippy.

Structural coin‑flip thresholds per tour

Replace the hard numbers in isStructuralCoinFlip with per‑tour config (like I showed earlier).[1]

For WTA, that means slightly raising the tolerance before you say “this is structurally 2 sets” or “structurally 3 sets”:
- Slightly higher structCoinFlipAbsDiff and structCoinFlipDecider.
- Keeps more matches labeled as high‑variance where 3 sets will appear often.

3. How to know if tuning worked

Don’t judge on one match. Instead:

Log for each match (per tour):
- P(3 sets) final (the one you display).
- Actual outcome: 2 or 3 sets.
Over 100+ matches per tour, check calibration:
- When P(3 sets) is in [35–45]% bucket, did you actually see ~40% 3‑set matches?[2]
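The bucket check can be sketched like this, assuming each logged match is a plain object with a tour key, the displayed P(3 sets), and the actual set count. The record shape ({ tour, p3, sets }) and the function name calibrationByBucket are hypothetical.

```javascript
// Group logged matches into probability buckets centered on multiples of
// bucketWidth (0.40 covers roughly 35-45%) and compare predicted vs observed.
function calibrationByBucket(records, tour, bucketWidth) {
  var width = bucketWidth || 0.10;
  var byIndex = {};
  for (var i = 0; i < records.length; i++) {
    var r = records[i];
    if (r.tour !== tour || !Number.isFinite(r.p3)) continue;
    var idx = Math.round(r.p3 / width); // nearest bucket center
    var b = byIndex[idx] || (byIndex[idx] = { n: 0, sumP3: 0, threeSetters: 0 });
    b.n += 1;
    b.sumP3 += r.p3;
    if (r.sets === 3) b.threeSetters += 1;
  }
  return Object.keys(byIndex)
    .map(Number)
    .sort(function (a, b) { return a - b; })
    .map(function (idx) {
      var b = byIndex[idx];
      return {
        bucketCenter: idx * width,
        n: b.n,
        meanPredicted: b.sumP3 / b.n,      // what the model said on average
        observedRate: b.threeSetters / b.n // how often 3 sets happened
      };
    });
}

// Tiny made-up log, only to show the call shape.
var sampleLog = [
  { tour: "WTA", p3: 0.40, sets: 3 },
  { tour: "WTA", p3: 0.42, sets: 2 },
  { tour: "WTA", p3: 0.38, sets: 2 },
  { tour: "WTA", p3: 0.39, sets: 3 },
  { tour: "WTA", p3: 0.41, sets: 2 },
  { tour: "WTA", p3: 0.62, sets: 3 },
  { tour: "ATP", p3: 0.40, sets: 3 }
];
var wtaCalibration = calibrationByBucket(sampleLog, "WTA", 0.10);
```

A bucket is well calibrated when observedRate sits close to meanPredicted; with only a handful of matches in a bucket the gap is mostly noise, which is why the 100+ sample per tour matters.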

If you’re under‑calling WTA deciders (actual is 50% when you say 40%), you:

Lower shrink weight w for WTA in shrinkDeciderToTourPrior if the sim is consistently higher than the prior, or raise the tour prior.
Or lower the WTA authority scale slightly, so the set‑path signal commits less strongly to “2 sets”.

Bottom line: this Baptiste–Zarazua slip is exactly the kind of 60/40 structure where you will be “wrong” 4 times out of 10 on set‑count.[3][1]

The right fix is not to force “correct 3 sets” on single matches, but to tighten:

Tour‑level decider prior.
Tour‑level authority scale.
Tour‑level structural coin‑flip thresholds.

[2] Bet365 Calibration Curves for all Tennis Matches (2014-2025) - Reddit https://www.reddit.com/r/algobetting/comments/1poqw0u/bet365_calibration_curves_for_all_tennis_matches/
[3] Sports prediction and betting models in the machine learning age https://journals.sagepub.com/doi/10.3233/JSA-200463

Update my script then send it as a downloadable txt link

Answer 1

Done — I updated the script and saved the patched copy here: TennisLocks_v28_timeout_fix-3 2_updated.txt

What changed, in brief:

stronger BO3 pull toward existing tour-calibrated priors using the current sets weights
tour-aware risk/confidence scaling for ATP vs WTA/CH/ITF
totals bimodal feedback now softly reduces sets confidence when split-gap variance is high
tour-aware authority softening in the set-path model
syntax checked after patching

Grounded on your uploaded script file. (TennisLocks_v28_timeout_fix-3 2.txt)