PlaneWX Research | March 2026

We Put Our Mountain Wave Detection On Trial.

Mountain wave and rotor turbulence are among the most dangerous phenomena for general aviation. We needed to know if our detection system was good enough to stake a pilot’s safety on. So we didn’t ask one AI to review it — we put four frontier AI models in a room and let them argue about it for 20 rounds.

By Mark Wolfgang, Founder & CEO, PlaneWX

The Problem

PlaneWX analyzes wind flow relative to terrain ridges along a pilot’s filed route. We decompose model sounding winds into perpendicular-to-ridge components, compute Froude numbers, estimate Brunt-Väisälä frequency, and classify mountain wave and rotor severity — all in real time, for every briefing.
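For readers unfamiliar with these quantities, here is a minimal sketch using the common textbook forms N² = (g/θ)·dθ/dz and Fr = U/(N·h). The function names and sample values are illustrative assumptions, not PlaneWX code, and the production system may use a different formulation (for example, one that handles nonuniform stability).

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def brunt_vaisala(theta_low_k, theta_high_k, dz_m):
    """Brunt-Väisälä frequency N (1/s) from potential temperature at two levels.

    Returns 0.0 for a neutral or unstable layer (N^2 <= 0).
    """
    theta_mean = 0.5 * (theta_low_k + theta_high_k)
    n_squared = (G / theta_mean) * (theta_high_k - theta_low_k) / dz_m
    return math.sqrt(n_squared) if n_squared > 0 else 0.0

def froude_number(cross_ridge_speed_ms, n_per_s, ridge_height_m):
    """Fr = U / (N h). Returns inf when the layer is not stably stratified."""
    if n_per_s <= 0 or ridge_height_m <= 0:
        return math.inf
    return cross_ridge_speed_ms / (n_per_s * ridge_height_m)

# Illustrative values: 6 K of warming in potential temperature over 2 km,
# 10 m/s of cross-ridge wind, and a 1,500 m ridge.
n = brunt_vaisala(290.0, 296.0, 2000.0)
fr = froude_number(10.0, n, 1500.0)
```

With these sample inputs N is about 0.01 s⁻¹ and Fr is about 0.67. A lower Fr means the flow is more likely to be blocked by the ridge.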

But mountain meteorology is hard. The literature spans decades of research by Durran, Vosper, Reinecke, Sharman, and others. The algorithms involve physical constants, threshold values, and edge cases that interact in non-obvious ways. Getting it wrong means either false alarms that erode pilot trust, or — far worse — missing a real hazard.

I needed a rigorous methodology review that would challenge every assumption, check every threshold against published science, and probe every edge case. Hiring a panel of mountain meteorology consultants would cost thousands of dollars and take weeks. I needed answers today.

The Approach: Adversarial AI Debate

I used Nestr, a platform that structures debates between multiple AI models. Instead of asking one AI “is this right?” (and getting a polite, agreeable answer), I configured a four-model adversarial debate: one proposer defending the methodology and three challengers probing for weaknesses, plus a synthesizer delivering the final verdict.

Proposer: o3. Defended the methodology against FAA standards.

Challenger: Grok 4.1 Fast. Probed edge cases and under-represented terrain.

Challenger: Claude Sonnet 4.6. Demanded evidence for every claim.

Challenger: Grok 3. Pressure-tested real-world applicability.

4 AI models | 20 rounds of debate | 1.3M tokens analyzed | 20 min start to finish

What They Found

The debate started as a validation exercise and evolved into a deep collaborative research session. By round 5, the models weren’t just finding problems — they were proposing solutions, debating implementation details, and holding each other accountable for unsupported claims. Six concrete improvements emerged.

01. Scorer Parameter Gate

Not all blocked-flow conditions produce rotors. Rotor turbulence requires trapped lee waves, which only form when the atmospheric profile supports wave trapping. The debate identified that our system was missing the Scorer parameter (l²) — a critical diagnostic for whether waves trap near the surface or propagate harmlessly upward. We now compute l² from model soundings between ridge crest and ~4 km above crest. When waves aren’t trapped (l² < 0.25), rotor alerts are suppressed — preventing false alarms in common winter jet-stream scenarios.

References: Scorer (1949), Durran (1990), Reinecke & Durran (2008)
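A minimal sketch of the diagnostic, assuming the standard definition l² = N²/U² − (1/U)·d²U/dz² evaluated on an evenly spaced profile of cross-ridge wind. The function name and sample profile are hypothetical, and this is not the production implementation.

```python
def scorer_parameter(n_squared, u, dz_m):
    """Scorer parameter l^2 (1/m^2) at the interior levels of a profile.

    n_squared: N^2 (1/s^2) at each level.
    u: cross-ridge wind (m/s) at each level.
    dz_m: constant vertical spacing between levels (m).
    """
    if len(n_squared) != len(u):
        raise ValueError("profiles must have the same number of levels")
    l2 = []
    for i in range(1, len(u) - 1):
        if u[i] == 0:
            raise ValueError("l^2 is undefined where the cross-ridge wind is zero")
        # Centered finite difference for the wind curvature d2U/dz2.
        curvature = (u[i + 1] - 2.0 * u[i] + u[i - 1]) / dz_m**2
        l2.append(n_squared[i] / u[i] ** 2 - curvature / u[i])
    return l2

# Illustrative profile: constant stability, wind increasing linearly with height.
n2 = [1.0e-4] * 5
u = [10.0, 15.0, 20.0, 25.0, 30.0]
l2 = scorer_parameter(n2, u, 1000.0)
```

In this sample profile l² falls with height, which is the classical condition under which lee waves can become trapped. How the profile is reduced to the single value compared against 0.25, and in what units, is not specified above, so the sketch stops at the profile.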

02. Relief-Scaled Wind Thresholds

Our original thresholds (25/30/40 kt for light/moderate/severe) were calibrated for major barriers like the Rockies and Sierra Nevada. Grok 4.1 immediately challenged: “How do they hold for lower-relief terrain like the Appalachians, where PIREPs often show waves at <25 kt perpendicular flow?” The answer: they don’t. We now scale thresholds dynamically based on terrain relief — lower-relief terrain gets lower thresholds, preventing under-detection in the Appalachians, Ozarks, and coastal ranges.

Formula: base (kt) = 18 + 4 × (relief_ft / 1,000), clamped to [15, 40] kt
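The formula translates directly into code. This is a sketch with a hypothetical function name. How the base value maps onto the light, moderate, and severe levels is not stated here, so the sketch computes only the base.

```python
def relief_scaled_base_kt(relief_ft):
    """Base cross-ridge wind threshold in knots for a given terrain relief.

    Implements base = 18 + (relief_ft / 1000) * 4, clamped to [15, 40].
    """
    raw = 18.0 + (relief_ft / 1000.0) * 4.0
    return max(15.0, min(40.0, raw))

appalachian_like = relief_scaled_base_kt(1500.0)  # 18 + 6 = 24 kt
mid_relief = relief_scaled_base_kt(3000.0)        # 18 + 12 = 30 kt
high_relief = relief_scaled_base_kt(6000.0)       # 42 kt, clamped to 40 kt
```

For any non-negative relief the raw value is at least 18 kt, so the lower clamp of 15 kt only comes into play if the constants change.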

03. Complex Terrain Modifier

Simple 2D Froude analysis can over-predict rotor severity in complex 3D terrain where multiple ridge orientations exist — sinuous valleys, overlapping ridges, convergent flows. Grok 3 pushed on Colorado Front Range scenarios where real conditions are less severe than 2D theory predicts. We now compute the circular variance of ridge orientations and inflate the Froude number by 25% when terrain is complex, except in deeply blocked flows (Fr < 0.35) where confined valleys can amplify hazards.

References: Vosper (2004), Smith (1989)
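A sketch of how the modifier could work. Ridge orientations are axial data (a ridge at 10° is the same ridge at 190°), so the sketch doubles the angles before averaging. The `complexity_cutoff` value that decides when terrain counts as complex is an assumption made for illustration; only the 25% inflation and the Fr < 0.35 exception come from the description above.

```python
import math

def orientation_circular_variance(ridge_axes_deg):
    """Circular variance of ridge axes: 0 = all aligned, 1 = fully dispersed."""
    # Double the angles so that axes 180 degrees apart count as identical.
    doubled = [math.radians(2.0 * a) for a in ridge_axes_deg]
    c = sum(math.cos(t) for t in doubled) / len(doubled)
    s = sum(math.sin(t) for t in doubled) / len(doubled)
    return 1.0 - math.hypot(c, s)

def adjusted_froude(fr, variance, complexity_cutoff=0.5):
    """Inflate Fr by 25% in complex terrain, except in deeply blocked flow.

    complexity_cutoff is a hypothetical value, not a published threshold.
    """
    if variance >= complexity_cutoff and fr >= 0.35:
        return fr * 1.25
    return fr

aligned = orientation_circular_variance([0.0, 180.0, 0.0])  # one N-S orientation
crossed = orientation_circular_variance([0.0, 90.0])        # N-S plus E-W ridges
```

Here `aligned` comes out near 0 and `crossed` near 1. Raising Fr moves the flow away from the blocked regime, which is how the modifier tempers 2D over-prediction.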

04. Sub-2,000 ft Terrain Promotion

Hazards over lower-relief terrain were being systematically under-detected. For terrain with less than 2,000 ft of relief, when the Froude number indicates blocked flow (Fr < 0.60) and cross-ridge winds are 25+ kt, severity is now promoted by one level, even over modest 1,500 ft ridges. A pilot crossing the Blue Ridge or Ozarks in strong flow deserves the same quality of hazard detection as someone crossing the Continental Divide.
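The promotion rule can be sketched as follows. The severity scale and function name are assumptions for illustration; the 2,000 ft, Fr < 0.60, and 25 kt conditions come from the rule as described.

```python
SEVERITY = ["none", "light", "moderate", "severe"]  # assumed ordering

def promote_low_relief(severity, relief_ft, froude, cross_ridge_kt):
    """Promote severity one level for blocked, strong flow over low relief."""
    blocked = froude < 0.60
    strong = cross_ridge_kt >= 25.0
    if relief_ft < 2000.0 and blocked and strong:
        # Step up one level, capped at the top of the scale.
        idx = min(SEVERITY.index(severity) + 1, len(SEVERITY) - 1)
        return SEVERITY[idx]
    return severity

promoted = promote_low_relief("light", 1500.0, 0.5, 28.0)
unchanged = promote_low_relief("light", 1500.0, 0.7, 28.0)
```

In this example `promoted` is "moderate", while `unchanged` stays "light" because Fr = 0.7 does not indicate blocked flow.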

05. Low-Confidence Ridge Fallback

When terrain gradient is weak or ridge orientation can’t be determined with confidence, our v1 system used 100% of wind speed as a worst-case perpendicular estimate. This over-alerted constantly. The debate established that the RMS value of |sin(θ)| over all possible ridge orientations is ~0.707, so we now use 70% of total wind speed as a statistically grounded fallback — honest about uncertainty without crying wolf.
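The 0.707 figure is easy to check numerically. A small sketch, assuming ridge orientation is uniformly distributed relative to the wind; the function names are hypothetical.

```python
import math

def rms_abs_sin(samples=100_000):
    """RMS of |sin(theta)| for theta uniform on [0, pi), via the midpoint rule."""
    total = 0.0
    for k in range(samples):
        theta = math.pi * (k + 0.5) / samples
        total += math.sin(theta) ** 2  # |sin|^2 equals sin^2
    return math.sqrt(total / samples)

def fallback_cross_ridge_kt(wind_speed_kt, factor=0.7):
    """Cross-ridge estimate when ridge orientation is unknown."""
    return factor * wind_speed_kt

rms = rms_abs_sin()
estimate = fallback_cross_ridge_kt(30.0)
```

The numerical result matches the closed form 1/√2 ≈ 0.707, so a 30 kt wind yields a fallback cross-ridge estimate of about 21 kt instead of the full 30 kt that v1 assumed.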

06. Perpendicular Projection Bug Fix

The unit tests written to verify the Scorer parameter uncovered a real bug: the perpendicular wind projection formula was computing the along-ridge component instead of the cross-ridge component (the sine and cosine terms were swapped). For an N-S ridge with westerly flow, the formula returned zero instead of the full wind speed. This wasn’t caught in manual testing because typical wind/ridge combinations still produced reasonable-looking numbers. Only the physics-aware test fixture, designed to verify that westerly wind is fully perpendicular to an N-S ridge, exposed it.
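A sketch of the corrected projection and the kind of fixture that catches the swap. It assumes the ridge is described by the compass bearing of its axis and the wind by the meteorological "from" direction; under that convention the cross-ridge component uses the sine of the angle difference. The function name is hypothetical.

```python
import math

def wind_components_kt(wind_speed_kt, wind_from_deg, ridge_axis_deg):
    """Split wind into (cross_ridge, along_ridge) magnitudes in knots.

    wind_from_deg: direction the wind blows from, in degrees true.
    ridge_axis_deg: bearing of the ridge line (0 or 180 means N-S).
    """
    angle = math.radians(wind_from_deg - ridge_axis_deg)
    cross = abs(wind_speed_kt * math.sin(angle))
    along = abs(wind_speed_kt * math.cos(angle))
    return cross, along

# Physics-aware fixture: westerly wind against an N-S ridge is fully cross-ridge.
cross, along = wind_components_kt(30.0, 270.0, 0.0)
assert abs(cross - 30.0) < 1e-9
assert along < 1e-9
```

With sine and cosine swapped, the same fixture would report 0 kt of cross-ridge flow, which is the failure described above.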

Why Debate Beats Solo Review

Ask a single AI to “validate my mountain wave detection methodology” and you’ll get a helpful but agreeable review. It might flag a few issues, but it won’t spend 20 rounds holding your feet to the fire.

The power of the adversarial format is that the models challenge each other, not just the input. When o3 cited the perpendicular wind thresholds from AC 00-57, Grok 4.1 immediately asked about Appalachian under-detection. When o3 proposed the Scorer parameter, Claude Sonnet challenged the ceiling limit and pushed it from crest + 2 km to crest + 4 km. When improvements were proposed, Claude Sonnet demanded end-to-end integrated validation before any could be released.

By round 19, all three challengers independently converged on the same conclusion: the methodology was sound in principle, but couldn’t be released without an integrated end-to-end validation run. That consensus — from three independent reasoning systems across two providers — carries more weight than any single model’s rubber stamp.

Read the full 20-round debate: Mountain Wave Detection Case Study on Nestr — all 80 messages and 1.3 million tokens of mountain meteorology debate, unedited.

Validation

The debate produced the improvements. But the debate isn’t the validation — the testing is. Every improvement was implemented, tested, and verified before deployment.

59 Unit Tests

Covering Scorer parameter, relief-scaled thresholds, terrain complexity, Froude number, Brunt-Väisälä frequency, severity promotion, and full integration tests via the analysis pipeline.

6 Regression Routes

Rocky Mountain crossing (KSUS-KSLC), Sierra Nevada (KOAK-KRNO), Colorado Front Range (KDEN-KASE), Appalachians (KJFK-KLEX), Cascades (KBFI-KELN), and a flat negative control (KDFW-KIAH).

v1 ↔ v2 Side-by-Side Comparison

Automated comparison of v1 fixed thresholds vs. v2 relief-scaled thresholds on every regression route, tracking escalations, de-escalations, and Scorer gate suppressions.

First Regression Run Results

March 16, 2026: all 6 routes run against a local dev server with live weather data.

Route        Terrain        Wave      Hazards  v1→v2 Changes
KSUS→KSLC    high-mountain  moderate  6        3 de-escalations
KDEN→KASE    high-mountain  moderate  3
KOAK→KRNO    mountainous    none      0
KJFK→KLEX    hilly          none      0
KBFI→KELN    mountainous    none      0
KDFW→KIAH    flat           none      0        control passed

The 3 de-escalations on KSUS→KSLC are expected: v2’s relief-scaled thresholds correctly raise the bar for high-relief Rocky Mountain terrain, where stronger winds are needed to generate the same severity level. Light-weather conditions on the March 16 test date meant the Scorer gate and complex-terrain modifier weren’t exercised in this regression run; they are fully covered by unit tests.

What This Means for You

Fewer false alarms

The Scorer parameter gate prevents rotor alerts when the atmosphere doesn’t actually support trapped lee waves. Strong upper-level winds in common winter jet-stream patterns will no longer trigger unnecessary rotor warnings.

Better detection in lower terrain

If you fly the Appalachians, Ozarks, or coastal ranges, the relief-scaled thresholds and sub-2,000 ft promotion mean PlaneWX will flag mountain wave conditions that v1 would have missed. Your briefing reflects the terrain you’re actually flying over.

Transparent methodology

Every algorithm, every threshold, every physical constant is documented in our help center. We publish the science behind the system because you deserve to know how your safety decisions are being informed.

References

Durran, D. R. (1990). Mountain waves and downslope winds. Atmospheric Processes over Complex Terrain, Meteor. Monogr. 23(45), 59–81.

FAA Advisory Circular AC 00-57, Hazardous Mountain Winds and Their Visual Indicators.

Reinecke, P. A., & Durran, D. R. (2008). Estimating topographic blocking using a Froude number when the static stability is nonuniform. J. Atmos. Sci., 65(4), 1035–1048.

Scorer, R. S. (1949). Theory of waves in the lee of mountains. Quart. J. Roy. Meteor. Soc., 75, 41–56.

Sharman, R. D., Tebaldi, C., Wiener, G., & Wolff, J. (2006). An integrated approach to mid- and upper-level turbulence forecasting. Wea. Forecasting, 21, 268–287.

Smith, R. B. (1989). Hydrostatic airflow over mountains. Advances in Geophysics, 31, 1–41.

Vosper, S. B. (2004). Inversion effects on mountain lee waves. Quart. J. Roy. Meteor. Soc., 130, 1723–1748.

See Mountain Wave Detection In Your Next Briefing

Create a free briefing for any mountain route. You’ll see terrain classification, cross-barrier wind analysis, Froude number rotor detection, and Scorer parameter gate diagnostics — all explained in plain English.