Forecasting wind conditions.

forecasting

rocket launch

Python

machine learning

Using machine learning to forecast wind conditions for rocket launch sites.

Author

Winfred

Published

March 14, 2025

Introduction

Selecting the nearest weather station isn’t just a box to check: it’s the difference between a rocket soaring smoothly and one spiraling into a field, leaving you to pick up the pieces later.

Wind conditions can shift dramatically within a few kilometers, and relying on distant stations is like trusting yesterday’s lottery numbers to predict tomorrow’s storm. For launch sites, hyperlocal data is everything. Machine learning steps in as the unsung hero here, crunching historical patterns, real-time feeds, and topographical quirks from the closest sensors to forecast wind behavior that generic models might miss. It’s not just about avoiding bad weather; it’s about rewriting the rules of when—and where—a smaller rocket can defy the breeze.

Selecting the nearest weather station to the launch site.

The data for this example can be obtained here.

Stations_id	von_datum	bis_datum	Stationshoehe	geoBreite	geoLaenge	Stationsname	Bundesland	Abgabe
12345	20100115	20190630	325	51.4872	9.7531	Bergdorf	Sachsen	Frei
67890	19950212	20210428	178	48.6291	11.3427	Wiesenau	Bayern	Frei
24680	20030917	20250305	86	53.8712	7.4893	Nordhafen	Niedersachsen	Frei
…	…	…	…	…	…	…	…	…

Table showing first 3 rows of 1188 weather station records

The Haversine formula gives the great-circle distance between two points, so it can be used to find the station nearest to a target latitude/longitude.
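For reference, the calculation can be written out as follows. The symbols are notation chosen here rather than taken from the post: latitudes φ₁, φ₂ and longitudes λ₁, λ₂ are in radians, Δφ and Δλ are their differences, and R = 6371 km is the mean Earth radius used in the code.

$$
a = \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\Delta\lambda}{2}\right), \qquad
c = 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right), \qquad
d = R\,c
$$

Each term maps onto one line of the haversine function: `a`, then `c`, then the returned `R * c`.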

import math

def haversine(lat1, lon1, lat2, lon2):
    """
    Calculate the great-circle distance between two points 
    on the Earth (specified in decimal degrees)
    """
    lat1, lon1, lat2, lon2 = map(math.radians, [lat1, lon1, lat2, lon2])

    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))

    R = 6371
    return R * c

def find_nearest_location(file_path, target_lat, target_lon):
    min_distance = float('inf')
    nearest_location = None

    with open(file_path, 'r') as file:
        for line in file:
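            parts = line.strip().split()

            # Skip blank or malformed lines (assumes the nine-column layout shown above)
            if len(parts) < 9:
                continue

            try:
                lat = float(parts[4])
                lon = float(parts[5])
                # Station names may contain spaces, so take everything between
                # the longitude and the last two columns (Bundesland, Abgabe)
                name = ' '.join(parts[6:-2])
                state = parts[-2]
            except (ValueError, IndexError):
                # Header and separator lines fail the float conversion and are skipped
                continue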

            distance = haversine(lat, lon, target_lat, target_lon)
            
            if distance < min_distance:
                min_distance = distance
                nearest_location = {
                    'name': name,
                    'state': state,
                    'latitude': lat,
                    'longitude': lon,
                    'distance': distance
                }

    return nearest_location

target_lat = 0.0  # placeholder: replace with your target latitude (decimal degrees)
target_lon = 0.0  # placeholder: replace with your target longitude (decimal degrees)

nearest = find_nearest_location("../data/weather_station.txt", target_lat, target_lon)

if nearest:
    print(f"Nearest location: {nearest['name']}, {nearest['state']}")
    print(f"Coordinates: ({nearest['latitude']:.6f}, {nearest['longitude']:.6f})")
    print(f"Distance: {nearest['distance']:.2f} km")
else:
    print("No locations found or data file is empty")

Producing the following:

Nearest location: Nordhafen, Niedersachsen
Coordinates: (53.871200, 7.489300)
Distance: 7.10 km

Stations_id	von_datum	bis_datum	Stationshoehe	geoBreite	geoLaenge	Stationsname	Bundesland	Abgabe
24680	19520101	20250305	7	53.8712	7.4893	Nordhafen	Niedersachsen	Frei
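Before trusting the nearest-station result, it can help to sanity-check the distance function against a pair of points whose separation is roughly known. The sketch below restates the haversine function so it runs on its own. The Berlin and Munich coordinates are approximate city-centre values picked for illustration, not data from this post. The two cities are roughly 500 km apart in a straight line.

```python
import math


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, [lat1, lon1, lat2, lon2])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371 * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


# Approximate city-centre coordinates, chosen only for illustration
berlin = (52.52, 13.405)
munich = (48.1351, 11.582)

distance_km = haversine(*berlin, *munich)
print(f"Berlin to Munich: {distance_km:.0f} km")

# Basic properties any distance function should satisfy
assert haversine(*berlin, *berlin) == 0.0
assert math.isclose(haversine(*berlin, *munich), haversine(*munich, *berlin))
```

If the printed value is far from 500 km, the usual culprits are swapped latitude/longitude arguments or a missing degrees-to-radians conversion.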

Training the model (no GPU required!)

Data: https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/subdaily/wind/

Parameters that are critical for predicting wind speed are ambient temperature and pressure.


import pandas as pd

wind_data = pd.read_csv('../data/station_704_wind.txt', sep=';', dtype={'DK_TER': str})

wind_data.rename(columns={
    'STATIONS_ID': 'station_id',
    'MESS_DATUM': 'timestamp',
    'QN_4': 'quality_flag',
    'DK_TER': 'dk_ter',
    'FK_TER': 'fk_ter',
    'eor': 'end_of_record'
}, inplace=True)

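wind_data['dk_ter'] = wind_data['dk_ter'].str.strip().astype(int)
# -999 marks a missing value; drop rows where direction or speed is missing
wind_data = wind_data[(wind_data['dk_ter'] != -999) & (wind_data['fk_ter'] != -999)]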

wind_data['timestamp'] = wind_data['timestamp'].astype(str) + '00'
wind_data['timestamp'] = pd.to_datetime(wind_data['timestamp'], format='%Y%m%d%H%M')

wind_data.drop(columns=['end_of_record', 'quality_flag'], inplace=True)

pressure_data = pd.read_csv('../data/station_704_pressure.txt', sep=';')

pressure_data.rename(columns={
    'STATIONS_ID': 'station_id',
    'MESS_DATUM': 'timestamp',
    'QN_4': 'quality_flag',
    'PP_TER': 'pressure',
    'eor': 'end_of_record'
}, inplace=True)

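pressure_data['pressure'] = pd.to_numeric(pressure_data['pressure'], errors='coerce')
# Treat the -999 missing-value marker as NaN so it can be interpolated later
pressure_data['pressure'] = pressure_data['pressure'].replace(-999, float('nan'))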

pressure_data['timestamp'] = pressure_data['timestamp'].astype(str) + '00'
pressure_data['timestamp'] = pd.to_datetime(pressure_data['timestamp'], format='%Y%m%d%H%M')
pressure_data.drop(columns=['end_of_record', 'quality_flag'], inplace=True)

temp_data = pd.read_csv('../data/station_704_temp.txt', sep=';')
temp_data.rename(columns={
    'STATIONS_ID': 'station_id',
    'MESS_DATUM': 'timestamp',
    'QN_4': 'quality_flag',
    'TT_TER': 'temperature',
    'RF_TER': 'humidity',
    'eor': 'end_of_record'
}, inplace=True)

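temp_data['temperature'] = pd.to_numeric(temp_data['temperature'], errors='coerce')
temp_data['humidity'] = pd.to_numeric(temp_data['humidity'], errors='coerce')
# Treat the -999 missing-value marker as NaN so it can be interpolated later
temp_data[['temperature', 'humidity']] = temp_data[['temperature', 'humidity']].replace(-999, float('nan'))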

temp_data['timestamp'] = temp_data['timestamp'].astype(str) + '00'
temp_data['timestamp'] = pd.to_datetime(temp_data['timestamp'], format='%Y%m%d%H%M')

temp_data.drop(columns=['end_of_record', 'quality_flag'], inplace=True)

merged_data = wind_data.merge(pressure_data, on=['timestamp', 'station_id'], how='inner')
merged_data = merged_data.merge(temp_data, on=['timestamp', 'station_id'], how='inner')

merged_data[['pressure', 'temperature', 'humidity']] = merged_data[['pressure', 'temperature', 'humidity']].interpolate().ffill()

print(merged_data.head())
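The later sections rely on feature columns (hour, day_of_year, day_of_week, the *_lag_1 to *_lag_24 columns, fk_ter_rolling_mean, fk_ter_rolling_std) and on the objects model, train_X, test_y and test_pred, but the step that creates them is not shown. The following is a minimal sketch of how they could be produced, not the author's actual pipeline. The function names build_features and train_wind_model are hypothetical, RandomForestRegressor from scikit-learn is a stand-in because the post does not name the model, and the 80/20 chronological split is an assumption. Lags and rolling windows are counted in rows, not hours.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

N_LAGS = 24  # matches the 24 lags used in the forecast section


def build_features(df):
    """Add calendar, lag and rolling features to a frame that has
    timestamp, fk_ter, pressure, temperature and humidity columns."""
    out = df.sort_values('timestamp').reset_index(drop=True).copy()
    out['hour'] = out['timestamp'].dt.hour
    out['day_of_year'] = out['timestamp'].dt.dayofyear
    out['day_of_week'] = out['timestamp'].dt.dayofweek

    # Lagged copies of the target and the weather inputs
    lagged = {}
    for lag in range(1, N_LAGS + 1):
        for col in ['fk_ter', 'pressure', 'temperature', 'humidity']:
            lagged[f'{col}_lag_{lag}'] = out[col].shift(lag)
    out = pd.concat([out, pd.DataFrame(lagged)], axis=1)

    # Rolling statistics over the previous 24 rows only (shift avoids
    # leaking the current target value into its own features)
    past = out['fk_ter'].shift(1)
    out['fk_ter_rolling_mean'] = past.rolling(window=24).mean()
    out['fk_ter_rolling_std'] = past.rolling(window=24).std()

    # The first rows have incomplete lags and are dropped
    return out.dropna().reset_index(drop=True)


def train_wind_model(features, test_fraction=0.2, n_estimators=100):
    """Chronological train/test split followed by a random forest fit."""
    X = features.drop(
        columns=['timestamp', 'fk_ter', 'station_id', 'dk_ter'],
        errors='ignore',
    )
    y = features['fk_ter']

    # Keep time order: train on the earlier part, test on the later part
    split = int(len(features) * (1 - test_fraction))
    train_X, test_X = X.iloc[:split], X.iloc[split:]
    train_y, test_y = y.iloc[:split], y.iloc[split:]

    model = RandomForestRegressor(
        n_estimators=n_estimators, random_state=42, n_jobs=-1
    )
    model.fit(train_X, train_y)
    test_pred = model.predict(test_X)
    return model, train_X, test_X, train_y, test_y, test_pred
```

Under these assumptions, running `merged_data = build_features(merged_data)` and then `model, train_X, test_X, train_y, test_y, test_pred = train_wind_model(merged_data)` would define the names the following sections use. A random forest trains on CPU in reasonable time for data of this size, which fits the "no GPU required" premise.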

Model accuracy

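import matplotlib.pyplot as plt

# test_y and test_pred are the held-out targets and predictions from the training step
# Number of most recent test points to plot
n_points = 100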

plt.figure(figsize=(10,5))

plt.plot(
    test_y.index[-n_points:],
    test_y[-n_points:],
    label='Actual', color='blue'
)

plt.plot(
    test_y.index[-n_points:],
    test_pred[-n_points:],         
    label='Predicted', color='red', alpha=0.7
)

plt.title(f"Model Predictions vs. Actual Values (Last {n_points} points)")
plt.xlabel("Index / Time")
plt.ylabel("Target Variable")
plt.legend()
plt.show()
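A plot shows whether predictions track the actual values, but a numeric score makes runs comparable. The sketch below computes mean absolute error and root mean squared error, plus the same errors for a naive persistence baseline (predict each value as the previous one). The function names are hypothetical and the toy numbers only demonstrate the call pattern. They are not results from this post.

```python
import numpy as np


def regression_errors(actual, predicted):
    """Return (mean absolute error, root mean squared error)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = predicted - actual
    mae = float(np.mean(np.abs(residuals)))
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return mae, rmse


def persistence_errors(actual):
    """Errors of a baseline that predicts each value as the previous one."""
    actual = np.asarray(actual, dtype=float)
    return regression_errors(actual[1:], actual[:-1])


# Toy numbers purely to show the call pattern
toy_actual = [3.0, 4.0, 5.0, 4.0]
toy_pred = [3.5, 4.0, 4.0, 4.5]

mae, rmse = regression_errors(toy_actual, toy_pred)
base_mae, base_rmse = persistence_errors(toy_actual)
print(f"Model    MAE: {mae:.2f}, RMSE: {rmse:.2f}")
print(f"Baseline MAE: {base_mae:.2f}, RMSE: {base_rmse:.2f}")
```

With the held-out split, the calls would be `regression_errors(test_y, test_pred)` and `persistence_errors(test_y)`. A model that does not beat the persistence baseline is adding little over "the wind will be the same as last reading".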

Forecast

import numpy as np
import pandas as pd

historical_avg = merged_data[['pressure', 'temperature', 'humidity', 'hour', 'day_of_year', 'day_of_week']].mean()

for lag in range(1, 25):
    historical_avg[f'fk_ter_lag_{lag}'] = merged_data['fk_ter'].mean()
    historical_avg[f'pressure_lag_{lag}'] = merged_data['pressure'].mean()
    historical_avg[f'temperature_lag_{lag}'] = merged_data['temperature'].mean()
    historical_avg[f'humidity_lag_{lag}'] = merged_data['humidity'].mean()

historical_avg['fk_ter_rolling_mean'] = merged_data['fk_ter'].mean()
historical_avg['fk_ter_rolling_std'] = merged_data['fk_ter'].std()

historical_avg = historical_avg[train_X.columns]

future_dates = pd.date_range(start='2025-05-01', end='2025-05-31 23:00:00', freq='H')
future_df = pd.DataFrame({'timestamp': future_dates})

future_df['hour'] = future_df['timestamp'].dt.hour
future_df['day_of_year'] = future_df['timestamp'].dt.dayofyear
future_df['day_of_week'] = future_df['timestamp'].dt.dayofweek

last_historical = merged_data.iloc[-24*3:]

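def forecast_month(model, last_historical, future_dates):
    """Recursively forecast one value per timestamp, feeding each prediction
    back into the window used to build the next step's lag features.
    Pressure, temperature and humidity are held at their last observed values."""
    forecast = []
    current_window = last_historical.copy()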
    
    for ts in future_dates:
        hour = ts.hour
        doy = ts.dayofyear
        dow = ts.dayofweek

        features = {
            'hour': hour,
            'day_of_year': doy,
            'day_of_week': dow,
            'pressure': current_window['pressure'].iloc[-1] if len(current_window) > 0 else np.nan,
            'temperature': current_window['temperature'].iloc[-1] if len(current_window) > 0 else np.nan,
            'humidity': current_window['humidity'].iloc[-1] if len(current_window) > 0 else np.nan,
            'fk_ter_rolling_mean': current_window['fk_ter'].rolling(window=24).mean().iloc[-1] if len(current_window) >= 24 else np.nan,
            'fk_ter_rolling_std': current_window['fk_ter'].rolling(window=24).std().iloc[-1] if len(current_window) >= 24 else np.nan,
        }

        for lag in range(1, 25):
            features[f'fk_ter_lag_{lag}'] = current_window['fk_ter'].iloc[-lag] if len(current_window) >= lag else np.nan
            features[f'pressure_lag_{lag}'] = current_window['pressure'].iloc[-lag] if len(current_window) >= lag else np.nan
            features[f'temperature_lag_{lag}'] = current_window['temperature'].iloc[-lag] if len(current_window) >= lag else np.nan
            features[f'humidity_lag_{lag}'] = current_window['humidity'].iloc[-lag] if len(current_window) >= lag else np.nan

        feature_df = pd.DataFrame([features], columns=train_X.columns)

        feature_df = feature_df.fillna(historical_avg)

        pred = model.predict(feature_df)[0]
        forecast.append(pred)

        current_window = pd.concat([
            current_window,
            pd.DataFrame([{
                'timestamp': ts,
                'fk_ter': pred,
                'pressure': features['pressure'],
                'temperature': features['temperature'],
                'humidity': features['humidity'],
            }])
        ]).iloc[-24:]
    
    return forecast

may_2025_predictions = forecast_month(model, last_historical, future_dates)

future_df['predicted_wind_speed'] = may_2025_predictions

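# Rough spread indicator: the standard deviation of the final 24 predictions,
# applied as a constant band (not a calibrated confidence interval)
future_df['uncertainty'] = np.std(may_2025_predictions[-24:])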

import matplotlib.pyplot as plt

plt.figure(figsize=(15, 6))
plt.plot(future_df['timestamp'], future_df['predicted_wind_speed'], label='Predicted')
plt.fill_between(future_df['timestamp'], 
                 future_df['predicted_wind_speed'] - future_df['uncertainty'],
                 future_df['predicted_wind_speed'] + future_df['uncertainty'],
                 alpha=0.2)

plt.title('May 2025 Wind Speed Forecast')
plt.xlabel('Date')
# FK_TER is assumed here to be in m/s; check the DWD dataset description for the actual unit
plt.ylabel('Wind Speed (m/s)')
plt.legend()
plt.show()
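The shaded band above is derived from the spread of the predictions themselves, which says little about how wrong the model tends to be. One alternative, sketched here as an assumption rather than the post's method, is to size the band from an empirical quantile of the absolute errors on held-out data. The function name residual_band and the 90% default are illustrative choices.

```python
import numpy as np


def residual_band(actual, predicted, coverage=0.9):
    """Half-width that covers `coverage` of the absolute held-out errors."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    abs_errors = np.abs(actual - predicted)
    return float(np.quantile(abs_errors, coverage))


# Toy held-out values for illustration only
half_width = residual_band([0, 0, 0, 0], [1, 2, 3, 4], coverage=0.5)
print(f"Band half-width: {half_width}")
```

In the forecast script this would replace the constant with `future_df['uncertainty'] = residual_band(test_y, test_pred)`. Because the forecast is recursive and errors compound over the month, a band based on one-step test errors is likely to be too narrow for later dates and should be read as a lower bound.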