Forecasting wind conditions.

Introduction
Selecting the nearest weather station isn’t just a box to check. It’s the difference between a rocket soaring smoothly and one spiraling into a field, leaving you to pick up the pieces.
Wind conditions can shift dramatically within a few kilometers, and relying on a distant station is like trusting yesterday’s lottery numbers to predict tomorrow’s storm. For launch sites, hyperlocal data is everything. Machine learning is the unsung hero here: it crunches historical patterns, recent observations and the local quirks captured by the closest sensors to forecast wind behavior that generic models might miss. It’s not just about avoiding bad weather; it’s about deciding when, and where, a small rocket can defy the breeze.
Selecting the nearest weather station to the launch site.
The data for this example can be obtained here.
| Stations_id | von_datum | bis_datum | Stationshoehe | geoBreite | geoLaenge | Stationsname | Bundesland | Abgabe |
|---|---|---|---|---|---|---|---|---|
| 12345 | 20100115 | 20190630 | 325 | 51.4872 | 9.7531 | Bergdorf | Sachsen | Frei |
| 67890 | 19950212 | 20210428 | 178 | 48.6291 | 11.3427 | Wiesenau | Bayern | Frei |
| 24680 | 20030917 | 20250305 | 86 | 53.8712 | 7.4893 | Nordhafen | Niedersachsen | Frei |
| … | … | … | … | … | … | … | … | … |
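Table: the first 3 of 1188 weather station records. Column key: Stations_id = station ID; von_datum / bis_datum = first and last date of the station's record (YYYYMMDD); Stationshoehe = station elevation in metres; geoBreite / geoLaenge = latitude / longitude in decimal degrees; Stationsname = station name; Bundesland = federal state; Abgabe = data availability ("Frei" = freely available).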
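Not every station in the list is still reporting: the von_datum and bis_datum columns give the period each record covers, so it can help to discard stations whose record does not span the period you need before searching for the nearest one. A minimal sketch, assuming the rows have already been parsed into dictionaries holding the raw YYYYMMDD strings; the helpers `parse_yyyymmdd` and `active_stations` are hypothetical names for illustration:

```python
from datetime import date, datetime


def parse_yyyymmdd(value):
    """Parse a DWD-style YYYYMMDD string (e.g. '20250305') into a date."""
    return datetime.strptime(value, "%Y%m%d").date()


def active_stations(stations, start, end):
    """Keep stations whose record [von_datum, bis_datum] covers start..end.

    `stations` is a list of dicts with 'von_datum' and 'bis_datum' keys holding
    YYYYMMDD strings (an assumed in-memory structure, not the raw file format).
    """
    return [
        s for s in stations
        if parse_yyyymmdd(s["von_datum"]) <= start
        and parse_yyyymmdd(s["bis_datum"]) >= end
    ]


# The three example rows from the table (only the fields used here)
stations = [
    {"Stationsname": "Bergdorf", "von_datum": "20100115", "bis_datum": "20190630"},
    {"Stationsname": "Wiesenau", "von_datum": "19950212", "bis_datum": "20210428"},
    {"Stationsname": "Nordhafen", "von_datum": "20030917", "bis_datum": "20250305"},
]

recent = active_stations(stations, date(2020, 1, 1), date(2024, 12, 31))
print([s["Stationsname"] for s in recent])  # ['Nordhafen']
```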
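The haversine formula gives the great-circle distance between two latitude/longitude points, so we can compute the distance from the launch site to every station in the list and keep the closest one.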
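For reference, this is the haversine formula in the form the `haversine` function computes it, with latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$ in radians and mean Earth radius $R = 6371$ km:

$$
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)
$$

$$
c = 2\,\operatorname{atan2}\!\left(\sqrt{a}, \sqrt{1 - a}\right), \qquad d = R\,c
$$

As a sanity check, two points one degree of latitude apart on the same meridian give $a = \sin^2(\pi/360)$, $c = \pi/180$ and $d = 6371 \cdot \pi/180 \approx 111.2$ km.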
```python
import math


def haversine(lat1, lon1, lat2, lon2):
    """
    Calculate the great-circle distance in kilometres between two points
    on the Earth (specified in decimal degrees).
    """
    # Convert decimal degrees to radians
    lat1, lon1, lat2, lon2 = map(math.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    R = 6371  # mean Earth radius in km
    return R * c


def find_nearest_location(file_path, target_lat, target_lon):
    min_distance = float('inf')
    nearest_location = None
    # DWD station lists are typically Latin-1 encoded (umlauts in station names)
    with open(file_path, 'r', encoding='latin-1') as file:
        for line in file:
            parts = line.strip().split()
            # Expected: id, from, to, height, lat, lon, name..., state, availability
            if len(parts) < 9:
                continue
            try:
                lat = float(parts[4])
                lon = float(parts[5])
            except ValueError:
                # Skips the header row and the dashed separator line
                continue
            # Station names may contain spaces, so take everything between the
            # longitude and the last two columns (Bundesland, Abgabe)
            name = ' '.join(parts[6:-2])
            state = parts[-2]
            distance = haversine(lat, lon, target_lat, target_lon)
            if distance < min_distance:
                min_distance = distance
                nearest_location = {
                    'name': name,
                    'state': state,
                    'latitude': lat,
                    'longitude': lon,
                    'distance': distance,
                }
    return nearest_location


target_lat = 0.0  # placeholder: your launch site's latitude (decimal degrees)
target_lon = 0.0  # placeholder: your launch site's longitude (decimal degrees)

nearest = find_nearest_location("../data/weather_station.txt", target_lat, target_lon)
if nearest:
    print(f"Nearest location: {nearest['name']}, {nearest['state']}")
    print(f"Coordinates: ({nearest['latitude']:.4f}, {nearest['longitude']:.4f})")
    print(f"Distance: {nearest['distance']:.2f} km")
else:
    print("No locations found or data file is empty")
```

Producing the following:
```
Nearest location: Nordhafen, Niedersachsen
Coordinates: (53.8712, 7.4893)
Distance: 7.10 km
```

The corresponding entry in the station list:

```
Stations_id von_datum bis_datum Stationshoehe geoBreite geoLaenge Stationsname Bundesland Abgabe
----------- --------- --------- ------------- --------- --------- ----------------------------------------- ---------- ------
24680 19520101 20250305 7 53.8712 7.4893 Nordhafen Niedersachsen Frei
```
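The single closest station is not always usable: it may have stopped reporting or may not record the wind parameters you need (the files used in the next section are named after station 704). A small variation on the search returns the k nearest candidates instead, so you can fall back to the next one. This is a hypothetical helper, `k_nearest`, working on in-memory (name, lat, lon) tuples; it repeats the haversine function so the block runs on its own:

```python
import heapq
import math


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, [lat1, lon1, lat2, lon2])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371 * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def k_nearest(stations, target_lat, target_lon, k=3):
    """Return the k closest (distance_km, name) pairs, nearest first.

    `stations` is an iterable of (name, lat, lon) tuples.
    """
    distances = (
        (haversine(lat, lon, target_lat, target_lon), name)
        for name, lat, lon in stations
    )
    # nsmallest keeps only k items instead of sorting the whole list
    return heapq.nsmallest(k, distances)


stations = [
    ("Bergdorf", 51.4872, 9.7531),
    ("Wiesenau", 48.6291, 11.3427),
    ("Nordhafen", 53.8712, 7.4893),
]
# Hypothetical target point, for illustration only
for dist, name in k_nearest(stations, 53.0, 8.0, k=2):
    print(f"{name}: {dist:.1f} km")
```

Each candidate can then be checked for the data you need, in order of distance, before committing to one.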
Training the model (no GPU required!)
Data: https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/subdaily/wind/
The parameters we treat as most important for predicting wind speed are ambient pressure and temperature; relative humidity is loaded alongside them.
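The three files loaded below follow the same layout: semicolon-separated columns STATIONS_ID, MESS_DATUM (hour resolution, YYYYMMDDHH), a quality flag QN_4, the measured values and an eor end-of-record marker, with -999 marking missing values. As a sketch of how that repetition could be factored out, here is a hypothetical helper `load_dwd_subdaily`; it assumes exactly that layout and is demonstrated on a tiny in-memory sample, not a real DWD file:

```python
import io

import numpy as np
import pandas as pd


def load_dwd_subdaily(source, value_columns):
    """Load one DWD-style subdaily file into a tidy DataFrame.

    source: path or file-like object with semicolon-separated data.
    value_columns: mapping of raw column names to new names, e.g. {'PP_TER': 'pressure'}.
    """
    df = pd.read_csv(source, sep=';', skipinitialspace=True)
    df.columns = df.columns.str.strip()
    df = df.rename(columns={'STATIONS_ID': 'station_id',
                            'MESS_DATUM': 'timestamp',
                            **value_columns})
    for col in value_columns.values():
        # Coerce to numbers and treat the -999 marker as missing
        df[col] = pd.to_numeric(df[col], errors='coerce').replace(-999, np.nan)
    # MESS_DATUM is YYYYMMDDHH, so parse it with an hour-resolution format
    df['timestamp'] = pd.to_datetime(df['timestamp'].astype(str), format='%Y%m%d%H')
    return df[['station_id', 'timestamp', *value_columns.values()]]


sample = io.StringIO(
    "STATIONS_ID;MESS_DATUM;QN_4;PP_TER;eor\n"
    "704;2024010107;3;1012.4;eor\n"
    "704;2024010114;3;-999;eor\n"
)
pressure = load_dwd_subdaily(sample, {'PP_TER': 'pressure'})
print(pressure)
```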
```python
import numpy as np
import pandas as pd

# Wind: DK_TER = wind direction, FK_TER = wind force (the prediction target)
wind_data = pd.read_csv('../data/station_704_wind.txt', sep=';', dtype={'DK_TER': str})
wind_data.rename(columns={
    'STATIONS_ID': 'station_id',
    'MESS_DATUM': 'timestamp',
    'QN_4': 'quality_flag',
    'DK_TER': 'dk_ter',
    'FK_TER': 'fk_ter',
    'eor': 'end_of_record'
}, inplace=True)
wind_data['dk_ter'] = wind_data['dk_ter'].str.strip().astype(int)
# -999 marks missing values; drop them in both wind columns, not only the direction
wind_data = wind_data[(wind_data['dk_ter'] != -999) & (wind_data['fk_ter'] != -999)].copy()
# MESS_DATUM is YYYYMMDDHH; append minutes so it parses as %Y%m%d%H%M
wind_data['timestamp'] = wind_data['timestamp'].astype(str) + '00'
wind_data['timestamp'] = pd.to_datetime(wind_data['timestamp'], format='%Y%m%d%H%M')
wind_data.drop(columns=['end_of_record', 'quality_flag'], inplace=True)

# Pressure (PP_TER)
pressure_data = pd.read_csv('../data/station_704_pressure.txt', sep=';')
pressure_data.rename(columns={
    'STATIONS_ID': 'station_id',
    'MESS_DATUM': 'timestamp',
    'QN_4': 'quality_flag',
    'PP_TER': 'pressure',
    'eor': 'end_of_record'
}, inplace=True)
# Treat -999 as missing so it is interpolated instead of used as a real value
pressure_data['pressure'] = pd.to_numeric(pressure_data['pressure'], errors='coerce').replace(-999, np.nan)
pressure_data['timestamp'] = pressure_data['timestamp'].astype(str) + '00'
pressure_data['timestamp'] = pd.to_datetime(pressure_data['timestamp'], format='%Y%m%d%H%M')
pressure_data.drop(columns=['end_of_record', 'quality_flag'], inplace=True)

# Temperature (TT_TER) and relative humidity (RF_TER)
temp_data = pd.read_csv('../data/station_704_temp.txt', sep=';')
temp_data.rename(columns={
    'STATIONS_ID': 'station_id',
    'MESS_DATUM': 'timestamp',
    'QN_4': 'quality_flag',
    'TT_TER': 'temperature',
    'RF_TER': 'humidity',
    'eor': 'end_of_record'
}, inplace=True)
temp_data['temperature'] = pd.to_numeric(temp_data['temperature'], errors='coerce').replace(-999, np.nan)
temp_data['humidity'] = pd.to_numeric(temp_data['humidity'], errors='coerce').replace(-999, np.nan)
temp_data['timestamp'] = temp_data['timestamp'].astype(str) + '00'
temp_data['timestamp'] = pd.to_datetime(temp_data['timestamp'], format='%Y%m%d%H%M')
temp_data.drop(columns=['end_of_record', 'quality_flag'], inplace=True)

# Keep only timestamps present in all three files, then fill remaining gaps
merged_data = wind_data.merge(pressure_data, on=['timestamp', 'station_id'], how='inner')
merged_data = merged_data.merge(temp_data, on=['timestamp', 'station_id'], how='inner')
cols = ['pressure', 'temperature', 'humidity']
merged_data[cols] = merged_data[cols].interpolate().ffill()
print(merged_data.head())
```

Model accuracy
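The accuracy plot and the forecast rely on a trained `model`, a feature frame `train_X` and held-out `test_y` / `test_pred`, with features named `hour`, `day_of_year`, `day_of_week`, `pressure`, `temperature`, `humidity`, 24 lags of each of the four variables and a 24-step rolling mean and standard deviation of `fk_ter`. One way such features and a chronological train/test split could be built is sketched here; the helper names, the shift before the rolling statistics and the 80/20 split are assumptions, not the original code:

```python
import numpy as np
import pandas as pd


def build_features(df, n_lags=24, window=24):
    """Add calendar, lag and rolling features (hypothetical sketch).

    Expects columns: timestamp, fk_ter, pressure, temperature, humidity.
    """
    out = df.sort_values('timestamp').reset_index(drop=True).copy()
    out['hour'] = out['timestamp'].dt.hour
    out['day_of_year'] = out['timestamp'].dt.dayofyear
    out['day_of_week'] = out['timestamp'].dt.dayofweek
    for lag in range(1, n_lags + 1):
        for col in ['fk_ter', 'pressure', 'temperature', 'humidity']:
            out[f'{col}_lag_{lag}'] = out[col].shift(lag)
    # Rolling statistics over past values only; shift(1) avoids leaking the target
    past = out['fk_ter'].shift(1)
    out['fk_ter_rolling_mean'] = past.rolling(window).mean()
    out['fk_ter_rolling_std'] = past.rolling(window).std()
    return out.dropna().reset_index(drop=True)


def chronological_split(features, target='fk_ter', train_frac=0.8):
    """Split without shuffling so the test period lies after the training period."""
    non_features = [c for c in ['timestamp', 'station_id', 'dk_ter', target]
                    if c in features.columns]
    X = features.drop(columns=non_features)
    y = features[target]
    cut = int(len(features) * train_frac)
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]


# Synthetic demo data; in the walkthrough this would be merged_data
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=n, freq='h'),
    'fk_ter': rng.integers(0, 8, n).astype(float),
    'pressure': 1010 + rng.normal(0, 2, n),
    'temperature': 10 + rng.normal(0, 3, n),
    'humidity': 70 + rng.normal(0, 5, n),
})
feats = build_features(demo)
train_X, test_X, train_y, test_y = chronological_split(feats)
print(train_X.shape, test_X.shape)
```

With these frames, `model` can be any regressor exposing `fit` and `predict`, for example scikit-learn's `RandomForestRegressor`, which trains comfortably on a CPU: `model.fit(train_X, train_y)`, then `test_pred = model.predict(test_X)`. Assigning the result back with `merged_data = build_features(merged_data)` would also provide the `hour`, `day_of_year` and `day_of_week` columns that the forecast step reads from `merged_data`.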
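Alongside a visual comparison of predicted and actual values, two summary numbers make accuracy concrete: mean absolute error (MAE) and root-mean-square error (RMSE). Comparing them with a naive persistence forecast, which simply predicts the previous observation, shows whether the model adds anything. A minimal sketch, assuming `test_y` and `test_pred` are aligned one-dimensional sequences; the persistence baseline is a common reference point, not part of the original walkthrough:

```python
import numpy as np


def regression_metrics(actual, predicted):
    """Return mean absolute error and root-mean-square error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - actual
    return {
        'mae': float(np.mean(np.abs(errors))),
        'rmse': float(np.sqrt(np.mean(errors ** 2))),
    }


def persistence_baseline(actual):
    """Naive forecast: each value is predicted by the one before it."""
    actual = np.asarray(actual, dtype=float)
    return actual[1:], actual[:-1]  # (targets, predictions)


# Toy numbers for illustration; use test_y and test_pred in practice
actual = [3, 4, 4, 5, 3]
model_pred = [3.2, 3.8, 4.1, 4.6, 3.5]
print(regression_metrics(actual, model_pred))
targets, naive = persistence_baseline(actual)
print(regression_metrics(targets, naive))
```

If the model's MAE is not clearly below the persistence baseline's, the extra features are not yet earning their keep.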
```python
import matplotlib.pyplot as plt

# Compare the last n_points held-out targets with the model's predictions
n_points = 100
plt.figure(figsize=(10, 5))
plt.plot(
    test_y.index[-n_points:],
    test_y.iloc[-n_points:],
    label='Actual', color='blue'
)
plt.plot(
    test_y.index[-n_points:],
    test_pred[-n_points:],
    label='Predicted', color='red', alpha=0.7
)
plt.title(f"Model Predictions vs. Actual Values (Last {n_points} points)")
plt.xlabel("Index / Time")
plt.ylabel("Wind force (fk_ter)")
plt.legend()
plt.show()
```

Forecast
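The forecast is recursive (iterated): the model only ever predicts one step ahead, and each prediction is appended to the recent window so that it becomes `fk_ter_lag_1` for the next step. Over a month of steps, errors can compound. The idea in isolation, as a hypothetical sketch with a toy averaging model standing in for the trained regressor (`recursive_forecast` and `toy_model` are illustrative names):

```python
from collections import deque


def recursive_forecast(predict_one, history, steps, n_lags=3):
    """Predict `steps` values ahead, feeding each prediction back as input.

    predict_one: callable taking a list of the last n_lags values (oldest first).
    history: observed values, oldest first (at least n_lags of them).
    """
    window = deque(history[-n_lags:], maxlen=n_lags)
    forecast = []
    for _ in range(steps):
        pred = predict_one(list(window))
        forecast.append(pred)
        window.append(pred)  # the prediction becomes lag 1 for the next step
    return forecast


def toy_model(lags):
    """Stand-in for model.predict: the mean of the lag values."""
    return sum(lags) / len(lags)


print(recursive_forecast(toy_model, [3.0, 6.0, 9.0], steps=2))  # [6.0, 7.0]
```

The fixed-length deque plays the same role as trimming `current_window` to its last 24 rows.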
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Fallback values for features that cannot be computed from the recent window:
# long-run means of the training data, in the same column order as train_X.
# Assumes merged_data already has the hour/day_of_year/day_of_week columns.
historical_avg = merged_data[['pressure', 'temperature', 'humidity', 'hour', 'day_of_year', 'day_of_week']].mean()
for lag in range(1, 25):
    historical_avg[f'fk_ter_lag_{lag}'] = merged_data['fk_ter'].mean()
    historical_avg[f'pressure_lag_{lag}'] = merged_data['pressure'].mean()
    historical_avg[f'temperature_lag_{lag}'] = merged_data['temperature'].mean()
    historical_avg[f'humidity_lag_{lag}'] = merged_data['humidity'].mean()
historical_avg['fk_ter_rolling_mean'] = merged_data['fk_ter'].mean()
historical_avg['fk_ter_rolling_std'] = merged_data['fk_ter'].std()
historical_avg = historical_avg[train_X.columns]

# Forecast horizon. Lags count observation steps of the training data, so this
# frequency should match the spacing of the observations the model was trained on.
future_dates = pd.date_range(start='2025-05-01', end='2025-05-31 23:00:00', freq='h')
future_df = pd.DataFrame({'timestamp': future_dates})
future_df['hour'] = future_df['timestamp'].dt.hour
future_df['day_of_year'] = future_df['timestamp'].dt.dayofyear
future_df['day_of_week'] = future_df['timestamp'].dt.dayofweek

last_historical = merged_data.iloc[-24 * 3:]


def forecast_month(model, last_historical, future_dates):
    """Recursive forecast: each prediction is fed back in as the next fk_ter lag."""
    forecast = []
    current_window = last_historical.copy()
    for ts in future_dates:
        has_rows = len(current_window) > 0
        has_day = len(current_window) >= 24
        # Pressure, temperature and humidity are not forecast themselves;
        # the last known values are carried forward (persistence)
        features = {
            'hour': ts.hour,
            'day_of_year': ts.dayofyear,
            'day_of_week': ts.dayofweek,
            'pressure': current_window['pressure'].iloc[-1] if has_rows else np.nan,
            'temperature': current_window['temperature'].iloc[-1] if has_rows else np.nan,
            'humidity': current_window['humidity'].iloc[-1] if has_rows else np.nan,
            'fk_ter_rolling_mean': current_window['fk_ter'].rolling(window=24).mean().iloc[-1] if has_day else np.nan,
            'fk_ter_rolling_std': current_window['fk_ter'].rolling(window=24).std().iloc[-1] if has_day else np.nan,
        }
        for lag in range(1, 25):
            has_lag = len(current_window) >= lag
            features[f'fk_ter_lag_{lag}'] = current_window['fk_ter'].iloc[-lag] if has_lag else np.nan
            features[f'pressure_lag_{lag}'] = current_window['pressure'].iloc[-lag] if has_lag else np.nan
            features[f'temperature_lag_{lag}'] = current_window['temperature'].iloc[-lag] if has_lag else np.nan
            features[f'humidity_lag_{lag}'] = current_window['humidity'].iloc[-lag] if has_lag else np.nan
        # Order columns exactly as in training and fill any gaps with long-run means
        feature_df = pd.DataFrame([features], columns=train_X.columns)
        feature_df = feature_df.fillna(historical_avg)
        pred = model.predict(feature_df)[0]
        forecast.append(pred)
        # Append the prediction and keep only the most recent 24 rows
        current_window = pd.concat([
            current_window,
            pd.DataFrame([{
                'timestamp': ts,
                'fk_ter': pred,
                'pressure': features['pressure'],
                'temperature': features['temperature'],
                'humidity': features['humidity'],
            }])
        ], ignore_index=True).iloc[-24:]
    return forecast


may_2025_predictions = forecast_month(model, last_historical, future_dates)
future_df['predicted_wind_speed'] = may_2025_predictions
# A rough band, not a calibrated prediction interval: the standard deviation of
# the last 24 predicted values, applied uniformly to the whole month
future_df['uncertainty'] = np.std(may_2025_predictions[-24:])

plt.figure(figsize=(15, 6))
plt.plot(future_df['timestamp'], future_df['predicted_wind_speed'], label='Predicted')
plt.fill_between(future_df['timestamp'],
                 future_df['predicted_wind_speed'] - future_df['uncertainty'],
                 future_df['predicted_wind_speed'] + future_df['uncertainty'],
                 alpha=0.2, label='±1 std of last 24 predictions')
plt.title('May 2025 Wind Speed Forecast')
plt.xlabel('Date')
# FK_TER is DWD's wind force field; confirm its unit in the dataset
# description (Beaufort vs. m/s) before adding a unit to this label
plt.ylabel('Predicted wind force (FK_TER)')
plt.legend()
plt.show()
```
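To turn the forecast into launch decisions, one option is to look for runs of consecutive time steps where the predicted wind stays below a limit. A minimal sketch, assuming `future_df` has the `timestamp` and `predicted_wind_speed` columns built above; the helper name `launch_windows`, the limit (in the same unit as the model's target) and the minimum duration are hypothetical and should come from your rocket's actual safety rules:

```python
import pandas as pd


def launch_windows(df, max_wind, min_hours=3):
    """Return (start, end, hours) for runs of consecutive rows below max_wind.

    df: rows in time order with 'timestamp' and 'predicted_wind_speed' columns,
    one row per hour.
    """
    calm = df['predicted_wind_speed'] < max_wind
    # A new run id starts every time the calm / not-calm state changes
    run_id = calm.astype(int).diff().ne(0).cumsum()
    windows = []
    for _, run in df[calm].groupby(run_id[calm]):
        if len(run) >= min_hours:
            windows.append((run['timestamp'].iloc[0], run['timestamp'].iloc[-1], len(run)))
    return windows


# Toy hourly forecast for illustration
example = pd.DataFrame({
    'timestamp': pd.date_range('2025-05-01 06:00', periods=8, freq='h'),
    'predicted_wind_speed': [2, 2, 2, 5, 1, 1, 6, 1],
})
for start, end, hours in launch_windows(example, max_wind=3, min_hours=2):
    print(start, '->', end, f'({hours} h)')
```

Given how wide the uncertainty band can be a month out, it may be wise to pick a limit comfortably below the rocket's true tolerance.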