The validation module uses Pydantic to ensure data integrity and catch malformed responses from the CDN.
## Overview
JSON data from the CDN can be validated using Pydantic models before conversion to DataFrames. This catches:
- Missing required fields
- Incorrect data types
- Inconsistent array lengths
- Invalid enum values
- Null-like string values (`""`, `"none"`, `"null"`, `"nan"`)
Validation is disabled by default for optimal performance. Enable it during development or when data quality is uncertain.
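The null-like conversion can be pictured with a small standard-library sketch. This is not tif1's implementation. The helper name `normalize_null_like` is hypothetical, and the case-insensitive, whitespace-trimming comparison is an assumption.

```python
from typing import Any

# Null-like strings from the list above. Matching is case-insensitive and
# ignores surrounding whitespace (an assumption, not confirmed tif1 behavior).
NULL_LIKE = {"", "none", "null", "nan"}


def normalize_null_like(values: list[Any]) -> list[Any]:
    """Replace null-like strings with None, leaving all other values untouched."""
    return [
        None if isinstance(v, str) and v.strip().lower() in NULL_LIKE else v
        for v in values
    ]


print(normalize_null_like(["SOFT", "", "None", "NaN", 1.5, None]))
# ['SOFT', None, None, None, 1.5, None]
```

Non-string values pass through unchanged, so numeric `0` or `False` entries are never mistaken for missing data.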
## Validation Functions
### `validate_laps`

Validate lap timing data structure.

```python
def validate_laps(data: dict) -> LapData
```
**Parameters:**
- `data`: Raw JSON dictionary from CDN
**Returns:**
- Validated `LapData` Pydantic model
**Raises:**
- `InvalidDataError`: If validation fails
**Example:**
```python
from tif1.validation import validate_laps

raw_data = {
    "time": [90.123, 89.456, 88.789],
    "lap": [1.0, 2.0, 3.0],
    "s1": [30.1, 29.8, 29.5],
    "s2": [30.0, 29.7, 29.4],
    "s3": [30.0, 29.9, 29.9],
    "compound": ["SOFT", "SOFT", "SOFT"],
    "stint": [1, 1, 1],
    "life": [1, 2, 3],
    "pos": [1, 1, 1],
    "status": ["1", "1", "1"],
    "pb": [False, True, True],
}

validated = validate_laps(raw_data)
print(f"Validated {len(validated.lap)} laps")
```
### `validate_telemetry`

Validate high-frequency telemetry data.

```python
def validate_telemetry(data: dict) -> TelemetryData
```
**Parameters:**
- `data`: Raw JSON dictionary from CDN
**Returns:**
- Validated `TelemetryData` Pydantic model
**Raises:**
- `InvalidDataError`: If validation fails
**Example:**
```python
from tif1.validation import validate_telemetry

raw_data = {
    "time": [0.0, 0.1, 0.2],
    "speed": [250.5, 251.2, 252.0],
    "rpm": [12000, 12100, 12200],
    "gear": [7, 7, 7],
    "throttle": [100.0, 100.0, 99.5],
    "brake": [False, False, False],
    "drs": [True, True, True],
}

validated = validate_telemetry(raw_data)
print(f"Validated {len(validated.time)} samples")
```
### `validate_drivers`

Validate driver information data.

```python
def validate_drivers(data: dict) -> DriversData
```
**Parameters:**
- `data`: Raw JSON dictionary from CDN
**Returns:**
- Validated `DriversData` Pydantic model
**Example:**
```python
from tif1.validation import validate_drivers

raw_data = {
    "drivers": [
        {
            "driver": "VER",
            "team": "Red Bull Racing",
            "dn": "33",
            "fn": "Max",
            "ln": "Verstappen",
            "tc": "3671C6",
            "url": "https://example.com/verstappen.png",
        }
    ]
}

validated = validate_drivers(raw_data)
print(f"Validated {len(validated.drivers)} drivers")
```
### `validate_weather`

Validate weather data structure.

```python
def validate_weather(data: dict) -> WeatherData
```
**Parameters:**
- `data`: Raw JSON dictionary from CDN
**Returns:**
- Validated `WeatherData` Pydantic model
**Example:**
```python
from tif1.validation import validate_weather

raw_data = {
    "wT": [0.0, 60.0, 120.0],
    "wAT": [25.5, 25.7, 25.9],
    "wTT": [35.2, 35.5, 35.8],
    "wH": [60.0, 61.0, 62.0],
    "wP": [1013.0, 1013.2, 1013.5],
    "wR": [False, False, False],
    "wWD": [180.0, 185.0, 190.0],
    "wWS": [5.0, 5.5, 6.0],
}

validated = validate_weather(raw_data)
print(f"Validated {len(validated.time)} weather samples")
```

---
### `validate_race_control`
Validate race control messages.
```python
def validate_race_control(data: dict) -> RaceControlData
```
**Parameters:**
- `data`: Raw JSON dictionary from CDN
**Returns:**
- Validated `RaceControlData` Pydantic model
**Example:**
```python
from tif1.validation import validate_race_control

raw_data = {
    "time": [0.0, 300.0],
    "cat": ["Flag", "SafetyCar"],
    "msg": ["GREEN FLAG", "SAFETY CAR DEPLOYED"],
    "status": ["1", "4"],
    "flag": ["GREEN", "YELLOW"],
    "scope": ["Track", "Track"],
}

validated = validate_race_control(raw_data)
print(f"Validated {len(validated.time)} messages")
```
## Pydantic Models
### `LapData`

Model for lap timing data with consistent-length validation. Each field is a list with one entry per lap. The types shown below are the element types.

**Required Fields:**
- `time`: Lap time in seconds (float | None)
- `lap`: Lap number (float | None)
- `s1`, `s2`, `s3`: Sector times in seconds (float | None)
- `compound`: Tire compound (str | None)
- `stint`: Stint number (int | None)
- `life`: Tire age in laps (int | None)
- `pos`: Position (int | None)
- `status`: Track status code (str | None)
- `pb`: Personal best indicator (bool | None)
**Optional Fields (with aliases):**
- `session_time` (alias: `sesT`): Session time (float | None)
- `source_driver` (alias: `drv`): Driver code (str | None)
- `driver_number` (alias: `dNum`): Driver number (str | None)
- `pit_out_time` (alias: `pout`): Pit out time (float | None)
- `pit_in_time` (alias: `pin`): Pit in time (float | None)
- `sector1_session_time` (alias: `s1T`): Sector 1 session time (float | None)
- `sector2_session_time` (alias: `s2T`): Sector 2 session time (float | None)
- `sector3_session_time` (alias: `s3T`): Sector 3 session time (float | None)
- `speed_i1` (alias: `vi1`): Speed trap I1 (float | None)
- `speed_i2` (alias: `vi2`): Speed trap I2 (float | None)
- `speed_fl` (alias: `vfl`): Speed trap finish line (float | None)
- `speed_st` (alias: `vst`): Speed trap straight (float | None)
- `fresh_tyre` (alias: `fresh`): Fresh tire indicator (bool | None)
- `deleted` (alias: `del`): Deleted lap indicator (bool | None)
- Weather fields: `air_temp`, `track_temp`, `humidity`, `pressure`, `rainfall`, `wind_direction`, `wind_speed`
**Validation:**
- All non-empty lists must have the same length
- Stint numbers must be >= 1
- Tire life must be >= 0
- Null-like strings (`""`, `"none"`, `"null"`, `"nan"`) are converted to `None`
**Example:**

```python
from tif1.validation import LapData

lap_data = LapData(
    time=[90.1, 89.5, 88.9],
    lap=[1.0, 2.0, 3.0],
    s1=[30.0, 29.8, 29.5],
    s2=[30.1, 29.9, 29.6],
    s3=[30.0, 29.8, 29.8],
    compound=["SOFT", "SOFT", "SOFT"],
    stint=[1, 1, 1],
    life=[1, 2, 3],
    pos=[1, 1, 1],
    status=["1", "1", "1"],
    pb=[False, True, True],
)
print(f"Valid lap data with {len(lap_data.lap)} laps")
```
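The `LapData` validation rules above can be mirrored in plain Python to pre-check a column-oriented payload before it reaches `validate_laps`. This is a sketch with a hypothetical helper, `check_lap_rules`, which is not part of tif1. `None` entries are treated as allowed because every field is typed `... | None`.

```python
from typing import Any


def check_lap_rules(data: dict[str, Any]) -> list[str]:
    """Return human-readable violations of the documented LapData rules.

    Hypothetical pre-check, not tif1 code: equal lengths for non-empty
    lists, stint >= 1, tire life >= 0. None entries are skipped.
    """
    errors: list[str] = []

    # Rule 1: every non-empty list field must have the same length
    lengths = {k: len(v) for k, v in data.items() if isinstance(v, list) and v}
    if len(set(lengths.values())) > 1:
        errors.append(f"inconsistent lengths: {lengths}")

    # Rule 2: stint numbers must be >= 1
    for i, stint in enumerate(data.get("stint", [])):
        if stint is not None and stint < 1:
            errors.append(f"stint[{i}] = {stint} is < 1")

    # Rule 3: tire life must be >= 0
    for i, life in enumerate(data.get("life", [])):
        if life is not None and life < 0:
            errors.append(f"life[{i}] = {life} is < 0")

    return errors


bad = {"lap": [1.0, 2.0], "stint": [0, 1], "life": [1, -1, 2]}
for problem in check_lap_rules(bad):
    print(problem)
```

A pre-check like this reports every problem at once. A single validation exception, by contrast, may surface only the first failing field.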
### `TelemetryData`

Model for high-frequency telemetry data. Each field is a list with one entry per sample.

**Required Fields:**
- `time`: Time from lap start in seconds (float | None)
- `speed`: Speed in km/h (float | None)

**Optional Fields:**
- `rpm`: Engine RPM (float | None)
- `gear`: Gear number 0-8 (int | None)
- `throttle`: Throttle position 0-100% (float | None)
- `brake`: Brake status (bool | None)
- `drs`: DRS status (bool | None)
- `distance`: Distance from lap start in meters (float | None)
- `rel_distance`: Relative distance 0-1 (float | None)
- `driver_ahead` (alias: `DriverAhead`): Driver ahead code (str | None)
- `distance_to_driver_ahead` (alias: `DistanceToDriverAhead`): Gap in meters (float | None)
- `x`, `y`, `z`: 3D coordinates (float | None)
- `acc_x`, `acc_y`, `acc_z`: Acceleration components (float | None)
- `data_key` (alias: `dataKey`): Data source key (str | None)
**Special Handling:**
- Supports a nested `tel` object, which is unwrapped during validation
- Boolean coercion for the `brake` and `drs` fields
- Null-like strings converted to `None`

**Validation:**
- All non-empty lists must have the same length
- Empty lists are allowed for optional fields
**Example:**

```python
from tif1.validation import TelemetryData

tel_data = TelemetryData(
    time=[0.0, 0.1, 0.2],
    speed=[250.0, 251.0, 252.0],
    rpm=[12000, 12100, 12200],
    gear=[7, 7, 7],
    throttle=[100.0, 100.0, 99.5],
    brake=[False, False, True],
    drs=[True, True, False],
    distance=[0.0, 25.0, 50.0],
    x=[0.0, 1.0, 2.0],
    y=[0.0, 0.0, 0.0],
    z=[0.0, 0.0, 0.0],
)
print(f"Valid telemetry with {len(tel_data.time)} samples")
```
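The two special-handling steps for telemetry (unwrapping a nested `tel` object and coercing `brake`/`drs` to booleans) can be sketched in plain Python. The exact coercion rules tif1 applies are not documented here. This sketch *assumes* that numbers are `True` when non-zero, that the strings `"true"/"1"/"yes"` and `"false"/"0"/"no"` map to booleans, and that any other string becomes `None`. The payload shape `{"tel": {...}}` and both helper names are likewise assumptions.

```python
from typing import Any, Optional

_TRUE_STRINGS = {"true", "1", "yes"}    # assumed spellings
_FALSE_STRINGS = {"false", "0", "no"}   # assumed spellings


def to_bool(value: Any) -> Optional[bool]:
    """Coerce one raw brake/DRS value to bool under the assumed rules."""
    if value is None or isinstance(value, bool):  # bool first: bool subclasses int
        return value
    if isinstance(value, (int, float)):
        return value != 0
    if isinstance(value, str):
        text = value.strip().lower()
        if text in _TRUE_STRINGS:
            return True
        if text in _FALSE_STRINGS:
            return False
        return None  # unrecognized and null-like strings become None
    raise TypeError(f"cannot coerce {value!r} to bool")


def unwrap_telemetry(payload: dict[str, Any]) -> dict[str, Any]:
    """Use payload["tel"] if present (other top-level keys are dropped), then coerce flags."""
    data = dict(payload.get("tel", payload))
    for key in ("brake", "drs"):
        if key in data:
            data[key] = [to_bool(v) for v in data[key]]
    return data


raw = {"tel": {"time": [0.0, 0.1], "speed": [250.0, 251.0], "brake": [0, 1], "drs": ["true", "false"]}}
print(unwrap_telemetry(raw))
# {'time': [0.0, 0.1], 'speed': [250.0, 251.0], 'brake': [False, True], 'drs': [True, False]}
```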
### `WeatherData`

Model for weather information.

**Required Field:**
- `time` (alias: `wT`): Timestamp in seconds (float | None)

**Optional Fields (all with aliases):**
- `air_temp` (alias: `wAT`): Air temperature in °C (float | None)
- `track_temp` (alias: `wTT`): Track temperature in °C (float | None)
- `humidity` (alias: `wH`): Relative humidity % (float | None)
- `pressure` (alias: `wP`): Atmospheric pressure in mbar (float | None)
- `rainfall` (alias: `wR`): Rainfall indicator (bool | None)
- `wind_direction` (alias: `wWD`): Wind direction in degrees (float | None)
- `wind_speed` (alias: `wWS`): Wind speed in km/h (float | None)

**Special Handling:**
- Accepts both PascalCase keys (`Time`, `AirTemp`) and aliased keys (`wT`, `wAT`)
- PascalCase keys are automatically normalized to snake_case

**Validation:**
- All non-empty lists must have the same length
- Null-like strings converted to `None`
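The key normalization for weather data can be sketched as a two-step lookup. Known short aliases go through a table, and everything else goes through a PascalCase-to-snake_case conversion. Only `Time` and `AirTemp` are named as PascalCase inputs above, so other PascalCase names such as `WindSpeed` are assumptions. The helper names are hypothetical, and this is not tif1's code.

```python
import re
from typing import Any

# Short aliases documented for WeatherData (alias -> field name)
WEATHER_ALIASES = {
    "wT": "time", "wAT": "air_temp", "wTT": "track_temp", "wH": "humidity",
    "wP": "pressure", "wR": "rainfall", "wWD": "wind_direction", "wWS": "wind_speed",
}

# Zero-width position between a lowercase letter/digit and an uppercase letter
_WORD_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def pascal_to_snake(name: str) -> str:
    """'AirTemp' -> 'air_temp'; names already in snake_case pass through unchanged."""
    return _WORD_BOUNDARY.sub("_", name).lower()


def normalize_weather_keys(data: dict[str, Any]) -> dict[str, Any]:
    """Map aliased keys through the alias table, everything else through pascal_to_snake."""
    normalized = {}
    for key, values in data.items():
        name = WEATHER_ALIASES[key] if key in WEATHER_ALIASES else pascal_to_snake(key)
        normalized[name] = values
    return normalized


print(normalize_weather_keys({"Time": [0.0], "AirTemp": [25.5], "wTT": [35.2]}))
# {'time': [0.0], 'air_temp': [25.5], 'track_temp': [35.2]}
```

The alias table is checked first on purpose. Run through the generic conversion, `wAT` would become `w_at` rather than `air_temp`.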
### `RaceControlData`

Model for race control messages.

**Required Field:**
- `time`: Message timestamp in seconds (float | None)

**Optional Fields:**
- `category` (alias: `cat`): Message category (str | None)
- `message` (alias: `msg`): Message text (str | None)
- `status`: Track status code (str | None)
- `flag`: Flag type (str | None)
- `scope`: Message scope (str | None)
- `sector`: Affected sector (int | str | None)
- `racing_number` (alias: `dNum`): Affected driver number (str | None)
- `lap`: Lap number (int | None)

**Validation:**
- All non-empty lists must have the same length
- Null-like strings converted to `None`
### `DriversData`

Model for driver information.

**Fields:**
- `drivers`: List of `DriverInfo` objects
### `DriverInfo`

Model for individual driver information.

**Fields:**
- `driver`: 3-letter code, must match pattern `^[A-Z]{3}$` (str)
- `team`: Team name, 1-100 characters (str)
- `dn`: Driver number (str)
- `fn`: First name (str)
- `ln`: Last name (str)
- `tc`: Team color hex code (str)
- `url`: Headshot photo URL (str)

**Validation:**
- Driver code must be exactly 3 uppercase letters
- Team name must be 1-100 characters
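The two `DriverInfo` rules are easy to reproduce when pre-screening driver records. Below is a sketch using Python's `re` module with a hypothetical `check_driver` helper, not tif1's model. It uses `fullmatch` on purpose. With `re.match`, the `$` in `^[A-Z]{3}$` also matches just before a trailing newline, so `"VER\n"` would pass.

```python
import re

# Equivalent of ^[A-Z]{3}$ when used with fullmatch(), without the trailing-newline loophole
DRIVER_CODE = re.compile(r"[A-Z]{3}")


def check_driver(info: dict[str, str]) -> list[str]:
    """Apply the documented DriverInfo rules to a single driver dict."""
    errors: list[str] = []
    code = info.get("driver", "")
    if not DRIVER_CODE.fullmatch(code):
        errors.append(f"driver code {code!r} is not 3 uppercase letters")
    team = info.get("team", "")
    if not 1 <= len(team) <= 100:
        errors.append(f"team name length {len(team)} is outside 1-100")
    return errors


print(check_driver({"driver": "VER", "team": "Red Bull Racing"}))  # []
print(len(check_driver({"driver": "ver", "team": ""})))            # 2
```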
## Enums

### `TireCompound`

Valid tire compound values.

```python
from enum import Enum


class TireCompound(str, Enum):
    SOFT = "SOFT"
    MEDIUM = "MEDIUM"
    HARD = "HARD"
    INTERMEDIATE = "INTERMEDIATE"
    WET = "WET"
    UNKNOWN = "UNKNOWN"
    TEST_UNKNOWN = "TEST-UNKNOWN"
```
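Raw compound strings map to members by *value*, and the `TEST_UNKNOWN` member shows why that distinction matters. Its value is `"TEST-UNKNOWN"` (hyphen), so `TireCompound("TEST-UNKNOWN")` works while `TireCompound["TEST-UNKNOWN"]` raises `KeyError`. The sketch below repeats the enum so it runs on its own. The `parse_compound` helper and its fall-back-to-`UNKNOWN` policy are hypothetical, not documented tif1 behavior.

```python
from enum import Enum
from typing import Optional


class TireCompound(str, Enum):
    SOFT = "SOFT"
    MEDIUM = "MEDIUM"
    HARD = "HARD"
    INTERMEDIATE = "INTERMEDIATE"
    WET = "WET"
    UNKNOWN = "UNKNOWN"
    TEST_UNKNOWN = "TEST-UNKNOWN"


def parse_compound(raw: Optional[str]) -> TireCompound:
    """Hypothetical helper: value lookup, falling back to UNKNOWN for unrecognized input."""
    if raw is None:
        return TireCompound.UNKNOWN
    try:
        return TireCompound(raw.strip().upper())  # lookup by value, e.g. "TEST-UNKNOWN"
    except ValueError:
        return TireCompound.UNKNOWN


print(parse_compound("test-unknown").name)  # TEST_UNKNOWN
print(TireCompound.SOFT == "SOFT")          # True, thanks to the str mixin
```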
### `SessionType`

Valid session type values.

```python
from enum import Enum


class SessionType(str, Enum):
    PRACTICE_1 = "Practice 1"
    PRACTICE_2 = "Practice 2"
    PRACTICE_3 = "Practice 3"
    QUALIFYING = "Qualifying"
    SPRINT = "Sprint"
    SPRINT_QUALIFYING = "Sprint Qualifying"
    SPRINT_SHOOTOUT = "Sprint Shootout"
    RACE = "Race"
```
### `LapStatus`

Valid lap status values.

```python
from enum import Enum


class LapStatus(str, Enum):
    VALID = "VALID"
    INVALID = "INVALID"
    OUTLAP = "OUTLAP"
    INLAP = "INLAP"
```
## Anomaly Detection

### `detect_lap_anomalies`

Detect anomalies in lap data, such as missing laps, duplicate laps, and outlier lap times.

```python
def detect_lap_anomalies(laps: list[dict]) -> list[Anomaly]
```

**Parameters:**
- `laps`: List of lap dictionaries
**Returns:**
- List of detected `Anomaly` objects
**Example:**
```python
from tif1.validation import detect_lap_anomalies

laps = [
    {"lap": 1, "time": 90.0},
    {"lap": 2, "time": 89.5},
    {"lap": 4, "time": 89.2},   # Missing lap 3
    {"lap": 5, "time": 270.0},  # Outlier (about 3x the other lap times)
]

anomalies = detect_lap_anomalies(laps)
for anomaly in anomalies:
    print(f"[{anomaly.severity}] {anomaly.type}: {anomaly.description}")
    print(f"  Details: {anomaly.details}")
```
### `Anomaly`

Model for detected anomalies.

**Fields:**
- `type`: Anomaly type (`AnomalyType` enum)
- `severity`: Severity level, one of `"low"`, `"medium"`, or `"high"` (str)
- `description`: Human-readable description (str)
- `details`: Additional context dictionary (dict[str, Any])
### `AnomalyType`

Types of anomalies that can be detected.

```python
from enum import Enum


class AnomalyType(str, Enum):
    MISSING_LAPS = "missing_laps"
    DUPLICATE_LAPS = "duplicate_laps"
    OUTLIER_TIMES = "outlier_times"
```
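The three anomaly types suggest three simple checks: repeated lap numbers, gaps in the lap sequence, and lap times far above the typical pace. The stdlib sketch below mirrors the `Anomaly` model as a dataclass to illustrate those checks. The thresholds and severities are assumptions: an outlier means more than 2x the median lap time, and the severities are missing = high, duplicate = medium, outlier = low. This is not how `detect_lap_anomalies` is implemented, and its actual rules may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from statistics import median
from typing import Any


class AnomalyType(str, Enum):
    MISSING_LAPS = "missing_laps"
    DUPLICATE_LAPS = "duplicate_laps"
    OUTLIER_TIMES = "outlier_times"


@dataclass
class Anomaly:
    type: AnomalyType
    severity: str  # "low", "medium", or "high"
    description: str
    details: dict[str, Any] = field(default_factory=dict)


def find_anomalies(laps: list[dict[str, Any]], outlier_factor: float = 2.0) -> list[Anomaly]:
    """Illustrative checks only; factor and severities are assumptions."""
    anomalies: list[Anomaly] = []
    numbers = [int(lap["lap"]) for lap in laps if lap.get("lap") is not None]

    # Duplicates: lap numbers seen more than once
    seen: set[int] = set()
    dupes: set[int] = set()
    for n in numbers:
        if n in seen:
            dupes.add(n)
        seen.add(n)
    if dupes:
        anomalies.append(Anomaly(AnomalyType.DUPLICATE_LAPS, "medium",
                                 f"{len(dupes)} duplicate lap number(s)", {"laps": sorted(dupes)}))

    # Gaps between the first and last lap number present
    if seen:
        missing = sorted(set(range(min(seen), max(seen) + 1)) - seen)
        if missing:
            anomalies.append(Anomaly(AnomalyType.MISSING_LAPS, "high",
                                     f"{len(missing)} missing lap(s)", {"laps": missing}))

    # Outliers: compare against the median, which the outlier itself barely shifts
    times = [lap["time"] for lap in laps if lap.get("time") is not None]
    if times:
        typical = median(times)
        slow = [lap["lap"] for lap in laps
                if lap.get("time") is not None and lap["time"] > outlier_factor * typical]
        if slow:
            anomalies.append(Anomaly(AnomalyType.OUTLIER_TIMES, "low",
                                     f"{len(slow)} lap time(s) above {outlier_factor}x the median",
                                     {"laps": slow, "median": typical}))
    return anomalies


laps = [
    {"lap": 1, "time": 90.0},
    {"lap": 2, "time": 89.5},
    {"lap": 4, "time": 89.2},
    {"lap": 5, "time": 270.0},
]
for a in find_anomalies(laps):
    print(a.type.value, a.severity, a.details)
# missing_laps high {'laps': [3]}
# outlier_times low {'laps': [5], 'median': 89.75}
```

The median is used rather than the mean because a single 270 s lap pulls the mean up to about 135 s, which would hide milder outliers.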
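## Configuration

### Disable Validation

Validation is off by default. In production environments where data quality is trusted, you can also set it to `False` explicitly (for example, to override an earlier `True`):

```python
import tif1

config = tif1.get_config()
config.set("validate_data", False)

# Validation is now skipped (roughly 10-15% faster than with validation enabled)
session = tif1.get_session(2025, "Monaco", "Race")
```

---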
### Strict Validation
Enable strict validation for development:
```python
from tif1.validation import validate_lap_data
# Strict mode raises on warnings
validated = validate_lap_data(raw_data, strict=True)
```
---
Validation in tif1 is controlled through the `validate_data` configuration option, which is disabled by default for optimal performance.
### Enable Validation
```python
import tif1
config = tif1.get_config()
config.set("validate_data", True)
# Now validation will be applied to drivers, weather, and race control data
session = tif1.get_session(2021, "Belgian Grand Prix", "Race")
```
### Validation Behavior

- `validate_data=True`: Validates `drivers.json`, `weather.json`, and `rcm.json` payloads
- `validate_data=False` (default): Skips validation for maximum performance
- Validation is automatically disabled in ultra-cold start mode, regardless of config
### Strict Mode

Individual validation functions support strict mode:

```python
from tif1.validation import validate_lap_data

# Strict mode raises on validation errors
validated = validate_lap_data(raw_data, strict=True)

# Non-strict mode returns the original data on validation failure
validated = validate_lap_data(raw_data, strict=False)
```
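The strict/non-strict contract described above (raise on failure versus hand back the original data) is a general pattern. It can be sketched around any validator. Everything in this block is hypothetical and exists for illustration only: the wrapper `validate_with_mode`, the toy validator `require_time`, and the local `InvalidDataError` stand-in that lets the sketch run without tif1 installed. `validate_lap_data` handles this internally through its `strict` argument.

```python
from typing import Any, Callable


class InvalidDataError(Exception):
    """Local stand-in for tif1.exceptions.InvalidDataError so the sketch runs on its own."""


def validate_with_mode(validator: Callable[[dict], Any], data: dict, strict: bool = True) -> Any:
    """Run a validator; in strict mode re-raise failures, otherwise return the input unchanged."""
    try:
        return validator(data)
    except InvalidDataError:
        if strict:
            raise
        # Non-strict fallback: hand back the original, unvalidated payload
        return data


def require_time(data: dict) -> list:
    """Toy validator: insists on a non-empty 'time' list."""
    if not data.get("time"):
        raise InvalidDataError("missing 'time' values")
    return data["time"]


print(validate_with_mode(require_time, {"time": [90.1]}))              # [90.1]
print(validate_with_mode(require_time, {"lap": [1.0]}, strict=False))  # {'lap': [1.0]}
```

Note that the non-strict path returns unvalidated data. Code downstream of a non-strict call therefore cannot assume the documented model guarantees hold.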
## Complete Examples

### Custom Validation

```python
from tif1.exceptions import InvalidDataError
from tif1.validation import LapData, validate_laps


def validate_and_clean_laps(raw_data: dict) -> LapData:
    """Validate laps; on failure, clean the data and retry once."""
    try:
        return validate_laps(raw_data)
    except InvalidDataError as e:
        print(f"Validation error: {e}")
        # Drop invalid entries, then retry (raises again if still invalid)
        cleaned = clean_lap_data(raw_data)
        return validate_laps(cleaned)


def clean_lap_data(data: dict) -> dict:
    """Remove laps with missing or non-positive lap times from every list field."""
    # Indices of laps with a usable numeric lap time (skips None and null-like strings)
    valid_indices = [
        i for i, lap_time in enumerate(data.get("time", []))
        if isinstance(lap_time, (int, float)) and lap_time > 0
    ]

    # Filter every list field by the same indices; keep scalar fields as-is.
    # The bounds check leaves empty optional lists empty instead of raising IndexError.
    cleaned = {}
    for key, values in data.items():
        if isinstance(values, list):
            cleaned[key] = [values[i] for i in valid_indices if i < len(values)]
        else:
            cleaned[key] = values
    return cleaned


# Usage (load_raw_lap_data is a placeholder for your own loader)
raw_data = load_raw_lap_data()
validated = validate_and_clean_laps(raw_data)
```
### Anomaly Detection Workflow

```python
import tif1
from tif1.validation import AnomalyType, detect_lap_anomalies


def analyze_session_quality(year, gp, session_name):
    """Analyze data quality for a session."""
    session = tif1.get_session(year, gp, session_name)

    # Get all laps and convert them to a list of dicts
    laps = session.laps
    lap_dicts = laps.to_dict("records")

    # Detect anomalies
    anomalies = detect_lap_anomalies(lap_dicts)

    # Categorize anomalies
    missing = [a for a in anomalies if a.type == AnomalyType.MISSING_LAPS]
    duplicates = [a for a in anomalies if a.type == AnomalyType.DUPLICATE_LAPS]
    outliers = [a for a in anomalies if a.type == AnomalyType.OUTLIER_TIMES]

    print("Data Quality Report:")
    print(f"  Total laps: {len(lap_dicts)}")
    print(f"  Missing laps: {len(missing)}")
    print(f"  Duplicate laps: {len(duplicates)}")
    print(f"  Outlier times: {len(outliers)}")

    # Show details
    for anomaly in anomalies:
        print(f"  [{anomaly.severity}] {anomaly.description}")
        if anomaly.details:
            print(f"    {anomaly.details}")

    return anomalies


# Usage with the 2021 Belgian Grand Prix race
anomalies = analyze_session_quality(2021, "Belgian Grand Prix", "Race")
```
### Validation with Logging

```python
import logging

from tif1.validation import validate_laps, validate_telemetry

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def validate_with_logging(data: dict, data_type: str):
    """Validate data with detailed logging."""
    logger.info("Validating %s data...", data_type)
    try:
        if data_type == "laps":
            validated = validate_laps(data)
        elif data_type == "telemetry":
            validated = validate_telemetry(data)
        else:
            raise ValueError(f"Unknown data type: {data_type}")
        logger.info("Validation of %s data successful", data_type)
        return validated
    except Exception as e:
        logger.error("Validation failed: %s", e)
        raise


# Usage (raw_data is a lap payload like the one shown for validate_laps)
validated = validate_with_logging(raw_data, "laps")
```
## Best Practices

- Use strict mode during development: catches data issues early.

  ```python
  validated = validate_lap_data(data, strict=True)
  ```

- Handle validation errors gracefully: don't crash on bad data.

  ```python
  import logging

  from tif1.exceptions import InvalidDataError
  from tif1.validation import validate_laps

  logger = logging.getLogger(__name__)

  try:
      validated = validate_laps(data)
  except InvalidDataError as e:
      # Log and fall back instead of crashing
      logger.warning("Validation failed: %s", e)
      validated = None
  ```

- Run anomaly detection periodically: monitor data quality over time.
- Clean data before validation: remove obvious errors first using normalization functions.
- Leverage null-like string conversion: the validation module automatically converts `""`, `"none"`, `"null"`, and `"nan"` to `None`.
## Troubleshooting

### Validation Errors

```python
from tif1.exceptions import InvalidDataError
from tif1.validation import validate_laps

try:
    validated = validate_laps(data)
except InvalidDataError as e:
    print(f"Error: {e}")
    # Check specific fields by inspecting the error message
    message = str(e).lower()
    if "stint" in message:
        print("Issue with stint numbers")
    elif "life" in message:
        print("Issue with tire life values")
```
### Inconsistent Lengths

```python
# Check array lengths before validation
lengths = {key: len(val) for key, val in data.items() if isinstance(val, list)}
print(f"Array lengths: {lengths}")

# All non-empty arrays should be equal
non_empty_lengths = [n for n in lengths.values() if n > 0]
if len(set(non_empty_lengths)) > 1:
    print("Inconsistent array lengths detected")
```

Validation is already optimized and disabled by default. For manual validation that does not raise on failure, use non-strict mode. To skip validation entirely, don't call the validation functions; the library itself validates payloads only when `validate_data` is enabled.

```python
from tif1.validation import validate_lap_data

# Non-strict mode returns the original data on validation failure
validated = validate_lap_data(data, strict=False)
```
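When lengths disagree, it helps to know *which* fields are off. The sketch below uses a hypothetical `mismatched_keys` helper, not part of tif1. It assumes the most common non-empty length is the correct one and reports every list field that deviates from it. On a tie, `Counter.most_common` returns the length encountered first. Empty lists are ignored, matching the rule that they are allowed for optional fields.

```python
from collections import Counter
from typing import Any


def mismatched_keys(data: dict[str, Any]) -> dict[str, int]:
    """Return non-empty list fields whose length differs from the most common length."""
    lengths = {k: len(v) for k, v in data.items() if isinstance(v, list) and v}
    if not lengths:
        return {}
    expected, _ = Counter(lengths.values()).most_common(1)[0]
    return {k: n for k, n in lengths.items() if n != expected}


data = {"time": [90.1, 89.5, 88.9], "lap": [1.0, 2.0, 3.0], "stint": [1, 1], "pb": []}
print(mismatched_keys(data))  # {'stint': 2}
```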