The io_pipeline module contains internal helpers for transforming raw JSON data into structured DataFrames. These functions handle column renaming, type coercion, and data validation.
Overview
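The I/O pipeline follows this flow: raw JSON is optionally validated (_validate_json_payload), converted into a raw DataFrame (_create_lap_df or _create_session_df), and, for lap data, renamed and type-coerced (_process_lap_df).
Functions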
_validate_json_payload
Parameters:
- path: Resource path for error context (e.g., "laps/VER/19_tel.json")
- data: Raw JSON dictionary
Returns:
- Validated JSON dictionary
Raises:
- InvalidDataError: If validation fails
This function uses the global config singleton. The underlying implementation in async_fetch.py accepts a config parameter, but the exported version in io_pipeline.py uses the global config automatically.
_extract_driver_codes
Parameters:
- drivers: List of driver dictionaries from session JSON, or None
Returns:
- Set of 3-letter driver codes
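As a rough illustration of the behaviour described above, the sketch below collects driver codes from a list of driver dictionaries. It assumes each entry stores its code under the "driver" key (the key name used for driver metadata elsewhere in this module); the function name and the handling of malformed entries are illustrative, not the library's actual implementation.

```python
from typing import Any, Dict, List, Optional, Set


def extract_driver_codes(drivers: Optional[List[Dict[str, Any]]]) -> Set[str]:
    """Collect driver codes from session JSON driver entries (illustrative sketch)."""
    # None or an empty list yields an empty set rather than an error.
    if not drivers:
        return set()

    codes: Set[str] = set()
    for entry in drivers:
        # Assumption: the 3-letter code lives under the "driver" key.
        code = entry.get("driver") if isinstance(entry, dict) else None
        if isinstance(code, str) and code:
            codes.add(code)
    return codes


codes = extract_driver_codes([{"driver": "VER"}, {"driver": "HAM"}, {"team": "Unknown"}])
```

Entries without a usable code are skipped silently in this sketch; a stricter variant could raise instead.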
_extract_driver_info_map
Parameters:
- drivers: List of driver dictionaries from session JSON, or None
Returns:
- Dictionary mapping driver codes to raw metadata dictionaries containing:
  - driver: 3-letter driver code
  - dn: Driver number (as string)
  - team: Team name
  - first_name: Driver’s first name
  - last_name: Driver’s last name
  - team_color: Hex color code
  - headshot_url: URL to driver photo
The returned dictionary contains raw JSON keys (snake_case), not the renamed DataFrame columns (PascalCase). Column renaming happens in _process_lap_df.
_create_lap_df
Parameters:
- lap_data: Dictionary of lap data arrays (not a list of dicts). Keys are internal JSON field names like "lap", "time", "s1", etc.
- driver: 3-letter driver code (e.g., "VER")
- team: Team name (e.g., "Red Bull Racing")
- lib: DataFrame library to use ("pandas" or "polars")
Returns:
- DataFrame with raw lap timing data (before column renaming)
Columns:
- lap: Lap number (1-indexed)
- time: Lap time in seconds
- s1, s2, s3: Sector times
- compound: Tire compound
- life: Tire age in laps
- stint: Stint number
- pb: Personal best flag
- vi1, vi2, vfl, vst: Speed trap values
- status: Track status code
- pos: Position at lap end
- del: Lap deleted flag
- delR: Deletion reason
- ff1G: FastF1 generated flag
- Driver: Driver code (added by this function)
- Team: Team name (added by this function)
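A minimal pandas-only sketch of this step, assuming the input arrays already have equal lengths. The function name is illustrative and the lib parameter is omitted for brevity; this is not the library's actual code.

```python
import pandas as pd


def create_lap_df(lap_data: dict, driver: str, team: str) -> pd.DataFrame:
    """Build a raw lap DataFrame from a dict of column arrays (illustrative sketch)."""
    # The dict keys become column names; no renaming happens at this stage.
    df = pd.DataFrame(lap_data, copy=False)
    # Scalars are broadcast to every row.
    df["Driver"] = driver
    df["Team"] = team
    return df


laps = create_lap_df(
    {"lap": [1, 2, 3], "time": [92.5, 91.25, 91.75], "compound": ["SOFT"] * 3},
    driver="VER",
    team="Red Bull Racing",
)
```

The raw keys ("lap", "time", ...) are kept as-is so that renaming can happen in one place later.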
_create_session_df
Parameters:
- data: Raw data dictionary with arrays
- rename_map: Column rename mapping (e.g., WEATHER_RENAME_MAP or RACE_CONTROL_RENAME_MAP)
- lib: DataFrame library to use ("pandas" or "polars")
Returns:
- DataFrame with renamed columns according to the provided rename map
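The sketch below shows the general shape of such a helper for pandas. The rename map here is hypothetical (the real contents of WEATHER_RENAME_MAP are not shown in this document), and the function name is illustrative.

```python
import pandas as pd

# Hypothetical rename map for illustration only; not the library's WEATHER_RENAME_MAP.
EXAMPLE_RENAME_MAP = {"air_temp": "AirTemp", "humidity": "Humidity"}


def create_session_df(data: dict, rename_map: dict) -> pd.DataFrame:
    """Build a DataFrame from column arrays and apply a rename map (illustrative sketch)."""
    df = pd.DataFrame(data, copy=False)
    # Keys in rename_map that are absent from the frame are ignored by default.
    return df.rename(columns=rename_map)


weather = create_session_df(
    {"air_temp": [24.5, 24.75], "humidity": [41.0, 42.0], "extra": [0, 1]},
    EXAMPLE_RENAME_MAP,
)
```

Columns without an entry in the map ("extra" above) pass through unchanged.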
_process_lap_df
Parameters:
- lap_df: Raw lap DataFrame from _create_lap_df
- lib: DataFrame library ("pandas" or "polars")
Returns:
- Processed DataFrame with:
  - Renamed columns (snake_case → PascalCase)
  - Proper data types (timedelta for lap times, float64 for numeric, etc.)
  - Categorical types for Driver, Team, Compound, TrackStatus (pandas only by default)
  - FastF1-compatible column order
Processing steps:
- Column renaming via LAP_RENAME_MAP (e.g., "lap" → "LapNumber", "time" → "LapTime")
- Type coercion:
  - LapTime: float seconds → timedelta64[ns]
  - Time, Sector1Time, etc.: float seconds → timedelta64[ns]
  - Numeric columns: → float64
  - Boolean columns: → bool
- Add LapTimeSeconds column (float representation of LapTime)
- Apply categorical types (pandas only, unless polars_lap_categorical config is enabled)
- Reorder columns to match FastF1 convention
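The processing steps can be sketched for pandas roughly as follows. The rename entries are a small subset taken from the column table in this document; the column order and the function name are assumptions made for illustration, not the library's actual definitions.

```python
import pandas as pd

# Subset of the documented mapping; the real LAP_RENAME_MAP is larger.
RENAME_SUBSET = {
    "lap": "LapNumber",
    "time": "LapTime",
    "s1": "Sector1Time",
    "compound": "Compound",
    "pb": "IsPersonalBest",
}
# Assumed column order for this example only.
COLUMN_ORDER = [
    "Driver", "LapNumber", "LapTime", "LapTimeSeconds",
    "Sector1Time", "Compound", "IsPersonalBest",
]


def process_lap_df(lap_df: pd.DataFrame) -> pd.DataFrame:
    """Rename, coerce, and reorder a raw lap DataFrame (illustrative sketch)."""
    # 1. Rename raw JSON keys to PascalCase column names.
    df = lap_df.rename(columns=RENAME_SUBSET)
    # 2. Keep a float copy of the lap time before converting it.
    df["LapTimeSeconds"] = df["LapTime"].astype("float64")
    # 3. Float seconds -> timedelta, vectorized over the whole column.
    for col in ("LapTime", "Sector1Time"):
        df[col] = pd.to_timedelta(df[col], unit="s")
    df["LapNumber"] = df["LapNumber"].astype("float64")
    # Go through the nullable dtype so missing flags can be filled with False.
    df["IsPersonalBest"] = df["IsPersonalBest"].astype("boolean").fillna(False).astype(bool)
    # 4. Categorical types for low-cardinality string columns.
    for col in ("Driver", "Compound"):
        df[col] = df[col].astype("category")
    # 5. Reorder, keeping only columns that are present.
    return df[[c for c in COLUMN_ORDER if c in df.columns]]


raw = pd.DataFrame({
    "lap": [1, 2],
    "time": [90.5, 91.25],
    "s1": [30.0, 30.5],
    "compound": ["SOFT", "SOFT"],
    "pb": [True, False],
    "Driver": ["VER", "VER"],
})
processed = process_lap_df(raw)
```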
Column naming conventions
The I/O pipeline transforms raw JSON keys to FastF1-compatible column names:
| JSON Key | DataFrame Column | Type | Description |
|---|---|---|---|
| lap | LapNumber | float64 | Lap number (1-indexed) |
| time | LapTime | timedelta64[ns] | Lap time |
| s1 | Sector1Time | timedelta64[ns] | Sector 1 time |
| s2 | Sector2Time | timedelta64[ns] | Sector 2 time |
| s3 | Sector3Time | timedelta64[ns] | Sector 3 time |
| compound | Compound | str/category | Tire compound (SOFT, MEDIUM, HARD, INTERMEDIATE, WET) |
| life | TyreLife | float64 | Tire age in laps |
| stint | Stint | float64 | Stint number |
| pb | IsPersonalBest | bool | Personal best lap flag |
| vi1 | SpeedI1 | float64 | Speed trap 1 (km/h) |
| vi2 | SpeedI2 | float64 | Speed trap 2 (km/h) |
| vfl | SpeedFL | float64 | Finish line speed (km/h) |
| vst | SpeedST | float64 | Speed trap (km/h) |
| status | TrackStatus | str/category | Track status code |
| pos | Position | float64 | Position at lap end |
| del | Deleted | boolean | Lap deleted flag |
| delR | DeletedReason | str | Reason for deletion |
| ff1G | FastF1Generated | bool | FastF1 generated data flag |
| sesT | Time | timedelta64[ns] | Session time at lap end |
| dNum | DriverNumber | str | Driver number |
| pout | PitOutTime | timedelta64[ns] | Pit out time |
| pin | PitInTime | timedelta64[ns] | Pit in time |
The complete mapping is defined in LAP_RENAME_MAP in src/tif1/core_utils/constants.py. Both validated (snake_case) and raw (abbreviated) JSON keys are supported.
Library Support
The pipeline supports both pandas and polars libraries:
- pandas: Uses pd.DataFrame(data, copy=False) to avoid copying input arrays where possible
- polars: Uses pl.DataFrame(data, strict=False) with schema inference
- pandas: Applies categorical types by default for Driver, Team, Compound, TrackStatus
- polars: Categorical types disabled by default (enable with polars_lap_categorical config)
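One way to picture the library switch is a small dispatcher like the one below. The function name and error handling are assumptions for illustration; the two constructor calls are the ones named in the list above. polars is imported lazily so the pandas path works without it installed.

```python
import pandas as pd


def build_frame(data: dict, lib: str = "pandas"):
    """Construct a DataFrame with the requested library (illustrative sketch)."""
    if lib == "pandas":
        return pd.DataFrame(data, copy=False)
    if lib == "polars":
        # Lazy import: only required when the polars backend is requested.
        import polars as pl

        return pl.DataFrame(data, strict=False)
    raise ValueError(f"Unsupported DataFrame library: {lib!r}")


frame = build_frame({"lap": [1, 2], "time": [90.5, 91.25]}, lib="pandas")
```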
Data Validation
When validate_data is enabled in config, _validate_json_payload validates raw JSON using Pydantic schemas:
- Required fields: Ensures all required fields are present in JSON
- Type checking: Validates data types match schema definitions
- Value ranges: Checks values are within expected ranges
- Referential integrity: Validates driver codes, lap numbers, etc.
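The sketch below illustrates the required-field and type checks with a Pydantic model. The schema, the InvalidDataError stand-in, and the validate_data argument are all hypothetical: the real schemas are not shown in this document, and the exported function reads its setting from the global config rather than from a parameter.

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, ValidationError


class InvalidDataError(ValueError):
    """Stand-in for the library's InvalidDataError (illustration only)."""


class LapPayload(BaseModel):
    """Hypothetical schema: both fields are required, items are type-checked."""

    lap: List[int]
    time: List[Optional[float]]


def validate_json_payload(
    path: str, data: Dict[str, Any], validate_data: bool = True
) -> Dict[str, Any]:
    """Validate a raw payload, or pass it through when validation is off."""
    if not validate_data:
        return data
    try:
        LapPayload(**data)
    except ValidationError as exc:
        # Include the resource path so the error points at the offending file.
        raise InvalidDataError(f"{path}: {exc}") from exc
    return data


payload = {"lap": [1, 2], "time": [90.5, None]}
validated = validate_json_payload("laps/VER/19_tel.json", payload)
```

Passing the path into the error message is what makes a failure traceable to a specific resource.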
Validation is controlled by the validate_data config option. When disabled, raw JSON is passed through without validation for maximum performance.
Performance Considerations
The I/O pipeline is optimized for speed through the following techniques:
- Zero-copy construction: Uses copy=False in pandas, strict=False in polars
- Batch processing: Processes all laps at once, not row-by-row
- Vectorized operations: Uses numpy/pandas vectorization for type coercion
- Minimal allocations: Reuses arrays where possible, avoids intermediate copies
- Lazy categorical: Categorical types applied only when beneficial
Approximate processing times (hardware-dependent):
- Process 50 laps: ~2-5ms
- Process 1000 laps: ~20-40ms
- Full session (20 drivers × 50 laps): ~100-200ms
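Timings like these vary with hardware and library versions, so it can be useful to measure locally. The helper below is a generic best-of-N timer applied to a vectorized lap-time conversion; both function names are illustrative and not part of the module.

```python
import time

import pandas as pd


def best_time_ms(fn, *args, repeats: int = 5) -> float:
    """Return the fastest of several runs, in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


def coerce_lap_times(df: pd.DataFrame) -> pd.Series:
    # One vectorized call converts the whole column at once.
    return pd.to_timedelta(df["time"], unit="s")


laps = pd.DataFrame({"time": [90.0 + i * 0.25 for i in range(1000)]})
elapsed_ms = best_time_ms(coerce_lap_times, laps)
print(f"1000 laps coerced in {elapsed_ms:.3f} ms (best of 5)")
```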
Internal Implementation
Column Renaming Strategy
The pipeline maintains two sets of column names:
- JSON keys: Abbreviated keys like "lap", "s1", "vi1" (raw) or snake_case like "lap_number", "sector_1_time" (validated)
- DataFrame columns: PascalCase like "LapNumber", "Sector1Time", "SpeedI1"
Renaming happens in _process_lap_df() using LAP_RENAME_MAP from core_utils/constants.py. The map supports both raw and validated JSON keys for maximum compatibility.
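To illustrate how a single map can serve both key styles, the sketch below lists each target column under its raw and its validated key. Only the key names quoted above are used; the map is a small stand-in, not the actual LAP_RENAME_MAP.

```python
import pandas as pd

# Stand-in map: each DataFrame column is reachable from either key style.
RENAME_BOTH_STYLES = {
    "lap": "LapNumber",
    "lap_number": "LapNumber",
    "s1": "Sector1Time",
    "sector_1_time": "Sector1Time",
}

from_raw = pd.DataFrame({"lap": [1], "s1": [30.5]}).rename(columns=RENAME_BOTH_STYLES)
from_validated = pd.DataFrame(
    {"lap_number": [1], "sector_1_time": [30.5]}
).rename(columns=RENAME_BOTH_STYLES)
```

Either input ends up with the same column names, so downstream code does not need to know which key style the payload used.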
Type Coercion
The pipeline coerces types to ensure FastF1 compatibility:
- Lap times (float seconds) → timedelta64[ns]
- Session times (float seconds) → timedelta64[ns]
- Lap numbers → float64 (not int, to allow NaN)
- Boolean flags → bool (fillna False for non-nullable)
- Deleted flag → boolean (nullable bool)
- Categorical data → category (pandas only by default)
- Driver numbers → str (not int, to preserve leading zeros)
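A small pandas sketch of these coercions, applied to already-renamed columns. The sample values are made up for illustration, and the exact calls are one plausible way to obtain the listed dtypes rather than the module's actual code.

```python
import pandas as pd

df = pd.DataFrame({
    "LapNumber": [1, 2, None],
    "Time": [3600.5, 3691.75, None],
    "IsPersonalBest": [True, None, False],
    "Deleted": [None, True, False],
    "DriverNumber": ["04", "44", "04"],
})

# float64 keeps the missing lap number as NaN (an int column could not).
df["LapNumber"] = df["LapNumber"].astype("float64")
# Float seconds -> timedelta; missing values become NaT.
df["Time"] = pd.to_timedelta(df["Time"], unit="s")
# Non-nullable flag: missing values are filled with False.
df["IsPersonalBest"] = df["IsPersonalBest"].astype("boolean").fillna(False).astype(bool)
# Nullable flag: the "boolean" extension dtype keeps missing values as <NA>.
df["Deleted"] = df["Deleted"].astype("boolean")
# Strings preserve leading zeros that an integer conversion would drop.
df["DriverNumber"] = df["DriverNumber"].astype(str)
```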
Missing Data Handling
Missing values are handled gracefully:
- Numeric fields: NaN (pandas) or null (polars)
- String fields: empty string or null
- Boolean fields: False (fillna applied)
- Deleted field: null (nullable boolean)
- Timedelta fields: NaT (not-a-time)
When validation is enabled, missing required fields raise InvalidDataError.
Array Length Normalization
_create_lap_df normalizes mismatched array lengths (required in Python 3.12+):
- Calculates max length across all arrays
- Pads short arrays with None values
- Replicates scalar values to match max length
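The three steps above can be sketched in plain Python as follows. The function name is illustrative, and treating only lists and tuples as arrays is an assumption of this example.

```python
from typing import Any, Dict, List


def normalize_lengths(lap_data: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Pad short arrays and replicate scalars to a common length (illustrative sketch)."""
    # Step 1: the longest array defines the target length (0 if there are no arrays).
    max_len = max(
        (len(v) for v in lap_data.values() if isinstance(v, (list, tuple))),
        default=0,
    )

    normalized: Dict[str, List[Any]] = {}
    for key, value in lap_data.items():
        if isinstance(value, (list, tuple)):
            # Step 2: pad short arrays with None.
            normalized[key] = list(value) + [None] * (max_len - len(value))
        else:
            # Step 3: replicate scalars to the target length.
            normalized[key] = [value] * max_len
    return normalized


result = normalize_lengths({"lap": [1, 2, 3], "time": [90.5], "stint": 1})
```

After normalization every value is a list of the same length, so the dictionary can be passed to a DataFrame constructor without a length-mismatch error.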