I/O Pipeline API

The io_pipeline module contains internal helpers for transforming raw JSON data into structured DataFrames. These functions handle column renaming, type coercion, and data validation.

This module contains internal implementation details. The API is subject to change. Most users should use the high-level Session API instead.

Overview

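The I/O pipeline follows this flow:

Raw JSON → _validate_json_payload (only when validate_data is enabled) → _create_lap_df for lap data, or _create_session_df for session-level data such as weather and race control messages → _process_lap_df (lap data only) → DataFrame

The _extract_driver_codes and _extract_driver_info_map helpers read the drivers payload from the session JSON.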

Functions

`_validate_json_payload`

def _validate_json_payload(
    path: str,
    data: dict[str, Any]
) -> dict[str, Any]

Validate the raw JSON payload against Pydantic schemas when validation is enabled in the global config.

Parameters:

path: Resource path for error context (e.g., “laps/VER/19_tel.json”)
data: Raw JSON dictionary

Returns:

Validated JSON dictionary

Raises:

InvalidDataError: If validation fails

This function uses the global config singleton. The underlying implementation in async_fetch.py accepts a config parameter, but the exported version in io_pipeline.py uses the global config automatically.
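The split between a config-aware implementation and an exported wrapper that reads the global singleton can be sketched as follows. This is a hypothetical illustration, not the tif1 source: Config, GLOBAL_CONFIG, validate_with_config, and validate_json_payload_sketch are assumed names, and a single required-key check stands in for the Pydantic schemas.

```python
from dataclasses import dataclass
from typing import Any


class InvalidDataError(ValueError):
    """Local stand-in for tif1's InvalidDataError."""


@dataclass
class Config:
    """Hypothetical config object; only the validate_data flag is modeled."""
    validate_data: bool = True


# Hypothetical module-level singleton, playing the role of the global config.
GLOBAL_CONFIG = Config()


def validate_with_config(path: str, data: dict[str, Any], config: Config) -> dict[str, Any]:
    """Config-aware implementation (the role the async_fetch.py version plays)."""
    if not config.validate_data:
        # Validation disabled: the raw payload passes through untouched.
        return data
    # A single required-key check stands in for the real Pydantic schemas.
    if "lap" not in data:
        raise InvalidDataError(f"Invalid data at {path}: missing required field: lap")
    return data


def validate_json_payload_sketch(path: str, data: dict[str, Any]) -> dict[str, Any]:
    """Exported-style wrapper: reads the global config, so callers pass none."""
    return validate_with_config(path, data, GLOBAL_CONFIG)
```

Because the wrapper looks up the singleton at call time, changing GLOBAL_CONFIG.validate_data affects subsequent calls; callers that need a one-off setting can call the config-aware function directly.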

`_extract_driver_codes`

def _extract_driver_codes(drivers: list[dict] | None) -> set[str]

Extract the set of driver codes from the drivers payload.

Parameters:

drivers: List of driver dictionaries from session JSON, or None

Returns:

Set of 3-letter driver codes (e.g., {"VER", "HAM"})

Example:

drivers = [
    {"driver": "VER", "dn": "33", "team": "Red Bull Racing"},
    {"driver": "HAM", "dn": "44", "team": "Mercedes"}
]
codes = _extract_driver_codes(drivers)
# Returns: {"VER", "HAM"}
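The documented contract (a list of driver dicts or None in, a set of codes out) fits in a few lines. The following is a minimal sketch, not the tif1 source: extract_driver_codes_sketch is a hypothetical name, it assumes each entry stores its code under the "driver" key as in the example, and skipping entries without a code is an assumption.

```python
from typing import Optional


def extract_driver_codes_sketch(drivers: Optional[list[dict]]) -> set[str]:
    """Hypothetical re-implementation of _extract_driver_codes."""
    if not drivers:
        # None or an empty list yields an empty set instead of raising.
        return set()
    # Keep the 3-letter code of every entry that has one.
    return {entry["driver"] for entry in drivers if entry.get("driver")}
```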

`_extract_driver_info_map`

def _extract_driver_info_map(
    drivers: list[dict] | None
) -> dict[str, dict]

Extract driver metadata from the drivers payload into a lookup map.

Parameters:

drivers: List of driver dictionaries from session JSON, or None

Returns:

Dictionary mapping driver codes to raw metadata dictionaries containing:
- driver: 3-letter driver code
- dn: Driver number (as string)
- team: Team name
- first_name: Driver’s first name
- last_name: Driver’s last name
- team_color: Hex color code
- headshot_url: URL to driver photo

Example:

drivers = [
    {
        "driver": "VER",
        "dn": "33",
        "team": "Red Bull Racing",
        "first_name": "Max",
        "last_name": "Verstappen",
        "team_color": "#3671C6",
        "headshot_url": "https://..."
    }
]
info_map = _extract_driver_info_map(drivers)
# Returns: {"VER": {"driver": "VER", "dn": "33", ...}}

The returned dictionary contains raw JSON keys (snake_case), not the renamed DataFrame columns (PascalCase). Column renaming happens in _process_lap_df.
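A comparable sketch for the info map, again hypothetical: it keys each raw entry by its "driver" code and stores the entry unchanged, which is why the keys stay snake_case. Whether the real function copies entries or how it resolves duplicate codes is not documented; in this sketch the last entry for a code wins.

```python
from typing import Optional


def extract_driver_info_map_sketch(drivers: Optional[list[dict]]) -> dict[str, dict]:
    """Hypothetical re-implementation of _extract_driver_info_map."""
    info: dict[str, dict] = {}
    for entry in drivers or []:
        code = entry.get("driver")
        if code:
            # The raw entry is stored as-is, so keys remain "dn", "team_color", etc.
            info[code] = entry
    return info
```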

`_create_lap_df`

def _create_lap_df(
    lap_data: dict,
    driver: str,
    team: str,
    lib: str
) -> DataFrame

Create a DataFrame from raw lap data JSON, adding driver and team metadata.

Parameters:

lap_data: Dictionary of lap data arrays (not a list of dicts). Keys are internal JSON field names like "lap", "time", "s1", etc.
driver: 3-letter driver code (e.g., “VER”)
team: Team name (e.g., “Red Bull Racing”)
lib: DataFrame library to use (“pandas” or “polars”)

Returns:

DataFrame with raw lap timing data (before column renaming)

Raw columns created (before renaming):

lap: Lap number (1-indexed)
time: Lap time in seconds
s1, s2, s3: Sector times
compound: Tire compound
life: Tire age in laps
stint: Stint number
pb: Personal best flag
vi1, vi2, vfl, vst: Speed trap values
status: Track status code
pos: Position at lap end
del: Lap deleted flag
delR: Deletion reason
ff1G: FastF1 generated flag
Driver: Driver code (added by this function)
Team: Team name (added by this function)

Example:

# 2021 Belgian GP Race - Verstappen lap data
lap_data = {
    "lap": [1, 2, 3],
    "time": [132.765, 108.901, 107.523],
    "s1": [44.123, 35.234, 34.987],
    "s2": [48.234, 38.123, 37.891],
    "s3": [40.408, 35.544, 34.645],
    "compound": ["INTERMEDIATE", "INTERMEDIATE", "INTERMEDIATE"],
    "life": [1, 2, 3],
    "stint": [1, 1, 1]
}
df = _create_lap_df(lap_data, "VER", "Red Bull Racing", "pandas")

This function does NOT rename columns. Raw JSON keys are preserved. Use _process_lap_df to apply column renaming and type coercion.

`_create_session_df`

def _create_session_df(
    data: dict[str, Any],
    rename_map: dict[str, str],
    lib: str
) -> DataFrame

Create a DataFrame from session-level data (weather, race control messages, etc.).

Parameters:

data: Raw data dictionary with arrays
rename_map: Column rename mapping (e.g., WEATHER_RENAME_MAP or RACE_CONTROL_RENAME_MAP)
lib: DataFrame library to use (“pandas” or “polars”)

Returns:

DataFrame with renamed columns according to the provided rename map

Example:

from tif1.core_utils.constants import WEATHER_RENAME_MAP

weather_data = {
    "wT": [0, 60, 120],
    "wAT": [18.5, 18.7, 18.9],
    "wTT": [22.1, 22.3, 22.5]
}
df = _create_session_df(weather_data, WEATHER_RENAME_MAP, "pandas")
# Columns: Time, AirTemp, TrackTemp
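At its core, the renaming step maps each raw key through rename_map. Below is a dictionary-level sketch using only the standard library. WEATHER_RENAME_SUBSET is a hypothetical excerpt built from the three keys in the example above (not the full WEATHER_RENAME_MAP), leaving unmapped keys unchanged is an assumption, and whether the real function renames before or after building the DataFrame is not documented.

```python
from typing import Any

# Hypothetical excerpt covering only the keys used in the example above.
WEATHER_RENAME_SUBSET = {"wT": "Time", "wAT": "AirTemp", "wTT": "TrackTemp"}


def rename_columns(data: dict[str, Any], rename_map: dict[str, str]) -> dict[str, Any]:
    """Return a new column dict with keys renamed via rename_map.

    Keys missing from the map are kept unchanged (an assumption).
    """
    return {rename_map.get(key, key): values for key, values in data.items()}
```

On an already-built pandas DataFrame, df.rename(columns=rename_map) has the same effect and ignores map entries for columns that are not present.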

`_process_lap_df`

def _process_lap_df(
    lap_df: DataFrame,
    lib: str
) -> DataFrame

Post-process a lap DataFrame by renaming columns, applying type coercion, and reordering columns.

Parameters:

lap_df: Raw lap DataFrame from _create_lap_df
lib: DataFrame library (“pandas” or “polars”)

Returns:

Processed DataFrame with:
- Renamed columns (snake_case → PascalCase)
- Proper data types (timedelta for lap times, float64 for numeric, etc.)
- Categorical types for Driver, Team, Compound, TrackStatus (pandas only by default)
- FastF1-compatible column order

Transformations applied:

1. Column renaming via LAP_RENAME_MAP (e.g., "lap" → "LapNumber", "time" → "LapTime")
2. Type coercion:
   - LapTime: float seconds → timedelta64[ns]
   - Time, Sector1Time, etc.: float seconds → timedelta64[ns]
   - Numeric columns → float64
   - Boolean columns → bool (Deleted uses the nullable boolean dtype)
3. Add a LapTimeSeconds column (float representation of LapTime)
4. Apply categorical types (pandas only, unless the polars_lap_categorical config is enabled)
5. Reorder columns to match the FastF1 convention

Column order (FastF1-compatible):

[
    "Time", "Driver", "DriverNumber", "LapTime", "LapNumber",
    "Stint", "PitOutTime", "PitInTime",
    "Sector1Time", "Sector2Time", "Sector3Time",
    "Sector1SessionTime", "Sector2SessionTime", "Sector3SessionTime",
    "SpeedI1", "SpeedI2", "SpeedFL", "SpeedST",
    "IsPersonalBest", "Compound", "TyreLife", "FreshTyre",
    "Team", "LapStartTime", "LapStartDate",
    "TrackStatus", "Position", "Deleted", "DeletedReason",
    "FastF1Generated", "IsAccurate",
    "WeatherTime", "AirTemp", "Humidity", "Pressure", "Rainfall",
    "TrackTemp", "WindDirection", "WindSpeed",
    "LapTimeSeconds", "QualifyingSession"
]
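A sketch of reordering by that list. It assumes columns absent from the frame are skipped and columns not in the list are appended in their existing order; the documentation does not say how the real function handles either case. LAP_COLUMN_ORDER holds only the first six names of the list above, for brevity, and order_columns is a hypothetical helper.

```python
# Excerpt: the first six names of the FastF1-compatible order above.
LAP_COLUMN_ORDER = ["Time", "Driver", "DriverNumber", "LapTime", "LapNumber", "Stint"]


def order_columns(columns: list[str], preferred: list[str]) -> list[str]:
    """Order columns by `preferred`; unlisted columns keep their order at the end."""
    present = set(columns)
    # Listed columns that exist, in the preferred order.
    head = [name for name in preferred if name in present]
    listed = set(preferred)
    # Everything else, in the order it already had.
    tail = [name for name in columns if name not in listed]
    return head + tail
```

The resulting list can be applied with df[ordered] in pandas or df.select(ordered) in polars.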

Column naming conventions

The I/O pipeline transforms raw JSON keys to FastF1-compatible column names:

JSON Key	DataFrame Column	Type	Description
`lap`	`LapNumber`	float64	Lap number (1-indexed)
`time`	`LapTime`	timedelta64[ns]	Lap time
`s1`	`Sector1Time`	timedelta64[ns]	Sector 1 time
`s2`	`Sector2Time`	timedelta64[ns]	Sector 2 time
`s3`	`Sector3Time`	timedelta64[ns]	Sector 3 time
`compound`	`Compound`	str/category	Tire compound (SOFT, MEDIUM, HARD, INTERMEDIATE, WET)
`life`	`TyreLife`	float64	Tire age in laps
`stint`	`Stint`	float64	Stint number
`pb`	`IsPersonalBest`	bool	Personal best lap flag
`vi1`	`SpeedI1`	float64	Speed trap 1 (km/h)
`vi2`	`SpeedI2`	float64	Speed trap 2 (km/h)
`vfl`	`SpeedFL`	float64	Finish line speed (km/h)
`vst`	`SpeedST`	float64	Speed trap (km/h)
`status`	`TrackStatus`	str/category	Track status code
`pos`	`Position`	float64	Position at lap end
`del`	`Deleted`	boolean	Lap deleted flag
`delR`	`DeletedReason`	str	Reason for deletion
`ff1G`	`FastF1Generated`	bool	FastF1 generated data flag
`sesT`	`Time`	timedelta64[ns]	Session time at lap end
`dNum`	`DriverNumber`	str	Driver number
`pout`	`PitOutTime`	timedelta64[ns]	Pit out time
`pin`	`PitInTime`	timedelta64[ns]	Pit in time

The complete mapping is defined in LAP_RENAME_MAP in src/tif1/core_utils/constants.py. Both validated (snake_case) and raw (abbreviated) JSON keys are supported.

Library Support

The pipeline supports both pandas and polars libraries:

# Create DataFrame with pandas lib
lap_data = {"lap": [1, 2], "time": [90.5, 89.2]}
df_pandas = _create_lap_df(lap_data, "VER", "Red Bull Racing", "pandas")
processed = _process_lap_df(df_pandas, "pandas")

# Create DataFrame with polars lib
df_polars = _create_lap_df(lap_data, "VER", "Red Bull Racing", "polars")
processed = _process_lap_df(df_polars, "polars")

Library-specific optimizations:

pandas: Uses pd.DataFrame(data, copy=False) to avoid copying the input arrays where pandas allows it
polars: Uses pl.DataFrame(data, strict=False), so values that do not match the inferred column type are cast (or set to null) instead of raising
pandas: Applies categorical types by default for Driver, Team, Compound, TrackStatus
polars: Categorical types disabled by default (enable with the polars_lap_categorical config)

Data Validation

When validate_data is enabled in config, _validate_json_payload validates raw JSON using Pydantic schemas:

Required fields: Ensures all required fields are present in JSON
Type checking: Validates data types match schema definitions
Value ranges: Checks values are within expected ranges
Referential integrity: Validates driver codes, lap numbers, etc.

Example validation error:

from tif1 import InvalidDataError

try:
    validated = _validate_json_payload("laps/VER", invalid_data)
except InvalidDataError as e:
    print(e)
    # InvalidDataError: Invalid data at laps/VER
    #   - Missing required field: lap
    #   - Invalid type for time: expected float, got str

Validation is controlled by the validate_data config option. When disabled, raw JSON is passed through without validation for maximum performance.
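To make the checks concrete, here is a standard-library stand-in for schema validation that collects the same two kinds of messages shown in the example. REQUIRED_LAP_FIELDS and collect_errors are hypothetical; the real Pydantic models, field names, and exact messages may differ. Like Pydantic's default (lax) mode, the sketch accepts an int where a float is expected.

```python
from typing import Any

# Hypothetical mini-schema: required field -> expected element type.
REQUIRED_LAP_FIELDS: dict[str, type] = {"lap": int, "time": float}


def collect_errors(data: dict[str, Any]) -> list[str]:
    """Return validation messages for a column-oriented lap payload."""
    errors: list[str] = []
    for field, expected in REQUIRED_LAP_FIELDS.items():
        if field not in data:
            errors.append(f"Missing required field: {field}")
            continue
        for value in data[field]:
            if value is None:
                continue  # missing values are allowed
            # Accept ints for float fields, as Pydantic's lax mode does.
            ok = isinstance(value, expected) or (expected is float and isinstance(value, int))
            if not ok:
                errors.append(
                    f"Invalid type for {field}: expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
                break  # report each field at most once
    return errors
```

A wrapper would raise InvalidDataError with these messages whenever the returned list is non-empty.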

Performance Considerations

The I/O pipeline is heavily optimized for speed:

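Low-overhead construction: copy=False in pandas avoids defensive copies of the input where possible; strict=False in polars casts (or nulls) mismatched values during schema inference instead of raising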
Batch processing: Processes all laps at once, not row-by-row
Vectorized operations: Uses numpy/pandas vectorization for type coercion
Minimal allocations: Reuses arrays where possible, avoids intermediate copies
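Selective categoricals: categorical types for Driver, Team, Compound, and TrackStatus are applied by default in pandas, and in polars only when polars_lap_categorical is enabled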

Typical performance (pandas lib):

Process 50 laps: ~2-5ms
Process 1000 laps: ~20-40ms
Full session (20 drivers × 50 laps): ~100-200ms

For maximum performance, disable validation (validate_data=False) and use pandas. Polars is faster for very large datasets (>10k laps) but has higher overhead for small datasets.

Internal Implementation

Column Renaming Strategy

The pipeline maintains two sets of column names:

JSON keys: Abbreviated keys like "lap", "s1", "vi1" (raw) or snake_case like "lap_number", "sector_1_time" (validated)
DataFrame columns: PascalCase like "LapNumber", "Sector1Time", "SpeedI1"

Renaming happens in _process_lap_df() using LAP_RENAME_MAP from core_utils/constants.py. The map supports both raw and validated JSON keys for maximum compatibility.
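Because both spellings of a field map to the same column, the rename map is many-to-one. A sketch with a hypothetical excerpt containing only the key pairs named above; the real LAP_RENAME_MAP is larger, and rename_lap_keys is an illustrative helper.

```python
# Hypothetical excerpt: raw and validated spellings map to the same column.
LAP_RENAME_EXCERPT = {
    "lap": "LapNumber",
    "lap_number": "LapNumber",
    "s1": "Sector1Time",
    "sector_1_time": "Sector1Time",
}


def rename_lap_keys(record: dict) -> dict:
    """Rename raw or validated JSON keys to PascalCase column names."""
    return {LAP_RENAME_EXCERPT.get(key, key): value for key, value in record.items()}
```

If a payload ever carried both spellings of one field, the later key would silently win in this sketch.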

Type Coercion

The pipeline coerces types to ensure FastF1 compatibility:

Lap times (float seconds) → timedelta64[ns]
Session times (float seconds) → timedelta64[ns]
Lap numbers → float64 (not int, to allow NaN)
Boolean flags → bool (missing values filled with False)
Deleted flag → boolean (nullable bool)
Categorical data → category (pandas only by default)
Driver numbers → str (not int, to preserve leading zeros)
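The time and lap-number rules can be illustrated on single values with standard-library stand-ins: datetime.timedelta plays the role of timedelta64[ns] and None plays NaT. These helpers are hypothetical, not tif1 code; in pandas the vectorized form is pd.to_timedelta(values, unit="s"), which turns NaN into NaT.

```python
import math
from datetime import timedelta
from typing import Optional


def seconds_to_timedelta(value: Optional[float]) -> Optional[timedelta]:
    """Float seconds -> timedelta; None (the stand-in for NaT) when missing."""
    if value is None or math.isnan(value):
        return None
    return timedelta(seconds=value)


def lap_number_to_float(value: Optional[int]) -> float:
    """Lap numbers become floats so a missing lap can be represented as NaN."""
    return math.nan if value is None else float(value)
```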

Missing Data Handling

Missing values are handled gracefully:

Numeric fields: NaN (pandas) or null (polars)
String fields: empty string or null
Boolean fields: False (fillna applied)
Deleted field: null (nullable boolean)
Timedelta fields: NaT (not-a-time)

The pipeline never raises errors for missing optional fields. Only validation (when enabled) can raise InvalidDataError for missing required fields.
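The difference between ordinary boolean flags and Deleted can be sketched on plain lists: regular flags collapse missing values to False, while Deleted keeps them missing. In pandas terms this corresponds to fillna(False).astype(bool) versus astype("boolean"), the nullable dtype that displays missing entries as <NA>. The helper names are hypothetical.

```python
from typing import Optional


def fill_flag(values: list[Optional[bool]]) -> list[bool]:
    """Non-nullable flags such as IsPersonalBest: missing becomes False."""
    # bool(None) is False, so missing entries collapse to False.
    return [bool(v) for v in values]


def keep_nullable(values: list[Optional[bool]]) -> list[Optional[bool]]:
    """Deleted stays nullable: missing entries remain None (pandas <NA>)."""
    return [None if v is None else bool(v) for v in values]
```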

Array Length Normalization

_create_lap_df normalizes mismatched array lengths (required in Python 3.12+):

Calculates max length across all arrays
Pads short arrays with None values
Replicates scalar values to match max length

This ensures both pandas and polars can construct DataFrames without errors.
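The three normalization steps can be sketched on a plain dict of columns. This is a hypothetical stand-in for the internal logic: normalize_lengths is an illustrative name, only lists and tuples count as arrays (strings and other values are treated as scalars), and a payload made only of scalars becomes a single row.

```python
from typing import Any


def normalize_lengths(columns: dict[str, Any]) -> dict[str, list]:
    """Pad list columns with None and replicate scalars to the longest length."""
    # Step 1: the longest array sets the target length (1 if there are no arrays).
    target = max(
        (len(v) for v in columns.values() if isinstance(v, (list, tuple))),
        default=1,
    )
    out: dict[str, list] = {}
    for key, value in columns.items():
        if isinstance(value, (list, tuple)):
            # Step 2: short arrays are padded with None up to the target length.
            out[key] = list(value) + [None] * (target - len(value))
        else:
            # Step 3: scalars (e.g. a driver code) are replicated to every row.
            out[key] = [value] * target
    return out
```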
