Fuzzy string matching for event and session name resolution
The fuzzy matching module resolves event and session names from partial or informal input, so you can load data without typing the official names in full.
Grand Prix names (e.g., “Monaco” → “Monaco Grand Prix”, “Spa” → “Belgian Grand Prix”)
Event locations and countries (e.g., “Belgium” → “Belgian Grand Prix”)
Session names use exact dictionary lookups with abbreviations (e.g., “Q” → “Qualifying”, “FP1” → “Practice 1”), not fuzzy matching.
This makes the API more forgiving and user-friendly.
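The session-name lookup can be pictured as a plain, case-insensitive dictionary lookup. The following is a minimal sketch, not tif1's implementation: `SESSION_ABBREVIATIONS` and `resolve_session_name` are hypothetical names, and the table holds only the abbreviations mentioned on this page.

```python
# Hypothetical abbreviation table (assumption): only the abbreviations
# mentioned on this page are listed; the real table may contain more.
SESSION_ABBREVIATIONS = {
    "q": "Qualifying",
    "fp1": "Practice 1",
    "r": "Race",
}


def resolve_session_name(name: str) -> str:
    """Resolve a session name by exact, case-insensitive dictionary lookup."""
    key = name.strip().lower()
    if key in SESSION_ABBREVIATIONS:
        return SESSION_ABBREVIATIONS[key]
    # Also accept the full session names themselves, ignoring case.
    full_names = {full.lower(): full for full in SESSION_ABBREVIATIONS.values()}
    if key in full_names:
        return full_names[key]
    # No fuzzy fallback: a misspelled session name is an error.
    raise ValueError(f"Unknown session name: {name!r}")


print(resolve_session_name("Q"))     # Qualifying
print(resolve_session_name("fp1"))   # Practice 1
print(resolve_session_name("race"))  # Race
```

Unlike event names, a near miss such as "Qualifyin" raises an error in this sketch instead of being matched approximately.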
query: The string to match (e.g., “Monaco”, “Spa”, “Belgium”)
reference: List of lists where each sub-list contains feature strings for one element
Returns:
Tuple of (index, exact) where:
index: Index of best matching element in reference list
exact: True if query is an exact substring of exactly one feature string, False if fuzzy ratio matching was used
Matching Strategy:
Normalize query and reference (lowercase, remove spaces)
Check for exact substring matches first
If exactly one substring match found, return as exact
Otherwise, use fuzzy ratio matching with RapidFuzz
Return best match with confidence indicator
Example:
    from tif1.fuzzy import fuzzy_matcher

    # Reference data: each sub-list represents one event
    reference = [
        ["Monaco Grand Prix", "Monaco", "Monte Carlo"],
        ["British Grand Prix", "Silverstone", "Britain"],
        ["Italian Grand Prix", "Monza", "Italy"],
    ]

    # Exact substring match (query is a substring of features in exactly one element)
    index, exact = fuzzy_matcher("Monaco", reference)
    # Returns: (0, True) - "monaco" is a substring of "monacograndprix" and equals "monaco"

    # Fuzzy match (the misspelled query is not a substring of any feature,
    # so Levenshtein-based ratio matching is used)
    index, exact = fuzzy_matcher("Monoco", reference)
    # Returns: (0, False) - closest match via fuzzy ratio

    # Any feature string of an element can match, not only the event name
    index, exact = fuzzy_matcher("Silverstone", reference)
    # Returns: (1, True) - exact substring match
First, check if query is a substring of any feature string. If the query is a substring of features in exactly one element, return it as exact:
    query = "monaco"
    features = ["monacograndprix", "monaco", "montecarlo"]
    # "monaco" is a substring of "monacograndprix" and an exact match of "monaco"
    # Only one element (index 0) contains this substring
    # Returns: (0, True)
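The substring stage can be sketched in a few lines. This is a minimal illustration assuming the normalization described on this page (lowercase, spaces removed); `unique_substring_match` is a hypothetical helper, not part of tif1.

```python
from typing import List, Optional


def unique_substring_match(query: str, reference: List[List[str]]) -> Optional[int]:
    """Return the index of the only element containing `query`, else None."""
    q = query.lower().replace(" ", "")
    hits = [
        i
        for i, features in enumerate(reference)
        if any(q in feature.lower().replace(" ", "") for feature in features)
    ]
    # The match only counts as exact when it is unambiguous.
    return hits[0] if len(hits) == 1 else None


reference = [
    ["Monaco Grand Prix", "Monaco", "Monte Carlo"],
    ["British Grand Prix", "Silverstone", "Britain"],
    ["Italian Grand Prix", "Monza", "Italy"],
]

print(unique_substring_match("Monaco", reference))      # 0
print(unique_substring_match("Grand Prix", reference))  # None (three elements match)
print(unique_substring_match("Monoco", reference))      # None (no substring match)
```

A `None` result here corresponds to falling through to ratio matching. Note that a truncated query such as "Monac" still counts as a substring match, because it is contained in "monaco".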
If multiple elements have the same max ratio, disambiguate by deprioritizing common features that appear in multiple events:
    # If "Grand Prix" appears in multiple events, prioritize unique features
    reference = [
        ["Monaco Grand Prix", "Monaco"],
        ["British Grand Prix", "Silverstone"],
    ]
    query = "Grand Prix"
    # "Grand Prix" appears in both, so those matches are zeroed out
    # Disambiguates using unique features "Monaco" vs "Silverstone"
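The ratio stage and its tie-break can be sketched as follows. To keep the example dependency-free, it swaps in `difflib.SequenceMatcher` from the standard library for RapidFuzz, so the scores will differ from what tif1 computes. `best_ratio_match` is a hypothetical helper, the tie-break (zero out every score equal to the tied maximum, then pick the best remaining score) is one reading of the rule above, and the Spielberg reference data is made up for illustration.

```python
from difflib import SequenceMatcher
from typing import List


def _normalize(text: str) -> str:
    return text.lower().replace(" ", "")


def _ratio(a: str, b: str) -> float:
    # Stand-in similarity score in [0, 1]; tif1 uses RapidFuzz instead.
    return SequenceMatcher(None, a, b).ratio()


def best_ratio_match(query: str, reference: List[List[str]]) -> int:
    """Return the index of the element whose features best match `query`."""
    q = _normalize(query)
    ratios = [[_ratio(q, _normalize(f)) for f in features] for features in reference]
    best = max(max(row) for row in ratios)
    tied = [i for i, row in enumerate(ratios) if max(row) == best]
    if len(tied) > 1:
        # Several elements share the top score (e.g., through a common feature):
        # discard those scores so the remaining, unique features decide.
        ratios = [[0.0 if r == best else r for r in row] for row in ratios]
    row_best = [max(row) for row in ratios]
    return row_best.index(max(row_best))


# No tie: the misspelled query is closest to "Monaco".
events = [
    ["Monaco Grand Prix", "Monaco", "Monte Carlo"],
    ["British Grand Prix", "Silverstone", "Britain"],
    ["Italian Grand Prix", "Monza", "Italy"],
]
print(best_ratio_match("Monoco", events))  # 0

# Tie: both elements share "Spielberg", so the event names break the tie.
shared = [
    ["Austrian Grand Prix", "Spielberg"],
    ["Styrian Grand Prix", "Spielberg"],
]
print(best_ratio_match("Spielberg Styria", shared))  # 1
```

In the second call the shared feature scores highest for both elements, so it carries no information about which event was meant; once it is zeroed out, the query's "Styria" fragment favors the second element.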
    import tif1

    # All of these work for event names:
    session = tif1.get_session(2021, "Belgium", "Race")
    session = tif1.get_session(2021, "belgian grand prix", "Race")
    session = tif1.get_session(2021, "Spa", "Race")
    session = tif1.get_session(2021, "spa-francorchamps", "Race")
    # All resolve to "Belgian Grand Prix"
    # Belgian Grand Prix (2021 example)
    "Belgium", "belgian", "BELGIUM"
    "Belgian Grand Prix", "Belgian GP"
    "Spa", "Spa-Francorchamps"

    # Monaco Grand Prix
    "Monaco", "monaco", "MONACO"
    "Monaco Grand Prix", "Monaco GP"
    "Monte Carlo"

    # British Grand Prix
    "British", "Britain", "Silverstone"
    "British Grand Prix", "British GP"
tif1 uses RapidFuzz for fast fuzzy string matching. RapidFuzz implements Levenshtein-based string metrics in C++, which makes it roughly 10-100x faster than pure Python implementations.
Caching
Fuzzy match results are not cached: a single match takes under 1 ms, so the overhead of maintaining a cache would outweigh the time it saves.
Normalization Strategy
Normalization converts the query and the reference strings to lowercase and removes spaces, so differences in case and spacing do not affect matching. Special characters such as hyphens are preserved to distinguish similar names.
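A minimal sketch of this normalization, assuming it amounts to lowercasing and removing space characters; `normalize` is a hypothetical name, not necessarily the function tif1 uses internally.

```python
def normalize(text: str) -> str:
    """Lowercase `text` and remove spaces; other characters are left untouched."""
    return text.lower().replace(" ", "")


print(normalize("Monaco Grand Prix"))  # monacograndprix
print(normalize("BELGIUM"))            # belgium
print(normalize("Spa-Francorchamps"))  # spa-francorchamps (hyphen preserved)
```

Because the hyphen survives, "Spa-Francorchamps" and a hypothetical "Spa Francorchamps" normalize to different strings ("spa-francorchamps" and "spafrancorchamps"); in that case the ratio matching absorbs the small difference.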
    import tif1
    from tif1.events import get_event_by_name, get_sessions

    # Example: Load 2021 Belgian Grand Prix Race
    session = tif1.get_session(2021, "Belgium", "Race")
    print(f"Event: {session.event}")
    print(f"Session: {session.session_name}")
    # Event: Belgian Grand Prix
    # Session: Race

    # Using different variations
    session = tif1.get_session(2021, "Spa", "R")  # Location + abbreviation
    session = tif1.get_session(2021, "belgian grand prix", "race")  # Full names

    # Get event object with fuzzy matching
    event = get_event_by_name(2021, "Belgium")
    print(f"Resolved to: {event.EventName}")
    # Resolved to: Belgian Grand Prix

    # Check what sessions are available
    sessions = get_sessions(2021, "Belgian Grand Prix")
    print(f"Available sessions: {sessions}")
    # Available sessions: ['Practice 1', 'Practice 2', 'Practice 3', 'Qualifying', 'Race']

    # Load specific session
    session = tif1.get_session(2021, "Belgium", "FP1")
    session.load()
    print(f"Loaded {len(session.laps)} laps")
Accepting partial event names (“Belgium” instead of “Belgian Grand Prix”)
Accepting location names (“Spa” for “Belgian Grand Prix”)
Being case-insensitive for both events and sessions
Supporting session abbreviations via dictionary lookup (“Q”, “FP1”, “R”)
Providing exact match mode for validation
Matching quickly (<1ms per match)
Key distinction: Event names use fuzzy matching with RapidFuzz, while session names use exact dictionary lookups with predefined abbreviations.
Use fuzzy matching for interactive applications and exact matching for validation or automated systems.
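For validation or automated pipelines, one option is to reject any result that was not an exact match. The sketch below is a hypothetical wrapper around a matcher with the `(index, exact)` return shape documented for `fuzzy_matcher`; `resolve_strict` is not part of tif1, and the matcher is passed in as an argument so the example runs without the library installed.

```python
from typing import Callable, List, Tuple

# Any callable with the documented (query, reference) -> (index, exact) shape.
Matcher = Callable[[str, List[List[str]]], Tuple[int, bool]]


def resolve_strict(query: str, reference: List[List[str]], matcher: Matcher) -> int:
    """Return the matched index, but refuse results found only by fuzzy ratio."""
    index, exact = matcher(query, reference)
    if not exact:
        guess = reference[index][0]
        raise ValueError(f"No exact match for {query!r}; closest guess was {guess!r}")
    return index


# Stand-in matcher for demonstration only (assumption): it treats "Monaco"
# as exact and everything else as a fuzzy guess at index 0.
def demo_matcher(query: str, reference: List[List[str]]) -> Tuple[int, bool]:
    return 0, query == "Monaco"


reference = [["Monaco Grand Prix", "Monaco", "Monte Carlo"]]
print(resolve_strict("Monaco", reference, demo_matcher))  # 0
```

In real use, `tif1.fuzzy.fuzzy_matcher` would be passed as `matcher`; an interactive tool could instead catch the error and ask the user to confirm the closest guess.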