Fuzzy Matching

The fuzzy matching module resolves partial or informal event names to their official names, so you can load data without typing the full Grand Prix name. Session names are resolved separately, through exact dictionary lookups.

Overview

tif1 uses fuzzy matching to resolve:

Grand Prix names (e.g., “Monaco” → “Monaco Grand Prix”, “Spa” → “Belgian Grand Prix”)
Event locations and countries (e.g., “Belgium” → “Belgian Grand Prix”)

Session names use exact, case-insensitive dictionary lookups with abbreviations (e.g., “Q” → “Qualifying”, “FP1” → “Practice 1”), not fuzzy matching. Together, the two mechanisms make the API forgiving about how names are typed.

`fuzzy_matcher`

Core fuzzy matching function using RapidFuzz for fast string similarity.

def fuzzy_matcher(
    query: str,
    reference: list[list[str]]
) -> tuple[int, bool]

Parameters:

query: The string to match (e.g., “Monaco”, “Spa”, “Belgium”)
reference: List of lists where each sub-list contains the feature strings (names, locations, countries) for one element

Returns:

Tuple of (index, exact) where:
- index: Index of best matching element in reference list
- exact: True if the query is a substring of feature strings in exactly one element, False if fuzzy ratio matching was used

Matching Strategy:

Normalize the query and all reference strings (lowercase, remove spaces)
Check whether the query is a substring of any feature string
If exactly one element has a substring match, return its index with exact=True
Otherwise, score every feature string with RapidFuzz ratio matching
Return the index of the best-scoring element with exact=False

Example:

from tif1.fuzzy import fuzzy_matcher

# Reference data: each sub-list represents one event
reference = [
    ["Monaco Grand Prix", "Monaco", "Monte Carlo"],
    ["British Grand Prix", "Silverstone", "Britain"],
    ["Italian Grand Prix", "Monza", "Italy"]
]

# Exact substring match (query is a substring of features in exactly one element)
index, exact = fuzzy_matcher("Monaco", reference)
# Returns: (0, True) - "monaco" is a substring of "monacograndprix" and equals "monaco"

# Fuzzy match (misspelled query is not a substring of any feature, so ratio matching is used)
index, exact = fuzzy_matcher("Monacco", reference)
# Returns: (0, False) - closest match via fuzzy ratio

# Multiple feature strings
index, exact = fuzzy_matcher("Silverstone", reference)
# Returns: (1, True) - exact substring match

How fuzzy matching works

1. Normalization

All strings are normalized before matching:

Convert to lowercase (using casefold())
Remove spaces

"Monaco Grand Prix" → "monacograndprix"
"Spa-Francorchamps" → "spa-francorchamps"  # Special characters preserved
"Practice 1" → "practice1"

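The normalization rule is small enough to sketch in a few lines. The helper name `normalize` is an assumption made for illustration and is not part of tif1's documented API:

```python
def normalize(text: str) -> str:
    """Casefold and drop spaces; other characters such as hyphens are kept."""
    return text.casefold().replace(" ", "")


print(normalize("Monaco Grand Prix"))  # monacograndprix
print(normalize("Spa-Francorchamps"))  # spa-francorchamps
print(normalize("Practice 1"))         # practice1
```
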
2. Exact substring matching

First, check if query is a substring of any feature string. If the query is a substring of features in exactly one element, return it as exact:

query = "monaco"
features = ["monacograndprix", "monaco", "montecarlo"]
# "monaco" is substring of "monacograndprix" and exact match of "monaco"
# Only one element (index 0) contains this substring
# Returns: (0, True)
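
The substring step described above could look roughly like this. `substring_matches` is a hypothetical helper, not tif1's actual code; it inlines the normalization so the block runs on its own:

```python
def substring_matches(query: str, reference: list[list[str]]) -> list[int]:
    """Return indices of elements with at least one feature containing the query."""
    q = query.casefold().replace(" ", "")
    return [
        i
        for i, features in enumerate(reference)
        if any(q in f.casefold().replace(" ", "") for f in features)
    ]


reference = [
    ["Monaco Grand Prix", "Monaco", "Monte Carlo"],
    ["British Grand Prix", "Silverstone", "Britain"],
]
print(substring_matches("Monaco", reference))      # [0]
print(substring_matches("Grand Prix", reference))  # [0, 1]
print(substring_matches("Monacco", reference))     # []
```

Only the first call yields exactly one index, so only it would count as an exact match; the other two would fall through to ratio matching.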

3. Fuzzy ratio matching

If zero or multiple elements have substring matches, fall back to a RapidFuzz similarity ratio (fuzz.ratio, scored from 0 to 100):

from rapidfuzz import fuzz

query = "monacco"  # misspelling: not a substring of any feature
feature = "monaco"
ratio = fuzz.ratio(query, feature)  # ≈ 92.3 (out of 100)
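
RapidFuzz documents `fuzz.ratio` as a normalized Indel similarity (insertions and deletions only) scaled to 0-100. The pure-Python sketch below reproduces that arithmetic so a score can be checked by hand. The names `lcs_length` and `indel_ratio` are illustrative, and RapidFuzz's own implementation is far faster:

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0-100 scale."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    distance = total - 2 * lcs_length(a, b)  # insertions + deletions needed
    return 100.0 * (1 - distance / total)


print(round(indel_ratio("monac", "monaco"), 1))    # 90.9
print(round(indel_ratio("monacco", "monaco"), 1))  # 92.3
```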

4. Disambiguation

If multiple elements have the same max ratio, disambiguate by deprioritizing common features that appear in multiple events:

# If the same feature string appears in multiple events, unique features decide
reference = [
    ["Italian Grand Prix", "Monza", "Italy"],
    ["Emilia Romagna Grand Prix", "Imola", "Italy"]
]

query = "Italy"
# "Italy" is a feature of both events, so both reach the same max ratio
# Scores for the shared feature are zeroed out
# Disambiguates using the remaining unique features of each event
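
One way to picture the tie-breaking step is as a pass over a score matrix that zeroes every score belonging to a feature string shared by more than one element. This is a sketch under that assumption: `zero_shared_features` is a hypothetical helper, the scores are made-up illustrative numbers, and tif1 may define "shared" differently:

```python
from collections import Counter
from itertools import chain


def zero_shared_features(
    reference: list[list[str]], ratios: list[list[float]]
) -> list[list[float]]:
    """Return a copy of ratios with scores of non-unique features set to 0."""
    counts = Counter(chain.from_iterable(reference))
    return [
        [0.0 if counts[feature] > 1 else score
         for feature, score in zip(features, scores)]
        for features, scores in zip(reference, ratios)
    ]


# Normalized features; "italy" is shared by both elements.
reference = [
    ["italiangrandprix", "monza", "italy"],
    ["emiliaromagnagrandprix", "imola", "italy"],
]
# Hypothetical scores for some query (illustrative values only).
ratios = [[48.0, 20.0, 100.0], [30.0, 40.0, 100.0]]

adjusted = zero_shared_features(reference, ratios)
best = max(range(len(adjusted)), key=lambda i: max(adjusted[i]))
print(adjusted)  # [[48.0, 20.0, 0.0], [30.0, 40.0, 0.0]]
print(best)      # 0
```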

Usage in tif1

Fuzzy matching is used internally by get_session(), get_event(), and get_event_by_name():

Event name matching

import tif1

# All of these work for event names:
session = tif1.get_session(2021, "Belgium", "Race")
session = tif1.get_session(2021, "belgian grand prix", "Race")
session = tif1.get_session(2021, "Spa", "Race")
session = tif1.get_session(2021, "spa-francorchamps", "Race")

# All resolve to "Belgian Grand Prix"

Session name matching

Session names use exact dictionary lookups (case-insensitive), not fuzzy matching:

import tif1

# Supported session name variations (case-insensitive):
session = tif1.get_session(2021, "Belgium", "Qualifying")
session = tif1.get_session(2021, "Belgium", "Q")  # Abbreviation
session = tif1.get_session(2021, "Belgium", "quali")  # Partial name
session = tif1.get_session(2021, "Belgium", "QUALIFYING")  # Any case

# Practice sessions - use FP abbreviations (not P1/P2/P3):
session = tif1.get_session(2021, "Belgium", "Practice 1")
session = tif1.get_session(2021, "Belgium", "FP1")  # Correct abbreviation
session = tif1.get_session(2021, "Belgium", "practice1")  # No space works

# Supported abbreviations:
# FP1, FP2, FP3 - Practice sessions
# Q - Qualifying
# S - Sprint
# SS - Sprint Shootout
# SQ - Sprint Qualifying
# R - Race
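
An exact dictionary lookup of this kind can be sketched as follows. The `SESSION_ALIASES` table and the `resolve_session` helper are assumptions built only from the abbreviations listed above; tif1's real table may contain more entries and may raise a different exception:

```python
# Hypothetical alias table; keys are casefolded with spaces removed.
SESSION_ALIASES = {
    "fp1": "Practice 1",
    "fp2": "Practice 2",
    "fp3": "Practice 3",
    "q": "Qualifying",
    "s": "Sprint",
    "ss": "Sprint Shootout",
    "sq": "Sprint Qualifying",
    "r": "Race",
}
# Full names also resolve to themselves, with or without spaces.
for _name in set(SESSION_ALIASES.values()):
    SESSION_ALIASES[_name.casefold().replace(" ", "")] = _name


def resolve_session(name: str) -> str:
    """Look up a session name exactly (case- and space-insensitive)."""
    key = name.casefold().replace(" ", "")
    try:
        return SESSION_ALIASES[key]
    except KeyError:
        raise ValueError(f"Unknown session name: {name!r}") from None


print(resolve_session("FP1"))         # Practice 1
print(resolve_session("practice1"))   # Practice 1
print(resolve_session("QUALIFYING"))  # Qualifying
```

Because nothing is scored, an unsupported spelling such as "P1" fails outright instead of being silently mapped to a nearby session.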

Exact Matching

To disable fuzzy matching and require exact event names:

from tif1.events import get_event_by_name

# Fuzzy matching (default)
event = get_event_by_name(2021, "Belgium")  # Works - fuzzy matches to "Belgian Grand Prix"

# Exact matching (case-insensitive)
event = get_event_by_name(2021, "Belgium", exact_match=True)  # Fails - not exact name
event = get_event_by_name(2021, "Belgian Grand Prix", exact_match=True)  # Works
event = get_event_by_name(2021, "belgian grand prix", exact_match=True)  # Works - case insensitive

Performance

Fuzzy matching is optimized for speed:

Exact substring matching: O(n×m) where n=reference size, m=feature count
Fuzzy matching: O(n×m×k) where k=string length
Typical performance: <1ms for event/session matching

Benchmarks:

Event name matching: ~0.5ms
Session name matching: ~0.3ms (dictionary lookup, not fuzzy)
100 fuzzy matches: ~50ms

Common Patterns

Event name variations

# Belgian Grand Prix (2021 example)
"Belgium", "belgian", "BELGIUM"
"Belgian Grand Prix", "Belgian GP"
"Spa", "Spa-Francorchamps"

# Monaco Grand Prix
"Monaco", "monaco", "MONACO"
"Monaco Grand Prix", "Monaco GP"
"Monte Carlo"

# British Grand Prix
"British", "Britain", "Silverstone"
"British Grand Prix", "British GP"

Session name variations

# Practice (use FP abbreviations, not P1/P2/P3)
"Practice 1", "FP1", "practice1"
"Practice 2", "FP2", "practice2"
"Practice 3", "FP3", "practice3"

# Qualifying
"Qualifying", "Q", "qualifying"

# Sprint
"Sprint", "S", "sprint"

# Sprint Shootout
"Sprint Shootout", "SS", "sprint shootout"

# Sprint Qualifying (2021-2022 only)
"Sprint Qualifying", "SQ", "sprint qualifying"

# Race
"Race", "R", "race"

Error Handling

If no good match is found, tif1 raises DataNotFoundError:

import tif1
from tif1.events import get_events
from tif1.exceptions import DataNotFoundError

try:
    session = tif1.get_session(2021, "InvalidEventName", "Race")
except (DataNotFoundError, ValueError) as e:
    print(f"Error: {e}")
    # Show available events
    events = get_events(2021)
    print(f"Available events: {list(events)}")
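
When a lookup fails, a short list of near-miss suggestions can be more helpful than the full calendar. A minimal sketch using the standard library's `difflib.get_close_matches`; the `suggest_events` helper and the hard-coded event list are assumptions for illustration (in practice the names would come from `get_events`):

```python
from difflib import get_close_matches


def suggest_events(query: str, available: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` event names that resemble the query, best first."""
    lowered = {name.casefold(): name for name in available}
    hits = get_close_matches(query.casefold(), list(lowered), n=limit, cutoff=0.6)
    return [lowered[hit] for hit in hits]


available = ["Monaco Grand Prix", "British Grand Prix", "Italian Grand Prix"]
print(suggest_events("Monacco Grand Prix", available, limit=1))
# ['Monaco Grand Prix']
```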

Best Practices

Use location names or abbreviations: “Belgium”, “Spa”, “Monaco” are all recognized for events.
Don’t worry about case: Event matching is case-insensitive.
Spaces don’t matter for events: “Belgian Grand Prix” = “BelgianGrandPrix”.
Use the supported abbreviations for sessions: “FP1” not “P1”, “Q” for Qualifying.
Use exact match for validation: When you need to ensure exact event names.

from tif1.events import get_event_by_name

# Validation mode
event = get_event_by_name(2021, user_input, exact_match=True)

Check available names: Use get_events() and get_sessions() to see valid names.

from tif1.events import get_events, get_sessions

events = get_events(2021)
print(f"Valid events: {list(events)}")

sessions = get_sessions(2021, "Belgian Grand Prix")
print(f"Valid sessions: {sessions}")

Provide feedback: Show resolved name to user for confirmation.

session = tif1.get_session(2021, "Belgium", "Q")
print(f"Loaded: {session.event} - {session.session_name}")

Implementation Details

RapidFuzz Integration

tif1 uses RapidFuzz for fast fuzzy string matching. RapidFuzz is a string-similarity library with a C++ core, typically 10-100x faster than pure Python implementations.

Caching

Fuzzy match results are not cached since matching is already very fast (<1ms). The overhead of caching would exceed the matching time.

Normalization Strategy

Normalization removes spaces and converts to lowercase to maximize match success while maintaining reasonable accuracy. Special characters are preserved to distinguish similar names.

Complete Example

import tif1
from tif1.events import get_event_by_name

# Example: Load 2021 Belgian Grand Prix Race
session = tif1.get_session(2021, "Belgium", "Race")
print(f"Event: {session.event}")
print(f"Session: {session.session_name}")
# Event: Belgian Grand Prix
# Session: Race

# Using different variations
session = tif1.get_session(2021, "Spa", "R")  # Location + abbreviation
session = tif1.get_session(2021, "belgian grand prix", "race")  # Full names

# Get event object with fuzzy matching
event = get_event_by_name(2021, "Belgium")
print(f"Resolved to: {event.EventName}")
# Resolved to: Belgian Grand Prix

# Check what sessions are available
from tif1.events import get_sessions
sessions = get_sessions(2021, "Belgian Grand Prix")
print(f"Available sessions: {sessions}")
# Available sessions: ['Practice 1', 'Practice 2', 'Practice 3', 'Qualifying', 'Race']

# Load specific session
session = tif1.get_session(2021, "Belgium", "FP1")
session.load()
print(f"Loaded {len(session.laps)} laps")

Summary

Fuzzy matching makes tif1 more user-friendly by:

Accepting partial event names (“Belgium” instead of “Belgian Grand Prix”)
Accepting location names (“Spa” for “Belgian Grand Prix”)
Being case-insensitive for both events and sessions
Supporting session abbreviations via dictionary lookup (“Q”, “FP1”, “R”)
Providing exact match mode for validation
Matching quickly (<1ms per match)

Key distinction: Event names use fuzzy matching with RapidFuzz, while session names use exact dictionary lookups with predefined abbreviations. Use fuzzy matching for interactive applications and exact matching for validation or automated systems.
