Chat with data the easy way in R or Python

library(dplyr)

game_data_all <- rio::import("https://raw.githubusercontent.com/nflverse/nfldata/refs/heads/master/data/games.csv") |>
  filter(season %in% c(2024, 2025) & !is.na(result))

The load_schedules() function returns a data frame with 46 variables, covering game time, temperature, wind, playing surface, outdoor or dome, point spreads, and more. Run print(dictionary_schedules) to see a data dictionary describing all the fields.
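Because the same workflow applies in Python, here is a minimal pandas sketch of the loading step. It assumes pandas is installed. The helper name `load_games` is hypothetical and not part of any nflverse package. `pd.read_csv()` accepts either a URL or a local path. The mask mirrors the R `filter()` call: it keeps the 2024 and 2025 seasons and drops rows with no `result`, such as games not yet played.

```python
import pandas as pd

GAMES_URL = (
    "https://raw.githubusercontent.com/nflverse/nfldata/"
    "refs/heads/master/data/games.csv"
)


def load_games(source: str = GAMES_URL, seasons=(2024, 2025)) -> pd.DataFrame:
    """Read the nflverse games CSV and keep completed games in the given seasons.

    `load_games` is a hypothetical helper written for this example.
    """
    games = pd.read_csv(source)
    # Same logic as the R filter(): season is in the list AND result is not missing
    mask = games["season"].isin(seasons) & games["result"].notna()
    return games.loc[mask].reset_index(drop=True)
```

Calling `game_data_all = load_games()` downloads the full CSV and returns only the completed 2024 and 2025 games.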

Now that I have the data, I need to process it. I’m going to remove some ID fields I know I don’t want and keep everything else:

cols_to_remove <- c("old_game_id", "gsis", "nfl_detail_id", "pfr", "pff", 
                    "espn", "ftn", "away_qb_id", "home_qb_id", "stadium_id")

games <- game_data_all |>
  select(-all_of(cols_to_remove))
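In pandas, the column removal above can be sketched with `DataFrame.drop()`. The helper name `drop_id_columns` is hypothetical. By default, `drop()` raises a `KeyError` if any listed column is absent, which matches how `all_of()` errors on missing names in dplyr. `drop()` also returns a new data frame rather than modifying the original.

```python
import pandas as pd

# ID columns removed before handing the data to an LLM
COLS_TO_REMOVE = [
    "old_game_id", "gsis", "nfl_detail_id", "pfr", "pff",
    "espn", "ftn", "away_qb_id", "home_qb_id", "stadium_id",
]


def drop_id_columns(df: pd.DataFrame, cols=COLS_TO_REMOVE) -> pd.DataFrame:
    """Return a copy of df without the listed ID columns.

    Raises KeyError if any listed column is missing, like dplyr's all_of().
    """
    return df.drop(columns=list(cols))
```

If you would rather skip missing columns silently, `df.drop(columns=cols, errors="ignore")` behaves like dplyr's `any_of()` instead.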

Although it’s obvious from the scores which teams won and lost, there aren’t actually columns for the winning and losing teams. In my tests, the LLM didn’t always write appropriate SQL when I asked about winning percentages. Adding explicit team_won and team_lost columns makes the outcome unambiguous for the model and simplifies the SQL it has to write. Then, I save the results to a feather file, a fast format for either R or Python: