Scrape and analyze FBREF data with kickR.


jeffreyohene/kickR

kickR 1.0.0

Overview

kickR is a comprehensive R package designed for web scraping of football metrics from FBref. Whether you're an analyst, data scientist, or a football enthusiast, kickR provides you with the tools to access player and team statistics from various football leagues around the world. This package makes it easy to gather football metrics for your analysis and/or data vizzes.

kickR is written in the R programming language. If you have never used R before, you will first need to download and install R and RStudio for your operating system.

If you are more comfortable in other programming languages like Python, you can scrape the data you need here in R. kickR provides functions that save and export the scraped data, so you can continue your analysis or build your visualizations elsewhere.

For some examples of what you can do with kickR data, kindly check this repo out.

Features

  • Scraping Data: Easily retrieve football statistics, player data, and team metrics from websites like FBref.

  • League Coverage:

    • Premier League
    • Championship
    • Serie A
    • Ligue 1
    • La Liga
    • Segunda División
    • Serie B
    • Bundesliga
    • Eredivisie
    • Campeonato Brasileiro Série A
    • Liga MX
    • Major League Soccer
    • Primeira Liga
    • Bundesliga 2
    • Belgian Pro League
    • Ligue 2
  • Data Analysis: FBref has metrics grouped under 11 categories, which are listed below with the top metrics for each category:

    • Standard: General team metrics, xG, npxG, xG performance
    • Goalkeeping: Clean sheets, save percentage, goals conceded
    • Advanced Goalkeeping: Free kick and corner kick goals conceded, post-shot expected goals (PSxG), PSxG performance, average passing length, goal kicks breakdown, crosses faced and crosses stopped, and sweeping metrics.
    • Shooting: Goals, shots and shots on target, average shot distance, xG and npxG performance.
    • Passing: Total passes, total passing distance, total progressive passing distance, pass length breakdown (attempted, completed and completion rates of short, medium and long passes), expected assisted goals, expected assists, key passes, final third passes, and crosses and passes into the penalty area.
    • Pass Types: Switches, throw-ins, through balls, crosses, live in-play and dead passes, in- and out-swinging corner kicks and passes offside.
    • Goal & Shot Creation: Goal and shot creating actions from live and dead passes, take-ons, shots, fouls drawn and defensive actions.
    • Defensive Actions: Tackles, challenges, blocks, interceptions, clearances and errors leading to an opponent shot.
    • Possession: Touches, take-ons, carries and passes received.
    • Playing Time: Matches played, minutes played, minutes per match played, starts, points per match, goals scored and goals conceded when playing, xG performance when playing
    • Miscellaneous: Disciplinary record, fouls made and drawn, ball recoveries and offsides
  • Data Export: After scraping data, you can export it as a .rds file if you use R or as a .csv, .xlsx or .json file.

  • Additional Feature: kickR also calculates player and team similarity through the find_similar_players and find_similar_teams functions. This can help you scout players and teams who play in a certain way. Note that similarity does not directly equate to style. Two players compared on their touches in possession, touches in the middle third and carries made can be considered similar even though their playing styles differ.

  • Open Source: kickR is open-source software distributed under the MIT License.

Installation

To install, run the code below:

# Install latest development version of kickR.
if (!requireNamespace('devtools', quietly = TRUE)) {
  install.packages('devtools')
}
devtools::install_github('jeffreyohene/kickR')

How to Scrape League Stats

Here is the function syntax for scraping team data from a league:

fbref_team_stats <- function(league = NULL, season = NULL, type = NULL)

Function parameters

Leagues

Below are the leagues that kickR supports. Pass one of these values to the league argument of the function. Note that when a league is not supplied, the English Premier League is selected automatically.

  • premier_league
  • championship
  • serie_a
  • la_liga
  • ligue_1
  • segunda_division
  • serie_b
  • bundesliga
  • mls
  • eredivisie
  • br_serie_a
  • liga_mx
  • primera_liga
  • bundesliga_2
  • belgian_pro_league
  • ligue_2

Metrics

Below are the metrics available for scraping. Pass one of these values to the type parameter. When the type argument is NULL, it defaults to standard.

  • standard
  • goalkeeping
  • advanced_goalkeeping
  • shooting
  • passing
  • pass_types
  • goal_creation
  • defensive_actions
  • possession
  • playing_time
  • miscellaneous
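The default described above can be pictured with a small sketch. The helper name resolve_type and the vector metric_types are hypothetical and exist only for illustration; this is not kickR's internal code, just one way to check a value before passing it to the type parameter.

```r
# Hypothetical helper for illustration; not part of kickR.
metric_types <- c(
  "standard", "goalkeeping", "advanced_goalkeeping", "shooting", "passing",
  "pass_types", "goal_creation", "defensive_actions", "possession",
  "playing_time", "miscellaneous"
)

resolve_type <- function(type = NULL) {
  # Mirror the documented default: a NULL type falls back to "standard".
  if (is.null(type)) {
    return("standard")
  }
  # Reject anything that is not exactly one of the listed values.
  if (!is.character(type) || length(type) != 1 || !(type %in% metric_types)) {
    stop("Unknown type. Expected one of: ", paste(metric_types, collapse = ", "))
  }
  type
}

resolve_type()           # falls back to the default
resolve_type("passing")  # a listed value is returned unchanged
```

Checking the value up front makes a typo such as "pasing" fail immediately instead of after a scraping attempt.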

Season

The last parameter to cover is season. When left blank, it defaults to the current season. It should be supplied in the format YYYY/YYYY, so if you want data for the 2022 to 2023 season, supply 2022/2023 to the season argument. FBREF started collecting metrics for most leagues in 2017 or 2018, so if the function returns nothing for the league you selected, visit the website to check whether data is actually available for that season.
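As a rough illustration of the YYYY/YYYY format, the sketch below checks a season string before it is used. The function is_valid_season is hypothetical and not part of kickR, and the rule that the second year must follow the first is an assumption based on the 2022/2023 example.

```r
# Hypothetical helper for illustration; not part of kickR.
is_valid_season <- function(season) {
  # Must be a single character string.
  if (!is.character(season) || length(season) != 1) {
    return(FALSE)
  }
  # Must look like four digits, a slash, four digits.
  if (!grepl("^[0-9]{4}/[0-9]{4}$", season)) {
    return(FALSE)
  }
  # Assumption: a season spans two consecutive years, e.g. 2022/2023.
  years <- as.integer(strsplit(season, "/", fixed = TRUE)[[1]])
  years[2] == years[1] + 1
}

is_valid_season("2022/2023")
is_valid_season("2022-2023")
```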

To scrape team statistics from the available football leagues using kickR, follow these steps:

  1. Load the kickR package in your R environment.
library(kickR)
# To scrape Bundesliga goalkeeping data for 2023/2024
bundesliga_goalkeeping <- fbref_team_stats(league = "bundesliga",
                                           season = "2023/2024",
                                           type = "goalkeeping")

# Expected output as at 12/10/2023
# A tibble: 18 × 21
   club           league matches_played squad total_minutes_played mins_per_90 goals_against goals_against_per90
   <chr>          <chr>  <chr>          <chr> <chr>                <chr>       <chr>         <chr>
 1 Augsburg       2      34             34    3,060                34.0        60            1.76
 2 Bayern Munich  3      34             34    3,060                34.0        45            1.32
 3 Bochum         2      34             34    3,060                34.0        74            2.18
 4 Darmstadt 98   2      34             34    3,060                34.0        86            2.53
 5 Dortmund       2      34             34    3,060                34.0        43            1.26
 6 Eint Frankfurt 2      34             34    3,060                34.0        50            1.47
 7 Freiburg       1      34             34    3,060                34.0        58            1.71
 8 Gladbach       2      34             34    3,060                34.0        67            1.97
 9 Heidenheim     1      34             34    3,060                34.0        55            1.62
10 Hoffenheim     1      34             34    3,060                34.0        66            1.94
11 Köln           1      34             34    3,060                34.0        60            1.76
12 Leverkusen     2      34             34    3,060                34.0        24            0.71
13 Mainz 05       2      34             34    3,060                34.0        51            1.50
14 RB Leipzig     2      34             34    3,060                34.0        39            1.15
15 Stuttgart      2      34             34    3,060                34.0        39            1.15
16 Union Berlin   2      34             34    3,060                34.0        58            1.71
17 Werder Bremen  2      34             34    3,060                34.0        54            1.59
18 Wolfsburg      2      34             34    3,060                34.0        56            1.65
# ℹ 13 more variables: shots_on_target_against <chr>, saves <chr>, save_percentage <chr>, wins <chr>, draws <chr>,
#   losses <chr>, clean_sheets <chr>, clean_sheet_percentage <chr>, penalties_attempted <chr>,
#   penalty_kicks_allowed <chr>, penalty_kicks_saved <chr>, penalty_kicks_missed <chr>,
#   penalty_kicks_save_percentage <chr>
# Passing data for La Liga
# To get the latest statistics for a league instead, leave the season argument out
la_liga_passing <- fbref_team_stats(league = "la_liga",
                                    season = "2023/2024",
                                    type = "passing")

la_liga_passing
# A tibble: 20 × 26
   club            number_of_players_used mins_per_90 total_passes_completed total_passes_attempted pass_completion_perc…¹
   <chr>           <chr>                  <chr>       <chr>                  <chr>                  <chr>
 1 Alavés          30                     38.0        10091                  14116                  71.5
 2 Almería         35                     38.0        12774                  16540                  77.2
 3 Athletic Club   27                     38.0        14509                  18724                  77.5
 4 Atlético Madrid 27                     38.0        17064                  20709                  82.4
 5 Barcelona       29                     38.0        21506                  24761                  86.9
 6 Betis           35                     38.0        15310                  18979                  80.7
 7 Cádiz           34                     38.0        10763                  14990                  71.8
 8 Celta Vigo      31                     38.0        13890                  17776                  78.1
 9 Getafe          33                     38.0        10713                  15193                  70.5
10 Girona          25                     38.0        18793                  21914                  85.8
11 Granada         40                     38.0        12127                  15976                  75.9
12 Las Palmas      29                     38.0        19105                  22882                  83.5
13 Mallorca        25                     38.0        11476                  15645                  73.4
14 Osasuna         29                     38.0        12874                  17335                  74.3
15 Rayo Vallecano  26                     38.0        13282                  17459                  76.1
16 Real Madrid     27                     38.0        21794                  24691                  88.3
17 Real Sociedad   31                     38.0        15263                  19249                  79.3
18 Sevilla         35                     38.0        14628                  18599                  78.6
19 Valencia        29                     38.0        12233                  16255                  75.3
20 Villarreal      32                     38.0        15349                  18637                  82.4
# ℹ abbreviated name: ¹​pass_completion_percentage
# ℹ 20 more variables: total_passing_distance <chr>, total_progressive_distance <chr>, short_passes_completed <chr>,
#   short_passes_attempted <chr>, short_pass_completion_percentage <chr>, medium_passes_completed <chr>,
#   medium_passes_attempted <chr>, medium_pass_completion_percentage <chr>, long_passes_completed <chr>,
#   long_passes_attempted <chr>, long_pass_completion_percentage <chr>, assists <chr>, xAG <chr>, xA <chr>,
#   xag_performance <chr>, key_passes <chr>, passes_into_final_third <chr>, passes_into_penalty_box <chr>,
#   crosses_into_penalty_box <chr>, progressive_passes <chr>

kickR also supports leagues outside of Europe, like the Mexican Liga MX. Scraping them follows the same pattern as the other leagues. If you leave the season argument blank, kickR scrapes data for the current season, so if we wanted to see the latest shot and goal creation stats across clubs in the Mexican league, we could do it like this:

# Scrape latest Liga MX shot and goal creation stats
liga_mx_sca_gca <- fbref_team_stats(league = "liga_mx",
                                    type = "goal_creation")

liga_mx_sca_gca
# A tibble: 18 × 19
   club    number_of_players_used mins_per_90 shot_creating_actions shot_creating_action…¹ sca_live_passes sca_dead_passes
   <chr>   <chr>                  <chr>       <chr>                 <chr>                  <chr>           <chr>
 1 América 24                     4.0         93                    23.25                  68              8
 2 Atlas   21                     4.0         79                    19.75                  50              13
 3 Atléti  20                     4.0         80                    20.00                  58              6
 4 Cruz A  20                     4.0         120                   30.00                  89              16
 5 FC Juá… 20                     4.0         68                    17.00                  51              4
 6 Guadal  19                     4.0         95                    23.75                  73              5
 7 León    20                     4.0         87                    21.75                  60              8
 8 Mazatl  20                     4.0         73                    18.25                  53              9
 9 Monter  20                     4.0         84                    21.00                  74              2
10 Necaxa  21                     4.0         85                    21.25                  64              10
11 Pachuca 21                     4.0         85                    21.25                  54              12
12 Puebla  19                     4.0         114                   28.50                  81              14
13 Querét  22                     4.0         61                    15.25                  43              8
14 Santos  22                     4.0         46                    11.50                  26              7
15 Tijuana 21                     4.0         87                    21.75                  68              6
16 Toluca  19                     4.0         77                    19.25                  63              7
17 UANL    18                     4.0         109                   27.25                  80              10
18 UNAM    22                     4.0         112                   28.00                  71              16
# ℹ abbreviated name: ¹​shot_creating_actions_per90
# ℹ 12 more variables: sca_take_ons <chr>, sca_shots <chr>, sca_fouls <chr>, sca_defensive_actions <chr>,
#   goal_creating_actions <chr>, goal_creating_actions_per90 <chr>, gca_live_passes <chr>, gca_dead_passes <chr>,
#   gca_take_ons <chr>, gca_shots <chr>, gca_fouls <chr>, gca_defensive_actions <chr>

How to Scrape Player Data

With this version you can access player data for every available league on FBREF. Note that player data scraping works a little differently from team data scraping: since the player data tables on the site are dynamically rendered, we will use JavaScript to scrape the data. To use this function, you will need to have Mozilla Firefox installed on your computer. If you encounter any problems during scraping, use Ctrl + Shift + F10 to restart your R session, then use the function again.

For example, to scrape the EFL Championship passing data for players for the 2022/2023 season, you can use this:

# Scrape passing stats for all EFL players in the 2022/2023 season
efl_passing_players <- fbref_player_stats(season = "2022/2023",
                                          league = "championship",
                                          type = "passing")

# Expected output
efl_passing_players
# A tibble: 750 × 30
   player           nation  position club       age   birth_year mins_per_90 total_passes_completed total_passes_attempted
   <chr>            <chr>   <chr>    <chr>      <chr> <chr>      <chr>       <chr>                  <chr>
 1 Max Aarons       eng ENG DF       Norwich C  22    2000       42.8        2008                   2536
 2 Thelo Aasgaard   no NOR  FW,MF    Wigan Ath  20    2002       17.6        399                    507
 3 Nelson Abbey     eng ENG DF       Reading    18    2003       0.2         6                      9
 4 Kelvin Abrefa    eng ENG MF,DF    Reading    18    2003       1.4         20                     36
 5 Finlay Adair     eng ENG FW,MF    Preston    17    2005       0.7         3                      8
 6 Elijah Adebayo   eng ENG FW       Luton Town 24    1998       35.7        388                    625
 7 Toby Adeyemo     eng ENG MF,FW    Watford    17    2005       1.1         9                      15
 8 Albert Adomah    gh GHA  FW,MF    QPR        34    1987       14.1        250                    423
 9 Michael Adu-Poku eng ENG FW       Watford    16    2005       0.1         0                      1
10 Benik Afobe      cd COD  FW       Millwall   29    1993       10.3        127                    192
# ℹ 740 more rows
# ℹ 21 more variables: pass_completion_percentage <chr>, total_passing_distance <chr>, total_progressive_distance <chr>,
#   short_passes_completed <chr>, short_passes_attempted <chr>, short_pass_completion_percentage <chr>,
#   medium_passes_completed <chr>, medium_passes_attempted <chr>, medium_pass_completion_percentage <chr>,
#   long_passes_completed <chr>, long_passes_attempted <chr>, long_pass_completion_percentage <chr>, assists <chr>,
#   xAG <chr>, xA <chr>, xag_performance <chr>, key_passes <chr>, passes_into_final_third <chr>,
#   passes_into_penalty_box <chr>, crosses_into_penalty_box <chr>, progressive_passes <chr>
# ℹ Use `print(n = ...)` to see more rows

How to Find Similar Players

To use this function, you will need to have Firefox installed on your computer.

# extract player passing data
df <- fbref_player_stats(season = "2023/2024",
                         league = "premier_league",
                         type = "passing")

# find players similar to Martin Ødegaard in the English Premier League
m_odegaard_sim_pl <- find_similar_players(df = df,
                                          player = "Martin Ødegaard",
                                          metrics = c("key_passes", "passes_into_final_third"),
                                          formula = "euclidean",
                                          top_n = 15)

m_odegaard_sim_pl
    player                 distance
580 Martin Ødegaard         0.00000
184 Bruno Fernandes        29.54657
415 Cole Palmer            38.41875
552 James Ward-Prowse      44.41846
317 James Maddison         46.09772
220 Bruno Guimarães        46.64762
201 Morgan Gibbs-White     53.85165
313 Douglas Luiz           54.12947
316 Alexis Mac Allister    55.03635
196 Conor Gallagher        55.90170
 18 Trent Alexander-Arnold 56.63921
434 Pedro Porro            56.79789
457 Andrew Robertson       58.00000
114 Lewis Cook             60.16644
418 Lucas Paquetá          61.40033

It is usually better to work with a larger dataframe. You can use the fbref_player_stats() function to scrape player stats from as many leagues as you like and combine them with the rbind() function into one larger dataframe. A deeper pool of players helps you unearth very good players who play in lesser-known leagues. Conveniently, FBREF keeps all players in the top 5 leagues in a single table, which you can scrape with kickR using the fbref_big5_player_stats() function. If we wanted to see which players perform similarly to Martin Ødegaard in terms of key passes and passes into the final third, we could go about it like this:

# Scrape passing data for all players in top 5 leagues: Premier League, La Liga, Serie A, Bundesliga, Ligue 1
df <- fbref_big5_player_stats(season = "2023/2024",
                              type = "passing")

# find players similar to Martin Ødegaard in Europe's Top 5 Leagues using cosine similarity
m_odegaard_sim_big5_cos <- find_similar_players(df = df,
                                                player = "Martin Ødegaard",
                                                metrics = c("key_passes", "passes_into_final_third"),
                                                formula = "cosine",
                                                top_n = 15)

m_odegaard_sim_big5_cos
     player          similarity
2849 Martin Ødegaard 1.0000000
 108 Felipe Anderson 0.9999997
2429 Bernardo Silva  0.9999984
2583 Jan Thielmann   0.9999984
2660 Kacper Urbanski 0.9999984
1762 Takumi Minamino 0.9999960
 735 Ritsu Doan      0.9999940
2068 Adrià Pedrosa   0.9999928
2336 Alexis Sánchez  0.9999928
  81 Miguel Almirón  0.9999911
 547 Jordan Clark    0.9999911
1409 Grejohn Kyei    0.9999911
1036 Vincenzo Grifo  0.9999904
 191 Ridle Baku      0.9999873
2093 Ayoze Pérez     0.9999873

# find players similar to Martin Ødegaard in Europe's Top 5 Leagues using euclidean distance
m_odegaard_sim_big5_eucl <- find_similar_players(df = df,
                                                 player = "Martin Ødegaard",
                                                 metrics = c("key_passes", "passes_into_final_third"),
                                                 formula = "euclidean",
                                                 top_n = 15)

m_odegaard_sim_big5_eucl
     player              distance
2849 Martin Ødegaard      0.00000
2366 Téji Savanier       28.23119
1071 İlkay Gündoğan      28.44293
 862 Bruno Fernandes     29.54657
1195 Isco                36.61967
2027 Cole Palmer         38.41875
  63 Luis Alberto        39.84972
 361 Benjamin Bourigeaud 40.80441
1339 Joshua Kimmich      41.03657
2512 Kevin Stöger        43.01163
2743 James Ward-Prowse   44.41846
1553 James Maddison      46.09772
1063 Bruno Guimarães     46.64762
1784 Luka Modrić         47.04253
2211 Tijjani Reijnders   52.61179

As you can see, cosine similarity and Euclidean distance measure similarity in two different ways, so they return different lists of players. An article will be added to this repo's description to discuss this further, and if you have any suggestions on how to adjust it, do reach out to me.
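To make the difference concrete, here is a minimal base R sketch on made-up numbers. The three players and their values are hypothetical, and the formulas are the textbook definitions of Euclidean distance and cosine similarity; how find_similar_players() computes them internally is not confirmed here.

```r
# Toy data: hypothetical values, not scraped from FBREF.
players <- data.frame(
  player = c("A", "B", "C"),
  key_passes = c(100, 50, 90),
  passes_into_final_third = c(200, 100, 215),
  stringsAsFactors = FALSE
)
metrics <- c("key_passes", "passes_into_final_third")

m <- as.matrix(players[, metrics])                        # one row per player
target <- unlist(players[players$player == "A", metrics]) # reference player A

# Euclidean distance: straight-line gap between raw totals (smaller = closer).
euclid <- sqrt(rowSums(sweep(m, 2, target)^2))

# Cosine similarity: angle between the metric vectors (closer to 1 = more alike),
# which ignores overall volume and only compares the ratio between metrics.
cosine <- as.vector(m %*% target) / (sqrt(rowSums(m^2)) * sqrt(sum(target^2)))

data.frame(player = players$player, euclid = euclid, cosine = cosine)
```

Player B has exactly half of A's totals in the same ratio, so cosine similarity treats B as a perfect match while Euclidean distance places B far away. Player C has similar raw totals, so it is the closest by Euclidean distance. This is consistent with the cosine list above surfacing players with very different passing volumes.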

One tip: in our example, Martin Ødegaard is a midfielder, so it makes sense to filter the scraped data to include only midfielders, defenders/midfielders or forwards/midfielders. This improves the formula's ability to find similar players because the context is clearer. If you wanted to filter your dataframe for only midfielders before calling the find_similar_players() function, you could use this in base R:

# filter dataframe for only midfielders
df <- df[df$position == "MF", ]

# call similarity function again
m_odegaard_sim_big5_eucl <- find_similar_players(df = df,
                                                 player = "Martin Ødegaard",
                                                 metrics = c("key_passes", "passes_into_final_third"),
                                                 formula = "euclidean",
                                                 top_n = 15)

m_odegaard_sim_big5_eucl
    player            distance
568 Martin Ødegaard    0.00000
480 Téji Savanier     28.23119
207 İlkay Gündoğan    28.44293
231 Isco              36.61967
 13 Luis Alberto      39.84972
515 Kevin Stöger      43.01163
553 James Ward-Prowse 44.41846
311 James Maddison    46.09772
205 Bruno Guimarães   46.64762
362 Luka Modrić       47.04253
437 Tijjani Reijnders 52.61179
265 Teun Koopmeiners  54.03702
304 Douglas Luiz      54.12947
184 Angel Gomes       54.14795
 32 Maximilian Arnold 54.74486

These are the available positions for players on FBREF:

unique(df$position)
 [1] "DF"    "MF,FW" "MF"    "FW"    "FW,MF" "DF,FW" "GK"    "DF,MF" "MF,DF" "FW,DF"

So if you want to extend your midfielder search, filter for players who are primarily registered as midfielders, that is "MF", "MF,FW" and "MF,DF". To filter for multiple values in base R, you can use this snippet:

df <- df[df$position %in% c("MF", "MF,FW", "MF,DF"), ]
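An alternative sketch selects the same group by matching the start of the position string. It assumes that the first code in a combined value such as "MF,FW" is the player's primary position, which is an interpretation rather than something the scraped data states. The toy dataframe is hypothetical.

```r
# Toy data using position codes from the list above; player names are hypothetical.
players <- data.frame(
  player = c("P1", "P2", "P3", "P4", "P5"),
  position = c("MF", "MF,FW", "FW,MF", "DF,MF", "MF,DF"),
  stringsAsFactors = FALSE
)

# Keep rows whose position string begins with "MF"
# (assumption: the first listed code is the primary position).
midfielders <- players[startsWith(players$position, "MF"), ]

midfielders$position
```

This avoids listing every combination by hand, at the cost of relying on the ordering assumption above.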

Data Cleaning

This package was built on rvest, jsonlite and openxlsx. Since the first release is purely a scraping package, you will have to load dplyr into your R environment for helpful data manipulation functions, such as renaming columns or changing column data types from character to numeric.

It is also worth noting that the scraping package cleans the column names into more descriptive names for easier analysis. You can always rename the columns in your analysis workflow to what suits you best.
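As the example outputs show, every scraped column arrives as character. The sketch below converts such columns to numeric and renames one of them. It uses base R rather than dplyr only to stay dependency-free, the helper to_numeric is hypothetical, and the small dataframe imitates two rows of the Bundesliga goalkeeping output rather than being scraped.

```r
# Small stand-in for scraped output: every column is character.
raw <- data.frame(
  club = c("Augsburg", "Leverkusen"),
  total_minutes_played = c("3,060", "3,060"),
  goals_against = c("60", "24"),
  goals_against_per90 = c("1.76", "0.71"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop thousands separators, then convert to numeric.
to_numeric <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# Convert everything except the club name.
numeric_cols <- setdiff(names(raw), "club")
clean <- raw
clean[numeric_cols] <- lapply(raw[numeric_cols], to_numeric)

# Rename a column to whatever suits your workflow.
names(clean)[names(clean) == "goals_against"] <- "ga"

str(clean)
```

Removing the comma first matters: as.numeric("3,060") on its own returns NA with a warning.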

How to Use the save_table Function

The save_table function is designed to save a given data frame in various formats such as JSON, CSV, XLSX, or RDS. It offers flexibility for choosing the desired format.

Prerequisites

The save_table function saves a specified dataframe to your working directory so you can store it locally or work with it later. For example, if we want to save our La Liga passing table from earlier to make a viz that shows why Real Madrid is the most potent passing team in La Liga, we can do so with the following code.

# Initialize variables
df <- la_liga_passing
filename <- 'la_liga_passing_latest'
format <- 'csv'

save_table(df, filename, format)

# Expected output:
# Table saved as 'la_liga_passing_latest.csv'
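For readers curious about what such an export involves, here is a rough base R sketch covering only the CSV and RDS cases. The function save_table_sketch and its dir argument are hypothetical and are not kickR's implementation. The two rows reuse values from the La Liga passing output above, and the file is written to a temporary directory so nothing lands in your working directory.

```r
# Hypothetical stand-in for illustration; not kickR's save_table().
save_table_sketch <- function(df, filename, format = c("csv", "rds"), dir = getwd()) {
  format <- match.arg(format)
  path <- file.path(dir, paste0(filename, ".", format))
  switch(format,
    csv = utils::write.csv(df, path, row.names = FALSE),
    rds = saveRDS(df, path)
  )
  message("Table saved as '", basename(path), "'")
  invisible(path)
}

# Two rows taken from the La Liga passing output shown earlier.
passing <- data.frame(
  club = c("Barcelona", "Real Madrid"),
  total_passes_completed = c("21506", "21794"),
  stringsAsFactors = FALSE
)

out_dir <- tempdir()
csv_path <- save_table_sketch(passing, "la_liga_passing_latest", "csv", dir = out_dir)

# Read the file back, keeping every column as character like the scraped table.
back <- utils::read.csv(csv_path, colClasses = "character")
```

A CSV is the most portable choice if you plan to continue in Python or a spreadsheet, while RDS preserves R column types exactly.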

Dependencies

kickR relies on the following R packages:

  • jsonlite
  • rvest
  • RSelenium
  • openxlsx

Author and Maintainer

License

This package is released under the MIT License. See the LICENSE file for details.

Contributing

If you would like to contribute to this project, please check the contributions file for this package.

Reporting Issues

I regularly monitor the performance and functionality of the package's functions and release updates as needed to keep it reliable. From time to time, small updates will be released to fix bugs or comply with FBREF's scraping policy. If you encounter any issues or have suggestions for improvements, please don't hesitate to open an issue on the repo and provide as much detail as possible to help me understand and address it.

Project icon from icon8.com
