Using Python to scrape some basic player information from www.premierleague.com and then use Pandas to analyse said data.

Overview

PremiershipPlayerAnalysis

Using Python to scrape some basic player information from www.premierleague.com and then use Pandas to analyse said data. Note : My understanding is the squad data on this site can change at any time so your results might be different

Improvement : Calculate age to finer degree than just years

The was developed in Jupyter Notebook and this walkthrough willl assume you are doing the same

Once you have ran the scraping

original = pd.DataFrame(playersList) # Convert the data scraped into a Pandas DataFrame 

original.to_csv('premiershipplayers.csv') # Keep a back up of the data to save time later if required 

df2 = original.copy() # Working copy of the DataFrame (just in case) 


df2.info()


   
    
RangeIndex: 578 entries, 0 to 577
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   club         578 non-null    object
 1   name         578 non-null    object
 2   shirtNo      572 non-null    object
 3   nationality  562 non-null    object
 4   dob          562 non-null    object
 5   height       500 non-null    object
 6   weight       474 non-null    object
 7   appearances  578 non-null    object
 8   goals        578 non-null    object
 9   wins         578 non-null    object
 10  losses       578 non-null    object
dtypes: object(11)
memory usage: 49.8+ KB

   

*** A total of 578 player. ***

6 without shirt number

16 without nationality listed

16 without dob listed

78 without height listed

104 without weight listed

Cleanup Data

  1. Remove spaces and newline from dob, appearances, goals, wins and losses columns

  2. Change type of dob to date

  3. change type of appearances, goals, wins, losses to int

     df2['dob'] = df2['dob'].str.replace('\n','').str.strip(' ')
     df2['appearances'] = df2['appearances'].str.replace('\n','').str.strip(' ')
     df2['goals'] = df2['goals'].str.replace('\n','').str.strip(' ')
     df2['wins'] = df2['wins'].str.replace('\n','').str.strip(' ')
     df2['losses'] = df2['losses'].str.replace('\n','').str.strip(' ')
    
     # change type of dob, appearances, goals, wins, losses
     from datetime import  date
    
     df2['dob'] = pd.to_datetime(df2['dob'],format='%d/%m/%Y').dt.date
     df2["appearances"] = pd.to_numeric(df2["appearances"])
     df2["goals"] = pd.to_numeric(df2["goals"])
     df2["wins"] = pd.to_numeric(df2["wins"])
     df2["losses"] = pd.to_numeric(df2["losses"])
     df2['height'] = df2['height'].str[:-2]
     df2["height"] = pd.to_numeric(df2["height"])
     
     
     # Create age column
    
     today = date.today()
    
     def age(born):
         if born:
             return today.year - born.year - ((today.month, 
                                           today.day) < (born.month, 
                                                         born.day))
         else:
             return np.nan
    
     df2['age'] = df2['dob'].apply(age)
    

10 Oldest Players

    df2.sort_values('age',ascending=False).head(10)

image

10 Youngest Players

    df2.sort_values('age',ascending=True).head(10)

image

Squad Sizes

    df2.groupby(['club'])['club'].count().sort_values(ascending=False)

image

Team's Average Player Age

    plt.ylim([20, 30])
    df2.groupby(['club'])['age'].mean().sort_values(ascending=False).plot.bar()

image

Burnley appear to not only have one of the highest average player ages but also the owest number of registered players

Top 10 Premiership Appearances

    df2.sort_values('appearances',ascending=False).head(10)

image

Collective Premiership Appearances per Club

    df2.groupby(['club'])['appearances'].sum().sort_values(ascending=False)

image

    df2.groupby(['club'])['appearances'].sum().sort_values(ascending=False).plot.bar()

image

10 Tallest Playes

    df2.sort_values('height',ascending=False).head(10)

image

10 Shortest Playes

    df2.sort_values('height',ascending=True).head(10)

image

Nationality totals of Players

    pd.set_option('display.max_rows', 100)
    df.groupby(['nationality'])['club'].count().sort_values(ascending=False)

Nationality totals per club

    pd.set_option('display.max_rows', 500)
    df.groupby(['club','nationality'])['nationality'].count()
High Dimensional Portfolio Selection with Cardinality Constraints

High-Dimensional Portfolio Selecton with Cardinality Constraints This repo contains code for perform proximal gradient descent to solve sample average

Du Jinhong 2 Mar 22, 2022
Convert tables stored as images to an usable .csv file

Convert an image of numbers to a .csv file This Python program aims to convert images of array numbers to corresponding .csv files. It uses OpenCV for

711 Dec 26, 2022
collect training and calibration data for gaze tracking

Collect Training and Calibration Data for Gaze Tracking This tool allows collecting gaze data necessary for personal calibration or training of eye-tr

Pascal 5 Dec 17, 2022
Basis Set Format Converter

Basis Set Format Converter Repository for the online tool that allows you to enter a basis set in the form of text input for a variety of Quantum Chem

Manas Sharma 3 Jun 27, 2022
InDels analysis of CRISPR lines by NGS amplicon sequencing technology for a multicopy gene family.

CRISPRanalysis InDels analysis of CRISPR lines by NGS amplicon sequencing technology for a multicopy gene family. In this work, we present a workflow

2 Jan 31, 2022
CPSPEC is an astrophysical data reduction software for timing

CPSPEC manual Introduction CPSPEC is an astrophysical data reduction software for timing. Various timing properties, such as power spectra and cross s

Tenyo Kawamura 1 Oct 20, 2021
Maximum Covariance Analysis in Python

xMCA | Maximum Covariance Analysis in Python The aim of this package is to provide a flexible tool for the climate science community to perform Maximu

Niclas Rieger 39 Jan 03, 2023
COVID-19 deaths statistics around the world

COVID-19-Deaths-Dataset COVID-19 deaths statistics around the world This is a daily updated dataset of COVID-19 deaths around the world. The dataset c

Nisa Efendioğlu 4 Jul 10, 2022
Tkinter Izhikevich Neuron Model With Python

TKINTER IZHIKEVICH NEURON MODEL WITH PYTHON Hodgkin-Huxley Model It is a mathematical model for the generation and transmission of action potentials i

Rabia KOÇ 8 Jul 16, 2022
Spectacular AI SDK fuses data from cameras and IMU sensors and outputs an accurate 6-degree-of-freedom pose of a device.

Spectacular AI SDK examples Spectacular AI SDK fuses data from cameras and IMU sensors (accelerometer and gyroscope) and outputs an accurate 6-degree-

Spectacular AI 94 Jan 04, 2023
Programmatically access the physical and chemical properties of elements in modern periodic table.

API to fetch elements of the periodic table in JSON format. Uses Pandas for dumping .csv data to .json and Flask for API Integration. Deployed on "pyt

the techno hack 3 Oct 23, 2022
Statistical Rethinking: A Bayesian Course Using CmdStanPy and Plotnine

Statistical Rethinking: A Bayesian Course Using CmdStanPy and Plotnine Intro This repo contains the python/stan version of the Statistical Rethinking

Andrés Suárez 3 Nov 08, 2022
Data Intelligence Applications - Online Product Advertising and Pricing with Context Generation

Data Intelligence Applications - Online Product Advertising and Pricing with Context Generation Overview Consider the scenario in which advertisement

Manuel Bressan 2 Nov 18, 2021
Active Learning demo using two small datasets

ActiveLearningDemo How to run step one put the dataset folder and use command below to split the dataset to the required structure run utils.py For ea

3 Nov 10, 2021
Stock Analysis dashboard Using Streamlit and Python

StDashApp Stock Analysis Dashboard Using Streamlit and Python If you found the content useful and want to support my work, you can buy me a coffee! Th

StreamAlpha 27 Dec 09, 2022
Option Pricing Calculator using the Binomial Pricing Method (No Libraries Required)

Binomial Option Pricing Calculator Option Pricing Calculator using the Binomial Pricing Method (No Libraries Required) Background A derivative is a fi

sammuhrai 1 Nov 29, 2021
Project under the certification "Data Analysis with Python" on FreeCodeCamp

Sea Level Predictor Assignment You will anaylize a dataset of the global average sea level change since 1880. You will use the data to predict the sea

Bhavya Gopal 3 Jan 31, 2022
A real data analysis and modeling project - restaurant inspections

A real data analysis and modeling project - restaurant inspections Jafar Pourbemany 9/27/2021 This project represents data analysis and modeling of re

Jafar Pourbemany 2 Aug 21, 2022
We're Team Arson and we're using the power of predictive modeling to combat wildfires.

We're Team Arson and we're using the power of predictive modeling to combat wildfires. Arson Map Inspiration There’s been a lot of wildfires in Califo

Jerry Lee 3 Oct 17, 2021
Python tools for querying and manipulating BIDS datasets.

PyBIDS is a Python library to centralize interactions with datasets conforming BIDS (Brain Imaging Data Structure) format.

Brain Imaging Data Structure 180 Dec 18, 2022