An Analysis of the FIFA22 Players Dataset
Exploratory Data Analysis(EDA) is a crucial first step in the data analysis process. it invloves the use of various techniques, tools, and visualization to understand the structure, patterns, and characteristics of a data-set. The primary goal of EDA is to gain insights from the data, identify patterns and trends, detect anomalies, and prepare the data for further analysis and modeling.
The FIFA22
dataset is extracted from the popular football video games of FIFA
. Every year, there is a new game released with updated characteristics and skills for all players from all the biggest competitions in the world. This means that there are up-to-date characteristics and skills of the players that were transferred between 2015
and 2022
. The player characteristics can be found in the variables: player name
, age
, height in centimeters
, weight in kilogram
, nationality
, player
position
, preferred foot
, international reputation
, and contract duration
etc.The field player skills are divided into six subcategories, namely passing
, defending
, physical
, dribbling
, shooting
, and pace
.
For goalkeepers, the subcategories are kicking
, handling
, shooting
, reflexes
, diving
, and speed
. The subcategories all have subcategories themselves as well. Also, every player has an overall
rating and a potential
rating in the game. All ratings are in the range of 1
and 99
, where the lowest is 1
and where 99
is the highest possible score.
The information in the datasets is derived from sofifa.com, which is a well-known website with an overview of all FIFA data per year. A copy of the data-set can be downloaded from this kaggle data repositry
The aim of this article is to use EDA technique to dive into the FIFA22 data-set, derive some interesting insights from the data-set and identify anomalies.
The comon question we want to answer include:
- Who are the top fastest players in FIFA 2022?
- Which Players are paid the most?
- Who are the Top tallest players?
- Who are the Top strongest players?
- Who are the best players with Long passes?
- Who are the best defenders?
PREREQUISITE
You need to have a basic knowledge of SQl in order to follow through this guide, also you can install any of the SQL tool which we are going to be using to run the codes i.e. MySQL, MSSQL, PostgreSQL etc.
LOADING THE DATA SET
At first we need to load the dataset we are going to work with. There are several ways to load/import data set into an SQl database, for this guidei am going to use python to import the data set into PostgreSQL.
Note: other SQL work bench allow's to import directly from the workbench eg.MySQL. Since i am using Postgres where importing data set is somehow complecated, i decided to use python to import the data set into the PSQL db which is a more faster and convinience way.
To import data set into SQL using python;
The to_sql() function in Python is used to write records from a Pandas DataFrame to a SQL database. It is a very powerful function that can be used to store data in a variety of database formats.
The syntax for the to_sql() function is as follows:
import pandas as pd
player_df = pd.read_csv('player_22')
from sqlalchemy import create_engine
engine = create_engine('postgresql://postgres:mlyt09@localhost:5432/projectsql')
Here:
postgresql://
> the sql_tool,postgres:
> the user,> the password,mlyt09
@localhost:
> the host,5432
> the port,/projectsql
the database.
player_df.to_sql(name='players', con=engine,index=False if_exists='replace')
-
player_df
is the Pandas DataFrame that you want to write to the database. -
name
is the name of the table that you want to create in the database. - con is a connection object to the database.
-
if_exists
specifies what to do if the table already exists. The possible values are fail, replace, and append. -
index
specifies whether to write the index of the DataFrame to the database. -
dtype
specifies the data types of the columns in the DataFrame. -
schema
specifies the schema of the database.
Check to confirm the data is in PSQL...
projectsql=# \dt players
List of relations
Schema | Name | Type | Owner |
---|---|---|---|
public | players | table | postgres |
projectsql=# \d players
Table "public.players"
Column | Type | Collation | Nullable | Default |
---|---|---|---|---|
sofifa_id | bigint | |||
player_url | text | |||
short_name | text | |||
long_name | text | |||
year | bigint | |||
player_positions | text | |||
overall | bigint | |||
potential | bigint | |||
value_eur | double precision | |||
wage_eur | double precision | |||
age | bigint | |||
dob | date | |||
height_cm | bigint | |||
weight_kg | bigint |
100+ other columns.....
Let Start by Using SQL Queries to answer some of our quetions.
(Qr1) Check to confirm the total number of records/players in the dataset:
Note: Qr here means query, so Qr1,2,3 > Query1,2,3 etc.
SELECT COUNT(*) AS total_no_of_players FROM players;
total_no_of_players |
---|
19239 |
(Qr2) Top 10 Nationality of Players for 2022:
SELECT DISTINCT nationality_name, COUNT(nationality_name) AS total_no_of_players
FROM players GROUP BY nationality_name
ORDER BY total_no_of_players DESC LIMIT 10;
nationality_name | total_no_of_players |
---|---|
England | 1719 |
Germany | 1214 |
Spain | 1086 |
France | 980 |
Argentina | 960 |
Brazil | 897 |
Japan | 546 |
Netherlands | 439 |
United States | 413 |
Poland | 403 |
As can be noted above, we use the DISTINCT
key word to select the unique/distinct nationality of players. With the query above, using the COUNT
function we are able to count the total number of players represeting each country,
from the result we could see that england
has the highest representative in club-football world @2022.
(Qr3) Countries with the most Football Talented (by pontential) Players:
The potential of a football player is a complex and multifaceted concept. It can be defined as the player's ability to reach their full potential as a footballer. This potential is determined by a number of factors, including the player's physical attributes, technical skills, tactical awareness, mental strength, and psychological makeup.
The potential of a football player is not something that is fixed. It can be developed and improved through training, hard work, and dedication. Players who are willing to put in the hard work and dedication can reach their full potential and become world-class footballers.
SELECT sum(potential) FROM players WHERE potential > 79;
SELECT DISTINCT nationality_name, ROUND((SUM(potential)/147969)*100,2) potential
FROM players
WHERE potential > 79
GROUP BY nationality_name
ORDER BY potential DESC LIMIT 10;
nationality_name | potential |
---|---|
Spain | 13.03 |
England | 9.29 |
France | 8.85 |
Brazil | 6.13 |
Argentina | 5.95 |
Germany | 5.04 |
Portugal | 4.78 |
Netherlands | 4.30 |
Italy | 3.93 |
Belgium | 2.86 |
(10 rows) |
Age and Performance are use to rate the potential of a player, the more younger and good performance the more highly potentialy rated. These are one of the metrics club-scouts use to scout for new young and talented players to joined their project. As can be seen above, we calculated for the percentage of potiential players each county has, it's no suprise to see that players from european and south american country have the best potential players.
(Qr4) Top 10 Fastest Players:
In the following code, these atrribute listed below will be use to rank the fastest players;
movement_acceleration, movement_sprint_speed, pace,club_position.
The most important player attribute from movement_acceleration, movement_sprint_speed, and pace is movement_acceleration. This is because acceleration is the rate at which a player can change their speed. This is important for players in all positions, but it is especially important for players who play in attacking positions, such as wingers and strikers.
Movement_sprint_speed is also important, but it is not as important as movement_acceleration. This is because sprint_speed is the maximum speed that a player can reach. However, players do not need to reach their maximum speed very often. They need to be able to accelerate quickly in order to get past defenders and create scoring opportunities.
Pace is a combination of movement_acceleration and movement_sprint_speed. It is a good overall measure of a player's speed, but it is not as specific as movement_acceleration or movement_sprint_speed.
SELECT short_name, movement_acceleration,
movement_sprint_speed, pace,club_position,nationality_name
FROM players ORDER BY movement_acceleration DESC LIMIT 10;
short_name | movement_acceleration | movement_sprint_speed | pace | club_position | nationality_name |
---|---|---|---|---|---|
K. Mbappé | 97 | 97 | 97 | ST | France |
Adama Traoré | 97 | 96 | 96 | LW | Spain |
M. Diaby | 96 | 92 | 94 | RM | France |
D. James | 96 | 95 | 95 | SUB | Wales |
A. Davies | 96 | 96 | 96 | LB | Canada |
A. Hakimi | 95 | 95 | 95 | RB | Morocco |
Vinícius Jr. | 95 | 95 | 95 | SUB | Brazil |
R. Sterling | 95 | 88 | 91 | SUB | England |
U. Antuna | 95 | 90 | 92 | SUB | Mexico |
C. Ejuke | 95 | 93 | 94 | LM | Nigeria |
(10 rows)
Acceleration usually defines the rate at which someone/thing moves. together with other varibles concider we could see that the most fastest player is Kylian mbape
which is no suprising due to his high attacking game.
BONUS: Code Explanation
Let's break down the code line by line:
- SELECT is the keyword that tells the database to select specific columns from a table.
- long_name, movement_acceleration, movement_sprint_speed, pace, club_position, and nationality_name are the columns that we want to select.
- FROM is the keyword that tells the database which table to select the columns from. In this case, the table is called players.
- ORDER BY is the keyword that tells the database to order the results by the movement_acceleration column in descending order.
- DESC is the keyword that tells the database to order the results in descending order.
- LIMIT is the keyword that limits the results to the first 10 rows.
The code will return the following results:
The long_name, movement_acceleration, movement_sprint_speed, pace, club_position, and nationality_name of the 10 players with the highest movement_acceleration.
The results will be ordered by the movement_acceleration column in descending order.
(Qr5) Top 10 Strongest Players by Stregnth:
SELECT short_name, power_strength, power_stamina,physic, weight_kg, club_name
FROM players
ORDER BY power_strength DESC LIMIT 10;
short_name | power_strength | power_stamina | physic | weight_kg | club_name |
---|---|---|---|---|---|
A. Akinfenwa | 97 | 34 | 74 | 110 | Wycombe Wanderers |
T. Petrášek | 96 | 57 | 79 | 99 | Raków Częstochowa |
D. Dike | 96 | 59 | 78 | 100 | Orlando City Soccer Club |
R. Lukaku | 95 | 71 | 83 | 94 | Chelsea |
A. Seck | 95 | 73 | 86 | 95 | Royal Antwerp FC |
A. Méndez | 95 | 81 | 86 | 82 | Club Nacional de Football |
G. Kondogbia | 94 | 82 | 89 | 76 | Atlético de Madrid |
S. Coates | 94 | 80 | 87 | 92 | Sporting CP |
C. Luyindama | 94 | 73 | 85 | 91 | Galatasaray SK |
K. Koulibaly | 94 | 70 | 85 | 89 | Napoli |
(10 rows)
The strength and stamina of a players usually determines thier stability, compose and control of the ball. Players with high qualities of the aforemention usually have an edge over oposition players(defend or attack).
(Qr6) Top 10 Strongest Players by stamina:
SELECT long_name, power_strength, power_stamina,physic, weight_kg, club_name
FROM players
ORDER BY power_stamina DESC LIMIT 10
long_name | power_strength | power_stamina | physic | weight_kg | club_name |
---|---|---|---|---|---|
N'Golo Kanté | 72 | 97 | 83 | 70 | Chelsea |
Jewgienij Baszkirow | 52 | 96 | 67 | 71 | Zagłębie Lubin |
Rhyan Bert Grant | 77 | 95 | 82 | 79 | Sydney FC |
Vladimír Darida | 48 | 95 | 63 | 67 | Hertha BSC |
Andrew Robertson | 65 | 95 | 76 | 64 | Liverpool |
Nicol├▓ Barella | 69 | 95 | 78 | 68 | Inter |
奥埜 博亮 | 71 | 95 | 73 | 68 | Cerezo Osaka |
Pieter Gerkens | 58 | 94 | 71 | 72 | Royal Antwerp FC |
Didier Andrés Moreno Asprilla | 79 | 94 | 79 | 77 | Junior FC |
Denzel Justus Morris Dumfries | 89 | 94 | 89 | 80 | Inter |
(10 rows)
Best Players FIFA22 (position wise)
Who were the best players(position wise) for the FIFA22 video game, concidering variables that are related to each player position we are going to answer the following questions;
(Qr7) Best Wingers:
Looking at the dataset, we will realize both the lw(Left-Wing) and rw(Right-Wing) column are formatted as text even though they contain numbers, with the below query we could understand why.
Some of the number contain character that are not number, we will have to clean this up and convert the columns into an integer!.
--lw
SELECT lw FROM players WHERE lw LIKE '%-%';
The code above is a
SQL SELECT
statement. It is used to select all rows from theplayers
table where thelw
column contains a hyphen (-).
lw |
---|
83-1 |
83-1 |
82-1 |
81-1 |
81-1 |
...... 300+ other values
SELECT lw, SUBSTRING(lw,1,POSITION('-' in lw)-1) AS lw_cleaned
FROM players
WHERE lw LIKE '%-%';
The code above is an
SQL SELECT
statement that will return thelw
column from theplayers
table, along with a new column calledlw_cleaned
. Thelw_cleaned
column will contain the value of thelw
column, but with the hyphen (-) removed.
lw | lw_cleaned |
---|---|
83-1 | 83 |
83-1 | 83 |
82-1 | 82 |
81-1 | 81 |
81-1 | 81 |
.... 300+ other values
UPDATE players SET lw = SUBSTRING(lw,1,POSITION('-' in lw)-1) WHERE lw LIKE '%-%'
UPDATE 326
The code above is an
SQL UPDATE
statement that will update thelw
column in theplayers
table. The update will remove the hyphen (-) from thelw
column, if it is present.
Finally we need to convert the the column to integer to be able to apply any arithmetic operation.
ALTER TABLE players ALTER COLUMN lw TYPE Bigint USING lw::bigint
ALTER TABLE
The above is an
SQL ALTER TABLE
statement that will change the data type of thelw
column in the players table toBigint
.
Let's apply the above to rw(right-winger):
-- rw
SELECT rw FROM players WHERE rw LIKE '%-%'
rw |
---|
75-1 |
83-1 |
82-1 |
81-1 |
81-1 |
.....300+
SELECT rw, SUBSTRING(rw,1,POSITION('-' in rw)-1) AS rw
FROM players
WHERE rw LIKE '%-%'
rw | rw |
---|---|
75-1 | 75 |
83-1 | 83 |
82-1 | 82 |
81-1 | 81 |
81-1 | 81 |
80-1 | 80 |
80-1 | 80 |
80-1 | 80 |
UPDATE players SET rw = SUBSTRING(rw,1,POSITION('-' in rw)-1) WHERE rw LIKE '%-%'
UPDATE 326
ALTER TABLE players ALTER COLUMN rw type Bigint USING rw::bigint
ALTER TABLE
Finally we can add the values for lw & rw for each players to get the highest rating player, playing in the wing role!.
The code below is an
SQL SELECT
statement that will select specific columns from theplayers
table and order the results by the wingers column in descending order. TheLIMIT
clause limits the results to the first 10 rows.
The most important attribute of a winger from skill_dribbling, attacking_short_passing, and movement_acceleration is skill_dribbling. This is because dribbling is the ability to control the ball while moving, which is essential for wingers who are tasked with beating defenders and creating scoring opportunities.
Wingers need to be able to dribble past defenders in order to create space for themselves and their teammates. They also need to be able to dribble in tight spaces, such as in the penalty area.
Attacking_short_passing is also important for wingers, as they need to be able to pass the ball accurately and quickly to their teammates in order to create scoring opportunities. However, it is not as important as skill_dribbling, as wingers do not need to pass the ball as often as they need to dribble.
Movement_acceleration is also important for wingers, as they need to be able to accelerate quickly in order to get past defenders. However, it is not as important as skill_dribbling or attacking_short_passing, as wingers do not need to accelerate as often as they need to dribble or pass the ball.
SELECT short_name, rw + lw wingers,skill_dribbling,attacking_short_passing,
movement_acceleration,overall,age, club_name
FROM players ORDER BY wingers DESC LIMIT 10;
short_name | wingers | skill_dribbling | attacking_short_passing | movement_acceleration | overall | age | club_name |
---|---|---|---|---|---|---|---|
L. Messi | 184 | 96 | 91 | 91 | 93 | 34 | Paris Saint-Germain |
Neymar Jr | 180 | 95 | 86 | 93 | 91 | 29 | Paris Saint-Germain |
K. Mbappé | 180 | 93 | 85 | 97 | 91 | 22 | Paris Saint-Germain |
Cristiano Ronaldo | 176 | 88 | 80 | 85 | 91 | 36 | Manchester United |
M. Salah | 176 | 90 | 84 | 89 | 89 | 29 | Liverpool |
K. De Bruyne | 176 | 88 | 94 | 76 | 91 | 30 | Manchester City |
R. Sterling | 174 | 87 | 83 | 95 | 88 | 26 | Manchester City |
H. Son | 174 | 87 | 84 | 85 | 89 | 28 | Tottenham Hotspur |
S. Mané | 174 | 90 | 84 | 93 | 89 | 29 | Liverpool |
P. Dybala | 174 | 90 | 87 | 88 | 87 | 27 | Juventus |
(10 rows)
Here is an explanation of the 'rw + lw' AS 'wingers' part of the code:
- The rw + lw expression adds the values of the rw and lw columns.
- The AS wingers clause gives the new column the name wingers.
- This means that the wingers column will contain the sum of the values of the rw and lw columns. This is useful because it allows us to rank the players by their overall attacking ability, regardless of whether they are primarily right wingers, left wingers, or both.
(Qr8) Best Forwarder (Striker):
In the following code, these atrribute listed below will be use to rank the best forwarders;
shooting, attacking_finishing, attacking_heading_accuracy, skill_dribbling, power_shot_power.
The most important attribute for a striker is shooting. This is because the main job of a striker is to score goals, and the shooting attribute determines how likely they are to do so. A good striker will have a high shooting attribute, and they will be able to score goals from a variety of different positions.
The other attributes listed are also important, but they are not as important as shooting. For example, a striker with a high attacking finishing attribute may be able to score goals from close range, but if they do not have a good shooting attribute, they will not be able to score goals from other positions.
Ultimately, the best way to judge a good striker is to watch them play. However, if you are only able to look at the attributes, then the shooting attribute is the most important one to consider.
SELECT COUNT(*) FROM players WHERE shooting IS NULL;
count |
---|
2132 |
UPDATE players SET shooting = 0 WHERE shooting IS NULL;
UPDATE 2132
SELECT short_name, shooting, attacking_finishing, power_shot_power,
attacking_heading_accuracy, skill_dribbling, club_name
FROM players ORDER BY shooting DESC LIMIT 10;
short_name | shooting | attacking_finishing | power_shot_power | attacking_heading_accuracy | skill_dribbling | club_name |
---|---|---|---|---|---|---|
Cristiano Ronaldo | 94 | 95 | 94 | 90 | 88 | Manchester United |
L. Messi | 92 | 95 | 86 | 70 | 96 | Paris Saint-Germain |
R. Lewandowski | 92 | 95 | 90 | 90 | 85 | FC Bayern M├╝nchen |
E. Haaland | 91 | 94 | 94 | 69 | 78 | Borussia Dortmund |
H. Kane | 91 | 94 | 91 | 86 | 83 | Tottenham Hotspur |
L. Suárez | 90 | 93 | 89 | 84 | 83 | Atlético de Madrid |
S. Ag├╝ero | 89 | 93 | 90 | 78 | 86 | FC Barcelona |
K. Mbappé | 88 | 93 | 86 | 72 | 93 | Paris Saint-Germain |
H. Son | 87 | 88 | 88 | 68 | 87 | Tottenham Hotspur |
M. Salah | 87 | 91 | 82 | 59 | 90 | Liverpool |
(10 rows)
(Qr9) Best Midfielder:
In the following code, these atrribute listed below will be use to rank the best Midfielders;
mentality_vision, movement_balance, attacking_crossing, mentality_interceptions, defending_marking_awareness
The most important attribute to judge a midfielder is mentality_vision. This is because midfielders need to be able to see the game ahead of them and make good decisions, both offensively and defensively. A midfielder with a high mentality_vision attribute will be able to see where their teammates are and where the space is, and they will be able to make passes and crosses that create scoring opportunities.
The other attributes listed are also important, but they are not as important as mentality_vision. For example, a midfielder with a high attacking_crossing attribute may be able to make good crosses, but if they do not have a good mentality_vision attribute, they will not be able to see where their teammates are and they will not be able to make the right passes.
Ultimately, the best way to judge a midfielder is to watch them play. However, if you are only able to look at the attributes, then the mentality_vision attribute is the most important one to consider.
SELECT short_name, mentality_vision, movement_balance, attacking_crossing,
mentality_interceptions, defending_marking_awareness, club_name
FROM players ORDER BY mentality_vision DESC LIMIT 10;
short_name | mentality_vision | movement_balance | attacking_crossing | mentality_interceptions | defending_marking_awareness | club_name |
---|---|---|---|---|---|---|
L. Messi | 95 | 95 | 85 | 40 | 20 | Paris Saint-Germain |
K. De Bruyne | 94 | 78 | 94 | 66 | 68 | Manchester City |
Luis Alberto | 92 | 83 | 69 | 59 | 58 | Lazio |
P. Dybala | 91 | 94 | 82 | 42 | 32 | Juventus |
T. Kroos | 90 | 71 | 88 | 80 | 71 | Real Madrid CF |
Bruno Fernandes | 90 | 79 | 87 | 66 | 72 | Manchester United |
Neymar Jr | 90 | 84 | 85 | 37 | 35 | Paris Saint-Germain |
L. Modri─ç | 90 | 92 | 86 | 80 | 70 | Real Madrid CF |
David Silva | 90 | 89 | 83 | 50 | 58 | Real Sociedad |
Iniesta | 90 | 76 | 75 | 59 | 68 | Vissel Kobe |
(10 rows)
(Qr10) Best Defender:
In the following code, these atrribute listed below will be use to rank the best Defenders;
movement_balance, mentality_aggression, mentality_interceptions, defending_marking_awareness, defending_sliding_tackle, defending_standing_tackle
The most important attribute to judge a defender is defending_marking_awareness. This is because defenders need to be able to mark their opponents and prevent them from scoring. A defender with a high defending_marking_awareness attribute will be able to stay close to their opponents and prevent them from getting into good scoring positions.
The other attributes listed above are also important, but they are not as important as defending_marking_awareness. For example, a defender with a high mentality_aggression attribute may be able to put in hard tackles, but if they do not have a good defending_marking_awareness attribute, they will not be able to mark their opponents effectively.
SELECT short_name, defending_marking_awareness, defending_sliding_tackle, defending_standing_tackle,
mentality_interceptions, mentality_aggression, club_name
FROM players
ORDER BY defending_marking_awareness DESC LIMIT 10;
short_name | defending_marking_awareness | defending_sliding_tackle | defending_standing_tackle | mentality_interceptions | mentality_aggression | club_name |
---|---|---|---|---|---|---|
G. Chiellini | 93 | 88 | 89 | 89 | 88 | Juventus |
V. van Dijk | 92 | 86 | 92 | 90 | 83 | Liverpool |
R├║ben Dias | 90 | 85 | 89 | 85 | 92 | Manchester City |
M. Škriniar | 90 | 82 | 88 | 86 | 84 | Inter |
K. Koulibaly | 90 | 86 | 88 | 85 | 83 | Napoli |
S. Savić | 90 | 83 | 86 | 87 | 86 | Atlético de Madrid |
N. Kanté | 90 | 86 | 93 | 91 | 93 | Chelsea |
M. Hummels | 90 | 86 | 90 | 89 | 71 | Borussia Dortmund |
S. de Vrij | 89 | 85 | 87 | 87 | 77 | Inter |
Marquinhos | 89 | 89 | 89 | 88 | 81 | Paris Saint-Germain |
(10 rows)
(Qr11) Best Goalkeeper:
In the following code, these atrribute listed below will be use to rank the best Goal Keepers;
goalkeeping_reflexes, goalkeeping_handling, goalkeeping_diving, goalkeeping_kicking
The most important attribute to judge a goalkeeper is goalkeeping_reflexes. This is because goalkeepers need to be able to react quickly to save shots. A goalkeeper with a high goalkeeping_reflexes attribute will be able to make saves that most other goalkeepers would miss.
The other attributes listed are also important, but they are not as important as goalkeeping_reflexes. For example, a goalkeeper with a high goalkeeping_handling attribute may be able to catch and hold onto the ball well, but if they do not have good reflexes, they will not be able to save shots that are hit with a lot of power.
SELECT short_name, goalkeeping_reflexes, goalkeeping_handling, goalkeeping_diving,
goalkeeping_kicking,height_cm, club_name
FROM players
ORDER BY goalkeeping_reflexes DESC LIMIT 10;
short_name | goalkeeping_reflexes | goalkeeping_handling | goalkeeping_diving | goalkeeping_kicking | height_cm | club_name |
---|---|---|---|---|---|---|
M. ter Stegen | 90 | 85 | 88 | 88 | 187 | FC Barcelona |
H. Lloris | 90 | 83 | 88 | 65 | 188 | Tottenham Hotspur |
G. Donnarumma | 90 | 83 | 91 | 79 | 196 | Paris Saint-Germain |
K. Schmeichel | 90 | 78 | 84 | 80 | 189 | Leicester City |
J. Oblak | 90 | 92 | 87 | 78 | 188 | Atlético de Madrid |
Alisson | 89 | 86 | 86 | 84 | 191 | Liverpool |
K. Navas | 89 | 84 | 89 | 75 | 185 | Paris Saint-Germain |
M. Neuer | 88 | 88 | 88 | 91 | 193 | FC Bayern M├╝nchen |
Ederson | 88 | 82 | 87 | 93 | 188 | Manchester City |
T. Courtois | 88 | 89 | 84 | 74 | 199 | Real Madrid CF |
(10 rows)
(Qr12) Most Paid player(Wage in eur):
Player wages are the salaries that players receive from their clubs. They are typically paid weekly or monthly, and they can vary depending on a number of factors, including the player's age, experience, ability, and the club's financial situation.
Player wages are an important part of the economics of professional football. They can be a significant financial burden for clubs, but they are also necessary to attract and retain the best players.
Player wages can also be affected by other factors, such as the player's contract length, the player's image rights, and the player's performance.
In recent years, player wages have been rising steadily. This is due to a number of factors, including the increasing commercialization of football and the increasing competition for talent.
As a result, player wages have become a major issue for clubs. Some clubs are struggling to afford the wages of their players, and this is leading to financial problems.
SELECT COUNT(*) FROM players WHERE wage_eur IS NULL;
count |
---|
61 |
UPDATE players SET wage_eur = 0 WHERE wage_eur IS NULL;
UPDATE 61
SELECT short_name, wage_eur, value_eur, overall, club_position,nationality_name, club_name
FROM players
ORDER BY wage_eur DESC LIMIT 10;
short_name | wage_eur | value_eur | overall | club_position | nationality_name | club_name |
---|---|---|---|---|---|---|
K. De Bruyne | 350000 | 125500000 | 91 | RCM | Belgium | Manchester City |
K. Benzema | 350000 | 66000000 | 89 | CF | France | Real Madrid CF |
L. Messi | 320000 | 78000000 | 93 | RW | Argentina | Paris Saint-Germain |
Casemiro | 310000 | 88000000 | 89 | CDM | Brazil | Real Madrid CF |
T. Kroos | 310000 | 75000000 | 88 | LCM | Germany | Real Madrid CF |
R. Sterling | 290000 | 107500000 | 88 | SUB | England | Manchester City |
R. Lewandowski | 270000 | 119500000 | 92 | ST | Poland | FC Bayern M├╝nchen |
Cristiano Ronaldo | 270000 | 45000000 | 91 | ST | Portugal | Manchester United |
M. Salah | 270000 | 101000000 | 89 | RW | Egypt | Liverpool |
Neymar Jr | 270000 | 129000000 | 91 | LW | Brazil | Paris Saint-Germain |
(10 rows)
(Qr13) Top Rated Player player(Overall):
The overall rating of a player is a number that represents their overall ability in FIFA. It is calculated based on a number of factors, including their attributes, their position, and their form.
The overall rating of a player can range from 1 to 99, with 99 being the highest possible rating. Players with a high overall rating are considered to be the best players in the world.
The overall rating of a player is a dynamic measure, which means that it can change over time. The player's attributes can improve as they get more experience. Their form can also improve or decline, as a result, the overall rating of a player is a good way to track their progress and to see how they compare to other players.
SELECT short_name, overall,age,club_position,club_name
FROM players
ORDER BY overall DESC LIMIT 10;
short_name | overall | age | club_position | club_name |
---|---|---|---|---|
L. Messi | 93 | 34 | RW | Paris Saint-Germain |
R. Lewandowski | 92 | 32 | ST | FC Bayern M├╝nchen |
Cristiano Ronaldo | 91 | 36 | ST | Manchester United |
K. De Bruyne | 91 | 30 | RCM | Manchester City |
J. Oblak | 91 | 28 | GK | Atlético de Madrid |
K. Mbappé | 91 | 22 | ST | Paris Saint-Germain |
Neymar Jr | 91 | 29 | LW | Paris Saint-Germain |
H. Kane | 90 | 27 | ST | Tottenham Hotspur |
N. Kanté | 90 | 30 | RCM | Chelsea |
M. Neuer | 90 | 35 | GK | FC Bayern M├╝nchen |
(10 rows)
(Qr14) Preferred Foot of a Player:
A player's preferred foot is the foot that they use most often to kick the ball. This can have a significant impact on their playing style and their ability to perform certain skills.
For example, a player who is right-footed will typically be better at crossing the ball with their right foot and shooting with their right foot. They may also be better at dribbling with their right foot and passing with their right foot.
On the other hand, a player who is left-footed will typically be better at crossing the ball with their left foot and shooting with their left foot. They may also be better at dribbling with their left foot and passing with their left foot.
Of course, there are some players who are ambidextrous and can use both feet equally well. However, most players have a preferred foot that they use more often.
The preferred foot of a player can be determined by a number of factors, including their genetics, their training, and their playing style.
Finally, a player's playing style can also affect their preferred foot. For example, a player who is a striker will typically need to be good at shooting with both feet, as they will need to be able to score goals from both sides of the goal. However, a player who is a defender may only need to be good at using one foot, as they will typically only need to defend from one side of the field.
SELECT DISTINCT preferred_foot, COUNT(*) AS total
FROM players
GROUP BY preferred_foot
ORDER BY total DESC;
preferred_foot | total |
---|---|
Right | 14674 |
Left | 4565 |
(2 rows)
(Qr15) What is the player age distribution?:
Below, we could note that the player age distribution in soccer is typically bimodal, with a peak in the early 20s and another peak in the late 30s. This is because players tend to reach their peak physical and technical abilities in their early 20s, but they can also maintain a high level of performance into their late 30s if they stay fit and healthy.
There are a few reasons for the bimodal distribution. First, young players are often more physically gifted than older players, and they have more energy. This allows them to run more, tackle harder, and jump higher. Second, young players are often more technically gifted than older players, and they have better ball control and passing skills. This allows them to create more chances and score more goals.
However, young players often lack experience and decision-making skills. This can lead to mistakes, and it can also make them more susceptible to injuries. As players get older, they gain experience and decision-making skills. This allows them to make better decisions on the field, and it also helps them to avoid injuries.
The age distribution of players also varies depending on the position. For example, goalkeepers tend to be older than other players, because they need to have a high level of experience and decision-making skills. Defenders also tend to be older than other players, because they need to be physically strong and have a good understanding of the game.
SELECT DISTINCT age, COUNT(*) total
FROM players GROUP BY age ORDER BY total DESC;
age | total |
---|---|
21 | 1547 |
22 | 1446 |
24 | 1442 |
25 | 1394 |
23 | 1387 |
20 | 1377 |
27 | 1200 |
26 | 1197 |
29 | 1178 |
28 | 1129 |
19 | 1099 |
30 | 901 |
31 | 825 |
18 | 733 |
32 | 634 |
33 | 468 |
34 | 354 |
17 | 271 |
35 | 258 |
36 | 146 |
37 | 105 |
38 | 62 |
39 | 39 |
16 | 20 |
40 | 14 |
41 | 7 |
43 | 3 |
42 | 2 |
54 | 1 |
(29 rows)
BONUS
(Qr16) What is the current player age(Now):
ALTER TABLE players ALTER COLUMN dob TYPE DATE USING dob::DATE;
ALTER TABLE
SELECT EXTRACT(YEAR FROM AGE(NOW(), dob)) age_now FROM players LIMIT 6;
age_now |
---|
36 |
34 |
38 |
31 |
32 |
24 |
(6 rows)
ALTER TABLE players ADD age_now INT;
ALTER TABLE
UPDATE players SET age_now = EXTRACT(YEAR FROM AGE(NOW(), dob))
UPDATE 19239
SELECT short_name, age AS age_2022, age_now
FROM players
ORDER BY overall DESC LIMIT 8;
short_name | age_2022 | age_now |
---|---|---|
L. Messi | 34 | 36 |
R. Lewandowski | 32 | 34 |
J. Oblak | 28 | 30 |
K. De Bruyne | 30 | 32 |
Neymar Jr | 29 | 31 |
Cristiano Ronaldo | 36 | 38 |
K. Mbappé | 22 | 24 |
H. Kane | 27 | 30 |
(8 rows)
HIGHLIGHTS AND CONCLUSION
In this EDA of FIFA data set, we have explored a variety of features, including player age, overall rating, preferred foot, wage, and best position. We have found that there are a number of interesting trends and relationships within the data. For example, we have found that:
- Player age distribution of players is bimodal, with a peak in early 20s and another in late 30s.
- The best players playing in their role/position often have high technical abilities/qualities and can effect their team overall performance.
- The High overall rating of players of result from their form/performance and their atrribute, it can also be from how they effect their team to play better. This is a good way of tracking a player progress.
- lastly the wage of players are skyrocketing by year, this as cause major issue for football clubs. Some clubs are struggling to afford the wages of their players, and this is leading to financial problems.
Through this EDA, we have learned a number of things about Exporatory Data Analysis in SQL. For example, we have learned how to:
- Use SQL queries to explore data sets.
- Identify trends and relationships within data sets.
- Test hypotheses about data sets.
- We have also learned that EDA is a powerful tool that can be used to gain insights into data sets. By exploring data sets, we can identify trends and relationships that would not be obvious from simply looking at the data. This information can be used to make better decisions about the data, or to build better models.
The next step in this project would be to conduct more in-depth analysis of the data. This could involve using statistical tests to test our hypotheses, or building machine learning models to predict player performance.
Thanks for coming this long....
If you like what you read, consider subscribing to my newsletter
Top comments (0)