Khushi Singla

Predicting Football Player Market Value with a Simple ML Pipeline (Pandas + Scikit-Learn)

In this project, I explore how to predict football player market value using a clean and simple ML pipeline based on Python, Pandas, Seaborn, and Scikit-Learn.

📌 Full code and notebook available on GitHub:
👉 https://github.com/KhushiSingla-tech/Football-player-price-pridiction


Dataset & Setup

We start by loading a local data.csv.

import numpy as np 
import matplotlib.pyplot as plt 
import pandas as pd
import seaborn as sns

dataset = pd.read_csv('data.csv')

First look:

dataset.head()
dataset.columns
dataset.describe()
dataset.shape
dataset.dtypes
dataset['nationality'].value_counts()

Tip: keep an eye on data types and missing values. position_cat should already be numeric in this workflow—if it weren’t, we’d need to encode it first.
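
To make that concrete, here is a minimal check I'd run right after loading (column names assumed from data.csv; the one-hot step is a hypothetical fallback that only applies if position_cat turns out to be a string column):

# Missing values per column, and the dtype of position_cat
print(dataset.isnull().sum())
print(dataset['position_cat'].dtype)

# Hypothetical fallback: if position_cat were stored as text,
# encode it before modeling
if dataset['position_cat'].dtype == 'object':
    dataset = pd.get_dummies(dataset, columns=['position_cat'], drop_first=True)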


Quick EDA

Below are the core visuals I generated, with a placeholder after each snippet for the corresponding screenshot from the notebook; the captions match the chart titles to stay consistent. ("Top 50" here simply means the first 50 rows of the dataset.)

1. Name vs Age (top 50)

plt.figure(figsize=(10,6))
graph = sns.barplot(x='name', y='age', data=dataset[:50], palette="rocket")
graph.set(xlabel="Name", ylabel="Age", title="Name VS Age")
graph.set_xticklabels(graph.get_xticklabels(), rotation=90)
sns.set_context('talk'); sns.despine(); plt.show()

Name vs Age

2. Members per Club

plt.figure(figsize=(10,6))
graph = sns.countplot(x='club', data=dataset, palette="vlag")
graph.set(xlabel="Club", ylabel="Member", title="Members per club")
graph.set_xticklabels(graph.get_xticklabels(), rotation=90)
sns.set_context('talk'); sns.despine(); plt.show()

Members per Club

3. Name vs Market Value (top 50)

plt.figure(figsize=(16,6))
graph = sns.barplot(x='name', y='market_value', data=dataset[:50], palette="colorblind")
graph.set(xlabel="Name", ylabel="Market Value", title="Name VS Market Value")
graph.set_xticklabels(graph.get_xticklabels(), rotation=90)
sns.set_context('notebook'); sns.despine(); plt.show()

Name vs Market Value

4. Name vs Position Category (top 50)

plt.figure(figsize=(16,6))
graph = sns.pointplot(x='name', y='position_cat', data=dataset[:50], palette="deep")
graph.set(xlabel="Name", ylabel="Position category", title="Name VS Position Category")
graph.set_xticklabels(graph.get_xticklabels(), rotation=90)
sns.set_context('talk'); sns.despine(); plt.show()

Name vs Position

5. Name vs Region (top 50)

plt.figure(figsize=(16,6))
graph = sns.pointplot(x='name', y='region', data=dataset[:50], palette="rocket")
graph.set(xlabel="Name", ylabel="Region", title="Name VS Region")
graph.set_xticklabels(graph.get_xticklabels(), rotation=90)
sns.set_context('poster'); sns.despine(); plt.show()

Name vs Region

6. Players by Nationality

plt.figure(figsize=(20,6))
graph = sns.countplot(x='nationality', data=dataset, palette="muted")
graph.set(xlabel="Nationality", ylabel="Players", title="No. of players amoung different nationality")
graph.set_xticklabels(graph.get_xticklabels(), rotation=90)
sns.set_context('paper'); sns.despine(); plt.show()

Players by Nationality

7. Players by Region

graph = sns.countplot(x='region', data=dataset, palette="vlag")
graph.set(xlabel="Region", ylabel="Players", title="No. of players amoung various regions")
sns.set_context('paper'); sns.despine(); plt.show()

Players by Region

8. Name vs FPL Points (top 50)

plt.figure(figsize=(16,6))
graph = sns.barplot(x='name', y='fpl_points', data=dataset[:50], palette="pastel")
graph.set(xlabel="Name", ylabel="FPL Points", title="Name VS FPL points")
graph.set_xticklabels(graph.get_xticklabels(), rotation=90)
sns.set_context('poster'); sns.despine(); plt.show()

Name vs FPL Points

9. Name vs FPL Value (top 50)

plt.figure(figsize=(16,6))
graph = sns.pointplot(x='name', y='fpl_value', data=dataset[:50], palette="dark")
graph.set(xlabel="Name", ylabel="FPL Value", title="Name VS FPL value")
graph.set_xticklabels(graph.get_xticklabels(), rotation=90)
sns.set_context('notebook'); sns.despine(); plt.show()

Name vs FPL Value

10. New Foreign (Count)

graph = sns.countplot(x='new_foreign', data=dataset, palette="dark")
graph.set(xlabel="New Foreign", ylabel="Amount", title="How many are new signing from a different league")
sns.set_context('notebook'); sns.despine(); plt.show()

New Foreign (Count)

11. New Foreign (By Name)

plt.figure(figsize=(20,6))
graph = sns.pointplot(x='name', y='new_foreign', data=dataset[:100], palette="dark")
graph.set(xlabel="Name", ylabel="New Foreign", title="Whether a new signing from a different league")
graph.set_xticklabels(graph.get_xticklabels(), rotation=90)
sns.set_context('notebook'); sns.despine(); plt.show()

New Foreign (By Name)

12. New Signing (Count)

graph = sns.countplot(x='new_signing', data=dataset, palette="rocket")
graph.set(xlabel="New Signing", ylabel="Amount", title="How many are new signing ")
sns.set_context('notebook'); sns.despine(); plt.show()

New Signing (Count)

13. New Signing (By Name)

plt.figure(figsize=(20,6))
graph = sns.pointplot(x='name', y='new_signing', data=dataset[:100], palette="bright")
graph.set(xlabel="Name", ylabel="New Signing", title="Whether a new signing")
graph.set_xticklabels(graph.get_xticklabels(), rotation=90)
sns.set_context('notebook'); sns.despine(); plt.show()

New Signing (By Name)


Feature Selection

For modeling, I use the following five predictors:

dataset = pd.read_csv('data.csv')  # re-load a clean copy for modeling
X = dataset[['age', 'fpl_value', 'fpl_points', 'page_views', 'position_cat']]
Y = dataset['market_value']

Why these?

  • age – price typically varies with age/prime years.
  • fpl_value, fpl_points – performance and fantasy value often correlate with perceived market value.
  • page_views – a soft proxy for popularity/visibility.
  • position_cat – price dynamics differ by position. (A quick correlation check below sanity-checks these choices.)
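
A minimal sanity check for these choices (assuming the column names above exist in data.csv) is to look at how each predictor correlates with the target:

cols = ['age', 'fpl_value', 'fpl_points', 'page_views', 'position_cat', 'market_value']
# Pearson correlation of every chosen feature with market_value
print(dataset[cols].corr()['market_value'].sort_values(ascending=False))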

Train/Test Split + Scaling

from sklearn.model_selection import train_test_split 
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

from sklearn.preprocessing import StandardScaler 
sc_X = StandardScaler() 
X_train = sc_X.fit_transform(X_train) 
X_test = sc_X.transform(X_test)
  • Why scaling? Standardization puts all features on a comparable scale, which makes the coefficients easier to compare and matters even more once you move to regularized or gradient-based models.
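
A pattern worth knowing (not what this notebook does; just a sketch on the raw, unscaled features) is to chain the scaler and the model in a scikit-learn Pipeline, so the scaler is only ever fit on training data:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Re-split to get unscaled features, then bundle scaler + model so the
# scaler is fit on the training portion only (no leakage into the test split)
X_tr, X_te, y_tr, y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_tr, y_tr)
print(pipe.score(X_te, y_te))  # R² on the held-out split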

Model: Linear Regression

from sklearn.linear_model import LinearRegression 
regressor = LinearRegression() 
regressor.fit(X_train, Y_train)
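
To see which features the fitted model actually leans on, you can inspect the learned coefficients (a quick sketch; the index list just restores the column names lost when X_train became a NumPy array):

# Coefficients are on the standardized scale, so their magnitudes are roughly comparable
coef = pd.Series(regressor.coef_,
                 index=['age', 'fpl_value', 'fpl_points', 'page_views', 'position_cat'])
print(coef.sort_values(ascending=False))
print("Intercept:", regressor.intercept_)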

Make predictions on the test set:

Y_pred = regressor.predict(X_test)
df = pd.DataFrame({'Actual': Y_test, 'Predicted': Y_pred})
df.head()

Linear Regression

To see predictions across the full dataset:

X1 = sc_X.transform(X) 
Y_pred1 = regressor.predict(X1)

Output (truncated; the full prediction array is in the notebook on GitHub):

array([ 6.28665327e+01,  4.63344124e+01,  1.71999185e+01,  2.70542543e+01,
        1.67576333e+01,  2.31341678e+01,  3.06614290e+01,  1.26441319e+01,
        ...
        5.66246456e+00,  6.41619652e+00,  6.22417826e+00,  3.21996267e+00,
        5.11293760e+00])
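
The raw array is hard to read on its own; a small sketch (reusing the name and market_value columns from above) attaches the predictions back to player names so they can be compared with the listed values:

# Side-by-side view of listed vs. predicted market value, highest predictions first
results = dataset[['name', 'market_value']].copy()
results['predicted_value'] = Y_pred1
print(results.sort_values('predicted_value', ascending=False).head(10))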

Evaluation: 10-Fold Cross-Validation

from sklearn.model_selection import cross_val_score

# For a regressor, cross_val_score uses the estimator's default .score(),
# so each of the 10 values is an R² score (not classification accuracy)
scores = cross_val_score(estimator=regressor, X=X_train, y=Y_train, cv=10)
print(scores.mean())
print(scores.std())
  • Metric: By default, cross_val_score with LinearRegression uses the estimator's .score() method, which is R² (the coefficient of determination).

  • Report:

    • CV Mean R²: {{CV_MEAN_R2}}
    • CV Std: {{CV_STD_R2}}

Metrics

If you prefer error metrics, import mean_absolute_error, mean_squared_error, and r2_score from sklearn.metrics and compute MAE, RMSE, and R² on the held-out test set.
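
A minimal sketch using the Y_test and Y_pred computed earlier:

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np

mae = mean_absolute_error(Y_test, Y_pred)
rmse = np.sqrt(mean_squared_error(Y_test, Y_pred))  # RMSE = sqrt(MSE)
r2 = r2_score(Y_test, Y_pred)
print(f"MAE: {mae:.2f}  RMSE: {rmse:.2f}  R²: {r2:.3f}")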


What I Learned

  • Even a small, clean feature set can drive a reasonable baseline.
  • fpl_value and fpl_points usually show strong signal; if you have access to richer performance data (minutes, xG, assists/90, age-curve features), add them.
  • page_views captures attention, which influences pricing; try other popularity proxies.
  • Consider regularized models (Ridge/Lasso) or tree ensembles (RandomForest, XGBoost) and compare CV scores (see the sketch after this list).
  • Plot residuals vs. predicted to check for systematic under/over-valuation (especially on very high-value players).
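
As a starting point for that comparison, here is a hedged sketch (same X and Y as before; the alpha values and n_estimators are arbitrary defaults to tune, not recommendations):

from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor

models = {
    'Linear': make_pipeline(StandardScaler(), LinearRegression()),
    'Ridge': make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    'Lasso': make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    'RandomForest': RandomForestRegressor(n_estimators=200, random_state=0),
}

# 10-fold cross-validated R² for each candidate model
for name, model in models.items():
    scores = cross_val_score(model, X, Y, cv=10, scoring='r2')
    print(f"{name:<12} mean R2: {scores.mean():.3f} (std {scores.std():.3f})")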

Follow-Up Questions

I’d love to hear your thoughts!

  • Which additional features would you include to improve prediction accuracy?
  • Do you think football player value is more influenced by performance or popularity metrics?
  • Would you like to see a version of this project using RandomForest/XGBoost?
  • Should I deploy this model as an interactive web app where you can enter player stats and get predictions?

Feel free to comment below — I’d love to discuss and expand this project further!


Connect With Me

Let’s learn and build cool data projects together!
