Kaggle Diary: MITSUI&CO. Commodity Prediction Challenge: Day 1

#kaggle #machinelearning #prophet

Competition URL

https://www.kaggle.com/competitions/mitsui-commodity-prediction-challenge/rules#7-competition-data

Overview

The objective of this competition is to develop accurate and stable prediction models for commodity prices, which seems relatively straightforward. The data is lightweight, and unlike recent competitions that require a high-spec PC just to participate, this one felt like something I could actually join, so I decided to give it a try.

Data

The dataset consists of multiple time series obtained from financial markets worldwide, covering financial products such as metals, futures, US stocks, and foreign exchange. The markets include:

LME: London Metal Exchange
JPX: Japan Exchange Group
US: Various US securities exchanges
FX: Foreign Exchange

Key features:

1,977 columns - quite a large-scale dataset
4 markets: LME (London metals), JPX (Japan), US (US stocks), FX (foreign exchange)
424 targets: Single commodity returns and differences between commodity pairs
1-4 day lags: Labels are split into test_labels_lag_[1-4].csv; the lags need to be taken into account because of financial institution holidays and processing time
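To make the two kinds of targets more concrete, here is a minimal sketch on synthetic prices. The column names (`LME_A`, `LME_B`), the use of log returns, and the helper `lagged_log_return` are my own assumptions for illustration; this is not the competition's official target formula.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices for two hypothetical instruments (not real competition data)
prices = pd.DataFrame({
    "LME_A": [100.0, 102.0, 101.0, 105.0, 107.0, 106.0],
    "LME_B": [50.0, 50.5, 51.0, 50.0, 52.0, 53.0],
})


def lagged_log_return(series: pd.Series, lag: int) -> pd.Series:
    """Log return from day t to day t + lag (assumed definition)."""
    return np.log(series.shift(-lag) / series)


lag = 2  # one of the 1-4 day lags

# Target type 1: return of a single commodity
single_target = lagged_log_return(prices["LME_A"], lag)

# Target type 2: difference between the returns of a commodity pair
pair_target = single_target - lagged_log_return(prices["LME_B"], lag)

targets = pd.DataFrame({"single": single_target, "pair": pair_target})
print(targets)
```

The last `lag` rows are NaN because the future price needed for the return is not yet known, which is one way to see why the labels have to be handled per lag.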

Prediction

Apparently, the leaderboard shown during the competition isn't very useful, because the final evaluation is based on how well predictions match the actual values observed after the competition ends.

Code Reading

I read through a public notebook shared by maverick_ss_26.

Examining Code to Verify if the Leaderboard is Really Invalid

https://www.kaggle.com/code/maverickss26/commodity-price-prediction-v1

Preprocessing and Other Tips

Check data quality by summing null values: .isnull().sum()
Display missing values using histograms
Use Prophet time series model
Express price volatility using standard deviation
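The data-quality and volatility tips above can be sketched in a few lines of pandas. The frame below is synthetic, and the column names and the 3-day window are assumptions of mine, not values taken from the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic frame with a few gaps (stand-in for the training data)
df = pd.DataFrame({
    "price_a": [100.0, 101.0, np.nan, 103.0, 102.0, 104.0],
    "price_b": [50.0, np.nan, np.nan, 51.0, 52.0, 51.5],
})

# Data quality: number of missing values per column
null_counts = df.isnull().sum()
print(null_counts)

# With matplotlib installed, the distribution of missing counts
# could be drawn as a histogram via: null_counts.plot.hist()

# Volatility: rolling standard deviation of daily returns
returns = df["price_a"].ffill().pct_change()
volatility = returns.rolling(window=3).std()
print(volatility)
```

Forward-filling before computing returns is just one possible way to handle gaps; whether it is appropriate depends on why the values are missing.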

About Prophet Model Options

changepoint_prior_scale=0.05,  # Trend change flexibility (smaller = stiffer, more stable trend); 0.05 is Prophet's default
interval_width=0.95  # Width of the uncertainty interval (yhat_lower to yhat_upper); 95% here, the default is 0.80

General Example of Prophet Usage

from prophet import Prophet
import pandas as pd

# Data preparation (required: ds and y columns)
df = pd.DataFrame({
    'ds': ['2024-01-01', '2024-01-02', '2024-01-03'],  # Dates
    'y': [100, 120, 110]  # Values to predict
})

# Create and train model
model = Prophet()
model.fit(df)

# Create future dates
future = model.make_future_dataframe(periods=7)  # 7 days ahead

# Execute prediction
forecast = model.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']])

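As shown, Prophet takes only a few lines to use.
It appears that the test data overlaps with the training data, which means a high leaderboard score does not show that a model can actually forecast unseen prices. In that sense, the leaderboard is meaningless.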
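A quick way to check this kind of overlap is to compare the keys of the two tables. The sketch below uses synthetic frames, and `date_id` is a hypothetical key column that I am assuming for illustration.

```python
import pandas as pd

# Synthetic stand-ins for the train and test tables; `date_id` is a hypothetical key
train = pd.DataFrame({"date_id": range(0, 10), "price_a": range(100, 110)})
test = pd.DataFrame({"date_id": range(7, 12), "price_a": range(107, 112)})

# Rows in test whose date_id also appears in train
overlap = test[test["date_id"].isin(train["date_id"])]
overlap_ratio = len(overlap) / len(test)

print(f"{len(overlap)} of {len(test)} test rows also appear in train ({overlap_ratio:.0%})")
```

If the ratio is high, a model can score well simply by memorizing training rows, so the score says little about forecasting ability.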
