Competition URL
https://www.kaggle.com/competitions/mitsui-commodity-prediction-challenge/rules#7-competition-data
Overview
The objective of this competition is to build accurate and stable models for predicting commodity prices, which sounds relatively straightforward. The data is lightweight, and unlike recent competitions that require a high-spec PC just to participate, this felt like one I could actually join, so I decided to give it a try.
Data
The dataset consists of multiple time series from financial markets worldwide, covering instruments such as metals, futures, US stocks, and foreign exchange. The markets are:
- LME: London Metal Exchange
- JPX: Japan Exchange Group
- US: Various US securities exchanges
- FX: Foreign Exchange
Key features:
- 1,977 columns: quite a large number of features
- 4 markets: LME (London metals), JPX (Japan), US (US stocks), FX (foreign exchange)
- 424 targets: single-commodity returns and differences between commodity pairs
- 1-4 day lags: test labels are split into test_labels_lag_[1-4].csv
- Lags must be taken into account because of financial-institution holidays and data processing time
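To make the target description more concrete, here is a minimal sketch that assumes a target is either the log return of one asset over lag days or the difference between the log returns of two assets over the same horizon. The official formula is in the competition's data description and may differ (for example in how the window is offset); the asset names and prices below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for two assets (illustrative numbers, not competition data)
prices = pd.DataFrame(
    {
        "asset_a": [100.0, 101.0, 102.0, 101.5, 103.0, 104.0, 103.5],
        "asset_b": [50.0, 50.5, 50.2, 50.8, 51.0, 51.5, 51.2],
    }
)


def log_return(series: pd.Series, lag: int) -> pd.Series:
    """Log return over `lag` rows: log(p_t / p_{t-lag}). Assumed definition."""
    return np.log(series / series.shift(lag))


def pair_target(df: pd.DataFrame, a: str, b: str, lag: int) -> pd.Series:
    """Assumed pair target: difference of the two assets' lag-day log returns."""
    return log_return(df[a], lag) - log_return(df[b], lag)


# One column per lag (1-4 days); the first `lag` rows are NaN because
# there is no earlier price to compare against.
targets = pd.DataFrame(
    {f"lag_{k}": pair_target(prices, "asset_a", "asset_b", k) for k in range(1, 5)}
)
print(targets)
```

Longer lags leave more leading NaN rows, which is one reason the lagged label files have to be aligned carefully with the feature dates.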
Prediction
Apparently, the public leaderboard isn't very useful, because final scoring is based on how well predictions match the actual values observed after the competition ends.
Code Reading
Reading public code shared by maverick_ss_26.
Examining the Code to Verify Whether the Leaderboard Is Really Invalid
https://www.kaggle.com/code/maverickss26/commodity-price-prediction-v1
Preprocessing and Other Tips
- Check data quality by counting null values per column with .isnull().sum()
- Visualize missing values with histograms
- Use the Prophet time-series model
- Measure price volatility with the standard deviation
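The null-count and volatility tips can be sketched with pandas. This assumes a wide DataFrame with one column per instrument; the column names and prices are made up, and the 3-day rolling window is an arbitrary choice for illustration. The per-column missing ratio is the kind of series one would pass to a histogram.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with gaps, e.g. from market holidays (illustrative data only)
df = pd.DataFrame(
    {
        "LME_copper": [8500.0, np.nan, 8520.0, 8600.0, 8580.0, 8650.0],
        "JPX_gold": [9000.0, 9010.0, np.nan, np.nan, 9050.0, 9040.0],
    }
)

# Data-quality check: number of missing values per column
null_counts = df.isnull().sum()
print(null_counts)

# Share of missing values per column (a candidate input for a histogram)
null_ratio = df.isnull().mean()
print(null_ratio)

# Volatility: forward-fill gaps, take daily log returns, then a rolling std.
# The first 3 rows are NaN: row 0 has no return, and the window needs 3 values.
returns = np.log(df.ffill()).diff()
volatility = returns.rolling(window=3).std()
print(volatility)
```

Forward-filling before computing returns treats a holiday as "no price change"; other choices (dropping those dates, interpolating) are equally reasonable and will change the volatility estimate.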
About Prophet Model Options
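- changepoint_prior_scale=0.05: trend change flexibility (smaller = more stable trend)
- interval_width=0.95: width of the prediction interval (95%)
Both are passed to the constructor, e.g. Prophet(changepoint_prior_scale=0.05, interval_width=0.95).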
General Example of Prophet Usage
```python
from prophet import Prophet
import pandas as pd

# Data preparation (required: ds and y columns)
df = pd.DataFrame({
    'ds': ['2024-01-01', '2024-01-02', '2024-01-03'],  # Dates
    'y': [100, 120, 110]  # Values to predict
})

# Create and train model
model = Prophet()
model.fit(df)

# Create future dates
future = model.make_future_dataframe(periods=7)  # 7 days ahead

# Execute prediction
forecast = model.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']])
```
As the example shows, Prophet is quite easy to use.
It appears that the test data overlaps with the training data, which would make the public leaderboard meaningless.
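One way to check this claim is to count how many test dates also appear in the training data. If they overlap, the labels for those dates are already known, so a high public score says little about real forecasting skill. This sketch assumes both files share a date_id key column; the frames below are synthetic stand-ins for the actual training and test files.

```python
import pandas as pd

# Synthetic stand-ins for the training and test files. The "date_id" key is an
# assumption here; check the real column names in the competition data.
train = pd.DataFrame({"date_id": range(0, 10), "price": range(100, 110)})
test = pd.DataFrame({"date_id": range(7, 12)})


def date_overlap(train_df: pd.DataFrame, test_df: pd.DataFrame, key: str = "date_id") -> set:
    """Return the key values that appear in both the training and test data."""
    return set(train_df[key]).intersection(test_df[key])


shared = date_overlap(train, test)
share_of_test = len(shared) / test["date_id"].nunique()
print(sorted(shared), f"{share_of_test:.0%} of test dates are also in train")
```

With the real files, a share close to 100% would support the view that the public leaderboard mostly measures recall of training labels.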