Thu Kha Kyawe

Day 4: Create a Standard ML Project Structure

Lab Information

A colleague has started a new ML project at /root/code/fraud-detection/, but the layout does not match the xFusionCorp Industries standard. Bring the project in line with the team's conventions.

Inspect the existing project at /root/code/fraud-detection/.

The final layout must match the tree below exactly:

fraud-detection/
├── data/
│ ├── raw/
│ └── processed/
├── models/
├── notebooks/
├── src/
│ ├── data/
│ ├── features/
│ ├── models/
│ └── utils/
├── tests/
├── configs/
├── requirements.txt
└── README.md

Every subdirectory under src/ must contain an __init__.py file so that Python recognises it as a package.

requirements.txt must list the following dependencies, one per line: scikit-learn, pandas, numpy, and mlflow. The canonical PyPI name for the scikit-learn package is scikit-learn.

README.md must begin with the heading # fraud-detection.

Review the existing project and correct everything that does not match the requirements above.

Lab Solutions

🧭 Part 1: Lab Step-by-Step Guidelines

Run the following commands on the controlplane host.

Step 1 — Move into the project directory

cd /root/code/fraud-detection

Step 2 — Inspect the current structure

tree

If tree is unavailable:

sudo apt update && sudo apt install tree

Step 3 — Fix the directory structure

Compare the tree output from Step 2 with the required layout, then fix each mismatch.

1. Rename incorrect directories
mv src/feature src/features
mv src/util src/utils

2. Create missing directories
mkdir -p data/raw
mkdir -p data/processed
mkdir -p tests
mkdir -p configs

3. Verify that __init__.py still exists

Check:

ls src/features
ls src/utils

Each listing should include:

__init__.py
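
The renames, directory creation, and __init__.py check above can also be done in one pass that is safe to re-run. This is a minimal sketch, not the lab's official solution. PROJECT_DIR is a hypothetical variable introduced here; it defaults to a scratch directory so the script can be tried safely, and it would need to point at /root/code/fraud-detection to act on the real project. The sketch also assumes that creating an empty __init__.py where one is missing is acceptable.

```shell
#!/usr/bin/env bash
set -euo pipefail

# PROJECT_DIR is an assumption for illustration; by default it is a scratch directory.
PROJECT_DIR="${PROJECT_DIR:-$(mktemp -d)/fraud-detection}"
mkdir -p "$PROJECT_DIR"
cd "$PROJECT_DIR"

# Rename the misnamed packages only if the old name exists and the new one does not.
for pair in "src/feature:src/features" "src/util:src/utils"; do
  old="${pair%%:*}"
  new="${pair##*:}"
  if [ -d "$old" ] && [ ! -e "$new" ]; then
    mv "$old" "$new"
  fi
done

# Create every directory from the required tree; -p makes this safe to re-run.
mkdir -p data/raw data/processed models notebooks tests configs \
         src/data src/features src/models src/utils

# Make sure each src/ subpackage has an __init__.py.
# touch creates the file if missing and leaves existing content untouched.
for pkg in data features models utils; do
  touch "src/$pkg/__init__.py"
done

# List the resulting directories for a quick visual check.
find . -type d | sort
```

Because every command is guarded or idempotent, running the script a second time changes nothing apart from file timestamps.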

Step 4 — Verify and fix requirements.txt

Check the current contents:

cat requirements.txt

Output:

sklearn
pandas
numpy

Create the correct requirements.txt:

cat > requirements.txt <<EOF
scikit-learn
pandas
numpy
mlflow
EOF
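
After rewriting the file, it can help to confirm that each required name appears on its own line and that the old sklearn entry is gone. The sketch below is one way to do that, under stated assumptions: WORK_DIR and REQ_FILE are hypothetical names, and the script writes to a scratch directory rather than the real project so it can be run anywhere.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch location; in the lab, REQ_FILE would be ./requirements.txt.
WORK_DIR="$(mktemp -d)"
REQ_FILE="$WORK_DIR/requirements.txt"

cat > "$REQ_FILE" <<'EOF'
scikit-learn
pandas
numpy
mlflow
EOF

problems=0

# -F: fixed string, -x: match the whole line, -q: quiet (exit status only).
for pkg in scikit-learn pandas numpy mlflow; do
  if grep -Fxq -- "$pkg" "$REQ_FILE"; then
    echo "ok: $pkg"
  else
    echo "missing: $pkg"
    problems=$((problems + 1))
  fi
done

# The incorrect name from the original file must not remain.
if grep -Fxq -- "sklearn" "$REQ_FILE"; then
  echo "unexpected entry: sklearn"
  problems=$((problems + 1))
fi

echo "problems: $problems"
```

Whole-line matching matters here: a plain substring search for sklearn would not match scikit-learn, but a search for learn would, so -x avoids false positives of that kind.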

Step 5 — Verify and fix README.md

Check the current contents:

cat README.md

Output:

# Fraud

ML project for fraud detection at xFusionCorp Industries.

Step 6 — Replace the README

Run:

cat > README.md <<EOF
# fraud-detection

ML project for fraud detection at xFusionCorp Industries.
EOF
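
Since the requirement is about the first line of the file specifically, a quick check on that line alone can catch a wrong heading. The following is a small sketch under assumptions: WORK_DIR and README_FILE are hypothetical names, and the file is written to a scratch directory instead of the real project.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch location; in the lab, README_FILE would be ./README.md.
WORK_DIR="$(mktemp -d)"
README_FILE="$WORK_DIR/README.md"

cat > "$README_FILE" <<'EOF'
# fraud-detection

ML project for fraud detection at xFusionCorp Industries.
EOF

# Compare only the first line against the exact required heading.
first_line="$(head -n 1 "$README_FILE")"
if [ "$first_line" = "# fraud-detection" ]; then
  readme_status="ok"
else
  readme_status="mismatch"
fi

echo "README heading: $readme_status"
```

The comparison is an exact string match, so a heading such as # Fraud or # Fraud-Detection would be reported as a mismatch.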

Step 7 — Verify the final structure and README.md content

Run:

tree
cat README.md 

Expected structure:

fraud-detection/
├── data/
│ ├── raw/
│ └── processed/
├── models/
├── notebooks/
├── src/
│ ├── data/
│ │ └── __init__.py
│ ├── features/
│ │ └── __init__.py
│ ├── models/
│ │ └── __init__.py
│ └── utils/
│   └── __init__.py
├── tests/
├── configs/
├── requirements.txt
└── README.md

root@controlplane ~/code/fraud-detection via 🐍 v3.12.3 ➜  cat README.md 
# fraud-detection

ML project for fraud detection at xFusionCorp Industries.

🧠 Part 2: Simple Beginner-Friendly Explanation

This lab focuses on organising a machine learning project according to the xFusionCorp Industries standard structure.

The goal is to:

standardise project layout
improve maintainability
make collaboration easier for developers and data scientists

You must inspect the existing project and correct anything that does not match the required structure.

Understanding the Required Project Structure

The final project must look exactly like this:

fraud-detection/
├── data/
│ ├── raw/
│ └── processed/
├── models/
├── notebooks/
├── src/
│ ├── data/
│ ├── features/
│ ├── models/
│ └── utils/
├── tests/
├── configs/
├── requirements.txt
└── README.md

Each folder has a specific purpose in an ML workflow.

Purpose of Each Directory

data/
Stores datasets used in the project.

data/raw/
Contains original unmodified data.

Example:
transactions.csv

data/processed/
Contains cleaned or transformed datasets used for training.

Example:
clean_transactions.csv

models/
Stores trained machine learning models.

Example:
fraud_model.pkl

notebooks/
Contains Jupyter notebooks for experimentation and analysis.

Example:
eda.ipynb

src/
Contains the main Python source code for the application.
This keeps project logic organised and modular.

Why __init__.py Files Are Required
Every subdirectory under src/ must contain:
__init__.py

This tells Python:
“Treat this directory as a Python package.”

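Without these files:
Python 3 falls back to treating the directories as namespace packages, so imports may behave inconsistently or fail
modules may not be recognised correctly by tools that look for regular packages, such as setuptools' find_packages()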

Example:
from src.models.train import train_model
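
To spot a subpackage that is missing its __init__.py before an import goes wrong, the src/ subdirectories can be scanned directly. This is an illustrative sketch only: DEMO_DIR is a hypothetical scratch directory, and src/features is deliberately left without the file so the check has something to report.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo layout in a scratch directory.
DEMO_DIR="$(mktemp -d)"
mkdir -p "$DEMO_DIR/src/data" "$DEMO_DIR/src/features" \
         "$DEMO_DIR/src/models" "$DEMO_DIR/src/utils"

# src/features is intentionally left without an __init__.py.
touch "$DEMO_DIR/src/data/__init__.py" \
      "$DEMO_DIR/src/models/__init__.py" \
      "$DEMO_DIR/src/utils/__init__.py"

# The trailing slash in the glob restricts matches to directories.
missing_pkgs=""
for dir in "$DEMO_DIR"/src/*/; do
  if [ ! -f "${dir}__init__.py" ]; then
    missing_pkgs="$missing_pkgs $(basename "$dir")"
  fi
done

echo "missing __init__.py in:${missing_pkgs:- none}"
```

In the lab, the same loop could be pointed at /root/code/fraud-detection/src instead of the demo directory.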

Purpose of src/ Subdirectories

Directory | Purpose
src/data | Data loading and preprocessing
src/features | Feature engineering logic
src/models | Training and prediction code
src/utils | Helper functions and utilities

Why requirements.txt Matters
The lab requires the following dependencies:

scikit-learn
pandas
numpy
mlflow

This file helps developers install all required Python packages consistently.

Important note:

The correct PyPI package name is scikit-learn, not sklearn. The name used in import statements inside Python code is still sklearn.

Why README.md Matters
The README file provides project documentation.

The lab specifically requires it to begin with the heading:

# fraud-detection

This acts as the project title and ensures naming consistency.

Why Exact Naming Is Important
Lab validators check:
exact folder names
exact file names
exact dependency names

Even small differences such as:
feature instead of features
util instead of utils
# Fraud instead of # fraud-detection

can cause the lab to fail.
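
The effect of exact-name checking can be illustrated with a small script. This is not the lab's actual validator, whose implementation is unknown; it is a sketch of how such a check might behave. DEMO_DIR is a hypothetical scratch directory that is deliberately set up with the singular names feature and util.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo layout with two deliberately wrong directory names.
DEMO_DIR="$(mktemp -d)"
mkdir -p "$DEMO_DIR/src/data" "$DEMO_DIR/src/models" \
         "$DEMO_DIR/src/feature" "$DEMO_DIR/src/util"

# Check each required path by its exact name; near-misses do not count.
failures=0
for required in src/data src/features src/models src/utils; do
  if [ -d "$DEMO_DIR/$required" ]; then
    echo "PASS $required"
  else
    echo "FAIL $required"
    failures=$((failures + 1))
  fi
done

echo "failures: $failures"
```

Here src/features and src/utils are reported as failures even though similarly named directories exist, which is why the renames in Step 3 are needed.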


Resources & Next Steps
📦 Full Code Repository: KodeKloud Learning Labs
💬 Join Discussion: DEV Community - Share your thoughts and questions
💼 Let's Connect: LinkedIn - I'd love to connect with you

Credits
• All labs are from: KodeKloud
• I sincerely appreciate your provision of these valuable resources.
