Getting started with Kaggle

Common libraries such as NumPy, pandas, Seaborn, and Matplotlib come pre-installed in Kaggle notebooks. You just need to import them in your notebook.
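If you want to confirm what is available before importing, here is a minimal sketch using Python's standard importlib module. The helper name check_libraries is made up for this example, and the package names are assumed to be the usual lowercase import names.

```python
import importlib.util

# Libraries named above; the import names are assumed to be the usual lowercase ones.
LIBRARIES = ["numpy", "pandas", "seaborn", "matplotlib"]

def check_libraries(names):
    """Return a dict mapping each package name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

availability = check_libraries(LIBRARIES)
for name, installed in availability.items():
    print(f"{name}: {'available' if installed else 'missing'}")
```

Once a library shows as available, the conventional imports are import numpy as np and import pandas as pd.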

Loading datasets in your notebook

The file system of a Kaggle notebook is organised into directories:
/kaggle/working/ is your present working directory.
/kaggle/input/ is where the datasets you add to your notebook are kept.
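A quick way to see these directories from inside a notebook is sketched below. It only uses the standard os module; describe_dirs is a hypothetical helper, and outside Kaggle both paths will most likely be reported as not found.

```python
import os

# Paths taken from the text above; outside Kaggle they will usually not exist.
KAGGLE_DIRS = ["/kaggle/working", "/kaggle/input"]

def describe_dirs(paths):
    """Return a dict mapping each path to True if it is an existing directory."""
    return {path: os.path.isdir(path) for path in paths}

print("Current working directory:", os.getcwd())
for path, exists in describe_dirs(KAGGLE_DIRS).items():
    print(f"{path}: {'found' if exists else 'not found'}")
```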

So you can go to the sidebar in your Kaggle notebook and use "Add Input" (for public Kaggle datasets) or "Upload" (for your own, non-Kaggle dataset). For example, if you add the "Titanic" dataset, it appears in your /kaggle/input directory.

To print all the files in the datasets added to your notebook, you can use Python's built-in os module, which can walk through a directory tree (like /kaggle/input).
So os.walk('/kaggle/input') is a generator that walks through the directory tree under /kaggle/input and yields three things for each folder it visits:

dirname: current folder path
subdirs: a list of subdirectories (which we can ignore with _)
filenames: a list of all the files in that folder

So you can run the following code:

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

and it will output something like:

/kaggle/input/titanic/train.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/gender_submission.csv

Note that you only see what your notebook has mounted, not every dataset on Kaggle.
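If you would rather keep the paths for later use instead of only printing them, the same os.walk loop can fill a list. This is a small sketch; list_files and its suffix parameter are hypothetical names chosen for this example.

```python
import os

def list_files(root, suffix=".csv"):
    """Return a sorted list of file paths under root that end with suffix."""
    paths = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(suffix):
                paths.append(os.path.join(dirname, filename))
    return sorted(paths)

# Prints an empty list if nothing is mounted under /kaggle/input.
csv_paths = list_files("/kaggle/input")
print(csv_paths)
```

Each path in csv_paths can then be passed to pandas, for example pd.read_csv(csv_paths[0]).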

To master data science projects (like the Titanic one), it's important to follow a structured pipeline:

Data Loading
Data Preprocessing:
- Data Cleaning (you can merge all the dataframes before cleaning them)
- Feature Engineering
- Encoding
Exploratory Data Analysis (EDA)
Preprocessing
Model Building
Model Tuning
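The first few steps of the pipeline above can be sketched with pandas as follows. The rows are made-up values that mimic a handful of Titanic columns so the example runs anywhere. Filling missing ages with the median and the FamilySize feature are illustrative choices for this sketch, not steps prescribed by this article. Model building and tuning are left out.

```python
import pandas as pd

# Data loading: hypothetical rows that mimic a few Titanic columns.
# On Kaggle, with the Titanic dataset added, you would load the real file instead:
# df = pd.read_csv("/kaggle/input/titanic/train.csv")
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 2],
})

# Data cleaning: fill missing ages with the median age (one possible choice).
df["Age"] = df["Age"].fillna(df["Age"].median())

# Feature engineering: family size = siblings/spouses + parents/children + the passenger.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# Encoding: turn the text column "Sex" into indicator columns.
df = pd.get_dummies(df, columns=["Sex"])

print(df.head())
```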
