Basic libraries like NumPy, pandas, Seaborn, and Matplotlib come pre-installed on Kaggle. You just need to import them in your notebook.
Loading datasets in your notebook
Kaggle notebooks are organised into directories:
- /kaggle/working/ is your present working directory.
- /kaggle/input/ is where the public datasets you add are kept.
So you can go to the sidebar in your Kaggle notebook and "Add Input" (for public Kaggle datasets) or "Upload" (if you have your own, non-Kaggle data). For example, if you add the "Titanic" dataset, it then appears in your /kaggle/input directory.
To print all the dataset files added to your notebook, you can use Python's built-in os module, which can walk through a directory tree (like /kaggle/input). os.walk('/kaggle/input') is a generator that visits every folder under /kaggle/input and yields three things for each one:
- dirname: the current folder's path
- a list of its subdirectories (which we can ignore by naming it _)
- filenames: a list of all the files in that folder
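To see those three values in action, here is a self-contained sketch that builds a small temporary directory tree (a stand-in for /kaggle/input, which only exists on Kaggle) and walks it:

```python
import os
import tempfile

# Build a tiny stand-in for /kaggle/input (hypothetical structure for illustration)
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "titanic"))
for name in ("train.csv", "test.csv"):
    open(os.path.join(root, "titanic", name), "w").close()

# os.walk yields (dirname, subdirs, filenames) for each folder it visits
for dirname, subdirs, filenames in os.walk(root):
    print(dirname, subdirs, filenames)
```

The root folder is printed first (with "titanic" listed among its subdirectories), then the titanic folder itself with its two files.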
So you can run the following code:

import os

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
and it will output something like
/kaggle/input/titanic/train.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/gender_submission.csv
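Once you have a file's path, loading it is one line with pandas. A minimal sketch — the inline CSV below is a made-up stand-in for the real file, since /kaggle/input only exists on Kaggle:

```python
import io
import pandas as pd

# On Kaggle you would pass the real path instead:
# df = pd.read_csv('/kaggle/input/titanic/train.csv')

# Hypothetical rows standing in for the Titanic training file
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 5)
```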
Note that you only see what your notebook has mounted — not all Kaggle datasets.
To master data science projects (like the Titanic one), it's important to follow a structured pipeline:
- Data Loading
- Data Preprocessing:
  - Data Cleaning (you can merge all DataFrames before cleaning them)
  - Feature Engineering
  - Encoding
- EDA
- Preprocessing
- Model Building
- Model Tuning
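As a rough illustration of the middle steps (cleaning, feature engineering, encoding), here's a sketch on a made-up DataFrame — the column names and the FamilySize feature are assumptions for illustration, not prescriptions:

```python
import pandas as pd

# Made-up toy data standing in for a loaded train.csv
df = pd.DataFrame({
    "Sex": ["male", "female", "female", None],
    "Age": [22.0, 38.0, None, 35.0],
    "SibSp": [1, 1, 0, 1],
    "Parch": [0, 0, 0, 0],
})

# Data Cleaning: fill missing values
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Sex"] = df["Sex"].fillna("unknown")

# Feature Engineering: derive a new column (hypothetical example)
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# Encoding: turn the categorical column into numeric dummy columns
df = pd.get_dummies(df, columns=["Sex"])
print(df.columns.tolist())
```

After encoding, the model-building steps would take this all-numeric DataFrame as input.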