While Kaggle has published a great guide on how to use their Docker images, the instructions for macOS are slightly outdated and need some modifications to work properly.
The problem is that Kaggle's guide uses docker-machine, but Docker has removed docker-machine from the later versions of Docker Desktop. The fix is simple, but it would've saved me 30 minutes of Googling if someone had written about this problem, so here I am.
This is not a replacement for any of the great guides to setting up Docker for Data Science out there. I still recommend reading Kaggle's guide for a better understanding of the process – I just aim to point out the steps that one can use with the updated Docker Desktop for Mac.
Instructions
Updated 17 May 2020
Step 1:
Install Docker Desktop for Mac from Docker's website.
Step 2:
Start up Docker and adjust the VM preferences. This menu can be found by clicking on the Docker menubar icon and selecting "Preferences...".
The recommendation is to increase the CPU count, disk size and memory to allow the VM to better handle data science operations.
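To confirm that the new settings took effect, one option is to ask the Docker daemon what it sees. This is a minimal sketch: the function name docker_vm_resources is my own, and it assumes Docker Desktop is running so that docker info can reach the VM.

```shell
# Hypothetical helper (not from Kaggle's guide): print the CPU count and
# memory that the Docker VM currently exposes.
docker_vm_resources() {
  # NCPU and MemTotal (in bytes) are fields reported by `docker info`.
  docker info --format '{{.NCPU}} {{.MemTotal}}' |
    awk '{ printf "CPUs: %d, memory: %.1f GiB\n", $1, $2 / (1024 * 1024 * 1024) }'
}
```

Run docker_vm_resources after restarting Docker to check the values against what you set in Preferences.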
Step 3:
Pull the image you wish to use. You can get the Kaggle Python image by running
$ docker pull kaggle/python
This step will take a while, as the image is quite large and takes time to download.
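If you are not sure whether the download finished, a quick check along these lines can help. It is only a sketch: kaggle_image_ready is a name I made up, and it relies on docker image inspect returning a non-zero status when the image is not present locally.

```shell
# Hypothetical helper: report whether an image has already been pulled.
# Defaults to kaggle/python; pass another image name as the first argument.
kaggle_image_ready() {
  local image="${1:-kaggle/python}"
  if docker image inspect "$image" >/dev/null 2>&1; then
    echo "$image is available locally"
  else
    echo "$image not found; run: docker pull $image"
    return 1
  fi
}
```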
Step 4:
Put these lines in your .bashrc, .zshrc, or whatever equivalent file your shell uses:
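# Kaggle Docker shorthand functions
# Each one mounts the current directory at /tmp/working inside the container.

# Run python in the Kaggle image, forwarding any arguments
kpython() {
  docker run -v "$PWD":/tmp/working -w=/tmp/working --rm -it kaggle/python python "$@"
}

# Start an IPython shell in the Kaggle image
ikpython() {
  docker run -v "$PWD":/tmp/working -w=/tmp/working --rm -it kaggle/python ipython
}

# Start Jupyter Notebook on port 8888 and open it in the browser after 3 seconds
kjupyter() {
  (sleep 3 && open "http://localhost:8888") &
  docker run -v "$PWD":/tmp/working -w=/tmp/working -p 8888:8888 --rm -it kaggle/python jupyter notebook --no-browser --ip="0.0.0.0" --notebook-dir=/tmp/working --allow-root
}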
These shorthand functions allow you to use kpython as a replacement for calling python, ikpython instead of ipython, and kjupyter to start a Jupyter Notebook session. Each command runs inside the specified Docker image, which in this case is kaggle/python. Replace the image name if necessary!
The only change I've made to these functions is replacing $(docker-machine ip docker2) from the original instructions with localhost.
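Since the image name is repeated in every function, one way to make it easier to swap is to read it from an environment variable. The sketch below is my own variation, not part of the original guides: KAGGLE_IMAGE is a hypothetical variable name, and the functions fall back to kaggle/python when it is unset.

```shell
# Variation on the shorthand functions: the image comes from the
# (hypothetical) KAGGLE_IMAGE variable, defaulting to kaggle/python.
kpython() {
  docker run -v "$PWD":/tmp/working -w=/tmp/working --rm -it \
    "${KAGGLE_IMAGE:-kaggle/python}" python "$@"
}

ikpython() {
  docker run -v "$PWD":/tmp/working -w=/tmp/working --rm -it \
    "${KAGGLE_IMAGE:-kaggle/python}" ipython
}
```

With this in place, adding export KAGGLE_IMAGE=<image name> to the same startup file switches both functions at once.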
Optional adjustments:
Default browser where Jupyter Notebook is opened
Running kjupyter opens http://localhost:8888 in your default browser.
If you want it to open in a different browser, add -a "<browser app name>" after open. For example, to open the link in Microsoft Edge instead:
kjupyter() {
  (sleep 3 && open -a "Microsoft Edge" "http://localhost:8888") &
  docker run -v "$PWD":/tmp/working -w=/tmp/working -p 8888:8888 --rm -it kaggle/python jupyter notebook --no-browser --ip="0.0.0.0" --notebook-dir=/tmp/working --allow-root
}
If you do not want the link to open automatically, remove the line that starts with (sleep 3 && open from the function.
Sleep time
The kjupyter function waits 3 seconds before opening the link to give the Jupyter Notebook session time to start up. You may find that your session takes more or less time than that. Simply adjust sleep 3 to whatever delay you prefer.
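If a fixed delay proves unreliable, an alternative is to poll the URL until something answers and only then open the browser. This is a rough sketch rather than a tested replacement: wait_for_jupyter is a name I invented, it assumes curl is installed, and it treats any HTTP response on the port as a sign that the server is up.

```shell
# Hypothetical helper: poll a URL once per second until it responds,
# giving up after a number of attempts (default 30).
wait_for_jupyter() {
  local url="${1:-http://localhost:8888}"
  local tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    # curl exits non-zero while the connection is refused or dropped
    if curl --silent --output /dev/null "$url"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# kjupyter variant that uses the helper instead of a fixed sleep
kjupyter() {
  (wait_for_jupyter && open "http://localhost:8888") &
  docker run -v "$PWD":/tmp/working -w=/tmp/working -p 8888:8888 --rm -it kaggle/python jupyter notebook --no-browser --ip="0.0.0.0" --notebook-dir=/tmp/working --allow-root
}
```

The trade-off is a slightly longer function in exchange for not having to guess the startup time.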
Final notes
I hope these instructions aren't a duplicate of someone else's out there. Hopefully Kaggle updates the guide provided on their GitHub for the updated Docker Desktop and this will no longer be relevant.
Do contact me if there are any problems with the instructions.
References
How to get started with data science in containers – Originally written by Jamie Hall and posted by the Kaggle Team. I got the bulk of the instructions from here.
How to setup a Data Science workflow with Kaggle Python Docker Image on Laptop – I got the bash functions from this article.