Devesh Jawla

How to run Julia on Kaggle? And submit a notebook with internet off?

Kaggle supports code in Python and R, and provides the familiar Python notebook. Julia is not supported. Even if you run Julia from Python (with a pip install of Julia or JuliaCall, or some other method; for example, I tried the approach from another blog post), it is still difficult to submit a notebook to a competition that requires internet to be off.

The problem I faced with that blog post's approach is that it needs internet access to download its dependencies, and then again to download the packages required by your own code. Therefore I present a simple approach for submitting notebooks to Kaggle competitions, especially those that require internet to be off.

Set up an Ubuntu VM and get your Julia project running on it

  1. Download and install virtual machine software suitable for your PC.
  2. Download and install Ubuntu (preferably x86_64, but any other will do). From here on, this Ubuntu system is called the guest. At the time of writing, julia-1.11.3 was the latest Julia version, so this tutorial uses it throughout; substitute your own version where needed.
  3. Your PC is the host. You can share a folder between host and guest for easy file transfers.
  4. Install the Julia language on the guest.
  5. Copy your Julia project code and its data files to the guest (for example, into a folder called myproject).
  6. Instantiate and update your Julia project on the guest. This downloads and installs all the project dependencies (the packages required by your code) into the ~/.julia folder.
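
Step 6 can also be scripted. The following is a minimal sketch, assuming Julia is already on the guest's PATH and the project lives in a folder called myproject; the helper names `instantiate_command` and `instantiate_project` are hypothetical and not part of any library. It relies only on Julia's standard `Pkg.instantiate()` and `Pkg.update()` calls.

```python
import subprocess


def instantiate_command(project_dir, julia="julia"):
    """Build the command that instantiates and updates a Julia project."""
    # Pkg.instantiate() installs the dependencies recorded for the project,
    # Pkg.update() then brings them up to date.
    julia_code = "using Pkg; Pkg.instantiate(); Pkg.update()"
    return [julia, f"--project={project_dir}", "-e", julia_code]


def instantiate_project(project_dir, julia="julia"):
    """Run the command on the guest; check=True raises if Julia exits with an error."""
    return subprocess.run(instantiate_command(project_dir, julia), check=True)


# Example (run on the guest, which still has internet access):
# instantiate_project("myproject")
```

This has to run while the guest is online, since it is the only point at which packages are downloaded.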

Bundle your Julia project and its dependencies in a tar.gz archive

Bundle the ~/.julia folder and the project folder on the guest into a single archive with the following command:

tar czvf julia_bundle.tar.gz \
~/myproject \
~/.julia

Now copy the bundle to the host through the shared folder, appending .csv to its name, and upload it to Kaggle as a dataset for your notebook:
cp julia_bundle.tar.gz /media/share/julia_bundle.tar.gz.csv
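
Before uploading, it can help to confirm which paths are stored inside the bundle, because they decide where the files land once the archive is extracted on Kaggle. The following is a minimal sketch using Python's standard `tarfile` module; the helper name `leading_paths` is hypothetical.

```python
import tarfile
from pathlib import PurePosixPath


def leading_paths(archive_path, depth=3):
    """Return the distinct leading path prefixes (up to `depth` parts) stored in a tar archive."""
    prefixes = set()
    # Mode "r" detects gzip compression from the file contents,
    # so the ".csv" suffix on the file name does not matter.
    with tarfile.open(archive_path, "r") as archive:
        for member in archive.getmembers():
            parts = PurePosixPath(member.name).parts[:depth]
            prefixes.add("/".join(parts))
    return sorted(prefixes)


# Example:
# print(leading_paths("/media/share/julia_bundle.tar.gz.csv"))
```

The prefixes it prints are the directories that will appear under the extraction directory, which is worth comparing against the paths used later in the notebook.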

Upload Julia as a dataset to Kaggle

Download the Julia generic Linux binaries for x86 (64-bit) and upload the archive to Kaggle as a dataset, with .csv appended to the file name:

julia-1.11.3-linux-x86_64.tar.gz -> julia-1.11.3-linux-x86_64.tar.gz.csv

Extract Julia and your Bundle in the Kaggle Notebook

!tar xvf /kaggle/input/julia-exe/julia-1.11.3-linux-x86_64.tar.gz.csv
!tar xvf /kaggle/input/bundle/julia_bundle.tar.gz.csv

Both archives are extracted into the /kaggle/working directory, which is the notebook's working directory.
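
If you prefer to stay in Python instead of calling `!tar`, the same extraction can be sketched with the standard `tarfile` module. This is an alternative under the assumption that the archives are ordinary gzip-compressed tarballs; the helper name `extract_archive` is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest="/kaggle/working"):
    """Extract a (possibly gzip-compressed) tar archive into dest and return dest as a Path."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r" detects the compression from the file contents, not from the name.
    with tarfile.open(archive_path, "r") as archive:
        archive.extractall(dest)
    return dest


# Example, using the dataset paths from this post:
# extract_archive("/kaggle/input/julia-exe/julia-1.11.3-linux-x86_64.tar.gz.csv")
# extract_archive("/kaggle/input/bundle/julia_bundle.tar.gz.csv")
```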

Set environment variables for Julia and its packages

import os
os.environ['JULIA_DEPOT_PATH'] = ':/kaggle/working/home/ubuntu/.julia'
os.environ['PATH'] += ':/kaggle/working/julia-1.11.3/bin'

The /home/ubuntu/ part of the path may be different in your case; adjust it to match the home directory on your guest.
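
Since a wrong home directory only shows up later as a confusing Julia error, it can be worth checking the directories before setting the variables. The following is a minimal sketch; the helper name `julia_environment`, its parameters, and the `PATH_SUFFIX` key are hypothetical, and the leading `:` in the depot path is kept exactly as in the snippet above.

```python
import os
from pathlib import Path


def julia_environment(working_dir="/kaggle/working",
                      guest_home="home/ubuntu",
                      julia_dir="julia-1.11.3"):
    """Return the environment values for Julia after checking the directories exist."""
    working = Path(working_dir)
    depot = working / guest_home / ".julia"
    julia_bin = working / julia_dir / "bin"
    for required in (depot, julia_bin):
        if not required.is_dir():
            raise FileNotFoundError(f"Expected directory is missing: {required}")
    return {
        "JULIA_DEPOT_PATH": f":{depot}",   # depot extracted from the bundle
        "PATH_SUFFIX": f":{julia_bin}",    # appended to the existing PATH
    }


# Example:
# settings = julia_environment(guest_home="home/ubuntu")
# os.environ["JULIA_DEPOT_PATH"] = settings["JULIA_DEPOT_PATH"]
# os.environ["PATH"] += settings["PATH_SUFFIX"]
```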

Execute your Julia file on Kaggle

!julia --project=/kaggle/working/home/ubuntu/myproject /kaggle/working/home/ubuntu/myproject/Test.jl

Typically your Julia file, Test.jl in our case, is the entry point that calls other code and files as needed. If you need data from a competition, use the Kaggle paths of the competition files inside Test.jl, for example /kaggle/input/competition_directory/train.csv for a training file. This matters when you have trained a model on your host machine and simply want to submit it and run it on the data provided by the competition. Using Kaggle paths in Test.jl for all inputs and outputs lets your code run unchanged on any new data the competition sponsor wants to test later.
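
The run command can also be issued from Python, so that the notebook stops with an error when Julia is not on PATH or the script fails, instead of carrying on without a submission file. The following is a minimal sketch with the standard `shutil` and `subprocess` modules; the helper name `run_julia_script` is hypothetical.

```python
import shutil
import subprocess


def run_julia_script(project_dir, script_path, julia="julia"):
    """Run a Julia script inside a project and raise if Julia is missing or the script fails."""
    # shutil.which reads PATH at call time, so it sees the PATH update made earlier.
    executable = shutil.which(julia)
    if executable is None:
        raise FileNotFoundError(f"'{julia}' was not found on PATH")
    command = [executable, f"--project={project_dir}", str(script_path)]
    # check=True raises CalledProcessError on a non-zero exit code.
    return subprocess.run(command, check=True)


# Example, using the paths from this post:
# run_julia_script("/kaggle/working/home/ubuntu/myproject",
#                  "/kaggle/working/home/ubuntu/myproject/Test.jl")
```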

submission.csv

Competitions typically require the Kaggle notebook to output a submission.csv, so we remove everything except submission.csv from the /kaggle/working directory. Your Test.jl should have written submission.csv to /kaggle/working. Run the following commands to read submission.csv with pandas, remove the Julia installation and your project, and write submission.csv again as a double check that the file is readable.

import pandas as pd
submission=pd.read_csv('/kaggle/working/submission.csv')
print("Read Submission File")
!rm -rf /kaggle/working/julia-1.11.3
!rm -rf /kaggle/working/home/
submission.to_csv('submission.csv', index=False)
print('Wrote Submission')
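
If Test.jl leaves other files behind, listing every directory to delete by hand becomes error-prone. The following is a minimal sketch of a more general cleanup that keeps only the named files; the helper name `keep_only` is hypothetical, and it refuses to delete anything when submission.csv is missing.

```python
import shutil
from pathlib import Path


def keep_only(working_dir, keep=("submission.csv",)):
    """Delete everything in working_dir except the named files; return the names removed."""
    working = Path(working_dir)
    missing = [name for name in keep if not (working / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Refusing to clean up, missing: {missing}")
    removed = []
    # Materialize the listing first so deletions do not affect the iteration.
    for entry in list(working.iterdir()):
        if entry.name in keep:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return sorted(removed)


# Example:
# keep_only("/kaggle/working")
```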

Hope that was helpful. If you have comments or run into issues, please post them; I'd be happy to help.