Yuji Suehiro

First step to getting started in data science

When you start out in data science, setting up a new computer with the necessary software and the environment to run it can be laborious.

I have written a script that automatically installs a collection of basic software commonly used in data science, and I present it in this post.
The software it installs is listed below.

Each of these packages is well known, so installation instructions can be found in many books and websites. Looking up the procedure for each one individually takes time, however. This script installs them all at once, saving you time and effort.

* To use the script, you need Ubuntu 18.04 (64-bit) and a machine with an NVIDIA GPU.

Requirement

This script runs on Ubuntu 18.04 and requires an NVIDIA GPU.

Test Environment

I tested the script on the following system.

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.6 LTS"

The NVIDIA GPU model is as follows.

$ lspci | grep -i nvidia
03:00.0 VGA compatible controller: NVIDIA Corporation GK110GL [Quadro K5200] (rev a1)
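
The two commands above can also be turned into a quick pre-flight check before running the installer. The following is only a minimal sketch and is not part of multiple_installer.sh; the function names `is_ubuntu_1804` and `has_nvidia_gpu` are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Pre-flight check sketch (not part of multiple_installer.sh).

# Succeeds if the given lsb-release file reports Ubuntu 18.04.
# Defaults to /etc/lsb-release when no file is given.
is_ubuntu_1804() (
    release_file="${1:-/etc/lsb-release}"
    [ -r "$release_file" ] || exit 1
    grep -q '^DISTRIB_ID=Ubuntu$' "$release_file" &&
        grep -q '^DISTRIB_RELEASE=18\.04$' "$release_file"
)

# Succeeds if the lspci listing read from stdin mentions an NVIDIA device.
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Example use on a real machine:
#   is_ubuntu_1804 && lspci | has_nvidia_gpu && echo "requirements met"
```

The checks read their input from a file and from stdin, so they can be tried on saved output from another machine as well.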

Software list installed by the script

The script installs the following apps, together with the libraries (*1) needed to run them.

Name
NVIDIA GPU Driver *2
git
Clang
CMake
Perl
GNU Fortran
CUDA+CuDNN *3
Python3 *4 (+pip)(+modules *5)
OpenJDK
Docker(+NVIDIA Docker)
MySQL+SQLite
Node+npm
R
Graphviz
OpenCV
FFmpeg
cURL
cifs
clamAV
doxygen
mozc *6
OpenSSH(server)+ufw *7

*1 See lines 54-62 of the script for details.

*2 The default driver version is 470. If necessary, change the version as described under "Customization of the script" below.

*3 You need to download the CUDA package and the CuDNN archive before running the script. See step 2 under "How to use" below.

*4 Python 3 on Ubuntu 18.04 will be updated to version 3.7.

*5 The modules are listed below.

*6 Japanese input method, needed only for Japanese text input. If you do not need it, disable it as described under "Customization of the script".

*7 The SSH server starts automatically after installation. If you do not need it, disable it as described under "Customization of the script".
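
Once the installer has finished, one way to confirm that the listed tools are available is to look each command up on the PATH. This is a minimal sketch, not something the script itself does; `missing_commands` is a hypothetical helper, and the command names in the example are the usual ones for these packages, which is an assumption about what ends up on the PATH (for instance, `nvcc` may need the CUDA bin directory added to PATH first).

```shell
#!/bin/sh
# Prints each given command name that is not found on the PATH.
# Returns non-zero if at least one command is missing.
missing_commands() (
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    exit "$status"
)

# Example use after installation (command names are assumptions):
#   missing_commands nvidia-smi nvcc git clang cmake perl gfortran \
#       java docker node npm R dot ffmpeg curl doxygen
```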

Python modules installed by the script

Name
numpy
pandas
matplotlib
scikit-learn
hmmlearn
umap-learn
tensorflow
torch
openpyxl
python-docx
python-pptx
reportlab
graphviz
selenium
beautifulsoup4
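
Several of these packages are imported under a different name than the one used to install them (scikit-learn as `sklearn`, umap-learn as `umap`, python-docx as `docx`, python-pptx as `pptx`, beautifulsoup4 as `bs4`). The sketch below tries to import each module after installation; `failed_imports` and the `PYTHON` override variable are hypothetical names introduced for this example and do not come from the script.

```shell
#!/bin/sh
# Import names for the modules in the table above.
MODULES="numpy pandas matplotlib sklearn hmmlearn umap tensorflow torch openpyxl docx pptx reportlab graphviz selenium bs4"

# Prints every module that fails to import and returns non-zero if any fail.
# Set PYTHON to use an interpreter other than python3 (hypothetical override).
failed_imports() (
    python="${PYTHON:-python3}"
    status=0
    for mod in "$@"; do
        if ! "$python" -c "import $mod" >/dev/null 2>&1; then
            echo "$mod"
            status=1
        fi
    done
    exit "$status"
)

# Example use after installation:
#   failed_imports $MODULES
```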

How to use

1. Download my script

Run the following command to download "multiple_installer.sh" from my GitHub repository.

$ wget https://raw.githubusercontent.com/YujiSue/GeneralScripts/main/multiple_installer.sh

1.5 Customization of the script (Optional)

Edit "multiple_installer.sh" as needed.

- Select apps to install

If some apps are not required for your task, edit the corresponding inst_XXX=true setting at lines 13-33 of the script. Changing the value from true to false skips the installation of that app.
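
The same edit can be made from the command line instead of a text editor. This is a minimal sketch that assumes each flag sits at the start of its own line in the form inst_XXX=true; `disable_app` is a hypothetical helper, and `inst_mozc` in the example is a guessed flag name, so check lines 13-33 of the script for the real ones.

```shell
#!/bin/sh
# Rewrites inst_<name>=true to inst_<name>=false in the given script file.
# Assumes the flag starts at the beginning of a line.
disable_app() (
    script="$1"
    name="$2"
    tmp="$script.tmp.$$"
    sed "s/^inst_${name}=true/inst_${name}=false/" "$script" > "$tmp" &&
        mv "$tmp" "$script"
)

# Example (inst_mozc is a hypothetical flag name):
#   disable_app multiple_installer.sh mozc
```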

- Specify Version

Change the GPU driver version at line 35 to suit your GPU model. The default is 470.

You can also change the versions of CMake and OpenCV to install by editing lines 36 and 37 in the script, respectively.

Default values are v3.22.5 for CMake and v4.5.5 for OpenCV.

- Specify directory to download

You can change the directory where files downloaded during installation are stored by editing line 44. The default location is "$HOME/Downloads". To change it, rewrite the directory path after TEMPORARY=.
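
As a rough sketch, the download directory could also be changed non-interactively. It assumes that line 44 begins with TEMPORARY= and replaces everything after it; `set_download_dir` is a hypothetical helper, and the path in the example is only an illustration.

```shell
#!/bin/sh
# Replaces the TEMPORARY=... line of the given script with a new directory.
# '|' is the sed delimiter so slashes in the path need no escaping;
# paths containing '|', '&' or backslashes are not handled by this sketch.
set_download_dir() (
    script="$1"
    dir="$2"
    tmp="$script.tmp.$$"
    sed "s|^TEMPORARY=.*|TEMPORARY=\"$dir\"|" "$script" > "$tmp" &&
        mv "$tmp" "$script"
)

# Example (the path is illustrative):
#   set_download_dir multiple_installer.sh /data/installers
```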

2. Download the CUDA package and the CuDNN archive

The versions of CUDA and CuDNN that can be installed depend on the GPU model. Please refer to the linked page for details.

For this reason, (1) the versions of CUDA and CuDNN must be specified in advance, (2) the CUDA package must be downloaded, and (3) the CuDNN archive must be downloaded.

(1) You can specify the versions of CUDA and CuDNN by editing lines 38-39 of the script "multiple_installer.sh".

The defaults are CUDA v11.5 and CuDNN v8.3.1.22.

(2) The CUDA package for Ubuntu can be downloaded from the NVIDIA site. Select your OS (Ubuntu 18.04, 64-bit) and download the .deb file.

With the default settings, you can download the file with the following command.

$ wget https://developer.download.nvidia.com/compute/cuda/11.5.1/local_installers/cuda-repo-ubuntu1804-11-5-local_11.5.1-495.29.05-1_amd64.deb

(3) The CuDNN archive can be downloaded from the NVIDIA developer site. The download requires an NVIDIA developer account, so create one if you do not have it yet. Once logged in, download the CuDNN archive that matches your CUDA version.

With the default settings, download the following file.

cudnn-linux-x86_64-8.3.1.22_cuda11.5-archive.tar.xz

(4) Store the downloaded files in the directory specified on line 44 of the script. The default directory is "$HOME/Downloads".

If you changed the directory during customization, save the files to that directory instead.
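
Before starting the installer, it may help to confirm that both files are where the script expects them. The sketch below checks for the default file names given above; `missing_downloads` and the two variable names are hypothetical, and the file names need adjusting if you chose other CUDA or CuDNN versions.

```shell
#!/bin/sh
# Default file names from the steps above (change them for other versions).
CUDA_DEB="cuda-repo-ubuntu1804-11-5-local_11.5.1-495.29.05-1_amd64.deb"
CUDNN_TAR="cudnn-linux-x86_64-8.3.1.22_cuda11.5-archive.tar.xz"

# Prints each expected file that is missing from the given directory
# (default: $HOME/Downloads) and returns non-zero if any is missing.
missing_downloads() (
    dir="${1:-$HOME/Downloads}"
    status=0
    for f in "$CUDA_DEB" "$CUDNN_TAR"; do
        if [ ! -f "$dir/$f" ]; then
            echo "$f"
            status=1
        fi
    done
    exit "$status"
)

# Example:
#   missing_downloads "$HOME/Downloads" && echo "ready to run the installer"
```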

3. Run the "multiple_installer.sh"

Run the script with the bash command.

Because the script uses "sudo", you will be asked to enter your password once.

$ bash multiple_installer.sh
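
Since the installation produces a lot of output, you may want to keep a copy of it for troubleshooting. The following is a minimal sketch of one way to do that with tee while still getting the installer's own exit status; `run_logged` and the log file name are hypothetical and are not part of the script.

```shell
#!/bin/sh
# Runs a command, copies its combined stdout/stderr to a log file,
# and returns the command's own exit status rather than tee's.
run_logged() (
    log="$1"
    shift
    status_file="$log.status"
    { "$@" 2>&1; echo "$?" > "$status_file"; } | tee "$log"
    status=$(cat "$status_file")
    rm -f "$status_file"
    exit "$status"
)

# Example (install.log is an illustrative file name):
#   run_logged install.log bash multiple_installer.sh
```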

4. (Optional) Remove downloaded files

$ rm multiple_installer.sh
