It’s always nice to have a bigger database around to play with new technologies. IMDB has a lot of data, but it’s not straightforward to import it into your PostgreSQL instance, so I wrote a quick bash script to simplify the process.
Please make sure you have the following installed:
- csvkit (a Python library for working with CSV files)
- pv (a utility to monitor the progress of data through a pipe)
- To install csvkit, run:
  sudo pip install csvkit
- To install pv, run one of the following:
  - For Linux:
    sudo apt-get install pv
  - For MacPorts:
    sudo port install pv
  - For Homebrew:
    brew install pv
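To confirm both tools are on your PATH before running the import, a small check like the one below can help. It is only a sketch: it assumes csvkit can be detected through one of its command-line tools (csvsql), and the missing_tools function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named command that is not on PATH.
# Returns 0 when every tool is found, 1 otherwise.
missing_tools() {
  local tool
  local -a missing=()
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if (( ${#missing[@]} > 0 )); then
    printf '%s\n' "${missing[@]}"
    return 1
  fi
}

# Assumption: csvkit is detected via its csvsql command; pv via pv itself.
missing_tools csvsql pv || echo "Please install the tools listed above."
```

The function prints nothing and succeeds when everything is installed, so it can also guard a script with `missing_tools csvsql pv || exit 1`.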
To execute the import script on Linux, run:
bash <(wget -nv -O - https://gist.githubusercontent.com/1mehal/13c85e108cbc906f5ec34d28d75b1968/raw/imdb_import_in_postgresql.sh)
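The script expects the target database to exist already. A minimal sketch for checking that (and creating it if needed) before running the command above follows. It assumes a local PostgreSQL server reachable with psql's default connection settings. The database name imdb and the db_in_list helper are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the database name given as $1 appears in
# `psql -lqt` output read from stdin (first column, with spaces stripped).
db_in_list() {
  cut -d '|' -f 1 | tr -d ' ' | grep -qxF -- "$1"
}

# Example (requires a reachable PostgreSQL server; "imdb" is a placeholder):
#   psql -lqt | db_in_list imdb || createdb imdb
```

`psql -lqt` lists databases without headers or footers, and `grep -x` matches whole lines only, so a database named `imdb2` does not count as `imdb`.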
The script attempts to:
- Download the latest archive of IMDB data into the folder you specify, or use existing data
- Create the corresponding tables in the PostgreSQL database (the database must exist before you run the script)
- Import data from the TSV files into the corresponding tables
- Set up primary keys, foreign keys, and relationships between tables, as well as search indexes
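The import step can be approximated by streaming each gzipped TSV file through pv into PostgreSQL's COPY ... FROM STDIN. This is a minimal sketch under stated assumptions, not the gist's actual code. It assumes the files are named like title.basics.tsv.gz, that each maps to a table named like title_basics (a hypothetical convention), that the first line is a header, and that \N marks NULLs (the default NULL marker in COPY's text format). The helper names table_name, copy_sql and tsv_rows are hypothetical too.

```shell
#!/usr/bin/env bash
# Hypothetical naming convention: title.basics.tsv.gz -> title_basics
table_name() {
  local base
  base=$(basename "$1" .tsv.gz)
  printf '%s\n' "${base//./_}"
}

# COPY's text format defaults to a tab delimiter and \N for NULL,
# which is assumed to match the IMDB TSV files.
copy_sql() {
  printf 'COPY %s FROM STDIN;\n' "$1"
}

# Read gzipped TSV from stdin, decompress it and drop the header line.
tsv_rows() {
  gunzip -c | tail -n +2
}

# Example (requires pv, psql and existing tables; paths and the "imdb"
# database name are placeholders):
#   for f in /path/to/imdb/*.tsv.gz; do
#     pv "$f" | tsv_rows | psql -d imdb -c "$(copy_sql "$(table_name "$f")")"
#   done
```

Text format does not treat double quotes specially, which avoids CSV quoting rules. Whether every row of the current dumps loads cleanly this way is an assumption worth checking against the gist's SQL scripts.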
If something fails, please follow the script logic and leave a comment. The bash script and the corresponding SQL scripts are located at https://gist.github.com/1mehal/13c85e108cbc906f5ec34d28d75b1968
That’s it.