I ran my first spider on a server. It was scraping thousands of products. I started it, closed my laptop, and went home feeling proud.
The next morning, I checked the results. Nothing. The spider had stopped 5 minutes after I left.
I didn't know that closing my laptop killed the spider. I wasted an entire day. Then someone taught me about nohup, and everything changed.
Let me explain nohup in the simplest way possible.
The Problem (In Plain English)
What Normally Happens
You connect to your server:
ssh myserver.com
You run your spider:
scrapy crawl myspider
Your spider starts working. You see output on the screen. Everything looks good!
But then:
- You close your laptop
- Your internet disconnects
- You lose WiFi connection
And your spider DIES.
All progress lost. All data lost. Everything stops.
Why? Because when your SSH session ends, the server sends a "hangup" signal (SIGHUP) to every program you started in that session, and they all get killed.
It's like pulling the plug on a computer while it's working.
What is nohup? (The Simple Explanation)
nohup is a simple command that tells your server:
"Keep this program running, even if I disconnect."
That's it. That's all it does.
Think of it like setting your spider to "run on its own" mode.
How to Use nohup (Step by Step)
Step 1: Connect to Your Server
ssh username@yourserver.com
Replace username with your actual username and yourserver.com with your server address.
Step 2: Go to Your Project Folder
cd /home/username/my_scrapy_project
Step 3: Run Your Spider with nohup
Instead of:
scrapy crawl myspider
Type this:
nohup scrapy crawl myspider &
That's it! Notice two things:
- nohup at the beginning
- & at the end
Step 4: You'll See This
nohup: ignoring input and appending output to 'nohup.out'
This means it's working!
Step 5: Disconnect Safely
Now you can:
- Close your laptop
- Turn off your computer
- Disconnect from WiFi
Your spider keeps running!
What Does Each Part Mean?
Let's break down the command:
nohup scrapy crawl myspider &
nohup = "no hangup" (keep running when I disconnect)
scrapy crawl myspider = your normal spider command
& = run in background (give me my terminal back)
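By the way, this pattern is not Scrapy-specific. The same pieces work for any long-running command. A quick sketch (my_long_script.py is just a placeholder name, not a file from the project above):
# Same idea with a plain Python script instead of a spider
nohup python my_long_script.py &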
Where Did My Output Go?
When you use nohup, output doesn't show on screen. It goes to a file.
Check the Output File
cat nohup.out
This shows everything your spider printed.
Watch It Live
tail -f nohup.out
This shows the output as it happens, like watching it live.
Press Ctrl+C to stop watching (spider keeps running).
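If you just want a rough progress count instead of a live stream, you can count Scrapy's "Scraped from" lines in the output file. A small sketch, assuming the default DEBUG log level (which prints one such line per scraped item):
# Rough progress check: how many items have been scraped so far?
grep -c "Scraped from" nohup.out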
How to Check if Your Spider is Running
After you start your spider with nohup, how do you know it's still working?
Check Running Processes
ps aux | grep scrapy
You'll see something like:
username 12345 5.2 2.1 python scrapy crawl myspider
If you see this, your spider is running! (You might also see a line ending in grep scrapy. That's just your own search command, so ignore it.)
The number (12345) is the process ID. Remember it.
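If you only want the process ID by itself, pgrep can match against the full command line. A small sketch, using the same myspider name from the examples above:
# Print just the PID(s) of the matching spider process
# -f matches the full command line, not just the program name
pgrep -f "scrapy crawl myspider"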
How to Stop Your Spider
Find the Process ID
ps aux | grep scrapy
Look for the number in the second column. Let's say it's 12345.
Stop the Spider
kill 12345
Replace 12345 with your actual process ID.
Your spider stops gracefully (finishes what it's doing, then stops).
Force Stop (If Normal Stop Doesn't Work)
kill -9 12345
This kills it immediately, with no graceful shutdown, so anything mid-write may be lost. Use it only when a normal kill doesn't work.
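For reference, this is roughly how the escalation tends to go with Scrapy: the first signal usually triggers a graceful shutdown, and sending it a second time usually forces an immediate, unclean one (12345 is still just the example PID from above):
# Ask nicely: Scrapy starts a graceful shutdown
kill 12345
# Send it again if you're in a hurry: Scrapy forces an unclean shutdown
kill 12345
# Last resort: the process is killed instantly with no cleanup at all
kill -9 12345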
Better Way: Custom Output File
Don't use nohup.out. Give it a better name:
nohup scrapy crawl myspider > myspider.log 2>&1 &
Now output goes to myspider.log instead of nohup.out.
What does 2>&1 mean?
It means "send errors to the same file as normal output." Don't worry about the details, just always include it.
View Your Custom Log
tail -f myspider.log
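If you run the same spider many times, it can help to put a timestamp in the log name so each run gets its own file. A minimal sketch of that idea, using the same myspider example:
# Each run writes to its own log, e.g. myspider_20240101_093000.log
nohup scrapy crawl myspider > "myspider_$(date +%Y%m%d_%H%M%S).log" 2>&1 &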
Complete Example (Copy and Paste)
Here's everything together:
# 1. Connect to server
ssh username@server.com
# 2. Go to project folder
cd /home/username/my_project
# 3. Start spider with nohup
nohup scrapy crawl myspider > myspider.log 2>&1 &
# 4. Check it's running
ps aux | grep scrapy
# 5. Watch the logs
tail -f myspider.log
# 6. Stop watching (Ctrl+C)
# Spider keeps running!
# 7. Disconnect
exit
Common Questions
Q: Can I run multiple spiders?
Yes! Just use different log files:
nohup scrapy crawl spider1 > spider1.log 2>&1 &
nohup scrapy crawl spider2 > spider2.log 2>&1 &
nohup scrapy crawl spider3 > spider3.log 2>&1 &
Q: How do I know when my spider finished?
Check if it's still running:
ps aux | grep scrapy
If you don't see it, it finished (or crashed).
Check the log to see what happened:
tail myspider.log
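Scrapy normally prints a closing summary when it finishes, so you can also search the log for it. A quick sketch (the exact messages assume a normal, successful run):
# Look for Scrapy's closing message and its final stats dump
grep "Spider closed" myspider.log
grep "Dumping Scrapy stats" myspider.log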
Q: What if my spider crashes?
Check the log file for errors:
cat myspider.log | grep ERROR
Q: Can I see how long it's been running?
ps aux | grep scrapy
Look at the START column to see when it started. (The TIME column shows CPU time used, not how long the spider has been alive.)
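If you want the elapsed wall-clock time directly, ps can report it for a specific process ID. A small sketch combining it with pgrep from earlier (it assumes exactly one matching spider process):
# Print how long the spider has been running, e.g. 02:15:30 or 1-03:12:45
ps -o etime= -p "$(pgrep -f 'scrapy crawl myspider')"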
Making It Even Easier (Create a Script)
Instead of typing the long command every time, create a simple script.
Create the Script
nano run_spider.sh
Type this inside:
#!/bin/bash
nohup scrapy crawl myspider > myspider.log 2>&1 &
echo "Spider started! Check log with: tail -f myspider.log"
Save and exit (Ctrl+X, then Y, then Enter).
Make It Executable
chmod +x run_spider.sh
Use It
./run_spider.sh
Done! Much easier.
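If you run several different spiders, you could generalize the script so the spider name is an argument instead of being hard-coded. A sketch of what that version of run_spider.sh might look like (the argument handling is my addition, not part of the original script):
#!/bin/bash
# Usage: ./run_spider.sh myspider
SPIDER="$1"
if [ -z "$SPIDER" ]; then
  echo "Usage: $0 <spider_name>"
  exit 1
fi
nohup scrapy crawl "$SPIDER" > "$SPIDER.log" 2>&1 &
echo "Spider '$SPIDER' started! Check log with: tail -f $SPIDER.log"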
Making a Stop Script
Create an easy way to stop your spider.
Create the Script
nano stop_spider.sh
Type this inside:
#!/bin/bash
pkill -f "scrapy crawl myspider"
echo "Spider stopped!"
Save and exit.
Make It Executable
chmod +x stop_spider.sh
Use It
./stop_spider.sh
Your spider stops immediately.
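In the same spirit, a tiny status script saves you from typing the ps command every time. A sketch under the same assumptions (a spider named myspider), saved as something like check_spider.sh:
#!/bin/bash
# Report whether the spider is still running
if pgrep -f "scrapy crawl myspider" > /dev/null; then
  echo "Spider is running (PID: $(pgrep -f 'scrapy crawl myspider'))"
else
  echo "Spider is not running. Check myspider.log to see why."
fi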
Real Beginner Example
Let's say you have a spider that scrapes products from a website.
Your Situation
- Spider name: products
- Server: myserver.com
- Username: john
- Project folder: /home/john/scraper
What You Do
Step 1: Connect to server
ssh john@myserver.com
Step 2: Go to project
cd /home/john/scraper
Step 3: Start spider
nohup scrapy crawl products > products.log 2>&1 &
Step 4: Verify it's running
ps aux | grep products
You see:
john 15678 4.5 2.3 python scrapy crawl products
Great! It's running.
Step 5: Watch logs for a minute
tail -f products.log
You see your spider working. Press Ctrl+C.
Step 6: Disconnect
exit
Done! Your spider keeps running.
Checking on Your Spider Later
You come back the next day. Is it still running?
Connect Again
ssh john@myserver.com
Check if Running
ps aux | grep products
If you see it, it's still running!
If you don't see it, it finished. Check the log:
tail products.log
Look at the last lines to see if it finished successfully.
Troubleshooting
Problem: Spider Stops Immediately
Check the log:
cat myspider.log
Common reasons:
- Python error in your code
- Missing dependency
- Wrong spider name
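A quick way to see the actual error is to run the spider in the foreground first, without nohup, so the messages print straight to your screen. Once it starts cleanly, stop it and relaunch it with nohup as usual:
# Run in the foreground to see errors immediately
scrapy crawl myspider
# If it starts fine, press Ctrl+C, then start it again with nohup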
Problem: Can't Find Log File
Check what files you have:
ls -la
Look for nohup.out or your custom log file.
Problem: Too Many Logs
Delete old logs (make sure no spider is still writing to them):
rm *.log
Or keep only recent ones:
rm old_spider.log
Simple Checklist
Before you disconnect, check:
- [ ] Used nohup at the beginning
- [ ] Used & at the end
- [ ] Checked spider is running (ps aux | grep scrapy)
- [ ] Know where log file is
- [ ] Tested watching logs (tail -f logfile.log)
If all checked, you're good to disconnect!
What NOT to Do
Don't Forget the &
# WRONG
nohup scrapy crawl myspider
# RIGHT
nohup scrapy crawl myspider &
Without &, the spider runs in the foreground and holds your terminal until it finishes.
Don't Close Terminal While Loading
Wait until you see:
nohup: ignoring input and appending output to 'nohup.out'
Then you can disconnect.
Don't Use Same Log File for Different Spiders
# WRONG (both use same log)
nohup scrapy crawl spider1 > log.txt 2>&1 &
nohup scrapy crawl spider2 > log.txt 2>&1 &
# RIGHT (different logs)
nohup scrapy crawl spider1 > spider1.log 2>&1 &
nohup scrapy crawl spider2 > spider2.log 2>&1 &
Quick Command Reference
Start spider:
nohup scrapy crawl myspider > myspider.log 2>&1 &
Check if running:
ps aux | grep scrapy
Watch logs:
tail -f myspider.log
Stop spider:
pkill -f "scrapy crawl myspider"
Check last 20 lines of log:
tail -n 20 myspider.log
Summary
What is nohup?
A command that keeps your spider running after you disconnect.
How to use it:
nohup scrapy crawl myspider > myspider.log 2>&1 &
Remember:
- nohup = keep running
- > myspider.log = save output to file
- 2>&1 = save errors too
- & = run in background
To check if running:
ps aux | grep scrapy
To stop:
pkill -f "scrapy crawl myspider"
That's all you need to know!
You don't need to understand all the technical details. Just copy the commands, and your spider will keep running even after you disconnect.
Your First Time (Practice)
Let's practice together:
- Connect to your server
- Go to your project folder
- Type: nohup scrapy crawl YOURSPIDER > spider.log 2>&1 &
- Type: ps aux | grep scrapy (check it's running)
- Type: tail -f spider.log (watch for 10 seconds)
- Press Ctrl+C (stop watching, spider keeps running)
- Type: exit (disconnect from server)
Congratulations! Your spider is now running on the server, and you can go home, sleep, or do anything else. It will keep working.
Tomorrow, connect again and check spider.log to see what happened!
That's all there is to it. nohup is simple, but it's incredibly powerful.
Happy scraping! 🕷️