Muhammad Ikramullah Khan

Running Scrapy on a Server: The Beginner's Guide to nohup

I ran my first spider on a server. It was scraping thousands of products. I started it, closed my laptop, and went home feeling proud.

The next morning, I checked the results. Nothing. The spider had stopped 5 minutes after I left.

I didn't know that closing my laptop killed the spider. I wasted an entire day. Then someone taught me about nohup, and everything changed.

Let me explain nohup in the simplest way possible.


The Problem (In Plain English)

What Normally Happens

You connect to your server:

ssh myserver.com

You run your spider:

scrapy crawl myspider

Your spider starts working. You see output on the screen. Everything looks good!

But then:

  • You close your laptop
  • Your internet disconnects
  • You lose WiFi connection

And your spider DIES.

All progress lost. All data lost. Everything stops.

Why? Because when your SSH session ends, the server sends a "hangup" signal (SIGHUP) to everything you started from it, and by default that signal kills them.

It's like pulling the plug on a computer while it's working.
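
You can see this for yourself with a harmless stand-in for your spider (a small experiment using sleep instead of scrapy; exactly what survives a clean exit depends on your shell's settings, but a dropped connection sends the hangup signal):

# Start a long-running command WITHOUT nohup...
sleep 600 &

# ...then drop the connection (close the laptop lid) and reconnect.
# The sleep is usually gone:
ps aux | grep sleep

# Now the same thing WITH nohup...
nohup sleep 600 &

# ...drop the connection, reconnect, check again. It survives:
ps aux | grep sleep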


What is nohup? (The Simple Explanation)

nohup is a simple command that tells your server:

"Keep this program running, even if I disconnect."

That's it. That's all it does.

Think of it like setting your spider to "run on its own" mode.


How to Use nohup (Step by Step)

Step 1: Connect to Your Server

ssh username@yourserver.com

Replace username with your actual username and yourserver.com with your server address.

Step 2: Go to Your Project Folder

cd /home/username/my_scrapy_project

Step 3: Run Your Spider with nohup

Instead of:

scrapy crawl myspider

Type this:

nohup scrapy crawl myspider &

That's it! Notice two things:

  • nohup at the beginning
  • & at the end

Step 4: You'll See This

nohup: ignoring input and appending output to 'nohup.out'

This means it's working!

Step 5: Disconnect Safely

Now you can:

  • Close your laptop
  • Turn off your computer
  • Disconnect from WiFi

Your spider keeps running!


What Does Each Part Mean?

Let's break down the command:

nohup scrapy crawl myspider &

nohup = "no hangup" (keep running when I disconnect)

scrapy crawl myspider = your normal spider command

& = run in background (give me my terminal back)
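
If you're curious what the & actually did, your shell can show you. The jobs built-in lists the background jobs started from the current session (the output below is approximate):

jobs
# [1]+  Running    nohup scrapy crawl myspider &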


Where Did My Output Go?

When you use nohup, output doesn't show on screen. It goes to a file.

Check the Output File

cat nohup.out

This shows everything your spider printed.

Watch It Live

tail -f nohup.out

This shows the output as it happens, like watching it live.

Press Ctrl+C to stop watching (spider keeps running).


How to Check if Your Spider is Running

After you start your spider with nohup, how do you know it's still working?

Check Running Processes

ps aux | grep scrapy

You'll see something like:

username  12345  5.2  2.1  python scrapy crawl myspider

If you see this, your spider is running!

The number (12345) is the process ID. Remember it.
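
One small gotcha: the grep command itself contains the word "scrapy", so an extra line for grep can show up in its own results. For a cleaner check, pgrep searches processes directly (-f matches against the full command line):

# Prints the PID of each matching process, or nothing if none is running
pgrep -f "scrapy crawl myspider"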


How to Stop Your Spider

Find the Process ID

ps aux | grep scrapy

Look for the number in the second column. Let's say it's 12345.

Stop the Spider

kill 12345

Replace 12345 with your actual process ID.

Your spider stops gracefully (Scrapy finishes what it's doing, then exits). kill sends a polite "terminate" signal called SIGTERM, which Scrapy knows how to handle.

Force Stop (If Normal Stop Doesn't Work)

kill -9 12345

This kills it instantly. kill -9 sends SIGKILL, a signal the process can't catch, so Scrapy gets no chance to finish requests or clean up. Use it only when a normal kill doesn't work.
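
You don't have to memorize the signal names, but a sensible escalation looks like this (12345 stands for your spider's PID):

# 1. Polite stop: Scrapy finishes in-flight requests, then exits
kill 12345

# 2. Still running after a while? Send the same signal again —
#    Scrapy treats a second one as "stop now", skipping the wind-down
kill 12345

# 3. Last resort: SIGKILL, no cleanup, possible data loss
kill -9 12345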


Better Way: Custom Output File

Don't use nohup.out. Give it a better name:

nohup scrapy crawl myspider > myspider.log 2>&1 &

Now output goes to myspider.log instead of nohup.out.

What does 2>&1 mean?

It means "send error output to the same file as normal output." This matters more than it sounds: Scrapy writes most of its log messages to the error stream, so without 2>&1 your log file can end up nearly empty. Always include it.
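
Here are the mechanics, if you want them (a sketch; the file names are just examples). Every program has two output streams, stdout (stream 1) and stderr (stream 2), and > on its own only redirects stream 1:

# '>' alone captures only normal output; Scrapy's log (stderr)
# still goes to the terminal and is lost when you disconnect:
scrapy crawl myspider > myspider.log

# '2>&1' says "send stream 2 wherever stream 1 is going":
scrapy crawl myspider > myspider.log 2>&1

# You could also keep the two streams in separate files:
scrapy crawl myspider > output.log 2> errors.log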

View Your Custom Log

tail -f myspider.log

Complete Example (Copy and Paste)

Here's everything together:

# 1. Connect to server
ssh username@server.com

# 2. Go to project folder
cd /home/username/my_project

# 3. Start spider with nohup
nohup scrapy crawl myspider > myspider.log 2>&1 &

# 4. Check it's running
ps aux | grep scrapy

# 5. Watch the logs
tail -f myspider.log

# 6. Stop watching (Ctrl+C)
# Spider keeps running!

# 7. Disconnect
exit

Common Questions

Q: Can I run multiple spiders?

Yes! Just use different log files:

nohup scrapy crawl spider1 > spider1.log 2>&1 &
nohup scrapy crawl spider2 > spider2.log 2>&1 &
nohup scrapy crawl spider3 > spider3.log 2>&1 &

Q: How do I know when my spider finished?

Check if it's still running:

ps aux | grep scrapy

If you don't see it, it finished (or crashed).

Check the log to see what happened:

tail myspider.log

Q: What if my spider crashes?

Check the log file for errors:

grep ERROR myspider.log

Q: Can I see how long it's been running?

ps aux | grep scrapy

Look at the START column; it shows when the spider started. (The TIME column shows CPU time used, not how long it has been running.)
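
If you want the exact elapsed time, ps can print it directly (replace 12345 with your spider's PID):

# Wall-clock running time, e.g. '1-02:33:44' = 1 day, 2 hours, 33 min
ps -o etime= -p 12345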


Making It Even Easier (Create a Script)

Instead of typing the long command every time, create a simple script.

Create the Script

nano run_spider.sh

Type this inside:

#!/bin/bash
nohup scrapy crawl myspider > myspider.log 2>&1 &
echo "Spider started! Check log with: tail -f myspider.log"

Save and exit (Ctrl+X, then Y, then Enter).

Make It Executable

chmod +x run_spider.sh

Use It

./run_spider.sh

Done! Much easier.
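
Once you're comfortable, here's a slightly more flexible version (a sketch — adapt the names to your project). It takes the spider name as an argument and remembers the process ID using $!, the shell variable that holds the PID of the most recent background command:

#!/bin/bash
# Usage: ./run_spider.sh <spidername>
SPIDER="$1"
nohup scrapy crawl "$SPIDER" > "$SPIDER.log" 2>&1 &
echo $! > "$SPIDER.pid"   # save the PID so a stop script can find it later
echo "Spider '$SPIDER' started (PID $(cat "$SPIDER.pid"))"
echo "Watch the log with: tail -f $SPIDER.log"

Run ./run_spider.sh products and you get products.log and products.pid.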


Making a Stop Script

Create an easy way to stop your spider.

Create the Script

nano stop_spider.sh

Type this inside:

#!/bin/bash
pkill -f "scrapy crawl myspider"
echo "Spider stopped!"

Save and exit.

Make It Executable

chmod +x stop_spider.sh

Use It

./stop_spider.sh

Your spider shuts down gracefully: pkill sends the same polite SIGTERM as kill, just matched by name instead of by process ID.
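
If you used the PID-file version of the run script from the previous section, a matching stop script (same assumptions) can target exactly the right process:

#!/bin/bash
# Usage: ./stop_spider.sh <spidername>
SPIDER="$1"
kill "$(cat "$SPIDER.pid")" && rm "$SPIDER.pid"
echo "Spider '$SPIDER' asked to stop gracefully."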


Real Beginner Example

Let's say you have a spider that scrapes products from a website.

Your Situation

  • Spider name: products
  • Server: myserver.com
  • Username: john
  • Project folder: /home/john/scraper

What You Do

Step 1: Connect to server

ssh john@myserver.com

Step 2: Go to project

cd /home/john/scraper

Step 3: Start spider

nohup scrapy crawl products > products.log 2>&1 &

Step 4: Verify it's running

ps aux | grep products

You see:

john  15678  4.5  2.3  python scrapy crawl products

Great! It's running.

Step 5: Watch logs for a minute

tail -f products.log

You see your spider working. Press Ctrl+C.

Step 6: Disconnect

exit

Done! Your spider keeps running.


Checking on Your Spider Later

You come back the next day. Is it still running?

Connect Again

ssh john@myserver.com

Check if Running

ps aux | grep products

If you see it, it's still running!

If you don't see it, it finished. Check the log:

tail products.log

Look at the last lines to see if it finished successfully.
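
A run that completed normally ends with Scrapy's statistics dump. One quick check (the exact stats vary by version, but finish_reason is the line to look for):

grep finish_reason products.log
# 'finish_reason': 'finished',   <- the crawl completed normally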


Troubleshooting

Problem: Spider Stops Immediately

Check the log:

cat myspider.log

Common reasons:

  • Python error in your code
  • Missing dependency
  • Wrong spider name (there's a quick way to check, shown below)
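
For the last one, Scrapy can print every spider name it knows about, so you can check for typos (run it from inside your project folder):

scrapy list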

Problem: Can't Find Log File

Check what files you have:

ls -la

Look for nohup.out or your custom log file.

Problem: Too Many Logs

Delete all old logs (careful: this removes every .log file in the folder):

rm *.log

Or delete just a specific old one:

rm old_spider.log

Simple Checklist

Before you disconnect, check:

  • [ ] Used nohup at the beginning
  • [ ] Used & at the end
  • [ ] Checked spider is running (ps aux | grep scrapy)
  • [ ] Know where log file is
  • [ ] Tested watching logs (tail -f logfile.log)

If all checked, you're good to disconnect!


What NOT to Do

Don't Forget the &

# WRONG
nohup scrapy crawl myspider

# RIGHT
nohup scrapy crawl myspider &

Without &, the spider runs in the foreground and your terminal is blocked until it finishes.

Don't Disconnect Before It Starts

Wait until you see:

nohup: ignoring input and appending output to 'nohup.out'

Then you can disconnect.

Don't Use Same Log File for Different Spiders

# WRONG (both use same log)
nohup scrapy crawl spider1 > log.txt 2>&1 &
nohup scrapy crawl spider2 > log.txt 2>&1 &

# RIGHT (different logs)
nohup scrapy crawl spider1 > spider1.log 2>&1 &
nohup scrapy crawl spider2 > spider2.log 2>&1 &

Quick Command Reference

Start spider:

nohup scrapy crawl myspider > myspider.log 2>&1 &

Check if running:

ps aux | grep scrapy

Watch logs:

tail -f myspider.log

Stop spider:

pkill -f "scrapy crawl myspider"

Check last 20 lines of log:

tail -n 20 myspider.log

Summary

What is nohup?
A command that keeps your spider running after you disconnect.

How to use it:

nohup scrapy crawl myspider > myspider.log 2>&1 &

Remember:

  • nohup = keep running
  • > myspider.log = save output to file
  • 2>&1 = save errors too
  • & = run in background

To check if running:

ps aux | grep scrapy

To stop:

pkill -f "scrapy crawl myspider"

That's all you need to know!

You don't need to understand all the technical details. Just copy the commands, and your spider will keep running even after you disconnect.


Your First Time (Practice)

Let's practice together:

  1. Connect to your server
  2. Go to your project folder
  3. Type: nohup scrapy crawl YOURSPIDER > spider.log 2>&1 &
  4. Type: ps aux | grep scrapy (check it's running)
  5. Type: tail -f spider.log (watch for 10 seconds)
  6. Press Ctrl+C (stop watching, spider keeps running)
  7. Type: exit (disconnect from server)

Congratulations! Your spider is now running on the server, and you can go home, sleep, or do anything else. It will keep working.

Tomorrow, connect again and check spider.log to see what happened!

That's all there is to it. nohup is simple, but it's incredibly powerful.

Happy scraping! 🕷️
