I ran my first spider on a server. It was scraping thousands of products. I started it, closed my laptop, and went home feeling proud.
The next morning, I checked the results. Nothing. The spider had stopped 5 minutes after I left.
I didn't know that closing my laptop killed the spider. I wasted an entire day. Then someone taught me about nohup, and everything changed.
Let me explain nohup in the simplest way possible.
The Problem (In Plain English)
What Normally Happens
You connect to your server:
ssh myserver.com
You run your spider:
scrapy crawl myspider
Your spider starts working. You see output on the screen. Everything looks good!
But then:
- You close your laptop
- Your internet disconnects
- You lose WiFi connection
And your spider DIES.
All progress lost. All data lost. Everything stops.
Why? Because when your SSH session ends, the server sends a "hangup" signal (SIGHUP) to every program you started in that session, and they all get killed.
It's like pulling the plug on a computer while it's working.
What is nohup? (The Simple Explanation)
nohup is a simple command that tells your server:
"Keep this program running, even if I disconnect."
That's it. That's all it does.
Think of it like setting your spider to "run on its own" mode.
How to Use nohup (Step by Step)
Step 1: Connect to Your Server
ssh username@yourserver.com
Replace username with your actual username and yourserver.com with your server address.
Step 2: Go to Your Project Folder
cd /home/username/my_scrapy_project
Step 3: Run Your Spider with nohup
Instead of:
scrapy crawl myspider
Type this:
nohup scrapy crawl myspider &
That's it! Notice two things:
- nohup at the beginning
- & at the end
Step 4: You'll See This
nohup: ignoring input and appending output to 'nohup.out'
This means it's working!
Step 5: Disconnect Safely
Now you can:
- Close your laptop
- Turn off your computer
- Disconnect from WiFi
Your spider keeps running!
What Does Each Part Mean?
Let's break down the command:
nohup scrapy crawl myspider &
nohup = "no hangup" (keep running when I disconnect)
scrapy crawl myspider = your normal spider command
& = run in background (give me my terminal back)
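By the way, this pattern is not Scrapy-specific. The same pieces work for any long-running command. A quick sketch (my_long_script.py is just a placeholder name, not a file from the project above):
# Same idea with a plain Python script instead of a spider
nohup python my_long_script.py &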
Where Did My Output Go?
When you use nohup, output doesn't show on screen. It goes to a file.
Check the Output File
cat nohup.out
This shows everything your spider printed.
Watch It Live
tail -f nohup.out
This shows the output as it happens, like watching it live.
Press Ctrl+C to stop watching (spider keeps running).
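If you just want a rough progress count instead of a live stream, you can count Scrapy's "Scraped from" lines in the output file. A small sketch, assuming the default DEBUG log level (which prints one such line per scraped item):
# Rough progress check: how many items have been scraped so far?
grep -c "Scraped from" nohup.out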
How to Check if Your Spider is Running
After you start your spider with nohup, how do you know it's still working?
Check Running Processes
ps aux | grep scrapy
You'll see something like:
username 12345 5.2 2.1 python scrapy crawl myspider
If you see this, your spider is running! (You might also see a line ending in grep scrapy. That's just your own search command, so ignore it.)
The number (12345) is the process ID. Remember it.
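If you only want the process ID by itself, pgrep can match against the full command line. A small sketch, using the same myspider name from the examples above:
# Print just the PID(s) of the matching spider process
# -f matches the full command line, not just the program name
pgrep -f "scrapy crawl myspider"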
How to Stop Your Spider
Find the Process ID
ps aux | grep scrapy
Look for the number in the second column. Let's say it's 12345.
Stop the Spider
kill 12345
Replace 12345 with your actual process ID.
Your spider stops gracefully (finishes what it's doing, then stops).
Force Stop (If Normal Stop Doesn't Work)
kill -9 12345
This kills it immediately, with no graceful shutdown, so anything mid-write may be lost. Use it only when a normal kill doesn't work.
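For reference, this is roughly how the escalation tends to go with Scrapy: the first signal usually triggers a graceful shutdown, and sending it a second time usually forces an immediate, unclean one (12345 is still just the example PID from above):
# Ask nicely: Scrapy starts a graceful shutdown
kill 12345
# Send it again if you're in a hurry: Scrapy forces an unclean shutdown
kill 12345
# Last resort: the process is killed instantly with no cleanup at all
kill -9 12345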
Better Way: Custom Output File
Don't use nohup.out. Give it a better name:
nohup scrapy crawl myspider > myspider.log 2>&1 &
Now output goes to myspider.log instead of nohup.out.
What does 2>&1 mean?
It means "send errors to the same file as normal output." Don't worry about the details, just always include it.
View Your Custom Log
tail -f myspider.log
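If you run the same spider many times, it can help to put a timestamp in the log name so each run gets its own file. A minimal sketch of that idea, using the same myspider example:
# Each run writes to its own log, e.g. myspider_20240101_093000.log
nohup scrapy crawl myspider > "myspider_$(date +%Y%m%d_%H%M%S).log" 2>&1 &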
Complete Example (Copy and Paste)
Here's everything together:
# 1. Connect to server
ssh username@server.com
# 2. Go to project folder
cd /home/username/my_project
# 3. Start spider with nohup
nohup scrapy crawl myspider > myspider.log 2>&1 &
# 4. Check it's running
ps aux | grep scrapy
# 5. Watch the logs
tail -f myspider.log
# 6. Stop watching (Ctrl+C)
# Spider keeps running!
# 7. Disconnect
exit
Common Questions
Q: Can I run multiple spiders?
Yes! Just use different log files:
nohup scrapy crawl spider1 > spider1.log 2>&1 &
nohup scrapy crawl spider2 > spider2.log 2>&1 &
nohup scrapy crawl spider3 > spider3.log 2>&1 &
Q: How do I know when my spider finished?
Check if it's still running:
ps aux | grep scrapy
If you don't see it, it finished (or crashed).
Check the log to see what happened:
tail myspider.log
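Scrapy normally prints a closing summary when it finishes, so you can also search the log for it. A quick sketch (the exact messages assume a normal, successful run):
# Look for Scrapy's closing message and its final stats dump
grep "Spider closed" myspider.log
grep "Dumping Scrapy stats" myspider.log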
Q: What if my spider crashes?
Check the log file for errors:
cat myspider.log | grep ERROR
Q: Can I see how long it's been running?
ps aux | grep scrapy
Look at the START column to see when it started. (The TIME column shows CPU time used, not how long the spider has been alive.)
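If you want the elapsed wall-clock time directly, ps can report it for a specific process ID. A small sketch combining it with pgrep from earlier (it assumes exactly one matching spider process):
# Print how long the spider has been running, e.g. 02:15:30 or 1-03:12:45
ps -o etime= -p "$(pgrep -f 'scrapy crawl myspider')"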
Making It Even Easier (Create a Script)
Instead of typing the long command every time, create a simple script.
Create the Script
nano run_spider.sh
Type this inside:
#!/bin/bash
nohup scrapy crawl myspider > myspider.log 2>&1 &
echo "Spider started! Check log with: tail -f myspider.log"
Save and exit (Ctrl+X, then Y, then Enter).
Make It Executable
chmod +x run_spider.sh
Use It
./run_spider.sh
Done! Much easier.
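If you run several different spiders, you could generalize the script so the spider name is an argument instead of being hard-coded. A sketch of what that version of run_spider.sh might look like (the argument handling is my addition, not part of the original script):
#!/bin/bash
# Usage: ./run_spider.sh myspider
SPIDER="$1"
if [ -z "$SPIDER" ]; then
  echo "Usage: $0 <spider_name>"
  exit 1
fi
nohup scrapy crawl "$SPIDER" > "$SPIDER.log" 2>&1 &
echo "Spider '$SPIDER' started! Check log with: tail -f $SPIDER.log"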
Making a Stop Script
Create an easy way to stop your spider.
Create the Script
nano stop_spider.sh
Type this inside:
#!/bin/bash
pkill -f "scrapy crawl myspider"
echo "Spider stopped!"
Save and exit.
Make It Executable
chmod +x stop_spider.sh
Use It
./stop_spider.sh
Your spider stops immediately.
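In the same spirit, a tiny status script saves you from typing the ps command every time. A sketch under the same assumptions (a spider named myspider), saved as something like check_spider.sh:
#!/bin/bash
# Report whether the spider is still running
if pgrep -f "scrapy crawl myspider" > /dev/null; then
  echo "Spider is running (PID: $(pgrep -f 'scrapy crawl myspider'))"
else
  echo "Spider is not running. Check myspider.log to see why."
fi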
Real Beginner Example
Let's say you have a spider that scrapes products from a website.
Your Situation
- Spider name: products
- Server: myserver.com
- Username: john
- Project folder: /home/john/scraper
What You Do
Step 1: Connect to server
ssh john@myserver.com
Step 2: Go to project
cd /home/john/scraper
Step 3: Start spider
nohup scrapy crawl products > products.log 2>&1 &
Step 4: Verify it's running
ps aux | grep products
You see:
john 15678 4.5 2.3 python scrapy crawl products
Great! It's running.
Step 5: Watch logs for a minute
tail -f products.log
You see your spider working. Press Ctrl+C.
Step 6: Disconnect
exit
Done! Your spider keeps running.
Checking on Your Spider Later
You come back the next day. Is it still running?
Connect Again
ssh john@myserver.com
Check if Running
ps aux | grep products
If you see it, it's still running!
If you don't see it, it finished. Check the log:
tail products.log
Look at the last lines to see if it finished successfully.
Troubleshooting
Problem: Spider Stops Immediately
Check the log:
cat myspider.log
Common reasons:
- Python error in your code
- Missing dependency
- Wrong spider name
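A quick way to see the actual error is to run the spider in the foreground first, without nohup, so the messages print straight to your screen. Once it starts cleanly, stop it and relaunch it with nohup as usual:
# Run in the foreground to see errors immediately
scrapy crawl myspider
# If it starts fine, press Ctrl+C, then start it again with nohup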
Problem: Can't Find Log File
Check what files you have:
ls -la
Look for nohup.out or your custom log file.
Problem: Too Many Logs
Delete old logs (make sure no spider is still writing to them):
rm *.log
Or keep only recent ones:
rm old_spider.log
Simple Checklist
Before you disconnect, check:
- [ ] Used nohup at the beginning
- [ ] Used & at the end
- [ ] Checked spider is running (ps aux | grep scrapy)
- [ ] Know where log file is
- [ ] Tested watching logs (tail -f logfile.log)
If all checked, you're good to disconnect!
What NOT to Do
Don't Forget the &
# WRONG
nohup scrapy crawl myspider
# RIGHT
nohup scrapy crawl myspider &
Without &, the spider runs in the foreground and holds your terminal until it finishes.
Don't Close Terminal While Loading
Wait until you see:
nohup: ignoring input and appending output to 'nohup.out'
Then you can disconnect.
Don't Use Same Log File for Different Spiders
# WRONG (both use same log)
nohup scrapy crawl spider1 > log.txt 2>&1 &
nohup scrapy crawl spider2 > log.txt 2>&1 &
# RIGHT (different logs)
nohup scrapy crawl spider1 > spider1.log 2>&1 &
nohup scrapy crawl spider2 > spider2.log 2>&1 &
Quick Command Reference
Start spider:
nohup scrapy crawl myspider > myspider.log 2>&1 &
Check if running:
ps aux | grep scrapy
Watch logs:
tail -f myspider.log
Stop spider:
pkill -f "scrapy crawl myspider"
Check last 20 lines of log:
tail -n 20 myspider.log
Summary
What is nohup?
A command that keeps your spider running after you disconnect.
How to use it:
nohup scrapy crawl myspider > myspider.log 2>&1 &
Remember:
- nohup = keep running
- > myspider.log = save output to file
- 2>&1 = save errors too
- & = run in background
To check if running:
ps aux | grep scrapy
To stop:
pkill -f "scrapy crawl myspider"
That's all you need to know!
You don't need to understand all the technical details. Just copy the commands, and your spider will keep running even after you disconnect.
Your First Time (Practice)
Let's practice together:
- Connect to your server
- Go to your project folder
- Type: nohup scrapy crawl YOURSPIDER > spider.log 2>&1 &
- Type: ps aux | grep scrapy (check it's running)
- Type: tail -f spider.log (watch for 10 seconds)
- Press Ctrl+C (stop watching, spider keeps running)
- Type: exit (disconnect from server)
Congratulations! Your spider is now running on the server, and you can go home, sleep, or do anything else. It will keep working.
Tomorrow, connect again and check spider.log to see what happened!
That's all there is to it. nohup is simple, but it's incredibly powerful.
Happy scraping! 🕷️