First of all, don't ask me why.
Project
InstaStalker = Python + Docker + Lambda + SNS (and a touch of EventBridge)
InstaStalker: Your daily stalking assistant.
The main goal is to scrape Instagram profiles and determine whether a profile is public or private. Since this is a tracking application, we assume the profile we want to follow is private. Lambda therefore triggers SNS to send an email when the profile's visibility changes to public. The whole scenario works without logging in (at least without my credentials).
Scraping
Scraping was entirely a trial-and-error stage. First I used beautifulsoup4 to fetch the page. In the beginning it worked: I got the whole page source (HTML) and parsed the "is_private" key with a regex. Voila! However, it only worked for a short time. Instagram lets you view only a few pages per IP without logging in, and it also changes many things frequently. Once you hit the limit, bs4 fetches the login page's HTML, which contains no is_private key.
V1 Code
from bs4 import BeautifulSoup
from urllib.request import urlopen, Request

username = "amazonwebservices"
req = Request("https://instagram.com/" + username,
              headers={"User-Agent": "Mozilla/5.0"})
html = urlopen(req).read().decode("utf-8")
soup = BeautifulSoup(html, features="lxml")
scripts = soup.find_all("script")  # a script block contains the user info
...
# then the regex part, converting key-values into a pandas df
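The regex step I glossed over might look like this sketch. The html string below is a made-up stand-in for the real page source; the actual page embeds the flag inside a much larger JSON blob:

```python
import re

# Invented stand-in for the script block bs4 returns.
html = '<script>{"graphql":{"user":{"username":"amazonwebservices","is_private":false}}}</script>'

# Pull the boolean that follows the "is_private" key.
match = re.search(r'"is_private"\s*:\s*(true|false)', html)
is_private = match.group(1) == "true" if match else None
print(is_private)  # -> False (the sample profile is public)
```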
As I mentioned, I needed a new approach due to the IP restriction and frequent changes. While Googling, I found a way to get the data from an Instagram API endpoint. It was much better than parsing HTML.
V2 Code
import json
import requests

username = "amazonwebservices"
response = requests.get(
    "https://i.instagram.com/api/v1/users/web_profile_info/?username={0}".format(username),
    headers={"x-ig-app-id": "936619743392459"},
)
data = json.loads(response.content)
print(data["data"]["user"]["is_private"])
# False
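Once the per-IP limit hits, the endpoint hands back the login page's HTML instead of JSON, which is easy to detect: json.loads raises a JSONDecodeError. A minimal sketch, with an invented HTML string standing in for the real login page:

```python
import json
from json import JSONDecodeError

login_html = "<html><body>Please log in</body></html>"  # stand-in for the login page
try:
    data = json.loads(login_html)
except JSONDecodeError:
    data = None
    print("Hit the login wall: got HTML back, not JSON")
```

This is the same error the Lambda handler later catches.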
Still, I couldn't get around the login wall. Although the API retrieved the user information, after a few requests it again required logging in. I tried some try/except blocks and added headers (user agents, encoding, schema, etc.), but they never worked out. I also tried instaloader and got the same error: login required. I saw some proxy solutions, but they were paid services, and I wouldn't pay for them. Then I discovered RapidAPI, an API platform that provides a proxy. Finally, I found what I was looking for. I tried a few APIs and decided to move forward with one of them.
Code V3 (latest for now)
import json
import requests

def rapid_checker(user_name: str):
    url = "http://api_url/{0}".format(user_name)
    headers = {
        "X-RapidAPI-Key": "api_key",
        "X-RapidAPI-Host": "api_host",
    }
    response = requests.request("GET", url, headers=headers)
    if response.status_code == 200:
        response = json.loads(response.text)
        return {
            "username": response["username"],
            "fullname": response["full_name"],
            "bio": response["biography"],
            "is_private": response["is_private"],
            "follower_count": int(response["edge_followed_by"]["count"]),
            "following_count": int(response["edge_follow"]["count"]),
            "total_post": int(response["edge_owner_to_timeline_media"]["count"]),
            "profile_pic": response["profile_pic_url_hd"],
        }
    else:
        print("Couldn't retrieve data for {0}. Reason: {1}".format(
            user_name, json.loads(response.text)["message"]))
        return None
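To make the function's return shape concrete, here is the same field mapping applied to a made-up payload (all values are invented; the keys mirror the API response fields used above):

```python
sample = {  # invented values mimicking the API response
    "username": "amazonwebservices",
    "full_name": "Amazon Web Services",
    "biography": "Official account",
    "is_private": False,
    "edge_followed_by": {"count": 1000},
    "edge_follow": {"count": 10},
    "edge_owner_to_timeline_media": {"count": 50},
    "profile_pic_url_hd": "https://example.com/pic.jpg",
}

# The mapping rapid_checker applies to a successful response.
result = {
    "username": sample["username"],
    "fullname": sample["full_name"],
    "bio": sample["biography"],
    "is_private": sample["is_private"],
    "follower_count": int(sample["edge_followed_by"]["count"]),
    "following_count": int(sample["edge_follow"]["count"]),
    "total_post": int(sample["edge_owner_to_timeline_media"]["count"]),
    "profile_pic": sample["profile_pic_url_hd"],
}
print(result["is_private"], result["follower_count"])  # -> False 1000
```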
Lambda Deployment
Finally, I can gather data without any login errors. There is a daily limit on the API, but it is tolerable.
Then I moved on to the automation stage with Lambda. I created a lambda_function.py containing the gathering function and added the SNS code for the notifications. The whole script:
import boto3
import json
import requests
from json import JSONDecodeError

client = boto3.client("sns", region_name="region")

def rapid_checker(user_name: str):
    # same as the code above.
    ...

def lambda_handler(event, context):
    try:
        topic_arn = "sns_arn"
        try:
            username = event["user_name"]
        except KeyError:
            username = "amazonwebservices"
        print("Username:", username)
        result = rapid_checker(username)
        if result is not None:
            print(result)
            if not result["is_private"]:
                client.publish(
                    TopicArn=topic_arn,
                    Subject="Profile is now public!",
                    Message="'{0}' profile is now public! \n"
                            "Follower Count: {1} \n"
                            "Following Count: {2} \n"
                            "Total Post: {3} \n"
                            "Profile Pic: {4}".format(
                        result["username"],
                        result["follower_count"],
                        result["following_count"],
                        result["total_post"],
                        result["profile_pic"],
                    ),
                )
                print("Message sent to SNS!")
            print("Checked for:", username, "and the visibility status is:", result["is_private"])
            return result
        else:
            print("Couldn't retrieve data for {0}. Reason: API Limit Reached!".format(username))
    except JSONDecodeError as error:
        print(error)
        return "Couldn't retrieve data for {0}".format(username)
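The Message body the handler builds can be previewed locally before wiring up SNS. The result dict below uses invented numbers in place of a real rapid_checker return value:

```python
result = {  # invented sample of what rapid_checker returns
    "username": "amazonwebservices", "follower_count": 1000,
    "following_count": 10, "total_post": 50,
    "profile_pic": "https://example.com/pic.jpg",
}

message = ("'{0}' profile is now public!\n"
           "Follower Count: {1}\n"
           "Following Count: {2}\n"
           "Total Post: {3}\n"
           "Profile Pic: {4}").format(
    result["username"], result["follower_count"],
    result["following_count"], result["total_post"], result["profile_pic"])
print(message)
```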
Next, I created a Dockerfile and pushed the image to ECR. Almost ready!
FROM public.ecr.aws/lambda/python:3.8
RUN pip install requests
COPY lambda_function.py .
CMD [ "lambda_function.lambda_handler" ]
Then I created a Lambda function using a Docker image and tested it. Everything works perfectly. Finally, I created an EventBridge Rule for the scheduled runs.
This is how you can build your own "homemade" Instagram stalking application using AWS.
Lastly, don't ask me why.
(All solutions arise from a need...)