DEV Community

Scrapfly for Scrapfly

Posted on • Originally published at scrapfly.io on

What is HTTP 422 Error? (Unprocessable Entity)

HTTP error 422 Unprocessable Entity occurs when the server understands the request but finds the content syntactically correct yet semantically invalid. Essentially, the data you’re submitting may be well-formed, but something about it is incorrect or incomplete, making it impossible for the server to process.
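The difference between a syntax problem and a semantic one can be sketched with the standard library alone. This is a minimal illustration, not how any particular server is implemented: the `choose_status` function and the rule that `age` must be a non-negative integer are assumptions made up for the example.

```python
import json

def choose_status(raw_body: str) -> int:
    """Return the status a hypothetical server might send for raw_body."""
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError:
        # The body is not valid JSON at all: a syntax problem, typically 400.
        return 400
    # Hypothetical rule: the body must be an object with an integer "age" >= 0.
    if not isinstance(data, dict):
        return 422
    age = data.get("age")
    if not isinstance(age, int) or isinstance(age, bool) or age < 0:
        # Well-formed JSON, but the content breaks the validation rule.
        return 422
    return 200

print(choose_status('{"age": 30'))   # malformed JSON
print(choose_status('{"age": -5}'))  # valid JSON, invalid value
print(choose_status('{"age": 30}'))  # valid JSON, valid value
```

Only the second call is a 422 case: the JSON parses, but the value fails validation.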

What are HTTP 422 Error Causes?

The primary cause of a 422 error code is sending data that, while properly formatted, is invalid according to the server's expectations. This often happens with POST requests that submit form data, JSON, or XML whose structure is valid but whose content fails the server's validation.

For example, submitting a well-formed JSON document that lacks required fields or contains invalid values could trigger a 422 error.

To avoid this error, it’s important to ensure that the content being sent matches the server’s requirements, such as validation rules, data types, or required fields.
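One way to catch such problems before sending a request is to check the payload on the client side. The sketch below assumes a made-up schema (`REQUIRED_FIELDS`) and a hypothetical helper `find_problems`. A real API defines its own rules, usually in its documentation.

```python
# Hypothetical schema: field name -> expected Python type (not from a real API).
REQUIRED_FIELDS = {"email": str, "age": int}

def find_problems(payload: dict) -> list:
    """Collect human-readable problems that would likely cause a 422."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return problems

print(find_problems({"email": "user@example.com", "age": 30}))
print(find_problems({"email": "user@example.com", "age": "30"}))
print(find_problems({"age": 30}))
```

An empty list means the payload passes these local checks. The server may still apply rules that the client does not know about.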

Practical Example

To demonstrate how a server might return an HTTP 422 status code, let's build a simple Flask API with a /submit endpoint that accepts POST requests. This example mimics submitting data to an API and returns a 422 error when the submitted data does not meet the server's validation rules (e.g., invalid email format).

from flask import Flask, jsonify, request

app = Flask(__name__)

# A simple validation function to check for a valid email format
def is_valid_email(email):
    return isinstance(email, str) and "@" in email and "." in email

@app.route("/submit", methods=["POST"])
def submit():
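    data = request.json
    # Guard against JSON bodies that are not objects (e.g. null or a list)
    email = data.get("email") if isinstance(data, dict) else None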

    # Check if email is provided and valid
    if not email or not is_valid_email(email):
        # Unprocessable Entity: Invalid email format
        return jsonify({"error": "Invalid email format."}), 422

    # Otherwise, process the request
    return jsonify({"message": "Data submitted successfully."}), 201

if __name__ == "__main__":
    app.run(debug=True)


In the example above, we simulate a /submit endpoint that accepts POST requests containing JSON data. The server expects a valid email address in the request. If the email is missing or does not meet the simple validation check (containing "@" and "."), the server returns a 422 error, indicating the request is well-formed but semantically incorrect (i.e., invalid email). If the email is valid, the server processes the request and returns a success message.

We can test this server with an HTTP client such as Python's httpx:

import httpx

# Test successful submission with a valid email
response = httpx.post("http://127.0.0.1:5000/submit", json={"email": "valid@example.com"})
print(f"Successful Submission: {response.status_code}, {response.json()}")

# Test failed submission with an invalid email
response = httpx.post("http://127.0.0.1:5000/submit", json={"email": "invalid-email"})
print(f"Failed Submission: {response.status_code}, {response.json()}")

422 in Web Scraping

In web scraping, HTTP 422 is usually encountered when the POST or PUT payload is generated incorrectly. To avoid it, make sure the posted data is valid for the format the server expects, whether JSON, HTML, or XML.

Furthermore, because scrapers cannot see how the server parses the data it receives, the exact cause can be difficult to debug. Browser Developer Tools help here: they show exactly how a website formats its requests, including details such as symbol escaping and indentation, all of which can affect how the data is processed. Replicating that behavior exactly decreases the chances of encountering HTTP status 422 while scraping.
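To see why formatting details matter, the same data can be serialized in several ways that a strict server may not treat as equivalent. The sketch below uses only the standard library, and the `payload` values are invented for illustration. Compare each result with the request body shown in the browser's Network tab.

```python
import json
from urllib.parse import urlencode

# Example data, invented for illustration.
payload = {"query": "café & bar", "page": 1}

# Python's default: a space after each separator, non-ASCII escaped as \uXXXX.
default_json = json.dumps(payload)

# Compact form without escaping, matching what JSON.stringify emits in a browser.
compact_json = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)

# The same data as an HTML form body (application/x-www-form-urlencoded).
form_body = urlencode(payload)

print(default_json)
print(compact_json)
print(form_body)
```

All three strings carry the same information, but only the one that matches the server's expected content type and encoding is guaranteed to be processed the way the website's own requests are.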

The 422 error could also mean that the server is blocking your requests and deliberately returning a 422 status code to signal that you are not allowed to access the resource. If you're receiving this status code on a GET request, then that could be a sign of blocking.
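A scraper can flag such cases with a simple check. The helper below, `might_be_blocking`, is a hypothetical heuristic and not a reliable detector: some frameworks also answer 422 when query parameters fail validation, so a positive result is only a hint to investigate further.

```python
def might_be_blocking(method: str, status_code: int) -> bool:
    """Rough heuristic: a 422 on a request that normally has no body is suspicious."""
    bodyless_methods = {"GET", "HEAD"}
    return status_code == 422 and method.upper() in bodyless_methods

print(might_be_blocking("GET", 422))   # worth investigating as a possible block
print(might_be_blocking("POST", 422))  # more likely a payload problem
```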

Power Up with Scrapfly

Scrapfly provides web scraping, screenshot, and extraction APIs for data collection at scale.

It takes Scrapfly several full-time engineers to maintain this system, so you don't have to!

Summary

HTTP 422 errors typically result from submitting well-formed but invalid data, often in POST requests. While it's unlikely that 422 errors are used to block scrapers, it’s always best to test with rotating proxies if the issue persists. Using Scrapfly’s advanced tools, you can bypass these potential blocks and ensure your tasks continue without disruption.
