DEV Community

Lawrence

Improving Form and AJAX Security

This post is first and foremost about form security and fundamental considerations that apply to any Web application platform. The first section covers these concepts, and the second section uses PHP to demonstrate and explain applied use. Developers who will benefit most should already have experience or knowledge in server-side processing of HTML forms.

NOTE: I want to make it clear, early on, that regardless of any filtration methods used (including the ones in this post), data should always be escaped for external resources like SQL databases. Input filtering is not an alternative to things like parameterized or escaped SQL queries!

Principles of Form Security

At some point in a development career, required skills will need to evolve from the basics of efficiency, performance, usability, readability and re-usability, to a collection of skills that also include security. Some might believe that this is part of the starting skill-set, and maybe that's true for someone who goes through 4 years of college before they write their first line of code, but in most cases, there is a much more gradual ingestion of not just how development works, but how the Web works, how malicious users carry out attacks, and what types of things Web Application Scanners (WAS) are going to grade an application on.

Something I come across more often than anything else in other developers' code is a lack of consideration for the most basic and standard ways malicious users begin their attacks on a website. These script- and bot-level methods make even very small Web applications targets because the barrier to entry (effort level) is so low: all they require are HTTP requests.

The Nature of an HTTP Request

This all starts with a moderate-level understanding of the HTTP protocol. There are multiple ways (methods) to send a request to a Web server. The majority of requests use the GET method, which asks for the response at a specific URL on a website and may include URL parameters from a query string to give the requested resource some additional instructions. A Web browser builds this request automatically for the user. Example:

GET / HTTP/1.1
Host: dev.to

This can be played with manually using something like a telnet client or PuTTY in Raw mode by connecting to dev.to on port 80. A blank line (two consecutive line breaks) tells the server that the request is complete and ready to be processed; every line after the first one is an HTTP header.

The response from the server looks like:

HTTP/1.1 301 Moved Permanently
Server: Varnish
Retry-After: 0
Location: https://dev.to/
Content-Length: 0
Accept-Ranges: bytes
Date: Sat, 17 Aug 2019 17:46:15 GMT
Via: 1.1 varnish
Connection: close
X-Served-By: cache-lax8646-LAX
X-Cache: HIT
X-Cache-Hits: 0
X-Timer: S1566063976.885350,VS0,VE0

A form or AJAX request that uses the GET method automatically appends a query string to the URL path in the HTTP request. The POST method places the same data in the request body, after the headers, but ultimately works the same way (with some exceptions not covered here, such as file uploads).

The reason I'm briefly explaining this is that what gets posted to a Web server is a lot less complicated than it may seem to a developer who does not understand the protocol at this level. When it comes to form data, a malicious user can do much more than put funky values into the form provided on a Web page: they can write the entire request by hand. A deeper dive into the protocol can be found here: An Introduction to HTTP Basics.
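To make that concrete, here is a minimal sketch of what a form submission looks like on the wire. The helper name buildRawRequest, the host example.com, and the path /contact.php are all hypothetical placeholders; the point is only that the request is plain text that anyone can assemble without a browser, so fields can be altered, added, or left out at will.

```php
<?php
// Hypothetical helper: build the raw text of a form submission as it travels
// over the wire. No browser, HTML form, or client-side validation is involved.
function buildRawRequest(string $method, string $host, string $path, array $fields): string
{
    $query = http_build_query($fields); // URL-encodes every name and value

    if ($method === 'GET') {
        // GET: the fields ride along in the query string on the request line.
        return "GET {$path}?{$query} HTTP/1.1\r\nHost: {$host}\r\n\r\n";
    }
    if ($method === 'POST') {
        // POST: the fields go in the body, after the headers and a blank line.
        return "POST {$path} HTTP/1.1\r\n"
            . "Host: {$host}\r\n"
            . "Content-Type: application/x-www-form-urlencoded\r\n"
            . 'Content-Length: ' . strlen($query) . "\r\n"
            . "\r\n"
            . $query;
    }
    throw new InvalidArgumentException("This sketch only handles GET and POST, not {$method}.");
}

// The form only offers "general" or "other" for subject, but a script can send
// any value, add markup, or simply leave the email field out.
echo buildRawRequest('POST', 'example.com', '/contact.php', [
    'name'    => '<script>alert(1)</script>',
    'subject' => 'anything-i-want',
]);
```

Pasting output like this into the telnet or PuTTY session described earlier is all it takes to bypass every client-side check.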

Efficacy of Client-Side Validation

Even basic Web developers most likely understand that client-side validation offers neither security nor data integrity to a form handler, and for those who don't, the brief explanation in the previous section should make it clear that client-side validation is only useful for making a Web application more user-friendly (or reducing server load, briefly covered later).

This means that any time data is pulled from a query string or form post data, the developer must take the precaution of validating that data carefully. This includes AJAX requests that are invisible to normal users.

Server-side Processing of User-Generated Content

This is the meat of the matter. At a fundamental level, developers should always take care when retrieving data from GET (query string) or POST (form data). Here are some starting considerations for any variable:

  • Have I checked for the absence of the variable?
  • Have I confirmed if the variable is required?
  • Have I confirmed that the variable is the right data type?
  • Have I confirmed that the variable is within minimum or maximum length requirements?
  • Have I confirmed that malicious content has been removed?
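Those questions can be walked through in order for a single text field. The following is a minimal sketch, not a general-purpose validator: checkField and its parameters are hypothetical, it assumes a scalar text field, and for the last question it simply rejects markup and control characters rather than trying to remove them.

```php
<?php
// Hypothetical helper that walks the checklist above for one scalar field.
// Returns [value, error]; error is null when every check passes.
function checkField(array $input, string $key, bool $required, int $minLen, int $maxLen): array
{
    // Absence: was the field sent at all?
    if (!array_key_exists($key, $input)) {
        return [null, "{$key} was not sent."];
    }
    $value = $input[$key];

    // Data type: a crafted request such as name[]=x makes PHP produce an array.
    if (!is_string($value)) {
        return [null, "{$key} has the wrong type."];
    }
    $value = trim($value);

    // Required: treat an empty (or whitespace-only) value as not provided.
    if ($value === '') {
        return $required ? [null, "{$key} is required."] : ['', null];
    }

    // Length: strlen() counts bytes, which equals characters only for ASCII;
    // mb_strlen() would be the multibyte-aware choice.
    $len = strlen($value);
    if ($len < $minLen || $len > $maxLen) {
        return [null, "{$key} must be between {$minLen} and {$maxLen} characters long."];
    }

    // Unwanted content: this sketch rejects (rather than strips) HTML tags and
    // ASCII control characters such as CR and LF.
    if ($value !== strip_tags($value) || preg_match('/[\x00-\x1F\x7F]/', $value) === 1) {
        return [null, "{$key} contains disallowed characters."];
    }
    return [$value, null];
}

[$name, $error] = checkField(['name' => '  Ada  '], 'name', true, 1, 50);
```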

The last bullet is my favorite, and I get this a lot:

"I'm properly escaping SQL (or blank external resource) parameters, so I don't have to worry about that."

This is a misguided assumption.

There are multiple reasons why this is misguided, the first (and most obvious) being that it assumes only the database (or other external resource) needs protection. Here is a more complete list of reasons why GET and POST parameters should be filtered at the very start of capturing data:

  • Data that makes it into my database (or other external resource) may have another purpose such as retrieval and display in another area. Failing to filter things like HTML tags, server-side code escapes, HTTP or Email headers, can end up allowing a malicious user to complete their attack later.
  • Data that makes it into my database may be used differently in the future than it is used now. On initial build the captured data is only stored in the database, but it is possible that functionality is added later that uses the data in a way that makes it unsafe.
  • Data that is captured by my code may be extended by another developer in the future, and there's no guarantee that developer is going to properly escape parameters. On initial build, my code isn't saving to a database, but a developer may later decide to store that data without escaping it.
  • My code may be re-used by another developer as a copy/paste base for new functionality in the future. This is common when working with junior developers who may not be at the level of understanding needed for protecting an external resource. Having examples available to juniors is a great way to set them on the right track early.
  • I work on a team where multiple developers handle different tasks, so another developer who is handling integration of the data captured by my code, may incorrectly assume it is safe.
  • Zero-day vulnerabilities are becoming more and more common, and these can pop up at many different levels, even in a server-side framework or Web server.

Most of the above considerations should make sense, but the last bullet calls out another big assumption: that the rest of the code is safe from whatever that data may be. Certainly, that is likely the intended way things work, but if a vulnerability in the server-side framework or even the Web server itself (Apache, nginx, IIS, etc.) is exposed, simply working with the captured data in memory could pose an unforeseen risk.
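To illustrate the first reason in that list (data reused later in a different context), consider an email header. A value that is harmless in a parameterized SQL query can still carry a carriage return and line feed; if it is later placed in a mail header, everything after the line break becomes a new header. The check below is a minimal sketch; isSafeHeaderValue is a hypothetical name, and rejecting CR/LF is one narrow filter, not complete protection.

```php
<?php
// Hypothetical check: a "name" value that is stored today might be placed in an
// email header tomorrow. A CR or LF lets an attacker start a new header line.
function isSafeHeaderValue(string $value): bool
{
    return preg_match('/[\r\n]/', $value) === 0;
}

$name = "Alice\r\nBcc: victim@example.com"; // what a crafted POST could contain
if (!isSafeHeaderValue($name)) {
    echo "Rejected: the value would inject an extra header line.", PHP_EOL;
}
```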

REMINDER: Regardless of any filtering done prior, developers should always escape content for external resources. Input filtering is not an alternative to things like parameterized SQL queries!

A Note About Web Application Firewalls (WAF)

If a Web application sits behind a proper WAF, it can be assumed that a great deal of malicious content will never make it to the actual Web application, so I'll concede that with 100% faith that the application will always sit behind a proper WAF, filtering malicious content becomes less necessary. However, a developer should make sure they understand WAF technology before making this assumption. For instance, if a developer's idea of a proper WAF is one that runs within the Web application it is supposed to protect (i.e., a plugin for their platform), that developer needs to do more research. Just like with traditional firewalls, it is never ideal to have an entity responsible for protecting itself.

This concludes the conceptual side of handling form data, because the applied methods for actually doing this can vary greatly by platform. For instance, some frameworks pre-validate GET and POST data before it even reaches the developer's code, others provide convenient wrappers or functions for safe/filtered retrieval, and others require custom filtering mechanisms entirely. The next section covers examples for dealing with this in PHP.

Filtering Input in PHP

I selected PHP as an example because it covers a variety of applicable needs and methods for filtering:

  • NEED: PHP is the most widely used server-side programming language. Sources: Current, Historical
  • NEED: The standard method for retrieving GET and POST data offers no base filtering or protection.
  • NEED: Some PHP-based CMS platforms either do not offer, or allow bypass of, platform-provided input filtering mechanisms.
  • METHOD: PHP offers the filter_input function that comes with a great deal of filtering mechanisms that make the process easy and consistent even with vanilla PHP.

Source Form

For all of the coming server-side examples, I'll be assuming that the source of the data is coming from the following HTML form. The goal in these examples is purely to focus on filtering data server-side, so anything related to UX, client-side validation, etc., is completely removed.

<form>
    <input type="text" name="name">
    <input type="text" name="email">
    <input type="text" name="age">
    <input type="text" name="postalcode">
    <select name="subject">
        <option value="general">General</option>
        <option value="other">Other</option>
    </select>
    <input type="submit" value="Submit">
</form>

Retrieving/Capturing Form Data (Traditional)

For those that aren't familiar with PHP but still interested in the examples to follow, I have included a quick review of the traditional approach for retrieving content that is sent in the HTTP request to the server. This is done by accessing the $_GET and $_POST global variables prepared by PHP prior to any developer code execution. These variables are queried using array syntax.

For form or AJAX method GET:

$name = $_GET['name'];
$email = $_GET['email'];
$age = $_GET['age'];
$postalcode = $_GET['postalcode'];
$subject = $_GET['subject'];

For form or AJAX method 'POST':

$name = $_POST['name'];
$email = $_POST['email'];
$age = $_POST['age'];
$postalcode = $_POST['postalcode'];
$subject = $_POST['subject'];

In both cases, the code above does nothing to prepare the data for safe or practical use. There is no null check (input not sent), no validation, and no filtering for malicious code. Without even doing anything with these variables, this code can raise warnings (or exceptions, if an error handler converts them) simply by submitting the form without one of the variables, which can expose other shortcomings of the form depending on the configuration of the Web application.
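Even before reaching for filters, the warning and the type problem can be avoided. A minimal sketch follows: postedString is a hypothetical helper that uses the null coalescing operator (??) so a missing key yields null instead of an "Undefined array key" warning (a notice before PHP 8), and it refuses non-string values, because a crafted request such as email[]=x makes PHP hand back an array.

```php
<?php
// Hypothetical helper: read one scalar field without warnings or type surprises.
function postedString(array $source, string $key): ?string
{
    $value = $source[$key] ?? null;            // null when the key is absent
    return is_string($value) ? $value : null;  // reject arrays and anything else
}

// In a real handler the source would be $_POST or $_GET.
$email = postedString($_POST, 'email');
```

This still does no validation or filtering; it only makes the "was it sent, and is it a string?" questions explicit.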

Enter filter_input

A lot can be learned simply by reviewing the PHP manual entry for filter_input, https://www.php.net/manual/en/function.filter-input.php. It is closely related to another useful PHP function, filter_var, which applies the same filters to a value that is already in a variable. It accepts a minimum of two parameters, but when used properly, it should always have at least three.

This function allows the developer to indicate where the variable is being retrieved from (INPUT_POST, INPUT_GET, INPUT_COOKIE, and others not covered here; for session or server values, do not use it, and see $_SESSION and getenv() first) and the name of the variable they want to retrieve a value for. The third parameter allows them to specify their base filtering mechanism. INPUT_COOKIE is incredibly important and very often overlooked, but I won't be covering it in this post (for the most part, all of the same rules apply to cookies that apply to GET and POST). Here are some basic examples to start using our form (above) with the POST method.

$name = filter_input(INPUT_POST, 'name', FILTER_SANITIZE_STRING);
$email = filter_input(INPUT_POST, 'email', FILTER_VALIDATE_EMAIL);
$age = filter_input(INPUT_POST, 'age', FILTER_VALIDATE_INT);
$postalcode = filter_input(INPUT_POST, 'postalcode', FILTER_SANITIZE_STRING);
$subject = filter_input(INPUT_POST, 'subject', FILTER_SANITIZE_STRING);

These are pretty basic filtering mechanisms; there are great filters for handling things like required conditions and checkbox inputs and automatically dumping those into an array, but they go a bit beyond what I'll cover here since they all have intricacies in applied use. Note that FILTER_SANITIZE_STRING was deprecated in PHP 8.1; the PHP manual recommends htmlspecialchars() instead.

Each one of these variables will return one of three different things:

  • NULL: If the input was not posted to the handler, NULL is returned.
  • FALSE: If the input did not meet the provided filter's requirements, false is returned (invalid).
  • MIXED: If the value met the filter requirements, the filtered value is returned. PHP is loosely typed, but there are cases where it matters that the value has the expected type. For example, FILTER_VALIDATE_INT returns an integer, so if the submitted value is 0, I can expect $var === 0 to be true and $var === false to be false; this is important when testing against the FALSE and NULL conditions.
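filter_input only reads the actual request, so it returns NULL for everything when run from the command line. To try the three outcomes without a Web server, filter_var_array (which, by default, adds a NULL entry for any key in the definition that is missing from the input) works on an ordinary array. The simulated POST data and the describeResult helper below are hypothetical.

```php
<?php
// Hypothetical helper that classifies a filter result using strict comparisons.
function describeResult($value): string
{
    if ($value === null) {
        return 'missing';   // the field was never sent
    }
    if ($value === false) {
        return 'invalid';   // the field was sent but failed the filter
    }
    return 'valid';
}

$posted = ['age' => '0', 'email' => 'not-an-email']; // simulated POST data, no postalcode
$result = filter_var_array($posted, [
    'age'        => FILTER_VALIDATE_INT,
    'email'      => FILTER_VALIDATE_EMAIL,
    'postalcode' => FILTER_DEFAULT,
]);

foreach ($result as $field => $value) {
    echo $field, ': ', describeResult($value), PHP_EOL;
}
// age: valid (the integer 0, which is falsy but not === false)
// email: invalid
// postalcode: missing
```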

How Malicious Data is Handled

Malicious data is treated differently depending on the filter in use. In a way, it's not really malicious data that is being filtered; rather, the filter removes content it does not expect. In other words, FILTER_SANITIZE_STRING isn't necessarily looking for malicious code, but it anticipates a string that a user would type into something like a message. It strips HTML tags and HTML-encodes single and double quotes, so the result is not a value of FALSE, but the provided string with those characters removed or encoded. This is an important consideration when capturing something like a password, which is very likely to contain special characters.

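In the case of a validation filter such as FILTER_VALIDATE_INT, malicious code returns FALSE, since anything but an integer is invalid. FILTER_SANITIZE_NUMBER_INT, by contrast, is a sanitize filter: it removes every character except digits and plus and minus signs, and returns whatever is left as a string.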
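The difference between a sanitize filter and a validate filter is easy to see side by side. This minimal sketch runs a few sample inputs (chosen for illustration) through both integer filters with filter_var, which uses the same filters as filter_input.

```php
<?php
// Sanitize filters rewrite the input; validate filters accept or reject it.
foreach (['42', '12abc', '1-2', '<b>7</b>'] as $raw) {
    $sanitized = filter_var($raw, FILTER_SANITIZE_NUMBER_INT); // a string, never an int
    $validated = filter_var($raw, FILTER_VALIDATE_INT);        // an int, or false
    echo str_pad($raw, 10), ' sanitize: ', var_export($sanitized, true),
         '  validate: ', var_export($validated, true), PHP_EOL;
}
// '12abc' sanitizes to '12' and '1-2' stays '1-2', yet neither one validates.
```

A sanitized '1-2' is still not a number, which is why a sanitize filter alone is a poor fit for a field such as age.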

Handling Filtered Results and Further Processing

How the data is handled after it's filtered is really up to the developer, but here's a basic idea for how it can be treated properly:

$name = filter_input(INPUT_POST, 'name', FILTER_SANITIZE_STRING);
$isValid = false;
$errMessage = '';
if ($name === null) {
    $errMessage = 'A value for Name was not found.';
} else if ($name === false) {
    $errMessage = 'An invalid value for Name was detected.';
} else {
    if (strlen($name) === 0) {
        $errMessage = 'Name is required.';
    } else {
        $isValid = true;
    }
}

In the above, the FALSE condition is incredibly unlikely to happen. Even if nothing but malicious symbols are found, it would return an empty string.

$email = filter_input(INPUT_POST, 'email', FILTER_VALIDATE_EMAIL);
$isValid = false;
$errMessage = '';
if ($email === null) {
    $errMessage = 'A value for Email was not found.';
} else if ($email === false) {
    $errMessage = 'Email is invalid.';
} else {
    $isValid = true;
}

In the above, FALSE is returned if the value passed for email is not a valid email address.

$age = filter_input(INPUT_POST, 'age', FILTER_VALIDATE_INT);
$isValid = false;
$errMessage = '';
if ($age === null) {
    $errMessage = 'A value for Age was not found.';
} else if ($age === false) {
    $errMessage = 'Age must be a number between 1 and 100.';
} else {
    if ($age > 0 && $age <= 100) {
        $isValid = true;
    } else {
        $errMessage = 'Age must be a number between 1 and 100.';
    }
}

In the final else, it is already confirmed that $age is an integer. All that's left to do is make sure it falls within the required range.
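The range check can also be handed to the filter itself: FILTER_VALIDATE_INT accepts min_range and max_range options, so the type check and the 1 to 100 range collapse into one call. This is a sketch using filter_var so it can run outside a Web request (validateAge is a hypothetical name); with filter_input, the same options array goes in the fourth parameter.

```php
<?php
// Hypothetical helper: an integer between 1 and 100, or false.
function validateAge(string $raw): int|false
{
    return filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 100],
    ]);
}

// With filter_input (inside a real request):
// $age = filter_input(INPUT_POST, 'age', FILTER_VALIDATE_INT,
//     ['options' => ['min_range' => 1, 'max_range' => 100]]);
```

The trade-off is that a single FALSE no longer tells you whether the value was non-numeric or merely out of range, which matters if the two cases need different messages.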

$postalcode = filter_input(INPUT_POST, 'postalcode', FILTER_SANITIZE_STRING);
$isValid = false;
$errMessage = '';
if ($postalcode === null) {
    $errMessage = 'A value for Postal Code was not found.';
} else if ($postalcode === false) {
    $errMessage = 'An invalid value for Postal Code was detected.';
} else {
    if (!preg_match('/^[0-9]{5}$/', $postalcode)) {
        $errMessage = 'Postal code must be a 5-digit value.';
    } else {
        $isValid = true;
    }
}

In the above example, it might seem like an integer filter would be a good idea since I'm expecting numbers only, but postal codes are not integers: some start with a leading 0, which converting to an integer drops (and which FILTER_VALIDATE_INT rejects outright).
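The leading-zero problem, and a string-preserving alternative, can be sketched as follows. validatePostalCode is a hypothetical helper, and the 5-digit pattern assumes US-style ZIP codes as in the example above. It uses FILTER_VALIDATE_REGEXP, and the D modifier makes $ match only at the very end, so a trailing newline cannot slip through.

```php
<?php
// An integer filter is the wrong tool for postal codes.
var_dump(filter_var('02134', FILTER_VALIDATE_INT)); // bool(false): leading zero rejected
var_dump((int) '02134');                           // int(2134): leading zero lost

// Hypothetical helper: keep the value as a string and validate its format.
function validatePostalCode(string $raw): string|false
{
    return filter_var($raw, FILTER_VALIDATE_REGEXP, [
        'options' => ['regexp' => '/^[0-9]{5}$/D'], // exactly five digits, nothing after
    ]);
}
```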

$subject = filter_input(INPUT_POST, 'subject', FILTER_SANITIZE_STRING);
$isValid = false;
$errMessage = '';
switch ($subject) {
    case 'general':
    case 'other':
        $isValid = true;
        break;
    default:
        $errMessage = 'An invalid value for Subject was detected.';
        break;
}

In the example above, the form offers only two options for subject, so anything else is wrong. That makes separate null and false checks unnecessary: a missing or invalid value can only come from malicious use of the form handler, and it simply falls through to the default case.
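An alternative to switch is a strict whitelist lookup. switch compares loosely (==), which on PHP 7 and earlier even makes 0 == 'general' true; in_array with its third argument set to true compares with ===, so only the exact strings the form offers can match. The constant and function names below are hypothetical.

```php
<?php
// A strict whitelist: only the exact option values the form offers are accepted.
const ALLOWED_SUBJECTS = ['general', 'other'];

function validateSubject($value): ?string
{
    // strict = true: no type juggling, so null, false, numbers and arrays never match.
    return in_array($value, ALLOWED_SUBJECTS, true) ? $value : null;
}
```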

Some Final Notes on filter_input

There are some things to consider about the filtering system. For instance, in many cases when dealing with string content, the values are automatically trimmed of whitespace from the beginning and end of the value. In general, this is a good thing, but it could also be a problem if the code is expecting there to be spaces (or if it needs to be able to detect these scenarios perhaps for user messaging).

Some of the base filters provided can create challenges in determining the type of problem found with a value. It can take some applied-use experience to get a feel for the conditions that can be encountered and how to counter those (the options parameter can add quite a bit of additional control).

The final (fourth) parameter for filter_input accepts bitwise disjunctions of flags. This can greatly increase the power of the filters, but will require some significant additional reading to leverage fully.
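As a small taste of flags, the sketch below combines two of them with the bitwise OR operator (|). FILTER_REQUIRE_ARRAY insists on an array, which is what PHP builds for inputs such as ids[]=1&ids[]=2 (for example a group of checkboxes), and applies the filter to every element; FILTER_NULL_ON_FAILURE makes failures come back as NULL instead of FALSE. filterIdList is a hypothetical helper, and filter_var is used so it runs outside a request.

```php
<?php
// Hypothetical helper: an array of integers, or null if anything is wrong.
function filterIdList($raw): ?array
{
    $ids = filter_var($raw, FILTER_VALIDATE_INT, FILTER_REQUIRE_ARRAY | FILTER_NULL_ON_FAILURE);
    if ($ids === null || in_array(null, $ids, true)) {
        return null; // not an array, or at least one element is not an integer
    }
    return $ids;
}
```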

Note the use of type comparison, especially for false. Without a type comparison (===), values such as an empty string, an empty array, the string "0", the integer 0, or NULL would also match false when using the == operator.
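The list of values is easy to check directly. In this short sketch, every candidate except the last compares equal to false with ==, while none of them is identical to false with ===; the string 'false' is included because it is a common misconception (any non-empty string other than "0" is truthy).

```php
<?php
$candidates = ['', '0', 0, [], null, 'false'];
foreach ($candidates as $value) {
    printf("%-8s  == false: %-5s  === false: %s\n",
        json_encode($value),
        var_export($value == false, true),
        var_export($value === false, true));
}
```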

NULL is only returned from filter_input when the parameter does not exist. An empty form field passes an empty string and does not trigger the NULL value, although NULL can be encountered with checkboxes, because an unchecked box is not sent at all (an easy way around this when using AJAX is to always send a value, even if the box is not checked). Setting checkboxes aside, the NULL condition only confirms that the request contains the field, i.e., that a malicious user is not posting to the handler without that parameter. In most cases, a specific condition for NULL is not really needed so long as the code halts usage of that variable; a quicker approach, using the condition $value === null || $value === false to return an "Invalid" response, is perfectly acceptable.
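For a traditional form post, one way to treat the checkbox case is to read "missing" as "not checked". The sketch below is an assumption-laden example: checkboxChecked is a hypothetical helper, and it relies on FILTER_VALIDATE_BOOLEAN, which treats "1", "true", "on" and "yes" as true ("on" is what browsers send for a checked box without a value attribute), plus FILTER_NULL_ON_FAILURE so unrecognized values do not count as checked.

```php
<?php
// Hypothetical helper for a checkbox field: absent means unchecked.
function checkboxChecked(array $source, string $key): bool
{
    if (!array_key_exists($key, $source)) {
        return false; // unchecked boxes are simply not sent
    }
    $value = filter_var($source[$key], FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
    return $value === true; // false for "off"/"0", null for anything unrecognized
}
```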

I didn't cover it here, but I also recommend that developers look into cookie filtering. Cookie data is sent in HTTP headers, so malicious users can fill in whatever content they want. Whenever cookie data is retrieved, it should be run through a filtering system before being used.

Notes About AJAX

I quickly want to point out that all of the rules above for handling POST and GET data also apply to AJAX requests, including AJAX requests that happen without any user interaction. The handlers for your AJAX requests can be manipulated in the same way that a form handler can.

Notes About UX

There are numerous posts on dev.to about client-side validation, so the only thing I'll add for consideration is that complete server-side validation can be leveraged for client-side validation. It's less than ideal for traditional forms that post back to the server, but for AJAX implementations there are many ways to return the form handler's error messages to the client-side code for display. This approach should be balanced against the performance capabilities of the server and the traffic/usage of the form, but for low- to medium-traffic implementations, an extensive separate layer of client-side validation results in more work in both development and testing (depending on the approach). It is also a lot easier to keep validation consistent if it's all done in one place.
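Here is a minimal sketch of that idea for two fields of the example form. validateContactForm is a hypothetical function; it validates once on the server with filter_var_array and returns a structure that an AJAX handler could send back with json_encode for the client-side code to display. In a real handler the input would be the request data and a Content-Type: application/json header would be sent first.

```php
<?php
// Hypothetical AJAX handler core: validate on the server, return messages as data.
function validateContactForm(array $input): array
{
    $data = filter_var_array($input, [
        'email' => FILTER_VALIDATE_EMAIL,
        'age'   => [
            'filter'  => FILTER_VALIDATE_INT,
            'options' => ['min_range' => 1, 'max_range' => 100],
        ],
    ]);

    $errors = [];
    // null = not sent, false = failed the filter; both are treated as invalid.
    if ($data['email'] === null || $data['email'] === false) {
        $errors['email'] = 'Email is invalid.';
    }
    if ($data['age'] === null || $data['age'] === false) {
        $errors['age'] = 'Age must be a number between 1 and 100.';
    }
    return ['valid' => $errors === [], 'errors' => $errors];
}

echo json_encode(validateContactForm(['email' => 'nope', 'age' => '42'])), PHP_EOL;
// {"valid":false,"errors":{"email":"Email is invalid."}}
```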

Quick Summary / tl;dr

Developers should always consider form handlers to be an easy point of entry for malicious users and never rely solely on client-side scripting to protect the security of the application or integrity of the data.

Thank you for reading
