Sarah Katz

Posted on Sep 14, 2020

In Which Sarah Learns Web Security: Part Two, Input and Output

#security #cybersecurity

In the previous article, I discussed how and why I started learning about web security, as well as some general security principles I learned from the LinkedIn Learning course that I completed. In this article, I want to review the next section of the course, from which I learned some important principles for dealing with input and output. When input enters your app, you want to follow certain steps to ensure that the input is appropriate and that no security risks or vulnerabilities are being introduced with this input. Similarly, when displaying output on your app, you want to ensure that the output is safe (and will not cause unexpected side effects in the user's browser) and that you are only displaying output that is appropriate for the user to see.

Let's go into a little more detail about how this course recommends regulating your inputs and outputs.

Regulating Requests

Your website should be selective about which requests it accepts. The four most common types of requests are GET, POST, PUT, and DELETE, and most websites generally accept GET and POST requests. POST requests should only be accepted when a form is being submitted (for example, logging in or adding a comment on a blog post). Most (or often all) other requests should be GET requests. It's also important to inspect the format of the request - is it the format you are expecting? If not, you should not be sending a successful response.
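To make this concrete, here is a minimal sketch of checking a request before any handler runs. The paths, the `ALLOWED_METHODS` table, and the `check_request` function are names I made up for this example and do not come from the course; a real framework would usually handle this through its router.

```python
from http import HTTPStatus

# Hypothetical routing table: each path lists the only methods it accepts.
ALLOWED_METHODS = {
    "/": {"GET"},
    "/login": {"GET", "POST"},
    "/comments": {"GET", "POST"},
}


def check_request(method: str, path: str, content_type: str = "") -> HTTPStatus:
    """Return the status a request should receive before any handler runs."""
    allowed = ALLOWED_METHODS.get(path)
    if allowed is None:
        return HTTPStatus.NOT_FOUND
    if method.upper() not in allowed:
        return HTTPStatus.METHOD_NOT_ALLOWED
    # Form submissions are expected in a form encoding; reject anything else.
    if method.upper() == "POST" and not content_type.startswith(
        "application/x-www-form-urlencoded"
    ):
        return HTTPStatus.UNSUPPORTED_MEDIA_TYPE
    return HTTPStatus.OK
```

With this table, a DELETE request to `/login` is turned away with a 405 status, and so is any POST that does not arrive in the expected form encoding (with a 415).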

Regulating requests is not itself enough to protect against attacks, but it's an important first layer of defense in depth (which we discussed in the last article).

Validating Input

In a situation where a user is submitting data, we can't just assume that all of the input is acceptable - we must validate the input. "Acceptable" can include any number of criteria, from the presence of the data to its uniqueness. Some common qualities to validate include the presence and length of the data, the type of data (i.e. string, number, file type, etc.), and the format of the data (which often involves using regular expressions to determine if the data matches an expected pattern). If you're expecting data to be a member of a particular set (for example, one of the values on your allowed list), you should validate that as well.
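Here is a rough sketch of what those checks might look like for a sign-up form. The field names, the username pattern, and the list of roles are all assumptions I picked for illustration, not rules from the course.

```python
import re

# Hypothetical rules for a sign-up form; the field names, limits, and
# allowed roles are assumptions made for this example.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")
ALLOWED_ROLES = ("reader", "author", "editor")


def validate_signup(form: dict) -> list[str]:
    """Return a list of problems; an empty list means the input passed."""
    errors = []

    username = form.get("username")
    if not isinstance(username, str) or not username:
        errors.append("username is required")  # presence
    elif USERNAME_PATTERN.fullmatch(username) is None:
        errors.append("username has an invalid format")  # length and format

    age = form.get("age")
    if not (isinstance(age, str) and age.isascii() and age.isdigit()):
        errors.append("age must be a whole number")  # type

    if form.get("role") not in ALLOWED_ROLES:
        errors.append("role is not an allowed value")  # membership in a set

    return errors
```

Returning a list of problems (rather than stopping at the first one) is just one design choice; it lets the form show every issue at once.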

Sanitize Data

Validating data is not the last step - once data has passed validation, it should then be sanitized to remove anything potentially harmful. Many common attacks start with malicious input (something which will be covered in part 3 of this series), and it's important to sanitize all data because you don't know when the data will contain something harmful.

The instructor in this course recommended typecasting as a good first step in sanitizing data. Typecasting refers to ensuring that the data you received matches the expected type. Each programming language handles data types differently, and it's best to typecast your data yourself to ensure that it is handled as expected. Typecasting should be done early so that it is not forgotten.
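As a small illustration, this is how an explicit cast could look in Python. The `to_int` helper is a hypothetical name of my own; the idea is simply that anything that cannot be read as an integer never reaches the rest of the app as one.

```python
def to_int(raw_value, default=None):
    """Cast raw input to int, returning `default` when the cast fails."""
    try:
        return int(raw_value)
    except (TypeError, ValueError):
        # TypeError covers values like None; ValueError covers bad strings.
        return default
```

For example, `to_int("42")` gives back the integer 42, while a value such as `"42; DROP TABLE users"` falls through to the default.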

What sanitization you apply next will depend on where the data is going. Different languages and data pathways have different sanitization processes (because what is harmful differs across languages). Many languages have special characters as part of the language that can be used to insert harmful code (for example, HTML tags use <>, so an attacker could send input that starts with > and then add their own HTML). Two ways to sanitize data to ensure that these special characters cannot cause harm are encoding and escaping the characters. Encoding replaces harmful characters with harmless equivalents (for example, &lt; for < in HTML). Escaping adds an escape character before the powerful character, which renders the character harmless. You can get functions to encode and escape data from either the language's built-in functions or a well-vetted library (you may be tempted to write these functions yourself, but it's more secure to use the existing well-tested functions).
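To show the difference between the two, here is a short sketch using functions from Python's standard library. The wrapper function names are mine, and the three destinations (HTML, a URL, a regular expression) are only examples of "where the data is going".

```python
import html
import re
from urllib.parse import quote


def encode_for_html(raw_text: str) -> str:
    # Encoding: <, >, &, and quotes are replaced with HTML entities.
    return html.escape(raw_text)


def encode_for_url(raw_text: str) -> str:
    # Encoding: reserved URL characters are replaced with %XX sequences.
    return quote(raw_text, safe="")


def escape_for_regex(raw_text: str) -> str:
    # Escaping: a backslash is placed before each special regex character.
    return re.escape(raw_text)
```

The same raw string needs a different treatment for each destination, which is why sanitizing "once, up front" is usually not enough.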

Label Variables

One of the recommendations made in this course is keeping track of which data has been sanitized and which has not. I wasn't aware of this consideration before watching this video, but it makes sense - you don't want to display unsanitized data in a place where it could cause harm just because you mixed up your data. The instructor recommends labeling variables to indicate whether or not the data has been sanitized. For example, data that comes in from an input can be labeled as rawVariableName, and once it has been sanitized, it can be assigned to a variable named safeVariableName.
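A tiny sketch of that naming convention, written in Python (so the names are snake_case rather than the camelCase used above). The `render_comment` function is hypothetical.

```python
import html


def render_comment(raw_comment: str) -> str:
    # The "raw_" prefix marks data exactly as it arrived from the user.
    safe_comment = html.escape(raw_comment)
    # Only "safe_" values are ever placed into output.
    return f"<p>{safe_comment}</p>"
```

If a `raw_` name ever shows up inside a template or an output string, that is a visible signal that something was skipped.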

Keep Code Private

The code for your website should never be made publicly available. If hackers can see your code, they may be able to determine what security defenses you're using (and figure out how to circumvent them). It is important to separate your app into a public directory that is accessible to the browser and a private directory that handles the code behind the scenes. Most web frameworks do this, and your server should be configured to serve files from the public directory.
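Most of the time your framework or web server does this for you, but as a rough sketch of the idea, here is how a file lookup could refuse anything outside the public directory. The function name and the directory layout are assumptions for this example.

```python
from pathlib import Path
from typing import Optional


def resolve_public_file(public_dir: Path, requested_path: str) -> Optional[Path]:
    """Map a requested URL path to a file, or None if it escapes public_dir."""
    root = public_dir.resolve()
    # resolve() collapses ".." segments, so traversal attempts become visible.
    candidate = (root / requested_path.lstrip("/")).resolve()
    if root not in candidate.parents:
        return None
    return candidate
```

A request for something like `/../private/settings.py` resolves to a path outside the public directory, so the function returns `None` and the server can respond with a 404.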

Keep Credentials Private

We all know that it's a bad idea to share your password or write it somewhere where it is available to others. This applies to applications as well - if your app uses any credentials or access keys (for example, to access APIs, code repositories, or databases), these credentials should not be put directly in your code. The best practice is to separate all of your configuration from your code. Store these access keys in a separate file and use constants to refer to them (for example, you may want to configure a variable for your DATABASE_URL and only reference the variable name in your app).
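One common way to do this (my assumption here, since the course only says to keep configuration separate) is to load the value from an environment variable, so the code only ever mentions the name `DATABASE_URL`. The `load_database_url` helper is hypothetical.

```python
import os


def load_database_url(environ=os.environ) -> str:
    """Read the database URL from configuration instead of hard-coding it."""
    database_url = environ.get("DATABASE_URL")
    if not database_url:
        # Fail loudly at startup rather than falling back to a baked-in value.
        raise RuntimeError("DATABASE_URL is not set")
    return database_url
```

The `environ` parameter defaults to the real environment; passing a plain dictionary instead makes the function easy to test without touching real credentials.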

Credentials should never be stored in version control (such as git or svn). Once something is in version control, it's hard to remove all traces of it. If you currently have credentials in your version control, remove them and exclude any files that contain credentials; after that, the instructor of this course recommends changing those credentials, since the old values may still be recoverable from the history.

One last recommendation that this course made was to use an SSH key to handle credentials for connecting to a remote server. An SSH key has two parts, a public key and a private key, and access is granted only when the two keys match. Your public key can be stored in plain text because it is useless without the matching private key (which should never be stored in your code).

Keep Error Messages Vague

When I'm working on developing a new app (or a new feature or a bug fix), I like having descriptive error messages. The more I know about what goes wrong, the easier it will be for me to identify a way to fix it. That is not at all a security concern - as long as those descriptive error messages don't make it into a production app. Developers aren't the only people who like descriptive error messages - hackers like them too. When they encounter an error as part of an attack, hackers can use details from the error messages to help them figure out what attack to try next. Best practice is to return 404 or 500 errors in production (with no more information than that and possibly a generic "something went wrong" error message) and save displaying detailed error logs for a development environment.
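Here is a minimal sketch of that split: the full details go to the log, and the user only sees a generic message unless a debug flag is on. The `handle_request` wrapper and its `debug` parameter are hypothetical names for this example.

```python
import logging

logger = logging.getLogger("app")


def handle_request(handler, debug: bool = False):
    """Run a handler and return (status, body) without leaking error details."""
    try:
        return 200, handler()
    except Exception as exc:
        # The full traceback is recorded in the log, not in the response.
        logger.exception("Unhandled error while handling request")
        if debug:
            # Development only: show what actually went wrong.
            return 500, f"{type(exc).__name__}: {exc}"
        return 500, "Something went wrong."
```

In production the `debug` flag stays off, so an attacker probing the app learns nothing beyond the fact that the request failed.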

Smart Logging

If a hacker attempts to penetrate your app, it's useful to have detailed logs to help you identify what happened and determine where there may be a vulnerability that needs to be fixed. However, too much logging can become a liability (see above), so it's important to be smart about what you log. A good general principle is to log any errors that occur (including as many details as you can), any sensitive actions (such as logging in or changing user permissions), and any suspicious activity.

When logging an event, some details you may want to log include the date and time of the event, whatever information you have about the source of the event (IP address, user account info), the URL where the event occurred (including request parameters and cookies if applicable), and a backtrace (if your programming language of choice allows it). Because they contain a lot of sensitive data, these logs should be kept in a restricted area, such as a database or a private folder.
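As a rough sketch, a logged event with those details could be built like this. The function names, the field names, and the "security" logger are my own choices for the example; where the log actually ends up (a restricted file or a database) depends on how the logger is configured.

```python
import json
import logging
from datetime import datetime, timezone

security_log = logging.getLogger("security")


def build_event(action: str, ip_address: str, user: str, url: str) -> dict:
    """Collect the details worth recording for a sensitive or suspicious event."""
    return {
        "time": datetime.now(timezone.utc).isoformat(),  # date and time
        "action": action,  # e.g. a failed login or a permission change
        "ip_address": ip_address,  # source of the event
        "user": user,
        "url": url,  # where the event occurred
    }


def log_event(action: str, ip_address: str, user: str, url: str) -> None:
    # One JSON object per line keeps the log easy to search later.
    security_log.warning(json.dumps(build_event(action, ip_address, user, url)))
```

A failed login might then be recorded with `log_event("login_failed", "203.0.113.7", "sarah", "/login")`.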

Knowing general security principles and how to securely handle your input and output is important, but it's not enough to secure your application. To properly develop a threat model and act on it, you need to know what kind of attacks are out there and how to protect against each attack. In the next post in this series, I will share some common attacks covered in this course and the instructor's suggestions for how to protect against these attacks.
