At the beginning of my engineering career, I worked in a place where I had a lot of freedom to implement and experiment with any technology I found interesting. I tried many technologies like PHP, Java, EJBs, SOAP, REST and JavaScript. This gave me a lot of perspective, but I lacked guidance and mentoring from more experienced developers. One of the most problematic things I built was a login. In this post I would like to share what I did back then, why it was problematic, and how I would build it today.
Earlier Mistakes
My first PHP login
My first login stored the authenticated user in PHP session variables. This is not a terrible option for a small project on a single host, but the biggest problem comes when we need to scale horizontally. Session variables exist only on the server where they were created. If you add more servers behind a load balancer, there is no guarantee that your requests go to the same server, so a user who logged in on one server looks logged out on another.
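To make the scaling problem concrete, here is a toy simulation in Python, assuming two app servers that each keep sessions in their own memory (similar in spirit to PHP's default file-based sessions, which live on the local disk of one host) and a naive round-robin load balancer. `AppServer` and the server names are hypothetical.

```python
import itertools
import secrets


class AppServer:
    """Hypothetical web server that keeps sessions only in its own memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_id -> username, visible to this server only

    def login(self, username):
        session_id = secrets.token_hex(16)
        self.sessions[session_id] = username
        return session_id  # in a real app this goes back to the browser in a cookie

    def whoami(self, session_id):
        return self.sessions.get(session_id)  # None means "not logged in" on this server


servers = [AppServer("a"), AppServer("b")]
balancer = itertools.cycle(servers)  # naive round-robin load balancer

sid = next(balancer).login("alice")  # request 1 lands on server "a"
on_b = next(balancer).whoami(sid)    # request 2 lands on server "b": None
on_a = next(balancer).whoami(sid)    # request 3 is back on server "a": "alice"
```

The usual workarounds are sticky sessions at the load balancer or moving sessions into a shared store such as a database or cache, which is part of why stateless, signed tokens became attractive.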
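In terms of vulnerabilities, if somebody manages to read your session ID, they can impersonate you and act on your behalf. A related attack, cross-site request forgery (CSRF), doesn't even need to read it: a malicious site makes your browser send a request, and the browser attaches your session cookie automatically. JWTs stored in cookies are exposed in the same way, so it is important to set up SameSite cookies or CSRF tokens. Did I do that for my first login? Of course NOT!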
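A CSRF token works because a malicious site can make the browser send your cookie, but it cannot read a random value embedded in your own pages. Below is a minimal Python sketch of the synchronizer-token pattern, assuming a hypothetical in-memory `sessions` dict keyed by session ID; the function names are hypothetical too.

```python
import hmac
import secrets

# Hypothetical server-side session storage: session_id -> dict of session values.
sessions = {}


def issue_csrf_token(session_id):
    """Create a random token, remember it in the session, and return it for a hidden form field."""
    token = secrets.token_urlsafe(32)
    sessions.setdefault(session_id, {})["csrf_token"] = token
    return token


def is_valid_csrf(session_id, submitted_token):
    """Accept a state-changing request only if it carries the token stored in the session."""
    expected = sessions.get(session_id, {}).get("csrf_token")
    if expected is None or submitted_token is None:
        return False
    return hmac.compare_digest(expected, submitted_token)  # constant-time comparison
```

The server would call `issue_csrf_token` when rendering a form and `is_valid_csrf` on every POST. A forged cross-site request carries the session cookie but not the token, so it is rejected.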
My first password storage
If you are thinking of implementing a login, please NEVER do what I am about to describe:
The first time I implemented password handling, I was worried that somebody would find out the "actual" password. I didn't even think about the fact that we were using HTTP instead of HTTPS, so my take on this was to "encrypt" the password into MD5 before sending it. The password was then saved as MD5, but I was NOT doing anything different than sending the password as is. Let me explain the problems with this approach:
- Over HTTP, the MD5 hash can be read by anybody in the middle, who can simply replay the request with the same MD5. The hash effectively became the password.
- MD5 was NOT encryption, it is a hash function: there is no key and nothing to decrypt.
- There are databases all over the internet mapping MD5 (and other unsalted) hashes back to known passwords, so a leaked MD5 can often be translated back to the actual password.
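A small Python illustration of why unsalted MD5 fails on both counts: the same password always produces the same hash, so a precomputed table reverses it, and a hash sent by the client is just a replayable credential. The tiny `common_passwords` list and the `server_login` helper are hypothetical stand-ins for real lookup databases and for my old login code.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Unsalted MD5 is deterministic: two users with "password" get identical stored values.
alice_hash = md5_hex("password")
bob_hash = md5_hex("password")

# Toy version of the online lookup databases: hash common passwords once, reuse forever.
common_passwords = ["123456", "password", "qwerty", "letmein"]
lookup_table = {md5_hex(p): p for p in common_passwords}
recovered = lookup_table.get(alice_hash)  # "password"


def server_login(stored_hash, submitted_hash):
    # The client already sent the MD5, so the server can only compare it with the stored
    # value: whoever sniffs the hash over HTTP can replay it and get in.
    return submitted_hash == stored_hash
```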
My second password storage
Then I understood that whenever I send a password, I need to make sure the data actually travels encrypted, so I learned about HTTPS and we started using it on our projects. But I was still storing passwords as MD5. If we had a data breach, those MD5 hashes could potentially be mapped back to real passwords.
My first API authentication/authorization
I invented my own approach to authenticate users. I thought I was being brilliant, but at the time I was simply under the Dunning-Kruger effect. The flow was:
- The user entered a username and password.
- The server updated the database with a random token and an expiration date, and provided that token to the client.
- Each authenticated request included the token, and on every request the server:
  - searched the database for that token and fetched the user and expiration date (first red flag),
  - searched the database for the user's roles and permissions (second red flag),
  - authorized the user action based on the permissions it had just read.
Given that the software had a really small number of users, this wasn't a problem, but of course this approach doesn't scale well: every single request costs at least two database reads before any real work happens.
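Roughly what that homemade flow looked like, rewritten as a Python sketch with an in-memory SQLite database. The schema, the 8-hour expiry and the `login`/`authorize` names are assumptions for illustration, and the password check is left out; the point is the two queries that run on every request.

```python
import secrets
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema: an in-memory SQLite database standing in for the real one.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, token TEXT, token_expires TEXT);
CREATE TABLE permissions (user_id INTEGER, permission TEXT);
INSERT INTO users (id, username) VALUES (1, 'alice');
INSERT INTO permissions VALUES (1, 'orders:read');
""")


def login(username):
    """Store a random token and expiry for an already-authenticated user (password check omitted)."""
    token = secrets.token_hex(16)
    expires = (datetime.now(timezone.utc) + timedelta(hours=8)).isoformat()
    db.execute("UPDATE users SET token = ?, token_expires = ? WHERE username = ?",
               (token, expires, username))
    return token


def authorize(token, permission):
    """Runs on EVERY request."""
    # First red flag: a database read just to find out who is calling.
    row = db.execute("SELECT id, token_expires FROM users WHERE token = ?", (token,)).fetchone()
    if row is None or datetime.fromisoformat(row[1]) < datetime.now(timezone.utc):
        return False
    # Second red flag: another database read to load that user's permissions.
    rows = db.execute("SELECT permission FROM permissions WHERE user_id = ?", (row[0],))
    return permission in {p for (p,) in rows}
```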
What would I do differently today?
First of all, I wouldn't try to implement this myself. The wisest thing in security is not reinventing the wheel: many people have already worked on these problems and faced their bugs and vulnerabilities, and verifying who is requesting something is a really common requirement in almost every system. But if I had no choice and had to implement it myself, this is what I would do:
BCrypt hashing for password storage
BCrypt is a password hashing algorithm that uses a configurable cost factor and a random salt. The salt is stored together with the hash so it can be used again to check the password during login. This approach solves two issues:
- It makes each password guess expensive: an attacker cannot quickly test many different combinations. When hardware gets better, we can increase the cost factor, and a change-password policy makes sure existing hashes get recomputed with the new cost.
- The random salt makes each hash unpredictable, which solves the MD5 lookup-table issue: users with the same password will not have the same hash.
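bcrypt is not in Python's standard library, so this sketch swaps in PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`), another salted password hash with a tunable cost, to show the same two ideas. The `pbkdf2_sha256$cost$salt$hash` storage format is hypothetical, loosely modeled on how bcrypt strings carry their cost and salt, and the iteration count is an assumed value to tune on your own hardware.

```python
import hashlib
import hmac
import secrets

# Assumed cost factor for PBKDF2 (stand-in for bcrypt's rounds): raise it as hardware improves.
ITERATIONS = 600_000


def hash_password(password, iterations=ITERATIONS):
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Hypothetical format: keep algorithm, cost, salt and hash together, like a bcrypt string.
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))


def needs_rehash(stored, iterations=ITERATIONS):
    # True when the stored hash was made with a lower cost than the current policy.
    return int(stored.split("$")[1]) < iterations
```

With real bcrypt, the `bcrypt` package offers `bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))` and `bcrypt.checkpw(password, hashed)` (both on bytes), and the cost and salt are embedded in the returned string.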
BONUS: Add a secret called a "pepper" for extra protection. The pepper is a secret value kept outside the database and mixed into every hash, so if somebody gets unauthorized access to the database, they still need the pepper to reproduce the hashes.
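A sketch of one way to add a pepper, assuming it is loaded from outside the database (for example an environment variable or a secrets manager) and passed in as bytes. The password is first run through HMAC-SHA256 keyed with the pepper, then through a salted, slow hash; PBKDF2 from the standard library stands in for bcrypt, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def pepper_password(password, pepper):
    # Keyed HMAC: without the pepper (stored outside the database) nobody can reproduce this step.
    return hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()


def hash_with_pepper(password, pepper, iterations=600_000):
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", pepper_password(password, pepper), salt, iterations)
    return salt, digest  # both go to the database; the pepper does not


def verify_with_pepper(password, pepper, salt, digest, iterations=600_000):
    candidate = hashlib.pbkdf2_hmac("sha256", pepper_password(password, pepper), salt, iterations)
    return hmac.compare_digest(candidate, digest)
```

A stolen database dump then contains salts and hashes, but guessing passwords offline also requires the pepper.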
JWT Tokens
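A JWT (JSON Web Token) is a signed token issued by an authentication service. Once credentials are validated at login, the service creates a payload with information like user_id, roles and scopes, and signs it; the header, payload and signature are encoded with Base64URL and joined into one compact string. Other services can read the token, verify that the signature is valid, and use the information in the payload to authorize actions. Keep in mind that Base64URL is an encoding, not encryption: anyone holding the token can read the payload, so it should not contain secrets.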
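A minimal sketch of issuing and verifying an HS256 JWT (HMAC-SHA256 with a shared secret) using only Python's standard library, to show the header.payload.signature structure. The claim names `sub`, `roles` and `exp` are assumptions (`sub` and `exp` are registered claims in RFC 7519, `roles` is a custom one), as are the function names. In a real service I would use a maintained library such as PyJWT instead of hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_encode(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))  # restore stripped padding


def sign_jwt(payload, secret):
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (b64url_encode(json.dumps(header, separators=(",", ":")).encode())
                     + "." + b64url_encode(json.dumps(payload, separators=(",", ":")).encode()))
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def verify_jwt(token, secret):
    """Return the payload if the signature and expiry check out, otherwise None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        signature = b64url_decode(signature_b64)
    except ValueError:  # wrong number of parts, bad Base64 or bad JSON
        return None
    # Pin the algorithm instead of trusting whatever the token claims.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None
    payload = json.loads(b64url_decode(payload_b64))  # safe to parse: signed with our secret
    if payload.get("exp", 0) <= time.time():  # tokens without exp are treated as expired
        return None
    return payload
```

Pinning the algorithm blocks the classic `alg: none` trick. With HS256 every verifying service must hold the secret; asymmetric algorithms such as RS256 or ES256 let resource services verify with a public key while only the authentication service can sign.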
Compared to my first approach, this scales better because:
- It does not require a database read on each request, which makes verification faster.
- Requests to resource services are mostly stateless: instead of relying on a session variable stored on one server, we present a signed token saying who we are, and the service validates the signature. It is like showing an ID issued by the DMV: it certifies who you are, and whoever checks it can verify that it is not a fake.
One common JWT setup uses two different tokens:
- ACCESS TOKEN: the token that actually grants access. It is short-lived (5-15 min), so if somebody obtains it, the blast radius is reduced.
- REFRESH TOKEN: longer-lived (days or weeks), used to obtain new access tokens. Because the authentication service checks it every time it is used, it can be revoked.
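A sketch of the server side of refresh tokens, assuming they are opaque random strings kept in a store (here a hypothetical in-memory `RefreshTokenStore`; a real one would be a database table). Only a SHA-256 of each token is stored, so a leaked table cannot be replayed, and, as an additional design choice, each refresh rotates the token so the old one stops working. The 14-day lifetime is an assumption.

```python
import hashlib
import secrets
import time

REFRESH_TTL = 14 * 24 * 3600  # assumed refresh-token lifetime: 14 days, in seconds


class RefreshTokenStore:
    """Hypothetical store: refresh tokens are opaque, revocable, and only their hashes are kept."""

    def __init__(self):
        self._tokens = {}  # sha256(token) -> (user_id, expires_at)

    @staticmethod
    def _key(token):
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, user_id, now=None):
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._tokens[self._key(token)] = (user_id, now + REFRESH_TTL)
        return token

    def rotate(self, token, now=None):
        """Exchange a valid refresh token for (user_id, new_token); the old token is consumed."""
        now = time.time() if now is None else now
        entry = self._tokens.pop(self._key(token), None)
        if entry is None or entry[1] <= now:
            return None
        user_id = entry[0]
        return user_id, self.issue(user_id, now)

    def revoke_user(self, user_id):
        """Log a user out everywhere by dropping all of their refresh tokens."""
        self._tokens = {k: v for k, v in self._tokens.items() if v[0] != user_id}
```

A refresh endpoint would call `rotate`, mint a new short-lived access token for the returned user ID, and send both back; logging a user out on every device is just `revoke_user`.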
Finally, I would add some protections against XSS and CSRF attacks, taking into account that:
- Local storage can be a really bad place to store tokens. If some script is injected (XSS), JavaScript can read your tokens and send them to the attacker.
- Cookies need to be configured with SameSite so browsers don't send them automatically on requests coming from other sites, and with HttpOnly so injected scripts cannot read them.
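A sketch of a cookie configured along those lines, using Python's standard `http.cookies` module to build the `Set-Cookie` header value. The cookie name `refresh_token` and the `/auth/refresh` path are hypothetical.

```python
from http.cookies import SimpleCookie


def refresh_cookie_header(refresh_token, max_age_seconds):
    cookie = SimpleCookie()
    cookie["refresh_token"] = refresh_token
    morsel = cookie["refresh_token"]
    morsel["httponly"] = True         # not readable from JavaScript, so XSS cannot exfiltrate it
    morsel["secure"] = True           # only sent over HTTPS
    morsel["samesite"] = "Strict"     # not attached to cross-site requests (CSRF)
    morsel["path"] = "/auth/refresh"  # hypothetical endpoint: only send it where it is needed
    morsel["max-age"] = max_age_seconds
    return morsel.OutputString()      # value for the Set-Cookie response header
```

`SameSite=Strict` never sends the cookie on cross-site requests; `Lax` still sends it on top-level navigations and is a common compromise when Strict breaks flows that start from links on other sites.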
Conclusion
The biggest lesson I learned is that security isn't about implementing something clever. Most of my mistakes came from fighting a battle many other people had already fought instead of learning their well-proven solutions. Today, if I need authentication, I would ask these questions before I start implementing it:
- Has somebody already solved this problem?
- What is the state of the art to solve this particular problem?
- What benefit am I getting from implementing it myself?