Behind the scenes of our security incident management process

On the security team, we don’t manage any Atlassian products like other Atlassian teams do. Our main product is trust, and that’s a job that’s never finished.

To me, security is more of a mindset: one of constant diligence, continuous improvement, and seeking out ways to innovate.

Sometimes security teams can act like more of a blocker than a facilitator, holding up products with burdensome gate checks. But my team strives to assist in the creation and management of secure software, making sure that teams can collaborate and get their work done in the safest way possible. When great products get shipped in a safe and secure way, we know we’ve done our job. To that end, there are three principles that guide our work and inform our action plans and responses to security incidents:

Our guiding principles

1. Be open and available

No one benefits from a security team that works in the shadows or doesn’t share information. We know that if we want to be involved in the development of safe software we have to be approachable and available to everyone: engineering, support, partners, and customers. And we encourage anyone with a concern to report it to us, no matter the severity of the issue or their role.

2. Be consistent in word and action

Being available to people wouldn’t mean much if we weren’t also consistent in how we approached our work. In order to preserve and enforce policies and procedures, we all need to be on the same page and act in predictable ways. We document how we work and then we share this information across our internal collaboration tools and with our broader customer and partner community. We also publish security stats, data, policies, and procedures on our public trust website.

3. Always seek better ways

Consistency doesn’t benefit anyone if we’re consistently, well, wrong. So we’re always striving to improve our monitoring, our tooling, our procedures, and to keep ahead of potential risks. We have a team whose main purpose is to study vulnerabilities in the wild and then come up with ways to fortify our systems against them so we’re not losing ground by constantly being in reaction mode.

Meet our team

Security is built into all of our products and processes, which are shared across the company and the community. Three main groups at Atlassian work actively on preventing and responding to security incidents:

Security Engineering. These are the people that review code and look into the security of our products, making sure that they uncover vulnerabilities and address them in a timely fashion, preferably before they ship! We want all of our products to be secure from the ground up. Often members of a product’s security engineering team are involved in the response to a security incident so they have the context they need when it comes to remediating the vulnerability.

Security Intelligence. This team actively looks for suspicious things happening, or things that could happen, on the entire network. They respond to reports from customers and partners, as well as the internal teams they work with. This is the group that actively protects our products and systems against vulnerabilities and directly responds to incidents.

Policies and Trust. This team is responsible for communicating our security rules and policies, and they publish to the trust website I mentioned above. The information they publish is meant to be useful to the broader security and development community, not just Atlassian customers. Again, referring back to our three guiding principles, we want to make information available to everyone and break down the barriers that often surround security discussion.

How we build our stack

People often ask us what we’re using in our own stack, especially as it pertains to security. We use a mix of our own products and integrate them with a few other best-of-breed tools. Being able to share information across the organization, and with partners and customers, is our first priority, so we make sure that our stack is set up for this purpose.

Here’s a list of products we use and how we use them:

Comprehensive detection and analysis: We use Splunk to query our systems, applying heuristic analysis and anomaly detection based on policies our security intelligence team writes. When it comes to risk we want to cover all of our bases, so we write policies based on both historical and theoretical incidents.

Agile communications loop: If we find something that just requires a quick fix we take care of it in the moment and then log what happened. But for something more involved we first record the incident in Jira, and then also in Slack, for broader communication.

Connected conversations: Sometimes alerts come in from customers or partners via Jira Service Desk and then go to Opsgenie to alert the right people. These alerts are also sent to Jira, and then Slack, so transparency is broad and everyone has visibility.

Knowledge capture and transfer: A lot of our playbooks are stored in Confluence, and if we need to use any of them as a guide to a response we’ll reference them in the Jira ticket. From there, if a conversation also takes place in email, that information gets logged in Jira, too. And if the incident gets updated in Jira we’ll see that in Slack. We’ve set things up so people can give input via the tool that makes the most sense for them, and we don’t have to worry about anyone being left in a communication silo. Everyone can have access to everything.
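The flow described in this list can be pictured as a small routing table. The sketch below is only an illustration: the source labels, the hop names, and the fallback route for unknown sources are assumptions made for the example, and it calls no real Splunk, Jira, Opsgenie, or Slack APIs.

```python
from dataclasses import dataclass, field

# Hypothetical route table. Each hop name mirrors a tool mentioned above,
# but nothing here talks to the real products.
ROUTES = {
    "customer_report": ["jira_service_desk", "opsgenie", "jira", "slack"],
    "internal_detection": ["jira", "slack"],
    "email_thread": ["jira", "slack"],
}

# Assumption for this sketch: anything unrecognized is at least logged in Jira.
DEFAULT_ROUTE = ["jira"]


@dataclass
class Alert:
    source: str
    summary: str
    trail: list = field(default_factory=list)


def route(alert: Alert) -> Alert:
    """Record, in order, every tool the alert passes through."""
    for hop in ROUTES.get(alert.source, DEFAULT_ROUTE):
        alert.trail.append(hop)
    return alert


alert = route(Alert("customer_report", "Suspicious login page"))
print(" -> ".join(alert.trail))
# prints "jira_service_desk -> opsgenie -> jira -> slack"
```

Keeping the routes in one table makes it easy to see that every path ends up somewhere broadly visible, which is the point of avoiding communication silos.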

How we respond

We respond to incidents in a predefined way that follows industry best practices, and the security intelligence team spends a lot of time detailing these processes and training people on how to follow them. We do this so we don’t react too quickly and make the wrong fix, but instead take the time to investigate an incident and make sure we agree on the approach to remediation.

Detect and analyze. As I said before, this is something that the Security Intelligence team focuses a lot of our time on. We write queries, look for certain vulnerable services, and measure the severity of any issues so we can determine what our response will be.
Investigate. Once an issue has been detected, and we’ve got an idea of the nature of the issue, we’ll conduct an investigation to determine the severity and urgency of the issue. For this, we use the security classification system from the VERIS community. This helps us make sure we’re using our resources effectively and not over- or under-reacting to incidents.
Contain and eradicate. Up until something is determined to be a threat, we don’t call it an incident. But once we make that determination, and we’ve classified it, we call on those planned responses that the Security Intelligence team spends so much time creating. We figure out who’s vulnerable, build the fix, and work with the product teams to make sure the patch is ready to go.
Communicate. Once we have the fix ready to deploy we work with marketing, and sometimes legal, to make sure our communications to customers are timely and clear, and that everyone understands how to install the code fixes. Much of this work is done in Confluence where we can review the draft and make comments and edits and ensure the announcement is rock solid.
Post-incident review. After we’re confident the problem is fixed and everyone has what they need, we do a post-incident review, called a PIR, and we track this in Jira. This is usually a collection of tasks we assign ourselves to take care of any actions that we need to take, like any tweaks to our response process or any people who may need some new training, and we assign deadlines to these tasks. We do this within the first week of the incident when everything is sharp in our minds. After deploying the fix, this is one of the best ways to make sure our products and systems are safe.
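One way to picture the five steps above is as an ordered set of stages, with a gate so that nothing moves into containment until it has been confirmed as a threat. This is a minimal sketch, not our actual tooling: the `Case` class, its field names, the example date, and the reading of "within the first week" as seven days from when the case was opened are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Stage(Enum):
    DETECT_AND_ANALYZE = 1
    INVESTIGATE = 2
    CONTAIN_AND_ERADICATE = 3
    COMMUNICATE = 4
    POST_INCIDENT_REVIEW = 5


@dataclass
class Case:
    summary: str
    opened: date
    stage: Stage = Stage.DETECT_AND_ANALYZE
    is_threat: Optional[bool] = None  # unknown until the investigation decides

    @property
    def is_incident(self) -> bool:
        # Only a confirmed threat is called an incident.
        return self.is_threat is True

    def advance(self) -> Stage:
        """Move to the next stage, refusing to skip the threat determination."""
        if self.stage is Stage.POST_INCIDENT_REVIEW:
            raise ValueError("case is already in its final stage")
        if self.stage is Stage.INVESTIGATE and not self.is_incident:
            raise ValueError("not confirmed as a threat; nothing to contain")
        self.stage = Stage(self.stage.value + 1)
        return self.stage

    def pir_due(self) -> date:
        # Assumption: "within the first week" means 7 days after opening.
        return self.opened + timedelta(days=7)


# Hypothetical example case and date.
case = Case("Vulnerable service exposed", opened=date(2024, 1, 8))
case.advance()          # now INVESTIGATE
case.is_threat = True   # the investigation confirms a threat
case.advance()          # now CONTAIN_AND_ERADICATE
print(case.stage.name, case.pir_due())
# prints "CONTAIN_AND_ERADICATE 2024-01-15"
```

The gate in `advance` reflects the idea that a case only becomes an incident, and only triggers the planned responses, after it has been investigated and classified.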

We’ve published more detailed information about how our team responds to incidents in the Trust section of our website.

As you can see, everyone at Atlassian takes security pretty seriously, and we devote a lot of effort to making sure we have dedicated teams that build safe software and maintain our systems to keep them as secure as possible. And we think that focusing on our three guiding principles (being open and available, being consistent in word and action, and always seeking better ways) is a great approach for building trust with our community.

The post Behind the scenes of our security incident management process appeared first on Atlassian Blog.