
How to Transition AWS WAF from COUNT Mode to BLOCK Mode

In this article, I will share my experience after a colleague asked me, “I've been running WAF in COUNT mode for a while, but how do I determine which rules should be switched to BLOCK mode?”

Although I initially thought it would be straightforward, I couldn't easily explain it, so I created a procedure. This article outlines how to search logs in Amazon CloudWatch Logs, analyze them, and determine when to switch to BLOCK mode. Even if you're using a different WAF, the same logic should apply when transitioning to BLOCK mode.

Since this article assumes prior knowledge, I won’t explain basic concepts like “What is WAF?” or the specifications of AWS WAF’s managed rules.

COUNT Mode vs BLOCK Mode

WAF typically has two modes: COUNT and BLOCK. Although there is also an ALLOW mode, it’s less commonly used as a default setting.

When first providing a service, you use COUNT mode because starting with BLOCK mode might accidentally block legitimate requests. After running in COUNT mode, you assess the logs to decide whether to switch to BLOCK mode.

This article aims to share the know-how of analyzing logs. The general idea is that the rule can be switched to BLOCK if legitimate requests aren't recorded under COUNT mode.

In practice, the decision of whether or not to block also depends on the service you provide.
For example, it varies with the following factors.

  • Does the service require authentication?
  • Is it offered internationally?
  • Does it provide APIs that third parties can call?
  • Is it B2C or B2B?

This article focuses on a general SaaS service and summarizes what you should look for in that case.

Also, you need to pay attention to the logs output by AWS WAF, so let's examine those first.

Viewing AWS WAF Logs

Here, we will look at the logs using CloudWatch Logs Insights, but only for the first step: downloading the COUNT logs.

AWS WAF logs are complex, and the detection details are stored in arrays, so I find it extremely difficult to analyze them directly with a query. Depending on the number of logs, trial and error can also end up costing a lot of money. Furthermore, I could not find a way to write a regular expression in a query that matches the header names User-Agent and user-agent at the same time, so I chose jq to analyze the downloaded logs.

Of course, there are cases where it is not possible to download the logs for governance reasons. In that case, I personally think it is better to output the logs to Amazon S3 and use Amazon Athena.

First of all, we need to understand the structure of the log, so let's take a look. It looks like this.

{

    "timestamp":1533689070589,                            
    "formatVersion":1,                                   
    "webaclId":"385cb038-3a6f-4f2f-ac64-09ab912af590",  
    "terminatingRuleId":"Default_Action",                
    "terminatingRuleType":"REGULAR",                     
    "action":"ALLOW",                                    
    "httpSourceName":"CF",                               
    "httpSourceId":"i-123",                             
    "ruleGroupList":[                                    
     {  
        "ruleGroupId":"41f4eb08-4e1b-2985-92b5-e8abf434fad3",
        "terminatingRule":null,    
        "nonTerminatingMatchingRules":[                  
            {"action" : "COUNT", "ruleId" : "4659b169-2083-4a91-bbd4-08851a9aaf74"}       
        ],
        "excludedRules":[
            {"exclusionType" : "EXCLUDED_AS_COUNT", "ruleId" : "5432a230-0113-5b83-bbb2-89375c5bfa98"}
        ]                          
     }
    ],
    "rateBasedRuleList":[                                 
     {  
        "rateBasedRuleId":"7c968ef6-32ec-4fee-96cc-51198e412e7f",   
        "limitKey":"IP",
        "maxRateAllowed":100                                                                                           
     },
     {  
        "rateBasedRuleId":"462b169-2083-4a93-bbd4-08851a9aaf30",
        "limitKey":"IP",
        "maxRateAllowed":100
     }
    ],      
    "nonTerminatingMatchingRules":[                                
        {"action" : "COUNT",  "ruleId" : "4659b181-2011-4a91-bbd4-08851a9aaf52"}    
    ],                                  
    "httpRequest":{                                                             
        "clientIp":"192.10.23.23",                                           
        "country":"US",                                                         
        "headers":[                                                                 
            {  
                "name":"Host",
                "value":"127.0.0.1:1989"
             },
             {  
                "name":"User-Agent",
                "value":"curl/7.51.2"
             },
             {  
                 "name":"Accept",
                 "value":"*/*"
             }
        ],
        "uri":"REDACTED",                                                
        "args":"usernam=abc",                                         
        "httpVersion":"HTTP/1.1",
        "httpMethod":"GET",
        "requestId":"cloud front Request id"                    
    }
}

Reference: https://docs.aws.amazon.com/waf/latest/developerguide/classic-logging.html

In the case of AWS WAF, the field to look at for COUNT mode is nonTerminatingMatchingRules.
A rule set to ALLOW or BLOCK ends processing as soon as it matches, and that match is recorded in the terminatingRuleId field (and in terminatingRule inside ruleGroupList). COUNT, on the other hand, is a non-terminating action: AWS WAF records the match and keeps evaluating the remaining rules until the end. Therefore, you need to look at nonTerminatingMatchingRules.
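
As a minimal sketch of that distinction, the following Python function collects the rule IDs that matched with COUNT in a single log record. It assumes the record has already been parsed into a dict that uses the field names from the sample log above; the function name and the rule IDs in `sample` are hypothetical placeholders.

```python
from typing import Any, Dict, List


def count_matches(record: Dict[str, Any]) -> List[str]:
    """Return the IDs of rules that matched with action COUNT in one record."""
    matched: List[str] = []
    # Rules evaluated directly in the web ACL.
    for rule in record.get("nonTerminatingMatchingRules") or []:
        if rule.get("action") == "COUNT":
            matched.append(rule["ruleId"])
    # Rules evaluated inside a rule group carry their own list.
    for group in record.get("ruleGroupList") or []:
        for rule in group.get("nonTerminatingMatchingRules") or []:
            if rule.get("action") == "COUNT":
                matched.append(rule["ruleId"])
    return matched


# Hypothetical record: allowed by the default action, with two COUNT matches.
sample = {
    "terminatingRuleId": "Default_Action",
    "action": "ALLOW",
    "nonTerminatingMatchingRules": [{"action": "COUNT", "ruleId": "rule-top"}],
    "ruleGroupList": [
        {
            "terminatingRule": None,
            "nonTerminatingMatchingRules": [
                {"action": "COUNT", "ruleId": "rule-in-group"}
            ],
        }
    ],
}

print(count_matches(sample))
```

Note that the request in `sample` was ultimately allowed, yet the COUNT matches are still recorded. The jq commands later in this article read only the top-level list; the rule group loop is included here because the sample log records matches there as well.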

Analyzing Logs

You can extract the COUNT log by executing the following query.

fields @timestamp, @message 
| filter @message like /"action":"COUNT"/ 
| sort @timestamp desc 
| display @timestamp, @message, httpRequest.clientIp, httpRequest.uri, httpRequest.country

The results can be downloaded in JSON format. I recommend analyzing at least one month of logs, but be aware that CloudWatch Logs Insights can only retrieve up to 10,000 items per query, so you may need to narrow the time range and run the query several times.
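
One possible way to stay under that limit is to split the period into smaller windows and run the query once per window. The sketch below only computes the windows; the function name, the one-day step, and the dates are assumptions for illustration, and a window that still returns 10,000 rows would need to be split further.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def query_windows(start: datetime, end: datetime,
                  step: timedelta = timedelta(days=1)) -> List[Tuple[int, int]]:
    """Split [start, end) into consecutive windows of at most `step`.

    Each tuple holds (start, end) as Unix timestamps in seconds.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        windows.append((int(cursor.timestamp()), int(window_end.timestamp())))
        cursor = window_end
    return windows


# Hypothetical one-month period.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 2, 1, tzinfo=timezone.utc)
windows = query_windows(start, end)
print(len(windows), windows[0])
```

Each pair can then be used as the time range of one query, and the downloaded files can be analyzed together.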

Once you've downloaded it, you can use jq to analyze it.

Determining if the Request is Legitimate

Now that we've come this far, we'll finally start to judge whether or not the request is legitimate, but we'll continue on the assumption that the SaaS has authentication (login).

First, run the following query. As previously mentioned, check the ruleId within the nonTerminatingMatchingRules.

jq -r '.[] | .["@message"].nonTerminatingMatchingRules[] | .ruleId' logs-insights-results.json | sort | uniq -c | sort -nr 

10 AWSManagedRulesAnonymousIpList 
5   AWSManagedRulesCommonRuleSet 
5   AWSManagedRulesBotControlRuleSet 
1   AWSManagedRulesLinuxRuleSet 
1   AWSManagedRulesKnownBadInputsRuleSet

Check these results. It is okay to set the rules that do not appear here to BLOCK: if a rule is never recorded during normal use, it will only be recorded when an attack is received.
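
If you prefer a script to the shell pipeline, the same count can be sketched in Python, together with the list of configured rules that never matched. This assumes the downloaded file is a list of rows with an `@message` entry, as the jq command above implies; whether `@message` arrives as an object or as a JSON string is an assumption the code handles both ways. The `configured` list and the sample rows are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable, List


def parse_message(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return the WAF record of one result row as a dict."""
    message = row["@message"]
    # Assumption: the export may hold the record as a JSON string or an object.
    return json.loads(message) if isinstance(message, str) else message


def count_rules(rows: Iterable[Dict[str, Any]]) -> Counter:
    """Count how often each rule appears in nonTerminatingMatchingRules."""
    counts: Counter = Counter()
    for row in rows:
        record = parse_message(row)
        for rule in record.get("nonTerminatingMatchingRules") or []:
            counts[rule["ruleId"]] += 1
    return counts


def never_matched(configured: Iterable[str], counts: Counter) -> List[str]:
    """Return configured rules that do not appear in the counts at all."""
    return sorted(set(configured) - set(counts))


# Hypothetical rows; in practice, load them with json.load() from the export.
rows = [
    {"@message": {"nonTerminatingMatchingRules": [
        {"action": "COUNT", "ruleId": "AWSManagedRulesAnonymousIpList"}]}},
    {"@message": json.dumps({"nonTerminatingMatchingRules": [
        {"action": "COUNT", "ruleId": "AWSManagedRulesAnonymousIpList"},
        {"action": "COUNT", "ruleId": "AWSManagedRulesCommonRuleSet"}]})},
]
# Hypothetical list of the rules attached to the web ACL.
configured = [
    "AWSManagedRulesAnonymousIpList",
    "AWSManagedRulesCommonRuleSet",
    "AWSManagedRulesSQLiRuleSet",
]

counts = count_rules(rows)
print(counts.most_common())
print(never_matched(configured, counts))
```

The second list is the set of rules that, following the reasoning above, are candidates for BLOCK because they never matched during the observation period.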

Next, we will look at the combination of rules and URIs.

jq -r '.[] | .["@message"] as $msg | $msg.nonTerminatingMatchingRules[] | "\(.ruleId) \($msg.httpRequest.uri)"' logs-insights-results.json | sort

AWSManagedRulesAnonymousIpList /
AWSManagedRulesAnonymousIpList /
AWSManagedRulesAnonymousIpList /.env
AWSManagedRulesAnonymousIpList /auth/favicon.ico
AWSManagedRulesAnonymousIpList /auth/login
AWSManagedRulesAnonymousIpList /auth/login
AWSManagedRulesAnonymousIpList /auth/login
AWSManagedRulesAnonymousIpList /auth/login
AWSManagedRulesAnonymousIpList /wp-login.php
AWSManagedRulesAnonymousIpList /wp-login.php
AWSManagedRulesBotControlRuleSet /
AWSManagedRulesBotControlRuleSet /
AWSManagedRulesBotControlRuleSet /auth/favicon.ico
AWSManagedRulesBotControlRuleSet /auth/login
AWSManagedRulesBotControlRuleSet /auth/login
AWSManagedRulesCommonRuleSet /
AWSManagedRulesCommonRuleSet /news/12345
AWSManagedRulesCommonRuleSet /news/files
AWSManagedRulesCommonRuleSet /auth/login
AWSManagedRulesCommonRuleSet /news/sources/03a0
AWSManagedRulesKnownBadInputsRuleSet /.env
AWSManagedRulesLinuxRuleSet /.env

A rule can be switched to BLOCK in the following cases.

  • The rule only matches URIs that are not used in the service
    • In the above example, this applies to the following
      • AWSManagedRulesKnownBadInputsRuleSet
      • AWSManagedRulesLinuxRuleSet
  • IP address and bot-related rules that never match requests made after authentication
    • If a rule only matches before authentication, the clients it catches are not legitimate users
      • It means that they tried to log in but never managed to use the service
      • If you don't want to allow bot access in the first place, just BLOCK them
    • In the above example, this applies to the following
      • AWSManagedRulesAnonymousIpList
      • AWSManagedRulesBotControlRuleSet
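
The two criteria above can be sketched as a small classifier over the (rule, URI) pairs. This is only an illustration under stated assumptions: the URI sets below are hypothetical, and in particular treating `/news/` as reachable only after login is my assumption about the example service, so replace both constants with your own layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical URI layout for illustration; replace with your service's own.
PRE_LOGIN_URIS = {"/", "/auth/login", "/auth/favicon.ico"}
POST_LOGIN_PREFIXES = ("/news/",)


def classify_rules(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Label each rule from the (ruleId, uri) pairs it matched."""
    uris_by_rule: Dict[str, Set[str]] = defaultdict(set)
    for rule_id, uri in pairs:
        uris_by_rule[rule_id].add(uri)

    labels = {}
    for rule_id, uris in uris_by_rule.items():
        if any(uri.startswith(POST_LOGIN_PREFIXES) for uri in uris):
            # Matched traffic after authentication: possibly legitimate users.
            labels[rule_id] = "keep COUNT"
        elif uris & PRE_LOGIN_URIS:
            # Matched real URIs, but never one behind the login.
            labels[rule_id] = "BLOCK candidate: pre-login only"
        else:
            # Matched only URIs the service does not serve.
            labels[rule_id] = "BLOCK candidate: unused URIs only"
    return labels


# A subset of the pairs from the output above.
pairs = [
    ("AWSManagedRulesAnonymousIpList", "/auth/login"),
    ("AWSManagedRulesAnonymousIpList", "/wp-login.php"),
    ("AWSManagedRulesBotControlRuleSet", "/"),
    ("AWSManagedRulesCommonRuleSet", "/news/12345"),
    ("AWSManagedRulesCommonRuleSet", "/auth/login"),
    ("AWSManagedRulesKnownBadInputsRuleSet", "/.env"),
    ("AWSManagedRulesLinuxRuleSet", "/.env"),
]
labels = classify_rules(pairs)
for rule_id, label in sorted(labels.items()):
    print(rule_id, "->", label)
```

The labels are a starting point for review, not a decision: a rule marked as a candidate still deserves a look at the individual requests before it is switched.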

If you are not confident about your decision here, outputting the User-Agent may help. You can output it as follows (I will omit the results).

jq -r '.[] | .["@message"] as $msg | $msg.nonTerminatingMatchingRules[] | "\(.ruleId) \($msg.httpRequest.uri) \($msg.httpRequest.headers[] | select(.name | test("(?i)user-agent")) | .value)"' logs-insights-results.json | sort

Looking at this output, you can spot User-Agent values that clearly do not belong to your own users, which is further evidence that a request is not legitimate.

Basically, the rules to BLOCK can be determined from this investigation. For the rest, I recommend continuing to operate in COUNT mode in case anything happens. The reason for stopping here is that there are not many cases where it is safe to BLOCK a rule that matches requests made after login. If such a rule really does need to BLOCK, it means that the site has already been attacked, so there is a possibility that incident response will begin.
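
For reference, the case-insensitive User-Agent lookup can also be sketched in Python. It assumes each record is a dict shaped like the sample log, with headers stored as a list of name/value pairs; the function names and the second sample record are hypothetical. Unlike the jq command, which tests whether the header name contains "user-agent", this version compares the whole name.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional


def user_agent(http_request: Dict[str, Any]) -> Optional[str]:
    """Return the User-Agent value, matching the header name in any case."""
    for header in http_request.get("headers") or []:
        if header.get("name", "").lower() == "user-agent":
            return header.get("value")
    return None


def agents_by_rule(records: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group the distinct User-Agent values seen for each COUNT-matched rule."""
    agents = defaultdict(set)
    for record in records:
        agent = user_agent(record.get("httpRequest") or {}) or "(none)"
        for rule in record.get("nonTerminatingMatchingRules") or []:
            agents[rule["ruleId"]].add(agent)
    return {rule_id: sorted(values) for rule_id, values in agents.items()}


# Hypothetical records with differently cased header names.
records = [
    {
        "nonTerminatingMatchingRules": [
            {"action": "COUNT", "ruleId": "AWSManagedRulesBotControlRuleSet"}],
        "httpRequest": {"headers": [
            {"name": "User-Agent", "value": "curl/7.51.2"}]},
    },
    {
        "nonTerminatingMatchingRules": [
            {"action": "COUNT", "ruleId": "AWSManagedRulesBotControlRuleSet"}],
        "httpRequest": {"headers": [
            {"name": "user-agent", "value": "example-scanner/1.0"}]},
    },
]

result = agents_by_rule(records)
print(result)
```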

Notes on detecting bots

AWS WAF has a managed rule group called “AWSManagedRulesBotControlRuleSet”. As the name suggests, it is for detecting bot access.
If you set it to BLOCK because “our service does not allow bot access”, there may be unexpected pitfalls.

First, please keep your E2E tests in mind. The E2E tests may be detected as bot access, causing the tests, and then the deployment, to fail. Of course, you will notice this before going live, so it won't be a big problem, but a deployment that suddenly fails for no obvious reason is hard to diagnose, so I think it is good to be aware of this.

Second, if you enable Bot Control on a service that anyone can access, such as a B2C service, you may receive many attack scans. If you are outputting logs, this can produce a large volume of output and cost you money. Please take these points into account when considering whether to enable Bot Control.

Conclusion

This article summarized how to determine which rules can be safely switched to BLOCK mode based on AWS WAF logs. While it may seem straightforward, figuring out how to search logs required some trial and error. Understanding the structure of logs is key.

I hope this article helps, and I’d appreciate any feedback or suggestions. Thank you for reading.
