DEV Community


Amazon Cognito Observability Best Practices with Datadog

Amazon Cognito is a user authentication and authorization service that lets you add sign-up, sign-in, and access control to your web and mobile applications. Cognito handles user accounts, password recovery, multi-factor authentication, and more. It also supports federation with social identity providers such as Google, Facebook, and Apple. Finally, one of its most important features is the ability to scale to millions of users.

There Are Two Types of Cognito Pools

1. User Pools
User Pools handle user sign-up, sign-in, and authentication. They act as a user directory managed by AWS. User Pools provide features such as multi-factor authentication (MFA), password policies, and federation with identity providers such as Google and Apple, as well as with any SAML or OIDC provider.

The output of a User Pool is a set of tokens for authenticated users: an ID token and an access token (both JWTs), plus a refresh token.

2. Identity Pools (also known as Federated Identities)
Identity Pools provide temporary AWS credentials that allow authenticated users to access AWS resources directly. They can work with User Pools or other identity providers.

Identity Pools can federate identities from multiple sources into a single AWS identity. They use AWS Security Token Service (STS) to issue temporary AWS access keys based on assigned IAM roles.
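Because User Pool ID and access tokens are JWTs, their claims can be inspected when you correlate application requests with Cognito logs; for example, the sub claim holds the same value that Cognito logs record as userSub. The following is a minimal Python sketch for debugging only: it decodes the payload without verifying the signature, so it must never be used for authorization decisions (validate tokens against your user pool's JWKS with a proper JWT library for that). The helper name decode_jwt_claims is hypothetical.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature.

    Hypothetical debugging helper: never trust these claims for
    authorization; verify tokens against the user pool's JWKS instead.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload = parts[1]
    # JWTs use unpadded base64url, so restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


# Example (token obtained from your sign-in flow):
# claims = decode_jwt_claims(id_token)
# claims["sub"]        -> same value as userSub in Cognito logs
# claims["token_use"]  -> "id" for an ID token, "access" for an access token
```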

In this blog post, I will walk through observability in Amazon Cognito User Pools.

First things first: Cognito observability mainly relies on two types of telemetry data — metrics and logs. Let’s go through them in detail.

Amazon Cognito Metrics

Enable the Amazon Cognito integration in Datadog to collect and monitor the metrics that Cognito publishes to CloudWatch.

[Screenshot: Amazon Cognito Datadog integration]

The integration collects the following metrics:

1. Sign-In Metrics

Measure user authentication activity and throttling.

✅ Sign-in Success % → aws.cognito.sign_in_successes
📊 Sign-in Requests → aws.cognito.sign_in_successes.samplecount
🏆 Successful Sign-ins → aws.cognito.sign_in_successes.sum
🚫 Throttled Sign-ins → aws.cognito.sign_in_throttles

2. Sign-Up Metrics

Track new user registrations and throttling.

✅ Sign-up Success % → aws.cognito.sign_up_successes
📊 Sign-up Requests → aws.cognito.sign_up_successes.samplecount
🏆 Successful Sign-ups → aws.cognito.sign_up_successes.sum
🚫 Throttled Sign-ups → aws.cognito.sign_up_throttles

3. Token Refresh Metrics

Monitor token refresh performance and throttling.

✅ Token Refresh Success % → aws.cognito.token_refresh_successes
📊 Token Refresh Requests → aws.cognito.token_refresh_successes.samplecount
🏆 Successful Token Refreshes → aws.cognito.token_refresh_successes.sum
🚫 Throttled Token Refreshes → aws.cognito.token_refresh_throttles

4. Federation Metrics

Track identity federation success and throttling.

✅ Federation Success % → aws.cognito.federation_successes
📊 Federation Requests → aws.cognito.federation_successes.samplecount
🏆 Successful Federation Requests → aws.cognito.federation_successes.sum
🚫 Throttled Federation Requests → aws.cognito.federation_throttles

5. Risk & Security Metrics

Measure detected risks and blocked requests.

⚠️ Account Takeover Risk → aws.cognito.account_take_over_risk, aws.cognito.account_takeover_risk
🔐 Compromised Credential Risk → aws.cognito.compromised_credential_risk, aws.cognito.compromised_credentials_risk
🟢 No Risk Detected → aws.cognito.no_risk
🛑 Any Risk Detected → aws.cognito.risk
⛔ Blocked by Config → aws.cognito.override_block
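In each of the four flow groups above, the success percentage, request count, and success count are different statistics of the same CloudWatch metric: samplecount is the number of requests and sum is the number of successes, so the success percentage is 100 * sum / samplecount. The Python sketch below shows that arithmetic and builds equivalent Datadog metric queries for each flow. The helper names are hypothetical, {*} is a placeholder for your own tag filter, and the query strings assume Datadog's classic metric query syntax with arithmetic between queries.

```python
from typing import Optional

# Flows that report <flow>_successes and <flow>_throttles metrics.
FLOWS = ("sign_in", "sign_up", "token_refresh", "federation")


def success_rate_pct(successes_sum: float, sample_count: float) -> Optional[float]:
    """Success percentage = 100 * sum / samplecount; None when there was no traffic."""
    if sample_count <= 0:
        return None  # no requests: report "no data" rather than 0% or 100%
    return 100.0 * successes_sum / sample_count


def success_rate_query(flow: str, scope: str = "*") -> str:
    """Datadog query for one flow's success percentage (hypothetical helper)."""
    base = f"aws.cognito.{flow}_successes"
    return f"100 * sum:{base}.sum{{{scope}}} / sum:{base}.samplecount{{{scope}}}"


def throttle_query(flow: str, scope: str = "*") -> str:
    """Datadog query for one flow's throttled requests (hypothetical helper)."""
    return f"sum:aws.cognito.{flow}_throttles{{{scope}}}"


if __name__ == "__main__":
    for flow in FLOWS:
        print(success_rate_query(flow))
        print(throttle_query(flow))
```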

Amazon Cognito Logs

Amazon Cognito user pools offer three feature plans: Lite, Essentials, and Plus. The Plus plan is an enhanced set of user pool features designed for applications that require advanced security options. It enables logging and analysis of user activity, giving you access to logs, risk ratings, and CloudWatch metrics related to user authentication activity in your user pool.

[Screenshot: Amazon Cognito plan types]

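To send the logs to Datadog, configure your user pool to export user activity logs to a CloudWatch Logs log group, deploy the Datadog Forwarder Lambda function in your AWS account, and add that log group as a trigger of the forwarder.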

[Screenshot: Datadog Log Forwarder for Cognito]

This will enable you to receive Cognito logs in Datadog.
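If you prefer to script this setup, the sketch below builds the request parameters for two AWS API calls made with boto3: the user pool's SetLogDeliveryConfiguration (exporting userAuthEvents at the INFO level used for user activity logs to CloudWatch Logs) and CloudWatch Logs' PutSubscriptionFilter (pointing the log group at the forwarder). The helper name, the filter name, and all IDs and ARNs are placeholders. It assumes the log group already exists and that the forwarder's resource policy already allows logs.amazonaws.com to invoke it.

```python
def build_log_export_requests(user_pool_id: str, log_group_name: str,
                              log_group_arn: str, forwarder_arn: str):
    """Return kwargs for cognito-idp SetLogDeliveryConfiguration and
    logs PutSubscriptionFilter (hypothetical helper)."""
    cognito_request = {
        "UserPoolId": user_pool_id,
        "LogConfigurations": [
            {
                # User activity (auth event) logs are exported at INFO level.
                "LogLevel": "INFO",
                "EventSource": "userAuthEvents",
                "CloudWatchLogsConfiguration": {"LogGroupArn": log_group_arn},
            }
        ],
    }
    subscription_request = {
        "logGroupName": log_group_name,
        "filterName": "datadog-forwarder",  # placeholder filter name
        "filterPattern": "",  # an empty pattern matches every log event
        "destinationArn": forwarder_arn,
    }
    return cognito_request, subscription_request


# Usage with boto3 (requires AWS credentials; all values are placeholders):
# import boto3
# cognito_req, sub_req = build_log_export_requests(
#     "us-east-1_EXAMPLE", "cognito-user-activity",
#     "arn:aws:logs:us-east-1:111122223333:log-group:cognito-user-activity",
#     "arn:aws:lambda:us-east-1:111122223333:function:datadog-forwarder")
# boto3.client("cognito-idp").set_log_delivery_configuration(**cognito_req)
# boto3.client("logs").put_subscription_filter(**sub_req)
```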

[Screenshot: Cognito logs in Datadog]

The table below lists the Amazon Cognito log attributes; 🔶 marks the most commonly used ones:

| Group | Attribute | Description |
|---|---|---|
| User & Identity Information | 🔶 userName | The username involved in the event. |
| | userSub | Unique UUID assigned to the user in the User Pool. |
| | idpName | Identity provider name (e.g., Google, Facebook, SAML). |
| | clientId | App client ID used for the request. |
| | userPoolId | Cognito User Pool identifier. |
| | id | Internal log event identifier. |
| Event Context | 🔶 eventType | Type of event (e.g., SignUp, SignIn, PasswordChange). |
| | eventSource | Source of the event (e.g., USER_AUTH_EVENTS). |
| | 🔶 eventResponse | Event status (e.g., Pass, Fail, InProgress). |
| | eventId | Unique ID for the event. |
| | eventTimestamp / timestamp | Event time in epoch milliseconds. |
| | creationDate | Date and time the event record was created. |
| | challenges | Authentication challenges and their outcomes (e.g., Password:Success). |
| Risk & Security Signals | 🔶 riskDecision | Risk analysis result (e.g., PASS, FAIL, BLOCK). |
| | compromisedCredentialDetected | Whether compromised credentials were detected (true/false). |
| | riskLevel | Level of risk detected (e.g., Low, Medium, High). |
| Client & Location Data | 🔶 ipAddress | IP address of the client making the request. |
| | city | City from IP geolocation. |
| | country | Country from IP geolocation. |
| | deviceName | Browser and OS details (e.g., Chrome 138, Windows 10). |
| Logging & Invocation Data | logLevel | Log severity level (e.g., INFO, WARN, ERROR). |
| | host | Host name (e.g., cognito). |
| | service | AWS service producing the log (e.g., cloudwatch). |
| | version | Log event schema version. |
| | invoked_function_arn | ARN of the Lambda function processing or forwarding the log. |
| | logSourceId.userPoolId | User Pool ID from log source metadata. |
| | requestId | AWS request identifier for the service call. |
| Feedback Data (Optional) | eventFeedbackDate | Date the feedback was recorded. |
| | eventFeedbackProvider | Entity providing the feedback. |
| | eventFeedbackValue | Feedback result (e.g., Valid, Invalid). |
| Miscellaneous | hasContextData | Boolean indicating whether additional context data is available. |
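As an example of working with these attributes, the sketch below counts failed sign-ins per client IP address, a quick way to spot repeated password guessing from a single source. It assumes each log record is a JSON object using the flat attribute names from the table (eventType, eventResponse, ipAddress); raw exported events may nest some of these fields, so adjust the lookups to your data. The function name is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def failed_sign_ins_by_ip(lines: Iterable[str]) -> Counter:
    """Count SignIn events whose eventResponse is "Fail", keyed by ipAddress."""
    counts: Counter = Counter()
    for line in lines:
        record = json.loads(line)
        if record.get("eventType") == "SignIn" and record.get("eventResponse") == "Fail":
            counts[record.get("ipAddress", "unknown")] += 1
    return counts


# Example: print the five noisiest source IPs from a JSON-lines export.
# with open("cognito-events.jsonl") as f:
#     for ip, n in failed_sign_ins_by_ip(f).most_common(5):
#         print(ip, n)
```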

userName, eventType, eventResponse, riskDecision, and ipAddress (the attributes marked 🔶) are the most commonly used attributes. Use them to create custom metrics that support fine-grained drill-downs, such as failed sign-ins by event type or blocked requests by risk decision.
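In Datadog, such custom metrics can be defined as log-based metrics. The sketch below builds a request body in the shape used by Datadog's Logs Metrics API (v2, logs/config/metrics endpoint) to count Cognito events grouped by eventType and eventResponse. The metric name, the host:cognito filter, and the assumption that the attributes are addressable as @eventType and @eventResponse are placeholders to adapt to your log pipeline. Grouping by low-cardinality attributes such as these keeps the number of metric series small; grouping by userName or ipAddress can create a very large number of series.

```python
import json


def log_metric_body(metric_name: str, query: str, group_by: list) -> dict:
    """Request body for a count-type log-based metric (hypothetical helper).

    group_by is a list of (log attribute, tag name) pairs.
    """
    return {
        "data": {
            "id": metric_name,
            "type": "logs_metrics",
            "attributes": {
                "compute": {"aggregation_type": "count"},  # count matching logs
                "filter": {"query": query},
                "group_by": [
                    {"path": f"@{attribute}", "tag_name": tag}
                    for attribute, tag in group_by
                ],
            },
        }
    }


body = log_metric_body(
    "cognito.auth.events",  # placeholder metric name
    "host:cognito",  # assumed filter that selects Cognito logs
    [("eventType", "event_type"), ("eventResponse", "event_response")],
)
print(json.dumps(body, indent=2))
```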

Finally, build a Service Level Indicator (SLI) dashboard that shows authentication health from a business perspective.

[Screenshot: SLI dashboard]

Troubleshooting Cognito-Related Issues: Best Practices

  1. Use the Amazon Cognito Plus Plan
    The Cognito Plus plan is highly recommended.
    It enables log delivery, and the exported logs carry attributes such as riskDecision, eventType, and eventResponse that are essential for troubleshooting authentication and security issues. It also provides the risk metrics listed earlier.

  2. Build Custom Metrics Using Logs in Datadog
    Cognito logs come with rich attributes (refer to the previous table), allowing you to create powerful custom metrics.
    These metrics can offer deep visibility into user behavior, login patterns, error spikes, and other critical insights.

  3. Set Up SLI and SLO Dashboards
    Translate technical metrics into your business context, that is, into what your end users actually experience.
    This allows you to build meaningful Service Level Indicators (SLIs) and Service Level Objectives (SLOs) that track reliability from a user-focused perspective.
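As one example of a user-focused SLI, the sketch below defines a metric-based Datadog SLO whose good events are successful sign-ins (aws.cognito.sign_in_successes.sum) and whose total events are sign-in requests (aws.cognito.sign_in_successes.samplecount), and computes how much of the error budget remains. The 99.9% target, 30-day window, SLO name, and helper names are illustrative assumptions; the dictionary follows the general shape of Datadog's SLO API (name, type, numerator/denominator query, thresholds).

```python
from typing import Optional


def sign_in_slo(target_pct: float = 99.9, timeframe: str = "30d", scope: str = "*") -> dict:
    """Metric-based SLO body: good = successful sign-ins, total = sign-in requests."""
    base = "aws.cognito.sign_in_successes"
    return {
        "name": "Cognito sign-in success",  # placeholder name
        "type": "metric",
        "query": {
            "numerator": f"sum:{base}.sum{{{scope}}}",
            "denominator": f"sum:{base}.samplecount{{{scope}}}",
        },
        "thresholds": [{"timeframe": timeframe, "target": target_pct}],
    }


def error_budget_remaining(good: float, total: float, target_pct: float) -> Optional[float]:
    """Fraction of the error budget left: 1.0 = untouched, below 0 = SLO breached."""
    allowed_failures = total * (1 - target_pct / 100)
    if allowed_failures <= 0:
        return None  # no traffic (or a 100% target) leaves no budget to measure
    return 1 - (total - good) / allowed_failures


# Example: 99,950 successes out of 100,000 sign-ins against a 99.9% target
# uses 50 of roughly 100 allowed failures, i.e. about half the budget.
print(error_budget_remaining(99_950, 100_000, 99.9))
```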

That's a wrap on my AWS Cognito Observability Guide. Use these best practices to improve visibility, reduce troubleshooting time, and align system metrics with business goals.
