Amazon Cognito is a user authentication and authorization service that lets you add sign-up, sign-in, and access control to your web and mobile applications. Cognito handles user accounts, password recovery, multi-factor authentication, and more. It also integrates with popular identity providers such as Google, Facebook, and Apple for single sign-on (SSO). Finally, one of its most important features is its ability to scale to millions of users.
There Are Two Types of Cognito Pools
1. User Pools
User Pools handle user sign-up, sign-in, and authentication. They act as a user directory managed by AWS. User Pools provide features such as multi-factor authentication (MFA), password policies, and integration with identity providers like Google, Apple, SAML, and OIDC.
The output of a User Pool is a set of tokens for each authenticated user: an ID token and an access token (both JWTs), plus a refresh token.
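The ID and access tokens are JWTs, so you can decode a token's payload to see exactly what a User Pool issued, such as the `sub` and `token_use` claims. Below is a minimal, standard-library-only sketch (`decode_jwt_payload` is a hypothetical helper name). It deliberately skips signature verification, so treat it as a debugging aid only. Production code should validate tokens against the user pool's published JWKS. Cognito's refresh token is encrypted and not meant to be decoded by clients, so the helper only applies to the ID and access tokens.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature.

    Debugging aid only: never trust these claims for authorization
    decisions without verifying the signature against the pool's JWKS.
    """
    try:
        _header, payload, _signature = token.split(".")
    except ValueError:
        raise ValueError("expected a JWT with three dot-separated segments") from None
    # JWT segments are unpadded base64url; restore padding before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

For a Cognito ID token, `decode_jwt_payload(id_token)["token_use"]` is `"id"`. For an access token it is `"access"`.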
2. Identity Pools (also known as Federated Identities)
Identity Pools provide temporary AWS credentials that allow authenticated users to access AWS resources directly. They can work with User Pools or other identity providers.
Identity Pools can federate identities from multiple sources into a single AWS identity. They use AWS Security Token Service (STS) to issue temporary AWS access keys based on assigned IAM roles.
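The Identity Pool exchange described above can be sketched with the two `cognito-identity` calls `GetId` and `GetCredentialsForIdentity`, using a User Pool ID token as the login. This is a minimal sketch of the "enhanced" flow, not a definitive implementation. The helper names, the pool IDs, and the choice to pass the client in as a parameter are illustrative assumptions. In practice `client` would be `boto3.client("cognito-identity", region_name=region)`.

```python
from typing import Any, Dict


def user_pool_provider_name(region: str, user_pool_id: str) -> str:
    # Identity Pools identify a Cognito User Pool login provider by this key.
    return f"cognito-idp.{region}.amazonaws.com/{user_pool_id}"


def get_temporary_credentials(client: Any, identity_pool_id: str, region: str,
                              user_pool_id: str, id_token: str) -> Dict[str, Any]:
    """Exchange a User Pool ID token for temporary AWS credentials.

    `client` is assumed to be a boto3 "cognito-identity" client (or any object
    exposing get_id / get_credentials_for_identity with the same shapes).
    """
    logins = {user_pool_provider_name(region, user_pool_id): id_token}
    # Step 1: resolve (or create) the Cognito identity ID for this user.
    identity_id = client.get_id(IdentityPoolId=identity_pool_id, Logins=logins)["IdentityId"]
    # Step 2: obtain short-lived credentials for the IAM role mapped to this identity.
    # The returned dict holds AccessKeyId, SecretKey, SessionToken and Expiration.
    resp = client.get_credentials_for_identity(IdentityId=identity_id, Logins=logins)
    return resp["Credentials"]
```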
In this blog post, I will walk through observability in Amazon Cognito User Pools.
First things first: Cognito observability mainly relies on two types of telemetry data — metrics and logs. Let’s go through them in detail.
Amazon Cognito Metrics
You can enable the Datadog Amazon Cognito integration to collect Cognito's CloudWatch metrics and monitor them in Datadog.
The integration provides the following metrics:
1. Sign-In Metrics
Measure user authentication activity and throttling.
✅ Sign-in Success % → aws.cognito.sign_in_successes
📊 Sign-in Requests → aws.cognito.sign_in_successes.samplecount
🏆 Successful Sign-ins → aws.cognito.sign_in_successes.sum
🚫 Throttled Sign-ins → aws.cognito.sign_in_throttles
2. Sign-Up Metrics
Track new user registrations and throttling.
✅ Sign-up Success % → aws.cognito.sign_up_successes
📊 Sign-up Requests → aws.cognito.sign_up_successes.samplecount
🏆 Successful Sign-ups → aws.cognito.sign_up_successes.sum
🚫 Throttled Sign-ups → aws.cognito.sign_up_throttles
3. Token Refresh Metrics
Monitor token refresh performance and throttling.
✅ Token Refresh Success % → aws.cognito.token_refresh_successes
📊 Token Refresh Requests → aws.cognito.token_refresh_successes.samplecount
🏆 Successful Token Refreshes → aws.cognito.token_refresh_successes.sum
🚫 Throttled Token Refreshes → aws.cognito.token_refresh_throttles
4. Federation Metrics
Track identity federation success and throttling.
✅ Federation Success % → aws.cognito.federation_successes
📊 Federation Requests → aws.cognito.federation_successes.samplecount
🏆 Successful Federation Requests → aws.cognito.federation_successes.sum
🚫 Throttled Federation Requests → aws.cognito.federation_throttles
5. Risk & Security Metrics
Measure detected risks and blocked requests.
⚠️ Account Takeover Risk → aws.cognito.account_take_over_risk, aws.cognito.account_takeover_risk
🔐 Compromised Credential Risk → aws.cognito.compromised_credential_risk, aws.cognito.compromised_credentials_risk
🟢 No Risk Detected → aws.cognito.no_risk
🛑 Any Risk Detected → aws.cognito.risk
⛔ Blocked by Config → aws.cognito.override_block
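Each success metric family above follows the same pattern. `.sum` counts successful requests, `.samplecount` counts all requests, and the plain metric is their ratio (the CloudWatch average). The sketch below uses a hypothetical `AuthWindow` helper. It assumes you have already pulled the three aggregated values for one family over one time window, and it recomputes the success percentage from them. This avoids "averaging averages" when you roll several periods into one window.

```python
from dataclasses import dataclass


@dataclass
class AuthWindow:
    """Aggregated values for one metric family over one time window (hypothetical helper)."""
    successes_sum: float          # e.g. aws.cognito.sign_in_successes.sum
    successes_samplecount: float  # e.g. aws.cognito.sign_in_successes.samplecount
    throttles: float              # e.g. aws.cognito.sign_in_throttles (reported separately)

    @property
    def failures(self) -> float:
        # Requests that were counted but did not succeed.
        return self.successes_samplecount - self.successes_sum

    @property
    def success_pct(self) -> float:
        # Average = Sum / SampleCount, expressed as a percentage.
        if self.successes_samplecount == 0:
            return 100.0  # no traffic in the window: report healthy instead of dividing by zero
        return 100.0 * self.successes_sum / self.successes_samplecount
```

For example, `AuthWindow(95, 100, 3)` gives a 95% success rate with 5 failed requests. The 3 throttled requests are tracked on their own.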
Amazon Cognito Logs
Amazon Cognito offers three feature plans: Lite, Essentials, and Plus. The Plus plan is an enhanced set of user pool features designed for applications that require advanced security options. It enables logging and analysis of user activity, giving you access to user activity logs, risk ratings, and CloudWatch metrics related to authentication activity within your user pool.
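To get these logs into Datadog, first configure your user pool to export user activity logs to a CloudWatch Logs log group. Then deploy the Datadog Lambda Forwarder function in your AWS account and add that log group as a trigger for the forwarder.
Cognito logs will then be delivered to Datadog.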
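The first step above (exporting user activity logs to CloudWatch Logs) can be scripted with the `cognito-idp` `SetLogDeliveryConfiguration` API. This is a minimal sketch with several assumptions: the log group already exists, the Datadog Forwarder will be subscribed to it separately, and the pool ID and ARN are placeholders. The helper function names are hypothetical. `boto3` is imported lazily so the request payload can be built and reviewed without AWS access.

```python
from typing import Any, Dict


def build_log_delivery_request(user_pool_id: str, log_group_arn: str) -> Dict[str, Any]:
    """Keyword arguments for cognito-idp SetLogDeliveryConfiguration (hypothetical helper)."""
    return {
        "UserPoolId": user_pool_id,
        "LogConfigurations": [
            {
                # User activity events (sign-in, sign-up, ...); requires the Plus plan.
                "EventSource": "userAuthEvents",
                "LogLevel": "INFO",
                "CloudWatchLogsConfiguration": {"LogGroupArn": log_group_arn},
            }
        ],
    }


def enable_log_delivery(user_pool_id: str, log_group_arn: str, region: str) -> Dict[str, Any]:
    # Imported here so the builder above works without boto3 installed.
    import boto3

    client = boto3.client("cognito-idp", region_name=region)
    return client.set_log_delivery_configuration(
        **build_log_delivery_request(user_pool_id, log_group_arn)
    )
```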
The Amazon Cognito log attributes are as follows:
| Group | Attribute | Description |
|---|---|---|
| User & Identity Information | 🔶 userName | The username involved in the event. |
| | userSub | Unique UUID assigned to the user in the User Pool. |
| | idpName | Identity Provider name (e.g., Google, Facebook, SAML). |
| | clientId | App client ID used for the request. |
| | userPoolId | Cognito User Pool identifier. |
| | id | Internal log event identifier. |
| Event Context | 🔶 eventType | Type of event (e.g., SignUp, SignIn, PasswordChange). |
| | eventSource | Source of the event (e.g., USER_AUTH_EVENTS). |
| | 🔶 eventResponse | Event status (e.g., Pass, Fail, InProgress). |
| | eventId | Unique ID for the event. |
| | eventTimestamp / timestamp | Event time in epoch milliseconds. |
| | creationDate | Date/time the event record was created. |
| | challenges | Authentication challenges and outcomes (e.g., Password:Success). |
| Risk & Security Signals | 🔶 riskDecision | Risk analysis result (e.g., PASS, FAIL, BLOCK). |
| | compromisedCredentialDetected | Whether compromised credentials were detected (true/false). |
| | riskLevel | Level of risk detected (e.g., Low, Medium, High). |
| Client & Location Data | 🔶 ipAddress | IP address of the client making the request. |
| | city | City from IP geolocation. |
| | country | Country from IP geolocation. |
| | deviceName | Browser and OS details (e.g., Chrome 138, Windows 10). |
| Logging & Invocation Data | logLevel | Log severity level (e.g., INFO, WARN, ERROR). |
| | host | Host name (e.g., cognito). |
| | service | AWS service producing the log (e.g., cloudwatch). |
| | version | Log event schema version. |
| | invoked_function_arn | ARN of the Lambda function processing/forwarding the log. |
| | logSourceId.userPoolId | User Pool ID from log source metadata. |
| | requestId | AWS request identifier for the service call. |
| Feedback Data (Optional) | eventFeedbackDate | Date feedback was recorded. |
| | eventFeedbackProvider | Entity providing the feedback. |
| | eventFeedbackValue | Feedback result (e.g., Valid, Invalid). |
| Miscellaneous | hasContextData | Boolean indicating additional context data availability. |
The attributes marked 🔶 (userName, eventType, eventResponse, riskDecision, and ipAddress) are the most commonly used. You can use them to build custom metrics for fine-grained drill-downs, for example counting failed sign-ins per IP address or per user.
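The kind of custom metric described above (failed sign-ins grouped by an attribute) can be sketched in plain Python. It computes the same count you would define as a log-based metric in Datadog. This is a minimal sketch, and it assumes each log record has already been flattened into a dict keyed by the attribute names in the table. Real exported events may nest some fields, so adjust the lookups to your actual log shape. `failed_sign_ins_by` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable, Mapping


def failed_sign_ins_by(records: Iterable[Mapping[str, str]], key: str = "ipAddress") -> Counter:
    """Count failed SignIn events grouped by one attribute (e.g. ipAddress or userName)."""
    counts: Counter = Counter()
    for rec in records:
        # Only sign-in attempts whose outcome was a failure.
        if rec.get("eventType") == "SignIn" and rec.get("eventResponse") == "Fail":
            counts[rec.get(key, "unknown")] += 1
    return counts
```

A sudden spike for a single `ipAddress` or `userName` in this count is the kind of signal worth alerting on.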
Finally, combine these metrics into a Service Level Indicator (SLI) dashboard that shows authentication health from a business perspective.
Troubleshooting Cognito-Related Issues: Best Practices
Use the Amazon Cognito Plus Plan
If you need to troubleshoot authentication issues, the Cognito Plus plan is highly recommended.
It enables user activity log delivery and risk-based metrics. Its logs include attributes such as riskDecision, eventType, and eventResponse, which are essential for diagnosing authentication and security issues.
Build Custom Metrics Using Logs in Datadog
Cognito logs come with rich attributes (refer to the previous table), allowing you to create powerful custom metrics.
These metrics can offer deep visibility into user behavior, login patterns, error spikes, and other critical insights.
Set Up SLI and SLO Dashboards
It's important to translate technical metrics into business context, meaning what your end users actually experience.
This allows you to build meaningful Service Level Indicators (SLIs) and Service Level Objectives (SLOs) that track reliability from a user-focused perspective.
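As an illustration of a user-focused SLI, sign-in availability can be defined as successful sign-ins divided by total sign-in requests. The inputs would be the `aws.cognito.sign_in_successes.sum` and `.samplecount` values for the SLO window. This is a minimal sketch: the function names and the 99.9% target used in the example are hypothetical, not recommendations. It reports the SLI and the share of the error budget still left.

```python
def sli_pct(good: int, total: int) -> float:
    """Sign-in availability SLI: percentage of sign-in requests that succeeded."""
    return 100.0 if total == 0 else 100.0 * good / total


def error_budget_remaining(good: int, total: int, slo_pct: float) -> float:
    """Fraction of the error budget left (1.0 = untouched, below 0 = SLO breached)."""
    if not 0.0 < slo_pct < 100.0:
        raise ValueError("slo_pct must be strictly between 0 and 100")
    if total == 0:
        return 1.0  # no traffic, so no budget consumed
    allowed_failures = total * (1.0 - slo_pct / 100.0)
    return 1.0 - (total - good) / allowed_failures
```

With 9,995 successful sign-ins out of 10,000 and a 99.9% target, the SLI is 99.95% and about half of the error budget (5 of roughly 10 allowed failures) remains.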
That's a wrap on my Amazon Cognito observability guide. Use these best practices to improve visibility, reduce troubleshooting time, and align system metrics with business goals.