Introduction
Hi, this is futahashi from Japan. In this post, I'd like to share my experience trying out Datadog Archive Search. Logs are something everyone uses on a daily basis, but they are also notoriously hard to design well and easy to overspend on, so I'm sure many of you will appreciate this update as much as I do.
A small note before we begin: I'm not a native English speaker, so please bear with any awkward phrasing. I've done my best to keep the technical content accurate — feedback and corrections are very welcome.
This article focuses on AWS only.
TL;DR
- You can stream-query archived logs immediately, without Rehydration
- You can also run a Rehydration on top if you need to
- You can narrow the scan by time or attribute
- Pricing is based on the scanned data size
- Each search keeps up to 100k events of search results for 24 hours, for free
- Without Rehydration:
- No advanced analysis such as aggregations or visualizations
- No references from other features (Dashboard / Notebook / Log Explorer, etc.)
What changed
"I want to search for logs from three weeks ago" — but this environment's Log Retention Period is only 15 days. Situations like this come up from time to time. Historically, we've had the following options to deal with them:
① Run a Rehydrate
- Use Datadog's Rehydrate feature to index Archived Logs. It runs as a batch process and takes anywhere from minutes to hours before the logs become searchable, and Indexing costs money.
② Use a different tool (Amazon Athena, etc.)
- Use another tool built for analyzing archived logs. You lose Datadog's excellent UI, and investigation efficiency drops.
③ Revisit Log Retention Periods
- This only takes effect going forward, but depending on how often the use case comes up, revisiting the Log Retention Period is an option. However, the Indexing bill scales with retention, so weigh the return on investment carefully.
This is exactly where the now-GA Archive Search comes in. There's no batch wait like Rehydrate — results stream in incrementally. Before you run, you can preview your query against up to 1,000 sample Archive Logs, so you can verify your filter instead of kicking off a wasted scan. The billing model also differs from Rehydrate: Archive Search charges only for the data scanned, and Indexing through Rehydrate is optional. On top of that, each Archive Search keeps up to 100k events of results for 24 hours, for free. You can search Archive Logs in Datadog quickly, easily, and intelligently — a big win for both investigation efficiency and cost savings.
To summarize:
| | Before (Rehydrate only) | Archive Search |
|---|---|---|
| Log retrieval | Batched, delayed | Streaming, fast |
| Wait time | Minutes to hours | None (results stream in) |
| Pre-query check | Not possible | Verifiable on a 1,000-event sample |
| Indexing required | Required | Optional |
| Pricing axis | Index + Scan | Scan only (Index only if you Rehydrate) |
| Retention | 3–180 days | 24h (3–180 days if Rehydrated) |
| Event count | No limit | Up to 100k events per Archive Search, free for 24h |
| Example use case | Investigating Archive Logs beyond 24 hours; needing advanced analysis | Simply searching Archive Logs; intelligently narrowing logs before Rehydrate |
Overall architecture
Here's a picture of Archive Search and the pieces around it. This is just my own informal mental model, not official information, so treat it as a way to get a feel for how everything fits together.
- ① Logs are ingested into Datadog from a Datadog Forwarder, etc.
- ② Once logs are ingested, the Log Archive config filters them, and the matching logs are sent to the S3 bucket used for archiving. At this point, the Role that Datadog uses (usually DatadogIntegrationRole) needs a policy that grants read/write access to the S3 bucket.
- ③ When you run Archive Search, logs are retrieved from the archive S3 bucket and kept in a dedicated Archive Search index, up to 100k events for 24 hours. This index is not accessible from Log Explorer.
- ④ If needed, you can Rehydrate to keep the logs in an index that's accessible from Log Explorer.
- Notes
- ⑤ The regular Log Index keeps logs based on a config separate from Log Archive.
- ⑥ I draw conventional Rehydration and Rehydration from Archive Search separately because, at the moment, Rehydrations triggered from Archive Search don't show up in Historical Views. They appear to be treated as separate things, so I've followed suit.
Let's try it
Here's what I tried with Archive Search, and what I learned. I'm assuming you're starting from creating a new archive, so if you already have an archive, feel free to skip ahead to the permissions section.
S3 bucket and IAM policy
First, if you don't already have an S3 bucket to store the archive logs, create one. The output path for archive logs is configurable, so you can also use an existing suitable bucket.
Next, attach an S3 read/write policy to the Datadog AWS Integration Role. Below is an example — replace the bucket name and prefix with values appropriate to your environment.
Example target bucket:
- Bucket name: `some-datadog-enthusiast-bucket`
- Prefix: `datadog/logs/`
Example policy to add:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DatadogUploadAndRehydrateLogArchives",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::some-datadog-enthusiast-bucket/datadog/logs/*"
      ]
    },
    {
      "Sid": "DatadogRehydrateLogArchivesListBucket",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": [
        "arn:aws:s3:::some-datadog-enthusiast-bucket"
      ]
    }
  ]
}
```
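If you template this policy in scripts, a small helper can generate it for any bucket and prefix. This is purely my own convenience sketch — the function name is made up — producing the same JSON as the example above:

```python
import json


def datadog_archive_policy(bucket: str, prefix: str) -> dict:
    """Build the IAM policy document for a Datadog log archive bucket.

    Mirrors the two statements shown in this post: object read/write
    under the archive prefix, plus ListBucket on the bucket itself.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DatadogUploadAndRehydrateLogArchives",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
            {
                "Sid": "DatadogRehydrateLogArchivesListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
        ],
    }


print(json.dumps(
    datadog_archive_policy("some-datadog-enthusiast-bucket", "datadog/logs/"),
    indent=2,
))
```

Adjust the Sids and actions to your own security requirements before attaching the output to the integration role.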
Archive configuration
Create the archive. The fields are as follows:
- Archive Name: The archive name.
- Define Which Data To Forward: A filter for which logs to archive.
- Set Archive Type: The archive type (Amazon S3 / Google Cloud Storage / Azure Storage).
- Configure Bucket
  - AWS Account: The AWS account and DatadogIntegrationRole.
  - S3 bucket: The name of the archive S3 bucket.
  - Storage Class: The object class (Standard / Standard IA / Intelligent Tiering / One Zone IA / Glacier IR).
  - Path (Optional): The output prefix.
- Advanced Settings
  - Compression method: The compression method (ZSTD / GZIP).
  - Encryption Type: The encryption type (Default S3 Bucket-Level Encryption / Amazon S3 Managed Keys / AWS Key Management Service).
  - Tags in your Archive: A setting to additionally store Datadog Tags with the logs.
  - Tags on your Rehydrated Logs: A setting to add specific tags to Rehydrated logs.
  - Scan size for your rehydration: The maximum scan size for Rehydration.
With Define Which Data To Forward, you can configure a filter for the logs you want to archive, so you can avoid archiving unnecessary logs.
For Compression method, ZSTD is recommended when your goal is Archive Search or Rehydrate, since it reduces scan and egress cost. If you also plan to use other tools, GZIP may be an option as well.
You can also specify Storage Class and Encryption Type, which lets you optimize cost and apply appropriate security for your environment.
By running Test Configuration, you can verify whether the role has the appropriate read/write permissions on the configured bucket.
In addition, there are settings called Partition Attributes and Lookup Attributes, which are currently in Preview. These are said to reduce scan size by splitting data into directory hierarchies by attribute, or by skipping unneeded data blocks. I haven't applied for the Preview and couldn't try them out, so I'm leaving them out of this post.
Permissions
To run Archive Search, two kinds of permissions are required. Make sure both are granted before use.
- ① Logs Write Historical Views: Required to run Archive Search
- ② Logs Read Archive: Required to read the archive
The Datadog Admin Role and Datadog Standard Role carry both of these permissions, but the Datadog Read Only Role only has Logs Read Archive — be careful. Also, Archive Search results have Restriction Queries applied, so only logs you're allowed to view are shown. That's reassuring!
Running Archive Search
From the Datadog left-hand menu, open Logs and switch to the Archive Search tab. On this screen, you can configure Archive Search, see a list of past Archive Searches, and start a new Archive Search. You can start one with the New Search button.
By the way, in the settings, you can configure the Rehydration volume limit and Rehydration retention periods as shown below.
Next, to choose what to search, fill in the following:
- Archive Name: Specify the target archive.
- Filter (Optional): Optionally specify a filter.
- Timeframe: The target time range.
- Mode
  - ① Search: Up to 100,000 events kept for 24 hours, free.
  - ② Search & Rehydration: 3–180 days of Indexing.
Specifying a filter keeps the Indexing cost down at Rehydration time, and, once Lookup Attributes (currently in Preview) are available, it will also reduce scan size.
For choosing the mode, Search seems good for lightweight searches, and Search & Rehydration for use cases that need complex analysis. Even if you choose Search, you can still Rehydrate afterward, so there's nothing to worry about.
By checking Estimated scan size, you can see an estimate of the scan size before you actually run. Preview Log Sample lets you preview up to 1,000 logs from the same partition in the target period. You can click into logs to see details and to validate whether your filter query makes sense. That's a thoughtful touch!
Pressing Search runs Archive Search.
Checking the Archive Search results
Once Archive Search runs, results stream in incrementally and you can quickly search Archive Logs. This is a dedicated Archive Search view, different from Log Explorer. Aggregations and similar analysis aren't available, but filtering is.
In addition, you can show/hide columns from Options.
You can run a Rehydration as needed. With Rehydration, you gain the ability to explore and analyze the logs in Log Explorer. If you need complex analysis, or if you need to retain the logs for a certain period, go ahead and Rehydrate.
Both Archive Search and Rehydration results can be referenced from the Archive Search screen anytime within their retention period. For Rehydrated results, a link to Explorer is also generated — another thoughtful touch.
Notes and caveats
Here are the notes and caveats around using Archive Search.
① No aggregations / visualizations / integration with other features
Archive Search provides a dedicated view that's different from Log Explorer. Group-by aggregations, visualizations like pie charts, and integrations with other Datadog features (Dashboard / Notebook / Log Explorer, etc.) are not available there. If you need any of those, you'll have to Rehydrate so the logs get Indexed.
② Scan appears to run at the same S3 path level
Archives have the following structure, and the hour directory contains multiple compressed files:
dt=YYYYMMDD/hour=HH/archive_*****
I couldn't find this documented officially, but the scan appears to operate at the level of these S3 paths. In other words, all the compressed data for a given hour is in scope for the scan. As an extreme example, scanning 00:00–00:01 and scanning 00:00–00:59 have the same scan size, while 00:59–01:00 — which touches two hourly partitions — would be twice that. As a workaround, you could split your archives based on the ranges you query most often, but in practice that's painful, so I'm looking forward to Partition Attributes and Lookup Attributes.
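To make the hourly-partition model above concrete, here's a small Python sketch of my (unofficial) assumption: it lists the hourly prefixes a timeframe overlaps, treating both endpoints as inclusive. This is just a mental-model illustration, not a Datadog API.

```python
from datetime import datetime, timedelta


def touched_partitions(start: datetime, end: datetime) -> list[str]:
    """List the dt=YYYYMMDD/hour=HH prefixes a timeframe overlaps.

    Models my unofficial understanding that scans are scoped to whole
    hourly partitions; any overlap pulls in the full hour of data.
    """
    cur = start.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while cur <= end:
        prefixes.append(cur.strftime("dt=%Y%m%d/hour=%H"))
        cur += timedelta(hours=1)
    return prefixes


day = datetime(2024, 1, 1)
print(touched_partitions(day, day.replace(minute=59)))                   # one partition
print(touched_partitions(day.replace(minute=59), day.replace(hour=1)))   # two partitions
```

Under this model, 00:00–00:59 scans exactly as much as 00:00–00:01, while 00:59–01:00 pulls in two full hours of data.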
③ Query Preview is a 1,000-event sample
Query Preview samples appear to show 1,000 archive samples from the same partition that falls within the Timeframe you specified. As mentioned above, partitions are split at the hourly level, so these are 1,000 samples within an hour-sized window. Treat Preview as a feature for validating query syntax and confirming the archive structure — don't misread a zero-result preview as meaning "no matches for that time."
④ Scan stops at 100k events
Archive Search scans stop at 100,000 events per scan. To retrieve everything within a specified range, you'll need to use the results to exclude unneeded logs via a filter, or narrow the timeframe, so the result fits within 100k events, then re-run.
When re-running, the Clone feature is handy. The form gets pre-filled with the original search's filter and timeframe, so you can tweak only what you need and re-submit.
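The narrowing-and-re-running loop can be sketched as a simple bisection: halve the timeframe until each piece is estimated to fit under the cap. The `estimate_count` callback here is a stand-in for whatever estimate you have (e.g. the hit rate from a first run) — there's no such Datadog API; this is just the idea from the text.

```python
from datetime import datetime, timedelta

EVENT_CAP = 100_000  # events kept per Archive Search


def split_under_cap(start, end, estimate_count, cap=EVENT_CAP):
    """Halve [start, end] until each piece is estimated to fit under cap."""
    if estimate_count(start, end) <= cap:
        return [(start, end)]
    mid = start + (end - start) / 2
    if mid in (start, end):  # cannot split further
        return [(start, end)]
    return (split_under_cap(start, mid, estimate_count, cap)
            + split_under_cap(mid, end, estimate_count, cap))


# Example: ~60k events/hour, so a 4-hour window (~240k events) splits into 4 pieces.
rate_per_hour = 60_000
estimate = lambda a, b: int((b - a).total_seconds() / 3600 * rate_per_hour)
start = datetime(2024, 1, 1)
pieces = split_under_cap(start, start + timedelta(hours=4), estimate)
print(len(pieces))  # → 4
```

Each resulting piece would then be one Clone-and-re-run of the original search with a narrower Timeframe.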
⑤ Cloud-side retrieval and transfer costs are billed separately
In addition to Datadog's scan-based billing, retrieval from S3 cold storage and egress to Datadog are billed by your cloud provider.
⑥ Archive config does not apply retroactively
Archive config applies only to logs archived after the setting is in place. It does not apply retroactively to logs archived in the past, so it needs to be set up in advance.
Pricing
Overview
The price table below uses the AP1 On-demand prices at the time of writing for reference. For the latest and most accurate information, refer to the official Datadog Pricing.
| Item | Unit price |
|---|---|
| Archive Search | $0.07 / GB scanned |
| Temporary index for Archive Search results | Free (up to 100k events, 24 hours) |
| Rehydration Scan | $0.13 / GB scanned |
| Rehydration Indexing | Same as Logs Indexing |
| Logs Ingestion | $0.13 / GB ingested |
| Logs Indexing (15 day retention) | $3.19 / 1M log events |
| Forwarding to S3 / GCS / Azure (archive writes) | Included in Logs Ingestion |
Archive Search is billed based on scanned data size. At the moment, you reduce scan size via the Archive & Forwarding filter and the Archive Search Timeframe. Once the Lookup Attribute setting (currently in Preview) is available, you'll also be able to reduce scan via the Archive Search filter. There's also an annual contract option for Archive Search, so if you can predict a certain volume of usage, you can get a commit-based discount. Archive Search results — up to 100,000 events — are kept for 24 hours with no Rehydration charge. That's lovely!
There's no additional charge for writing archive logs. Forwarding to S3 / GCS / Azure Storage is included in the Logs Ingestion price. Forwarding to destinations other than archives, such as external SIEMs and BI vendors, is billed separately, so be careful.
Rehydration pricing has a two-tier structure. One part scales with the size of the compressed logs scanned. The other is the Indexing cost for the matched logs, which follows the same pricing as regular Logs Indexing. Also, as the table shows, Archive Search's scan price ($0.07/GB) is roughly half that of a Rehydration scan ($0.13/GB).
Pricing comparison examples
① 15 Day Retention
Assume a service generates 100 GB / 100M log events per month, runs Indexed Logs at 15-day retention, and the data is roughly 10 GB on S3.
| Item | Calculation | Monthly |
|---|---|---|
| Ingestion | 100 GB × $0.13 | $13 |
| Indexing (15 Day Retention) | 100M × $3.19 | $319 |
| Total | | $332 |
② 7 Day Retention + Archive Search
Now consider running Archive Search alongside, so that 7-day retention is enough. Even if you go wild with Archive Search and scan the full 10 GB, the cost comes out as follows — about a 23% cost reduction. You can see how much of the total Indexing accounts for, and how cheap Archive Search is.
| Item | Calculation | Monthly |
|---|---|---|
| Ingestion | 100 GB × $0.13 | $13 |
| Indexing (7 Day Retention) | 100M × $2.39 | $239 |
| Archive Search | 10 × $0.07 | $0.70 |
| S3 Standard Storage | 10 × $0.025 | $0.25 |
| S3 Internet Out | 10 × $0.114 | $1.14 |
| Total | | $254.09 |
③ 7 Day Retention + Archive Search + Rehydration
Under the same conditions, if you Rehydrate 10% of what Archive Search returned, the Indexing portion grows by 10%, so:
| Item | Calculation | Monthly |
|---|---|---|
| Ingestion | 100 GB × $0.13 | $13 |
| Indexing (7 Day Retention) | (100M + 10M) × $2.39 | $262.9 |
| Archive Search | 10 × $0.07 | $0.70 |
| S3 Standard Storage | 10 × $0.025 | $0.25 |
| S3 Internet Out | 10 × $0.114 | $1.14 |
| Total | | $277.99 |
I went with an extreme example where Archive Search and Rehydration are pushed to the limit, but in practice the scan volume should be much smaller than the full archive, and the Rehydrated index can be kept down further by tuning the time range and queries. You can also pick a shorter Day Retention for Rehydration to make it even cheaper.
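As a sanity check, the three monthly totals can be recomputed in a few lines, using the unit prices from the tables in this post:

```python
# Unit prices from the tables above (AP1 on-demand, at time of writing).
INGEST = 0.13        # $/GB ingested
IDX_15D = 3.19       # $/1M events, 15-day retention
IDX_7D = 2.39        # $/1M events, 7-day retention
ARCHIVE_SCAN = 0.07  # $/GB scanned by Archive Search
S3_STORAGE = 0.025   # $/GB-month, S3 Standard
S3_EGRESS = 0.114    # $/GB out to the internet

gb_ingested, million_events, archive_gb = 100, 100, 10

# ① 15-day retention, Indexing only
plan_1 = gb_ingested * INGEST + million_events * IDX_15D
# ② 7-day retention + full 10 GB Archive Search scan
plan_2 = (gb_ingested * INGEST + million_events * IDX_7D
          + archive_gb * (ARCHIVE_SCAN + S3_STORAGE + S3_EGRESS))
# ③ same, plus rehydrating 10% (adds 10M indexed events)
plan_3 = plan_2 + 10 * IDX_7D

print(round(plan_1, 2), round(plan_2, 2), round(plan_3, 2))  # → 332.0 254.09 277.99
```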
Wrap-up
In this post, I covered how to use Datadog Archive Search and the caveats to watch out for. This feature solves the conventional Index and Rehydrate pain points around cost and time while still leveraging existing features, and it brings cost reduction and operational efficiency benefits to many users.
Depending on the use case, you can revisit Log Retention Periods, or even whether you keep logs in Indexed Logs at all, and dramatically cut Indexing cost. It also serves as a clever and inexpensive option as a pre-stage to Rehydrate. You can now narrow down past logs first, then Rehydrate only the parts you actually need.
I'm looking forward to seeing more cheap, fast, and clever evolutions from Datadog.