A practical shape for metadata-only AI image datasets

#ai #opensource

Most AI image galleries eventually turn into wallpaper soup.

You scroll past something beautiful, weird, useful, or cursed. Five seconds later it is gone. The image might have had a prompt, a model name, a source page, tags, or useful safety context, but the feed treats all of that like packaging foam.

I have been building GeneratedGallery to test a different shape: a free AI image gallery where the image is not separated from the trail around it.

Site: https://generatedgallery.com

Dataset page: https://generatedgallery.com/ai-image-dataset

Manifest: https://generatedgallery.com/index/manifest.json

The dataset is metadata-first

The important caveat: this is not a rights-free image bundle.

GeneratedGallery is a discovery and provenance index. It points at public image records and keeps each record's metadata attached where available. Media rights stay with the upstream creator or platform. That distinction matters, because a lot of AI dataset conversations collapse into one vague bucket called "scraping", and then nobody can tell what is actually being shared.

For this project, the useful unit is a record, not a file dump.

A record can include:

image URL
thumbnail URL
prompt text when available
source URL
source platform
model or generation metadata when available
category
tags
safety label
indexed timestamp

That turns a gallery into something closer to a research surface. You can browse visually, but you can also inspect patterns. What prompts repeat? Which styles show up together? What categories are overrepresented? Where does the metadata vanish?
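As a rough sketch of the record shape and the pattern questions above: the field names below (`image_url`, `prompt`, `category`, `tags`, and so on) are my own placeholders that mirror the list, and the actual export may name or nest them differently.

```python
from collections import Counter
from typing import Optional, TypedDict


class Record(TypedDict, total=False):
    # Hypothetical field names mirroring the list above; the real export may differ.
    image_url: str
    thumbnail_url: str
    prompt: Optional[str]
    source_url: str
    source_platform: str
    model: Optional[str]
    category: str
    tags: list[str]
    safety_label: str
    indexed_at: str


def category_counts(records: list[Record]) -> Counter:
    """Count records per category to spot overrepresented ones."""
    return Counter(r.get("category", "unknown") for r in records)


def missing_field_rate(records: list[Record], field: str) -> float:
    """Fraction of records where a field is absent or empty."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if not r.get(field))
    return missing / len(records)


def tag_pairs(records: list[Record]) -> Counter:
    """Count pairs of tags that appear together on the same record."""
    pairs: Counter = Counter()
    for r in records:
        tags = sorted(set(r.get("tags", [])))
        for i, a in enumerate(tags):
            for b in tags[i + 1:]:
                pairs[(a, b)] += 1
    return pairs
```

`category_counts` speaks to overrepresentation, `tag_pairs` to which styles show up together, and `missing_field_rate(records, "prompt")` to where the metadata vanishes.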

Why prompts should stay attached

The prompt is not always the whole recipe. Sometimes it is incomplete. Sometimes it is misleading. Sometimes generation settings, LoRAs, checkpoints, post-processing, or selection bias matter more.

Still, a prompt is useful context.

A prompt lets builders compare intent against output. It helps prompt writers learn from patterns. It gives researchers a weak but inspectable signal about how public AI image culture is describing itself.

Without the prompt trail, an AI image gallery is just vibes in a grid.

With the prompt trail, it becomes searchable memory.

Why source links should stay attached

Source links are boring until you need one.

If you are using an image as inspiration, the source page helps you understand context. If you are researching prompt trends, the source page gives you a way to verify metadata. If you are building tools around generated media, source links help keep the record honest.

Generated media needs boring plumbing:

where did this come from?
what was known at index time?
what is uncertain?
what safety label was attached?
what changed later?
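One way to sketch the "what changed later" question is to compare two snapshots of the same record. The snapshot dictionaries and their field names are hypothetical here; nothing in this sketch assumes the export itself stores history.

```python
from typing import Any


def changed_fields(old: dict[str, Any], new: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {field: (old_value, new_value)} for every field that differs.

    A field missing from one snapshot is reported as None on that side.
    """
    changes: dict[str, tuple[Any, Any]] = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes
```

An empty result means the two snapshots agree; anything else is a small, inspectable list of what moved between index time and now.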

That is not glamorous, but it is the difference between a pile of thumbnails and an actual index.

The public export

GeneratedGallery exposes a public manifest and JSONL export so the archive is not trapped inside the UI.

Start here:

Dataset page: https://generatedgallery.com/ai-image-dataset
Manifest: https://generatedgallery.com/index/manifest.json
Protocol notes: https://generatedgallery.com/protocol
Creator kit: https://generatedgallery.com/protocol/creator-kit

The manifest is meant to be boring and inspectable. The JSONL feed is meant to be easy to process with ordinary tools: JSONL is one JSON object per line, so anything that can read lines and parse JSON will work.

That means a builder can pull records into a notebook, run simple searches, test an image browsing UI, inspect prompt distributions, or build small agents that understand image packs without scraping the web interface.
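A minimal sketch of the "pull records into a notebook and run simple searches" step, using only the Python standard library. It assumes each JSONL line is one JSON object and that prompt text lives under a `prompt` key; both the key name and the helper names are my placeholders, so check the manifest for the real field names.

```python
import json
from typing import Any, Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one dict per non-empty line, skipping lines that are not JSON objects."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a damaged line rather than abort the whole read
        if isinstance(record, dict):
            yield record


def search_prompts(records: Iterable[dict[str, Any]], term: str) -> list[dict[str, Any]]:
    """Case-insensitive substring search over the (assumed) "prompt" field."""
    needle = term.lower()
    return [r for r in records if needle in str(r.get("prompt") or "").lower()]
```

Against a locally saved copy of the feed (the filename `records.jsonl` is a placeholder), you would open the file with `encoding="utf-8"` and pass the file object straight to `parse_jsonl`, since a file object iterates line by line.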