DEV Community

Cris Mihalache

Using ChatGPT To Generate Datasets 🤖

Recently I used ChatGPT to generate mock data in JSON format.

This works well because you can describe what the real data looks like in your prompt, and ChatGPT fills fields such as descriptions, titles, and tags with realistic content.

This is a sample of the data ChatGPT generated:

  {
    "id": 3,
    "description": "Implementing OAuth2 for user authentication",
    "tags": [
      "@feature",
      "@security"
    ],
    "start": "2024-02-22T03:11:03",
    "end": "2024-02-22T06:36:18"
  }

Notice how the description, tags, and duration are all realistic.

Example

To build the documentation website for track-time-cli, I needed a database with mock data to have realistic command output in the documentation content.

To get ChatGPT to generate the entries for me, I used the following prompt:

Generate 5 time sheet entries with realistic software development tasks as descriptions. Use the following schema:

        {
          "id": 1,
          "description": "writing documentation",
          "tags": [
            "@feature",
            "@docs"
          ],
          "start": "2024-02-21T18:26:45",
          "end": "2024-02-21T19:33:08"
        }

The entries should be sequential in time, of varying durations (between 5 minutes and 5 hours), and have realistic descriptions that fit the software development time sheet theme.

The output should be in JSON, an array of time sheet entries conforming to the provided schema example.

Note that I specified the number of entries (5) and the type of activity (realistic software development tasks) that should be mentioned in the description.

I defined the output schema by providing an example of a correctly formatted time sheet entry. I also specified that the entries should be sequential in time and have durations that are realistic for that type of activity.
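The same prompt can be assembled programmatically when the entry count or the example entry needs to change between runs. The following is a minimal sketch, not the method used for this post: the function name `build_prompt` and its parameters are hypothetical, and the wording simply mirrors the prompt shown above.

```python
import json

# Example entry copied from the prompt above; it doubles as the schema.
EXAMPLE_ENTRY = {
    "id": 1,
    "description": "writing documentation",
    "tags": ["@feature", "@docs"],
    "start": "2024-02-21T18:26:45",
    "end": "2024-02-21T19:33:08",
}


def build_prompt(count, example, min_minutes=5, max_hours=5):
    """Assemble the dataset prompt from its variable parts (hypothetical helper)."""
    schema = json.dumps(example, indent=2)
    return (
        f"Generate {count} time sheet entries with realistic software "
        "development tasks as descriptions. Use the following schema:\n\n"
        f"{schema}\n\n"
        "The entries should be sequential in time, of varying durations "
        f"(between {min_minutes} minutes and {max_hours} hours), and have "
        "realistic descriptions that fit the software development time sheet "
        "theme.\n\n"
        "The output should be in JSON, an array of time sheet entries "
        "conforming to the provided schema example."
    )


prompt = build_prompt(5, EXAMPLE_ENTRY)
print(prompt)
```

Keeping the example entry as a Python dictionary means the schema in the prompt is always valid JSON, since it is serialized with `json.dumps` rather than typed by hand.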

Result

This was the output:

[
  {
    "id": 1,
    "description": "Refactoring user interface code",
    "tags": [
      "@feature",
      "@frontend"
    ],
    "start": "2024-02-21T18:26:45",
    "end": "2024-02-21T22:39:32"
  },
  {
    "id": 2,
    "description": "Optimizing database queries for report generation",
    "tags": [
      "@performance",
      "@database"
    ],
    "start": "2024-02-21T22:39:32",
    "end": "2024-02-22T03:11:03"
  },
  {
    "id": 3,
    "description": "Implementing OAuth2 for user authentication",
    "tags": [
      "@feature",
      "@security"
    ],
    "start": "2024-02-22T03:11:03",
    "end": "2024-02-22T06:36:18"
  },
  {
    "id": 4,
    "description": "Weekly team sync-up meeting",
    "tags": [
      "@meeting",
      "@communication"
    ],
    "start": "2024-02-22T06:36:18",
    "end": "2024-02-22T10:16:24"
  },
  {
    "id": 5,
    "description": "Adding localization support for new languages",
    "tags": [
      "@feature",
      "@i18n"
    ],
    "start": "2024-02-22T10:16:24",
    "end": "2024-02-22T11:50:10"
  }
]
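Generated data does not always follow every constraint in the prompt, so it can be worth checking the output before using it. Below is a minimal sketch of such a check, assuming the constraints from the prompt (all five fields present, entries sequential in time, durations between 5 minutes and 5 hours). The function name `check_entries` is hypothetical, and the sample reuses the first two entries from the output above.

```python
from datetime import datetime, timedelta

REQUIRED_KEYS = {"id", "description", "tags", "start", "end"}


def check_entries(entries, min_duration=timedelta(minutes=5),
                  max_duration=timedelta(hours=5)):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    previous_end = None
    for entry in entries:
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {entry.get('id')}: missing {sorted(missing)}")
            continue
        start = datetime.fromisoformat(entry["start"])
        end = datetime.fromisoformat(entry["end"])
        duration = end - start
        if not min_duration <= duration <= max_duration:
            problems.append(f"entry {entry['id']}: duration {duration} out of range")
        # Sequential means an entry may not start before the previous one ended.
        if previous_end is not None and start < previous_end:
            problems.append(f"entry {entry['id']}: starts before previous entry ends")
        previous_end = end
    return problems


sample = [
    {"id": 1, "description": "Refactoring user interface code",
     "tags": ["@feature", "@frontend"],
     "start": "2024-02-21T18:26:45", "end": "2024-02-21T22:39:32"},
    {"id": 2, "description": "Optimizing database queries for report generation",
     "tags": ["@performance", "@database"],
     "start": "2024-02-21T22:39:32", "end": "2024-02-22T03:11:03"},
]

print(check_entries(sample))  # prints []
```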

After running this prompt a few times to generate 16 entries, I placed them all in an array within a mock JSON database. The output of running track-time-cli's list command was the following:

$ tt l
* Showing sheets since 2/21/2024, 12:00:00 AM [yesterday]

- Sheet main (16 entries) [1:10:19:16]
    (1)   [9:28:26]  yesterday       ->  14 hours ago    Writing API documentation                           @feature @docs
    (2)   [16:51]    14 hours ago    ->  14 hours ago    Implementing user authentication module             @feature @backend
    (3)   [50:01]    14 hours ago    ->  13 hours ago    Designing database schema for new features          @design @database
    (4)   [2:04:11]  13 hours ago    ->  11 hours ago    Improving responsive design for mobile devices      @feature @frontend
    (5)   [2:29:48]  11 hours ago    ->  9 hours ago     Conducting user experience research                 @research @ux
    (6)   [1:42:30]  9 hours ago     ->  7 hours ago     Implementing caching for improved performance       @performance @backend
    (7)   [9:08]     7 hours ago     ->  7 hours ago     Weekly team retrospective meeting                   @meeting @team
    (8)   [39:52]    7 hours ago     ->  6 hours ago     Refining search algorithm                           @algorithm @backend
    (9)   [2:13:23]  6 hours ago     ->  4 hours ago     Reviewing and merging pull requests                 @review @version-control
    (10)  [1:08:12]  4 hours ago     ->  3 hours ago     Developing new feature for customer analytics       @feature @analytics
    (11)  [2:27:32]  3 hours ago     ->  21 minutes ago  Solving compatibility issues in cross-platform app  @bugfix @mobile
    (12)  [3:14:14]  21 minutes ago  ->  in 3 hours      Organizing backlog and prioritizing tasks           @planning @management
    (13)  [3:07:36]  in 3 hours      ->  in 6 hours      Upgrading server infrastructure                     @maintenance @devops
    (14)  [1:41:17]  in 6 hours      ->  in 8 hours      Creating automated tests for new API endpoints      @testing @automation
    (15)  [2:02:29]  in 8 hours      ->  in 10 hours     Troubleshooting server downtime issues              @maintenance @devops
    (16)  [43:46]    in 10 hours     ->  in 10 hours     Designing user interface for the upcoming module    @design @frontend

* 0 Sheets not shown. use --all to show
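Each run of the prompt numbers its entries from 1, so combining several runs into one array means the ids have to be reassigned. The sketch below shows one way to do that, under stated assumptions: `merge_batches` is a hypothetical helper, the two single-entry batches reuse entries from the generated output (the second is given id 1 to mimic a separate run), and the layout of the track-time-cli database file itself is not covered here.

```python
import json


def merge_batches(batches):
    """Flatten several generated arrays into one list, ordered by start, with fresh ids."""
    merged = [entry for batch in batches for entry in batch]
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    merged.sort(key=lambda entry: entry["start"])
    return [{**entry, "id": new_id} for new_id, entry in enumerate(merged, start=1)]


batch_a = [
    {"id": 1, "description": "Refactoring user interface code",
     "tags": ["@feature", "@frontend"],
     "start": "2024-02-21T18:26:45", "end": "2024-02-21T22:39:32"},
]
batch_b = [
    {"id": 1, "description": "Optimizing database queries for report generation",
     "tags": ["@performance", "@database"],
     "start": "2024-02-21T22:39:32", "end": "2024-02-22T03:11:03"},
]

merged = merge_batches([batch_b, batch_a])
print(json.dumps(merged, indent=2))
```

Sorting only orders the entries. If separate runs produce overlapping time ranges, the timestamps would still need adjusting by hand or by a follow-up prompt.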

Conclusions

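ChatGPT works well for generating mock datasets: given an example entry and a few constraints, it produced correctly structured JSON with realistic descriptions and tags.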

Try it out the next time you need some mock data with realistic textual content! 🚀
