naoki_JPN
OpenAI Codex Desktop Complete Guide — Mastering Skills, Plugins & Automations

Note: This article is the English translation of a Japanese summary of a ~103-minute video by @DeRonin_ on X. Original video: https://twitter.com/DeRonin_/status/2048823420977119727

Introduction

The OpenAI Codex desktop app is a comprehensive AI agent platform that goes far beyond coding assistance — covering design, document creation, research, and automation. This article summarizes the full 103-minute guide video.


Core Features of the Codex Desktop App

Here are the key features introduced at the start of the video.

Codex app feature overview and demo screen (0:00)

Project Management and File Organization

Codex manages chats in "project" units, each linked 1:1 to a local folder on your computer. Files generated through chat are automatically saved to an outputs/ folder inside the project directory, and any file in that folder can be referenced with @filename. You can open the folder instantly via the "Open in Finder" button.

Parallel Multitasking

You can run multiple chat threads simultaneously. Even while one agent is working, you can start new tasks in another chat. A blue dot notification appears when a task completes, so you can check results and give the next instruction right away.

Skills and Plugins

Skills are "reusable recipes"; plugins are "installable packages that bring those recipes into Codex." Hundreds of pre-built plugins exist for services like Google Calendar, Gmail, Figma, and Remotion. You can also combine external APIs with the skill creator to build your own custom skills. Once created, skills can be invoked in future sessions with /skill-name or @skill-name.

Automations

Set up recurring tasks with natural language — for example, "Every Friday at 4am, summarize my weekly calendar and send it via email." You can view, test, and edit automations from the Automations tab.

Computer Control

The agent literally controls your mouse and keyboard. This enables working with GUI apps that have no API, such as building apps in Xcode or navigating a browser.

In-App Image Generation

Generate images from prompts and use them directly in your workflow. The video demonstrated generating product images for a shoe brand and 10 iOS app icon variations. Transparent background generation is also supported.

Steer Feature

Even while an agent is processing, you can paste text or images and immediately redirect it ("fix this part"). Normally prompts queue up and wait their turn, but the "Steer" button lets you interrupt instantly.

Terminal Integration (Claude Code)

For design-heavy tasks, you can launch Claude Code from the terminal with claude --dangerously-skip-permissions. In the video, Claude Code was used to finalize landing pages and slide decks when Codex's design precision reached its limits.

Canva Export

Created PowerPoint files can be opened in Canva with one click for manual finishing of the last 5–10%.


The Difference Between Skills and Plugins

The video's mid-section used the Excalidraw skill to auto-generate a structure diagram.

Skill and Plugin structure diagrammed with Excalidraw (10:00)

| | Skill | Plugin |
|---|---|---|
| Definition | A reusable workflow package for specific tasks | A unit that installs additional functionality into Codex |
| Role | Bundles instructions, resources, and scripts to extend Codex's task-handling ability | Bundles skills, apps, MCP servers, and integrations |
| Purpose | A recipe that ensures Codex executes workflows reliably | Provides access to connected systems and packaged tools |

Note: Simple way to remember:

  • Skill = reusable recipe
  • Plugin = installable package that brings that recipe into Codex

Design Tool Integration (Paper / Figma)

Landing page being auto-generated in Paper Alpha (30:00)

Codex integrates with Paper (Alpha), a Figma-like design tool.

Demo flow:

  1. Prompt: "Using the new Noo Shoo company logo image, create a landing page directly in Paper"
  2. Codex confirms Paper MCP actions and selects a transparent hero image
  3. Codex auto-decides design direction: editorial-tech, warm near-black neutral, cyan accents
  4. Auto-builds 4 sections: Hero, Performance Strip, Product Story, CTA/Footer

Note: Paper is a design tool built for AI agent collaboration, offering more intuitive operation than direct Figma editing.


Automations

Automation settings screen (35:00)

Automations can be created just by typing "do X every week" in chat. The video demonstrated two:

Weekly Calendar Summary
After connecting Google Calendar and Gmail plugins, just say "Every Friday at 4am, summarize this week's schedule and send it via email." Done. You can immediately see when the next run is scheduled.

Monthly YouTube Report
After creating a YouTube Researcher skill with the SuperData API, instruct: "On the last day of each month, use that skill to analyze this month's videos and compile them into a Word document." The resulting report includes hook analysis and a views-ranked table — delivered automatically.


Part 2 Highlights: 6 Projects in Parallel

In the second half of the video, using "Chorus (an AI agent learning app)" as the subject, the following 6 projects were created simultaneously:

| Task | Tools Used |
|---|---|
| iOS App (design + implementation) | Swift, Xcode, Supabase, mobile design skill |
| Web Landing Page | Tally, React, Claude Code, Vercel |
| Launch Video | Remotion plugin, Claude Code |
| Investor Deck | PowerPoint skill, Claude Code, Canva |
| X Post Automation | Typefully skill |
| Project Plan | Markdown (checklist) |

The key is to give each task its instructions and then move on to the next without waiting. Stacking tasks one after another in this way adds up to effective multitasking.


Summary

The Codex desktop app is a comprehensive AI agent platform covering not just coding, but design, documents, research, and automation.

  • Skills + Plugins — automate any workflow
  • Automations — fully automate recurring research and report creation
  • Design tool integration — applicable to non-engineer workflows
  • Multitasking (give instructions, then move on) is the core skill of the AI era
  • Codex + Claude Code combination: Codex for general orchestration, Claude Code for design-precision tasks

The ability to choose models and processing load based on task size and precision requirements is another strength of Codex.


Detailed Video Guide


Part 1 — Mastering the Basics

Download and Project Management

Search "Codex app download" in your browser and download from chatgpt.com. The initial screen looks like ChatGPT's chat interface, but the internals are completely different.

Codex's standout feature is project management linked to local folders. Before starting a chat, you specify which folder to work in. That folder becomes the "project," and all files created by the agent are auto-saved to its outputs/ folder.

From the project side panel you can open the folder in Finder or reference files with @filename. Even with 30+ projects, Command+G search lets you find any chat instantly by name or content.

In permission settings, "Full Access" mode lets the agent work without approval prompts. The video recommends the GPT-5.4 model and the Extra High processing load as defaults.


Using Skills and Plugins

Skills and plugins are often confused, but the essential difference is:

Skills are "recipes" for the agent — step-by-step instructions for executing specific tasks. Plugins are those recipes packaged into installable units. Think of it as "plugin = container for skills."
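The "plugin = container for skills" relationship can be pictured as a small data model. This is only an illustrative sketch: the `Skill` and `Plugin` classes, their fields, and the sample entries are hypothetical and do not describe how Codex stores skills or plugins internally.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """A reusable recipe: a name plus step-by-step instructions."""
    name: str
    instructions: str


@dataclass
class Plugin:
    """An installable package that bundles skills and integrations."""
    name: str
    skills: list[Skill] = field(default_factory=list)
    mcp_servers: list[str] = field(default_factory=list)

    def skill_names(self) -> list[str]:
        # Collect the names of every skill this plugin ships with.
        return [skill.name for skill in self.skills]


# Hypothetical example data, for illustration only.
calendar_plugin = Plugin(
    name="google-calendar",
    skills=[Skill("list-events", "Fetch this week's events and list them.")],
    mcp_servers=["calendar-mcp"],
)
print(calendar_plugin.skill_names())
```

A skill can exist on its own (for example, one built with the skill creator), while a plugin is the unit you install to get one or more skills plus their connections.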

To explore what a new plugin can do, the fastest approach is to open a new chat and ask: @Figma tell me everything you can do with this plugin. Clicking the ▼ (caret) on the response shows the thinking process too.


Practical Demo: Automating with Google Calendar + Gmail

Google Calendar integration and automation setup (15:00)

Installing the Google Calendar plugin is as simple as selecting "Google Calendar" from Plugins and signing in via browser.

After connecting, these operations complete in a single conversation:

  1. "List all my events this week" → All calendar events displayed
  2. "Send me a weekly summary by email" → Sent via Gmail immediately
  3. "Set this as an automation every Friday at 4am" → Registered as a weekly task

The Automations tab shows next run time, status, and a test run button. After creation, you can edit with natural language like "always use the Gmail skill."
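To make the "next run time" shown in the Automations tab concrete, here is a minimal Python sketch of how a weekly "every Friday at 4am" schedule could be resolved to a date. It is not Codex's scheduler; the function name `next_weekly_run` and its parameters are assumptions made for illustration, and time zones are ignored.

```python
from datetime import datetime, timedelta

FRIDAY = 4  # datetime.weekday(): Monday is 0, so Friday is 4


def next_weekly_run(now: datetime, weekday: int = FRIDAY, hour: int = 4) -> datetime:
    """Return the first occurrence of `weekday` at `hour`:00 strictly after `now`."""
    days_ahead = (weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # Today is the target weekday but the run time has already passed.
        candidate += timedelta(days=7)
    return candidate


# Wednesday, 15 April 2026, 09:30 resolves to the following Friday at 04:00.
print(next_weekly_run(datetime(2026, 4, 15, 9, 30)))
```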


Generating Designs with Figma and Paper MCP

The Figma plugin's main use is "converting existing Figma boards to code." It's not suited for the reverse direction (having AI generate designs and place them in Figma).

Paper (Alpha) fills that role. It's a design tool built for AI agent collaboration:

"Using the new shoe PNG (no background), create a landing page in Paper"
↓
Codex calls Paper MCP and decides design direction
↓
Auto-builds 4 sections: Hero, Performance Strip, Product Story, CTA

Setting chat to "mini-window" mode lets you float Codex minimized to the side while viewing Paper.

Codex also has a Steer feature. Normally prompts queue up while AI is working, but pressing "Steer" lets you interrupt instantly. You can paste a screenshot and say "this part is overlapping, fix it" — and the agent course-corrects mid-task.


Building a Custom Skill: YouTube Researcher

By combining external APIs, you can add capabilities Codex doesn't have natively. Here's the process using a YouTube transcript-fetching skill as an example:

Step 1: Find an API

Ask Codex "give me the top 5 APIs for getting YouTube transcripts" — it suggests SuperData, Transcript API, YouTube Transcript.io, etc. SuperData is free up to 100 requests/month.

Step 2: Create the skill

In a new chat, enter:

Use skill creator to build a skill that fetches and summarizes
the latest 10 video transcripts from a specific channel using SuperData API.
API key: [paste here]

Typing skill creator activates a skill-creation focused mode.

Step 3: Use the skill

After creation, open a new chat and type "YouTube Researcher" to use it:

Research Riley Brown's latest 10 YouTube videos,
get transcripts and compile them into a document.
Include which videos performed well, with hook (intro) analysis.
Add thumbnails too.

The resulting report includes a hook win/loss analysis: "Claude is taking over" (urgency, big market shift) and "Claude Code Leak" were rated highly, while vibe-coding videos performed poorly.

Afterwards, it was automated: "On the last day of each month, use this skill to analyze this month's videos and auto-create a Word report."
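"The last day of each month" is a slightly trickier schedule than a fixed weekday, because month lengths vary. The sketch below shows one way to compute it with Python's standard `calendar` module. The function names are hypothetical, and this is not a description of how Codex evaluates the schedule.

```python
import calendar
from datetime import date


def last_day_of_month(year: int, month: int) -> date:
    """Return the final calendar day of the given month."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(year, month)[1]
    return date(year, month, days_in_month)


def is_last_day_of_month(day: date) -> bool:
    """True when `day` is the final calendar day of its month."""
    return day == last_day_of_month(day.year, day.month)


print(last_day_of_month(2026, 4))
print(is_last_day_of_month(date(2024, 2, 28)))  # 2024 is a leap year
```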


Part 2 — Building 6 Projects in Parallel

Note: From here, "Chorus (an AI agent learning app)" is used as the subject to demo building 6 projects simultaneously. The core is "give instructions, then move on to the next" — serial task accumulation.

How to Set Up a Project Plan

First create a "My New Business" folder and start a new project. The plan is created from chat as a Markdown file:

Attach a screenshot and say:

"Looking at this, create a checklist-format project plan.
 Items: iOS app, web landing page,
 mobile app design, launch video, investor deck,
 X post automation — 6 items total.
 Include the app idea at the top."

Chorus app concept: A platform for learning about AI agents. An iOS app providing tool comparisons, a skills library (copy-paste ready), and learning content.
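For readers unfamiliar with checklist-format Markdown, the plan requested in the prompt above might resemble the output of this small sketch. The `render_plan` helper, the "# Project Plan" title, and the exact wording of the idea line are assumptions for illustration; the actual file Codex produced is not shown in the source.

```python
def render_plan(idea: str, items: list[str], done: frozenset[str] = frozenset()) -> str:
    """Render a Markdown plan: the app idea on top, then one checkbox per item."""
    lines = ["# Project Plan", "", idea, ""]
    for item in items:
        mark = "x" if item in done else " "
        lines.append(f"- [{mark}] {item}")
    return "\n".join(lines)


# The six items named in the prompt.
ITEMS = [
    "iOS app",
    "Web landing page",
    "Mobile app design",
    "Launch video",
    "Investor deck",
    "X post automation",
]

print(render_plan("Chorus: an iOS app for learning about AI agents.", ITEMS))
```

Passing finished items in `done` flips their boxes to `- [x]`, which mirrors checking items off as each parallel task completes.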


iOS App: From Design to TestFlight

Creating the screen design skill

Just paste the instructions exported from claude.ai/design's new design tool into Codex and say "create a mobile design skill that can do the same thing" — your custom skill is complete.

"Using the mobile design skill, create screens for the Chorus app
 in basic Apple style"

The result shows a prototype link with a 4-tab mockup: Learn, Platforms, Skills, Saved.

Building in Xcode

"Create a Swift mobile app called Chorus.
 For now just display 'Hello, this is Chorus' in the center of the screen.
 When done, open the Xcode project."

Pressing "Play" in Xcode runs the latest build in the iOS Simulator (or on a real device) each time. After integrating the screen designs, connect Supabase.

Supabase Connection and Auth

The video treats Supabase as the de facto database for AI agents. After configuring MCP and restarting Codex, the connection becomes available. After the restart, say "create all tables once connected" and the skill categories, platforms, skills, and saved items tables are generated automatically.

Authentication was implemented with email + password (Google Sign-In was attempted first, but Supabase's native email auth was the fastest path). Turn off email confirmation in Supabase and you can sign in immediately.

By the end of the video, the build had been uploaded to TestFlight.


Web Landing Page: Tally + React + Vercel

Preparing the form (tally.so)

Create a waitlist form in tally.so using a template with name and email fields. Copy the "embed code" when done.

Running as a React app

"I'm using tally.so. Embed this form in the site
 and run it locally as a React app. We'll do design later."

Styling with Claude Code

Since Codex struggles with design, call Claude Code from the terminal:

claude --dangerously-skip-permissions

Then give Claude Code the styling prompt:
"Forget all the styling on this page.
 Look at the Chorus app code and match the fonts and design.
 Keep the Tally embed as is.
 Minimal text, simple, conversion-focused."

Claude Code dramatically improves it in minutes. When done, "Deploy to Vercel and give me a public link" completes the process.


Launch Video: Remotion Plugin

Creating motion graphics video with Remotion (55:00)

Install the Remotion plugin and just type @remotion in a new chat.

"Create a launch video for the Chorus app.
 As a test video: take the attached app screen screenshots,
 put them in iPhone mockups on a white background with animation.
 Get it running on localhost."

Opening localhost:3031 shows the timeline editor. Time is specified in seconds.frames format (e.g., 2.20 = 2 seconds 20 frames).
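The seconds.frames notation can be converted to an absolute frame number once the frame rate is known. The sketch below assumes 30 frames per second, which is an assumption for illustration (the source does not state the composition's frame rate), and `timecode_to_frame` is a hypothetical helper, not part of Remotion.

```python
def timecode_to_frame(timecode: str, fps: int = 30) -> int:
    """Convert a "seconds.frames" string such as "2.20" to an absolute frame index."""
    # Parse as text: treating "2.20" as a float would silently turn it into 2.2.
    seconds_text, _, frames_text = timecode.partition(".")
    seconds = int(seconds_text)
    frames = int(frames_text) if frames_text else 0
    if not 0 <= frames < fps:
        raise ValueError(f"frame part must be between 0 and {fps - 1}, got {frames}")
    return seconds * fps + frames


print(timecode_to_frame("2.20"))          # 2 s * 30 fps + 20 frames
print(timecode_to_frame("2.20", fps=60))  # same timecode at an assumed 60 fps
```

Note that under this reading the part after the dot is a frame count, not a decimal fraction, so "2.2" would mean 2 seconds 2 frames rather than 2 seconds 20 frames.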

You can steer corrections at any point during processing. Display gridlines and pass coordinates to the agent (e.g., "X axis 1040, Y axis 540") for precise positioning.

For design-precision elements (animations, color cards, cut quality), delegating to Claude Code produces dramatically better results. For BGM, just attach an MP3 file to the message and say "add this at 50% volume."


Investor Deck: Chat Fork and Canva Integration

Fork the chat

Right-click the mobile app chat and select "Fork into Local" to create a new chat inheriting the same context. Rename it "Investor Deck" and start working.

"Analyze the app's features, icon, and style,
 then create an investor slide deck with the same design.
 Use the PowerPoint skill.
 Research what investors want in April 2026 and match the style."

Refine with Claude Code

claude --dangerously-skip-permissions
"Look at this deck, reduce text, increase visuals.
 Add charts and diagrams for readability. Don't add more slides."

Export to Canva

A "Canva" icon appears next to the PowerPoint file. Click it and Canva opens for final touches. Animations can be added too.


X Post Automation: Typefully Skill

Get an API key for Typefully (a scheduling tool for multiple Twitter accounts) and instruct:

"Research the Typefully API and create a skill for full control.
 Test with the Riley Brown account (identify with fruit emoji).
 API key: [paste here]"

After the skill is complete, automate it:

"Set up an automation to create 3 X post drafts every morning.
 Use the Typefully control skill."

Final Results: All Tasks Summary

Results achieved by the end of the video:

| Task | Result |
|---|---|
| iOS App | Published to TestFlight (Learn, Platforms, Skills, Saved features) |
| Web Landing Page | Live on Vercel, Tally form working |
| Launch Video | First draft complete with Remotion + Claude Code |
| Investor Deck | Exported to Canva, manually polished |
| X Post Automation | 3 daily drafts scheduled |
| Project Plan | All 6 items checked off |

Note: Key lesson from the video:
AI agents can take 1–2 hours per task. Instead of waiting, "give a new agent new instructions → move on" — repeated. This serial task accumulation is the core of AI-era productivity.
