Note: This article is the English translation of a Japanese summary of a ~103-minute video by @DeRonin_ on X. Original video: https://twitter.com/DeRonin_/status/2048823420977119727
Introduction
The OpenAI Codex desktop app is a comprehensive AI agent platform that goes far beyond coding assistance — covering design, document creation, research, and automation. This article summarizes the full 103-minute guide video.
Core Features of the Codex Desktop App
Here are the key features introduced at the start of the video.
Project Management and File Organization
Codex manages chats in "project" units, each linked 1:1 to a local folder on your computer. Files generated through chat are automatically saved to an outputs/ folder inside the project directory, and any file in that folder can be referenced with @filename. You can open the folder instantly via the "Open in Finder" button.
Parallel Multitasking
You can run multiple chat threads simultaneously. Even while one agent is working, you can start new tasks in another chat. A blue dot notification appears when a task completes, so you can check results and give the next instruction right away.
Skills and Plugins
Skills are "reusable recipes"; plugins are "installable packages that bring those recipes into Codex." Hundreds of pre-built plugins exist for services like Google Calendar, Gmail, Figma, and Remotion. You can also combine external APIs with the skill creator to build your own custom skills. Once created, skills can be invoked in future sessions with /skill-name or @skill-name.
Automations
Set up recurring tasks with natural language — for example, "Every Friday at 4am, summarize my weekly calendar and send it via email." You can view, test, and edit automations from the Automations tab.
Computer Control
The agent literally controls your mouse and keyboard. This enables working with GUI apps that have no API, such as building apps in Xcode or navigating a browser.
In-App Image Generation
Generate images from prompts and use them directly in your workflow. The video demonstrated generating product images for a shoe brand and 10 iOS app icon variations. Transparent background generation is also supported.
Steer Feature
Even while an agent is processing, you can paste text or images and immediately redirect it ("fix this part"). Normally prompts queue up and wait their turn, but the "Steer" button lets you interrupt instantly.
Terminal Integration (Claude Code)
For design-heavy tasks, you can launch Claude Code from the terminal with claude --dangerously-skip-permissions. In the video, Claude Code was used to finalize landing pages and slide decks when Codex's design precision reached its limits.
Canva Export
Created PowerPoint files can be opened in Canva with one click for manual finishing of the last 5–10%.
The Difference Between Skills and Plugins
The video's mid-section used the Excalidraw skill to auto-generate a structure diagram.
| | Skill | Plugin |
|---|---|---|
| Definition | A reusable workflow package for specific tasks | A unit that installs additional functionality into Codex |
| Role | Bundles instructions, resources, and scripts to extend Codex's task-handling ability | Bundles skills, apps, MCP Servers, and integrations |
| Purpose | A recipe that ensures Codex executes workflows reliably | Provides access to connected systems and packaged tools |
Note: Simple way to remember:
- Skill = reusable recipe
- Plugin = installable package that brings that recipe into Codex
Design Tool Integration (Paper / Figma)
Codex integrates with Paper (Alpha), a Figma-like design tool.
Demo flow:
- Prompt: "Using the new Noo Shoo company logo image, create a landing page directly in Paper"
- Codex confirms Paper MCP actions and selects a transparent hero image
- Codex auto-decides design direction: editorial-tech, warm near-black neutral, cyan accents
- Auto-builds 4 sections: Hero, Performance Strip, Product Story, CTA/Footer
Note: Paper is a design tool built for AI agent collaboration, offering more intuitive operation than direct Figma editing.
Automations
Automations can be created just by typing "do X every week" in chat. The video demonstrated two:
Weekly Calendar Summary
After connecting Google Calendar and Gmail plugins, just say "Every Friday at 4am, summarize this week's schedule and send it via email." Done. You can immediately see when the next run is scheduled.
Monthly YouTube Report
After creating a YouTube Researcher skill with the SuperData API, instruct: "On the last day of each month, use that skill to analyze this month's videos and compile them into a Word document." The resulting report includes hook analysis and a views-ranked table — delivered automatically.
Part 2 Highlights: 6 Projects in Parallel
In the second half of the video, using "Chorus (an AI agent learning app)" as the subject, the following 6 projects were created simultaneously:
| Task | Tools Used |
|---|---|
| iOS App (design + implementation) | Swift, Xcode, Supabase, mobile design skill |
| Web Landing Page | Tally, React, Claude Code, Vercel |
| Launch Video | Remotion plugin, Claude Code |
| Investor Deck | PowerPoint skill, Claude Code, Canva |
| X Post Automation | Typefully skill |
| Project Plan | Markdown (checklist) |
The key: after giving instructions for one task, move on to the next without waiting. Handing tasks off one after another like this is what makes the multitasking effective.
Summary
The Codex desktop app is a comprehensive AI agent platform covering not just coding, but design, documents, research, and automation.
- Skills + Plugins — automate any workflow
- Automations — fully automate recurring research and report creation
- Design tool integration — applicable to non-engineer workflows
- Multitasking (give instructions, then move on) is the core skill of the AI era
- Codex + Claude Code combination: Codex for general orchestration, Claude Code for design-precision tasks
The ability to choose models and processing load based on task size and precision requirements is another strength of Codex.
Detailed Video Guide
Part 1 — Mastering the Basics
Download and Project Management
Search "Codex app download" in your browser and download from chatgpt.com. The initial screen looks like ChatGPT's chat interface, but the internals are completely different.
Codex's standout feature is project management linked to local folders. Before starting a chat, you specify which folder to work in. That folder becomes the "project," and all files created by the agent are auto-saved to its outputs/ folder.
From the project side panel you can open the folder in Finder or reference files with @filename. Even with 30+ projects, Command+G search lets you find any chat instantly by name or content.
In permission settings, "Full Access" mode lets the agent work without approval prompts. The recommended defaults are GPT-5.4 model and Extra High processing load.
Using Skills and Plugins
Skills and plugins are often confused, but the essential difference is:
Skills are "recipes" for the agent — step-by-step instructions for executing specific tasks. Plugins are those recipes packaged into installable units. Think of it as "plugin = container for skills."
To explore what a new plugin can do, the fastest approach is to open a new chat and ask: @Figma tell me everything you can do with this plugin. Clicking the ▼ (caret) on the response shows the thinking process too.
Practical Demo: Automating with Google Calendar + Gmail
Installing the Google Calendar plugin is as simple as selecting "Google Calendar" from Plugins and signing in via browser.
After connecting, these operations complete in a single conversation:
- "List all my events this week" → All calendar events displayed
- "Send me a weekly summary by email" → Sent via Gmail immediately
- "Set this as an automation every Friday at 4am" → Registered as a weekly task
The Automations tab shows next run time, status, and a test run button. After creation, you can edit with natural language like "always use the Gmail skill."
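The "next run time" shown for a schedule like "every Friday at 4am" comes down to simple date arithmetic. Below is a minimal Python sketch, assuming local time; the helper name `next_weekly_run` is hypothetical and this is an illustration, not Codex's actual scheduler.

```python
from datetime import datetime, timedelta

FRIDAY = 4  # datetime.weekday(): Monday is 0, so Friday is 4


def next_weekly_run(now: datetime, weekday: int = FRIDAY, hour: int = 4) -> datetime:
    """Return the first occurrence of `weekday` at `hour`:00 strictly after `now`.

    Hypothetical helper for illustration only.
    """
    # Start from today at the target hour, then move forward to the target weekday.
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If that moment has already passed (or is exactly now), use next week's slot.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


print(next_weekly_run(datetime(2026, 4, 1, 12, 0)))  # 2026-04-03 04:00:00
```

Called at noon on Wednesday, April 1, 2026, the helper returns Friday, April 3 at 04:00. Called on a Friday after 4am, it rolls over to the following week.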
Generating Designs with Figma and Paper MCP
The Figma plugin's main use is converting existing Figma boards to code. It is not suited to the reverse direction (having AI generate designs and place them in Figma).
Paper (Alpha) fills that role. It's a design tool built for AI agent collaboration:
"Using the new shoe PNG (no background), create a landing page in Paper"
↓
Codex calls Paper MCP and decides design direction
↓
Auto-builds 4 sections: Hero, Performance Strip, Product Story, CTA
Setting chat to "mini-window" mode lets you float Codex minimized to the side while viewing Paper.
Codex also has a Steer feature. Normally prompts queue up while AI is working, but pressing "Steer" lets you interrupt instantly. You can paste a screenshot and say "this part is overlapping, fix it" — and the agent course-corrects mid-task.
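One way to picture the difference between a queued prompt and a steered one is a toy queue where normal prompts wait their turn and a steer jumps to the front. The Python sketch below is only a mental model. The class `PromptQueue` and its methods are hypothetical and say nothing about how Codex implements the feature.

```python
from collections import deque
from typing import Optional


class PromptQueue:
    """Toy model (not Codex internals): queued prompts are FIFO, a steer jumps ahead."""

    def __init__(self) -> None:
        self._pending = deque()

    def enqueue(self, prompt: str) -> None:
        # Normal behaviour: wait behind everything already queued.
        self._pending.append(prompt)

    def steer(self, prompt: str) -> None:
        # Steer: handled before anything that is still waiting.
        self._pending.appendleft(prompt)

    def next_prompt(self) -> Optional[str]:
        # Return the next prompt to handle, or None when nothing is pending.
        return self._pending.popleft() if self._pending else None


queue = PromptQueue()
queue.enqueue("add a footer section")
queue.enqueue("tighten the hero copy")
queue.steer("this part is overlapping, fix it")
print(queue.next_prompt())  # this part is overlapping, fix it
```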
Building a Custom Skill: YouTube Researcher
By combining external APIs, you can add capabilities Codex doesn't have natively. Here's the process using a YouTube transcript-fetching skill as an example:
Step 1: Find an API
Ask Codex "give me the top 5 APIs for getting YouTube transcripts" — it suggests SuperData, Transcript API, YouTube Transcript.io, etc. SuperData is free up to 100 requests/month.
Step 2: Create the skill
In a new chat, enter:
Use skill creator to build a skill that fetches and summarizes
the latest 10 video transcripts from a specific channel using SuperData API.
API key: [paste here]
Typing "skill creator" switches the chat into a mode focused on creating skills.
Step 3: Use the skill
After creation, open a new chat and type "YouTube Researcher" to use it:
Research Riley Brown's latest 10 YouTube videos,
get transcripts and compile them into a document.
Include which videos performed well, with hook (intro) analysis.
Add thumbnails too.
The resulting report includes hook win/loss analysis — "Claude is taking over" (urgency, big market shift) and "Claude Code Leak" rated highly, while vibe-coding videos showed low performance.
Afterwards, it was automated: "On the last day of each month, use this skill to analyze this month's videos and auto-create a Word report."
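"The last day of each month" is a moving target (28 to 31), so a scheduler has to compute it rather than hard-code a date. Here is a minimal Python sketch of that check using the standard library. The function names are hypothetical and only illustrate the rule, not how Codex evaluates the schedule.

```python
import calendar
from datetime import date


def last_day_of_month(year: int, month: int) -> date:
    """Return the final calendar day of the given month."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    _, days_in_month = calendar.monthrange(year, month)
    return date(year, month, days_in_month)


def is_report_day(today: date) -> bool:
    """True when `today` is the last day of its month (hypothetical trigger check)."""
    return today == last_day_of_month(today.year, today.month)


print(last_day_of_month(2026, 4))       # 2026-04-30
print(is_report_day(date(2024, 2, 29)))  # True
```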
Part 2 — Building 6 Projects in Parallel
Note: From here, "Chorus (an AI agent learning app)" is used as the subject to demo building 6 projects simultaneously. The core is "give instructions, then move on to the next" — serial task accumulation.
How to Set Up a Project Plan
First create a "My New Business" folder and start a new project. The plan is created from chat as a Markdown file:
Attach a screenshot and say:
"Looking at this, create a checklist-format project plan.
Items: iOS app, web landing page,
mobile app design, launch video, investor deck,
X post automation — 6 items total.
Include the app idea at the top."
Chorus app concept: A platform for learning about AI agents. An iOS app providing tool comparisons, a skills library (copy-paste ready), and learning content.
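For reference, the kind of checklist file this prompt asks for can be sketched in a few lines of Python. The heading text, the exact wording of the idea line, and the helper names `build_plan` and `check_off` are assumptions for illustration. The video does not show the file's exact contents.

```python
TASKS = [
    "iOS app",
    "Web landing page",
    "Mobile app design",
    "Launch video",
    "Investor deck",
    "X post automation",
]


def build_plan(idea: str, tasks: list[str]) -> str:
    """Render the app idea followed by one unchecked Markdown checkbox per task."""
    lines = ["# Project Plan", "", idea, ""]
    lines += [f"- [ ] {task}" for task in tasks]
    return "\n".join(lines) + "\n"


def check_off(plan: str, task: str) -> str:
    """Mark a single task as done by flipping its checkbox."""
    return plan.replace(f"- [ ] {task}\n", f"- [x] {task}\n")


plan = build_plan("Chorus: an iOS app for learning about AI agents.", TASKS)
plan = check_off(plan, "Launch video")
print(plan)
```

Keeping the plan as plain Markdown means any agent chat in the project can read it and tick items off as tasks finish.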
iOS App: From Design to TestFlight
Creating the screen design skill
Just paste the instructions exported from claude.ai/design's new design tool into Codex and say "create a mobile design skill that can do the same thing" — your custom skill is complete.
"Using the mobile design skill, create screens for the Chorus app
in basic Apple style"
The result shows a prototype link with a 4-tab mockup: Learn, Platforms, Skills, Saved.
Building in Xcode
"Create a Swift mobile app called Chorus.
For now just display 'Hello, this is Chorus' in the center of the screen.
When done, open the Xcode project."
Pressing "Play" in Xcode runs the latest build in the iOS Simulator (or on a real device) each time. After integrating the screen designs, connect Supabase.
Supabase Connection and Auth
The video presents Supabase as the de facto database for AI agents. After you configure its MCP connection and restart Codex, the connection becomes available. Once restarted, say "create all tables once connected" and the skill categories, platforms, skills, and saved items tables are generated automatically.
Authentication was implemented with email + password (Google Sign-In was attempted first, but Supabase's native email auth was the fastest path). Turn off email confirmation in Supabase and you can sign in immediately.
By the end of the video, the build had been uploaded to TestFlight.
Web Landing Page: Tally + React + Vercel
Preparing the form (tally.so)
Create a waitlist form in tally.so using a template with name and email fields. Copy the "embed code" when done.
Running as a React app
"I'm using tally.so. Embed this form in the site
and run it locally as a React app. We'll do design later."
Styling with Claude Code
Since Codex struggles with design, call Claude Code from the terminal:
claude --dangerously-skip-permissions
"Forget all the styling on this page.
Look at the Chorus app code and match the fonts and design.
Keep the Tally embed as is.
Minimal text, simple, conversion-focused."
Claude Code dramatically improves the page in minutes. When it is done, the prompt "Deploy to Vercel and give me a public link" finishes the job.
Launch Video: Remotion Plugin
Install the Remotion plugin and just type @remotion in a new chat.
"Create a launch video for the Chorus app.
As a test video: take the attached app screen screenshots,
put them in iPhone mockups on a white background with animation.
Get it running on localhost."
Opening localhost:3031 shows the timeline editor. Time is specified in seconds.frames format (e.g., 2.20 = 2 seconds 20 frames).
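Reading that notation as an absolute frame number is a small calculation. Below is a Python sketch, assuming 30 frames per second (the frame rate is not stated in this summary) and a hypothetical helper `timecode_to_frame`. The value is parsed as a string because a float would collapse 2.20 into 2.2 and blur the difference between 20 frames and 2 frames.

```python
def timecode_to_frame(timecode: str, fps: int = 30) -> int:
    """Convert 'seconds.frames' (e.g. '2.20') to an absolute frame index.

    Hypothetical helper; fps=30 is an assumption, not a value from the video.
    """
    seconds_part, _, frames_part = timecode.partition(".")
    seconds = int(seconds_part)
    frames = int(frames_part) if frames_part else 0
    # The frame part counts frames within one second, so it must stay below fps.
    if not 0 <= frames < fps:
        raise ValueError(f"frame part must be between 0 and {fps - 1}: {timecode!r}")
    return seconds * fps + frames


print(timecode_to_frame("2.20"))  # 80, i.e. 2 * 30 + 20
```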
You can steer corrections at any point during processing. Display gridlines and pass coordinates to the agent (e.g., "X axis 1040, Y axis 540") for precise positioning.
For design-precision elements (animations, color cards, cut quality), delegating to Claude Code produces dramatically better results. For BGM, just attach an MP3 file to the message and say "add this at 50% volume."
Investor Deck: Chat Fork and Canva Integration
Fork the chat
Right-click the mobile app chat and select "Fork into Local" to create a new chat inheriting the same context. Rename it "Investor Deck" and start working.
"Analyze the app's features, icon, and style,
then create an investor slide deck with the same design.
Use the PowerPoint skill.
Research what investors want in April 2026 and match the style."
Refine with Claude Code
claude --dangerously-skip-permissions
"Look at this deck, reduce text, increase visuals.
Add charts and diagrams for readability. Don't add more slides."
Export to Canva
A "Canva" icon appears next to the PowerPoint file. Click it and Canva opens for final touches. Animations can be added too.
X Post Automation: Typefully Skill
Get an API key for Typefully (a scheduling tool for multiple Twitter accounts) and instruct:
"Research the Typefully API and create a skill for full control.
Test with the Riley Brown account (identify with fruit emoji).
API key: [paste here]"
After the skill is complete, automate it:
"Set up an automation to create 3 X post drafts every morning.
Use the Typefully control skill."
Final Results: All Tasks Summary
Results achieved by the end of the video:
| Task | Result |
|---|---|
| iOS App | Published to TestFlight (Learn, Platforms, Skills, Saved features) |
| Web Landing Page | Live on Vercel, Tally form working |
| Launch Video | First draft complete with Remotion + Claude Code |
| Investor Deck | Exported to Canva, manually polished |
| X Post Automation | 3 daily drafts scheduled |
| Project Plan | All 6 items checked off |
Note: Key lesson from the video:
AI agents can take 1–2 hours per task. Instead of waiting, give a new agent its instructions and move on, then repeat. This serial task accumulation is the core of AI-era productivity.



