imagegen

Generate or edit raster images when the task benefits from AI-created bitmap visuals such as photos, illustrations, textures, sprites, mockups, or transparent-background cutouts. Use when Codex should create a brand-new image, transform an existing image, or derive visual variants from references, and the output should be a bitmap asset rather than repo-native code or vector. Do not use when the task is better handled by editing existing SVG/vector/code-native assets, extending an established icon or logo system, or building the visual directly in HTML/CSS/canvas.

Provider: OpenAI
Path in repo: skills/.system/imagegen/SKILL.md

Image Generation Skill

Generates or edits images for the current project (for example, website assets, game assets, UI mockups, product mockups, wireframes, logo designs, photorealistic images, or infographics).

Top-level modes and rules

This skill has exactly two top-level modes:

Default built-in tool mode (preferred): built-in image_gen tool for normal image generation and editing. Does not require OPENAI_API_KEY.
Fallback CLI mode (explicit-only): scripts/image_gen.py CLI. Use only when the user explicitly asks for the CLI path. Requires OPENAI_API_KEY.

Within the explicit CLI fallback only, the CLI exposes three subcommands:

generate
edit
generate-batch

Rules:

Use the built-in image_gen tool by default for all normal image generation and editing requests.
Never switch to CLI fallback automatically.
If the built-in tool fails or is unavailable, tell the user the CLI fallback exists and that it requires OPENAI_API_KEY. Proceed only if the user explicitly asks for that fallback.
If the user explicitly asks for CLI mode, use the bundled scripts/image_gen.py workflow. Do not create one-off SDK runners.
Never modify scripts/image_gen.py. If something is missing, ask the user before doing anything else.
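
The mode rules above can be read as a small decision function. The following is an illustrative sketch only: `Mode` and `choose_mode` are hypothetical names introduced here, not part of the skill, the built-in tool, or the CLI.

```python
from enum import Enum


class Mode(Enum):
    BUILT_IN = "built-in image_gen tool"
    CLI_FALLBACK = "scripts/image_gen.py CLI"
    ASK_USER = "mention the CLI fallback and wait for the user"


def choose_mode(user_asked_for_cli: bool, built_in_available: bool = True) -> Mode:
    """Hypothetical helper mirroring the mode rules above."""
    if user_asked_for_cli:
        # An explicit user request is the only way into CLI mode.
        return Mode.CLI_FALLBACK
    if built_in_available:
        return Mode.BUILT_IN
    # Never switch automatically: tell the user the fallback exists
    # (and that it requires OPENAI_API_KEY), then wait.
    return Mode.ASK_USER
```

Note that a failing built-in tool never maps to `CLI_FALLBACK` directly; the only path there goes through the user.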

Built-in save-path policy:

In built-in tool mode, Codex saves generated images under $CODEX_HOME/* by default.
Do not describe or rely on OS temp as the default built-in destination.
Do not describe or rely on a destination-path argument (if any) on the built-in image_gen tool. If a specific location is needed, generate first and then move or copy the selected output from $CODEX_HOME/generated_images/....
Save-path precedence in built-in mode:
1. If the user names a destination, move or copy the selected output there.
2. If the image is meant for the current project, move or copy the final selected image into the workspace before finishing.
3. If the image is only for preview or brainstorming, render it inline; the underlying file can remain at the default $CODEX_HOME/* path.
Never leave a project-referenced asset only at the default $CODEX_HOME/* path.
Do not overwrite an existing asset unless the user explicitly asked for replacement; otherwise create a sibling versioned filename such as hero-v2.png or item-icon-edited.png.
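
One way to apply the non-destructive part of this policy is sketched below. It assumes the selected output has already been located on disk; `versioned_sibling` and `copy_into_workspace` are hypothetical helper names, and the `-vN` suffix follows the hero-v2.png example above.

```python
import shutil
from pathlib import Path


def versioned_sibling(dest: Path) -> Path:
    """Return dest if it is free, else the first free sibling such as hero-v2.png."""
    if not dest.exists():
        return dest
    version = 2
    while True:
        candidate = dest.with_name(f"{dest.stem}-v{version}{dest.suffix}")
        if not candidate.exists():
            return candidate
        version += 1


def copy_into_workspace(generated: Path, dest: Path, replace: bool = False) -> Path:
    """Copy a generated image to dest; overwrite only when replace is True."""
    target = dest if replace else versioned_sibling(dest)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(generated, target)
    return target
```

Pass `replace=True` only when the user explicitly asked for the existing asset to be replaced; the returned path is the one to report back.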

Shared prompt guidance for both modes lives in references/prompting.md and references/sample-prompts.md.

Fallback-only docs/resources for CLI mode:

references/cli.md
references/image-api.md
references/codex-network.md
scripts/image_gen.py

When to use

Generate a new image (concept art, product shot, cover, website hero)
Generate a new image using one or more reference images for style, composition, or mood
Edit an existing image (inpainting, lighting or weather transformations, background replacement, object removal, compositing, transparent background)
Produce many assets or variants for one task

When not to use

Extending or matching an existing SVG/vector icon set, logo system, or illustration library inside the repo
Creating simple shapes, diagrams, wireframes, or icons that are better produced directly in SVG, HTML/CSS, or canvas
Making a small project-local asset edit when the source file already exists in an editable native format
Any task where the user clearly wants deterministic code-native output instead of a generated bitmap

Decision tree

Think about two separate questions:

Intent: is this a new image or an edit of an existing image?
Execution strategy: is this one asset or many assets/variants?

Intent:

If the user wants to modify an existing image while preserving parts of it, treat the request as edit.
If the user provides images only as references for style, composition, mood, or subject guidance, treat the request as generate.
If the user provides no images, treat the request as generate.

Built-in edit semantics:

Built-in edit mode is for images already visible in the conversation context, such as attached images or images generated earlier in the thread.
If the user wants to edit a local image file with the built-in tool, first load it with the built-in view_image tool so the image is visible in the conversation context, then proceed with the built-in edit flow.
Do not promise arbitrary filesystem-path editing through the built-in tool.
If a local file still needs direct file-path control, masks, or other explicit CLI-only parameters, use the explicit CLI fallback only when the user asks for it.
For edits, preserve invariants aggressively and save non-destructively by default.

Execution strategy:

In the built-in default path, produce many assets or variants by issuing one image_gen call per requested asset or variant.
In the explicit CLI fallback path, use the CLI generate-batch subcommand only when the user explicitly chose CLI mode and needs many prompts/assets.

Assume the user wants a new image unless they clearly ask to change an existing one.
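
The two questions in the decision tree can be kept separate in code as well. This is a minimal sketch under stated assumptions: `plan_request`, its parameters, and the strategy labels are hypothetical and exist only to illustrate the rules above.

```python
def plan_request(
    modifies_existing_image: bool,
    asset_count: int = 1,
    cli_mode: bool = False,
) -> tuple[str, str]:
    """Hypothetical helper: return (intent, execution strategy)."""
    # Images supplied only as references do not count as modification,
    # so they still produce "generate".
    intent = "edit" if modifies_existing_image else "generate"

    if asset_count <= 1:
        strategy = "single call"
    elif cli_mode:
        # Only reachable when the user explicitly chose CLI mode.
        strategy = "CLI generate-batch"
    else:
        strategy = "one built-in image_gen call per asset"
    return intent, strategy
```

The default arguments encode the default assumptions: a new image, a single asset, and the built-in path.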

Workflow

Decide the top-level mode: built-in by default, fallback CLI only if explicitly requested.
Decide the intent: generate or edit.
Decide whether the output is preview-only or meant to be consumed by the current project.
Decide the execution strategy: single asset vs repeated built-in calls vs CLI generate-batch.
Collect inputs up front: prompt(s), exact text (verbatim), constraints/avoid list, and any input images.
For every input image, label its role explicitly:
- reference image
- edit target
- supporting insert/style/compositing input
If the edit target is only on the local filesystem and you are staying on the built-in path, inspect it with view_image first so the image is available in conversation context.
If the user asked for a photo, illustration, sprite, product image, banner, or other explicitly raster-style asset, use image_gen rather than substituting SVG/HTML/CSS placeholders. If the request is for an icon, logo, or UI graphic that should match existing repo-native SVG/vector/code assets, prefer editing those directly instead.
Augment the prompt based on specificity:
- If the user’s prompt is already specific and detailed, normalize it into a clear spec without adding creative requirements.
- If the user’s prompt is generic, add tasteful augmentation only when it materially improves output quality.
Use the built-in image_gen tool by default.
If the user explicitly chooses the CLI fallback, then and only then use the fallback-only docs for quality, input_fidelity, masks, output format, output paths, and network setup.
Inspect outputs and validate: subject, style, composition, text accuracy, and invariants/avoid items.
Iterate with a single targeted change, then re-check.
For preview-only work, render the image inline; the underlying file may remain at the default $CODEX_HOME/generated_images/... path.
For project-bound work, move or copy the selected artifact into the workspace and update any consuming code or references. Never leave a project-referenced asset only at the default $CODEX_HOME/generated_images/... path.
For batches, persist only the selected finals in the workspace unless the user explicitly asked to keep discarded variants.
Always report the final saved path for any workspace-bound asset, plus the final prompt and whether the built-in tool or fallback CLI mode was used.

Prompt augmentation

Reformat user prompts into a structured, production-oriented spec. Make the user’s goal clearer and more actionable, but do not blindly add detail.

Treat this as prompt-shaping guidance, not a closed schema. Use only the lines that help, and add a short extra labeled line when it materially improves clarity.

Specificity policy

Use the user’s prompt specificity to decide how much augmentation is appropriate:

If the prompt is already specific and detailed, preserve that specificity and only normalize/structure it.
If the prompt is generic, you may add tasteful augmentation when it will materially improve the result.

Allowed augmentations:

composition or framing hints
polish level or intended-use hints
practical layout guidance
reasonable scene concreteness that supports the stated request

Disallowed augmentations:

extra characters or objects that are not implied by the request
brand names, slogans, palettes, or narrative beats that are not implied
arbitrary side-specific placement unless the surrounding layout supports it

Use-case taxonomy (exact slugs)

Classify each request into one of these buckets and keep the slug consistent across prompts and references.

Generate:

photorealistic-natural — candid/editorial lifestyle scenes with real texture and natural lighting.
product-mockup — product/packaging shots, catalog imagery, merch concepts.
ui-mockup — app/web interface mockups and wireframes; specify the desired fidelity.
infographic-diagram — diagrams/infographics with structured layout and text.
logo-brand — logo/mark exploration, vector-friendly.
illustration-story — comics, children’s book art, narrative scenes.
stylized-concept — style-driven concept art, 3D/stylized renders.
historical-scene — period-accurate/world-knowledge scenes.

Edit:

text-localization — translate/replace in-image text, preserve layout.
identity-preserve — try-on, person-in-scene; lock face/body/pose.
precise-object-edit — remove/replace a specific element (including interior swaps).
lighting-weather — time-of-day/season/atmosphere changes only.
background-extraction — transparent background / clean cutout.
style-transfer — apply reference style while changing subject/scene.
compositing — multi-image insert/merge with matched lighting/perspective.
sketch-to-render — drawing/line art to photoreal render.

Shared prompt schema

Use the following labeled spec as shared prompt scaffolding for both top-level modes:

Use case: <taxonomy slug>
Asset type: <where the asset will be used>
Primary request: <user's main prompt>
Input images: <Image 1: role; Image 2: role> (optional)
Scene/backdrop: <environment>
Subject: <main subject>
Style/medium: <photo/illustration/3D/etc>
Composition/framing: <wide/close/top-down; placement>
Lighting/mood: <lighting + mood>
Color palette: <palette notes>
Materials/textures: <surface details>
Text (verbatim): "<exact text>"
Constraints: <must keep/must avoid>
Avoid: <negative constraints>

Notes:

Asset type and Input images are prompt scaffolding, not dedicated CLI flags.
Scene/backdrop refers to the visual setting. It is not the same as the fallback CLI background parameter, which controls output transparency behavior.
Fallback-only execution notes such as Quality:, Input fidelity:, masks, output format, and output paths belong in the explicit CLI path only. Do not treat them as built-in image_gen tool arguments.
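
As an illustration, the labeled spec can be assembled from a mapping so that empty lines are dropped and the label order stays stable. This is a sketch, not part of the skill: `SPEC_FIELDS` and `render_spec` are hypothetical names, and extra labels are allowed because the schema is described as open-ended.

```python
SPEC_FIELDS = (
    "Use case", "Asset type", "Primary request", "Input images",
    "Scene/backdrop", "Subject", "Style/medium", "Composition/framing",
    "Lighting/mood", "Color palette", "Materials/textures",
    "Text (verbatim)", "Constraints", "Avoid",
)


def render_spec(values: dict[str, str]) -> str:
    """Render non-empty fields in schema order, then any extra labeled lines."""
    ordered = [label for label in SPEC_FIELDS if label in values]
    ordered += [label for label in values if label not in SPEC_FIELDS]

    lines = []
    for label in ordered:
        value = values[label].strip()
        if not value:
            continue  # use only the lines that help
        if label == "Text (verbatim)":
            value = f'"{value}"'  # exact text is always quoted
        lines.append(f"{label}: {value}")
    return "\n".join(lines)
```

The result is plain prompt text for either mode; it carries no CLI flags or built-in tool arguments.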

Augmentation rules:

Keep it short.
Add only the details needed to improve the prompt materially.
For edits, explicitly list invariants (change only X; keep Y unchanged).
If any critical detail is missing and blocks success, ask a question; otherwise proceed.

Examples

Generation example (hero image)

Use case: product-mockup
Asset type: landing page hero
Primary request: a minimal hero image of a ceramic coffee mug
Style/medium: clean product photography
Composition/framing: wide composition with usable negative space for page copy if needed
Lighting/mood: soft studio lighting
Constraints: no logos, no text, no watermark

Edit example (invariants)

Use case: precise-object-edit
Asset type: product photo background replacement
Primary request: replace only the background with a warm sunset gradient
Constraints: change only the background; keep the product and its edges unchanged; no text; no watermark

Prompting best practices

Structure the prompt as scene/backdrop -> subject -> details -> constraints.
Include the intended use (ad, UI mock, infographic) to set the mode and polish level.
Use camera/composition language for photorealism.
Only use SVG/vector stand-ins when the user explicitly asked for vector output or a non-image placeholder.
Quote exact text and specify typography + placement.
For tricky words, spell them letter-by-letter and require verbatim rendering.
For multi-image inputs, reference images by index and describe how they should be used.
For edits, repeat invariants every iteration to reduce drift.
Iterate with single-change follow-ups.
If the prompt is generic, add only the extra detail that will materially help.
If the prompt is already detailed, normalize it instead of expanding it.
For explicit CLI fallback only, see references/cli.md and references/image-api.md for quality, input_fidelity, masks, output format, and output-path guidance.

More principles shared by both modes: references/prompting.md.
Copy/paste specs shared by both modes: references/sample-prompts.md.

Guidance by asset type

Asset-type templates (website assets, game assets, wireframes, logo) are consolidated in references/sample-prompts.md.

Fallback CLI mode only

Temp and output conventions

These conventions apply only to the explicit CLI fallback. They do not describe built-in image_gen output behavior.

Use tmp/imagegen/ for intermediate files (for example JSONL batches); delete them when done.
Write final artifacts under output/imagegen/.
Use --out or --out-dir to control output paths; keep filenames stable and descriptive.
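
A minimal sketch of these directory conventions follows. It only manages the two directories named above and does not invoke the CLI; `prepare_dirs` and `cleanup_tmp` are hypothetical helper names, and `root` stands for the workspace root.

```python
import shutil
from pathlib import Path

TMP_DIR = Path("tmp/imagegen")      # intermediate files, e.g. JSONL batches
OUT_DIR = Path("output/imagegen")   # final artifacts


def prepare_dirs(root: Path) -> tuple[Path, Path]:
    """Create both directories under root and return (tmp, out)."""
    tmp, out = root / TMP_DIR, root / OUT_DIR
    tmp.mkdir(parents=True, exist_ok=True)
    out.mkdir(parents=True, exist_ok=True)
    return tmp, out


def cleanup_tmp(root: Path) -> None:
    """Delete intermediate files when done; final artifacts are left alone."""
    shutil.rmtree(root / TMP_DIR, ignore_errors=True)
```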

Dependencies

Prefer uv for dependency management in this repo.

Required Python package:

uv pip install openai

Optional for downscaling only:

uv pip install pillow

Portability note:

If you are using the installed skill outside this repo, install dependencies into that environment with its package manager.
In uv-managed environments, uv pip install ... remains the preferred path.

Environment

OPENAI_API_KEY must be set for live API calls.
Do not ask the user for OPENAI_API_KEY when using the built-in image_gen tool.
Never ask the user to paste the full key in chat. Ask them to set it locally and confirm when ready.

If the key is missing, give the user these steps:

Create an API key in the OpenAI platform UI: https://platform.openai.com/api-keys
Set OPENAI_API_KEY as an environment variable on their system.
Offer to guide them through setting the environment variable for their OS/shell if needed.

If installation is not possible in this environment, tell the user which dependency is missing and how to install it into their active environment.
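
Before the first CLI call, a quick preflight check can report what is missing without ever printing the key. This is a sketch under stated assumptions: `preflight` is a hypothetical helper, and it checks only that the variable is set and that the required package can be imported.

```python
import importlib.util
import os


def preflight(required: tuple[str, ...] = ("openai",)) -> list[str]:
    """Return a list of problems; an empty list means CLI mode can proceed."""
    problems = []
    # Check presence only. The key's value is never read into a message.
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    for module in required:
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing required package: {module}")
    return problems
```

Each returned string can be relayed to the user as is, followed by the setup steps or the relevant install command above.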

Script-mode notes

CLI commands + examples: references/cli.md
API parameter quick reference: references/image-api.md
Network approvals / sandbox settings for CLI mode: references/codex-network.md

Reference map

references/prompting.md: shared prompting principles for both modes.
references/sample-prompts.md: shared copy/paste prompt recipes for both modes.
references/cli.md: fallback-only CLI usage via scripts/image_gen.py.
references/image-api.md: fallback-only API/CLI parameter reference.
references/codex-network.md: fallback-only network/sandbox troubleshooting for CLI mode.
scripts/image_gen.py: fallback-only CLI implementation. Do not load or use it unless the user explicitly chooses CLI mode.