Advanced image generation
The improved image generation task type, which also supports image editing and more detailed parameters. Once the task leaves the waiting queue, it typically completes within a few seconds. This task costs 5 credits when model_version is flux.1_kontext_pro, flux.1_dev, gpt_4o, or gemini_2.5_flash_image_preview, and 10 credits when it is gpt_image_1.5, gpt_image_2, midjourney, gemini_3_pro_image_preview, or gemini_3.1_flash_image_preview. Parameters do not add any extra credit cost.
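As a rough illustration of the pricing rule above, the lookup below maps each model_version to its credit cost. The function name credits_for is hypothetical and not part of the API; the default argument assumes the default model version flux.1_kontext_pro.

```python
# Credit cost per model_version, transcribed from the pricing sentence above.
FIVE_CREDIT_MODELS = {
    "flux.1_kontext_pro",
    "flux.1_dev",
    "gpt_4o",
    "gemini_2.5_flash_image_preview",
}
TEN_CREDIT_MODELS = {
    "gpt_image_1.5",
    "gpt_image_2",
    "midjourney",
    "gemini_3_pro_image_preview",
    "gemini_3.1_flash_image_preview",
}


def credits_for(model_version: str = "flux.1_kontext_pro") -> int:
    """Return the documented credit cost for one generate_image task.

    Hypothetical helper: the API itself does not expose this function.
    """
    if model_version in FIVE_CREDIT_MODELS:
        return 5
    if model_version in TEN_CREDIT_MODELS:
        return 10
    raise ValueError(f"unknown model_version: {model_version!r}")
```

Because parameters do not change the cost, the model version is the only input the estimate needs.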
Endpoint
Parameters
Required parameters
type: Must be set to generate_image.
prompt: A text value that guides the model's generation. The maximum prompt length is 1024 characters, equivalent to approximately 100 words. The API supports multiple languages, but emojis and certain special Unicode characters are not supported.
TIP
When using multiple reference images via the files parameter, you can specify which image to reference by its number in your prompt, for example "Use the style of image[1] with colors from image[2]".
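A minimal sketch of a request body holding only the two required parameters, with a client-side length check. It assumes the parameters are sent as top-level JSON keys and that the 1024-character limit counts Unicode code points the way Python's len does; neither is confirmed above. build_payload is a hypothetical helper name.

```python
import json

MAX_PROMPT_CHARS = 1024  # documented maximum prompt length


def build_payload(prompt: str) -> dict:
    """Build the smallest generate_image body (hypothetical helper)."""
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt has {len(prompt)} characters, limit is {MAX_PROMPT_CHARS}"
        )
    # Assumption: parameters are top-level keys of a JSON object.
    return {"type": "generate_image", "prompt": prompt}


payload = build_payload("Use the style of image[1] with colors from image[2]")
print(json.dumps(payload, indent=2))
```

Checking the length before submitting avoids spending a round trip on a request that the limit would rule out.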
Optional parameters
model_version: Image model version. Available versions are as below. If not set, the default version will be used:
- flux.1_kontext_pro (default)
- flux.1_dev (cannot be used with an image file; if requested with an image file, it will be upgraded to the default version)
- gpt_4o (gpt-image-1)
- gpt_image_1.5
- gpt_image_2
- midjourney (cannot be used with an image file)
- gemini_2.5_flash_image_preview (also known as nano banana)
- gemini_3_pro_image_preview (also known as nano banana pro)
- gemini_3.1_flash_image_preview (also known as nano banana 2)
Caution
flux.1_kontext_pro does not support WebP input images.
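The model and image constraints above can be checked before submitting a task. The sketch below mirrors the documented upgrade of flux.1_dev to the default version when an image is attached. Rejecting midjourney with an image and WebP with flux.1_kontext_pro on the client side is an assumption about how a caller might want to fail early; how the server responds in those cases is not stated here. The names effective_model, has_image, and image_format are hypothetical.

```python
from typing import Optional

DEFAULT_MODEL = "flux.1_kontext_pro"


def effective_model(
    model_version: Optional[str] = None,
    has_image: bool = False,
    image_format: Optional[str] = None,
) -> str:
    """Return the model version that would handle the request (hypothetical helper)."""
    model = model_version or DEFAULT_MODEL
    if has_image and model == "midjourney":
        # Assumption: fail locally, since midjourney cannot take image files.
        raise ValueError("midjourney cannot be used with an image file")
    if has_image and model == "flux.1_dev":
        # Documented behaviour: the request is upgraded to the default version.
        model = DEFAULT_MODEL
    if has_image and model == DEFAULT_MODEL and (image_format or "").lower() == "webp":
        # Documented caution: flux.1_kontext_pro does not support WebP input.
        raise ValueError("flux.1_kontext_pro does not support WebP input images")
    return model
```

Note that a flux.1_dev request carrying a WebP image is rejected as well, because the upgrade routes it to flux.1_kontext_pro.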
template: The image template slug used to apply a preset style package. When this field is set, the system prepends the template prompt before your prompt and merges template images as additional reference images. Available values:
| Value | Description |
|---|---|
| asset_extraction | Extract scene elements into separate assets for 3D generation. Best with: Nano Banana 2, Image Input, ar 16:9, Smart Mesh |
| character_completion | Complete missing parts to restore a full character. Best with: Nano Banana, Image Input |
| t_pose | Convert character to standard T-pose for rigging and animation. Best with: Nano Banana, ar 1:1, Smart Mesh |
| head_extraction | Extract the head to enhance facial detail for high-fidelity 3D generation. Best with: Nano Banana, ar 1:1, Image Input |
| 3d_enhance | Enhance 3D structure and detail (2D → 3D). Best with: Nano Banana, Image Input |
| variants | Generate multiple consistent variations based on the original input. Best with: Nano Banana 2, Text/Image Input |
| print_clay | Convert to high-contrast clay for 3D printing. Best with: Nano Banana 2, Image Input, HD Model |
| figure | Convert your photo into a stylized figure character. Best with: Nano Banana 2, Image Input, HD Model |
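A small sketch of attaching a template to a request body, assuming template is sent as a top-level key alongside type and prompt. The slug set is copied from the table above; with_template is a hypothetical helper name.

```python
# Template slugs listed in the table above.
TEMPLATE_SLUGS = {
    "asset_extraction",
    "character_completion",
    "t_pose",
    "head_extraction",
    "3d_enhance",
    "variants",
    "print_clay",
    "figure",
}


def with_template(payload: dict, template: str) -> dict:
    """Return a copy of payload with a validated template slug (hypothetical helper)."""
    if template not in TEMPLATE_SLUGS:
        raise ValueError(f"unknown template slug: {template!r}")
    # Copy instead of mutating so the caller's payload stays reusable.
    return {**payload, "template": template}


body = with_template({"type": "generate_image", "prompt": "a knight"}, "t_pose")
```

Since the system prepends the template prompt and merges the template images on its side, the caller only needs to send the slug.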
file: Specifies the image input.
- type: Indicates the file type. Although currently not validated, specifying the correct file type is strongly advised.
- file_token: The identifier returned by an upload; please refer to Upload directly. Mutually exclusive with url and object.
- url: A direct URL to the image. Supports JPEG and PNG formats with a maximum size of 20MB. Mutually exclusive with file_token and object.
- object (Strongly Recommended): The information returned by an upload; please refer to Upload in STS. Mutually exclusive with url and file_token.
  - bucket: Normally always tripo-data.
  - key: The resource_uri from the returned upload information.
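The mutual exclusivity of file_token, url, and object can be enforced when the file entry is built. The sketch below assumes that exactly one source must be given and that bucket and key are nested inside object; both are inferred from the descriptions above rather than confirmed. make_file and its keyword names are hypothetical.

```python
from typing import Optional


def make_file(
    file_type: str,
    *,
    file_token: Optional[str] = None,
    url: Optional[str] = None,
    object_key: Optional[str] = None,
    bucket: str = "tripo-data",
) -> dict:
    """Build one file entry with exactly one image source (hypothetical helper)."""
    given = [s for s in (file_token, url, object_key) if s is not None]
    if len(given) != 1:
        raise ValueError("provide exactly one of file_token, url, or object_key")
    entry = {"type": file_type}
    if file_token is not None:
        entry["file_token"] = file_token
    elif url is not None:
        entry["url"] = url
    else:
        # Assumption: bucket and key are nested under "object".
        entry["object"] = {"bucket": bucket, "key": object_key}
    return entry
```

For example, make_file("png", url="https://example.com/cat.png") yields an entry with only type and url set.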
files: Specifies the image inputs as a list of file objects. For flux.1_kontext_pro, the maximum number of files is 4. For gpt_4o, gpt_image_2, and gemini_2.5_flash_image_preview, the maximum number of files is 10.
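The per-model limits on files can be checked with a small lookup, sketched below. Models without a limit listed above are passed through unchecked, which is an assumption; their actual limits are not stated here. check_files is a hypothetical helper name.

```python
# Maximum number of entries in files, per model_version, as listed above.
MAX_FILES = {
    "flux.1_kontext_pro": 4,
    "gpt_4o": 10,
    "gpt_image_2": 10,
    "gemini_2.5_flash_image_preview": 10,
}


def check_files(files: list, model_version: str = "flux.1_kontext_pro") -> list:
    """Raise if files exceeds the documented limit for the model (hypothetical helper)."""
    limit = MAX_FILES.get(model_version)
    # Assumption: with no documented limit, leave validation to the server.
    if limit is not None and len(files) > limit:
        raise ValueError(
            f"{model_version} accepts at most {limit} files, got {len(files)}"
        )
    return files
```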
t_pose: A boolean value that transforms your object into a T-pose while keeping its main characteristics. The default value is false.
sketch_to_render: A boolean value that transforms your sketch into a rendered image. The default value is false.
Returns
task_id: The identifier for the successfully submitted task.