
Advanced image generation

An improved image generation task type that also supports image editing and more detailed parameters. Once the task leaves the waiting queue, it typically completes within a few seconds. The task costs 5 credits when model_version is flux.1_kontext_pro, flux.1_dev, gpt_4o, or gemini_2.5_flash_image_preview, and 10 credits when it is gpt_image_1.5, gpt_image_2, midjourney, gemini_3_pro_image_preview, or gemini_3.1_flash_image_preview. Parameters do not add to the credit cost.
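The pricing rule above can be written as a small lookup. This is only a client-side sketch: the function name credit_cost is hypothetical, and the default argument assumes the default model_version listed under the optional parameters.

```python
# Credit cost per model_version, transcribed from the description above.
FIVE_CREDIT_MODELS = {
    "flux.1_kontext_pro",
    "flux.1_dev",
    "gpt_4o",
    "gemini_2.5_flash_image_preview",
}
TEN_CREDIT_MODELS = {
    "gpt_image_1.5",
    "gpt_image_2",
    "midjourney",
    "gemini_3_pro_image_preview",
    "gemini_3.1_flash_image_preview",
}


def credit_cost(model_version: str = "flux.1_kontext_pro") -> int:
    """Return the documented credit cost for one generate_image task."""
    if model_version in FIVE_CREDIT_MODELS:
        return 5
    if model_version in TEN_CREDIT_MODELS:
        return 10
    raise ValueError(f"unknown model_version: {model_version!r}")
```

Because parameters do not change the cost, the model version is the only input the estimate needs.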

Endpoint

Parameters

Required parameters

type: Must be set to generate_image.

prompt: Text that guides the image generation. The maximum prompt length is 1024 characters, equivalent to approximately 100 words. Prompts may be written in multiple languages, but emojis and certain special Unicode characters are not supported.
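As a rough illustration, the two required parameters can be assembled into a request body after checking the prompt length locally. The helper name build_minimal_request is hypothetical, and the check assumes the service counts characters the way Python's len() does (Unicode code points), which this page does not confirm. It does not try to detect unsupported emojis or special characters.

```python
MAX_PROMPT_CHARS = 1024  # documented maximum prompt length


def build_minimal_request(prompt: str) -> dict:
    """Build a request body holding only the required parameters."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt has {len(prompt)} characters; the maximum is {MAX_PROMPT_CHARS}"
        )
    return {"type": "generate_image", "prompt": prompt}
```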

TIP

When passing multiple reference images via the files parameter, you can refer to a specific image in your prompt with the image[number] syntax (for example, "Use the style of image[1] with colors from image[2]").
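A client might want to confirm that every image reference in a prompt points at a file that is actually attached. The sketch below assumes numbering starts at 1 and follows the order of the files list, which the example suggests but this page does not state outright. Both helper names are hypothetical.

```python
import re

# Matches references such as image[1] or image[12].
IMAGE_REF = re.compile(r"image\[(\d+)\]")


def referenced_images(prompt: str) -> list[int]:
    """Return the image numbers mentioned in the prompt, in order of appearance."""
    return [int(n) for n in IMAGE_REF.findall(prompt)]


def check_image_refs(prompt: str, file_count: int) -> None:
    """Raise if the prompt refers to an image number outside 1..file_count."""
    bad = [n for n in referenced_images(prompt) if not 1 <= n <= file_count]
    if bad:
        raise ValueError(f"prompt refers to missing images: {bad}")
```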

Optional parameters

model_version: The image model version. If not set, the default version is used. Available versions:

  • flux.1_kontext_pro (default)
  • flux.1_dev (cannot be used with an image file; if a request includes one, it is upgraded to the default version)
  • gpt_4o (gpt-image-1)
  • gpt_image_1.5
  • gpt_image_2
  • midjourney (cannot be used with an image file)
  • gemini_2.5_flash_image_preview (also known as nano banana)
  • gemini_3_pro_image_preview (also known as nano banana pro)
  • gemini_3.1_flash_image_preview (also known as nano banana 2)

Caution

flux.1_kontext_pro does not support WebP input images.
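The image-input restrictions above can be checked before submitting a task. The following is a sketch under several assumptions: resolve_model is a hypothetical helper, WebP is detected only by file extension, raising for midjourney is a local choice (this page does not say how the service responds), and the WebP restriction is assumed to apply after a flux.1_dev request is upgraded to the default version.

```python
DEFAULT_MODEL = "flux.1_kontext_pro"


def resolve_model(model_version: str, image_names: list[str]) -> str:
    """Return the model expected to handle the request, or raise on a conflict."""
    has_images = bool(image_names)
    if model_version == "midjourney" and has_images:
        raise ValueError("midjourney cannot be used with an image file")

    effective = model_version
    if model_version == "flux.1_dev" and has_images:
        # Documented behaviour: the request is upgraded to the default version.
        effective = DEFAULT_MODEL

    if effective == DEFAULT_MODEL and any(
        name.lower().endswith(".webp") for name in image_names
    ):
        raise ValueError("flux.1_kontext_pro does not support WebP input images")
    return effective
```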

template: The image template slug used to apply a preset style package. When this field is set, the system prepends the template prompt before your prompt and merges template images as additional reference images. Available values:

  • asset_extraction: Extract scene elements into separate assets for 3D generation.
    • Best with: Nano Banana 2, Image Input, ar 16:9, Smart Mesh
  • character_completion: Complete missing parts to restore a full character.
    • Best with: Nano Banana, Image Input
  • t_pose: Convert character to standard T-pose for rigging and animation.
    • Best with: Nano Banana, ar 1:1, Smart Mesh
  • head_extraction: Extract the head to enhance facial detail for high-fidelity 3D generation.
    • Best with: Nano Banana, ar 1:1, Image Input
  • 3d_enhance: Enhance 3D structure and detail (2D → 3D).
    • Best with: Nano Banana, Image Input
  • variants: Generate multiple consistent variations based on the original input.
    • Best with: Nano Banana 2, Text/Image Input
  • print_clay: Convert to high-contrast clay for 3D printing.
    • Best with: Nano Banana 2, Image Input, HD Model
  • figure: Convert your photo into a stylized figure character.
    • Best with: Nano Banana 2, Image Input, HD Model
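One possible way to act on the "Best with" hints is to pair each template with the matching model_version. The sketch assumes that "Nano Banana" and "Nano Banana 2" in the template list are the same models as the nano banana aliases in the model_version list. The hints are recommendations, not requirements, and the helper template_request is hypothetical.

```python
# Alias -> model_version, taken from the model_version list.
NANO_BANANA_MODELS = {
    "Nano Banana": "gemini_2.5_flash_image_preview",
    "Nano Banana 2": "gemini_3.1_flash_image_preview",
}

# Template slug -> recommended model alias, taken from the "Best with" hints.
TEMPLATE_BEST_MODEL = {
    "asset_extraction": "Nano Banana 2",
    "character_completion": "Nano Banana",
    "t_pose": "Nano Banana",
    "head_extraction": "Nano Banana",
    "3d_enhance": "Nano Banana",
    "variants": "Nano Banana 2",
    "print_clay": "Nano Banana 2",
    "figure": "Nano Banana 2",
}


def template_request(template: str, prompt: str) -> dict:
    """Build a request body that uses a template with its recommended model."""
    alias = TEMPLATE_BEST_MODEL[template]  # KeyError for an unknown slug
    return {
        "type": "generate_image",
        "prompt": prompt,
        "template": template,
        "model_version": NANO_BANANA_MODELS[alias],
    }
```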

file: Specifies the image input.

  • type: The file type. It is not currently validated, but specifying the correct file type is strongly advised.
  • file_token: The identifier returned by an upload; see the direct upload section of Upload. Mutually exclusive with url and object.
  • url: A direct URL to the image. Supports JPEG and PNG formats with a maximum size of 20MB. Mutually exclusive with file_token and object.
  • object (Strongly Recommended): The information returned by an upload; see the STS section of Upload. Mutually exclusive with url and file_token.
    • bucket: Normally always tripo-data.
    • key: The resource_uri from the upload response.
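Since file_token, url, and object are mutually exclusive, a small builder can enforce that exactly one is supplied. This is a sketch only: build_file is a hypothetical helper, the requirement that at least one source be present is inferred rather than stated, and the type value "png" and the key string in the usage example are placeholders.

```python
from typing import Optional


def build_file(
    file_type: str,
    *,
    file_token: Optional[str] = None,
    url: Optional[str] = None,
    obj: Optional[dict] = None,
) -> dict:
    """Build one file entry with exactly one of file_token, url, or object."""
    sources = {"file_token": file_token, "url": url, "object": obj}
    given = {name: value for name, value in sources.items() if value is not None}
    if len(given) != 1:
        raise ValueError("provide exactly one of file_token, url, or object")
    return {"type": file_type, **given}


# Usage with the recommended object form; the key is a placeholder for the
# resource_uri returned by the upload.
example = build_file(
    "png",
    obj={"bucket": "tripo-data", "key": "<resource_uri from upload>"},
)
```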

files: Specifies multiple image inputs as a list of file objects. For flux.1_kontext_pro, the list may contain at most 4 items. For gpt_4o, gpt_image_2, and gemini_2.5_flash_image_preview, it may contain at most 10.
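The per-model limits can be checked locally before a request is sent. In this sketch, check_files_length is a hypothetical helper, and models for which this page states no limit are simply left unchecked rather than guessed.

```python
# Documented maximum length of the files list per model_version.
MAX_FILES = {
    "flux.1_kontext_pro": 4,
    "gpt_4o": 10,
    "gpt_image_2": 10,
    "gemini_2.5_flash_image_preview": 10,
}


def check_files_length(model_version: str, files: list) -> None:
    """Raise if files is longer than the documented limit for the model."""
    limit = MAX_FILES.get(model_version)  # None when no limit is documented
    if limit is not None and len(files) > limit:
        raise ValueError(
            f"{model_version} accepts at most {limit} files, got {len(files)}"
        )
```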

t_pose: A boolean that transforms your object into a T-pose while keeping its main characteristics. The default value is false.

sketch_to_render: A boolean that transforms your sketch into a rendered image. The default value is false.
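Because both flags default to false, a client only needs to send them when enabling the behaviour. The sketch below uses a hypothetical helper, with_flags, and the prompt text is illustrative. It also shows that Python's True serializes to the JSON literal true.

```python
import json


def with_flags(
    request: dict, *, t_pose: bool = False, sketch_to_render: bool = False
) -> dict:
    """Return a copy of the request body with only the enabled flags added."""
    body = dict(request)
    if t_pose:
        body["t_pose"] = True
    if sketch_to_render:
        body["sketch_to_render"] = True
    return body


body = with_flags(
    {"type": "generate_image", "prompt": "a robot character"},
    t_pose=True,
)
payload = json.dumps(body)  # JSON text ready to use as a request body
```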

Returns

task_id: The identifier for the successfully submitted task.