text-to-video

Multi-modal hybrid video generation that combines audio, video, image, and text inputs in any mix. Suitable for music-to-video, borrowing camera motion and rhythm from reference videos, and blending multiple sources into one result. At least one media input (image, video, or audio) is required; text-only generation is not supported.

meitu video-multimodal-generate

Usage Examples

# Music-driven video (required parameters + audio)
meitu video-multimodal-generate \
  --reference_audio_list ./music.mp3 \
  --prompt "Visuals that follow the rhythm of the music" \
  --json

# Image + video driven
meitu video-multimodal-generate \
  --image_list ./style.jpg \
  --reference_video_list ./camera.mp4 \
  --prompt "Camera motion reference + reference image style" \
  --json

# Multiple sources with full parameters and result download
meitu video-multimodal-generate \
  --image_list ./ref1.jpg \
  --reference_video_list ./ref.mp4 \
  --reference_audio_list ./bgm.mp3 \
  --prompt "Multi-source blended generation" \
  --video_duration 8 \
  --ratio 16:9 \
  --sound on \
  --json \
  --download-dir ./output

Parameters

Parameter	Required	Description
`--image_list`	No	Type: string[]; optional reference images (up to 9)
`--reference_video_list`	No	Type: string[]; optional reference videos (up to 3; total duration max 15 seconds)
`--reference_audio_list`	No	Type: string[]; optional driving audio (up to 3; total duration max 15 seconds)
`--prompt`	Yes	Type: string; generation description. At least one media input (image, video, or audio) must accompany the prompt; total assets across all lists: max 12
`--video_duration`	No	Type: number; default: -1 (auto); generated video duration
`--ratio`	No	Type: string; aspect ratio
`--sound`	No	Type: string; options: on / off; whether to include audio
`--resolution`	No	Type: string; output resolution
`--download-dir`	No	Type: string; downloads result files to the specified local directory
`--output`	No	Type: string[]; specifies output file paths, mapped in order to data.result.urls
`--json`	No	Outputs results in JSON format for script or agent parsing
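
The count limits in the table interact: the per-list maximums (9 + 3 + 3) add up to 15, which is more than the overall cap of 12 assets. The following is a minimal sketch of a pre-flight check that a wrapper script could run before calling the command. `check_asset_counts` is a hypothetical helper, not part of the meitu CLI, and it only checks counts; the 15-second total duration limits for video and audio are not verified here.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; NOT part of the meitu CLI.
# Usage: check_asset_counts <images> <videos> <audios>
# Prints "ok" and returns 0 when the counts fit the limits listed in the
# parameter table; otherwise prints the reason and returns 1.
check_asset_counts() {
  local images=$1 videos=$2 audios=$3
  local total=$((images + videos + audios))

  # Text-only generation is not supported.
  if [ "$total" -eq 0 ]; then
    echo "need at least one image, video, or audio input"
    return 1
  fi
  # Per-list limits.
  if [ "$images" -gt 9 ]; then
    echo "too many images: $images (max 9)"
    return 1
  fi
  if [ "$videos" -gt 3 ]; then
    echo "too many reference videos: $videos (max 3)"
    return 1
  fi
  if [ "$audios" -gt 3 ]; then
    echo "too many reference audio files: $audios (max 3)"
    return 1
  fi
  # Overall cap across all three lists.
  if [ "$total" -gt 12 ]; then
    echo "too many assets in total: $total (max 12)"
    return 1
  fi
  echo "ok"
}
```

For example, `check_asset_counts 9 3 1` fails on the overall cap even though each list is within its own limit, while `check_asset_counts 9 3 0` passes.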
