Building a UGC ad in 90 minutes with Seedance + Remotion (with the prompts)

TL;DR. The fal.ai + Remotion pipeline produces a vertical 9:16 UGC ad in about 90 minutes, review included. Stack: GPT-Image for the persona, Seedance 2.0 for the motion, fal AI TTS for the voiceover, a fal lip-sync model for mouth alignment, and Remotion for composition, captions, and B-roll. Below: the actual order of operations and the prompts we use as starting points. This is the workflow our Producer + Editor agents run inside Cockpit for the Creative product line.

Tools used

Stage	Tool	Purpose
1. Persona	GPT-Image (OpenAI via fal.ai)	Hero shot of the AI creator (multiple angles)
2. Motion	Seedance 2.0 (fal.ai)	3-6s talking-head clips
3. Voice	fal AI TTS	Voiceover matched to script + accent
4. Lip-sync	fal lip-sync model	Mouth movement aligned to voice
5. B-roll	Seedance / Veo (fal.ai)	Product close-ups, text-only inserts
6. Composition	Remotion	Cuts, captions, music, transitions, export

The 90-minute walkthrough

Min 0-10. Brief + script. Lock the structure: one hook (first 1.5s), one problem statement, one product reveal, one CTA, 30-45 seconds total. Cockpit's Copywriter agent can write the script, or you can write it by hand; both work.
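
A quick way to sanity-check a draft against that structure before generating anything is to estimate its spoken length from the word count. The sketch below is illustrative only: the function names, the beat keys, and the 2.5 words-per-second speaking rate are assumptions, not part of Cockpit or any fal.ai API. Tune the rate to the voice you actually use.

```python
# Hypothetical pre-flight check for a UGC script (names and rate are assumptions).
WORDS_PER_SECOND = 2.5  # assumed conversational pace, roughly 150 words per minute
REQUIRED_BEATS = ("hook", "problem", "reveal", "cta")


def estimate_seconds(text: str, words_per_second: float = WORDS_PER_SECOND) -> float:
    """Rough spoken duration of a line, derived from its word count."""
    return len(text.split()) / words_per_second


def check_script(beats: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the draft fits the brief."""
    problems = []

    # Every beat must be present and non-empty.
    missing = [b for b in REQUIRED_BEATS if not beats.get(b, "").strip()]
    if missing:
        problems.append(f"missing beats: {', '.join(missing)}")

    # The hook has to land inside the first 1.5 seconds.
    hook_seconds = estimate_seconds(beats.get("hook", ""))
    if hook_seconds > 1.5:
        problems.append(f"hook runs ~{hook_seconds:.1f}s, limit is 1.5s")

    # The whole ad should come in at 30-45 seconds.
    total = sum(estimate_seconds(text) for text in beats.values())
    if not 30 <= total <= 45:
        problems.append(f"total runs ~{total:.1f}s, target is 30-45s")

    return problems
```

Call check_script with a dict keyed by hook, problem, reveal, and cta; anything it returns is a reason to rewrite before spending generation time.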

Min 10-20. Persona generation. GPT-Image prompt template:

Scene: [setting that matches the audience — kitchen / desk / car / cafe], natural light.
Subject: a [age range] [gender] [region] person looking at camera, holding a smartphone.
Important: photoreal, casual outfit, no makeup, no logos visible. Slight smile. Eye contact with camera.
Use case: hero frame for vertical UGC ad.
Constraints: no text, no overlays, single person, neutral background. 9:16 aspect.

Generate 4 variants. Pick one.
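
If you fill the template programmatically, a small helper keeps the bracketed slots from leaking into the final prompt. This is a minimal sketch using only Python's standard library; the function and parameter names are hypothetical, and the step that sends the prompt to the image model is deliberately left out.

```python
from string import Template

# The persona template from above, with the bracketed slots turned into placeholders.
PERSONA_TEMPLATE = Template(
    "Scene: $setting, natural light.\n"
    "Subject: a $age_range $gender $region person looking at camera, "
    "holding a smartphone.\n"
    "Important: photoreal, casual outfit, no makeup, no logos visible. "
    "Slight smile. Eye contact with camera.\n"
    "Use case: hero frame for vertical UGC ad.\n"
    "Constraints: no text, no overlays, single person, neutral background. "
    "9:16 aspect."
)


def build_persona_prompt(setting: str, age_range: str, gender: str, region: str) -> str:
    """Fill every slot; substitute() raises KeyError if a placeholder is left unset."""
    return PERSONA_TEMPLATE.substitute(
        setting=setting, age_range=age_range, gender=gender, region=region
    )


if __name__ == "__main__":
    print(build_persona_prompt("kitchen", "25-34", "female", "Moroccan"))
```

Send the same filled prompt for each of the 4 variants so the only difference between candidates is the model's sampling.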

Min 20-40. Motion generation. Seedance prompt:

The person from the reference image talks naturally to the camera, gesturing slightly with their free hand. Subtle head movement. Hold the camera angle. Length: 6 seconds.

Run 3-4 takes. Pick the most natural.

Min 40-50. Voiceover. Run the script through fal TTS with the matching accent (FR-MA, FR-FR, EN-MENA, etc.). Generate 2-3 takes.

Min 50-65. Lip-sync. Feed the chosen Seedance clip + the chosen voiceover into the fal lip-sync model. Output is the talking-head clip with mouth aligned.

Min 65-80. B-roll. Generate product close-ups via Seedance (3-second shots of the product from different angles). Add Remotion text inserts for the hook line and CTA.

Min 80-90. Compose + export. Drop everything in Remotion: hook → talking-head → B-roll → CTA. Add captions (we use a hand-tuned word-by-word style; it performs better than auto-bouncing styles in our MENA testing). Music from a licensed library. Export 4K, 1080p, and a 720p version for fast distribution.
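
Remotion positions clips by frame, so the hook → talking-head → B-roll → CTA order turns into a running frame offset per segment. The sketch below computes those offsets, plus the pixel sizes for the three exports. It is plain Python for planning, not Remotion code: the 30 fps rate, the segment lengths, and all names are assumptions, and "4K" is taken here to mean a 2160-pixel short side in 9:16.

```python
from dataclasses import dataclass

FPS = 30  # assumed frame rate; use whatever your composition is set to


@dataclass(frozen=True)
class Segment:
    name: str
    seconds: float


@dataclass(frozen=True)
class Placement:
    name: str
    start_frame: int         # the value you would pass as a Sequence's `from`
    duration_in_frames: int  # the value you would pass as `durationInFrames`


def lay_out(segments: list[Segment], fps: int = FPS) -> list[Placement]:
    """Place segments back to back and return each one's frame offset and length."""
    placements = []
    cursor = 0
    for segment in segments:
        frames = round(segment.seconds * fps)
        placements.append(Placement(segment.name, cursor, frames))
        cursor += frames
    return placements


def vertical_size(short_side: int) -> tuple[int, int]:
    """(width, height) of a 9:16 frame, given its short side in pixels."""
    return short_side, short_side * 16 // 9


# The three export sizes, under the stated reading of "4K".
EXPORTS = {
    "4K": vertical_size(2160),
    "1080p": vertical_size(1080),
    "720p": vertical_size(720),
}

if __name__ == "__main__":
    timeline = [
        Segment("hook", 1.5),
        Segment("talking_head", 6),
        Segment("broll", 3),
        Segment("cta", 3),
    ]
    for placement in lay_out(timeline):
        print(placement)
    print(EXPORTS)
```

A real 30-45s ad chains more talking-head and B-roll segments than this four-item example; the offset logic stays the same.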

What breaks

Lip-sync drift. If the voiceover runs longer than 6s, the lip-sync model loses sync. Cut the script tighter.
Persona consistency across clips. Use the same reference image for every motion generation in a campaign. Don't re-roll the persona.
Music levels. AI-generated voice is quiet by default. Boost it by 6 dB before mixing, or the music drowns it out.
Caption timing. Auto-caption tools mistime first-second hooks. Hand-time the first 1.5s.
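
Three of these failure modes reduce to simple numeric rules, sketched below as standalone helpers. Everything here is illustrative: the function names and the (word, start, end) cue format are assumptions and do not correspond to any fal.ai or Remotion API. The only fixed fact is the decibel formula, where an amplitude gain of g dB equals a linear factor of 10^(g/20), so +6 dB is roughly a doubling.

```python
MAX_LIPSYNC_SECONDS = 6.0  # voiceovers longer than this drift out of sync
HAND_TIMED_WINDOW = 1.5    # seconds at the start that get hand-timed captions

Cue = tuple[str, float, float]  # (word, start_seconds, end_seconds), assumed format


def too_long_for_lipsync(
    durations: dict[str, float], limit: float = MAX_LIPSYNC_SECONDS
) -> list[str]:
    """Names of voiceover takes that exceed the lip-sync limit."""
    return [name for name, seconds in durations.items() if seconds > limit]


def db_to_gain(db: float) -> float:
    """Convert a decibel change into a linear amplitude multiplier."""
    return 10 ** (db / 20)


def merge_captions(
    auto_cues: list[Cue], hand_cues: list[Cue], window: float = HAND_TIMED_WINDOW
) -> list[Cue]:
    """Use hand-timed cues inside the opening window and auto cues after it."""
    kept = [cue for cue in auto_cues if cue[1] >= window]
    return sorted(hand_cues + kept, key=lambda cue: cue[1])


if __name__ == "__main__":
    print(too_long_for_lipsync({"take_a": 5.8, "take_b": 7.2}))
    print(round(db_to_gain(6), 3))
    auto = [("Stop", 0.0, 0.9), ("scrolling", 0.9, 1.7), ("now", 1.7, 2.1)]
    hand = [("Stop", 0.0, 0.4), ("scrolling", 0.4, 1.2)]
    print(merge_captions(auto, hand))
```

Whether a linear multiplier of about 2 can be applied directly depends on your audio tool; some cap volume at the source level, in which case lowering the music by the same amount gives the same balance.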

Output expectations

One 30-45s ad with one persona, two B-roll inserts, captions, music, and three platform exports. The 90-minute window includes margin for about 2 revisions. For a full campaign (8+ ads with persona + script variety), budget 1 day of focused work or, easier, book a UGC Sprint and we ship it.

If you want this

Sprint pack: UGC Sprint (5-10 videos in a week, €1,000–2,500). Monthly drumbeat: Series (4-8 videos/month, €2,000–4,000/month). Send a brief with your offer + audience and our proposal generator scopes it.