Building a UGC ad in 90 minutes with Seedance + Remotion (with the prompts)
9 May 2026 · 7 min read · TheAIgency
TL;DR. The fal.ai + Remotion pipeline produces a vertical 9:16 UGC ad in ~90 minutes including review. Stack: GPT-Image for the persona, Seedance 2.0 for the motion, fal AI voiceover, fal lip-sync, Remotion for composition + captions + B-roll. Below: the actual order of operations and the prompts we use as starting points. This is the workflow our Producer + Editor agents run inside Cockpit for the Creative product line.
Tools used
| Stage | Tool | Purpose |
|---|---|---|
| 1. Persona | GPT-Image (OpenAI via fal.ai) | Hero shot of the AI creator (multiple angles) |
| 2. Motion | Seedance 2.0 (fal.ai) | 3-6s talking-head clips |
| 3. Voice | fal AI TTS | Voiceover matched to script + accent |
| 4. Lip-sync | fal lip-sync model | Mouth movement aligned to voice |
| 5. B-roll | Seedance / Veo (fal.ai) | Product close-ups |
| 6. Composition | Remotion | Cuts, captions, text inserts, music, transitions, export |
The 90-minute walkthrough
- Min 0-10. Brief + script. Lock the structure: one hook (first 1.5s), one problem statement, one product reveal, one CTA, 30-45 seconds total. The script comes from Cockpit's Copywriter agent or is written by hand; both work.
- Min 10-20. Persona generation. GPT-Image prompt template:

  > Scene: [setting that matches the audience — kitchen / desk / car / cafe], natural light.
  > Subject: a [age range] [gender] [region] person looking at camera, holding a smartphone.
  > Important: photoreal, casual outfit, no makeup, no logos visible. Slight smile. Eye contact with camera.
  > Use case: hero frame for vertical UGC ad.
  > Constraints: no text, no overlays, single person, neutral background. 9:16 aspect.

  Generate 4 variants. Pick one.
- Min 20-40. Motion generation. Seedance prompt:

  > The person from the reference image talks naturally to the camera, gesturing slightly with their free hand. Subtle head movement. Hold the camera angle. Length: 6 seconds.

  Run 3-4 takes. Pick the most natural.
- Min 40-50. Voiceover. Run the script through fal TTS with the matching accent (FR-MA, FR-FR, EN-MENA, etc.). Generate 2-3 takes.
- Min 50-65. Lip-sync. Feed the chosen Seedance clip + the chosen voiceover into the fal lip-sync model. Output is the talking-head clip with mouth aligned.
- Min 65-80. B-roll. Generate product close-ups via Seedance (3-second shots of the product from different angles). Add Remotion text inserts for the hook line and CTA.
- Min 80-90. Compose + export. Assemble everything in Remotion: hook → talking-head → B-roll → CTA. Add captions (we use a hand-tuned word-by-word style, which performs better than auto-bouncing styles in MENA testing). Add music from a licensed library. Export 4K and 1080p, plus a 720p version for fast distribution.
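
The compose step comes down to laying clips end to end on a frame timeline and picking 9:16 output sizes. Here is a minimal sketch of that arithmetic in plain TypeScript. The 30 fps frame rate, the segment names and durations, and reading "4K" as a 2160-pixel short side are all illustrative assumptions, not our production values. The resulting `from` and `durationInFrames` numbers are the kind of values you would pass to Remotion's `<Sequence>` props.

```typescript
// Illustrative segment plan: names and durations are assumptions for this sketch.
type ClipSegment = { name: string; seconds: number };
type PlacedSegment = ClipSegment & { from: number; durationInFrames: number };

const FPS = 30; // assumed frame rate

// Lay segments end to end and convert seconds to frame offsets.
function layoutTimeline(segments: ClipSegment[], fps: number = FPS): PlacedSegment[] {
  let cursor = 0;
  return segments.map((segment) => {
    const durationInFrames = Math.round(segment.seconds * fps);
    const placed = { ...segment, from: cursor, durationInFrames };
    cursor += durationInFrames;
    return placed;
  });
}

// 9:16 output size for a given short side, e.g. 1080 -> 1080x1920.
function verticalSize(shortSide: number): { width: number; height: number } {
  return { width: shortSide, height: (shortSide * 16) / 9 };
}

// hook -> talking-head -> B-roll -> CTA, with two B-roll inserts.
const plan = layoutTimeline([
  { name: "hook", seconds: 1.5 },
  { name: "talking-head-1", seconds: 6 },
  { name: "b-roll-1", seconds: 3 },
  { name: "talking-head-2", seconds: 6 },
  { name: "b-roll-2", seconds: 3 },
  { name: "talking-head-3", seconds: 6 },
  { name: "cta", seconds: 4.5 },
]);

// Total runtime in seconds (30 for this plan, the low end of the 30-45s target).
const totalSeconds = plan.reduce((sum, s) => sum + s.durationInFrames, 0) / FPS;

// Assumed short sides for the 4K, 1080p, and 720p exports.
const exportSizes = [2160, 1080, 720].map(verticalSize);
```

Keeping the plan as data makes it cheap to re-time a cut during review: change one `seconds` value and every later offset shifts with it.
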
What breaks
- Lip-sync drift. If the voiceover for a single clip is >6s, the model loses sync. Cut the script tighter so each line fits one clip.
- Persona consistency across clips. Use the same reference image for every motion generation in a campaign. Don't re-roll the persona.
- Music levels. AI-generated voice is quiet by default. Boost +6 dB before mixing or the music drowns it.
- Caption timing. Auto-caption tools mistime first-second hooks. Hand-time the first 1.5s.
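
Two of the failure modes above reduce to small calculations: keeping each voiceover segment under the 6-second limit, and converting a dB boost into a linear gain. The sketch below is one way to do both. The speaking rate of 2.5 words per second is an assumed estimate (measure your own TTS output), the sample script is made up, and the function names are hypothetical helpers rather than part of any library.

```typescript
// Convert a decibel change to a linear amplitude multiplier.
function dbToGain(db: number): number {
  return Math.pow(10, db / 20);
}

// Greedily pack whole sentences into segments estimated to stay under maxSeconds.
// Duration is estimated from word count; wordsPerSecond is an assumed speaking rate.
// A single sentence longer than the limit is kept whole and needs a manual rewrite.
function splitScript(script: string, maxSeconds = 6, wordsPerSecond = 2.5): string[] {
  const sentences = (script.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  const maxWords = Math.floor(maxSeconds * wordsPerSecond);
  const segments: string[] = [];
  let current: string[] = [];
  for (const sentence of sentences) {
    const words = sentence.split(/\s+/);
    if (current.length > 0 && current.length + words.length > maxWords) {
      segments.push(current.join(" "));
      current = [];
    }
    current.push(...words);
  }
  if (current.length > 0) segments.push(current.join(" "));
  return segments;
}

const voiceBoost = dbToGain(6); // ≈ 1.995, roughly double the amplitude
const musicCut = dbToGain(-6); // ≈ 0.501, the same relative balance from the music side

// Made-up sample script for illustration.
const sampleScript =
  "Tired of dry skin? This serum fixed mine in a week. " +
  "It absorbs fast and never feels sticky. Tap the link to try it.";
const voiceSegments = splitScript(sampleScript); // two segments of 11 and 13 words
```

If your mixing tool caps volume at 1.0, lowering the music by 6 dB gives the same voice-to-music balance as boosting the voice by 6 dB.
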
Output expectations
One 30-45s ad with one persona, two B-roll inserts, captions, music, and three platform exports. The 90-minute window includes margin for ~2 revisions. For a full campaign (8+ ads with persona + script variety), budget 1 day of focused work or, easier, book a UGC Sprint and we ship it.
If you want this
Sprint pack: UGC Sprint (5-10 videos in a week, €1,000–2,500). Monthly drumbeat: Series (4-8 videos/month, €2,000–4,000/month). Send a brief with your offer + audience and our proposal generator scopes it.