How to Make This or That Videos (2026)
The short answer
To make a this-or-that video: line up 8-15 visual binary choices in one theme (outfits, kitchens, travel spots, game loadouts), show each pair side by side for 2-3 seconds with a beat-synced transition, let the viewer pick silently as they watch, and end with 'comment your picks' or 'count your A's and B's.' Keep it to 25-50 seconds in 9:16, music-driven, with no voiceover required. Reelry's free would-you-rather generator produces balanced pairs you can adapt into visual editions.
This-or-that is the lightest-weight interactive format on short-form: a stream of quick binary choices - this kitchen or that kitchen, window seat or aisle - that the viewer answers reflexively in their head. It is would-you-rather stripped of dilemmas and built for aesthetics: no stakes, no agonizing, just taste. That makes it the native interactive format for visual niches (fashion, interiors, travel, food) where the pictures are the content and the choosing is the engagement. This guide covers pair design, pacing and music sync, the scoring mechanic, and niche editions.
Specs at a glance
| Spec | Recommendation |
|---|---|
| Ideal length | 25-50 seconds; 8-15 pairs at 2-3 seconds each |
| Per-pair timing | 2-3 seconds: long enough to see both, short enough that choosing stays reflexive |
| Hook window | Pair 1 within 1-2 seconds, and it should be the prettiest pairing you have |
| Aspect ratio | 9:16 vertical, 1080x1920; split top/bottom or side-by-side halves per pair |
| Platform limits | TikTok up to 10 minutes, Reels and Shorts up to 3 minutes; this format is the shortest interactive lane |
| Audio | Music-driven: transitions cut on the beat; voiceover optional and usually absent |
| Posting cadence | 1-2 daily; themed editions batch-produce in minutes once the template exists |
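The aspect-ratio row comes down to simple pixel arithmetic. Here is a minimal sketch, assuming a 1080x1920 canvas; `Rect` and `split_frame` are hypothetical helper names used for illustration, not part of any tool mentioned in this guide.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """A pixel rectangle: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int


def split_frame(width: int = 1080, height: int = 1920,
                layout: str = "top_bottom") -> tuple[Rect, Rect]:
    """Return the (A, B) panels for one pair on a vertical canvas."""
    if layout == "top_bottom":
        half = height // 2
        # A on top, B below; B takes any leftover pixel row.
        return Rect(0, 0, width, half), Rect(0, half, width, height - half)
    if layout == "side_by_side":
        half = width // 2
        # A on the left, B on the right; B takes any leftover pixel column.
        return Rect(0, 0, half, height), Rect(half, 0, width - half, height)
    raise ValueError(f"unknown layout: {layout}")


panel_a, panel_b = split_frame()
print(panel_a, panel_b)
```

Top/bottom gives each option a 1080x960 panel (9:8), while side-by-side gives 540x1920 (9:32), so landscape subjects such as rooms usually fit the top/bottom split better.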
Free tool for this format: Would You Rather Video Generator
Generates balanced either-or questions tuned for an even split - use it to draft pair ideas, then strip the dilemma framing and render each pair as a visual choice.
Why this format works
- Choosing is involuntary at 2-second pacing: the viewer's brain picks before the next pair arrives, and fifteen micro-participations add up to a watched video.
- Taste is identity in visual niches: commenting 'A, B, B, A' is a tiny self-portrait, which is why scoring endings reliably fill comment sections.
- The format is voiceless and text-light, so it travels across languages and still reads with the sound off, which keeps viewing friction low.
- Production is assembly rather than creation: with generated or curated imagery, a themed edition takes minutes, and the template never wears out.
Step-by-step guide
1. Theme the edition and design real choices
Every video is one themed edition: cozy kitchens, European cities, gym fits, fantasy character builds, wedding venues. Within the theme, each pair must be a genuine style fork (cottage kitchen vs industrial loft kitchen), not a quality gap (nice kitchen vs bad kitchen) - quality gaps have right answers, and right answers kill participation. Aim for pairs where you honestly hesitate; the would-you-rather generator's balance principle applies directly.
2. Source or generate the visual pairs
The imagery is the product, so it has to be good: generate illustrated pairs in a consistent style (Reelry's art styles keep an edition coherent), use your own photography, or license stock. Generated imagery has a specific advantage here - you can hold everything constant except the variable being chosen (same room, two aesthetics), which makes the choice cleaner and the video more satisfying. Label each side simply: 'A / B' or one-word tags ('coastal' / 'alpine').
3. Pace to the music
Pick the track first and cut pairs on its beat. In 4/4 time at 120 BPM one bar lasts 2 seconds, so 2-3 seconds per pair means a pair lands roughly every 1-2 bars on most trending audio. The beat-sync is what makes the format feel polished rather than slideshow-like, and using trending audio gives the discovery boost this voiceless format can fully exploit. Transitions stay simple: hard cuts or quick slides on the beat.
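The beat arithmetic can be sketched in a few lines. This is a minimal sketch, assuming a constant tempo, a known BPM, 4/4 time, and a first downbeat at `start`; the function names are hypothetical, and real tracks with tempo drift need beat detection instead.

```python
def bar_seconds(bpm: float, beats_per_bar: int = 4) -> float:
    """Length of one bar in seconds at a constant tempo."""
    return beats_per_bar * 60.0 / bpm


def bars_for_target(bpm: float, lo: float = 2.0, hi: float = 3.0,
                    beats_per_bar: int = 4) -> list[float]:
    """Bar counts (in half-bar steps) whose duration falls within lo..hi seconds."""
    bar = bar_seconds(bpm, beats_per_bar)
    return [n / 2 for n in range(1, 9) if lo <= n / 2 * bar <= hi]


def cut_times(bpm: float, n_pairs: int, bars_per_pair: float = 1.0,
              start: float = 0.0, beats_per_bar: int = 4) -> list[float]:
    """Timestamps (seconds) where each pair starts, plus the final cut."""
    step = bars_per_pair * bar_seconds(bpm, beats_per_bar)
    return [round(start + i * step, 3) for i in range(n_pairs + 1)]


print(bars_for_target(120))   # [1.0, 1.5]
print(cut_times(120, 3))      # [0.0, 2.0, 4.0, 6.0]
```

Faster tracks push the pair length toward a bar and a half: at 140 BPM a single bar is about 1.7 seconds, which is under the 2-second floor.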
4. Add the scoring mechanic
The ending converts silent choosing into visible engagement. Three proven closers: 'comment your letters' (viewers post 'ABBAB...'), 'mostly A's or mostly B's? what it says about you' (a one-frame pseudo-result that drives both comments and screenshots), or 'the pair you couldn't choose - that's your comment.' The scoring frame is also the series hook: 'aesthetic quiz, edition 12' brings the same commenters back.
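If you export comments and want to read the letter strings in bulk, a small parser is enough. This is a rough sketch, assuming comments arrive as plain strings and that a valid answer is exactly one A or B per pair; `parse_picks`, `verdict`, and `tally_by_pair` are hypothetical names, and no platform API is implied.

```python
import re
from typing import Optional


def parse_picks(comment: str, n_pairs: int) -> Optional[str]:
    """Return a normalized string like 'ABBA', or None if the comment is not one."""
    # Drop common separators so 'A, B, B, A' and 'a/b/b/a' both parse.
    cleaned = re.sub(r"[\s,./-]+", "", comment).upper()
    if len(cleaned) == n_pairs and set(cleaned) <= {"A", "B"}:
        return cleaned
    return None


def verdict(picks: str) -> str:
    """Classify a pick string for a 'mostly A's or mostly B's' closer."""
    a, b = picks.count("A"), picks.count("B")
    if a == b:
        return "tie"
    return "mostly A" if a > b else "mostly B"


def tally_by_pair(comments: list[str], n_pairs: int) -> list[tuple[int, int]]:
    """Count (A votes, B votes) for each pair across all parseable comments."""
    counts = [[0, 0] for _ in range(n_pairs)]
    for comment in comments:
        picks = parse_picks(comment, n_pairs)
        if picks is None:
            continue  # free-text comment, not a letter string
        for i, letter in enumerate(picks):
            counts[i][0 if letter == "A" else 1] += 1
    return [(a, b) for a, b in counts]


print(tally_by_pair(["ABBA", "A,A,B,A", "nice"], 4))
```

The strict length check is a deliberate choice: it ignores ordinary sentences that happen to contain the letters a and b, at the cost of skipping partial answers.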
5. Post daily and let editions find their niches
Editions are cheap, so explore: run kitchens, then travel, then fits, and watch which edition's completion and comments spike - that is your channel's lane telling you what it wants. Caption with the theme plus a participation prompt. Keep one consistent visual signature (font, frame, transition sound) across editions so the format is recognizable in-feed regardless of theme.
Examples by niche
Interiors niche
'Pick your dream kitchen: 10 choices.' Pairs hold the room constant and fork the aesthetic: marble vs butcher block, open shelving vs glass cabinets, brass vs matte black. Interiors editions have the highest save rates of the format - viewers keep them as mood boards - and the comments fill with letter-strings plus 'the marble one but with the black fixtures from B,' which is the audience designing with you.
Travel niche
'Build your perfect trip: this or that, Europe edition.' Window seat vs aisle, Alps cabin vs Amalfi cliffside, museum day vs market day, night train vs morning flight. Travel editions ride strong shareability ('do this with me' sent to travel partners) and pair naturally with destination content the rest of the channel runs.
Gaming niche
'Build your character: this or that, RPG edition.' Stealth vs strength, bow vs blade, lone wolf vs full party, dragon mount vs teleportation. The gaming variant turns the format into a character sheet - the comment string is a build, and 'rate my build' replies chain the engagement. Fantasy illustration styles make this edition fully generatable.
Common mistakes
Pairs with right answers
A beautiful option against a mediocre one is not a choice, it is a quiz with one question. Every pair needs two defensible picks - the hesitation is the engagement. If 90% would pick the same side, replace the pair.
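The 90% rule is easy to check once you have vote counts per pair. This is a minimal sketch, assuming you already have (A votes, B votes) for each pair from comments or a poll; `lopsided_pairs` is a hypothetical helper, and the `min_votes` cutoff is an added assumption to avoid judging a pair on a handful of votes.

```python
def lopsided_pairs(tallies: list[tuple[int, int]], threshold: float = 0.9,
                   min_votes: int = 20) -> list[int]:
    """Return 1-based pair numbers where one side took at least `threshold` of votes."""
    flagged = []
    for number, (a, b) in enumerate(tallies, start=1):
        total = a + b
        if total < min_votes:
            continue  # too few votes to call the pair lopsided
        if max(a, b) / total >= threshold:
            flagged.append(number)
    return flagged


# Illustrative counts only, not real data.
sample = [(50, 50), (95, 5), (3, 1), (9, 91)]
print(lopsided_pairs(sample))  # [2, 4]
```

Pair 3 in the sample is skipped rather than flagged because four votes say little about how an audience would split.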
Pacing off the beat
Pairs that land between beats make the video feel like a slideshow. Cut every transition on the music's grid; at this format's speed, the rhythm is most of the production value.
Skipping the ending mechanic
Without a scoring or comment frame, fifteen silent choices evaporate when the video ends. The closer ('comment your letters') is what converts the participation that already happened into engagement the algorithm can see.
Templates
12-pair edition template (40 seconds)
- 0-2s: title card ('Pick your [theme] - this or that').
- Pairs 1-11: 2.5-3 seconds each, cut on the beat, A/B labels consistent top/bottom.
- Pair 12: the hardest pair, held 4 seconds.
- Closer (3s): 'Comment your letters. Mostly A's = [type], mostly B's = [type].'
- Same font, frame, and transition sound across every edition.
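The template's timing can be laid out as a simple timeline to check the total before editing. This is a minimal sketch using the durations from the template; `build_timeline` is a hypothetical helper, and the default of 2.75 seconds per pair is just the midpoint of the 2.5-3 second range.

```python
def build_timeline(n_pairs: int = 12, title: float = 2.0, pair: float = 2.75,
                   last_pair: float = 4.0,
                   closer: float = 3.0) -> list[tuple[str, float, float]]:
    """Return (segment name, start, end) tuples in seconds for one edition."""
    segments = []
    t = 0.0
    plan = [("title", title)]
    plan += [(f"pair {i}", pair) for i in range(1, n_pairs)]
    plan += [(f"pair {n_pairs}", last_pair), ("closer", closer)]
    for name, duration in plan:
        segments.append((name, t, t + duration))
        t += duration
    return segments


timeline = build_timeline()
for name, start, end in timeline:
    print(f"{start:6.2f} - {end:6.2f}  {name}")
print("total:", timeline[-1][2])  # 39.25
```

At 2.5 seconds per pair the edition runs 36.5 seconds and at 3 seconds it runs 42, so the 40-second label sits in the middle of the range.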
Related resources
For hook formulas you can apply across all these formats, read the guide 'TikTok hook formulas that convert' on the Reelry blog.
Generate your first reel with Reelry
Reelry produces complete illustrated TikTok reels from a text prompt - script, frames, voiceover, animation, and assembly - in under 5 minutes.
Starter plan from $19/month · 7-day money-back guarantee · Free plan available, no credit card required
Frequently asked questions
What is a this-or-that video?
A rapid sequence of 8-15 visual binary choices in one theme - this kitchen or that kitchen, window or aisle - shown 2-3 seconds per pair over beat-synced music, ending with a prompt to comment your picks. It is the aesthetic, low-stakes cousin of would-you-rather: taste instead of dilemmas, pictures instead of scenarios.
How is this-or-that different from would-you-rather?
Would-you-rather runs on dilemmas (both options cost something, deliberation is the content) at 7-10 seconds per choice with timers. This-or-that runs on taste (both options are good, reflexive picking is the content) at 2-3 seconds per pair with music. Would-you-rather suits hypothetical and talk-driven niches; this-or-that suits visual niches like fashion, interiors, travel, and food.
Do this-or-that videos need a voiceover?
No - the format is typically voiceless and music-driven, which makes it language-independent and perfectly mute-friendly. The work goes into imagery quality and beat-synced cuts. Some channels add a single spoken closer for the scoring mechanic, but on-screen text does the same job.
How long should a this-or-that video be?
25-50 seconds: 8-15 pairs at 2-3 seconds each plus a 3-second scoring closer. The reflexive-choice mechanic depends on speed; pairs held longer than 3 seconds invite deliberation the format isn't built for, and editions past 15 pairs see completion fall off.
Where do I get the images for the pairs?
Generate them, shoot them, or license them. Images grabbed from search results carry copyright risk here as anywhere. Generated illustration has a particular advantage for this format: you can hold the scene constant and vary only the aesthetic being chosen (same kitchen, two styles), which makes pairs cleaner than two unrelated stock photos usually are. Reelry's consistent art styles keep a whole edition coherent.