How MrBeast wrote his highest-retention gaming video ever
MrBeast claimed his recent gaming upload — "I Survived 100 Days in Skyblock" — had his highest retention ever on the format. We watched the full video to figure out exactly why. The answer comes down to a small number of very specific script-level choices any gaming creator can copy. For the broader framework, see our complete YouTube retention guide.
What makes this video different
Look at the craft across the usual retention dimensions (hook speed, stakes persistence, payoff density, segment transitions) and most of them are roughly format-standard. One isn't: stakes persistence.
In plain English: this video keeps stakes alive in the viewer's head much better than the typical gaming video does. Everything else it does well (pacing, segment density, payoff arrival) is roughly average for the format. The retention edge is almost entirely about how it handles consequence.
Stakes in six seconds
Most gaming videos take 30 to 60 seconds to tell you what's at stake. The hook fires, the title shows up on screen, the creator says hi, the premise gets explained, and somewhere in the middle of that exposition you find out what happens if they fail. By that point the average viewer has either committed or already clicked away.
The Skyblock video lands the consequence at second six. The mustard tub punishment is named before the intro music has finished. There's no greeting, no channel name, no "today we're playing", no "this is going to be hard" — the very first sentence has the win condition AND the loss condition baked into it.
That choice does two things at once. It compresses the entire hook into a single beat so viewers don't have to wait for the premise to land. And it gives every subsequent action in the video a clear stakes context: every block placed, every monster killed, every step toward the end goal is happening against a visible, tangible consequence the audience already understands.
Why "stakes in 6 seconds" works mechanically
- No drift window. The 0:00-0:30 retention drop is where ~25% of viewers exit most gaming videos. If stakes land at 0:06, that drift window is fully covered by tension — there's no flat patch for the viewer to lose interest in.
- Faster premise confirmation. Viewers click on a title that promises something. The faster the video proves it's about to deliver that thing, the faster they relax into committing. Stakes ARE the proof.
- Tangible loss, not abstract risk. "I might fail" is abstract. "I'll sit in a mustard tub if I fail" is concrete — you can picture it. Concrete consequences anchor harder than abstract ones.
The thing nobody else does: stakes persistence
Setting up stakes early is a known move, and most gaming videos manage some version of it competently. The Skyblock video's real differentiator is what happens to the stakes after the hook.
Across the 20+ minute runtime, the mustard tub consequence shows up four separate times:
- 0:06 — original consequence statement
- 3:23 — reinforced as the creator hits the first major roadblock
- 7:10 — second reinforcement, mid-act tension peak
- 18:31 — final reinforcement before the climax
That cadence averages out to roughly one mention per 5 minutes of runtime, though it is front-loaded: three mentions in the first seven and a half minutes, then one more just before the climax. It is what produces the strong stakes persistence. Most gaming creators reference their stakes once, maybe twice, before forgetting about them entirely. By minute 10 of a typical gaming video, the consequence the viewer was originally tracking has been replaced by whatever's on screen right now.
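To check a cadence like this on your own script, you can measure the gaps between stakes mentions. The snippet below is a minimal sketch, assuming timestamps written as m:ss; the function names are illustrative and not part of any tool mentioned here.

```python
def to_seconds(stamp: str) -> int:
    """Convert an 'm:ss' timestamp into whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def mention_gaps(stamps: list[str]) -> list[int]:
    """Return the gap in seconds between each consecutive pair of mentions."""
    secs = [to_seconds(s) for s in stamps]
    return [later - earlier for earlier, later in zip(secs, secs[1:])]


# The four timestamps listed above for the Skyblock video.
skyblock_mentions = ["0:06", "3:23", "7:10", "18:31"]

gaps = mention_gaps(skyblock_mentions)
longest_gap = max(gaps)
average_gap = sum(gaps) / len(gaps)

print("gaps (s):", gaps)
print("longest gap (s):", longest_gap)
print("average gap (s):", round(average_gap))
```

Run against your own draft, the longest gap is the number to watch: it marks the stretch where a viewer is most likely to lose track of what's at risk.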
And when stakes evaporate, the video becomes structurally equivalent to a creator just playing the game. The reason to keep watching shrinks to "this is mildly entertaining" instead of "I need to find out if they make it." Mildly-entertaining loses the algorithm fight against actually-tense.
What any gaming creator can copy
The Skyblock video is a $100k+ production with a 200-person team and a brand built on this exact format. You can't copy any of that. What you can copy is the script-level structure that produces the retention edge:
1. Name the consequence in your first sentence
Not "today I'm playing X." Not "this challenge is going to be hard." The first sentence should name both what you're trying to do AND what happens if you don't. "I'll sit in a tub of mustard if I fail" works. "I'll donate $500 to my chat if I fail" works. "I have to delete my Steam library if I fail" works. The specific consequence doesn't matter as long as it's tangible and the viewer can picture it.
2. Make the consequence cheap-to-film
MrBeast can afford expensive stakes. You probably can't, and you don't need to. Pick a consequence that costs nothing to execute but is visible on camera — eating something gross, doing pushups, dyeing your hair, deleting a save file the audience watched you build. The point isn't the production value of the punishment; it's that the punishment is something the audience can mentally rehearse during the video.
3. Reinforce on a 5-minute cadence
Put markers on your edit timeline: every 5 minutes of runtime, the stakes have to come back into view. This can be:
- A short verbal callback ("we're 40 minutes in and that mustard tub is starting to look closer")
- A B-roll cutaway to the consequence itself
- An on-screen graphic restating what's at risk
- A character reaction to a near-miss that re-invokes the threat
Pick one method as your default and commit to it, then swap in another now and then. Repetition makes stakes feel sustained; variation makes them feel real.
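If you want those checkpoints written down before you start cutting, a few lines of code can list them. This is a rough sketch: the 22-minute runtime is a made-up example, and `cadence_markers` is a hypothetical helper, not a feature of any editing software.

```python
def cadence_markers(runtime_seconds: int, interval_seconds: int = 300) -> list[str]:
    """List the timestamps by which the stakes should have come back into view."""
    markers = []
    t = interval_seconds
    while t < runtime_seconds:
        markers.append(f"{t // 60}:{t % 60:02d}")
        t += interval_seconds
    return markers


# Hypothetical 22-minute video with the default 5-minute cadence.
print(cadence_markers(22 * 60))  # ['5:00', '10:00', '15:00', '20:00']
```

Treat each marker as a deadline rather than an exact cue point: the callback can land anywhere before it, as long as no 5-minute window goes by without one.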
4. Land the resolution explicitly
When the video ends, the consequence should resolve on camera: either the win condition lands and the punishment is dodged, or it doesn't and the punishment plays out. Don't leave it implied. The audience has been holding the stakes in their head the entire runtime; depriving them of the payoff is one of the costliest mistakes creators make on this format.
The stakes-craft checklist
Before you upload your next long-form gaming video, run it against the five-question audit we ran the Skyblock video against:
- Are stakes named in the first 10 seconds? (Skyblock: 6 seconds. Floor: 10.)
- Is the consequence tangible and visualisable? (Mustard tub: yes. "I'll be sad": no.)
- Do stakes get reinforced at least once every 5 minutes? (Skyblock: 4 mentions across 20+ minutes, counting the opening statement.)
- Does the closing beat resolve the stakes on camera? (Win condition met / punishment paid — explicitly visible.)
- Could a viewer who joined at minute 10 still tell you what's at stake from context? (Skyblock: yes, because of the reinforcement cadence.)
If any of those answers is "no", you've found a structural lever you can pull. Fix the highest-priority one first (stakes naming in the hook) and ship. The retention move is small individually but compounds across uploads.
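The audit is simple enough to keep as a small script next to your project files. The sketch below is one possible encoding, assuming you fill in the five answers by hand after watching your cut; the `StakesAudit` structure and the example draft values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class StakesAudit:
    """Answers to the five checklist questions for one video (hypothetical structure)."""
    stakes_named_at_seconds: float  # when the consequence is first stated
    consequence_is_tangible: bool   # can the viewer picture it?
    longest_gap_seconds: float      # longest stretch with no stakes mention
    resolved_on_camera: bool        # win or punishment shown explicitly
    clear_to_late_joiner: bool      # stakes recoverable from context at minute 10


def failed_checks(audit: StakesAudit) -> list[str]:
    """Return the failed checks in checklist order, hook first."""
    checks = [
        ("stakes named in the first 10 seconds", audit.stakes_named_at_seconds <= 10),
        ("consequence is tangible", audit.consequence_is_tangible),
        ("stakes reinforced at least every 5 minutes", audit.longest_gap_seconds <= 300),
        ("stakes resolved on camera", audit.resolved_on_camera),
        ("stakes clear to a late joiner", audit.clear_to_late_joiner),
    ]
    return [name for name, passed in checks if not passed]


# Made-up draft: stakes land at 0:45 and then vanish for a 10-minute stretch.
draft = StakesAudit(45, True, 600, True, False)
for problem in failed_checks(draft):
    print("fix:", problem)
```

Because the hook check is listed first, the first line printed is the one to fix before anything else.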
Want your own video scored against the gaming benchmark?
Upload your video to Retti and we'll break down its retention beat by beat — stakes persistence included — and show you exactly where it holds and where it leaks.
Related
- YouTube retention: the complete guide
- The Preston retention system (long-form gaming)
- Long-form gaming RPM
- First-sentence hooks: what the data says
- How to improve YouTube retention
- How to increase YouTube AVD
Frequently asked questions
What makes MrBeast's Skyblock video retain so well?
MrBeast called it his highest-retention gaming video, and the standout reason is stakes persistence. The mustard-tub consequence is named in the first few seconds and brought back as a recurring beat throughout the runtime, so the tension never evaporates. That's the dimension where it most clearly outclasses the typical gaming video — everything else it does is roughly format-standard.
How fast did the video establish stakes?
Stakes land at the 6-second mark — the mustard tub consequence is named before the intro even ends. Most gaming videos take 30-60 seconds to establish what's at risk. The consequence then gets reinforced at 3:23, 7:10, and 18:31 throughout the runtime, treating it as a recurring beat rather than a hook detail.
Can a smaller creator copy this approach?
Yes. The structural moves are size-independent. Stakes persistence doesn't require a $100k production budget; it requires a script choice to name the consequence in the first sentence and bring it back every few minutes. The video's "I will sit in a mustard tub if I fail" is the same move a small creator can make: a tangible, low-cost loss that's easy to film.
Is this about production budget or script craft?
Script craft. The retention edge here comes from the structural moves visible in the script, audio, and pacing — naming a tangible consequence fast and keeping it alive — not from view count, channel size, or packaging. That's exactly why a creator of any size can borrow it.
What's the biggest mistake gaming creators make on stakes?
Setting them up in the hook and then forgetting them. The Skyblock video doesn't do that: it puts the mustard tub in front of the viewer four separate times across 20+ minutes. Most gaming videos plant a consequence at 0:30 and never mention it again, so by minute 5 the stakes have evaporated from the viewer's mind and the video has no tension floor.