How Fern writes YouTube documentary hooks — a 5-rule blueprint

Fern's documentary channel does roughly 20 million long-form views a month at $10-20+ RPM. We pulled his last 20 videos into Retti and analysed the cold opens. Every single one follows the same five rules. Once you see the pattern, you can apply it to your own scripts in an evening. For the broader framework this fits inside, see our complete YouTube retention guide.

Why documentary hooks are worth studying

Long-form documentary content has the hardest retention task on YouTube. You're asking viewers to commit 20-40 minutes to a video where the payoff is usually intellectual rather than visceral. There's no "kill cam" or "they got hit by the slime" moment to deliver fast dopamine. If the hook doesn't work, the rest of the video doesn't get watched, and the format only works economically when the rest of the video gets watched in full.

So the hook craft on these channels is tight, by necessity. Fern's hook formula isn't unique to him: most of the top long-form documentary channels (Lemmino, Wendover, Cool Worlds, and in some cases CGP Grey) converge on similar moves. Fern's is the most consistent across his recent uploads, which is what makes it easy to extract.

The five-rule blueprint

Rule 1: Drop into a scene already in progress

No introduction. No "in this video we're going to look at…" No channel name, no greeting, no "subscribe to my channel". The first frame puts the viewer inside something that's already happening.

Picture an opening in which fighter jets are taking off. The viewer doesn't know what the video is about yet. They only know that fighter jets are taking off. The video is happening; the explanation will come later.

Rule 2: Name a specific character

Not "an investor." A specific person with a name, an age, and ideally a role. "47-year-old hedge fund manager James Chanos." "Sergeant David Wright of the Royal Marines." "Pilot Sully Sullenberger." The audience attaches to specifics, not categories. Even when the "character" is an institution or a country, give it a specific actor that represents it.

Rule 3: Pin time and place

A specific date, a specific city, a specific room. "It's 11am, October 14th, in a small office on the 6th floor of the Manhattan headquarters of Lehman Brothers." The granularity does two jobs. First, it tells the viewer the story is real (vague stories feel made up). Second, it gives the brain a visual anchor — they can picture the scene rather than abstractly considering it.

Rule 4: Drop in immediate stakes, BEFORE any explanation

By the end of the cold open, the viewer should know that something serious is at risk. They should NOT yet know what the video's larger thesis is. The stakes come first; the context comes second. This inverts what most YouTubers do.

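Take a cold open like "It's 11am. 47-year-old X is at his desk when his phone starts ringing, and he's about to make a decision that costs him everything." We don't know who X is. We don't know what the decision is. We don't know what "everything" means in this context. All we know is that something irreversible is about to happen. That's all the cold open needs.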

Rule 5: Use present tense, and withhold one key piece

Present tense ("the phone is ringing") forces the viewer to experience the scene, not be told about it. Past tense ("the phone rang") creates narrative distance — the viewer becomes an audience member at a recap instead of a person living through the event.

Then: deliberately withhold one specific piece of information the viewer wants. The name of the company. The amount of money. The cause of death. The thing the person doesn't know yet. The audience now has a question they actively want answered, and the rest of the video is the structure that answers it.

How to apply this to your next script

The mistake most non-documentary creators make when trying to copy this is reaching for it stylistically — they try to sound like a documentary narrator. The opposite is the better path: keep your normal voice, but restructure the order of information in your cold open to match the blueprint.

Step 1: Write the scene before you write the premise

Before you decide what the video is "about", find one concrete scene in the story that could plausibly play out in front of a camera. A meeting. A phone call. A moment. The video is going to drop into that scene first. The premise comes later.

Step 2: Identify the character carrying the scene

Who's in the scene? Name them. If they don't have a real name, give them a role and an age ("a 32-year-old engineer"). Specificity matters more than the name itself.

Step 3: Time-stamp it

When does the scene happen? Don't write "in the 1990s." Write "March 1993." Don't write "in the morning." Write "around 7:30am." This costs nothing, and the added specificity helps retention.

Step 4: Find your withheld piece

What's the single most interesting fact about the scene that the audience CAN'T know yet without the rest of the video? That's your withheld piece. Plant the question; deliver the answer at the act break, the climax, or the closing beat depending on the video's pacing.

Step 5: Rewrite into present tense

Read your cold open aloud. Every past-tense verb gets converted to present. Every "would" becomes "is about to". Every "remembered" becomes "is realising". The shift can feel jarring while you're writing it — it reads almost like fiction. That's the point.
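The mechanical parts of Steps 3 and 5 can be roughly spot-checked with a script before you read the draft aloud. The sketch below is a minimal heuristic, not how Retti's Hook Review or Fern's process works: the function name `lint_cold_open` and all three phrase lists are hypothetical and deliberately tiny, so treat the output as a prompt to reread a line, not a verdict.

```python
import re
from typing import List

# Illustrative assumptions only: these lists are small samples built from the
# examples in this guide, not a complete or official set of patterns.
INTRO_PHRASES = ("in this video", "today we're going to", "subscribe", "welcome back")
VAGUE_TIME = re.compile(r"\bin the (?:\d{4}s|morning|afternoon|evening)\b", re.IGNORECASE)
PAST_MARKERS = re.compile(r"\b(?:was|were|had|would|rang|remembered)\b", re.IGNORECASE)


def lint_cold_open(text: str) -> List[str]:
    """Return human-readable warnings for a cold-open draft."""
    warnings = []
    lowered = text.lower()
    # Rule 1: no introductions or greetings.
    for phrase in INTRO_PHRASES:
        if phrase in lowered:
            warnings.append(f"intro phrase: {phrase!r}")
    # Step 3: flag vague time references such as "in the 1990s".
    for match in VAGUE_TIME.finditer(text):
        warnings.append(f"vague time: {match.group(0)!r}")
    # Step 5: flag a few common past-tense markers.
    for match in PAST_MARKERS.finditer(text):
        warnings.append(f"past-tense marker: {match.group(0)!r}")
    return warnings


if __name__ == "__main__":
    draft = "In this video we look at what happened in the 1990s. The phone rang."
    for warning in lint_cold_open(draft):
        print(warning)
```

A word list like this will miss most past-tense verbs and will flag some legitimate uses, which is why reading the cold open aloud remains the real test.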

A before/after rewrite

The same content, written badly and then in the Fern blueprint:

Same story. Same facts. The before version asks the viewer to commit because the topic is interesting. The after version asks them to commit because the scene is happening right now and they want to know what happens next. The latter wins on retention every time.

When NOT to use this hook style

The blueprint isn't universal. Three formats where it backfires:

Tutorial / how-to videos. Viewers want to know the steps. Dropping into a scene delays the practical value they came for.
Reaction / commentary content. Your face + voice IS the hook. Scene-setting feels artificial.
News / breaking-event videos. Speed matters; viewers want the headline first. Scene-craft can come after the headline lands.

The blueprint works for any video where the value the viewer is paying for is the story, not the information: documentaries, video essays, deep-dive case studies, long-form narrative gaming. Past those, mix and match.

Want Retti to score your hook against Fern's blueprint?

Drop your video into Hook Review and we'll grade the first 30 seconds against the documentary retention dataset — pinpointing exactly which of the 5 rules you're missing.

Frequently asked questions

What makes documentary YouTube hooks different from regular video hooks?

Most YouTube hooks introduce — they tell you what the video is about. Documentary hooks drop you INTO an already-happening scene. You're not told "this video is about X"; you're shown X happening in present tense, with a specific person doing a specific thing in a specific place. The framing is novelistic, not explanatory.

Why does dropping into a scene work better than introducing one?

Because introducing a scene asks the viewer to commit before they've been hooked. "This video is about the 2008 financial crisis" requires the viewer to be already interested in the 2008 financial crisis. "It's 6am. The trading floor is empty. One person is at his desk." makes you want to know what's happening regardless of whether you cared about the 2008 financial crisis before you clicked.

Can a non-documentary channel use this hook formula?

Most of it, yes. The "specific character + specific time + specific place + present tense" formula adapts cleanly to gaming, vlogs, finance, and educational content. Where it gets harder is the "withhold one piece" move — that requires either a story with an actual reveal in it OR a piece of information you can credibly delay. Tutorial-style videos don't fit as cleanly because the value prop is the explanation, not the reveal.

How long should a documentary hook run before the title card?

In Fern's videos the cold open averages 30-50 seconds before the title/intro lands. That's long by YouTube hook standards, but it works for documentaries because the cold open is itself the hook: you're not waiting for the premise to arrive, you're watching it unfold. The retention curve through those 30-50 seconds tends to be flat, not declining.
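To turn that 30-50 second window into a rough word budget for a script, a quick calculation helps. This is a sketch under a stated assumption: the default pace of 150 words per minute is a placeholder, not a figure measured from Fern's videos, and the helper name `words_for_duration` is hypothetical. Time your own narration and substitute your real pace.

```python
def words_for_duration(seconds: float, words_per_minute: float = 150.0) -> int:
    """Approximate word budget for narration of the given length.

    words_per_minute is an assumed speaking pace; replace it with your own.
    """
    return round(seconds * words_per_minute / 60.0)


if __name__ == "__main__":
    low = words_for_duration(30)
    high = words_for_duration(50)
    print(f"Cold open budget: {low}-{high} words")
```

At the assumed pace this gives roughly 75-125 words, so a cold-open draft that runs well past that is likely to overshoot the window.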

What's the biggest mistake creators make trying to copy this style?+

Reading their hook in narrator voice instead of writing it in scene. "Today we're going to look at how X happened" is narrator voice. "It's 11am. 47-year-old X is at his desk when his phone starts ringing — and he's about to make a decision that costs him everything" is scene. The shift is in the verb tense and the specificity, not the voice acting.