How AI puts you on the jumbotron, explained

By the ActualCoolThings team | June 4, 2026 | 6 min read

Every few days someone replies to one of our videos with the same question: “wait, how is this actually made?” Fair question. The clip looks like real broadcast footage (stadium lights, a score bug in the corner, a crowd you can almost hear), but it was built from a single selfie in about a minute. Here's exactly how, including the version that didn't work and why we threw it out.

One of our own outputs. Notice the person is genuinely inside the scene — not pasted over a background. That's the whole trick.

The version we tried first (and scrapped)

The obvious approach is one step: hand a selfie straight to an image-to-video model and say “put this person on a stadium jumbotron, make it move.” We built that first. It looked fake, and consistently in the same way: because most selfies are tight, well-lit head-and-shoulders shots, the model treated the person like a cut-out and rendered the stadium as a flat backdrop behind them — green-screen energy. The lighting on the face never matched the scene, so your brain instantly flagged it as composited.

The lesson was simple but important: you can't animate your way to realism if the still frame already looks pasted-on. The realism has to be baked into the image before anything moves.

So we split it into two steps

What actually ships today is a two-stage pipeline, and each stage does one job well.

Step 1 — put you in the stadium (a still)

First we run an AI image editor that takes your selfie and re-renders it as a single broadcast still: you, on the big screen, in a real-looking ballpark, with a summer evening, telephoto crowd-cam framing, and an ESPN/Fox-style score bug in the corner. Crucially, the model relights your face to match the scene and places you within the environment rather than in front of it. This is the step that kills the green-screen look. If the still is convincing, you're 90% of the way there.

Step 2 — bring the still to life (the video)

Then we feed that finished still into an image-to-video model (we use Kling 3.0 via Magic Hour) to add motion — subtle crowd movement, a touch of camera life, the micro-expressions that sell “this is a live shot.” Because the model is animating a frame that already looks real, the motion reinforces the illusion instead of exposing it. The clip renders at roughly 8 seconds in 16:9 widescreen.
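For readers who think in code, the two stages chain together like a simple function composition. Here is a minimal sketch in Python, assuming two hypothetical callables (`edit_still` and `animate_still`) that wrap whatever image editor and image-to-video service you use. None of these names come from Kling's or Magic Hour's actual APIs, the prompt string is made up for illustration, and the 8-second, 16:9 defaults simply mirror the figures above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ClipSettings:
    # Values taken from the article: roughly 8 seconds, 16:9 widescreen.
    duration_seconds: int = 8
    aspect_ratio: str = "16:9"


def make_jumbotron_clip(
    selfie: bytes,
    edit_still: Callable[[bytes, str], bytes],
    animate_still: Callable[[bytes, ClipSettings], bytes],
    settings: ClipSettings = ClipSettings(),
) -> bytes:
    """Run the two stages in order: selfie -> broadcast still -> video."""
    # Hypothetical prompt; the real one is not given in the article.
    prompt = "person on a stadium jumbotron, summer evening, broadcast still"
    still = edit_still(selfie, prompt)     # Stage 1: relight and place in scene
    return animate_still(still, settings)  # Stage 2: add motion to the still
```

Keeping each stage behind its own callable means the still can be inspected, or regenerated, before any video compute is spent on it.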

Why 16:9 and not a vertical clip

We actually started in 9:16 vertical (it's what phones shoot, and it's native to TikTok). We switched to 16:9 widescreen on purpose. A real jumbotron broadcast is horizontal, so a vertical frame subtly reads as “phone clip” and undercuts the “caught on TV” effect. The widescreen frame, with the score bug and the letterboxing of a broadcast, does a lot of the believability work before the viewer even processes the content. Small choice, big payoff.

The genuinely hard parts

Two things are hard and worth being honest about:

Keeping your face your face. Identity drift — where the output looks like a cousin of you rather than you — is the number-one failure mode. It gets worse with low-light selfies, sunglasses, heavy filters, or photos where your face is small in the frame. A clear, front-facing, well-lit selfie is the single biggest thing you control. We wrote a whole guide on taking the perfect selfie for this.
Cost and time per clip. Generating video is not free or instant: there's real compute behind every render, which is why it takes a few minutes rather than a few seconds. We'd rather wait and ship something that looks real than rush and ship something that looks like a meme template.
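Two of the selfie problems above (a face that is small in the frame, and low light) can be caught before a render is ever started. The sketch below is one way to do that in Python. It assumes you already have the face bounding box and an average brightness from some other tool, and the thresholds are illustrative guesses, not values we have published.

```python
from dataclasses import dataclass


@dataclass
class SelfieStats:
    image_width: int
    image_height: int
    face_width: int         # face bounding box, in pixels
    face_height: int
    mean_brightness: float  # 0.0 (black) to 1.0 (white)


# Illustrative thresholds only; tune them against real results.
MIN_FACE_AREA_FRACTION = 0.10
MIN_BRIGHTNESS = 0.25


def selfie_warnings(stats: SelfieStats) -> list[str]:
    """Return warnings for selfies that are likely to cause identity drift."""
    warnings = []
    face_area = stats.face_width * stats.face_height
    image_area = stats.image_width * stats.image_height
    if image_area <= 0 or face_area / image_area < MIN_FACE_AREA_FRACTION:
        warnings.append("face is small in the frame")
    if stats.mean_brightness < MIN_BRIGHTNESS:
        warnings.append("photo is too dark")
    return warnings
```

A check like this only covers what can be measured cheaply. Sunglasses and heavy filters would need a separate detector.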

Why we text you the result instead of making you wait

Because step 2 takes a few minutes, staring at a loading bar is a bad experience. So we ask for your number, kick off the render, and text you the link the moment it's ready — you can close the tab and go on with your day. (We only ever message you about your video; reply STOP and we stop. More on that in our privacy policy.)
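The "kick off the render, then text the link" flow is a standard start-poll-notify pattern. Here is a rough sketch of it in Python, with every outside dependency passed in as a hypothetical callable: `start_render`, `get_video_url`, and `send_text` are placeholders, not real render or SMS APIs, and the message wording and timing defaults are assumptions.

```python
import time
from typing import Callable, Optional


def render_and_notify(
    start_render: Callable[[], str],
    get_video_url: Callable[[str], Optional[str]],
    send_text: Callable[[str, str], None],
    phone_number: str,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> bool:
    """Start a render, poll until it finishes, then text the link.

    Returns True if the link was sent, False if polling timed out.
    """
    job_id = start_render()
    for _ in range(max_polls):
        url = get_video_url(job_id)  # None while the render is still running
        if url is not None:
            send_text(phone_number, f"Your video is ready: {url}")
            return True
        time.sleep(poll_seconds)
    return False
```

For the user to be able to close the tab, a loop like this has to run on the server (for example in a background worker), not in the browser.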

Is it “real”? No — and we never pretend otherwise

Let's be clear: these are AI-generated entertainment, not footage from an actual game. There's a small onthetron.com mark on every clip, and we say so up front. The fun isn't in fooling anyone; it's in the “ha, that looks exactly like the real fan cam” reaction when you send it to the group chat.

That's the whole pipeline. If you want to see it run, the easiest way is to just make one with your own selfie: it's free and takes about a minute.

Make your own