As some of you may know, my big project for the spring was producing a video. COVID led Georgetown University (of which I am an alumnus) to cancel in-person events such as the John Carroll Awards. Usually the Chimes, an a cappella group, participates in the quinquennial class reunions; this year, everything would have to be virtual. This is the behind-the-scenes story of how a virtual video starring 104 people came together. Think of it as analogous to a DVD commentary.
We initially tried singing together on Zoom. It was an unmitigated disaster. Between the latency and the fact that Zoom muted whoever wasn’t the loudest user, live performance over Zoom was impossible. After looking at a variety of apps and finding none that satisfied our needs, we fell back on old-fashioned asynchronous recording.
The first problem was timing. How do you ensure that everyone is singing at the same time and at the right tempo?
One option was to have people sing to a metronome. Another was a virtual conductor. But we needed neither: with a 74-year-old catalogue to draw from, we picked an existing recording of the song whose tempo could serve as the “pace car” for everyone’s recordings.
Hollywood uses a device called the clapboard to synchronize audio and video in postproduction, and we needed something similar if we wanted any chance of synchronizing who-knows-how-many people (n.b., wireless earbuds such as AirPods introduce latency). Since there’s no reason a normal person would have a clapboard at home, people simply clapped at the beginning of each recording. That clap made it possible to line up the sound waves in postproduction.
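We lined the waveforms up by eye in the editor, but the same clap trick can be automated. Here is a minimal sketch (not part of our actual workflow) that estimates the offset between two recordings by cross-correlating their first few seconds, where the clap lives; the filenames and the soundfile dependency are assumptions.

```python
import numpy as np
import soundfile as sf  # assumed available for reading WAV files

def clap_offset(reference, other, sample_rate, search_seconds=5.0):
    """Estimate how many samples `other` starts after `reference`,
    by cross-correlating the first few seconds (which contain the clap)."""
    n = int(search_seconds * sample_rate)
    a = reference[:n] - np.mean(reference[:n])
    b = other[:n] - np.mean(other[:n])
    corr = np.correlate(b, a, mode="full")
    # Positive lag: the clap arrives `lag` samples later in `other` than in the reference.
    return int(np.argmax(corr)) - (len(a) - 1)

# Hypothetical filenames; both tracks are assumed mono at the same sample rate.
ref, rate = sf.read("singer_01.wav")
other, _ = sf.read("singer_02.wav")
lag = clap_offset(ref, other, rate)
aligned = other[lag:] if lag > 0 else np.concatenate([np.zeros(-lag), other])
```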
My compatriots George and Matt were responsible for gathering the recordings. They would hold one-on-one recorded Zoom calls where they could give feedback on lighting and sound. But we took any format, not just Zoom: whatever was easiest for the person recording.
George would then place the videos onto grids of 16 (“G-16”) before sending each grid to me for final assembly. I jerry-rigged an invisible grid (I didn’t learn until much later in the process that you can just drag guides out from the rulers in the Graphic panel!), which I then used to lay out titles for each participant. I used Adobe Caslon Pro since that was the primary formal font used for alumni events.
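For the curious, the layout math behind a G-16 is just a 4-by-4 subdivision of the frame. Here is a small sketch of that arithmetic; the 1920-by-1080 frame size is an assumption for illustration, not something pulled from the project files.

```python
# A minimal sketch of the layout math behind a 4-by-4 "G-16" grid.
# The 1920x1080 frame size is an assumption for illustration.
FRAME_W, FRAME_H = 1920, 1080
ROWS = COLS = 4

def cell_rect(index, frame_w=FRAME_W, frame_h=FRAME_H, rows=ROWS, cols=COLS):
    """Return (x, y, width, height) for participant `index` (0-15),
    filling the grid left to right, top to bottom."""
    cell_w, cell_h = frame_w // cols, frame_h // rows
    row, col = divmod(index, cols)
    return (col * cell_w, row * cell_h, cell_w, cell_h)

for i in range(ROWS * COLS):
    print(i, cell_rect(i))   # e.g. 5 -> (480, 270, 480, 270)
```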
After all the grids were in, the song was divided into roughly equal portions, with the video switching to a new G-16 wherever it made sense lyrically.
The audio layering was accidental. I was initially going to delink the audio from the video so that every audio track would play for the entirety of the video. But after placing the G-16s, we found that the audio building up as each new grid appeared made the video better, so it stayed.
I knew I wanted the final shot to include everyone in the video. Once we knew we were unlikely to fit perfectly into a 16-by-9 mosaic, we fiddled with various layouts and ultimately settled on a super-widescreen ending.
The final transition was directly inspired by, of all things, a 2009 recording of “Bohemian Rhapsody” by The Muppets.
Finally, the video was bookended with opening and closing titles. I considered a picture from the Dahlgren quadrangle, but instead used panoramic views of campus I’d previously taken, one from the Potomac River and the other from a rooftop.
What I’d do differently
Knowing what I know now, I would do a few things differently.
First, I would do more to teach people how to light their Zoom setups properly. Ideally, you’d have three lights: a key light, a fill light, and a backlight.
Second, I would use loudness normalization to set the sound level for final delivery. In the U.S., broadcast follows the ATSC A/85 recommendation, while Spotify is moving to ITU-R BS.1770-based measurement. Here is someone who actually knows what they’re talking about.
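For example, here is a minimal sketch of BS.1770-style loudness normalization using the pyloudnorm package. This wasn’t part of my workflow, and the -14 LUFS target and filenames are assumptions; use whatever target your delivery platform specifies.

```python
import soundfile as sf        # reads/writes audio files
import pyloudnorm as pyln     # BS.1770 loudness meter

# Hypothetical filename; -14 LUFS is an assumed target, not a universal spec.
data, rate = sf.read("final_mix.wav")

meter = pyln.Meter(rate)                            # ITU-R BS.1770 meter
loudness = meter.integrated_loudness(data)          # integrated loudness in LUFS
normalized = pyln.normalize.loudness(data, loudness, -14.0)

sf.write("final_mix_-14LUFS.wav", normalized, rate)
```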
Third, I would use what I’ve been told is called the “Texas Master.” You export once into the least lossy format possible, and it’s from that Texas Master that you then create the delivery formats: Blu-ray, YouTube, Vimeo, mobile, etc. Exporting to a Texas Master first limits the failure point to a single export.
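As a rough illustration of the idea (not how this video was actually delivered), here is a sketch that derives two delivery encodes from a single master with ffmpeg driven from Python; the filenames and encoding settings are assumptions.

```python
import subprocess

# Hypothetical filenames and settings. The idea: every delivery format is derived
# from the one "Texas Master" export rather than re-exported from the edit.
MASTER = "texas_master.mov"

deliverables = {
    # Web upload (YouTube/Vimeo): H.264 at a high-quality CRF
    "web_upload.mp4": ["-c:v", "libx264", "-crf", "18", "-preset", "slow",
                       "-c:a", "aac", "-b:a", "320k"],
    # Smaller mobile copy, scaled to 720p (width chosen automatically, kept even)
    "mobile_720p.mp4": ["-vf", "scale=-2:720", "-c:v", "libx264", "-crf", "23",
                        "-c:a", "aac", "-b:a", "192k"],
}

for out_name, opts in deliverables.items():
    subprocess.run(["ffmpeg", "-y", "-i", MASTER, *opts, out_name], check=True)
```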
Finally, if there were more time, I’d also cut a version of the video in portrait orientation, the main orientation we use when watching video on our phones. Artificial intelligence might make this an easy change in the future, but right now it’s a huge undertaking.
The Final Grid
One last note: since I wanted full resolution throughout the final transition, there’s actually a sequence that contains all 104 videos, measuring 12,480 by 4,320. Here it is, downscaled to 8K:
Shameless Plug
If you’re interested in hearing more, albums are available at the following links: