For most media archivists, preservation usually means handling tattered tomes and half-disintegrating film reels.
PodcastRE (pronounced “podcaster”) is an ambitious project to preserve podcasts, pre-recorded digital audio shows that some say are experiencing their “golden age.” The project is a first-of-its-kind effort to compile hundreds of thousands of audio files into a unique archive, and to craft a database for searching through and analyzing that material.
PodcastRE was among the projects that recently won funding from UW2020, an initiative backed by the Wisconsin Alumni Research Foundation to provide resources for innovative academic research.
According to Jeremy Morris, the communications professor who launched PodcastRE, the project is saving cultural artifacts that are at risk of being lost. Unlike film reels or canvas paintings, podcasts are not vulnerable to decay, fire, humidity or mold. The threat they face is simply getting deleted from a server.
Morris said that podcasts disappear from the internet all the time. Part of the inspiration for PodcastRE came to him when he saw MTV VJ turned podcasting pioneer Adam Curry asking people on Twitter if they knew where he could find early episodes of his show, “The Daily Source Code.”
“If you were to go back to 2005, and 2006, when podcasts first took off, it’s really hard to find some of that audio,” said Morris. “Like with anything on the web, there’s this push and pull — you think it’s going to be there forever, but when you go back to look for what you need, you can’t find it.”
Morris said that it’s important to begin creating such a database now, given that the early years of a medium are some of the most vulnerable. In the case of film, the Library of Congress has estimated that 75 percent of all silent movies from 1912 to 1929 are lost. (The UW’s own Wisconsin Center for Film and Theater Research is one of the world’s major archives of films, television shows and related materials.)
“We know from silent films or radio … how valuable these texts are,” said Morris.
In addition to collecting elusive shows from the medium’s early years, PodcastRE has spread its tentacles across the internet to capture the deluge of new content that comes out every day. Morris and his collaborators use a script that downloads batches of MP3s from thousands of podcast feeds, saving terabytes’ worth of audio to a hard drive in Vilas Hall.
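For readers curious what such a harvesting script might involve, here is a minimal sketch in Python. It is not the PodcastRE team's code, which the article does not describe in detail. It assumes the feeds are standard RSS 2.0 documents that list each episode's audio file in an `enclosure` element, and the function names `extract_audio_urls` and `download_batch` are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_audio_urls(feed_xml: str) -> list[str]:
    """Return the audio enclosure URLs found in an RSS feed document."""
    root = ET.fromstring(feed_xml)
    urls = []
    # Assumption: RSS 2.0 layout, where each episode's media file
    # is referenced by an <enclosure url="..." type="..."> element.
    for enclosure in root.iter("enclosure"):
        url = enclosure.get("url")
        mime = enclosure.get("type", "")
        if url and (mime.startswith("audio/") or url.lower().endswith(".mp3")):
            urls.append(url)
    return urls


def download_batch(urls: list[str], dest_dir: str) -> None:
    """Download each URL into dest_dir, skipping files already saved."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in urls:
        name = Path(urllib.parse.urlparse(url).path).name
        if not name:
            continue  # no usable file name in the URL path
        target = dest / name
        if target.exists():
            continue  # already archived on an earlier run
        urllib.request.urlretrieve(url, str(target))
```

A real archive would likely also need retry handling, rate limiting and a record of which feed each file came from; those are left out to keep the sketch short.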
The existing online database lets people easily search for and, in some cases, stream those podcasts. In that sense, the project functions like iTunes, Stitcher or any other “podcatcher.”
But Morris envisions PodcastRE as something that goes well beyond iTunes. He hopes to build out the database, which is still in “beta” mode, to include a wealth of information: transcripts, metadata about audio quality and details about who the hosts are.
Morris said that with that kind of sophisticated information, researchers could answer all kinds of questions. What current events were people talking about on podcasts released on August 20, 2013? What percentage of sports podcasts have women behind the microphone? How loud were podcasts in 2008 on average, compared to podcasts today?
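To make that concrete, here is a rough sketch of how two of those questions could be posed once such metadata exists. It uses Python's built-in `sqlite3` module and an invented table layout. The `episodes` table, its columns and the use of LUFS as a loudness measure are all assumptions made for illustration, not a description of PodcastRE's actual database.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an empty in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE episodes (
            id INTEGER PRIMARY KEY,
            title TEXT,
            category TEXT,
            released TEXT,        -- ISO date, e.g. '2013-08-20'
            loudness_lufs REAL    -- assumed loudness measure
        )
        """
    )
    return conn


def episodes_released_on(conn: sqlite3.Connection, day: str) -> list[str]:
    """Titles of episodes released on a given ISO date."""
    rows = conn.execute(
        "SELECT title FROM episodes WHERE released = ? ORDER BY title",
        (day,),
    )
    return [row[0] for row in rows]


def average_loudness_by_year(conn: sqlite3.Connection) -> dict[str, float]:
    """Mean loudness per release year, keyed by four-digit year."""
    rows = conn.execute(
        """
        SELECT substr(released, 1, 4) AS year, AVG(loudness_lufs)
        FROM episodes
        GROUP BY year
        ORDER BY year
        """
    )
    return {year: avg for year, avg in rows}
```

With the table populated, `episodes_released_on(conn, "2013-08-20")` would list that day's episodes, whose transcripts could then be searched for current events, and `average_loudness_by_year(conn)` would allow a comparison of 2008 with later years.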
Morris said he wanted to create such a resource in part because of his love of the medium. He hosted a culture podcast himself for many years in Montreal, appropriately called “Midnight Poutine.”
“My own enjoyment of the media is just how intimate it is,” he said. “For me, it’s always just been that audio brings story to life in different ways than text does.”
In the near term, Morris said that he and his team are focusing their energy on adding search functionality to the website to make it more navigable for researchers. He also hopes to work with UW-Madison libraries to bring PodcastRE into accordance with best practices for preservation.