In an Increasingly Interactive World, What Is the Future Technology of Audio Storytelling?

Published in Curious · 13 min read · Feb 16, 2021

Earlier this year, in a time now alien to many of us, I attended a Seattle XR conference on assignment. The event was hosted by Vulcan, a local and globally recognized documentary company started by the late Paul Allen, with the goal of raising awareness of conservation, scarcity, and injustice around the world. The main event space was crowded with makeshift booths advertising VR experiences, lines of over-the-eye headsets dotting the perimeter with opportunities to climb tropical flora, view traditional dance rituals, and see whales just off coral reefs.

This conference was about the future of immersive storytelling, featuring the heads of YouTube VR, HTC Vive, and Oculus, among others. At the center of panel discussions was the question of what VR, AR, and XR mean for the landscape of contemporary storytelling: what role do they play compared to typical film documentaries, how accessible are they as a medium, and how do you measure the tangible influence of an XR story? I heard numerous claims of the power of XR to reshape the model of storytelling in the future. One panelist even noted that VR is the future of empathic storytelling.

One thing I didn’t hear was how audio storytelling fits into this future — an interesting omission considering that an intimate and empathic experience has long been claimed as the superpower of audio.

So where does audio fit in this immersive, tech-driven future of storytelling? Will XR take that honor away from the audio world? Or will audio find ways to fit in as well? At the core of the XR revolution is not just a shakeup in the kind of stories told, or even the immersion. It’s about giving audiences the agency to change their reception of the story, to exist within the story world as an omniscient bystander. Luckily for audio producers, the core component driving the innovation in XR is an old radio trick: creating story spaces.

Audio technology today

When asking where audio can go next, we should start by asking where it is now. From there, the questions become why a story is told as a podcast rather than as video, print, or traditional radio, and what is driving the explosion in podcasting. In a nutshell, what does podcasting add?

Podcasting is reaching a pivotal moment. With estimates showing nearly 1.8 million podcasts circulating online, mainline programming is consolidating into a spread of production houses under Spotify, Apple, Wondery, and a bevy of others. What used to be a nascent sidecar industry is blooming into a media ecosystem that integrates content into our everyday consumption habits. Brands increasingly notice the reach of audio storytelling and are doubling down on sponsored podcasts, from travel tips to sleeping tips and employee management walkthroughs. Print-driven podcasting has been an especially disruptive force in the style and popularity of audio storytelling devices (yes, looking at you, New York Times, the good and the bad). And yet, even as the variety of podcasts saturates and production houses test innovations like paid, Netflix-styled subscription services, the technology at the center of audio storytelling has seen only incremental progress.

The question of podcasting’s future was put to Dawn Ostroff, head of content at Spotify, multiple times over the last year.

Spotify alone hosts over a million podcasts on its platform, and whether or not they come from its internal production studios, the decisions of the app largely guide the decisions of podcasters worldwide. Ostroff brought in Michelle Obama, Kim Kardashian, and Joe Rogan to blockbuster Spotify deals, while also pushing Parcast to produce its first daily true-crime podcast, which at the time was pioneering (and dailies, regardless of genre, are now a standard for any studio’s portfolio). By leveraging its massive global audience, the platform is offering digital analytic tools to help find content holes and drive smarter advertising, making the traditionally free industry more profitable. Thanks to its immense user base, the company has the numbers to point to yet unsaturated genres and is pioneering the analytical markers of podcast success.

“If you focus on a specific audience and program for them successfully, inevitably what you end up with is a broader piece of entertainment,” Ostroff told the Wall Street Journal last year. Her focus is on understanding younger listeners, who are voracious content consumers: what they want to listen to, where they listen from, and how podcasts can enter into cultural conversation as casually as The Mandalorian or Stranger Things. Organizations looking to break into audio have found that podcasting balances valuable engagement with flexibility.

That flexibility is something Ostroff is also taking aim at, in the form of what we might consider podcasting’s bread and butter: passive listenership. Certainly, podcasting (and radio, for that matter) has always been a medium of the commuter and the busy. Terrestrial radios built into cars have fueled generations of “drive time” programming, from NPR’s All Things Considered and Morning Edition to regional power-hour programming driven by hosts and talk show interviews. Podcasting was a hyper-innovative approach to filling the nooks and crannies of our days with undemanding programming, shows which captured our passive attention as a welcome break from a world of screens. Intimacy and empathy still reigned supreme, “driveway moments” becoming cleaning, walking, and biking moments.

But since podcasting began to take over audio listenership, the technology pushing it in new directions has been slow to innovate. For companies like Spotify, the growing push to develop hit podcasts into television or movie adaptations is a newly dedicated avenue to change the way we consume audio stories. But this takes the passivity of listening and fills it with focus, putting you in front of a screen, directing your ears to hear according to what you see (which our brain has a natural bias for). Even with the mobility of video in today’s technology, doesn’t this feel like it takes something away from the core experience of audio storytelling? More hours of content, yes; more opportunities to engage in your favorite podcasts, sure. But the untethered space, the imagined faces and places and spaces in audio: it’s this lack of specifics and biased visual directives that colors audio.

That said, the idea of pushing podcasting past the passive and into the intentional is what really grabs my attention here. Many of us don’t just sit down to listen to a podcast, and many podcasts aren’t asking us to. What the current pandemic has shown, however, is that podcasts can be a mainstay in our home lives, separate from the bustle of commutes and errands. While listenership for mainstay public radio programming dropped with the lack of a morning and evening drive time, podcast listenership continued to grow at home, a positive sign that podcasting has evolved to be more than its assumed mode and context. People are continuing to listen to podcasts during the mundane daily chores of pandemic life, but they’re also listening because they just love podcasts — an exciting revelation.

Virtual reality and what podcasting can learn

We’ve now seen how podcasting has evolved past the commute model, hinting at a deeper attraction to audio storytelling that transcends the places in which stories are consumed. While XR producers are thinking of the spatiality of content production, so too should audio producers look toward the presence of radio in the home as a ripe opportunity for transforming the experience of audio in space.

In Seattle, I participated in experiences ranging from a listless floating with blue whales to the life cycle of a temperate rainforest. Many, like these, depended on the ambiance of nature to provide immersion. The highlight of this conference, though, was a Q&A with Oscar-winning director Roger Ross Williams about his interactive short film Traveling While Black, which seated viewers in conversations and cars throughout the U.S. South at the height of Jim Crow.

Williams told the audience that the film went through a number of near-total revisions as they tried to understand the best use for XR’s shift in narratorial perspective. The camera no longer told you what to focus on, shifting the viewer’s relationship with characters and setting. Virtual reality meant a newly discovered freedom, which let the mind wander outside of directorial control. This is the core of XR content: the creation of a reception room where viewers are placed within a controlled but malleable atmosphere. For XR creators, the challenge is turning the camera away from a focused target and instead considering the total atmosphere of an imagined room — one which requires bustling actors in the background, tertiary sounds telling their own tiny stories, and attention to detail in all the corners outside of the primary narrative space. Now that the viewer has the autonomy within the story, the depth of composition — and an offering of interactive elements — is at the center of a story experience. On a fundamental level, XR is about choice.

Audio storytelling, thanks to its lack of prescriptive image-based storytelling, has long been in the business of creating similar imagined spaces through music, sound, and contextual narration. Listening, by its nature, is untethered, meaning the experience of a story from one listener to another is abstracted in unique, individual ways. What audio storytelling currently lacks from VR is the ability — or practical means — for listeners to control their interaction within that space.

Smart speaker technology is the early solution allowing for a more personalized audio experience, but most of these models are following in the footsteps of well-established interactive practices. That’s to say that quiz options, the ability to search and skip through stories, and the direct phone line models of Alexa skills are novel to the smart speaker experience, but not to the audio world as a whole. They aren’t the next great platform innovation that changes the way podcasts are consumed, as podcasting itself was, but one that changes the way they’re interacted with. The proliferation of smart technology is a critical confluence of both a listening and speaking apparatus, and one which offers the opportunity for listeners to give immediate engagement, and feedback, within a project. Radio, as driven by call-ins for over 100 years, is primed for this level of storytelling as molded and refracted by a listener.

Potential interaction models for audio

Let’s look at a couple of ways this interaction could take place. Interactive audio storytelling will have to be different from the kind of space in Traveling While Black, because while audio creates an imagined space, navigating through that space without visual cues would be a niche of virtual audio storytelling that diverges from the aim of this article (there’s something to be said about the potential of binaural proxemics here, but that’s for another time). What I’m emphasizing here is how audio can create similar models for autonomous interaction within a story space. With XR as a model, journalists can continue to have editorial jurisdiction over the story while allowing listeners to do their own roaming with their own voice.

There are two interesting thought experiments with this.

First, the “choose-your-own-adventure” style of storytelling could transform the experience and perception of storytelling for producers and consumers. These would be stories with multiple avenues for experiencing perspectives within a story. Picture a central event with five alternate experiences of that event, something like Rashomon. Listeners can choose among these experiences, and as they hear each individual arc, they learn more about the story, and what they learn will color and shade their perceptions of the arcs they hear next and those they heard prior. Another facet of this could be the profile, where listeners are presented with an introduction to a character but then choose which points in that character’s timeline to jump to next, gradually shading in the listener’s perception of the story.

Stories like this could upend the choices that go into the crafting of narrative, instead democratizing story construction. This does two things. First, listeners now experience stories and characters outside of the polished and forced direction of the storyteller, in essence transferring some of the power of narrative shaping from the creator to the receiver. Second, this process would ask listeners to think critically about the decisions they make as they progress through a story, because those decisions will shape their perception of it. When they hear a classic rags-to-riches story, do listeners prefer to skip to the end and hear of the subject’s success first, before the failures? Do they want to escape failure once they’ve heard it and jump to success? This approach (optimistically, I’ll admit) mimics the ways in which we hear stories and learn of people in real life: in bits and segments, second- and thirdhand accounts, the transparent firsts and reserved lasts. While moving these decisions from the storyteller to the listener sidelines the seasoned expert in favor of the once-passive listener, it also shifts the burden of composition, in turn asking listeners to think about their bias and responsibility in creating the stories we prefer to hear about others.

There are additional benefits for the story crafters here, too. If, once such a project comes to fruition, producers use the data from mass listenership, they can gain useful insights into the preferred story structures of their audience. At the same time, by asking listeners to think retrospectively on the story, there is potential for listeners to analyze the biases that led them to think that way. The listener also gains a renewed understanding of why they like the stories they do and why they chose the storylines they did. It arms storytellers with greater insights into how listeners want stories molded, while also asking listeners to think about why they choose the stories they do. There are introspective and outward-facing benefits for both parties.
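As a rough illustration of this first thought experiment, the Python sketch below models a story told from several perspectives and tallies the order in which listeners move through it. Everything in it is hypothetical: the segment names, the `record_path` helper, and the sample sessions are invented for illustration and do not describe any existing platform or dataset.

```python
from collections import Counter

# Hypothetical story: one central event told from five perspectives.
SEGMENTS = {"witness", "officer", "shopkeeper", "reporter", "bystander"}

def record_path(choices):
    """Validate one listener's sequence of choices and return it as a tuple."""
    path = tuple(choices)
    unknown = [c for c in path if c not in SEGMENTS]
    if unknown:
        raise ValueError(f"unknown segments: {unknown}")
    if len(set(path)) != len(path):
        raise ValueError("a segment was chosen more than once")
    return path

def first_choice_counts(paths):
    """Count which perspective listeners chose to hear first."""
    return Counter(p[0] for p in paths if p)

def most_common_order(paths):
    """Return the most frequent complete listening order and its count."""
    return Counter(paths).most_common(1)[0]

# Invented sample sessions, purely for illustration.
sessions = [
    ["witness", "officer", "reporter"],
    ["witness", "reporter"],
    ["officer", "witness", "reporter"],
    ["witness", "officer", "reporter"],
]
paths = [record_path(s) for s in sessions]
print(first_choice_counts(paths))
print(most_common_order(paths))
```

Even a tally this simple hints at the kind of question producers could ask: which voice do listeners reach for first, and which sequence do they settle into most often?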

By increasing this level of interactivity, we can also begin to conceive of how personalized content can address biases even more directly. This is the second experiment of an interactive audio experience. What if, when opening a story, you were first prompted by a question: are you a climate believer, or a climate denier? Are you a Democrat, an Independent, or a Republican? If you saw someone struggling on the street, would you help them, or walk by? What follows this initial question is an even more personalized story that builds around the biases of the listener and creates a fork in the social experience, fostering more individuated experiences. At the close or start, a listener could then hear how many of their shared demographic chose their side of the story, whether or not they felt challenged to change their opinion, and how their experience of the story made them reflect on their chosen bias. It is both an opportunity for individual story experiences and a more connected social experience.
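To sketch how the closing summary in this second experiment might be computed, here is a small Python example. It is only an assumption about one possible design: the `Response` record, the `cohort_summary` function, and the sample answers are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Response:
    answer: str       # the listener's answer to the opening question
    challenged: bool  # whether the listener reported feeling challenged

def cohort_summary(responses, answer):
    """Share of all listeners who gave `answer`, and the share of that
    cohort who reported feeling challenged by the story."""
    cohort = [r for r in responses if r.answer == answer]
    if not cohort:
        return {"share": 0.0, "challenged": 0.0}
    return {
        "share": len(cohort) / len(responses),
        "challenged": sum(r.challenged for r in cohort) / len(cohort),
    }

# Invented sample responses, purely for illustration.
responses = [
    Response("help", True),
    Response("help", False),
    Response("walk by", True),
    Response("help", True),
]
print(cohort_summary(responses, "help"))
```

A listener who answered "help" could then be told what fraction of the audience answered the same way, and how many of them said the story challenged that answer.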

While social media and differing levels of interaction can indicate what kind of stories are preferred, this additional layer of interaction could indicate a preference for how stories are told — a layer of information useful to both storytellers and those who interact with them.

But can this really be a thing?

There is concern that audio, without a drive to innovate alongside the filmmakers, media mogul houses, and technology-driven storytelling, won’t continue as a relevant long-term medium. It’s not that this concern is new; it’s revitalized every decade, and yet audio continues to show its staying power. The unencumbrance of listening, of sitting on the other sides of conversations, is an irreplaceable experience. But today’s emerging technology is more than a new opportunity for padded download numbers and diversified revenue streams. Audio needs to open up a channel for the listener to engage in a conversation about who is speaking and how we listen. As journalism becomes more connected than ever, and is thereby more accessible for creators and consumers than ever, media also needs to reflect on the biases and impact of traditional storytelling methods — and biggest of all, what defines a ‘good’ and ‘successful’ story. By allowing the listener to wade through a misconfigured story, interactive technology has the potential to reshape how producers and consumers think of narrative in unprecedented ways.

Yet, I’m unsure if the industry knows quite how to use this storytelling just yet. Podcasting is reaching new heights every day, and it’s still considered by many to be an ‘evolving’ or nascent technology. There just aren’t that many problems at this technological moment to really drive experimental innovation in the space. This runs in tandem with the growth of smart speaker technology broadly. Likely, once even more homes have IoT audio devices installed and developers can understand further proxemic uses for them, new features and tools will emerge alongside greater familiarity with the realistic potential of the technology.

And that isn’t to say there isn’t any experimentation within the industry. Spotify added a new polls feature to its platform to allow direct engagement between listener and host in podcasts. By mimicking the personalization features of services like Pandora, several podcast hosting services are looking to create personalized streams of content, again striking at the story preferences of listeners (and how developers can disrupt those preferences). Numerous applications are seeking to use screen components of podcasting as a nexus for interactive links, photos, and videos. True crime continues to be a driver of form, including the example of Solve, an interactive crime podcast that tasks listeners with solving an inspired case, a melding of audio storytelling and gaming. Projects like WBUR’s Voice-Activated Voter Guide are utilizing smart speaker interaction to create a one-stop guide for bundles of stories, much in the spirit of the potential for emerging technologies to shift the paradigm of news interaction. And the emergence of audio-only social apps adds another interesting layer to how we collectively consume and produce audio. Even then, these are all scratches on the surface.

Right now, bugs and limited AI capabilities make immersive experiences difficult, as does the lack of flexibility in how a smart speaker or object can be engaged with. But there is potential here for smart and interactive technology to be more than yet another platform through which we stream: an opportunity to build communities between listeners and the story builders of radio and podcasting. It is the chance to do those who listen a service, and one that increases the depth of interaction while building in opportunities to personalize and reflect on story preference and biases.

The oldest of radio cliches is that intimacy is the forte of audio storytelling. But as immersive stories gain traction in the visual world, so too will audio need to think of how it can bring new immersive experiences to one of our most longstanding pillars of media. Piecing together a symbiotic relationship between creators and listeners will be at the center of how audio adapts to the media of the future — a continuation of the legacy of radio tied to podcasts and the unpredictable future of technology in audio.

Written by Alec Cowan