How Facebook's 360 audio encoding and rendering deliver an immersive experience
In the realm of immersive media, Facebook has taken a significant step forward with its 360 audio technology. From 360-degree video on Facebook to Oculus Video, Facebook's 360 audio encoding and rendering deliver an immersive experience using fewer channels and under 0.5 milliseconds of rendering latency. This means users can enjoy high-quality spatial audio without compromising playback performance.
One of the key innovations is the new 360-degree spatial audio encoding and rendering technology, which maintains high quality throughout the entire process from the editor to the user. This is expected to be the first large-scale commercial application of such technology. We also support a technique called "hybrid higher-order ambisonics," which allows spatialized sound to remain high quality while using fewer channels, ultimately saving bandwidth.
Our audio system supports both spatialized audio and head-oriented (head-locked) audio. Spatial audio adjusts as the user looks around a 360-degree video, while head-oriented audio keeps sounds like dialogue and background music fixed relative to the listener's head. This is the first time we’ve rendered spatialized audio and head-oriented audio simultaneously.
The spatial audio rendering system offers real-time rendering with less than half a millisecond of latency, enhancing the user experience. The FB360 encoder tool packages the processed audio for delivery across multiple platforms, and the audio rendering SDK is integrated into Facebook and Oculus Video, ensuring a consistent experience from production to release.
Facebook's 360-degree panoramic video experience is already immersive, but adding 360-degree spatial audio takes it to the next level. When users engage with spatial audio, each sound appears to come from its correct position in space, just as in real life. A helicopter above the camera sounds overhead, and actors in front of the camera seem to be right in front of the user. As the user looks around, the system updates the audio in real-time to match their head movements, creating a truly immersive experience.
To achieve this, we needed an audio processing system that creates a theater-like sense of immersion without requiring a large room. One of the main challenges was structuring an audio environment that mirrors the real world and delivering it through headphones with high resolution while tracking the user’s visual orientation. Traditional stereo audio may help users identify left or right, but it doesn’t convey depth or height, nor does it accurately place sound relative to the user.
Creating and commercializing this spatialized listening experience required many new technologies. While spatial audio research is ongoing, there hasn’t been a reliable end-to-end solution for mass consumer use until now. Recently, we introduced new tools and rendering methods, giving us the first opportunity to bring high-quality spatial audio to the mass market. These techniques are integrated into a powerful platform called the “Spatial Workstation,” allowing creators to add spatial audio to 360 videos. The rendering system is also built into the Facebook app, so viewers hear the same vivid audio that creators uploaded.
These improvements help video producers reshape reality across multiple devices and platforms. In this article, we’ll explore some of the technical details behind these advancements. But let’s start by understanding the history and development of spatial audio.
Spatial audio relies on head-related transfer functions (HRTFs), which describe how a sound arriving from a given direction is filtered by the listener’s head and ears. This lets users wearing headphones perceive sound as coming from specific directions around them. HRTFs help developers build audio filters that make sounds appear in the correct position, whether in front of, behind, or beside the listener. HRTFs are typically measured in anechoic chambers using human subjects or dummy-head models, though they can also be derived in other ways.
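To make this concrete, here is a minimal sketch of HRTF-based binaural filtering: a mono source is convolved with the left-ear and right-ear impulse responses (HRIRs) measured for the desired direction. The names Hrir and renderBinaural are illustrative, not part of any Facebook API, and a production renderer would use frequency-domain or parameterized filters rather than direct convolution to hit sub-millisecond latency.

```cpp
#include <vector>
#include <cstddef>

struct Hrir {
    std::vector<float> left;   // impulse response for the left ear
    std::vector<float> right;  // impulse response for the right ear
};

// Convolve a mono block with the HRIR pair so the sound appears to come
// from the direction the HRIRs were measured for.
void renderBinaural(const std::vector<float>& mono, const Hrir& hrir,
                    std::vector<float>& outL, std::vector<float>& outR) {
    outL.assign(mono.size(), 0.0f);
    outR.assign(mono.size(), 0.0f);
    for (std::size_t n = 0; n < mono.size(); ++n) {
        for (std::size_t k = 0; k < hrir.left.size() && k <= n; ++k) {
            outL[n] += mono[n - k] * hrir.left[k];
        }
        for (std::size_t k = 0; k < hrir.right.size() && k <= n; ++k) {
            outR[n] += mono[n - k] * hrir.right[k];
        }
    }
}
```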
To provide panoramic sound during a 360 video, developers must place sounds correctly. One method is object-based spatial audio, where each sound source (like a helicopter or actor) is saved as a discrete stream with positional metadata. This is common in game audio systems, as the position of each sound can change dynamically based on the player’s movement.
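As a rough sketch (illustrative names only, not a real game-engine or Facebook data structure), an object in such a system is essentially a mono stream plus time-stamped position metadata that the renderer can follow as the scene or listener moves:

```cpp
#include <string>
#include <vector>

struct AudioObject {
    std::string name;            // e.g. "helicopter" or "actor_dialogue"
    std::vector<float> samples;  // the mono audio stream for this source
    // Time-stamped positions so the renderer can move the object dynamically.
    struct Keyframe { double timeSec; float x, y, z; };
    std::vector<Keyframe> positions;
};
```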
Another method is Ambisonics, which represents the entire sound field. Think of it as a panoramic photo of sound. Multi-channel audio streams can easily represent the full sound field, making them easier to transcode and stream compared to object-based systems. Ambisonic streams vary by order—first-order produces four channels, while third-order produces 16. Higher orders mean better quality and more accurate positioning, though they require more resources.
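As an illustration (not Facebook's implementation), the sketch below shows why an order-N ambisonic stream needs (N+1)^2 channels and how a mono source is encoded into the four first-order channels; the ACN channel ordering and SN3D normalization are assumptions made for this example.

```cpp
#include <cmath>
#include <array>

// Channel count for a given ambisonic order: 1st order -> 4, 3rd order -> 16.
constexpr int ambisonicChannels(int order) { return (order + 1) * (order + 1); }

// Encode one mono sample into the four first-order channels (W, Y, Z, X in
// ACN ordering) for a source at the given azimuth/elevation in radians.
std::array<float, 4> encodeFirstOrder(float sample, float azimuth, float elevation) {
    return {
        sample,                                           // W: omnidirectional
        sample * std::sin(azimuth) * std::cos(elevation), // Y: left/right
        sample * std::sin(elevation),                     // Z: up/down
        sample * std::cos(azimuth) * std::cos(elevation)  // X: front/back
    };
}
```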
For workflows, we developed the Spatial Workstation, a powerful tool for designing spatial audio for 360 videos and VR experiences. It allows developers to place sounds in 3D space and preview them through VR headsets, offering a seamless end-to-end workflow from creation to distribution.
Our system outputs eight audio channels, optimized for VR and 360 videos, known as hybrid higher-order ambisonics. This system is tuned to maximize sound quality and accuracy while minimizing performance demands and latency. It also supports two head-oriented audio channels, which remain fixed relative to the user’s head, ideal for narration or background music.
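The sketch below illustrates, with purely illustrative names, how such a hybrid stream can be mixed for headphones: the eight spatial channels are rotated and binauralized against the listener's head orientation, while the two head-oriented channels skip spatialization and are summed straight into the output. The placeholder decode simply copies the omnidirectional channel to both ears; a real renderer applies rotation and HRTF filtering.

```cpp
#include <array>

struct HybridFrame {
    std::array<float, 8> spatial;    // hybrid higher-order ambisonic channels
    std::array<float, 2> headLocked; // stereo that stays fixed to the head
};

std::array<float, 2> binauralizeSpatial(const std::array<float, 8>& spatial,
                                        float /*yaw*/, float /*pitch*/, float /*roll*/) {
    // Placeholder decode: a real renderer rotates the field by the head
    // orientation and applies HRTF filters; here we just copy the
    // omnidirectional channel to both ears.
    return { spatial[0], spatial[0] };
}

std::array<float, 2> renderHybridFrame(const HybridFrame& frame,
                                       float yaw, float pitch, float roll) {
    std::array<float, 2> ears = binauralizeSpatial(frame.spatial, yaw, pitch, roll);
    ears[0] += frame.headLocked[0]; // narration/music stays fixed as the head turns
    ears[1] += frame.headLocked[1];
    return ears;
}
```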
Our Spatial Audio Renderer condenses years of technological development into a flexible solution that supports various configurations while maintaining high quality. It uses parameterization of HRTFs to balance speed and quality, achieving sub-0.5ms latency, ideal for real-time applications like head-tracked panoramic content.
This flexibility enables widespread adoption across desktops, mobile devices, and browsers. Our renderers are tuned for consistency across platforms, ensuring high-quality spatial audio regardless of device. This uniformity is crucial for maintaining quality across different ecosystems.
The renderer is part of the Audio360 engine, which renders the hybrid higher-order ambisonic mix together with the head-oriented audio. Written in C++, it uses platform-optimized vector (SIMD) instructions, stays lightweight, and mixes across multiple threads for efficiency. It integrates directly with platform-specific audio systems (OpenSL ES on Android, CoreAudio on iOS/macOS, WASAPI on Windows) to minimize latency and maximize efficiency.
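As a sketch of the kind of platform abstraction this implies (illustrative names, not the actual Audio360 API), each backend wraps the native low-latency audio API and pulls mixed audio from the engine on the platform's audio thread:

```cpp
#include <cstdint>
#include <functional>

// The engine registers this callback; the backend invokes it to fill
// interleaved stereo output buffers just-in-time, keeping latency low.
using RenderCallback = std::function<void(float* interleavedStereo, int32_t frames)>;

class AudioDevice {
public:
    virtual ~AudioDevice() = default;
    // Open the native stream (OpenSL ES on Android, CoreAudio on iOS/macOS,
    // WASAPI on Windows) and begin pulling audio via the callback.
    virtual bool start(int sampleRate, int framesPerBuffer, RenderCallback cb) = 0;
    virtual void stop() = 0;
};
```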
For the web, the engine is compiled into asm.js via Emscripten, allowing consistent code use across platforms. It works well in browsers with minimal modification, routing audio from the Facebook video player to the engine, then back through WebAudio for playback.
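A minimal sketch of how a C++ engine can expose a processing entry point to JavaScript when built with Emscripten is shown below; the function name audio360_process is hypothetical, and its body is a pass-through stand-in for the real decode/rotate/binauralize path. The page-side WebAudio code would call this export from its audio processing callback with pointers into the Emscripten heap.

```cpp
#include <emscripten/emscripten.h>
#include <cstdint>

// Exported to JavaScript; called once per WebAudio processing block.
extern "C" EMSCRIPTEN_KEEPALIVE
void audio360_process(const float* input, float* outputStereo, int32_t frames) {
    // Stand-in for the real spatialization: copy the first input channel
    // to both ears so the plumbing can be tested end to end.
    for (int32_t i = 0; i < frames; ++i) {
        outputStereo[2 * i]     = input[i];
        outputStereo[2 * i + 1] = input[i];
    }
}
```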
As device and browser performance improves, our renderer and engine will continue to evolve, enhancing audio quality. From encoding to client, we aim to make spatial audio easy to use and accessible on all devices.
We faced challenges in finding a viable file format, leading to careful selection of encoders. We chose MP4 with three tracks: two 4-channel tracks and one stereo head-oriented track, encoded at a high bit rate. Metadata is stored in XML boxes for flexibility and scalability.
The Spatial Workstation encoder integrates video into the output file, preserving the original video and writing the appropriate metadata for 360-degree processing. We also support YouTube’s first-order ambisonics format.
Once uploaded, videos are transcoded into multiple formats for different clients. We extract metadata to map audio tracks and convert them into needed formats. We also use stereo binaural rendering as a fallback when issues arise.
Audio and video can be processed separately and delivered via adaptive streaming protocols.
Different clients support different formats, so we prepare separate versions for iOS, Android, and web browsers. We prefer MP4 for iOS and WebM for Android and web, due to Opus’s advantages over AAC for spatial audio.
Converting the three-track format into a single 10-channel Opus track posed challenges, but we solved it by transmitting the channel-layout information in a manifest file.
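As an illustration of what that reassembly looks like, here is a sketch that interleaves the three decoded tracks into one 10-channel frame; the exact channel order is an assumption, since in practice it is described by the layout information carried in the manifest.

```cpp
#include <array>

struct ThreeTrackFrame {
    std::array<float, 4> ambisonicA;  // track 1: spatial channels 1-4
    std::array<float, 4> ambisonicB;  // track 2: spatial channels 5-8
    std::array<float, 2> headLocked;  // track 3: head-oriented stereo
};

// Interleave into the 10-channel layout expected by the Opus-in-WebM path.
std::array<float, 10> toTenChannels(const ThreeTrackFrame& f) {
    std::array<float, 10> out{};
    for (int i = 0; i < 4; ++i) out[i] = f.ambisonicA[i];
    for (int i = 0; i < 4; ++i) out[4 + i] = f.ambisonicB[i];
    out[8] = f.headLocked[0];
    out[9] = f.headLocked[1];
    return out;
}
```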
Looking ahead, we’re focused on improving file formats, supporting lossless encoding, and exploring advanced compression techniques in Opus. We aim to enhance user experience by adapting bitrate and channel layout based on bandwidth, pushing the boundaries of spatial audio.
At LiveVideoStack Meet | Shanghai, we explored new trends in multimedia development, including codec competition, WebRTC, AI, and blockchain. These innovations are shaping the future of the audiovisual ecosystem, driving progress and transformation.