How we scale live streaming for millions of simultaneous viewers
Real-time video is notoriously hard to scale. Here's how we manage it after several years of trial and error.
The distribution challenge
Flowplayer's live infrastructure is genuinely global. When a customer shoots a soccer game in Ireland, it is instantly viewable across Europe and throughout the US. No matter how big the audience, the stream is delivered to everyone without delay.
It is not unusual for a single customer to shoot 20 simultaneous football matches and stream them for local, national, or even global broadcasting. For instance, a small regional customer in Norway and a massive national customer in the US can be online at the same time, and we must deliver the same high-quality service to both.
Simply put, live feeds can originate from any part of the world, and they must be instantly delivered to the end users, no matter where on the globe they are located.
The quality challenge
Solving the distribution challenge is hard, but ensuring excellent quality without interruptions is even harder. The real difficulty lies with the original video signal, which won't play on the web directly: it must be converted into formats that browsers understand. And this whole byte-manipulation process must happen in real time while the event is running.
You really cannot make any mistakes, because a demanding audience expects a constant, uninterrupted signal.
Of course, the worst thing that can happen is a failure that causes a black screen for everyone watching the live event. Think of the equivalent of a full stadium of viewers (50,000+) watching a major ice hockey game, and the broadcast stopping at that critical moment.
That is a recipe for a massive social media fiasco, and the brand responsible for the show suffers long-lasting damage.
Doing live streaming at scale is far from an easy job. Anything can (and will) fail at any point in time, yet the video signal must remain solid for everyone watching the broadcast. Even services like Netflix and HBO have an easier job, because they don't need to worry about the enormous complexities of a live event, where everything happens in real time and the streams can run for hours, even days.
After years of fighting and hundreds of failed attempts, we have finally found a solution that we are happy to grow with.
At its heart, our infrastructure is split into multiple availability zones that all share the same layout.
Each zone consists of two different clusters:
- An auto-scaling server cluster responsible for the video transcoding.
- Two Redis instances providing the necessary information to the transcoding jobs. The second instance exists solely for redundancy.
These transcoding jobs are computationally intensive, so all the instances are optimized for maximum CPU/GPU utilization.
We can bring up new servers depending on the number of broadcasting jobs, and during peak times we have 8-10 servers running simultaneously. A single zone can easily handle hundreds of simultaneous live broadcasts.
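The scaling logic described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the per-server capacity figure and the function name are assumptions.

```python
import math

# Hypothetical capacity figure: how many simultaneous broadcasts
# a single transcoding server handles comfortably (assumed value).
BROADCASTS_PER_SERVER = 25
MIN_SERVERS = 2   # keep headroom even when the zone is idle
MAX_SERVERS = 10  # roughly matches the observed peak of 8-10 servers

def desired_server_count(active_broadcasts: int) -> int:
    """Return how many transcoding servers the zone should be running."""
    needed = math.ceil(active_broadcasts / BROADCASTS_PER_SERVER)
    return max(MIN_SERVERS, min(MAX_SERVERS, needed))
```

Clamping between a floor and a ceiling keeps a little spare capacity for sudden ingests while capping cost during traffic spikes.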
Network of zones
To meet our global demand, we have wired together multiple availability zones as follows:
The application is distributed in multiple geographical regions around the world. Each region has two availability zones fronted by load balancers which provide two static IPv4 addresses for load balancing and failover.
The following happens when a new video ingest job is handed to the system:
- We look for a region closest to the end user requesting the stream
- We pick the more performant of the two availability zones
- The ingest stream is transcoded into tiny HLS segments
- The segments are placed on a CDN for global delivery
In case of an ingest error, we fall back on the other availability zone, and in the rare scenario where both zones are down, we route to the second-closest geographical region.
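The routing and failover rules above can be sketched in a few lines. This is a simplified model under assumed names and metrics (the distance and load fields are hypothetical), not the actual routing code:

```python
from dataclasses import dataclass

@dataclass
class Zone:
    name: str
    healthy: bool
    load: float  # 0.0 (idle) .. 1.0 (saturated); lower means more performant

@dataclass
class Region:
    name: str
    distance_km: float  # hypothetical distance metric to the requester
    zones: list

def pick_endpoint(regions):
    """Walk regions from closest to farthest; within each region,
    prefer the healthy zone with the lowest load. A region with no
    healthy zone is skipped, which yields the second-closest fallback."""
    for region in sorted(regions, key=lambda r: r.distance_km):
        healthy = [z for z in region.zones if z.healthy]
        if healthy:
            best = min(healthy, key=lambda z: z.load)
            return region.name, best.name
    raise RuntimeError("no healthy zone in any region")
```

Because unhealthy zones are filtered out before selection, the closest-region preference and the both-zones-down fallback come from the same loop rather than separate error paths.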
Currently, we are happy users of Amazon Global Accelerator which does most of the heavy lifting.
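The transcode-to-HLS step from the ingest flow above typically boils down to an ffmpeg invocation along these lines. The segment length, playlist window, and codec choices here are illustrative assumptions, not our production settings:

```python
def build_hls_command(ingest_url: str, out_dir: str) -> list:
    """Build an ffmpeg command that turns a live ingest stream into
    small HLS segments ready for upload to a CDN."""
    return [
        "ffmpeg",
        "-i", ingest_url,                 # e.g. an RTMP or SRT ingest URL
        "-c:v", "libx264",                # H.264 video for broad browser support
        "-c:a", "aac",                    # AAC audio
        "-f", "hls",                      # HLS muxer
        "-hls_time", "4",                 # ~4-second segments (assumed value)
        "-hls_list_size", "6",            # keep a short live playlist window
        "-hls_flags", "delete_segments",  # drop old segments as the stream runs
        f"{out_dir}/stream.m3u8",
    ]
```

Short segments keep end-to-end latency down, while a bounded playlist with `delete_segments` stops disk usage from growing over a broadcast that runs for hours.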
High video quality
With this global setup, we see higher bitrates, since ingest streams are automatically processed by the closest healthy endpoint, and the auto-scaling groups let multiple servers handle the same ingest stream. The freshly transcoded segments are placed on the CDN, so globally distributed end users can enjoy the video as if it originated from a nearby location.
Our health check mechanism automatically transfers failed jobs to the closest healthy endpoint. There is always a server processing the stream, so we avoid the fatal black-screen error.
Continuous software updates
We can update the service dynamically without impacting the running instances. We can add more transcoding clusters without service interruption. Also, we don't have to worry about updating DNS records.
Like all software, ours can always improve. We will introduce new regions, particularly for the growing needs of North America. Moreover, to handle even bigger spikes in traffic, we'll set up load balancing across all geographical regions.