Contributed by @voluntas.
Shiguredo Inc. develops and provides a software package called WebRTC SFU Sora (Sora) and a cloud service based on it. WebRTC (Web Real-Time Communication) is a technology for exchanging voice, video, and data in real time, typically peer to peer (P2P).
Sora is a WebRTC SFU (Selective Forwarding Unit). Unlike P2P communication, it delivers audio, video, and data via a server. Even as the number of viewers grows, the distributor does not need to send audio, video, and data to every viewer; it sends them once to Sora, which distributes them to many viewers at once in real time.
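To make the difference concrete, here is a minimal sketch of the distributor's upstream load in each model. The function names and the 2.5 Mbps bitrate are illustrative assumptions, not figures from Sora.

```python
def uplink_streams_p2p(viewers: int) -> int:
    """In a P2P mesh, the distributor sends one copy of the stream per viewer."""
    return viewers


def uplink_streams_sfu(viewers: int) -> int:
    """With an SFU, the distributor sends a single copy to the server,
    which forwards it to every viewer."""
    return 1 if viewers > 0 else 0


def uplink_mbps(streams: int, bitrate_mbps: float) -> float:
    """Total upstream bandwidth the distributor needs."""
    return streams * bitrate_mbps


if __name__ == "__main__":
    bitrate = 2.5  # assumed per-stream bitrate in Mbps, purely illustrative
    for viewers in (1, 10, 100):
        p2p = uplink_mbps(uplink_streams_p2p(viewers), bitrate)
        sfu = uplink_mbps(uplink_streams_sfu(viewers), bitrate)
        print(f"{viewers:>3} viewers: P2P {p2p:6.1f} Mbps, SFU {sfu:4.1f} Mbps")
```

Under these assumptions the distributor's upstream cost grows linearly with the audience in the P2P case and stays constant with an SFU. The fan-out cost moves to the server, which is why server bandwidth matters so much for a service like this.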
Shiguredo has been developing Sora since 2015, and its customers use Sora not only for web conferencing systems but also to realize various remote services.
What is expected of our product
The products we provide must be able to stay connected. Therefore, measures against failures and load are very important.
For example, if you are remotely operating a bulldozer or remotely guiding an air conditioner installer, a sudden loss of connection disrupts your business. Likewise, it is a serious problem if a doctor examining a patient remotely is suddenly disconnected.
We also believe that real-time communication technology, which is now deeply ingrained in our daily lives, should not be expensive.
For this reason, our cloud service combines Tailscale with bare-metal servers to achieve both stable, continuous connections and low cost.
Failure and load countermeasures when using bare-metal servers
Sora’s cloud service uses Sora’s own cluster function, which is a mechanism to achieve load balancing and availability using Raft. This cluster function is built on bare-metal servers to provide the service.
Because voice, video, and other data are exchanged in real time through its servers, Sora uses a lot of network bandwidth, and a lot of CPU for encryption. For this reason, we chose a service called DataPacket, which offers flat-rate network bandwidth and high-performance bare-metal servers at a reasonable price.
Bare-metal servers offer significant savings on network and server usage fees compared to cloud services. However, when an outage occurs or load increases, procuring additional bare-metal servers takes time: in DataPacket's case, at least four hours at best, and 24 hours or more in slower cases.
We chose Tailscale as our solution. With Tailscale, we build a single cluster of Sora servers that spans both cloud servers, which can be provisioned quickly, and the bare-metal servers on DataPacket, as if they were all on the same network.
This way, when the load increases, we can choose to have the cloud service servers join the cluster temporarily. Or, in the event of a failure, we can put up a replacement server until the bare-metal server is restored.
By clustering on Tailscale, we can use servers from multiple providers without having to be aware of the networks between them.
Thanks to Tailscale, we are able to keep the cost of our services low, and we are also able to increase availability. All of our servers have Tailscale installed, and not only the servers themselves, but also the service operators can access them securely via Tailscale SSH. Of course, all server monitoring is also built on Tailscale.
The combination of Tailscale and bare-metal servers allows us to provide high-quality services while keeping prices low.
Erlang/OTP and Tailscale
There is another advantage we have with Tailscale.
Erlang/OTP comes standard with a distribution feature that makes it easy to communicate with other Erlang/OTP nodes. However, Erlang/OTP's distribution traffic is not encrypted by default. If encryption is desired, Erlang/OTP's distribution over TLS must be configured, which complicates certificate setup and operation.
We chose to run Erlang/OTP's distribution over the Tailscale network instead of using the distributed TLS provided by Erlang/OTP. Because Tailscale encrypts the traffic between nodes, Erlang/OTP itself no longer has to encrypt its distribution traffic.
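This approach only holds if every cluster node is actually addressed by its Tailscale IP. The following is a minimal sketch of such a sanity check, not our actual tooling. It assumes the nodes use Tailscale's IPv4 addressing, which is drawn from the 100.64.0.0/10 CGNAT range. The helper names are hypothetical, and the sample addresses are the ones that appear in the ping results in this article.

```python
import ipaddress

# Tailscale assigns IPv4 addresses from the CGNAT range 100.64.0.0/10.
TAILSCALE_CGNAT = ipaddress.ip_network("100.64.0.0/10")


def is_tailscale_ipv4(address: str) -> bool:
    """Return True if the address falls inside the Tailscale IPv4 range."""
    return ipaddress.ip_address(address) in TAILSCALE_CGNAT


def off_tailnet(addresses):
    """Return the addresses that would bypass Tailscale (hypothetical helper)."""
    return [a for a in addresses if not is_tailscale_ipv4(a)]


if __name__ == "__main__":
    nodes = ["100.71.97.122", "100.120.18.29", "100.87.92.1"]
    bad = off_tailnet(nodes)
    if bad:
        print("Not on Tailscale, distribution traffic would be unencrypted:", bad)
    else:
        print("All node addresses are Tailscale addresses.")
```

A range check like this only confirms that an address looks like a Tailscale address. It does not prove the peer is reachable or authorized, which is still up to Tailscale itself.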
Below are the ping values between the servers we are actually running. All of these are being exchanged on Tailscale.
DataPacket (Tokyo) → Vultr (Tokyo)
Via public network.
64 bytes from 100.71.97.122: icmp_seq=3 ttl=64 time=0.857 ms
64 bytes from 100.71.97.122: icmp_seq=4 ttl=64 time=0.756 ms
64 bytes from 100.71.97.122: icmp_seq=5 ttl=64 time=1.41 ms
64 bytes from 100.71.97.122: icmp_seq=6 ttl=64 time=0.888 ms
64 bytes from 100.71.97.122: icmp_seq=7 ttl=64 time=0.892 ms
DataPacket (Tokyo) → Linode (Tokyo)
Via public network.
64 bytes from 100.120.18.29: icmp_seq=3 ttl=64 time=0.894 ms
64 bytes from 100.120.18.29: icmp_seq=4 ttl=64 time=0.777 ms
64 bytes from 100.120.18.29: icmp_seq=5 ttl=64 time=1.00 ms
64 bytes from 100.120.18.29: icmp_seq=6 ttl=64 time=0.962 ms
64 bytes from 100.120.18.29: icmp_seq=7 ttl=64 time=1.67 ms
DataPacket → DataPacket
Via private network.
64 bytes from 100.87.92.1: icmp_seq=3 ttl=64 time=0.698 ms
64 bytes from 100.87.92.1: icmp_seq=4 ttl=64 time=0.731 ms
64 bytes from 100.87.92.1: icmp_seq=5 ttl=64 time=0.633 ms
64 bytes from 100.87.92.1: icmp_seq=6 ttl=64 time=0.724 ms
64 bytes from 100.87.92.1: icmp_seq=7 ttl=64 time=0.598 ms
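To compare the three paths at a glance, the round-trip times can be pulled out of the ping output and summarized. This is a small sketch using only the samples shown above; the function names are illustrative, and five samples per path are far too few for a real benchmark.

```python
from __future__ import annotations

import re
from statistics import mean

# Matches the "time=0.857 ms" field of each ping reply line.
RTT_PATTERN = re.compile(r"time=([0-9.]+) ms")

SAMPLES = {
    "DataPacket -> Vultr (public)": """
        64 bytes from 100.71.97.122: icmp_seq=3 ttl=64 time=0.857 ms
        64 bytes from 100.71.97.122: icmp_seq=4 ttl=64 time=0.756 ms
        64 bytes from 100.71.97.122: icmp_seq=5 ttl=64 time=1.41 ms
        64 bytes from 100.71.97.122: icmp_seq=6 ttl=64 time=0.888 ms
        64 bytes from 100.71.97.122: icmp_seq=7 ttl=64 time=0.892 ms
    """,
    "DataPacket -> Linode (public)": """
        64 bytes from 100.120.18.29: icmp_seq=3 ttl=64 time=0.894 ms
        64 bytes from 100.120.18.29: icmp_seq=4 ttl=64 time=0.777 ms
        64 bytes from 100.120.18.29: icmp_seq=5 ttl=64 time=1.00 ms
        64 bytes from 100.120.18.29: icmp_seq=6 ttl=64 time=0.962 ms
        64 bytes from 100.120.18.29: icmp_seq=7 ttl=64 time=1.67 ms
    """,
    "DataPacket -> DataPacket (private)": """
        64 bytes from 100.87.92.1: icmp_seq=3 ttl=64 time=0.698 ms
        64 bytes from 100.87.92.1: icmp_seq=4 ttl=64 time=0.731 ms
        64 bytes from 100.87.92.1: icmp_seq=5 ttl=64 time=0.633 ms
        64 bytes from 100.87.92.1: icmp_seq=6 ttl=64 time=0.724 ms
        64 bytes from 100.87.92.1: icmp_seq=7 ttl=64 time=0.598 ms
    """,
}


def rtts_ms(ping_output: str) -> list[float]:
    """Extract every round-trip time (in milliseconds) from ping output."""
    return [float(value) for value in RTT_PATTERN.findall(ping_output)]


def summarize(ping_output: str) -> dict[str, float]:
    """Return the min, average, and max round-trip time in milliseconds."""
    rtts = rtts_ms(ping_output)
    return {"min": min(rtts), "avg": mean(rtts), "max": max(rtts)}


if __name__ == "__main__":
    for path, output in SAMPLES.items():
        s = summarize(output)
        print(f"{path}: min {s['min']:.3f} / avg {s['avg']:.3f} / max {s['max']:.3f} ms")
```

For these samples the averages are roughly 0.96 ms to Vultr, 1.06 ms to Linode, and 0.68 ms between DataPacket servers, so crossing providers over Tailscale costs only a fraction of a millisecond compared with the private network.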
- Real-time communication services must stay connected.
- Bare-metal servers are used to reduce costs.
- Tailscale allows us to bring another cloud server into the cluster in the event of a bare-metal server failure or overload.
- With Tailscale, the traffic of Erlang/OTP's distribution is encrypted without setting up distributed TLS.
If this article resonates with you, please consider combining bare-metal servers with Tailscale. Thanks to Tailscale, you will be able to differentiate yourself from the competition in a secure and cost-effective way.
Shiguredo Inc. @voluntas