Hey Luke, check out moq.dev (and MoQ in general) if you're considering WebTransport. I totally agree with the issues around WebRTC.
I think WebSockets is actually a pretty good fit considering the current latency/cost of most AI models. You don't want to be too aggressive and drop the output, but that's ingrained into WebRTC. WebTransport won't be an improvement unless you're willing to situationally drop content, so keep that in mind.
This is a really good article. WebRTC is like magic for demos and small peer-to-peer communications but really struggles after, say, 8 inbound video feeds. It is not the tech that fails; the quality of the meeting is dictated by how well or badly the worst computer in the meeting can decode those incoming streams. I saw that YouTube uses RTSP for the streamer but was at one point using WebRTC for the watchers. That works very well as the server IP is public, so you don't get into NAT hell. We're developing AI voice agents and experimenting with surfacing them in browsers, but there are lots of extra moving parts to go from a PSTN-based system to WebRTC/browser/STUN/coturn/SFU, and that is just the delivery layer! WebSockets is a much more straightforward approach. The irony with plain old voice is that the customer and client expect it to work 100% of the time. There is zero tolerance for jitter, delay, dropped calls, etc. for what is "old" tech.
Love the pragmatism here. WebRTC is brilliant engineering, but it's a square peg in a lot of client-server scenarios. The TURN server debugging nightmares you described are super real: we had similar issues where 20% black screens with zero error messages drove everyone crazy. Switching to WebSockets, cutting out the ICE negotiation dance, and getting direct HTTPS proxy support is such a smart move for controlled deployments. Curious about the head-of-line blocking tradeoff though when packet loss spikes.
I look forward to seeing what happened!
"Next week: Why we threw all of this away."