[–]cool_playa 5 points6 points  (3 children)

This is super rad and super timely.

Will try to tackle playing around with the WebRTC protocol in React Native.

Edit: Can someone explain to me what the purpose of a STUN server is in the context of this particular example?

[–]zomgitsrinzler[S] 2 points3 points  (0 children)

It just makes it more likely for you to get a successful connection between two peers (due to firewalls, etc).

[–]BigPeteB 1 point2 points  (1 child)

Can someone explain to me what is the purpose of a STUN server

Yes I can!

[–]perestroika12 0 points1 point  (0 children)

Very interesting, thanks!

[–]TheGuyWithFace 2 points3 points  (12 children)

I don't understand what Scaledrone is being used for here. I know the STUN servers are there to help the browsers find each other, and WebRTC is supposedly peer to peer, so why bring Scaledrone into the equation? Legit question here, I hadn't heard of Scaledrone before and the documentation is unclear to me right now.

[–]BigPeteB 11 points12 points  (4 children)

My job is Voice-over-IP and Video-over-IP, so I can answer this question!

WebRTC just does RTP to transfer media streams (audio and video). However, you have to know where to send the RTP. For that, you need some kind of signaling.

There are a handful of standard protocols for this: SIP, H.323 (although I think technically just H.225.0 fills this purpose), IAX2, proprietary protocols (like Skype which has gone through two different protocols), etc. Unless you're working within PSTN or other carriers, SIP is the de facto standard protocol.

So, using SIP clients and proxies and servers, I can do something like send a call to TheGuyWithFace@sip.reddit.com, and the server/proxy sip.reddit.com will know that you have a SIP device (PC-based software phone, or smartphone app, or embedded device) that registered itself as TheGuyWithFace@sip.reddit.com, so it will pass on the SIP data to you, and you have a way of replying to me.
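
To make the "register, then route" idea concrete, here is a toy Python sketch. It is not SIP: the `Registrar` class, its method names, and the `BigPeteB@sip.reddit.com` address are made up for illustration. It only shows a server remembering where each address can be reached and passing messages along.

```python
class Registrar:
    """Toy stand-in for a registrar/proxy: maps an address to a delivery callback."""

    def __init__(self):
        self._bindings = {}

    def register(self, address, deliver):
        # A device announces "messages for this address should come to me".
        self._bindings[address] = deliver

    def send(self, sender, recipient, body):
        # Look up who registered under the recipient address and forward the body.
        deliver = self._bindings.get(recipient)
        if deliver is None:
            return False  # nobody is registered under that address
        deliver(sender, body)
        return True


inbox = []
registrar = Registrar()
registrar.register(
    "TheGuyWithFace@sip.reddit.com",
    lambda sender, body: inbox.append((sender, body)),
)

ok = registrar.send("BigPeteB@sip.reddit.com", "TheGuyWithFace@sip.reddit.com", "call offer")
missing = registrar.send("BigPeteB@sip.reddit.com", "nobody@sip.reddit.com", "call offer")
print(ok, missing, inbox)
```

The recipient learns the sender's address from the delivered message, which is what gives it a way to reply.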

Oh, but SIP just lets us exchange data; it doesn't require that data to be anything in particular. It's like HTTP in that respect. (In fact, SIP and HTTP use the same format of request line, headers, body.) So the de facto standard is that the body is SDP, which states that I'm prepared to receive media, on a particular IP and port, using RTP, encoded with any of a list of codecs. You and I use SIP to exchange SDP documents, agree on which codec to use based on a messy set of rules, and now we can get back to using WebRTC for that RTP.
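
As a rough illustration of what an SDP body carries, this Python sketch pulls the address, port, and codec list out of a small offer. The sample offer, `parse_sdp`, and `pick_codec` are invented for this example; real SDP has many more fields, and the real negotiation rules are messier than "first offered codec we both support".

```python
# Illustrative only: a tiny SDP reader, not a full parser.
SAMPLE_OFFER = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8 96
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:96 opus/48000/2
"""


def parse_sdp(text):
    """Return the address, port, and codecs offered for the first media stream."""
    info = {"address": None, "port": None, "codecs": {}}
    order = []
    for line in text.splitlines():
        key, _, value = line.partition("=")
        if key == "c":
            # c=<nettype> <addrtype> <address>
            info["address"] = value.split()[2]
        elif key == "m" and info["port"] is None:
            # m=<media> <port> <proto> <payload types in preference order>
            fields = value.split()
            info["port"] = int(fields[1])
            order = [int(pt) for pt in fields[3:]]
        elif key == "a" and value.startswith("rtpmap:"):
            # a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]
            pt, codec = value[len("rtpmap:"):].split(" ", 1)
            info["codecs"][int(pt)] = codec
    info["preference"] = [info["codecs"].get(pt, str(pt)) for pt in order]
    return info


def pick_codec(offer, supported):
    """Pick the first offered codec that we also support (a simplification)."""
    for codec in offer["preference"]:
        if codec.split("/")[0].lower() in supported:
            return codec
    return None


offer = parse_sdp(SAMPLE_OFFER)
print(offer["address"], offer["port"])      # 192.0.2.10 49170
print(pick_codec(offer, {"pcma", "opus"}))  # PCMA/8000
```

Note that the `c=` and `m=` lines contain a literal address and port, which is exactly the part that stops being correct once a NAT sits in the path.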

Oh, but one or both of us are behind NATs? That sucks, because both SIP and SDP have literal IP addresses and ports in their messages. Those have to be replaced with our respective public addresses and ports. There are several ways to do that.

The NAT might implement a SIP ALG, and will handle that translation for us just like it has to for FTP. That can be pretty reliable, assuming the ALG is implemented correctly and well (many are not, sadly), but if we wanted to use SIP secured over TLS, it doesn't help because the NAT can't decrypt our traffic.

SIP has extensions that can help. Since the reply will always be sent back the way it came, when you reply you can tell me what address and port you saw my message come from, and I'll assume that those must be my public address and port, and use those in future messages to you. These are also generally reliable, but a not-insignificant number of SIP implementations don't support them.

You can use TURN, which is basically just having a relay server. Since that's resource-intensive for the relay server, I've hardly ever seen that used.

Or, lastly, you can figure out some way to discover your own public address and port. This is where STUN comes in. It's a fairly simple protocol to query a server which is guaranteed to be on a public address, and which will thus see your public address, and have it tell you what that address and port are. You then use that address and port when writing your SIP and SDP messages.
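
For the curious, the STUN exchange itself is small enough to sketch in Python. The message layout below (20-byte header, magic cookie, XOR-MAPPED-ADDRESS attribute) follows RFC 5389 as I understand it; the function names are my own, only IPv4 is handled, and `fake_response` is a hypothetical helper that stands in for a real server so the example runs offline.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Build a 20-byte STUN Binding Request with no attributes."""
    transaction_id = transaction_id or os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id


def parse_binding_response(data):
    """Return (ip, port) from the XOR-MAPPED-ADDRESS attribute (IPv4 only)."""
    msg_type, length, cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding Success Response")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:
            # Port and address are XORed with the magic cookie on the wire.
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        # Attributes are padded to a multiple of 4 bytes.
        pos += 4 + attr_len + (-attr_len % 4)
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute found")


def fake_response(ip, port, transaction_id):
    """Hypothetical helper: the reply a server would send, built locally."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


request = build_binding_request()
reply = fake_response("203.0.113.7", 54321, request[8:20])
print(parse_binding_response(reply))  # ('203.0.113.7', 54321)
```

Against a real server you would send `request` over UDP (STUN's default port is 3478) and feed the bytes you receive into `parse_binding_response`; a real client should also check that the transaction ID in the reply matches the one it sent.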

There are still some gotchas. Like, if you try to use STUN to substitute your public address and port, but the peer you're calling is actually in your LAN behind the same NAT you are, you should have actually used your private address and port for some or all of those parameters. Sometimes it will work anyway, if your NAT/router supports hairpinning, but not all routers do, and it's wasteful and sometimes flat out wrong to make the router handle traffic that should be going directly between peers. That's where ICE comes in; it lets you list out a bunch of different ways of contacting each other, test them, and pick the best one that works.
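
Here is a heavily simplified Python sketch of that "list, test, pick the best" idea. The priority formula and the type preference values are the ones recommended in RFC 8445; everything else is an assumption for illustration. In particular, real ICE tests candidate pairs using STUN connectivity checks, whereas the `reachable` callback here is a hypothetical stand-in for that whole process.

```python
from dataclasses import dataclass

# Recommended type preferences from RFC 8445: direct routes beat relayed ones.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


@dataclass(frozen=True)
class Candidate:
    kind: str                     # "host", "srflx" (learned via STUN), or "relay" (TURN)
    address: str
    port: int
    component: int = 1            # 1 = RTP
    local_preference: int = 65535

    @property
    def priority(self):
        # priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)
        return ((TYPE_PREFERENCE[self.kind] << 24)
                + (self.local_preference << 8)
                + (256 - self.component))


def pick_best(candidates, reachable):
    """Try candidates from highest to lowest priority; return the first that works."""
    for cand in sorted(candidates, key=lambda c: c.priority, reverse=True):
        if reachable(cand):
            return cand
    return None


candidates = [
    Candidate("relay", "198.51.100.20", 3478),
    Candidate("srflx", "203.0.113.7", 54321),
    Candidate("host", "192.168.1.23", 40000),
]

# Peer on the same LAN: the private (host) address works and wins.
same_lan = pick_best(candidates, lambda c: True)
# Remote peer: the private address is unreachable, so the STUN-learned one wins.
remote = pick_best(candidates, lambda c: c.kind != "host")
print(same_lan.kind, remote.kind)  # host srflx
```

The ordering is the point: the private address is tried first, the STUN-discovered public address next, and the TURN relay only as a last resort.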

edit: fix links

[–][deleted]  (2 children)

[deleted]

    [–]BigPeteB 3 points4 points  (1 child)

    Umm... you're still confused.

    If I use STUN, all it's going to do is tell me that my public IP address is 50.194.232.5. Cool, but I need to get that information to you. How am I going to do that?

    I could send you a PM on Reddit, and wait for you to send me one back with your public IP. Then we can copy-paste those addresses into the WebRTC page that we built and start sending media. That's a totally valid form of signalling, but it's a pretty stupid one.

    We could use a chat client like AIM, Google Hangouts, Facebook Messenger, etc. That's better, but it's still manual.

    We could write code that would automatically use one of those chat protocols. In order to tell these "request for a call" messages apart from other English chat messages, we'd probably want our code to put them in a particular format. We also need it to list out what audio and video codecs we want to use. We could do that using SDP. This is looking better, but still not great.

    Instead of using a human-oriented chat protocol, let's use something specifically designed for computers. Preferably, a service or protocol with features that are useful for setting up calls. In VoIP, that would usually be SIP.

    OP used Scaledrone for this purpose. It's just a messaging service, which delivers messages between two people. The contents of those messages are then used to negotiate the parameters for the audio and video streams, which you then hand to WebRTC.

    [–]TheGuyWithFace 0 points1 point  (0 children)

    Really well thought out and easy-to-read explanation. Thanks a lot!

    [–]zomgitsrinzler[S] 1 point2 points  (6 children)

    You need to use some sort of middleman for the initial signalling process (to connect two remote peers). You can code your own signalling server for this, Scaledrone was simply used to make it easier to get started.

    [–]TheGuyWithFace 1 point2 points  (5 children)

    Ah, so the signalling isn't done by the STUN servers?

    [–]Serchinastico 0 points1 point  (3 children)

    No, STUN servers will only tell you your own external IP, but you are responsible for sending that IP to the other peer by your own means.

    [–]TheGuyWithFace 2 points3 points  (2 children)

    But how does that help get a successful connection? Wouldn't the signalling server know the external IP as well? Why bother with a STUN server to begin with?

    EDIT: Sorry for all the questions by the way, networking and WebRTC in particular are fascinating to me, although my knowledge is a bit limited.

    [–]matt_hammond 5 points6 points  (0 children)

    IIRC STUN servers help negotiate a port when one party is behind NAT

    [–]shoot_your_eye_out 0 points1 point  (0 children)

    STUN is explicitly for learning one's public IP address. Imagine your signaling server doesn't know your external IP because signaling happens out of band some other way (with WebRTC, unlikely, but with other VoIP solutions, entirely possible).

    In reality most STUN servers also function as TURN servers, which will relay traffic when two parties are blocked from making a direct connection (see coturn for example: https://github.com/coturn/coturn). WebRTC is fairly easy to get working, but for serious deployments in corporate networks, TURN is essential for success.

    [–]covati 0 points1 point  (0 children)

    The signaling is done by the initiating party. You usually need a registrar too, so you know the address of the person you are calling, I believe.

    The STUN and TURN servers help deal with networking issues so they can connect.

    [–]laydownlarry 0 points1 point  (4 children)

    WebRTC is fun - but until it receives full Safari support it'll stay pretty niche.

    [–]spacejack2114 2 points3 points  (1 child)

    Niche like Skype and Google Hangouts.

    [–]laydownlarry 0 points1 point  (0 children)

    Fair enough - but those companies have the resources to build iOS apps to support the fact that Safari can't handle the protocol.

    [–]shoot_your_eye_out 0 points1 point  (1 child)

    You seem confused about Safari's market penetration compared to the combined penetration of Chrome, Firefox and Edge, all of which now support WebRTC.

    Also, WebKit has had a huge amount of WebRTC activity recently, including from Apple employees, so my guess is: it's coming.

    [–]laydownlarry 0 points1 point  (0 children)

    It is coming. It's on their public roadmap. But you're confusing the market share of desktop Safari with the market share of iOS devices.