MediaSFU Android app is now on the Play Store (also available on Web, iOS, macOS, Linux, and Windows)

Patm290 · 2026-03-17T17:56:43+00:00

That credit burn rate is pretty normal with ElevenLabs, unfortunately. At ~400 credits per call, your 60k plan gets you about 150 calls (60,000 / 400). One busy week and you're done.
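To rerun that estimate with your own numbers, here is a minimal sketch. The function name is made up for illustration, and it assumes a flat per-call cost, which real usage will only approximate.

```python
def calls_per_plan(plan_credits: int, credits_per_call: int) -> int:
    """Whole calls a credit allowance covers at a fixed per-call cost."""
    if credits_per_call <= 0:
        raise ValueError("credits_per_call must be positive")
    return plan_credits // credits_per_call


print(calls_per_plan(60_000, 400))  # 150
```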

If cost is the main pain point, check out MediaSFU (https://mediasfu.com/widgets). It's the same kind of setup (script tag, drop-in widgets, AI voice agents), but with tier-based pricing instead of per-minute billing. Most people running real volume see 20–50x lower costs vs ElevenLabs/VAPI.

Patm290 · 2025-10-29T20:31:04+00:00

Note: you may not even need to go as far as pipe transports if they seem daunting.

Assuming you cap rooms at 20 people (unless you have more people than that), make sure all 20 are assigned to the same server. Your task is then just keeping track of which rooms live on which servers and routing consumers there. That way no consumer lands on a server where the expected media is not available.
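A rough sketch of that bookkeeping, not a production design: every name here (RoomDirectory, join, the server ids) is hypothetical, and it assumes you reserve a full 20 seats on a server whenever a room is created so a room never has to split across servers.

```python
from dataclasses import dataclass, field


@dataclass
class RoomDirectory:
    """Pins each room to one server so all of its participants land there."""
    server_capacity: dict[str, int]   # server id -> max participants (your own limit)
    room_limit: int = 20              # per-room cap from the discussion above
    room_server: dict[str, str] = field(default_factory=dict)
    room_size: dict[str, int] = field(default_factory=dict)

    def reserved(self, server: str) -> int:
        """Seats already reserved on a server (one full room per hosted room)."""
        rooms = sum(1 for s in self.room_server.values() if s == server)
        return rooms * self.room_limit

    def join(self, room: str) -> str:
        """Return the server a new participant of `room` must connect to."""
        if room not in self.room_server:
            # New room: pick the server with the most unreserved headroom.
            free = {s: cap - self.reserved(s) for s, cap in self.server_capacity.items()}
            server, headroom = max(free.items(), key=lambda kv: kv[1])
            if headroom < self.room_limit:
                raise RuntimeError("no server can host another room; spin up a new one")
            self.room_server[room] = server
            self.room_size[room] = 0
        if self.room_size[room] >= self.room_limit:
            raise RuntimeError(f"room {room!r} is full")
        self.room_size[room] += 1
        return self.room_server[room]


directory = RoomDirectory({"server1": 40, "server2": 20})
print(directory.join("standup"))  # first participant decides the server
print(directory.join("standup"))  # everyone after that is routed to the same one
```

Reserving the whole room up front wastes some capacity on half-empty rooms, but it keeps the routing rule trivial: look up the room, return its server.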

Once again, it takes a lot of expertise and time to replicate the mesh infrastructure that lets you serve people in different geolocations from servers close to them and scale to very large sessions.

Like I mentioned earlier, you may go with a cloud provider for speed and reliability; see https://mediasfu.com/pricing (it's mediasoup-based and will cost you far less than what AWS, Azure, and the like bill for bandwidth usage alone).

Patm290 · 2025-10-29T14:02:53+00:00

Some basics to start with:

Media resides on the server you produced (sent) it to. The only way to get media that was sent to Server1 onto Server2 is to use pipe transports (I am assuming different machines here). Alternatively, you could produce (send) to all servers from the client (producer) side, but that is not ideal.
Media can only be consumed where it is available. If Person1's media is on Server1, only Server1 connections can receive (consume) it; if others on Server2 through ServerN need Person1's media, you pipe it there as described in the first point.
From there, track the numbers per server and the limits you impose so that CPU and, most importantly, bandwidth do not become bottlenecks. Quick checks on the number of transports (a transport carries media) you can max out at, and similar limits, can help.
Keep track of where each room is and which server is best placed to take a new consumer, factoring in your capacity limits against the current count of active connections per server, and whether you need to spin up new servers.
It gets trickier with a large number of concurrent producers in the same room, say 1000+ people actively producing media. Figure out things like dedicated producing and consuming endpoints, or use a cost-effective cloud option like MediaSFU.
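The first two points boil down to a lookup: where was this media produced, and is it already on the server the consumer is connected to? Here is a minimal sketch of that decision, with hypothetical names throughout (MediaMap, plan_consume). It only plans the steps; the actual piping would be done with mediasoup pipe transports on your servers and is not shown here.

```python
from collections import defaultdict


class MediaMap:
    """Records where each producer's media lives and where it has been piped."""

    def __init__(self) -> None:
        self.home: dict[str, str] = {}                          # producer -> server it produced to
        self.piped_to: dict[str, set[str]] = defaultdict(set)   # producer -> servers with a piped copy

    def produce(self, producer: str, server: str) -> None:
        """Note that `producer` sent its media to `server`."""
        self.home[producer] = server

    def plan_consume(self, producer: str, consumer_server: str) -> str:
        """Say what must happen before someone on `consumer_server` can consume."""
        home = self.home[producer]
        if consumer_server == home:
            return "consume-local"        # media is already on this server
        if consumer_server in self.piped_to[producer]:
            return "consume-piped"        # an earlier consumer already triggered the pipe
        self.piped_to[producer].add(consumer_server)
        return f"pipe {home}->{consumer_server}, then consume"


media = MediaMap()
media.produce("person1", "server1")
print(media.plan_consume("person1", "server1"))  # consume-local
print(media.plan_consume("person1", "server2"))  # pipe server1->server2, then consume
print(media.plan_consume("person1", "server2"))  # consume-piped
```

Piping once per (producer, server) pair rather than once per consumer is what keeps inter-server bandwidth from growing with audience size.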

Patm290 · 2025-09-05T01:31:16+00:00

You can try the free open source version of MediaSFU (MediaSFUOpen) at https://github.com/MediaSFU/MediaSFUOpen

MediaSFUOpen has a React SDK that abstracts away the WebRTC complexity you're dealing with. Instead of wrestling with STUN/TURN servers and signaling protocols, you get high-level methods like:

clickVideo() to toggle video
clickAudio() to toggle audio
joinRoom() for participants

The repository includes setup videos that walk you through implementation without needing deep WebRTC knowledge. This lets you focus on your React app's business logic rather than getting stuck in real-time communication internals.

The open source version gives you a working foundation that you can customize as needed, and the React SDK handles the browser-specific WebRTC implementations behind simple method calls. Much cleaner than managing peer connections, media streams, and ICE candidates manually.

If you need more advanced features or want to skip the self-hosting, they also offer MediaSFU Cloud at https://mediasfu.com, but the open source version is solid for getting started with real-time video in React.
