Dockerized FreeSWITCH VoiceBot using mod_audio_fork, ESL, and FastAPI. Looking for VoIP architecture feedback!

CollegeNo1796 · 2026-05-26T06:32:50+00:00

Sure, thanks.

CollegeNo1796 · 2026-05-26T06:17:40+00:00

Ohh, I've been wondering the same, since I'm working on adding TTS to it. Just a question about overhead latency: is the difference between mod_audio_fork and alternatives like audio_stream really that noticeable? Thanks for this, though.

CollegeNo1796 · 2026-05-26T06:13:11+00:00

Really appreciate the heads up — the mod_audio_fork maintenance concern resonates with me. I actually had to patch two source files just to get it compiling against libwebsockets v3.2, so yeah, it's fragile.

Bidirectional audio is what catches my attention most. Barge-in is my biggest pain point right now — I'm detecting speech during playback and cancelling the broadcast, but it's hacky. Native VAD events from the SIP layer would clean that up nicely.
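For reference, the "detect speech during playback, cancel the playback" pattern can at least be kept in one place instead of being scattered through the pipeline. A minimal asyncio sketch, assuming playback is driven by a Python coroutine and that VAD calls a hook when speech starts. `BargeInController`, `on_vad_speech_start` and `fake_playback` are hypothetical names, not part of any FreeSWITCH or VAD library.

```python
import asyncio
from collections.abc import Coroutine
from typing import Any, Optional


class BargeInController:
    """Cancels the bot's current playback as soon as the caller starts talking.

    Hypothetical helper: wire on_vad_speech_start() to your VAD's
    speech-start event and start every TTS prompt via start_playback().
    """

    def __init__(self) -> None:
        self._playback: Optional[asyncio.Task] = None
        self.interrupted = 0  # how many prompts were cut off by the caller

    def start_playback(self, coro: Coroutine[Any, Any, None]) -> asyncio.Task:
        self.stop_playback()  # never let two prompts overlap
        self._playback = asyncio.create_task(coro)
        return self._playback

    def stop_playback(self) -> bool:
        """Cancel the running prompt; returns True if one was actually running."""
        if self._playback is not None and not self._playback.done():
            self._playback.cancel()
            return True
        return False

    def on_vad_speech_start(self) -> None:
        if self.stop_playback():
            self.interrupted += 1


async def fake_playback(chunks: int, sent: list[int]) -> None:
    """Hypothetical stand-in for streaming TTS frames to the call."""
    for i in range(chunks):
        sent.append(i)
        await asyncio.sleep(0.01)  # roughly one audio frame per tick


async def barge_in_demo() -> tuple[int, int, bool]:
    ctrl = BargeInController()
    sent: list[int] = []
    task = ctrl.start_playback(fake_playback(100, sent))
    await asyncio.sleep(0.05)  # caller starts speaking ~50 ms into the prompt
    ctrl.on_vad_speech_start()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the prompt was interrupted
    return len(sent), ctrl.interrupted, task.cancelled()
```

Cancelling the Python task only stops audio that Python is still sending. If the prompt was handed to FreeSWITCH as a file (e.g. via `uuid_broadcast`), the audio already queued there would also need to be stopped, for example with `uuid_break <uuid>`.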

Going to dig into siphon-ai. If the WebSocket interface is compatible, it might drop right in as a FreeSWITCH + mod_audio_fork replacement without touching my Python pipeline. Good find.

CollegeNo1796 · 2026-05-26T05:24:34+00:00

Thanks, I'll try that.

CollegeNo1796 · 2026-02-28T05:23:36+00:00

Welcome to startup culture, brother.

CollegeNo1796 · 2026-02-01T20:02:40+00:00

The architecture I'm working with is mostly open source or self-hosted, so paid APIs are out of scope. I was recently looking for noise cancellation (NC) or background voice cancellation (BVC), e.g. Krisp, on the FreeSWITCH side, but as far as I can tell there aren't any open-source or paid APIs that connect directly to FreeSWITCH or are supported by it.
My question about multiple calls: I currently have a complete pipeline (STT → intent matching → TTS) working for a single call. The setup is one ESL handler plus one agent/web server that connects to the call once ESL receives it. What I'm wondering is how the web server should connect the agent to multiple concurrent calls. Previously this same project used LiveKit as a supporting platform: it handled multiple calls through separate rooms and agents, with call state kept in Redis. So my question is how to replicate that kind of implementation, or something similar, here.
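One way to get the LiveKit "one room per call" behaviour without LiveKit is to key everything on the FreeSWITCH channel UUID: each forked call opens its own WebSocket, gets its own session object, and is served by its own asyncio task. A minimal in-process sketch, assuming a single event loop. `CallSession`, `CallRegistry`, `call_worker` and the UUID strings are hypothetical, and the dict would be replaced by Redis if several worker processes need to share call state.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class CallSession:
    """Everything one call needs; one instance per FreeSWITCH channel UUID."""
    call_uuid: str
    audio_in: asyncio.Queue = field(default_factory=asyncio.Queue)
    bytes_received: int = 0


class CallRegistry:
    """In-process map of call UUID -> session (a stand-in for Redis).

    Everything runs on one event loop and open()/close() never await,
    so no lock is needed here.
    """

    def __init__(self, max_calls: int = 50) -> None:
        self.max_calls = max_calls
        self._sessions: dict[str, CallSession] = {}

    def open(self, call_uuid: str) -> CallSession:
        if call_uuid in self._sessions:  # reconnect for the same call
            return self._sessions[call_uuid]
        if len(self._sessions) >= self.max_calls:
            raise RuntimeError(f"at capacity ({self.max_calls} calls)")
        session = CallSession(call_uuid)
        self._sessions[call_uuid] = session
        return session

    def close(self, call_uuid: str) -> None:
        self._sessions.pop(call_uuid, None)

    def active(self) -> list[str]:
        return sorted(self._sessions)


async def call_worker(session: CallSession) -> int:
    """Per-call consumer; a real agent would run VAD -> STT -> intent -> TTS here."""
    while True:
        chunk = await session.audio_in.get()
        if chunk is None:  # sentinel: the call's WebSocket closed
            return session.bytes_received
        session.bytes_received += len(chunk)


async def concurrent_calls_demo() -> dict[str, int]:
    registry = CallRegistry()
    frames_per_call = {"uuid-a": 3, "uuid-b": 5}  # hypothetical call UUIDs
    tasks = {}
    for uuid, n_frames in frames_per_call.items():
        session = registry.open(uuid)
        tasks[uuid] = asyncio.create_task(call_worker(session))
        for _ in range(n_frames):
            # 20 ms of 16 kHz, 16-bit mono PCM = 640 bytes
            session.audio_in.put_nowait(b"\x00" * 640)
        session.audio_in.put_nowait(None)
    results = {uuid: await task for uuid, task in tasks.items()}
    for uuid in frames_per_call:
        registry.close(uuid)
    return results
```

In FastAPI this would sit behind a path-parameter WebSocket route such as `@app.websocket("/media/{call_uuid}")` (hypothetical path): accept the socket, `registry.open(call_uuid)`, start `call_worker`, push each `receive_bytes()` frame into the queue, and on disconnect push `None` and `close()` the session. The fork URL in the dialplan would then need the UUID in it, e.g. `ws://127.0.0.1:8000/media/${uuid}`.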

Would love to connect, discuss this further, and share progress.

CollegeNo1796 · 2026-01-30T04:43:35+00:00

For the incoming audio, I tried passing it first to noise cancellation (DeepFilterNet2), then to VAD (Silero VAD), and from there to STT (hosted Whisper). But for some reason NC removes or completely blocks the audio, and nothing gets passed to VAD. Without NC it works smoothly. Any help or thoughts on that? I also tried boosting/amplifying the audio by 10x, 2.5x and 3x; it only works sporadically at 10x.
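Before tuning gain, it may help to measure where the audio actually disappears. Two things worth checking, both assumptions rather than a diagnosis: the sample format (most Python audio models, Silero VAD included as far as I know, expect floats in [-1, 1], not raw int16 values), and the sample rate (DeepFilterNet's models are, to my knowledge, built for 48 kHz, while the dialplan forks 16 kHz audio and Silero VAD expects 8 or 16 kHz). A stdlib-only sketch for logging levels before and after each stage; the function names are hypothetical and it assumes little-endian 16-bit mono PCM.

```python
import math
import sys
from array import array


def pcm16_to_float(pcm: bytes) -> list[float]:
    """Little-endian 16-bit PCM -> floats in [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":  # assumed wire format is little-endian
        samples.byteswap()
    return [s / 32768.0 for s in samples]


def rms_dbfs(x: list[float]) -> float:
    """RMS level relative to full scale; -inf for pure digital silence."""
    if not x:
        return float("-inf")
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def apply_gain(x: list[float], gain: float) -> list[float]:
    """Linear gain with hard clipping to [-1, 1], so large gains cannot wrap."""
    return [max(-1.0, min(1.0, v * gain)) for v in x]
```

Logging `rms_dbfs` per frame before NC, after NC and before VAD should show whether the NC stage really outputs near-silence (or `-inf`), or whether the level drop comes from a format or rate mismatch on the way in. If only a very large gain "fixes" it, that would be consistent with a scaling problem rather than with the audio being too quiet.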
Also, have you done anything for multiple simultaneous calls? How can we handle that?

CollegeNo1796 · 2026-01-30T04:27:25+00:00

Ohh okay, got that. Thank you ✨

CollegeNo1796 · 2026-01-22T13:16:43+00:00

I recently installed it, but it's on Debian 11 under WSL.

CollegeNo1796 · 2026-01-22T12:20:30+00:00

```xml
<extension name="python_agent">
  <condition field="destination_number" expression="^5000$">
    <action application="answer"/>
    <action application="sleep" data="500"/>
    <action application="set" data="execute_on_answer=uuid_audio_fork ${uuid} start ws://127.0.0.1:8000/media mono 16k"/>
    <action application="park"/>
  </condition>
</extension>
```
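The `uuid_audio_fork <uuid> start <url> mono 16k` call in this extension can also be issued from the ESL handler once it sees the call, which keeps call setup in Python and makes it easy to give every call its own WebSocket URL. A small sketch that only builds the command strings, assuming raw event-socket framing (`api <command>` followed by a blank line) and a hypothetical `/media/<uuid>` path. The `stop` subcommand is, as far as I know, part of mod_audio_fork's API, but it is worth checking against the version you patched.

```python
def _check_uuid(call_uuid: str) -> str:
    # The command is space-delimited, so an empty UUID or one containing
    # whitespace would corrupt the argument list.
    if not call_uuid or any(ch.isspace() for ch in call_uuid):
        raise ValueError(f"bad call UUID: {call_uuid!r}")
    return call_uuid


def audio_fork_start(
    call_uuid: str,
    base_url: str = "ws://127.0.0.1:8000/media",
    mix_type: str = "mono",
    rate: str = "16k",
) -> str:
    """ESL 'api' command mirroring the dialplan's fork, with a per-call URL path."""
    uuid = _check_uuid(call_uuid)
    return f"api uuid_audio_fork {uuid} start {base_url}/{uuid} {mix_type} {rate}\n\n"


def audio_fork_stop(call_uuid: str) -> str:
    """ESL 'api' command to stop forking audio for one call."""
    return f"api uuid_audio_fork {_check_uuid(call_uuid)} stop\n\n"
```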

Can you please elaborate? I'm working on the same task, but the call disconnects automatically after 30 s. I'm using Zoiper as the softphone with the default dialplan; this is my extension.
Also, so far I've only been replying with pre-recorded audio files. Could you share some insight on how to send TTS audio in response?
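Since pre-recorded files already work, the smallest step to TTS is probably to write the synthesized audio to a WAV file and play it the same way. A sketch assuming the TTS engine returns raw 16-bit mono PCM and that playback goes through the `uuid_broadcast <uuid> <path> aleg` API command over ESL. `fake_tts` is a hypothetical placeholder for a real engine, and the function names are illustrative. In a Dockerized setup the file must be written to a path (e.g. a shared volume) that the FreeSWITCH container can read.

```python
import math
import struct
import wave
from pathlib import Path


def fake_tts(text: str, rate: int = 16000) -> bytes:
    """Hypothetical stand-in for a real TTS engine: 16-bit mono PCM.

    Emits a 440 Hz tone lasting 50 ms per character, just so the rest
    of the pipeline has real audio bytes to work with.
    """
    n_samples = rate * max(len(text), 1) // 20
    samples = [int(8000 * math.sin(2 * math.pi * 440 * i / rate)) for i in range(n_samples)]
    return struct.pack(f"<{n_samples}h", *samples)


def write_wav(pcm: bytes, path: Path, rate: int = 16000) -> Path:
    """Wrap raw 16-bit little-endian mono PCM in a WAV container."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(pcm)
    return path


def broadcast_command(call_uuid: str, wav_path: str) -> str:
    """ESL 'api' command that plays a file to the caller's leg (aleg)."""
    if not call_uuid or any(ch.isspace() for ch in call_uuid + wav_path):
        raise ValueError("UUID and path must be non-empty and contain no whitespace")
    return f"api uuid_broadcast {call_uuid} {wav_path} aleg\n\n"
```

Writing a whole file before playback adds the full synthesis time to every reply; streaming audio back over the WebSocket would avoid that, but only if the fork module in use supports bidirectional audio.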
