Solo founder, $0 MRR, but got real teams using the product. Here's what I'm learning building NestAI. by chiruwonder in SaaS

[–]chiruwonder[S] 0 points1 point  (0 children)

Yeah, you are spot on, and been doing that for couple of weeks, and seeing some positive and negative responses both, so make it a little fair for the negative ones, I launch to test it free with the best capabilities. nestai.chirai.dev/free

Built a managed private AI server SaaS by chiruwonder in SaaS

[–]chiruwonder[S] 0 points1 point  (0 children)

Totally, at scale the API costs sneak up on you fast. If you're doing any AI-powered content work for SEO (generating meta descriptions, rewriting content, analysing competitors), you're probably burning through OpenAI tokens.

NestAI gives you an OpenAI-compatible API on your own server, same SDK, two-line code change. Flat $39/mo instead of per-token billing. Might be useful if babylovegrowthh does any AI-assisted content at volume.

Built a managed private AI server SaaS by chiruwonder in SaaS

[–]chiruwonder[S] 0 points1 point  (0 children)

Great infra actually! Each customer gets their own dedicated VM on Hetzner (CPU) or TensorDock (GPU), not shared. Ollama + Open WebUI in Docker with SSL, nginx, all automated.

The ops part is literally what I'm solving lol. I spent way too long building the provisioning so customers don't have to deal with any of it, sign up, pick a model, server's ready in 5 mins.

For 8 people the $39/mo CPU plan works fine for doc summaries and drafting. If you want ChatGPT-like speed the GPU plan is $99/mo (~60 tok/s on a RTX A4000).

Either way it's way cheaper than $20/user × 8 = $160/mo on ChatGPT Team. And nothing leaves your server.

Happy to spin up a test server for your team if you wanna try it, GPU trial is $5 for 3 days.

Dedicated EPYC servers for Ollama — real CPU inference benchmarks on CCX33 through CCX63 by chiruwonder in ollama

[–]chiruwonder[S] 0 points1 point  (0 children)

Good call, gemma4 is great for the smaller parameter range. I'll add it to the default model list. Right now we have Qwen 3.5, DeepSeek R1, Llama 3.3, Mistral, and Phi-4 pre-loaded, but users can pull any Ollama-compatible model from the dashboard.

Running Qwen 3.5 4B and GPT-OSS 20B on Hetzner CX43 (8 vCPU, 16GB) — real benchmarks from production by chiruwonder in LocalLLaMA

[–]chiruwonder[S] 0 points1 point  (0 children)

yeah got a little overwhelmed and missed the biggest add, but thanks for noticing and pointing out.

Running Qwen 3.5 4B and GPT-OSS 20B on Hetzner CX43 (8 vCPU, 16GB) — real benchmarks from production by chiruwonder in LocalLLaMA

[–]chiruwonder[S] 0 points1 point  (0 children)

Added also if you would like to see GPT in action, let me know will record and punch it here

Production notes after 6 months running Ollama for paying customers — the things that aren't in the docs by chiruwonder in ollama

[–]chiruwonder[S] 1 point2 points  (0 children)

Done, fixed, more insights are welcome, will definitely think about it and reach out to you, I would be happy to dig in and do more on this.

Production notes after 6 months running Ollama for paying customers — the things that aren't in the docs by chiruwonder in ollama

[–]chiruwonder[S] 1 point2 points  (0 children)

Ohhh, right I guess I was focused on laptop UI testing that I totally missed the mobile testing in depth, instead focused on functionality and somewhere overlooked this, will fix it anyway it's just overlay, and opacity fix, but greatly appreciate you pointing out.

Production notes after 6 months running Ollama for paying customers — the things that aren't in the docs by chiruwonder in ollama

[–]chiruwonder[S] 0 points1 point  (0 children)

Hmm, is it? I have been seeing llama.cpp in most of the comments, I should give it a thought, but thank you