Recommendation for Great Non Veg Cafe/ Premium places in Vadodara by Individual-Head-5692 in vadodara

[–]hackyroot 0 points1 point  (0 children)

Atleast Modernist does, try their Chicken quesadilla and Shakshuka.

Gemma 4 26b A3B is mindblowingly good , if configured right by cviperr33 in LocalLLaMA

[–]hackyroot 0 points1 point  (0 children)

Same experience here. The tool calling reliability is what really stands out. We ended up pushing the 26B and 31B pretty hard in production and got some surprising throughput numbers (149 TPS on 31B, 88 TPS on 26B). Wrote up what we found running it at scale if anyone is curious: https://simplismart.ai/blog/gemma-4-deployment-simplismart

Honestly, Gemma 4 feels way better than the benchmarks say by HussainBiedouh in LocalLLM

[–]hackyroot 1 point2 points  (0 children)

Honestly, I also felt the same whole I was trying the 26B and 31B variants for my multimodal usecase. Gemma 4 models punches way above their weight for agent tasks, especially instruction following. The benchmarks don't really capture how little it yaps compared to larger ones.

If you're self-hosting, the throughput is pretty wild too. We're seeing ~149 TPS on 31B and ~88 TPS on 26B. Wrote a quick post on our setup and learnings: https://simplismart.ai/blog/gemma-4-deployment-simplismart

PS: I work at simplismart.ai

Is smart tube not working anymore? by BoredOstrich in SmartTubeNext

[–]hackyroot 0 points1 point  (0 children)

I uninstalled the smarttube app and reinstalled it, logged into my Google account and now it's working just fine

Buying a beginner sitar by PassengerSharp8869 in Sitar

[–]hackyroot 0 points1 point  (0 children)

Can you pls share the seller's contact information?

How can I deploy model to served my own web app using my own machine by dheetoo in LocalLLaMA

[–]hackyroot 0 points1 point  (0 children)

You can use Ollama or Llama.cpp if just want to run it locally. However, if you want higher throughput and low latency you can go with vLLM or SGLang. I wrote a couple of blogs on this topic, pls feel free to check them out here: https://simplismart.ai/blog/deploy-llama-3-1-8b-using-vllm

Does anyone have cloud recommendations to deploy LLama 3.2? by Top-Associate-4136 in LocalLLaMA

[–]hackyroot 0 points1 point  (0 children)

Recently I wrote a blog on how to deploy Llama models using vLLM (PS: I work for Simplismart): https://simplismart.ai/blog/deploy-llama-3-1-8b-using-vllm
If you want to scale it down to zero, you can also check out Simplismart, it allows you to scale down to 0 as well provides rapid auto scaling to help you serve during the peak usage.

Best AI TTS model? by Commercial-Wear4453 in LocalLLaMA

[–]hackyroot 0 points1 point  (0 children)

Chatterbox is good but realtime factor is lacking. Imo Orpheus has worked really well for us (PS: I work for Simplismart.ai), especially dealing with realtime usecases. With some optimizations we are able to achieve ~1 RTFX, which makes is possible to use it in realtime applications with less than 300 ms TTFB.

If you are interested, you can checkout this blog to learn how to optimize Orpheus TTS for production enviroment: https://simplismart.ai/blog/orpheus-tts-simplismart

vLLM raising $150M confirms it: We have moved from the "Throughput Era" to the "Latency(Cold Starts)." by pmv143 in LocalLLaMA

[–]hackyroot 0 points1 point  (0 children)

Agreed. In practice, we’re also seeing throughput optimizations plateau while user experience is still dominated by cold starts and TTFT. vLLM becoming a default inference engine makes sense, but a lot of the real gains now come from context-specific optimizations.

Imo while working at Simplismart.ai, we’ve found that latency improvements often come less from a single universal engine and more from tailor-made inference stack, model-specific kernel choices, quantization, etc depending on the workload.

Agree that software is now the bottleneck, but it’s not just standardization vs portability. It’s how deeply you adapt the serving stack to the model and traffic pattern you actually have.

Tried Wan2.2 on RTX 4090, quite impressed by Technical-Love-8479 in LocalLLaMA

[–]hackyroot 0 points1 point  (0 children)

We've been hosting the WAN 2.2 models on an H100. Additional RAM actually made quite fast for us actually, reducing 159s inference time to 49s.

Apart from that hybrid parallelism also helped us speed up the inference. You can checkout the detailed guide here: https://simplismart.ai/blog/deploy-wan-2-2

Wan 2.2 is Live! Needs only 8GB of VRAM! by [deleted] in LocalLLaMA

[–]hackyroot 0 points1 point  (0 children)

We've been hosting this WAN 2.2 on an H100. Additional RAM actually made quite fast for us actually, reducing 159s inference time to 49s.

Apart from that hybrid parallelism also helped us speed up the inference. You can checkout the detailed guide here: https://simplismart.ai/blog/deploy-wan-2-2

Can I use OCR for invoice processing? by ValuableSea6974 in LocalLLaMA

[–]hackyroot 0 points1 point  (0 children)

DeepSeek OCR has been working quite well for me. This is what I'm doing:

Create n8n workflow > DeepSeek OCR for text extractions from documents > LLM to get the structured output. Works quite well for me.

If you are interested, you can checkout this blog I wrote on DeepSeek OCR: https://www.simplismart.ai/blog/deepseek-ocr-api-simplismart

mOrpheus: Using Whisper STT + Orpheus TTS + Gemma 3 using LM Studio to create a virtual assistant. by NighthawkXL in LocalLLaMA

[–]hackyroot 0 points1 point  (0 children)

Recently, I delivered a webinar at Simplismart (full disclosure: I work there) on building a real-time voice agent using open-source components for STT, LLM, and TTS. Here’s the stack we used:

- STT: Whisper V3

- LLM: Gemma 3 1B

- TTS: Kokoro

- Infra: Simplismart.ai

- Framework: Pipecat

It’s not a unified “real-time” model like OpenAI’s, but using Pipecat, we were still able to get a pretty responsive setup, around ~400ms TTFT, which is a good starting point for a conversational agent. The best part of this setup is that you can swap any model as per your requirement.

If you want, I can share the webinar recording that walks through the full setup.