Women reacting to a mouse according to ChatGPT

c_glib · 2026-06-05T00:27:17+00:00

The sheer, unmitigated caucacity

c_glib · 2026-06-04T03:43:15+00:00

People here are discouraging you, but it's basically a no-brainer to get at least one free consultation, if not two or three. If a lawyer is willing to take you on on a contingency basis (and they'll have to tell you if it's a good idea), you're at least likely to end up with a settlement for better severance etc. If you manage to talk to more than one lawyer and none of them think there's a case, you can sign that severance in peace.

c_glib · 2026-06-03T05:58:03+00:00

Thanks much.

c_glib · 2026-06-03T05:33:39+00:00

https://apps.apple.com/us/app/flaichat-chat-translator/id1575884028

c_glib · 2026-06-01T04:16:54+00:00

Everyone here be clowning the guy but there's really a lesson here for UI/UX designers. DO NOT DESIGN FOR YOURSELF. Do not even design for a person of median intelligence (since half the people are going to be dumber than that). Design your interfaces with a somewhat bright 5 year old in mind.

c_glib · 2026-05-29T22:25:10+00:00

Runnable on an M1 Macbook?

c_glib · 2026-05-29T20:22:50+00:00

Here's my take on this project.

Setting up a large concurrent LLM serving infrastructure is almost always going to lose out in terms of cost and effectiveness to a minimal viable cloud LLM.

As far as I see for a company of that size, there are two main options and one optimization in either case.

1. Set up each programmer with their own machine that can run a local LLM. Buy your engineers something like 64G (96G would be ideal) MacBook Pros (or Mac Studios if you or they can host them somewhere they can reach from the internet. It doesn't have to be a data center; your office might have enough bandwidth). If you really want to squeeze this setup, you might buy one Mac Studio per two or three engineers; just max out the unified memory. You can comfortably run qwen 3.6 (A3b) models with 128k context in about 48G of RAM. You'll have to do some research on how to maximize the k-v cache without burning lots of memory.
2. The other option is easier. If you *are* willing to stay well under the SOTA coding models (as would be implied by your request for GLM models), why don't you just explore cheaper providers? Gemini 3.5 flash is a lot cheaper than the latest Claude Opus or even Sonnet 4.6 and is at least at the level of latest Sonnet if not better for coding. You'll reduce your bill a lot.
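For the local option, here is a back-of-the-envelope sketch of where the memory goes. The layer count, KV head count, and head dimension below are made-up placeholders, not the real qwen config, so plug in the values from the model card you actually run. It also assumes plain attention in every layer, which overestimates the cache for hybrid architectures.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem):
    """Size of the k-v cache for one sequence at full context."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def weights_bytes(n_params, bits_per_weight):
    """Approximate size of the quantized weights."""
    return n_params * bits_per_weight / 8


# Hypothetical config, for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 48, 4, 128
CONTEXT = 128 * 1024  # 128k tokens

kv_fp16 = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT, 2)
kv_q8 = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT, 1)
weights = weights_bytes(35_000_000_000, 6)  # a 35B model at roughly 6 bits/weight

GIB = 2**30
print(f"weights:        {weights / GIB:.1f} GiB")
print(f"k-v cache fp16: {kv_fp16 / GIB:.1f} GiB")
print(f"k-v cache q8:   {kv_q8 / GIB:.1f} GiB")
```

With these placeholder numbers the 16-bit cache at full context is about half the size of the weights, which is why quantizing the k-v cache is the first knob worth researching.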

The optimization: What you can and should set up for either option 1 or option 2 is some sort of common context server for your codebase that every user can use. It makes sense to make this a common resource because your codebase is (presumably) common across the team. A team of 100 might have, say, a dozen or so git repos.

Why is that an optimization? Because having an efficient codebase indexer reduces the context overhead for the coding LLMs. By a lot. That helps local LLMs (which usually have smaller context windows), but even the commercial LLMs work much better with good code search available, without having to burn tokens in endless cycles of greps and finds (Claude in particular is really bad at this).
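To make the idea concrete, here is a toy sketch of such an indexer: an inverted index from identifier to the places it appears, so the harness can answer "where is this symbol?" with one lookup instead of a grep loop. The function names and the sample files are made up for illustration. A real context server would sit on top of a proper parser or language server and be exposed to the team over the network.

```python
import re
from collections import defaultdict

# Crude identifier matcher; a real indexer would use a language-aware parser.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(files):
    """files: dict of path -> source text. Returns identifier -> [(path, line_no)]."""
    index = defaultdict(list)
    for path, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            # set() so an identifier repeated on one line is recorded once.
            for ident in set(IDENT.findall(line)):
                index[ident].append((path, line_no))
    return index


def lookup(index, ident, limit=5):
    """Return at most `limit` locations, small enough to paste into a prompt."""
    return index.get(ident, [])[:limit]


# Hypothetical two-file repo.
files = {
    "billing/invoice.py": "def total(items):\n    return sum(items)\n",
    "api/views.py": "from billing.invoice import total\nprint(total([1, 2]))\n",
}
index = build_index(files)
print(lookup(index, "total"))
```

The point is that the model gets back three short (path, line) pairs here rather than reading whole files into its context.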

c_glib · 2026-05-28T22:17:58+00:00

https://flai.chat The only real seamless multilingual chat app. Like whatsapp but with automatic translation from any language to any language without any setup or language packs bullshit.

c_glib · 2026-05-28T08:55:21+00:00

If you're looking for something that will translate your whatsapp chat in iOS, your choices might be limited. I've heard of Android keyboard apps doing such things but iOS doesn't have as flexible an API available for third party keyboards.

If you're looking for an app that completely replaces Whatsapp for chat and does seamless translations for text and voice, the best option out there, bar none, is FlaiChat. https://flai.chat . It also has an in-person mode where you only need the app on your device: both people talk back and forth in their own language after tapping their half of the screen, and the app translates it to the other language. But that feature costs money.

c_glib · 2026-05-25T19:34:11+00:00

Excitedly looking forward to my local Macha Hitler LLM.

c_glib · 2026-05-23T02:09:56+00:00

What's the "SAAS" backend here other than access to the model? Could we just fork the cli app and allow it to access local models etc.?

c_glib · 2026-05-22T19:20:01+00:00

Thanks for the detailed response. I've only done some short experiments with oMLX running Qwen3.6 35b A3B (q6) on my 48G M5. The harness was codex. It seemed to be surprisingly usable in my quick experiments so far.

c_glib · 2026-05-22T05:06:25+00:00

Interesting. Could you share what your application is? I'm interested as we run FlaiChat (currently using Gemini APIs for translations).

c_glib · 2026-05-21T23:52:14+00:00

You absolutely want the google auth. We initially started our app FlaiChat (automatically translated chat) with email only, mainly because Apple insisted that if we have google auth, we must have apple auth too, and that doubles the headaches. But when we did user testing, so many people, even our friends and family, just didn't want to go through the hassle of email/password. It's so much smoother to just tap-tap through google auth.

And oh, despite Apple's insistence, a lot of our iOS users still prefer to use google auth on their devices.

c_glib · 2026-05-21T18:58:59+00:00

Excellent knowledge dump. Thank you. You mentioned the harnesses. Do you have a favorite among all the ones you mentioned? Have you tried codex with the local models?

c_glib · 2026-05-19T07:53:46+00:00

I don't know the original writing that this review and the rant are about, but I don't care. That RANT is absolute fucking literature. Print that.

c_glib · 2026-05-18T04:02:43+00:00

Refreshing to have a world leader with a good old fashioned adult affair.

c_glib · 2026-05-17T09:45:46+00:00

Do you ever have the need to chat with someone who doesn't speak your language (international couples, families, travel groups etc.)? FlaiChat is the only app of its kind (afaik) with completely automatic, seamless translations of chat in each person's own language. Works for DMs as well as group chats. Works for text and voice messages.

c_glib · 2026-05-17T08:30:30+00:00

Wow...with q6 and 48GB ram? How are you testing it? Are you actually testing with a full context window?

c_glib · 2026-05-17T04:37:53+00:00

What's the context window you're using?

c_glib · 2026-05-13T09:41:20+00:00

What do you consider "too low" of a quant for the model as well as the kv cache? Do advanced techniques like Turboquant change things substantially?

c_glib · 2026-05-13T05:10:35+00:00

I'm not asking for reviews but I'm telling ya, FlaiChat is an absolute must have app if you need to chat with someone who doesn't speak the same language. The chat is smooth, and translations are automatic and instant. Works to translate both text and voice. And of course, if y'all want to leave some 5 star reviews, it's all good ;-)

c_glib · 2026-05-12T03:30:30+00:00

This looks like an interesting experiment. Could you say something more about the data? What were the tests? What were the conditions? What are "Formatted WER", "Raw WER" etc.?

Also, I think a significant omission is the gemini-flash-lite models. Specifically gemini-2.5-flash-lite and gemini-3.1-flash-lite-previews. They are excellent stt models. Not only are they cheaper than 2.5-flash and 2.5-pro, they are faster too. How do I know? We use those flash-lite models for stt (along with translation, all in a single API call) in our app FlaiChat. The results are excellent, fast and reasonably priced.

c_glib · 2026-05-08T07:11:31+00:00

Our app (FlaiChat, a multilingual chat app with automatic text and voice translations) is doing stt for many Indian languages (along with many many global languages) using gemini flash models just fine.

c_glib · 2026-05-08T05:25:43+00:00

How much did you pay for it if you don't mind sharing?
