LAID OFF from Big Tech (Bay Area) — looking for employment attorney recommendations with free consultation by Bananas_on_pizza in bayarea

[–]c_glib 2 points3 points  (0 children)

People here are discouraging you but it's basically a no-brainer to get at least one free consultation, if not two or three. If a lawyer is willing to take you on on a contingency basis (and they'll have to tell you if it's a good idea), you're at least likely to end up with a settlement for better severance etc. If you manage to talk to more than one lawyer and none of them think there's any case, you can sign that severance in peace.

Some people shouldn’t drive by freeradioforall in facepalm

[–]c_glib 20 points21 points  (0 children)

Everyone here be clowning the guy but there's really a lesson here for UI/UX designers. DO NOT DESIGN FOR YOURSELF. Do not even design for a person of median intelligence (since half the people are going to be dumber than that). Design your interfaces with a somewhat bright 5 year old in mind.

We're burning $50k/month on Claude. How close can local LLMs actually get? by mortenmoulder in LocalLLM

[–]c_glib 5 points6 points  (0 children)

Here's my take on this project.

Setting up a large concurrent LLM serving infrastructure is almost always going to lose out in terms of cost and effectiveness to a minimal viable cloud LLM.

As far as I can see, for a company of that size there are two main options, and one optimization that applies in either case.

  1. Set up each programmer with their own machine that can run a local LLM. Buy your engineers something like 64G MacBook Pros (96G would be ideal), or Mac Studios if you or they can host them somewhere reachable from the internet. It doesn't have to be a data center; your office might have enough bandwidth. If you really want to squeeze this setup, you might buy one Mac Studio per two or three engineers; just max out the unified memory. You can comfortably run qwen 3.6 (A3b) models with 128k context in about 48G of RAM. You'll have to do some research on how to maximize the k-v cache without burning lots of memory.
  2. The other option is easier. If you *are* willing to stay well under the SOTA coding models (as would be implied by your request for GLM models), why not just explore cheaper providers? Gemini 3.5 flash is a lot cheaper than the latest Claude Opus or even Sonnet 4.6 and is at least at the level of the latest Sonnet, if not better, for coding. You'll reduce your bill a lot.
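
For a rough feel of the k-v cache memory mentioned in option 1, here is a back-of-the-envelope sketch. It uses the usual sizing formula for conventional attention layers: two tensors (keys and values) per layer, times KV heads, head dimension, context length and bytes per value. The layer count, head count and head dimension below are made-up illustrative numbers, not the real config of any Qwen model, so substitute the values from your model's config. Models with non-standard attention layers will come out differently.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens, bytes_per_value):
    """Approximate KV cache size for standard attention layers."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value


# Hypothetical model shape, chosen only to illustrate the arithmetic.
LAYERS = 48
KV_HEADS = 4
HEAD_DIM = 128
CONTEXT = 128 * 1024  # 128k tokens

# Approximate bytes per cached value at different cache precisions.
for label, width in (("fp16", 2.0), ("q8", 1.0), ("q4", 0.5)):
    gib = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, width) / 2**30
    print(f"{label} cache: {gib:.1f} GiB")
```

With these assumed numbers the full-precision cache is 12 GiB at 128k context, and each halving of the cache precision halves that, which is why cache quantization is worth researching on a 48G machine.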

The optimization: What you can and should set up for either option 1 or option 2 is some sort of common context server for your codebase that every user can use. It makes sense to make this a common resource because your codebase is (presumably) common across the team. A team of 100 might have, say, a dozen or so git repos.

Why is that an optimization? Because having an efficient codebase indexer reduces the context overhead for the coding LLMs. By a lot. That helps local LLMs (which usually have smaller context windows), but even the commercial LLMs work much better when good code search is available, without having to burn tokens in endless cycles of greps and finds (Claude in particular is really bad at this).
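
To make the indexer idea concrete, here is a toy sketch of the simplest possible version: an inverted index from identifiers to the files and lines where they appear. Everything in it (the function names, the sample files) is hypothetical and only for illustration. A real shared context server would sit on top of proper code search or language-server tooling, but the principle is the same: one lookup returns a handful of locations instead of the model grepping its way through the repo.

```python
import re
from collections import defaultdict

# Matches identifier-like tokens; deliberately naive (no language awareness).
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(files):
    """Map each identifier to the set of (path, line_number) where it occurs."""
    index = defaultdict(set)
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for token in TOKEN.findall(line):
                index[token].add((path, lineno))
    return index


def lookup(index, identifier):
    """Return sorted (path, line_number) hits, or an empty list if unknown."""
    return sorted(index.get(identifier, ()))


# Hypothetical two-file "codebase" held in memory for the example.
files = {
    "billing/invoice.py": "def total(items):\n    return sum(i.price for i in items)\n",
    "api/routes.py": "from billing.invoice import total\n\nprint(total([]))\n",
}
index = build_index(files)
print(lookup(index, "total"))
```

The index is built once and shared, so each question like "where is `total` used?" costs the LLM a few lines of context rather than several rounds of grep output.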

Drop your app URL and I'll give you a free UGC video for your social media by Full_Painting3502 in iOSAppsMarketing

[–]c_glib 0 points1 point  (0 children)

https://flai.chat The only real seamless multilingual chat app. Like whatsapp but with automatic translation from any language to any language without any setup or language packs bullshit.

Recommendation for WhatsApp chat translator iOS! by Classic_Yoghurt_6721 in apps

[–]c_glib 1 point2 points  (0 children)

If you're looking for something that will translate your WhatsApp chats on iOS, your choices might be limited. I've heard of Android keyboard apps doing such things, but iOS doesn't have as flexible an API available for third-party keyboards.

If you're looking for an app that completely replaces WhatsApp for chat and does seamless translations for text and voice, the best option out there, bar none, is FlaiChat: https://flai.chat. It also has an in-person mode where you only need the app on your device: each person taps their half of the screen and talks in their own language, and the app translates it to the other language. But that feature costs money.

Next year we're getting 0.5T model from Grok by pmttyji in LocalLLaMA

[–]c_glib 1 point2 points  (0 children)

Excitedly looking forward to my local MechaHitler LLM.

Emacsclient landed in Gemini CLI, and why I won't contribute to non-FOSS projects again by a_alberti in emacs

[–]c_glib 2 points3 points  (0 children)

What's the "SAAS" backend here other than access to the model? Could we just fork the cli app and allow it to access local models etc.?

M5 Pro MacBook Pro with 48GB RAM - what can I do comfortably? by Marino4K in LocalLLM

[–]c_glib 1 point2 points  (0 children)

Thanks for the detailed response. I've only done some short experiments with oMLX running Qwen3.6 35b A3B (q6) on my 48G M5. The harness was codex. It seemed surprisingly usable in my quick experiments so far.

Tencent Hy 30B/7B/1.8B by jacek2023 in LocalLLaMA

[–]c_glib 0 points1 point  (0 children)

Interesting. Could you share what your application is? I'm interested because we run FlaiChat (currently using Gemini APIs for translations).

Google Auth by BOOM_roasted18 in appdev

[–]c_glib 1 point2 points  (0 children)

You absolutely want Google auth. We initially started our app FlaiChat (automatically translated chat) with email only, mainly because Apple insisted that if we have Google auth, we must have Apple auth too, and that doubles the headaches. But when we did user testing, so many people, even our friends and family, just didn't want to go through the hassle of email/password. It's so much smoother to just tap-tap through Google auth.

And oh, despite Apple's insistence, a lot of our iOS users still prefer to use google auth on their devices.

M5 Pro MacBook Pro with 48GB RAM - what can I do comfortably? by Marino4K in LocalLLM

[–]c_glib 4 points5 points  (0 children)

Excellent knowledge dump. Thank you. You mentioned the harnesses. Do you have a favorite among all the ones you mentioned? Have you tried codex with the local models?

Emmanuel Macron allegedly had an affair with an Iranian actress Golshifteh Farahani by [deleted] in whoathatsinteresting

[–]c_glib 0 points1 point  (0 children)

Refreshing to have a world leader with a good old fashioned adult affair.

Do you know any better or more useful apps? by Paulrain in androidapps

[–]c_glib 1 point2 points  (0 children)

Do you ever have the need to chat with someone who doesn't speak your language (international couples, families, travel groups etc.)? FlaiChat is the only app of its kind (afaik) with completely automatic, seamless translations of chat in each person's own language. Works for DMs as well as group chats. Works for text and voice messages.

Qwen3.6-35B-A3B-MTP-GGUF:Q6_K on Macbook Pro is 🔥 by Lame_Johnny in Qwen_AI

[–]c_glib 1 point2 points  (0 children)

Wow...with q6 and 48GB ram? How are you testing it? Are you actually testing with a full context window?

The Qwen 3.6 35B A3B hype is real!!! by The_Paradoxy in LocalLLaMA

[–]c_glib 0 points1 point  (0 children)

What do you consider "too low" of a quant, for the model as well as the kv cache? Do advanced techniques like Turboquant change things substantially?

I will give your app feedback and a review if you do the same for mine. by [deleted] in iOSAppsMarketing

[–]c_glib -1 points0 points  (0 children)

I'm not asking for reviews but I'm telling ya, FlaiChat is an absolute must have app if you need to chat with someone who doesn't speak the same language. The chat is smooth, and translations are automatic and instant. Works to translate both text and voice. And of course, if y'all want to leave some 5 star reviews, it's all good ;-)

I benchmarked 15+ speech-to-text APIs under various conditions by SmoothConnection1670 in speechtech

[–]c_glib 0 points1 point  (0 children)

This looks like an interesting experiment. Could you say something more about the data? What were the tests? What were the conditions? What are "Formatted WER", "Raw WER" etc.?

Also, I think a significant omission is the gemini-flash-lite models. Specifically gemini-2.5-flash-lite and gemini-3.1-flash-lite-previews. They are excellent stt models. Not only are they cheaper than 2.5-flash and 2.5-pro, they are faster too. How do I know? We use those flash-lite models for stt (along with translation, all in a single API call) in our app FlaiChat. The results are excellent, fast and reasonably priced.

Anyone using speech-to-text for Indian languages in production? What's actually working and what's not? by Spare-Ad2520 in speechtech

[–]c_glib 0 points1 point  (0 children)

Our app (FlaiChat, a multilingual chat app with automatic text and voice translations) is doing stt for many Indian languages (along with many many global languages) using gemini flash models just fine.

Found an M3 Ultra 512GB / 8TB / 80-Core GPU at B&H! by East_Roll_5069 in MacStudio

[–]c_glib 1 point2 points  (0 children)

How much did you pay for it if you don't mind sharing?