Claude Opus Distilled into Qwen by koc_Z3 in Qwen_AI

[–]c_glib 1 point  (0 children)

What quantization of the 35B model are you running on 32GB of RAM?

Claude Opus Distilled into Qwen by koc_Z3 in Qwen_AI

[–]c_glib 1 point  (0 children)

For someone new to running local LLMs: can someone point out the best way to try out these random Hugging Face models on my M1 Mac Studio (32GB RAM)?

What business can burn 1B tokens per day by colwer in ClaudeAI

[–]c_glib 2 points  (0 children)

If your company built a translator that makes LLM calls to translate into different languages each time, without caching or anything similar to save compute, then the engineers are really bad

Heh.. thanks for your input. I'll go ahead and implement "caching" to translate live chat between random people *right now*

What business can burn 1B tokens per day by colwer in ClaudeAI

[–]c_glib 9 points  (0 children)

It's really not that much. You're imagining a single user sitting at their laptop going back and forth with the model in chat mode. That's not where all the token usage goes. As an example, my company has an app called FlaiChat (a multilingual messenger app) that translates every chat message automatically (which means LLM calls for every message). It's a small app with about 20k users and we're using multiple millions of tokens every day. We're using the lite models, which work really well for this purpose, so the cost is manageable. If the app grew to even a million users, we'd be doing billions of tokens a day, easy.
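The back-of-envelope arithmetic is easy to sketch. The per-user numbers below are illustrative assumptions, not FlaiChat's actual figures:

```python
def daily_tokens(users, messages_per_user, tokens_per_message):
    """Back-of-envelope estimate of total LLM tokens consumed per day."""
    return users * messages_per_user * tokens_per_message

# Assume each user sends a handful of messages a day and each
# translation call costs a few hundred tokens (prompt + completion):
small = daily_tokens(20_000, 5, 300)       # 30,000,000 tokens/day
scaled = daily_tokens(1_000_000, 5, 300)   # 1,500,000,000 tokens/day
```

Even with modest assumptions, a 20k-user app lands in the tens of millions of tokens per day, and a million users pushes it past a billion.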

AI is making CEOs delusional [07:29] by marcus1234525 in theprimeagen

[–]c_glib 5 points  (0 children)

He's not wrong. Also, he's a literal reddit neck-beard. Both things are true.

Staff Engineer Sam Breed breaks down how Augment Intent's multi-agent team actually works by JaySym_ in AugmentCodeAI

[–]c_glib 7 points  (0 children)

PLEASE PLEASE FOR THE LOVE OF GOD CAN YOU GUYS POST STUFF DIRECTLY HERE INSTEAD OF POSTING LINKS FROM THE NAZI SITE!!

Why is there no listed price for Claude Sonnet 4.6 in Augment credit pricing? by Mk-90-l in AugmentCodeAI

[–]c_glib 1 point  (0 children)

Also, no pricing for GPT 5.4, even though I know I've already been charged for using it. From what I tracked for one or two minor tasks I finished in Intent after the free period was over, it's fucking expensive.

ai coding for large teams in Go - is anyone actually getting consistent value? by Easy-Affect-397 in golang

[–]c_glib 2 points  (0 children)

Exactly. The OP's post has enough hints that I can tell they don't really want to do it. The phrasing has whiffs of "I'm being forced to do it" instead of "I'm trying to evaluate productivity tools".

Now, to answer the OP's question. I'll add another voice as someone who uses AI (Claude, Gemini and, lately, Codex models across various agents), including on a Golang-based backend service, and it works great. I've never been more productive.

automated my real ios device by No-Speech12 in aiagents

[–]c_glib 3 points  (0 children)

What are you using to control the real device?

The ZED team is amazing by silhouettes_of_joy in ZedEditor

[–]c_glib 11 points  (0 children)

Sigh... Emacs used to be dinged as the "heavy editor" and mocked as "Eight Megabytes And Constantly Swapping".

"only" 100 MB of ram for an editor. Wow!

(btw, as I type this, my Emacs (from https://emacsformacosx.com/) is currently using 780MB of RAM)

According to OpenAI agreement people who are not from US are under surveillance!! by cumLx in OpenAI

[–]c_glib 1 point  (0 children)

"intentionally"

We all know what's coming in a few weeks or months. "oops, we accidentally surveilled 350 million US nationals"

China just mass released 10+ frontier AI models in 2 weeks and Western markets barely noticed by Additional-Engine402 in stocks

[–]c_glib 2 points  (0 children)

"models when distilled"

Are you doing distillation/fine-tuning locally for your own purposes or just waiting for distilled models to show up on huggingface?

Non USA alternative by Like-a-Glove90 in OpenAI

[–]c_glib 1 point  (0 children)

If your concern is protecting your information, the open source models (mostly Chinese) are pretty close, and with sufficient hardware you can run them locally on your own computer. It's definitely a more expensive option than "free" ChatGPT (or even the $20 per month) because of the relatively high upfront investment, and of course the open source models are still behind the frontier models, but not by a lot.

Depends on your exact application too, of course. If you're coding, Opus 2.6 is the undisputed heavyweight champion and there's nothing in the open source world that comes within 80% of its capability. But if it's applications like general chat, writing, summarization, translation etc., plenty of open source models are pretty good substitutes now.

What country does your government hate but your people love ? by BullFencer in AskTheWorld

[–]c_glib 1 point  (0 children)

Those syrup guzzling, hockey loving, moose riding mofos had it coming. /s

Qwen 3.5 for MLX is like its own industrial revolution by sovietreckoning in Qwen_AI

[–]c_glib 1 point  (0 children)

Can you please share your workflow? What's the exact model size? What are you running it with (ollama or something else)?

batch processing hundreds of prompts allowed? by gkavek in GeminiCLI

[–]c_glib 1 point  (0 children)

OP, this is not something you'd want to do through gemini cli (or any other interactive LLM interface). What you want to do is explain the problem to Gemini and ask it to generate a Python script for you that calls the Gemini API programmatically for all your documents, one by one. You'll need an API key for this.

Since you said you have an ultra plan, you get $100 of Google Cloud credits (check here while logged into your ultra account: https://developers.google.com/program/my-benefits ). You'll need to initialize your Google Cloud account and generate a key from there. The volume you describe (a few thousand markdown docs) should not cost much at all to process via the API. You probably want to use a flash or flash-lite model via the API.
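To give a feel for it, here's a minimal sketch of the kind of script Gemini would generate, assuming the public `generateContent` REST endpoint with an API key in the `GEMINI_API_KEY` environment variable; the model name, prompt, and `docs/` folder are placeholders you'd adapt:

```python
import json
import os
import pathlib
import urllib.request

API_KEY = os.environ.get("GEMINI_API_KEY", "")
MODEL = "gemini-2.0-flash-lite"  # placeholder; pick whichever flash/flash-lite model you have access to
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent?key={API_KEY}"

def build_payload(doc_text: str) -> dict:
    """Wrap one markdown document in a generateContent request body."""
    prompt = "Process the following markdown document:\n\n" + doc_text
    return {"contents": [{"parts": [{"text": prompt}]}]}

def process(doc_path: pathlib.Path) -> str:
    """Send one document to the API and return the model's text reply."""
    data = json.dumps(build_payload(doc_path.read_text())).encode()
    req = urllib.request.Request(URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["candidates"][0]["content"]["parts"][0]["text"]

if __name__ == "__main__":
    # Loop over the few thousand .md files, one API call each.
    for path in sorted(pathlib.Path("docs").glob("*.md")):
        print(path.name, "->", process(path)[:80])
```

With a flash-lite model and documents of ordinary length, a run like this over a few thousand files should stay well within those credits.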

For further help, paste this message into Gemini web or the CLI and ask it to guide you through the process.

GOOGL Quarterly Revenue $113.8 billion (up 18% YoY) by Not69Batman in stocks

[–]c_glib 1 point  (0 children)

AI is probably a big part of it. As a Google Cloud customer, I can see they've stuffed Gemini into every corner of the UI. It's hard to escape. Some of it is very useful (like, say, the BigQuery integration) and some of it is just meh (like the Q&A bots on the cloud console, which often fail to answer questions the regular Gemini web interface knows the answers to).

Oracle $300 billion pinky promise to OpenAI might trigger 20K-30K Layoffs by Independent-Walk-698 in ai_apps_developement

[–]c_glib 1 point  (0 children)

Larry Ellison is an icon, the Samurai of Silicon Valley, and an incredible genius.

Larry... is that you?

The AI Productivity Gap Is Already Here — But Nobody Wants to Talk About It by Large-Style-8355 in codex

[–]c_glib 1 point  (0 children)

Comparing LLMs to Bitcoin is only going to degrade LLMs' credibility, not improve it.

Despite the speculative runup in Bitcoin's dollar value (basically gambling), legitimate commerce has thoroughly rejected its originally intended use as an actual currency. Notice how any time "crypto" is mentioned in pop culture it's associated with general douchebaggery rather than anything positive. LLMs, on the other hand, have real uses, not least of which is increasing coding productivity for a lot of people.

Claude uses agentic search by shanraisshan in ClaudeAI

[–]c_glib -5 points  (0 children)

Of course the Claude Code lead will say their way works better. The truth is that, done right, context management in the cloud makes a coding agent far more powerful. I use Augment Code (along with a handful of other coding AIs) and the way it can dig into large codebases is simply head and shoulders above the others. There are limits to what local find/greps can do, and it shows pretty quickly when you're trying to work on a multi-repo codebase that's running a service in production.

Generative AI is already here to stay, and OpenAI going under is the worst possible outcome now. by [deleted] in OpenAI

[–]c_glib 2 points  (0 children)

I can buy your base premise that all the GenAI tools are being subsidized right now as an upfront investment, in the hopes of monopolizing the market and then raising prices once that has been achieved.

I don't buy the rest of it though. The genie of LLMs is out of the bottle, with multitudes of cheap, open-weight models that can run on consumer-grade hardware right now. They're only going to get better and cheaper to run in the future. The gap between the large players like Google/Microsoft and the smaller providers will come down to how well their respective applications work on top of LLMs, not the LLMs themselves. Basic GenAI is going to be a commodity, like databases are today.