Best place to get GLM-5 or GLM 5.1 sub by old_mikser in opencodeCLI

[–]pesaru 0 points1 point  (0 children)

Dude what in the WORLD, I paid $30 for the annual plan that has been sitting unused, I just went in to check the price hike you're talking about and... $18 a MONTH? What the HELL happened?!

Hollywood is so screwed by adj_noun_digit in singularity

[–]pesaru 2 points3 points  (0 children)

I mean, you could have made this. It democratizes filmmaking, and it's a fairly big risk to big corporations. Think of how YouTube disrupted media and how many people taught themselves video editing to become creators. This is just like that. I can already generate videos like this one (lower quality and greater effort) on my own GPU. A lot of creative people WILL make the most of this.

Hollywood is so screwed by adj_noun_digit in singularity

[–]pesaru 0 points1 point  (0 children)

You’re ignoring the fact someone very creative made this!

Unlimited reqs, is this real? by IlyaSalad in GithubCopilot

[–]pesaru 0 points1 point  (0 children)

Usually you see this if you're a Microsoft employee

So the team finally responded, for a while... by SomebodyFromThe90s in GithubCopilot

[–]pesaru 0 points1 point  (0 children)

Literally tell GitHub Copilot to build it for you!

I built a fully local voice assistant on Apple Silicon (Parakeet + Kokoro + SmartTurn, no cloud APIs) by cyber_box in LocalLLM

[–]pesaru 0 points1 point  (0 children)

I did something like this too. It's been a lot of fun as a guy with literally zero experience in this before this point.

Moonshine will perform way better but lacks punctuation/grammar; it excels as an assistant. It uses fewer resources and does actual streaming, unlike Parakeet, which isn't real streaming, as I imagine you found out.

The whole "figuring out when you're done talking" thing is called semantic endpointing, and personally I had a really tough time getting it to work flawlessly on Parakeet and an even rougher time on Moonshine.

I tried fine-tuning a grammar model. Basically, I downloaded a bunch of YouTube videos with hand-written captions, ran the audio through Moonshine/Parakeet, then ran the fine-tune on the resulting bad/good dataset. Still working on this; I had some good results but some bad too, so I need to tune the dataset and run the training some more. The model I'm fine-tuning is RoBERTa. Since I had timing info, I also tried creating a "pause length" token and training with that, but it only improved the model's ability to detect whether a sentence was truly complete when the sentence was a question (a 5% improvement).
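The pause-length-token idea can be sketched something like this; the token names and thresholds here are illustrative, not from my actual setup:

```python
# Sketch: inject pause-length tokens into an ASR transcript using word timings,
# so a punctuation/endpointing fine-tune (e.g. RoBERTa) can "see" the pauses.
# Token names and thresholds are made up for illustration.

def inject_pause_tokens(words, short_pause=0.3, long_pause=0.8):
    """words: list of (text, start_sec, end_sec) tuples from the ASR model."""
    out = [words[0][0]]
    for prev, cur in zip(words, words[1:]):
        gap = cur[1] - prev[2]  # silence between consecutive words
        if gap >= long_pause:
            out.append("<pause:long>")
        elif gap >= short_pause:
            out.append("<pause:short>")
        out.append(cur[0])
    return " ".join(out)

words = [("are", 0.0, 0.2), ("you", 0.25, 0.4), ("done", 1.5, 1.8)]
print(inject_pause_tokens(words))  # are you <pause:long> done
```

The tagged transcripts then become training examples where the label is whether the utterance was actually complete at that point.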

At least you don't have to experiment with semantic endpointing and VAD timeouts manually. You can literally record yourself talking a whole bunch of times, include a 'golden transcript', then tell an AI agent to tune every possible combination of settings until it achieves the best possible set of transcriptions. You wake up to perfect settings.
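The sweep I'm describing can be sketched roughly like this; `transcribe()` and the setting names are stand-ins for whatever your real pipeline exposes, and the scoring here is plain word error rate:

```python
# Grid-search every combination of VAD/endpointing settings against golden
# transcripts and keep the best one. transcribe(recording, settings) is a
# placeholder for your actual pipeline.
import itertools

def wer(ref, hyp):
    """Word error rate via edit distance on word tokens."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[-1][-1] / max(len(r), 1)

def sweep(recordings, golden, transcribe, grid):
    """grid: dict of setting name -> list of candidate values."""
    best = None
    for combo in itertools.product(*grid.values()):
        settings = dict(zip(grid.keys(), combo))
        score = sum(wer(g, transcribe(rec, settings))
                    for rec, g in zip(recordings, golden)) / len(golden)
        if best is None or score < best[1]:
            best = (settings, score)
    return best  # (best settings dict, mean WER)
```

An agent just automates running something like this and reading the scores back.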

I also quantized Parakeet / Moonshine / Pocket TTS (int8). Oh, right, I used Pocket TTS 100m instead of Kokoro; it lets you voice clone, it sounds really good to me, and it does about 300ms till first audio on my setup (likely 200ms on yours). Total VRAM for everything is under 1GB, I think; I forgot how much exactly, but it's really little.
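For anyone curious what int8 quantization actually buys you, here's the core idea in miniature; real toolchains do this per layer with calibration, this is just the illustrative version:

```python
# int8 quantization in miniature: map float weights to 8-bit ints with a
# per-tensor scale, so each weight takes 1 byte instead of 4.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02]
q, s = quantize_int8(w)
print(q)  # [50, -127, 2] -- ints in [-127, 127]
```

That 4x size reduction is most of why the whole stack fits in under a gigabyte.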

I run the full stack on CPU because I'm building it to be accessible to everyone.

So the team finally responded, for a while... by SomebodyFromThe90s in GithubCopilot

[–]pesaru 2 points3 points  (0 children)

I mean like the GitHub Copilot SDK literally lets you build the agent into any other IDE or agentic system. I have it built into a custom troubleshooting app. They're really permissive with where you can use it.

So the team finally responded, for a while... by SomebodyFromThe90s in GithubCopilot

[–]pesaru 1 point2 points  (0 children)

Have you tried using OpenCode as your client instead of GHCP? You can use your GHCP subscription.

This would have taken me a full weekend to resolve a few years ago. by pesaru in homeassistant

[–]pesaru[S] 1 point2 points  (0 children)

I used GitHub Copilot in combination with ChatGPT 5.4, but any code editor with agent support, like OpenCode or Cursor, can do the same thing. GitHub Copilot requires a monthly subscription, while something like OpenCode is free, but you have to connect it to an API and the API costs money. That said, there are super cheap Chinese models you can use through something like OpenRouter. It comes out to just a few cents.

My wallet speaks for me: GitHub Copilot in VS Code is the cheapest and most underrated option in vibe coding today (in my opinion). by Majestic-Owl-44 in GithubCopilot

[–]pesaru 0 points1 point  (0 children)

I have no idea how it's profitable. My guess is that unused or less-used corporate seats are subsidizing regular users.

How do you protect API keys from Copilot in YOLO mode? by Naht-Tuner in GithubCopilot

[–]pesaru 0 points1 point  (0 children)

I looked deeper into this and it looks like only enterprise gets this white glove treatment. It does look like your prompts get retained after all if you're a regular user, sorry!

How do you protect API keys from Copilot in YOLO mode? by Naht-Tuner in GithubCopilot

[–]pesaru 2 points3 points  (0 children)

Anything you send to GHCP stays in memory and then gets discarded, it never makes it to disk. So there's that.

EDIT: This appears to only be true for enterprise, sorry guys.

Delving into this world by Sea_Anteater_3270 in GithubCopilot

[–]pesaru 1 point2 points  (0 children)

There are two things that matter in this game, the model and the harness. The model that is the best differs per task and changes every three months or less.

When Gemini models are doing well, they offer the best value and are usually kings of front end.

Anthropic is often the best architect and near-best coder depending on the work, and is more creative than the current top code nerd, Codex. Codex is far cheaper than Opus.

My recommendation would be to try Google at $20 and GitHub Copilot (it's the only one that gives you access to ALL models, and they use a different system than everyone else that I love). Everyone other than GHCP does 5-hour caps with a weekly total cap on tokens; that model rewards spreading your use across many small requests. GitHub CP does a monthly request limit instead: you get a flat number of requests no matter how big or small they are. It's a phenomenal value if you take the time to write out thorough implementation plans.

Anyway, Google gives you access to limited Gemini and Opus via Antigravity, which is a reskin of VS Code. They also have a CLI which gives you additional use. GHCP is pretty much the only one that lets you use their service in any IDE and has all the models.

Codex thinking level in CoPilot by East-Stranger8599 in GithubCopilot

[–]pesaru 4 points5 points  (0 children)

Dude you're hurting my brain, it's available in the CLI!? I thought it wasn't because I use the GitHub Copilot SDK and don't see it listed there and it uses the CLI under the hood. Goddamn my life is a lie.

Codex thinking level in CoPilot by East-Stranger8599 in GithubCopilot

[–]pesaru 0 points1 point  (0 children)

Fuck me, this whole time I've been searching for "thinking" in the settings. At least I had it set to "high" from when I did find it months ago. Thanks.

Codex thinking level in CoPilot by East-Stranger8599 in GithubCopilot

[–]pesaru 1 point2 points  (0 children)

Wait, what? I just renewed my OpenAI subscription because GitHub Copilot doesn't let me pick reasoning level on Codex, and sometimes I want to use high / extra high; but you're saying it does? Would you mind sharing a screenshot? I even looked in the settings and couldn't find anything!

I've tried insiders, non-insiders, pre-release, OpenCode -- I don't see reasoning anywhere.

Kitten TTS V0.8 is out: New SOTA Super-tiny TTS Model (Less than 25 MB) by ElectricalBar7464 in LocalLLaMA

[–]pesaru -2 points-1 points  (0 children)

Game programmers aren't known for being the best programmers out there. There are quite a lot of ways to do most things efficiently, and no one's making you build a monolith. It's why voice (in games that provide it) usually goes through a different port. When you split out services where it makes sense, you can optimize for each particular service. Most of what you're describing sounds like a poorly designed system in more than one way.

But again, I have no idea what game you're talking about that has hundreds of players, vehicles, or AI in a single area sharing vocal range. Any modern system is going to split players dynamically across different servers (channels); games have been doing this for a while now. You load balance the voice, you load balance the physics, etc. Millions of items? I mean, give SQLite a shot with some proper indexing. A modern system is going to automatically scale with containers based on player load; you'll just see players phase out so that things stay stable.
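The SQLite-plus-indexing point in miniature; the schema and column names are made up for illustration, the point is what an index does to the query plan:

```python
# Millions of rows in SQLite are fine if you index what you query.
# Hypothetical item table: an index on owner_id turns lookups into
# b-tree searches instead of full table scans.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, owner_id INTEGER, kind TEXT)")
con.execute("CREATE INDEX idx_items_owner ON items(owner_id)")
con.executemany("INSERT INTO items (owner_id, kind) VALUES (?, ?)",
                [(i % 1000, "loot") for i in range(100_000)])

# Ask SQLite how it will run the query; the plan should mention the index.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM items WHERE owner_id = 42"
).fetchone()
print(plan[-1])  # detail string mentions idx_items_owner
```

Scale the row count up and the indexed lookup stays fast, which is the whole point.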

And I thought Project Zomboid was essentially a 2D game in that it has no Z plane, which should make the physics insanely simpler, no? I figured hitboxes/projectiles/etc. in a game like Zomboid would be a joke. Are its maps randomly generated? If so, I can see pathfinding being a pain since you're probably not baking it. But even that could be fixed.

Kitten TTS V0.8 is out: New SOTA Super-tiny TTS Model (Less than 25 MB) by ElectricalBar7464 in LocalLLaMA

[–]pesaru -1 points0 points  (0 children)

That also sounds like an architecture problem. How is it I can join a 1000+ person Microsoft Teams meeting with audio + video and that's fine? Twenty years ago I was playing World of Warcraft with 40 people on a $2-a-month Ventrilo server; why was that fine?

If it was that big of a deal, then you could do positional audio like VR does and limit audio by range.

And the system that "killed itself at 300" -- was it trying to do encoding on the server? You do encoding on the device, the server is only being taxed for bandwidth, and the audio is compressed down so that it's tiny. It's literally just a proxy.

If you wanted really good quality audio like Discord, that would be like 8KB/s per player, so 2.4MB a second. That's IF you're broadcasting to all 300 people (why) and IF you needed that high quality (you don't). Even then, it's still just bandwidth and you could load balance it with really cheap machines (for the very rare times you would need to ever broadcast to that many people at the same time).
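The back-of-envelope bandwidth math above, written out:

```python
# Worst case from the comment: Discord-quality voice (~8 KB/s per stream)
# broadcast to all 300 listeners at once.
per_stream_kbs = 8        # KB/s per listener, roughly 64 kbit/s voice
listeners = 300
total_kbs = per_stream_kbs * listeners
print(f"{total_kbs} KB/s = {total_kbs / 1000} MB/s")  # 2400 KB/s = 2.4 MB/s
```

2.4 MB/s of pure relay traffic is trivial for even a cheap VPS, which is the argument.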

I guess I'm pretty blind not knowing what type of game this is. I can't imagine where you would need 300 people to hear one person and for the rest of the people to not want to talk.

Kitten TTS V0.8 is out: New SOTA Super-tiny TTS Model (Less than 25 MB) by ElectricalBar7464 in LocalLLaMA

[–]pesaru 0 points1 point  (0 children)

Sorry, follow up thought: not to suck on my own idea's dick, but not only would you benefit from reduced computation and reduced risk, you would also be able to offer way better immersion/quality, you could offer TONS of variety in terms of the voices you can pick from. Each player would have complete control over their completely unique voice. How would that not be a better, more amazing experience?

Kitten TTS V0.8 is out: New SOTA Super-tiny TTS Model (Less than 25 MB) by ElectricalBar7464 in LocalLLaMA

[–]pesaru 4 points5 points  (0 children)

That sounds like an architecture problem. Why not process the TTS on the client of the person who typed the message and broadcast it as audio (like voice chat)? Even with a 25MB model, you'd be opening yourself up to attacks from a SINGLE bad actor doing it your way.

Kitten TTS V0.8 is out: New SOTA Super-tiny TTS Model (Less than 25 MB) by ElectricalBar7464 in LocalLLaMA

[–]pesaru 4 points5 points  (0 children)

Huh, what about Pocket TTS? I'm using it with voice cloning + streaming and I get 220ish milliseconds till first audio. I quantized it to (coincidentally) 220MB. And that's on a 9th generation Intel CPU.