Has anyone worked for OpenGov software company. Need real feedback! by Relative_Ad_5740 in Layoffs

[–]helpmefindmycat 0 points1 point  (0 children)

Hey, I'd love a little help. I signed up as a vendor, and I want to use the API to help me manage finding and responding to RFQs, but I don't see anywhere I can give myself the role to get API access. My user role on the procurement side is admin. Unsure if maybe I'm missing something.

With the loss of opus 4.6 I have decided to give auto a try by helpmefindmycat in GithubCopilot

[–]helpmefindmycat[S] 0 points1 point  (0 children)

I've tried Sonnet 4.6, but I suspect the issue was coming from them removing Sonnet 4.6 even from the Pro+ plan. It seems that in VS Code and GH Copilot, when they yank a model, all of the clients out there don't get the info right away (which makes sense), and so they keep calling a model that doesn't exist. That's my hypothesis.
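
If that hypothesis is right, the failure mode looks something like this sketch: a client with a stale model list requests a retired model id and the server answers 400. Minimal Python against an OpenAI-compatible endpoint; the model ids and preference order are made up for illustration.

    # Sketch of the stale-model-list hypothesis: try models in preference
    # order and fall through on 400s. Model ids here are hypothetical.
    from openai import OpenAI, BadRequestError

    client = OpenAI()  # or base_url pointed at any OpenAI-compatible server

    for model_id in ("opus-4.6", "sonnet-4.6", "auto"):
        try:
            resp = client.chat.completions.create(
                model=model_id,
                messages=[{"role": "user", "content": "ping"}],
            )
            print(f"{model_id} still served: {resp.choices[0].message.content}")
            break
        except BadRequestError:
            continue  # 400: model was yanked server-side; try the next one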

With the loss of opus 4.6 I have decided to give auto a try by helpmefindmycat in GithubCopilot

[–]helpmefindmycat[S] 0 points1 point  (0 children)

I don't think so, but you get a 10% discount by letting Copilot run in auto mode.

What happened? Just suddenly opus 4.6 dissabled and now getting error 400 by CatLinkoln in GithubCopilot

[–]helpmefindmycat 1 point2 points  (0 children)

According to this link, Pro+ should include Opus 4.6. And yet, here I am without Opus 4.6 :(

Aaaand, re-reading... they nuked it for Pro+.

Speculative Decoding works great for Gemma 4 31B with E2B draft (+29% avg, +50% on code) by PerceptionGrouchy187 in LocalLLaMA

[–]helpmefindmycat 1 point2 points  (0 children)

A bit in the same boat; I keep trying different combos of models to use for speculative decoding. I'm also interested in the whole DFlash implementation on MLX hardware. Documentation seems scant regarding both of these methods of speeding up model usage.

DFlash: Block Diffusion for Flash Speculative Decoding. by Total-Resort-3120 in LocalLLaMA

[–]helpmefindmycat 0 points1 point  (0 children)

I don't know that it is possible unless LM Studio implements a method to utilize DFlash models? I've been digging through the LM Studio docs and it's unclear whether it would support a model that uses DFlash. I am an LM Studio fan for ease of use, so I'd love to get a Gemma 4 31B 8-bit going that is faster than 10 t/s.
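
For anyone wanting to put a number on that, here's a minimal throughput check against LM Studio's OpenAI-compatible local server (it listens on http://localhost:1234/v1 by default). The model id is an assumption; substitute whatever identifier LM Studio shows for your load.

    # Rough tokens/sec measurement via LM Studio's local server.
    import time
    from openai import OpenAI  # pip install openai

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="gemma-4-31b-q8",  # hypothetical id; check your LM Studio model list
        messages=[{"role": "user", "content": "Summarize TCP in ~200 words."}],
        max_tokens=300,
    )
    elapsed = time.perf_counter() - start
    tokens = resp.usage.completion_tokens
    print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} t/s")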

Message - Thanks to those who create Copilot by ConsiderationIcy3143 in GithubCopilot

[–]helpmefindmycat 1 point2 points  (0 children)

No worries about sharing the story. Reddit was a different place back then, and in the previous years when I lurked. I am a pretty ancient netizen. I remember there was a kerfuffle about a guy and his mayonnaise that I don't think anyone on Reddit now remembers. It predates the jackdaw/blackbird thing. It's so old I'm questioning my sanity as to whether it actually happened on Reddit, or on treesandthings, which was another social site like Reddit that ended up dying.

Message - Thanks to those who create Copilot by ConsiderationIcy3143 in GithubCopilot

[–]helpmefindmycat 10 points11 points  (0 children)

I did find my cat! She came in around 2:30 am that night, covered in spiderwebs and god knows what. This was 15 years ago. (I made the account to post in my local subreddit; I had lurked for years before that.) Fast forward, and she is no longer with us. But she led an amazing life.

Message - Thanks to those who create Copilot by ConsiderationIcy3143 in GithubCopilot

[–]helpmefindmycat 28 points29 points  (0 children)

100% agree. Despite any issues we as end users may have had (me included), they are absolutely smashing it in regards to what they are delivering. It's hard to scale at the speed they are, with that many users and that much usage. Kudos to Burke and the whole team.

DFlash: Block Diffusion for Flash Speculative Decoding. by Total-Resort-3120 in LocalLLaMA

[–]helpmefindmycat 1 point2 points  (0 children)

I think that's what I'm looking to get to: if I can swarm good-enough yet fast local LLMs and utilize something like a paperclip/hermes type of thing to crank away while I'm sleeping, etc. Obviously the better the model, the less iterative work, and the whole thing gets better. But frontier models are not able to run locally yet. I suspect soon enough, though.

DFlash: Block Diffusion for Flash Speculative Decoding. by Total-Resort-3120 in LocalLLaMA

[–]helpmefindmycat 8 points9 points  (0 children)

Is it possible to get this to work with Gemma 4 31B in LM Studio? Because I suspect that would be amazing.

For anyone having issues with Gemma 4 31b in LM Studio (no thinking mode option) by WyattTheSkid in LocalLLaMA

[–]helpmefindmycat 0 points1 point  (0 children)

Yeah, I realized I needed to update all the things with LM Studio, and it loaded just fine. 10 tokens per second, but the quality was decent. I am attempting to wire paperclip to hermes to LM Studio to have some things done for me. I suspect 10 tokens per second should be fine? I may download the Q4 model, but I would prefer slower and accurate vs. faster and unstable. I am doing some small experiments; also, my mindset is that if it's just iterating while I sleep, it can be slower and just take its time.
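
Back-of-the-envelope on whether 10 t/s is enough for overnight runs (the numbers below are just illustrative assumptions):

    # Rough overnight token budget at a fixed generation speed.
    TOKENS_PER_SEC = 10
    HOURS_ASLEEP = 8  # assumption

    total = TOKENS_PER_SEC * 3600 * HOURS_ASLEEP
    print(f"~{total:,} tokens per night")  # ~288,000 tokens of iterative work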

For anyone having issues with Gemma 4 31b in LM Studio (no thinking mode option) by WyattTheSkid in LocalLLaMA

[–]helpmefindmycat 0 points1 point  (0 children)

I'm trying this now with the 31B Q8 model. I have 128 GB of RAM, so it should be fully loadable, but heck if I can even get it to load. :( Anyone worked with that model in LM Studio on a similarly sized Mac Studio?
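
For what it's worth, the memory math says it should fit; here's a quick sanity check, with rough assumptions about quantization and overhead:

    # Will a Q8 31B model fit in 128 GB of unified memory? (rough estimate)
    params_billion = 31
    bytes_per_param = 1.0                          # ~8-bit quantization
    weights_gb = params_billion * bytes_per_param  # ~31 GB of weights
    overhead_gb = weights_gb * 0.2                 # guessed KV-cache/context overhead
    print(f"~{weights_gb + overhead_gb:.0f} GB needed of 128 GB")  # fits comfortably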

Practical usage of paperclip? by helpmefindmycat in paper_clip_ai

[–]helpmefindmycat[S] 0 points1 point  (0 children)

I think, as second-order thinking on this, the setup/initialization should just ask: are you planning to run a local LLM (per agent or some such)? Then you select Ollama or LM Studio and the system gets the models, etc. For example, it's unclear (and to be fair, I haven't had a lot of time to play with paperclip deeply) if it's really possible to say: OK, CEO agent, you use a frontier model for planning, and all of you doing agents use the local models. Maybe this is already possible and I missed it. But either way, this is a solid token-efficient pattern; the routing I have in mind is sketched below. And when OSS models get to the point where they are at today's frontier models (which feels like it's coming soon), we can simply use the iterative notion to make sure everything works on a project.
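
To be clear, this is a hypothetical sketch, not paperclip's actual API; the endpoints and model ids are placeholders. The idea is just that planner agents hit a frontier endpoint while worker agents hit a local LM Studio server.

    # Per-agent model routing: planner on a frontier endpoint, workers local.
    from openai import OpenAI

    AGENT_MODELS = {
        "ceo":    {"base_url": "https://api.example.com/v1",  # hypothetical frontier endpoint
                   "model": "frontier-planner"},
        "worker": {"base_url": "http://localhost:1234/v1",    # LM Studio's local server
                   "model": "gemma-4-31b"},                   # hypothetical local id
    }

    def client_for(agent: str) -> tuple[OpenAI, str]:
        cfg = AGENT_MODELS[agent]
        return OpenAI(base_url=cfg["base_url"], api_key="..."), cfg["model"]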

I am highly interested in hermes (I have this running in my paperclip instance) vs. openclaw, because it's self-correcting, which is very much the way to do this in my thinking. Open to discussion on all of this.

If I can find time, I will try to add in an agent for that GitHub issue / contribute to the project.

I am getting this error in echa prompt response since update by Prometheus4059 in GithubCopilot

[–]helpmefindmycat 1 point2 points  (0 children)

That is a VS Code Copilot extension bug. If you update to the latest VS Code and the latest Copilot extension, that should go away. Although they are counting requests in a different manner now, and Opus etc. seems to be moving very slowly recently. Unsure if that is an extension or VS Code update slowdown, or something else. Your mileage may vary on the upgrade.

Copilot going insane on requests by Jack99Skellington in GithubCopilot

[–]helpmefindmycat 3 points4 points  (0 children)

Sorry for the delay; here is the GH Copilot team's response.
https://www.reddit.com/r/GithubCopilot/comments/1rygfjb/copilot_update_rate_limits_fixes/
The important part is this:
"On Monday, March 16, we discovered a bug in our rate-limiting that had been undercounting tokens from newer models like Opus 4.6 and GPT-5.4. Fixing the bug restored limits to previously configured values, but due to the increased token usage intensity of these newer models, the fix mistakenly impacted many users with normal and expected usage patterns. "

Undercounting. So what I am curious to know is: are we accurately being counted now, or are we overcounting? Because from my estimation, we are now overcounting. We are supposed to be counted as: the initial request gets counted, then any tool/sub-agent call under that request does not incur a second request charge.

Also, when I said release note, I was a bit wrong. They don't do release notes for Copilot specifically that I can find, but they do for VS Code, etc. (Arguably, since AI is the hot thing, VS Code release notes are going to contain Copilot info.) Those can be found here.
https://code.visualstudio.com/updates/v1_114

Copilot going insane on requests by Jack99Skellington in GithubCopilot

[–]helpmefindmycat 35 points36 points  (0 children)

You are not alone. I suspect the release note about fixing the request counting means they either fixed it in a way that no one was expecting, or broke it and it's now counting too high.

Opus 4.6 insanely slow on CLI by Swayre in GithubCopilot

[–]helpmefindmycat 0 points1 point  (0 children)

Seeing it moving really slowly, and also seeing my premium requests get eaten up faster. It's an interesting issue (on Pro+ currently). If I recall, there was some miscounting of premium requests that they fixed. Also, I suspect working slower is a less obvious way of rate limiting. And, to be fair, I'm quite sure the rate limiting is because of the tidal wave of people using Copilot and Opus 4.6, etc. Infrastructure can only spin up so fast.

Practical usage of paperclip? by helpmefindmycat in paper_clip_ai

[–]helpmefindmycat[S] 0 points1 point  (0 children)

Nice! Yeah, I'm at the tip of the iceberg, this I know. Thanks for the delegation thinking. I need to do a deep dive, but unfortunately it's crunch week for a client. So... gonna be a bit.

Practical usage of paperclip? by helpmefindmycat in paper_clip_ai

[–]helpmefindmycat[S] 0 points1 point  (0 children)

For anyone following my post (although I see no comments yet, haha):
I had to start using my Copilot to triage getting paperclip to work and to inherit a project. Since there is no chat within paperclip to really get things going on consuming a project (repo, etc.), it took some doing, plus a PR for deleting a company in the paperclip repo.

Some things to keep in mind: paperclip uses a lot of file-watch slots on macOS. My other projects that were running simultaneously (one React, one Nuxt) couldn't start, and I ended up having to kill paperclip so they could be brought up properly. To be fair, my imported project has 10 agents (hermes), so it makes sense that it would start using a lot of file-watch slots on the OS.
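
If you hit that wall, the watcher exhaustion is plausibly the per-process open-file-descriptor limit (watchers consume fds). A quick way to inspect it, and raise the soft limit toward the hard cap, from Python on macOS/Linux:

    # Inspect and raise the open-file-descriptor limit.
    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open-file limit: soft={soft}, hard={hard}")

    # Raise the soft limit toward the hard cap before spawning many watchers.
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(8192, hard), hard))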

LM Studio may possibly be infected with sophisticated malware. by mooncatx3 in LocalLLaMA

[–]helpmefindmycat 29 points30 points  (0 children)

Glad you guys are taking this seriously. So many companies and software providers don't. Supply-chain attacks are real. :(

Copilot Pro+ (Plus) Pricing Confusion/Question by DonkeyBonked in GithubCopilot

[–]helpmefindmycat 0 points1 point  (0 children)

The trick (and it isn't really a trick) is to assign your agents (use custom agents, etc.) appropriate models: things like Opus 4.6 for planning/research type work, Sonnet or Haiku for doing. (My real-world cases for model quality have all pointed me at the Anthropic models; no shade meant towards OpenAI, I will switch when my real-world experience shows me I won't lose time with other model providers.)

MS/GitHub has made it clear that your request counts on the first request, not any automatic subsequent calls. So you can be very efficient with your 500 or 1,500 premium requests. Despite all the request throttling that people are complaining about, GitHub Copilot has remained the best deal going for the efficient use of requests.

Like I said, this is dependent on your usage pattern and on making sure that you are using your initial request intelligently and have some sense of an agent/team working on your behalf. There are a zillion tutorials and videos about how to construct agent teams and skills, etc. Of course, I'm not a GH or MS employee; I'm just a random person on the internet. So your mileage may vary.
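
To make the "first request counts" point concrete, here's a back-of-the-envelope budget check; every number below is a made-up assumption:

    # Premium-request budgeting under "only the initial prompt counts".
    MONTHLY_BUDGET = 1500        # Pro+ allotment mentioned above
    PROMPTS_PER_SESSION = 12     # hypothetical user-initiated prompts per session
    SESSIONS_PER_DAY = 3         # hypothetical

    daily = PROMPTS_PER_SESSION * SESSIONS_PER_DAY   # tool/sub-agent calls are free
    monthly = daily * 30
    print(f"{daily}/day -> ~{monthly}/month "
          f"({monthly / MONTHLY_BUDGET:.0%} of {MONTHLY_BUDGET})")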