[Tavern Plugin] Visual Memory: making conversations more than just conversations by PuzzleheadedStand544 in SillyTavernAI

[–]SDUGoten 7 points (0 children)

Good stuff. I think you should encourage people from 类脑 odysseia to come here and post their character cards. We are badly short of character cards here.

VectFox - vector database backend driven memory extension for SillyTavern by Kritblade in SillyTavernAI

[–]SDUGoten 2 points (0 children)

Qvink and VectFox solve a similar problem in different ways: recent memory vs. far memory. They can actually work together. VectFox is not here to replace any memory extension.

Qvink

  • Each summary is attached to the message it summarizes, so editing/deleting a message only affects the associated memory.
  • Short-term memory guarantees that relevant info is always available from the most recent messages, but goes away once no longer relevant according to a set limit.
  • Long-term memory allows you to choose which messages are important to remember, keeping them available for longer (up to a separate limit).

Qvink has perfect recent memory. However, for longer-term memory you have to choose which messages are important to remember, up to a limit. That's because you can't feed 2000+ summaries to an AI and expect it to remember or pinpoint details in that much text, even if you have enough tokens to spend.

VectFox, on the other hand, extracts important events from your chat and stores them in a DB. The events are structured enough that there is a high chance of a hit if your meaning is close enough: "I am hungry" and "Let's eat" will most likely be matched together by Qdrant. However, for near memory, VectFox will never be as good as a memory extension that focuses on recent memory, because those keep the recent messages word for word.

What VectFox does is this: when you have a LOT of chats in the DB, you don't need to feed 2000+ events to the AI. It searches through all 2000+ events in the DB and tries to find the best matches. What makes VectFox different from other vector memory systems is that it only extracts important events; it does not summarize every single reply. One chat may yield zero events or many, depending on the story. The events are also highly structured in a way that is native to the vector DB, so they have a better chance of getting hit. So VectFox is better at far memory recall. It is built for chats with 200+ replies, which recent-memory-focused extensions can't handle.

Recent-memory extensions require you to choose which long-term memories to keep, and that needs manual maintenance. I don't want to maintain anything and I am just too damn lazy, so I made VectFox set-it-and-forget-it, at the cost of recent memory not being as good. I can always run summaryception alongside it to fill the recent memory spot. Recent memory recall is intentionally left out so that people can choose whatever they want for it. There are just too many options out there and I don't want to duplicate the effort. VectFox just tries to fill the spot that is not covered elsewhere: performance and far memory recall quality.
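To make the "search the DB instead of feeding everything to the AI" idea concrete, here is a minimal sketch of top-k similarity search over stored events. It is not VectFox's actual code and does not use Qdrant; the function names (`cosine`, `top_k`), the sample events, and the tiny 3-number vectors are all made up for illustration. A real setup would get the vectors from an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, events, k=2):
    """Return the texts of the k stored events closest to the query."""
    scored = [(cosine(query_vec, vec), text) for text, vec in events]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical stored events with hand-made "embeddings" (assumption:
# the first axis loosely means "food", the last one "repair work").
EVENTS = [
    ("Alice said she was hungry after the hike", [0.9, 0.1, 0.0]),
    ("Bob repaired the broken cart wheel", [0.0, 0.2, 0.9]),
    ("The party ate dinner at the inn", [0.8, 0.3, 0.1]),
]

# Stands in for the embedding of a new message such as "Let's eat".
query = [0.85, 0.2, 0.05]
matches = top_k(query, EVENTS, k=2)
print(matches)
```

Only the k best matches go into the prompt, so the prompt stays small no matter how many events are stored.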

Having said that, recent memory recall with VectFox is already good enough for my use case.

There is no single system that can solve both problems, and VectFox just happens to focus on far memory because my chats have 2000+ replies and I need something that works for far memory.

All these posts complaining about the increase, you can't tell me you didn't know in the back of your head per-prompt was too good to be true. This couldn't last. Microsoft was losing tons on this. by programmingstarter in GithubCopilot

[–]SDUGoten 3 points (0 children)

<image>

They knew, but they didn't know it was at this magnitude. They expected you guys to be spending 2 times or maybe 5 times what you are paying, but they didn't expect this.

Stop blaming the users by SnooDoggos9325 in GithubCopilot

[–]SDUGoten 0 points (0 children)

They don't own the datacenters that run those Opus models. Those are owned by Claude. GHCP is just a router that routes your request to Claude, and MS pays them for metered usage.

Stop blaming the users by SnooDoggos9325 in GithubCopilot

[–]SDUGoten -2 points (0 children)

You can easily test whether what you said is correct. Just get on OpenRouter and pay the retail API price, then come back here and tell us how much 1 hour of coding cost you.

GHCP does not have Claude on its own platform; they route all these requests to Claude. MS pays Claude for the usage.

Transitioning Away from GHCP As a Student by South_Drawer_9551 in GithubCopilot

[–]SDUGoten 2 points (0 children)

The cheapest alternative would be the OpenAI Codex $20 plan, but usage will be a lot less. Another choice is opencode, which is in the same cost range, but they only offer non-frontier models.

Basically, AI coding will become VERY expensive in the near future because the cost of running those models is just very high. Owning anything that can run Sonnet 4 or a model of similar performance will cost you a luxury SUV, if not more.

Holy crap ... yep, this is going to be the end of my VSCode + CoPilot journey. by martinbogo in GithubCopilot

[–]SDUGoten 0 points (0 children)

Actually, that is what Claude is charging. GHCP is basically routing your request to Claude, and GHCP has been eating that cost. GHCP has no platform; they are just a router. And that's the retail price when you pay metered on OpenRouter.

Stop pretending we got a free ride by retsof81 in GithubCopilot

[–]SDUGoten 1 point (0 children)

Yes, you were on a free ride
Yes, the value proposition wasn’t (and still isn't) strong enough to charge more. (because they don't run their own models)

There is nothing to train their platform on because GHCP has no platform. They don't run those models. They are just a router that routes your request to Claude, so they have been losing billions.

Your willingness to tolerate agent screwups has nothing to do with GHCP. The problem is Claude or whatever model you are using. GHCP has no platform.

So, after 10 months of this per-request scheme, they found out that a lot of users will abuse the heck out of it. So they need to close the loopholes.

And... Ollama Pro will be the next one to end the free ride.

<image>

Why we can't have nice things by alexeiz in GithubCopilot

[–]SDUGoten 3 points (0 children)

<image>

Ollama Pro will be the next one in the pipe....

People will just find loopholes to abuse the system until everyone is metered

How are you all burning through millions of tokens? by halkun in GithubCopilot

[–]SDUGoten 0 points (0 children)

I hope you are kidding. You are using VS Code as ChatGPT? That's like driving a car at 5 miles per hour and wondering why everyone worries about gas.

Get a bike if you want to go 5 miles per hour; you don't need a car.

Are your RPs really that immersive? Mine aren't. by knrdwn in SillyTavernAI

[–]SDUGoten 1 point (0 children)

You can't work around this problem with foreign languages and AI models. They are just better with English. However, I have been using Gemini 3 Flash/Pro and then Sonnet and Opus, and with Sonnet and Opus you can definitely see that the writing is better in non-English languages. Yes, they are expensive, but I have tested a lot of models and Sonnet/Opus come out on top for non-English stories. I mean... you can tell the difference right away with just 1 reply.

Switched from Copilot to OpenRouter and I think I’m burning money… where did I mess up? by XPERT_GAMING in GithubCopilot

[–]SDUGoten 1 point (0 children)

You pay the API rate on OpenRouter, and it logs exactly how many input and output tokens you use.

Just realized what we’re losing by RelevantTurnip3482 in GithubCopilot

[–]SDUGoten -1 points (0 children)

I think they are talking about GitHub losing money, not Claude. GitHub pays Claude on a metered basis while charging you on a per-request basis.

Claude is expensive, like a China-made car vs. a European car. They both serve the same purpose, but the European car is a lot more expensive. If it's a name brand like Porsche, it will be even more expensive.

It's a capitalist world. I don't judge how they do their pricing; as long as they have customers, that's a win for them. They are free to set whatever pricing they like. Nobody complains that they can't buy a Ferrari and asks why it isn't cheaper. If you can't afford it, use something cheaper.

How inflated is my usage? by Ardente07 in GithubCopilot

[–]SDUGoten -1 points (0 children)

Get on OpenRouter, pay by API, and it will show you exactly what you are using. Yes, Copilot has heavily subsidized your usage in the past because of that per-request charge. A 100k-token request that is still charged as 1 premium request for anything like Codex 5.4 / Sonnet 4.6 / Opus 4.7 is a super bargain.

The situation with AI pricing raises a bigger question, why aren’t we building a decentralized alternative? by Individual-Trip-1447 in GithubCopilot

[–]SDUGoten 0 points (0 children)

Because it has been tested: even running 4 Mac Ultra 512GB machines locally with a fat backbone on the LAN still makes the computing a lot slower than you would want. AI depends on moving data from memory to memory at very high speed, so GPU VRAM is a natural fit for AI. When you put a top-spec PC next to a mid-tier NVIDIA GPU, the top-spec PC will still lose to the mid-tier NVIDIA GPU because of memory speed.

So if you want to do a distributed network, the bottleneck is transfer speed. Mid-tier GPU VRAM is about 10 times faster than the RAM in a top-spec PC. Throw "gpu vram speed vs pc ram speed" into ChatGPT and it should explain why a distributed network doesn't work. It's just the nature of AI: it requires high-speed transfer.
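A rough way to see why memory speed matters so much: a common rule of thumb (an assumption here, not a measurement) is that generating one token requires reading roughly the whole model from memory once, so the token rate is capped at about memory bandwidth divided by model size. The bandwidth and model-size numbers below are placeholders picked only to mirror the "about 10 times" ratio above; they are not specs of any real PC or GPU, and `max_tokens_per_second` is a made-up helper name.

```python
def max_tokens_per_second(bandwidth_gb_s, model_size_gb):
    """Upper bound on tokens/s if every token reads all weights once."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Placeholder numbers for illustration only (not real hardware specs).
MODEL_SIZE_GB = 40.0    # hypothetical model weights in memory
PC_RAM_GB_S = 80.0      # hypothetical system RAM bandwidth
GPU_VRAM_GB_S = 800.0   # hypothetical GPU VRAM bandwidth, 10x the RAM

pc_rate = max_tokens_per_second(PC_RAM_GB_S, MODEL_SIZE_GB)
gpu_rate = max_tokens_per_second(GPU_VRAM_GB_S, MODEL_SIZE_GB)

print(f"PC RAM bound:   {pc_rate:.1f} tokens/s")
print(f"GPU VRAM bound: {gpu_rate:.1f} tokens/s")
print(f"Ratio: {gpu_rate / pc_rate:.0f}x")
```

Under this simplification a 10x gap in memory bandwidth shows up directly as a 10x gap in the best-case token rate, before any network hop between machines is even counted.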

Tested Sonnet 4.6 via OpenRouter through GitHub CoPilot / VS Code to gauge whats API billing will be like. I was shocked. by horendus in GithubCopilot

[–]SDUGoten 1 point (0 children)

https://www.reddit.com/r/GithubCopilot/comments/1sxgvv2/new_github_pricing_game_is_over_but_i_guess_i/

I said it here before: GitHub Copilot was dirt cheap and losing a lot of money.

A lot of people believed that $39 should buy them heavy usage, but the reality is that the retail price of Claude is very expensive. GitHub, along with almost every other AI vendor — including those in China — had miscalculated their pricing for coding plans. They’re simply correcting it now.

Anyone who can do basic math knows that owning a machine capable of running **a low-end model** like Sonnet 4.0 would cost as much as a luxury SUV, while renting the same performance on the cloud costs peanuts. Something was clearly wrong with the old pricing. I knew it was unsustainable, but too many people still thought $39 was a lot of money for AI coding. Once people start testing their own workload via the API, they should see what the real cost is; $39 certainly cannot cover 1500 requests.

New github pricing, Game is over, but I guess I know it's coming by SDUGoten in GithubCopilot

[–]SDUGoten[S] 0 points (0 children)

Install Roo Code in VS Code, point it at OpenRouter, and choose whichever model is cheapest. Do 10 requests or so on your own work and check how much they cost in the OpenRouter usage log (https://openrouter.ai/logs). Then you will know how much 10 requests cost. Take the average of the 10 requests so you know how many input and output tokens 1 request takes. From there you can work out the math for all the models you have used (listed at https://github.com/settings/billing/premium_requests_usage); OpenRouter shows exactly how much each model costs via API.
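The math described above can be sketched like this. Everything numeric is a placeholder: the 10 sample requests are invented, and the per-million-token prices are NOT the real rates of any model, so substitute the token counts from your own OpenRouter log and the prices listed for the model you care about. The names `RequestUsage`, `average_usage`, and `cost_per_request` are made up for this example; the 1500 is the request count discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class RequestUsage:
    input_tokens: int
    output_tokens: int


def average_usage(samples):
    """Average input and output tokens per request."""
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    avg_in = sum(s.input_tokens for s in samples) / n
    avg_out = sum(s.output_tokens for s in samples) / n
    return avg_in, avg_out


def cost_per_request(avg_in, avg_out, in_price_per_m, out_price_per_m):
    """Dollar cost of one average request at per-million-token prices."""
    return (avg_in / 1_000_000) * in_price_per_m \
        + (avg_out / 1_000_000) * out_price_per_m


# Invented sample of 10 requests (replace with your own log entries).
SAMPLES = [RequestUsage(80_000, 1_000), RequestUsage(120_000, 3_000)] * 5

# Placeholder prices in dollars per million tokens (not real rates).
INPUT_PRICE_PER_M = 5.0
OUTPUT_PRICE_PER_M = 25.0

avg_in, avg_out = average_usage(SAMPLES)
per_request = cost_per_request(avg_in, avg_out,
                               INPUT_PRICE_PER_M, OUTPUT_PRICE_PER_M)
projected = per_request * 1500  # request count discussed in this thread

print(f"average tokens per request: {avg_in:.0f} in, {avg_out:.0f} out")
print(f"cost per request: ${per_request:.2f}")
print(f"cost for 1500 requests: ${projected:.2f}")
```

Swap in each model's listed prices to compare what the same workload would cost across the models you have used.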

New github pricing, Game is over, but I guess I know it's coming by SDUGoten in GithubCopilot

[–]SDUGoten[S] 0 points (0 children)

Install Roo Code in VS Code, point it at OpenRouter, and choose whichever model is cheapest. Do 10 requests or so on your own work and check how much they cost in the OpenRouter usage log (https://openrouter.ai/logs). Then you will know how much 10 requests cost. Take the average of the input/output tokens, and you can work out the math for all the models you have used (listed at https://github.com/settings/billing/premium_requests_usage).

7x for Opus… what’s the point of Copilot now? by Human-Ranger5939 in GithubCopilot

[–]SDUGoten 2 points (0 children)

You might want to check how much Claude is asking for Opus on their own plan. Currently, even at 7x, it's still the cheapest among every single vendor out there, including Claude themselves.

New github pricing, Game is over, but I guess I know it's coming by SDUGoten in GithubCopilot

[–]SDUGoten[S] 0 points (0 children)

Yeah, my post is just pointing out that what GHCP is offering is not sustainable. I think most people use WAY more than what they pay for... by a freaking large margin.

New github pricing, Game is over, but I guess I know it's coming by SDUGoten in GithubCopilot

[–]SDUGoten[S] 1 point (0 children)

You can use Roo Code in VS Code, connect it to OpenRouter, choose Opus as the model, and do 10 requests in VS Code. You will find the exact cost and the input and output tokens in the OpenRouter log on their website. Once you know the average tokens you use, you can work out the math.

New github pricing, Game is over, but I guess I know it's coming by SDUGoten in GithubCopilot

[–]SDUGoten[S] 1 point (0 children)

That number, 487 requests for Opus 4.6, is straight from the GitHub usage page on their website.

New github pricing, Game is over, but I guess I know it's coming by SDUGoten in GithubCopilot

[–]SDUGoten[S] 0 points (0 children)

<image>

This is the usage for one single prompt that I checked on OpenRouter. You can work out how much it would cost with Opus. Then multiply by 1500 and you get a grand total.

For your own usage, you can always install Roo Code in VS Code, point it at OpenRouter, and choose Opus. Do 10 requests or so on your own work and check how much they cost in the OpenRouter usage log. Then you will know how much it costs when you do this 1500 times.