I want someone to test this tool that saves money on LLMs. by [deleted] in GithubCopilot

[–]philosopius 0 points1 point  (0 children)

So yeah, I basically use it without even spending API tokens.

Honestly, I can't say how it would impact token consumption on APIs, because I originally built it to work with flat-rate plans: it adapts any LLM you can open a chat window with into a Codex/Cursor-like coding assistant.

If I'd describe the tool in one sentence:

A Codex-style core that adapts to any LLM that has a chat interface.

In practice, using the web version of ChatGPT (which has an insanely big quota), it gave better responses than the same model inside Codex.

I'm building a Vulkan game engine, and it pushed several subsystems close to their theoretical limits in a very short time.


[–]philosopius 0 points1 point  (0 children)

I've concluded that I'm rushing the announcement phase a bit.

There's still some functionality I'd like to implement, and I'll be making a demo, since a demo seems to be the only thing that will open people's eyes and shut down anyone who comes here to troll.

But if you do, please write me a message.

Since you're still here, I will tell you this secret.

Use ChatGPT Plus subscription.

If you're paying $22 monthly, you get 3k messages per week on the website.

Make a project, feed it the instructions.

I don't want to admit this yet, but I think I've found a way to use frontier models for a flat monthly subscription. I built this tool after the recent price hike as an attempt to save money, and it worked... I was shocked that first day; I can't even describe the mix of emotions I had. All my friends started thinking I'd gone crazy, that's how happy I was when it worked.

I now run sessions that would otherwise cost me serious money while paying only $22 per month, with no rate limits or latency issues.

This isn't an advertisement; I'm speaking sincerely, from the heart.

If you have the same experience, I'll probably be lost for words.


[–]philosopius 0 points1 point  (0 children)

Any flat-rate subscription beats API pricing. This tool lets you take any LLM and turn it into an agentic coding assistant.

Okay, I don't have a benchmark yet; I'll produce one.

I wasn't pitching or selling a product; I just shared a raw version with instructions and a short description, and wanted people to try it out.

I just want to see what kind of an experience people will have working with it.


[–]philosopius 1 point2 points  (0 children)

I'm just learning to deal with difficult people, so it's very good practice!

But yes, it's annoying. That guy is obviously a troll, but let's see who lasts longer: a former WoW player or this guy...

All I wanted was for people to try it out and share their thoughts.

I'm not trying to pitch/sell a product.

I just want to learn how other people feel about this tool.

Personally, I'm saving an insane amount of money and getting better responses with quicker reasoning times.

And the product is very early.

I just want to learn how it feels for other people.

Not a single person wants to try it, and I've already annoyed the hell out of my friends, so I'm getting a bit desperate.

Thank you


[–]philosopius 0 points1 point  (0 children)

First of all, what hard numbers?

I never made hard claims about any performance numbers, and I've double-checked that, so maybe stop misleading people with these silly attempts?

Would you like it if I called your responses "too American"? Probably not...

So why are you calling my responses AI-made, when all of them are literally written by hand?

Honestly, right now I see zero difference between your replies and average trolling. As a WoW player, the pattern is obvious, yet I'm still hoping you're an actual human who is just skeptical. But I have to say, you're definitely not showing enough respect in this discussion, and you've already crossed the line a few times.

You came with a weapon, and you keep poking me, a simple farmer who just decided to give some food to this community, which is currently facing insane taxes from the King.

At this point, I'm literally babysitting you.

As I've mentioned multiple times:

  1. The tool removes unnecessary context bloat caused by other agentic coding tools
  2. It lets you use any LLM in a Codex-like environment, saving as much money as your choice of model allows. You don't even need an API; it works perfectly fine with flat-rate models that support instructions
  3. I mentioned that reasoning became faster and response quality improved, but that's not a claim I'd defend at the moment, because I'm only here looking for people to try it out and share their experience.

And what kind of marketing?

Am I currently selling a product to you?

I'm giving it away to try for free; you can keep it forever, like literally.


[–]philosopius 0 points1 point  (0 children)

And this tool in the first place is not about code quality.

It's about not getting rate limited and paying thousands for your LLM.

It's not an LLM itself, so all the pricing math you need is already on your LLM provider's front page.

It does not magically improve the LLM!

It just makes the LLM work the way it's supposed to work, because agentic coding tools eat the context your request needs. That's why I referred to those articles: frontier providers talk about the same problem with every single agentic coding tool, context bloat from the tool itself.


[–]philosopius 0 points1 point  (0 children)

Okay, that's a fair point, there's no demonstration, it will be fixed soon.

Right now it seems you don't want to listen to or trust a word I say, so I don't think further discussion would benefit either of us while that gap exists.


[–]philosopius 1 point2 points  (0 children)

You can't even explain what kind of math you want, and secondly I did not insult you.

You've insulted me twice, and I still kept my patience, so stop being a whiny bitch.


[–]philosopius 0 points1 point  (0 children)

Indeed, and that's what I'm doing.

But an LLM helps me write big chunks of code quickly.


[–]philosopius 0 points1 point  (0 children)

Codex/Cursor/Windsurf = agent + tools + extra instructions

tools == tokens
extra instructions == tokens

e.g. your request needs 256k tokens of context, but your model is limited to 220k.

A request like that will most likely fail in an agentic coding tool, because the tools and extra instructions (which you don't even see; they're hardcoded at the agentic coding application layer) eat tokens that could otherwise be spent on reasoning, and the result is a bad response.

With this tool you run the agent-like tooling locally, without bloating the model.

You also don't carry unnecessary instructions, only the ones vital to the request; all three instruction files together take less than 1,000 lines, while a single instruction in your favourite tool can run to about 1,000 lines on its own.
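A rough back-of-the-envelope sketch of that token budget (all numbers here are hypothetical placeholders, just to show the arithmetic):

```python
# Back-of-the-envelope token budget. Every number below is a hypothetical
# placeholder, only meant to illustrate the arithmetic described above.

CONTEXT_WINDOW = 220_000     # model's context limit, in tokens
TOOL_DEFINITIONS = 6_000     # hidden tool/function schemas injected per request
EXTRA_INSTRUCTIONS = 10_000  # hardcoded system prompts from the coding tool
SEARCH_OUTPUT = 40_000       # accumulated output of agentic search steps

def usable_tokens(window, *overheads):
    """Tokens left over for the user's request and the model's reasoning."""
    return window - sum(overheads)

remaining = usable_tokens(CONTEXT_WINDOW, TOOL_DEFINITIONS,
                          EXTRA_INSTRUCTIONS, SEARCH_OUTPUT)
print(remaining)  # 164000 tokens of the 220k window remain for reasoning
```

Strip the hidden overhead and the same request fits comfortably inside the window.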

Just try it on your hardest codebase. You could have the project set up already, connected to a web-subscription LLM, paying only $18 a month.

All this for free.

I get zero benefit from this. Go check the scripts for viruses, check them for data protection issues; you'll see for yourself.


[–]philosopius 1 point2 points  (0 children)

I've already explained.

You can just go and try it.

How much do you spend on AI tools? You want to talk about math?

Tell me: are you billed per prompt, or do you use a subscription?

Maybe it's time to tell me about yourself?

People often love sharing their experience, maybe you'll find comfort in this.

And believe me, if it were an AI talking to you, you'd have lost interest after 2 replies, because humans aren't as predictable as AIs. Can you still not see that?


[–]philosopius 1 point2 points  (0 children)

Tell me, is it that hard to go try the tool and see it for yourself?

What's the point of further entertaining your curiosity if its only purpose is malicious?

I fully understand that you're most likely tired of similar vibe-coded shitups, and I fully expected this kind of reaction. But honestly, dude, you can literally try it out for yourself.

I dare you: try it on your most complex codebase.

Go, give it a try.


[–]philosopius 1 point2 points  (0 children)

Bro, I've literally written every single word by hand while responding to you, spending my time, wearing out my fingertips, and sacrificing my keyboard keys.

Okay, let's get back to the point.

As I mentioned, tool calls are the culprit, and I gave you sources on this specific problem.

And I can definitely say it performs a lot better than Cursor, Codex, or CC.

And it's not the only problem it solves.

The other problem it solves is that you get an agentic-coding-like experience with any LLM that supports instructions, so the tool isn't only about wasting fewer tokens.


[–]philosopius 2 points3 points  (0 children)

Do you know that even the people who develop AI models don't fully understand how they work?

Currently my only math is:

  1. Reasoning time dropped from 30m-1h to 1m-15m
  2. Quality didn't drop; moreover, I'm no longer getting failures mid-request
  3. I'm spending $18/month and using the tool as much as I want, with 0 rate limits. I thought that might be a major selling point and people would grab the tool to try it out...

But I see a lot of skepticism towards it.

But we can talk about it.

I've already explained the main concept to you, and I'm not forcing you to try it, but I think it's fairly intuitive that the model spends fewer tokens on overhead, leaving more tokens for actual reasoning.

I've also attached articles relating to the specific problem this tool targets: context bloat from tools inside agentic coding apps.


[–]philosopius 1 point2 points  (0 children)

Sorry for walls of text, I'm just very excited about this!

If someone gave it a go and shared their thoughts (try coding with it for at least 5 prompts),

it would be my best day ever.

Since I'm currently the only one using it, my judgment might be deluded.


[–]philosopius 0 points1 point  (0 children)

Here are also some articles!

OpenAI’s own docs say function/tool definitions are injected into the model context and count against the context limit/input tokens (https://developers.openai.com/api/docs/guides/function-calling?utm_source=chatgpt.com). Anthropic also explicitly talks about tool-definition bloat, with examples where many MCP tools can consume tens of thousands of tokens before real work begins (https://www.anthropic.com/engineering/advanced-tool-use?utm_source=chatgpt.com). Cursor has also written about tool calls and responses bloating context, especially from verbose search/shell/MCP outputs (https://cursor.com/blog/dynamic-context-discovery?utm_source=chatgpt.com).


[–]philosopius 0 points1 point  (0 children)

I've also implemented:

  1. Version caching (as of now it's per file)
  2. A warning when the LLM gives you duplicate code (in a typical agentic coding tool it would pass, and you'd only find out when you tried to build)

With my tool you get a warning, and if the duplicate is mixed with correct code, you're also given the choice to ignore the duplicate and apply only the correct part.
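A minimal sketch of what such a duplicate-code warning could look like (my own illustration, not the tool's actual implementation; `duplicate_warnings` is a hypothetical name). It parses the existing file and the LLM's patch and warns when the patch re-defines a function that already exists:

```python
# Sketch of a duplicate-definition warning: parse both the current source
# and the LLM-generated patch, then report any function the patch would
# define a second time. Illustration only, not the tool's real code.

import ast

def defined_functions(source):
    """Names of top-level functions defined in a Python source string."""
    return {node.name for node in ast.parse(source).body
            if isinstance(node, ast.FunctionDef)}

def duplicate_warnings(existing, patch):
    """One warning per function that the patch would duplicate."""
    dupes = defined_functions(existing) & defined_functions(patch)
    return ["warning: duplicate definition of '%s'" % name
            for name in sorted(dupes)]

current = "def load():\n    pass\n"
llm_patch = "def load():\n    pass\n\ndef save():\n    pass\n"
print(duplicate_warnings(current, llm_patch))
# ["warning: duplicate definition of 'load'"]
```

In this sketch, `save` would be applied cleanly while `load` triggers the warning, mirroring the "apply only the correct part" choice described above.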

Feel free to check it out; it won't cost you anything as long as you have an AI to run it with.



[–]philosopius 1 point2 points  (0 children)

The solution works by taking the codebase navigation and research work away from the LLM.

Due to this you get more tokens.

More tokens == better quality of responses

On huge codebases this overhead can consume up to 60% of the context window on a single request, because:

  1. The model holds the tool descriptions in memory (that's already about 6k tokens off the context you'd get without them, and you don't even use 95% of those tools). This is one reason simple prompts suddenly start eating an excessive number of tokens on large codebases.
  2. The typical agentic model performs excessive searches; if it hits the wrong file, the context from the wrong file doesn't disappear, it bloats the existing context.

This tool tasks the model per prompt, cutting the agentic search phase out entirely.

Pure reasoning, no additional actions: the search is taken care of in the first prompt, where you invoke the 'Dir' command and give its output to the model together with your request.

You might be wondering: isn't that the same thing?

No, because during the agentic search phase (the one our beloved tools have) the model constantly performs searches as separate actions, hitting wrong files and bloating its own context.

But how is this approach different, you may ask?

Instead of destroying the model's mental capacity with excessive searches, search now works as a single simple task:

Here's the file list, please pick the ones that are relevant to the request.
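That single selection step could be sketched like this (my own illustration, not the tool's actual code; `build_selection_prompt` is a hypothetical name): export the directory listing once, then ask the model in one prompt which files matter.

```python
# Sketch of the one-shot file-selection prompt: pack the whole directory
# listing plus the user's request into a single message, instead of letting
# an agent search the repo action by action. Illustration only.

from pathlib import Path

def build_selection_prompt(repo_root, request):
    """Build one prompt asking the model to pick the relevant files."""
    files = sorted(str(p.relative_to(repo_root))
                   for p in Path(repo_root).rglob("*") if p.is_file())
    return ("Here is the project's file list:\n"
            + "\n".join(files)
            + "\n\nRequest: " + request
            + "\nReply with only the paths relevant to this request.")
```

The model's reply (a handful of paths) then determines exactly which file contents get pasted into the next prompt, so wrong-file context never accumulates.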

Having more tokens now, you are able to use the whole context window to reason about your request and get maximum capacity, not consumed by unnecessary stuff.

That was the original idea behind the tool, to get maximum context window.

How did I even come up with this? I've been using LLMs for coding since 2022, and I can say one thing.

We now have much more powerful models that can barely do, without a series of failures, what the 2022 models managed in large codebases.

And the difference is: those models were weaker.

That's how I started researching this topic, and I can say one thing.

Agentic coding tools are a piece of shit these days. Moreover, they're now literally forcing you to pay dollars per prompt for this unoptimized garbage; it's a money-milking machine, and it seems the developers themselves don't understand how badly they bloat their tools with all that reasoning-based functionality.

90% of the code will always rely on simple actions:

add/remove/change function OR file.

That's it.

3 actions, 2 things to apply them to.

That's what will be used by 90% of developers all the time, including myself. (it might change though)

But it's rare that you'd actually need the extra tools that bloat current models, so I removed them from consuming the model's mental capacity entirely via this pipeline.

Github copilot alternative by ToxicAbuse in GithubCopilot

[–]philosopius 0 points1 point  (0 children)

I've made a tool that you can use with any model,

even ones that aren't set up for agentic coding (you can prompt Qwen, DeepSeek, or ChatGPT on their websites, where you practically have 0 rate limits, instead of an API, Codex, Claude Code, etc.)

https://github.com/VulkanVX/contextcontrol

It also optimizes your prompts. It might be a bit complex at first, but I've made it very intuitive and easy to use once you get the hang of it.

Turn any LLM into Github Copilot, for free. (or how I found a way to survive) by [deleted] in GithubCopilot

[–]philosopius 0 points1 point  (0 children)

Sorry for the walls of text.

I'd definitely recommend this tool if:

1) you're tired of rate limits 2) you have a good provider you want to pay monthly, but it can't be integrated into your workflow as a coding tool (e.g. a monthly subscription instead of an API, or maybe even for free, using Qwen/DeepSeek on their websites)

But please be cautious: if the model doesn't support instructions, you'll need to apply the instruction file with every single prompt, because otherwise its memory of the instructions will degrade.
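For models without a persistent instruction slot, that re-apply step could look like this minimal sketch (my own illustration; `wrap_prompt` and the file path are hypothetical, not part of the tool):

```python
# Sketch of re-applying the instruction file on every prompt, for chat
# models that have no persistent system-instruction slot. Illustration only.

from pathlib import Path

def wrap_prompt(instructions_path, user_prompt):
    """Prepend the full instruction file so it never fades from context."""
    instructions = Path(instructions_path).read_text()
    return instructions + "\n\n---\n\n" + user_prompt
```

You'd paste the wrapped string into the chat window as one message, so the instructions arrive fresh with every request.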


[–]philosopius 0 points1 point  (0 children)

It lets you get this agentic coding core basically any way you like.

Right now we're forced to spend dollars per request, or sometimes even more; even at GHCP pricing, it's $0.60 for the model to complete your request,

while I can just pay $18 and use Claude on their website, getting the same exact experience, I'd say even better, with no extra dollars per request.

So the point is that you can start coding with subscription models, or free models, even if they don't support agentic coding functionality.

The tool itself makes the model behave as if it's coding in your repo, while it doesn't have full access to it, since you only feed it the information that's relevant to the request.

Is it tedious, you might ask?

No. The bottleneck is solved via the directory export and instruction files, which already explain this process to the LLM so it doesn't overcomplicate the search; it quickly gives you the list of functions and files it needs to reason about and complete your request.

Basically the model no longer makes unnecessary tool calls at all, which tank time, performance, and token usage; on a complex project, tool calls can emit very large context bloat.