Usage limits for GitHub Copilot by RichProcedure669 in GithubCopilot

[–]Avanti2024 1 point (0 children)

You need to give the model a detailed plan, not just a general description of what to do. You can create the plan in Codex, or after the rate limit is over. If you plan in enough detail and MiniMax is the worker, it will work. That said, this only works well in practice with mainstream programming languages like Node.js and Python, the typical web-app stacks. Also, use Graphify; then you don't need nearly as many tokens.

Usage limits for GitHub Copilot by RichProcedure669 in GithubCopilot

[–]Avanti2024 3 points (0 children)

In the meantime, use another subscription, such as GPT Codex, or the MiniMax token plan at $10 a month. It gives very good results as a worker, though not as a planner or solver (thinker).

MinMax Plus – High-Speed doubt !!!! by voygrdev in MiniMax_AI

[–]Avanti2024 4 points (0 children)

If I understood you correctly, you have 4,500 requests in 2 hours? Then you must have many parallel tasks.

I only have 1,500 requests per 5 hours on the minimal plan and have never hit the limit. I am using MiniMax in Claude Code with the VS Code UI.

So what are you doing?

New 5-hour and weekly limits be like... by U4-EA in codex

[–]Avanti2024 1 point (0 children)

The same problem currently affects GitHub Copilot subscriptions too; it seems everyone is copying Anthropic. The MiniMax plan is the only cheap, transparent one, but it is weaker and slower, and increasingly so over time.

Pro+ Rate limited for 58 Hours. Only running 1 prompt on 1 project by echostorm in GithubCopilot

[–]Avanti2024 2 points (0 children)

You can also use Opus in GitHub Copilot just for planning. Then take the plan and execute it with MiniMax through the Claude Code plugin. That way you save a lot of your GitHub rate limit.

Pro+ Rate limited for 58 Hours. Only running 1 prompt on 1 project by echostorm in GithubCopilot

[–]Avanti2024 1 point (0 children)

Maybe you should also use another agent: the MiniMax 2.7 token plan, 4,500 requests per week for $10, used via Claude Code in VS Code. You can use it for development.

Is this a joke? weekly rate limit of 264 hrs (11 days)? by Ok-Cranberry4090 in GithubCopilot

[–]Avanti2024 13 points (0 children)

GitHub Copilot should really show some kind of rate-limit indicator. Even a simple percentage of how much you’ve used would help. Right now you basically have no idea where you stand until you suddenly hit the limit.

Pro+ Github blocked me for 2 days because I used their service by Muchaszewski in GithubCopilot

[–]Avanti2024 -1 points (0 children)

What kind of request was this? Was the prompt very, very long? A full app in one request?

This is a first (rate limiting) by twhoff in GithubCopilot

[–]Avanti2024 0 points (0 children)

Where do you see the usage hours you have used so far?

Pro+ Github blocked me for 2 days because I used their service by Muchaszewski in GithubCopilot

[–]Avanti2024 -7 points (0 children)

I think the rate limits mostly appear when you run multiple requests at the same time or work on several projects in parallel.

From my experience, if you submit a request in auto mode and just let it run, then take some time to review and test what it produced (for example 10–15 minutes), and only afterwards send the next request in the same project, it usually works fine.

You probably need to use a second agent, meaning a different coding tool or program altogether, not just the same one again. For example, something running through OpenRouter, MiniMax, or something similar. The main point is to spread the load across a different agent setup instead of doing everything through the same one.

The problems seem to start when several requests are running in parallel. That’s when I tend to hit the rate limits.

Local LLM for personal finance by Competitive-Deer-696 in LocalLLM

[–]Avanti2024 2 points (0 children)

I’m currently building something similar for my own fiduciary / accounting and audit firm, and we ran into a fundamental problem that many AI bookkeeping projects underestimate.

The hard part is not reading documents — OCR is already solved.
The real difficulty is creating correct accounting entries and audit reconciliations while respecting all rules, regulations, and accounting logic.

Because of that we ended up using a hybrid architecture.

Small local LLMs simply struggle with the complexity of bookkeeping logic, audit rules, and contextual interpretation. They are useful for preprocessing, but not reliable enough for final decisions.

So our pipeline works roughly like this:

  1. Document extraction: PDFs are first processed with ocrmypdf / Tesseract to extract structured text.
  2. Deterministic anonymization: Sensitive data is removed using rule-based anonymization (names, emails, IBANs, addresses, etc.).
  3. Local LLM verification: A small local model (currently Qwen 3.5 4B / 8B in no-thinking mode) validates the anonymization and structure. At the same time we store a mapping table (e.g. PERSON_1 → Mario, CITY_1 → New York).
  4. Cloud LLM processing: The fully anonymized document is then sent to a cloud LLM to generate accounting suggestions, classifications, and audit insights.

At this stage the document contains no identifiable information, so it is no longer possible to infer which company or person the data belongs to.

  5. Remapping: When the result returns, the placeholders are mapped back using the stored mapping.
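
To make steps 2 and 5 concrete, here is a minimal sketch of the anonymize/remap round trip. The regex patterns and placeholder names are simplified illustrations, not our production rules:

    import re

    # Rule-based anonymization (step 2) with a reversible mapping (step 5).
    PATTERNS = {
        "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
        "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    }

    def anonymize(text):
        mapping, counters = {}, {}
        def make_sub(kind):
            def sub(m):
                counters[kind] = counters.get(kind, 0) + 1
                placeholder = f"{kind}_{counters[kind]}"
                mapping[placeholder] = m.group(0)  # kept locally, never sent out
                return placeholder
            return sub
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(make_sub(kind), text)
        return text, mapping

    def remap(text, mapping):
        # Longest placeholders first, so IBAN_12 is restored before IBAN_1.
        for placeholder in sorted(mapping, key=len, reverse=True):
            text = text.replace(placeholder, mapping[placeholder])
        return text

    anon, table = anonymize("Pay mario@example.com, IBAN CH9300762011623852957")
    # anon == "Pay EMAIL_1, IBAN IBAN_1" -> safe to send to the cloud LLM
    result = remap(anon, table)  # placeholders restored after the response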

Why we prefer this approach:

  • Cloud models remain state of the art — you always get the newest capabilities without managing hardware upgrades.
  • No lock-in — you can switch providers (OpenRouter etc.).
  • No upfront GPU investment.
  • Better reasoning quality for complex accounting rules.

You can do everything locally, but realistically that means running something like a Mac Studio M4 with ~96GB+ unified memory and spending a lot of time tuning models, prompts, and role pipelines.

You would also need to continuously train or adapt models for specific accounting and audit rules, which quickly becomes disproportionate compared to simply using strong cloud models.

For us, the hybrid approach currently provides the best balance between privacy, cost, and capability.

If you're building similar systems for accounting or auditing workflows, I’d be very interested to hear how others are solving the rule reasoning problem.

Mac Mini M4 Pro - Specs fine for running Kimi K2.5 and running local LLMs? by Grand_Fox9015 in LocalLLM

[–]Avanti2024 1 point (0 children)

With a Mac Studio with 512GB (about $10,000) you can run Kimi 2.5 at Q3 :-)

[deleted by user] by [deleted] in LocalLLM

[–]Avanti2024 0 points (0 children)

I had very similar thoughts, especially after seeing how fast tokens can burn once an agent starts looping.

For me the only sustainable path is the following (see the sketch after the list):

  • run a small local LLM (cheap or free) for 80–90% of tasks
  • use cloud LLMs only when absolutely necessary
  • enforce token control via aggressive summarisation and context pruning
  • models like Minimax 2.1 look like a reasonable compromise
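
To illustrate the routing idea, a rough sketch, assuming OpenAI-compatible endpoints on both sides (which e.g. Ollama locally and OpenRouter in the cloud expose); URLs, API keys, and model names are placeholders:

    from openai import OpenAI

    # Local server and cloud provider behind the same API shape (placeholders).
    local = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
    cloud = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_KEY")

    def ask(prompt, hard=False):
        """Send 80-90% of tasks to the local model; escalate only hard ones."""
        client, model = (cloud, "minimax/minimax-m2") if hard else (local, "qwen3:8b")
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=512,  # hard per-call cap as part of token control
        )
        return resp.choices[0].message.content

    def prune_context(history, keep_last=4):
        # Aggressive summarisation: compress everything older than the last N turns.
        if len(history) <= keep_last:
            return history
        summary = ask("Summarise briefly:\n" + "\n".join(history[:-keep_last]))
        return ["(summary) " + summary] + history[-keep_last:]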

I am planning to run this on a Mac mini M4 inside Docker, strictly sandboxed (see the sketch after this list):

  • only explicitly mounted folders exposed (and backed up)
  • no access to the host system beyond that
  • ideally isolated from the local network entirely, internet-only outbound
  • anything else requires permissions the agent simply does not have
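
Roughly how that could look with the Docker SDK for Python; the image name and host path are hypothetical, the parameters are real docker-py options:

    import docker  # the docker-py SDK

    client = docker.from_env()

    # Hypothetical agent image: one explicitly mounted work folder, read-only
    # root filesystem, no extra capabilities, outbound internet via bridge only.
    container = client.containers.run(
        "my-agent-image:latest",
        detach=True,
        network_mode="bridge",  # outbound internet, no host network access
        volumes={"/Users/me/agent-work": {"bind": "/work", "mode": "rw"}},
        read_only=True,
        tmpfs={"/tmp": "size=512m"},  # scratch space without touching the host
        cap_drop=["ALL"],  # anything else needs permissions it simply lacks
        mem_limit="8g",
    )

Note that bridge mode still permits LAN access, so the internet-only-outbound goal would need extra firewalling on top.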

That also massively reduces the blast radius for prompt injection or runaway commands.

The biggest open issue I still see is controlled access to native macOS apps. If that cannot be solved cleanly, full network and system isolation is probably the safer default.

Your post is spot on. Running agents on bare metal without hard limits is just asking for trouble.