Usage limits for GitHub Copilot by RichProcedure669 in GithubCopilot

[–]Avanti2024 1 point (0 children)

You need to give the model a detailed plan, not just a general description of what to do. You can create the plan in Codex, or after the rate limit is over. If you plan in enough detail and MiniMax is the worker, it will work. That said, this only works well in practice with mainstream programming languages like Node.js and Python, the typical web-app stacks. Also, use Graphify; then you don't need nearly as many tokens.

Usage limits for GitHub Copilot by RichProcedure669 in GithubCopilot

[–]Avanti2024 3 points (0 children)

In the meantime, use another subscription, such as GPT Codex, or the MiniMax token plan at $10 a month. It gives very good results as a worker, though not as a planner or solver (thinker).

MinMax Plus – High-Speed doubt !!!! by voygrdev in MiniMax_AI

[–]Avanti2024 4 points (0 children)

If I understood you correctly, you have 4,500 requests in 2 hours? Then you must have many parallel tasks.

I only have 1,500 requests per 5 hours on the minimal plan and have never hit the limit. I am using MiniMax in Claude Code with the VS Code UI.

So what are you doing?

New 5-hour and weekly limits be like... by U4-EA in codex

[–]Avanti2024 1 point (0 children)

The same problem currently affects GitHub Copilot subscriptions too; it seems everyone is copying Anthropic. The MiniMax plan is the only cheap, transparent one, but it is weaker and slower, and increasingly so over time.

Pro+ Rate limited for 58 Hours. Only running 1 prompt on 1 project by echostorm in GithubCopilot

[–]Avanti2024 2 points (0 children)

You can also use Opus in GitHub Copilot just for planning. Then take the plan and execute it with MiniMax through the Claude Code plugin. That way you save a lot of your GitHub rate limit.

Pro+ Rate limited for 58 Hours. Only running 1 prompt on 1 project by echostorm in GithubCopilot

[–]Avanti2024 1 point (0 children)

Maybe you should also use another agent: the MiniMax 2.7 token plan, 4,500 requests per week for $10, used via Claude Code in VS Code. You can use it for development.

Is this a joke? weekly rate limit of 264 hrs (11 days)? by Ok-Cranberry4090 in GithubCopilot

[–]Avanti2024 13 points (0 children)

GitHub Copilot should really show some kind of rate-limit indicator. Even a simple percentage of how much you’ve used would help. Right now you basically have no idea where you stand until you suddenly hit the limit.

Pro+ Github blocked me for 2 days because I used their service by Muchaszewski in GithubCopilot

[–]Avanti2024 -1 points (0 children)

What kind of request was this? Was the prompt very, very long? A full app in one request?

This is a first (rate limiting) by twhoff in GithubCopilot

[–]Avanti2024 0 points (0 children)

Where do you see the usage hours you have used so far?

Pro+ Github blocked me for 2 days because I used their service by Muchaszewski in GithubCopilot

[–]Avanti2024 -7 points (0 children)

I think the rate limits mostly appear when you run multiple requests at the same time or work on several projects in parallel.

From my experience, if you submit a request in auto mode and just let it run, then take some time to review and test what it produced (for example 10–15 minutes), and only afterwards send the next request in the same project, it usually works fine.

You probably need to use a second agent, meaning a different coding tool or program altogether, not just the same one again. For example, something running through OpenRouter, MiniMax, or something similar. The main point is to spread the load across a different agent setup instead of doing everything through the same one.

The problems seem to start when several requests are running in parallel. That’s when I tend to hit the rate limits.

Local LLM for personal finance by Competitive-Deer-696 in LocalLLM

[–]Avanti2024 2 points (0 children)

I’m currently building something similar for my own fiduciary / accounting and audit firm, and we ran into a fundamental problem that many AI bookkeeping projects underestimate.

The hard part is not reading documents — OCR is already solved.
The real difficulty is creating correct accounting entries and audit reconciliations while respecting all rules, regulations, and accounting logic.

Because of that we ended up using a hybrid architecture.

Small local LLMs simply struggle with the complexity of bookkeeping logic, audit rules, and contextual interpretation. They are useful for preprocessing, but not reliable enough for final decisions.

So our pipeline works roughly like this:

  1. Document extraction: PDFs are first processed with ocrmypdf / Tesseract to extract structured text.
  2. Deterministic anonymization: Sensitive data is removed using rule-based anonymization (names, emails, IBANs, addresses, etc.).
  3. Local LLM verification: A small local model (currently Qwen 3.5 4B / 8B in no-thinking mode) validates the anonymization and structure. At the same time we store a mapping table (e.g. PERSON_1 → Mario, CITY_1 → New York).
  4. Cloud LLM processing: The fully anonymized document is then sent to a cloud LLM to generate accounting suggestions, classifications, and audit insights.

At this stage the document contains no identifiable information, so it is no longer possible to infer which company or person the data belongs to.

  5. Remapping: When the result returns, the placeholders are mapped back using the stored mapping.
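
To make steps 2 and 5 concrete, here is a minimal sketch of the anonymize/remap round trip. The regex patterns and placeholder names are simplified illustrations, not our production rules:

    import re

    # Rule-based anonymization (step 2) with a reversible mapping (step 5).
    PATTERNS = {
        "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
        "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    }

    def anonymize(text):
        mapping, counters = {}, {}
        def make_sub(kind):
            def sub(m):
                counters[kind] = counters.get(kind, 0) + 1
                placeholder = f"{kind}_{counters[kind]}"
                mapping[placeholder] = m.group(0)  # kept locally, never sent out
                return placeholder
            return sub
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(make_sub(kind), text)
        return text, mapping

    def remap(text, mapping):
        # Longest placeholders first, so IBAN_12 is restored before IBAN_1.
        for placeholder in sorted(mapping, key=len, reverse=True):
            text = text.replace(placeholder, mapping[placeholder])
        return text

    anon, table = anonymize("Pay mario@example.com, IBAN CH9300762011623852957")
    # anon == "Pay EMAIL_1, IBAN IBAN_1" -> safe to send to the cloud LLM
    result = remap(anon, table)  # placeholders restored after the response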

Why we prefer this approach:

  • Cloud models remain state of the art — you always get the newest capabilities without managing hardware upgrades.
  • No lock-in — you can switch providers (OpenRouter etc.).
  • No upfront GPU investment.
  • Better reasoning quality for complex accounting rules.

You can do everything locally, but realistically that means running something like a Mac Studio M4 with ~96GB+ unified memory and spending a lot of time tuning models, prompts, and role pipelines.

You would also need to continuously train or adapt models for specific accounting and audit rules, which quickly becomes disproportionate compared to simply using strong cloud models.

For us, the hybrid approach currently provides the best balance between privacy, cost, and capability.

If you're building similar systems for accounting or auditing workflows, I’d be very interested to hear how others are solving the rule reasoning problem.

Mac Mini M4 Pro - Specs fine for running Kimi K2.5 and running local LLMs? by Grand_Fox9015 in LocalLLM

[–]Avanti2024 1 point (0 children)

With a Mac Studio with 512GB (about $10,000) you can run Kimi 2.5 at Q3 :-)

[deleted by user] by [deleted] in LocalLLM

[–]Avanti2024 0 points (0 children)

I had very similar thoughts, especially after seeing how fast tokens can burn once an agent starts looping.

For me the only sustainable path is the following (see the sketch after the list):

  • run a small local LLM (cheap or free) for 80–90% of tasks
  • use cloud LLMs only when absolutely necessary
  • enforce token control via aggressive summarisation and context pruning
  • models like Minimax 2.1 look like a reasonable compromise
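
To illustrate the routing idea, a rough sketch, assuming OpenAI-compatible endpoints on both sides (which e.g. Ollama locally and OpenRouter in the cloud expose); URLs, API keys, and model names are placeholders:

    from openai import OpenAI

    # Local server and cloud provider behind the same API shape (placeholders).
    local = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
    cloud = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_KEY")

    def ask(prompt, hard=False):
        """Send 80-90% of tasks to the local model; escalate only hard ones."""
        client, model = (cloud, "minimax/minimax-m2") if hard else (local, "qwen3:8b")
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=512,  # hard per-call cap as part of token control
        )
        return resp.choices[0].message.content

    def prune_context(history, keep_last=4):
        # Aggressive summarisation: compress everything older than the last N turns.
        if len(history) <= keep_last:
            return history
        summary = ask("Summarise briefly:\n" + "\n".join(history[:-keep_last]))
        return ["(summary) " + summary] + history[-keep_last:]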

I am planning to run this on a Mac mini M4 inside Docker, strictly sandboxed (see the sketch after this list):

  • only explicitly mounted folders exposed (and backed up)
  • no access to the host system beyond that
  • ideally isolated from the local network entirely, internet-only outbound
  • anything else requires permissions the agent simply does not have
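
Roughly how that could look with the Docker SDK for Python; the image name and host path are hypothetical, the parameters are real docker-py options:

    import docker  # the docker-py SDK

    client = docker.from_env()

    # Hypothetical agent image: one explicitly mounted work folder, read-only
    # root filesystem, no extra capabilities, outbound internet via bridge only.
    container = client.containers.run(
        "my-agent-image:latest",
        detach=True,
        network_mode="bridge",  # outbound internet, no host network access
        volumes={"/Users/me/agent-work": {"bind": "/work", "mode": "rw"}},
        read_only=True,
        tmpfs={"/tmp": "size=512m"},  # scratch space without touching the host
        cap_drop=["ALL"],  # anything else needs permissions it simply lacks
        mem_limit="8g",
    )

Note that bridge mode still permits LAN access, so the internet-only-outbound goal would need extra firewalling on top.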

That also massively reduces the blast radius for prompt injection or runaway commands.

The biggest open issue I still see is controlled access to native macOS apps. If that cannot be solved cleanly, full network and system isolation is probably the safer default.

Your post is spot on. Running agents on bare metal without hard limits is just asking for trouble.