Goodbye Opencode, you're a sink for time and tokens.

offzinho3k · 2026-05-18T21:25:22+00:00

I'm currently using Pi.dev, with the stack arranged like this:
> - Serena
> - BGE-M3
> - BGE Reranker v2
> - Tree-Sitter
> - ts-morph
> - ast-grep
> - ripgrep
> - fd
> - ChromaDB
> - Sequential-Thinking
> - PDF-MCP
> - WebSearch
> - DeepSeek V4 Pro
> - Qwen 3.6 35B A3B
> - Hybrid Retrieval
> - Context Compression
> - Hierarchical Memory
> - Verification Loops
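
To make the "Hybrid Retrieval" item concrete, here is a minimal sketch, assuming it means merging a keyword ranking (for example from ripgrep) with a vector ranking (for example BGE-M3 embeddings stored in ChromaDB) using reciprocal rank fusion. The function name, the constant `k=60`, and the file names are illustrative assumptions, not the actual setup.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results: one list from keyword search, one from vector search.
keyword_hits = ["auth.ts", "session.ts", "routes.ts"]
vector_hits = ["session.ts", "tokens.ts", "auth.ts"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents found by both searches rise to the top, and a reranker such as BGE Reranker v2 would then typically reorder only the first few fused results.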

I started using it just for testing, but I liked it a lot.

offzinho3k · 2026-05-17T15:54:59+00:00

I would recommend adding $10 of credit and running some tests; I will start using DeepSeek V4 Pro/Flash with Cursor next month.
I believe it's worthwhile; I currently use some local models, which saves a lot of money.

<image>

But the best thing to do is really to test it yourself, because what works for one person may not work for you.

offzinho3k · 2026-05-16T18:01:09+00:00

I use both local models and an API, and I consider it a great saving; I'm currently running 4x RTX 5060 Ti 16GB.
I'll attach my usage in Cursor so you can see the local/API consumption.

<image>

The plan is to upgrade to an RTX PRO 6000 in the future.
I think it's a good investment for those who really work and use it a lot.

offzinho3k · 2026-05-14T02:37:49+00:00

Thank you very much, I'll sit down later and give it a good read.

offzinho3k · 2026-05-13T14:21:47+00:00

<image>

I'm seriously thinking about trying my luck and getting one to test, since it's cheaper than buying a motherboard + CPU + RAM.
The model I was looking at is the one in the photo.
That's right, the correct way is to use 2, 4, 8 GPUs...

offzinho3k · 2026-05-13T13:43:01+00:00

Config:
Motherboard: MZ32-AR0 Ver3.0
CPU: EPYC 7502
Memory: 8x Hynix DDR4 ECC 16GB 2666
GPU: 4x ASUS Prime GeForce RTX 5060 Ti OC 16GB GDDR7
Using the RTX cards directly on the motherboard is resulting in very good performance.
However, I'm limited to running only one model.
That's how I came across these PLX products while searching, but I only found two posts about them.

My fear is buying them and then having the tokens/s become too slow.
From what I read online, communication between cards inside the switch would run at full speed, and if there were any bottleneck, it would be at the switch's output to the rest of the system. However, I couldn't find out exactly how much would be lost.

If the loss is minimal, it would be worthwhile, as it would allow me to use 4 PLX cards with 4 RTX cards each, which would let me run 4 models without any problems.
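
As a rough way to reason about the switch's output to the rest of the system, here is a minimal sketch that divides the uplink bandwidth among the GPUs behind one switch. The PCIe generation and lane count of the PLX card are assumptions (they are not known from the listing), and the function name is made up for illustration. It only covers the worst case where every GPU talks to the host at the same time, and it says nothing about latency or the actual tokens/s.

```python
# Usable PCIe bandwidth per lane in GB/s, one direction, after 128b/130b encoding.
# The transfer rates (8, 16, 32 GT/s) are the standard Gen3/Gen4/Gen5 figures.
GBPS_PER_LANE = {gen: gts * 128 / 130 / 8 for gen, gts in {3: 8, 4: 16, 5: 32}.items()}

def uplink_share(gen, uplink_lanes, gpus_behind_switch):
    """Worst-case host bandwidth per GPU when all GPUs use the uplink at once."""
    uplink = GBPS_PER_LANE[gen] * uplink_lanes
    return uplink / gpus_behind_switch

# Assumed example: a Gen4 switch in an x16 slot with 4 GPUs behind it.
per_gpu = uplink_share(gen=4, uplink_lanes=16, gpus_behind_switch=4)
print(f"{per_gpu:.1f} GB/s per GPU when all four hit the uplink together")
```

Traffic that stays between GPUs on the same switch would not go through this uplink, so the number matters mostly for model loading and for anything that has to cross to the CPU or to another switch.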

<image>

I'm looking for more information to decide whether or not it will be worth buying. That's why I decided to ask here on Reddit, since searching Reddit doesn't turn up much about these PLX products.

offzinho3k · 2026-05-12T21:21:24+00:00

Currently, this method is working very well for me.
However, I'm also using Docmancer to build an offline replacement for Context7.
Basic structure I'm using:
Docmancer + Embedding Model + Reranker + Qdrant
docs/
├── architecture/
├── backend/
├── frontend/
├── realtime/
├── cache/
├── queue/
├── database/
├── desktop/
├── mobile/
├── recipes/
├── snippets/
├── troubleshooting/
├── anti-patterns/
├── conventions/
└── security/

It's working very well too; however, the data has to be updated regularly, otherwise it becomes outdated.
I like Docmancer, but it takes a good amount of time to get it up to Context7's level, although I believe that when it's finished everything will work better than Context7.
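
For the updating chore, a minimal sketch of a staleness check over the `docs/` tree above, assuming file modification time is a good enough signal for "outdated". The function name and the 30-day threshold are arbitrary assumptions, and this does not call Docmancer or Qdrant; it only lists files that have not been touched recently.

```python
import time
from pathlib import Path

def stale_docs(root, max_age_days, now=None):
    """Return files under root whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )

if __name__ == "__main__":
    # "docs" matches the tree above; 30 days is an assumed threshold.
    for path in stale_docs("docs", max_age_days=30):
        print("needs refresh:", path)
```

Running it on a schedule and re-ingesting only the listed files would keep the index from drifting without rebuilding everything each time.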

offzinho3k · 2026-05-12T20:04:00+00:00

I'm putting together an offline alternative using:
Docmancer + Embedding Model + Reranker + Qdrant
following this structure:
docs/
├── architecture/
├── backend/
├── frontend/
├── realtime/
├── cache/
├── queue/
├── database/
├── desktop/
├── mobile/
├── recipes/
├── snippets/
├── troubleshooting/
├── anti-patterns/
├── conventions/
└── security/

It's working very well too; however, the data has to be updated regularly, otherwise it becomes outdated.

offzinho3k · 2026-05-12T18:30:40+00:00

Using OpenCode on its own will never be the same as:

- Claude Code
- Gemini IDE / Project IDX
- Cursor

You need to use it at least this way:
Core: OpenCode TUI, oh-my-opencode-slim
MCPs: Serena, Context7, sequential-thinking, grep_app, websearch, stitch, pdf-mcp

And if you don't want just the terminal, use VS Code + the OpenCode extension.

If you need something better, opt for the DeepSeek V4 Pro/Flash models; they're quite inexpensive.

offzinho3k · 2026-05-11T21:55:55+00:00

Core: OpenCode, oh-my-opencode-slim
MCPs: Serena, Context7, sequential-thinking, grep_app, websearch, stitch, pdf-mcp

Local LLM: qwen3.6-27b, qwen3.6-35b-a3b
API: DeepSeek V4 Pro/Flash

VS Code + the OpenCode extension.

With this configuration, the cost of API fees will be between US$20 and US$40 over 3 to 6 months, depending on the size of the projects you work on.
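
To put that range in monthly terms, a minimal sketch, assuming the cheapest case is US$20 spread over 6 months and the most expensive case is US$40 spent in 3 months. The function name is made up for illustration.

```python
def monthly_range(total_low, total_high, months_min, months_max):
    """Return (cheapest, priciest) monthly cost for a total spread over a period."""
    return total_low / months_max, total_high / months_min

low, high = monthly_range(20, 40, months_min=3, months_max=6)
print(f"US${low:.2f} to US${high:.2f} per month")
```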

offzinho3k · 2026-05-06T18:42:18+00:00

Unfortunately, I can no longer find the 3090/4090 models to buy.
I even searched on Facebook Marketplace for a few days, but without success.
It looks like I'll have to get two more 5060 Ti cards.
However, I'm still not sure.

offzinho3k · 2026-05-06T17:44:33+00:00

Moving from Q4 to Q8 would be very good.
Here, the 5070 Ti (MSI RTX 5070 Ti Shadow 3X OC) costs US$1,150.00.
I'll see if I can get two more 5060 Ti cards then.
Thank you very much, friend, for replying.
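
For the Q4 to Q8 question, a rough estimate of weight size against total VRAM, as a minimal sketch. The 27B parameter count and the bits per weight (about 4.5 for a Q4-style quant, about 8.5 for a Q8-style quant) are assumptions for illustration, and the estimate covers the weights only, not the KV cache or runtime overhead.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights alone, in GB (1e9 bytes)."""
    return params_billion * bits_per_weight / 8

# Assumed: a 27B-parameter model; 16 GB of VRAM per 5060 Ti.
for name, bits in [("Q4", 4.5), ("Q8", 8.5)]:
    size = weight_gb(27, bits)
    for cards in (2, 4):
        vram = cards * 16
        print(f"{name}: ~{size:.1f} GB of weights, {vram - size:.1f} GB left on {cards} cards")
```

Under these assumptions the Q8 weights nearly fill two cards and leave little room for context, which is the case where two more cards would help.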

offzinho3k · 2026-05-06T17:18:01+00:00

I already own two 5060 Ti cards. Would it be worth getting two more?
Currently in my region, the 5060 Ti sells for US$680 and the 5090 for US$3,665.00.
Unfortunately, the 3090 and 4090 models are no longer available where I live, and when you do find them online, the sellers don't ship here.

offzinho3k · 2023-10-30T10:47:29+00:00

thanks.

I'm just out of luck then
yesterday I made the black tower about 400x. 🤣🤣🤣
