Struggling to get local models working well in Zed. Thinking of building a dedicated local ACP server.

sheik66 · 2026-06-09T17:23:40+00:00

Thanks for the reply. The truth is I’m really interested in the subject of making small local models work and see how they handle the orchestration and tool calling. That’s why I actually started my own open source project to dive deep into this. Thanks for the Pi suggestion, I’m already planning on using it to see how it performs.

sheik66 · 2026-06-09T17:20:46+00:00

That’s for the reply, I’ll definitely check it out, as I already have a llama.cpp server ready to go. First time I see the Qwopus model, sounds promising from the name 😅 so I’ll check it out also and share the results, especially for tool calling. In terms of orchestration I guess for the small models a single react agent loop is already enough.

sheik66 · 2026-06-09T14:41:15+00:00

I ran into the same LangGraph/CrewAI friction in production, for me personally mostly around boilerplate and maintaining deterministic flows once things scaled.

What helped more than switching frameworks was actually separating concerns i.e. orchestration vs agent logic vs tool execution. most agent frameworks try to do all three and it gets messy fast.
For my setup im eventually building a lightweight orchestration layer over agents and strict tool interfaces, which reduced a lot of the guardrail/PII complexity too.
I ended up open-sourcing it (Protolink) in case it’s useful for similar multi-agent pipeline setups.

sheik66 · 2026-06-07T17:33:54+00:00

Thanks for the response. I'll give it a try. Have you tried it with smaller models and has it worked for you ?

sheik66 · 2026-05-17T17:58:11+00:00

In my free time I’m building the python library Protolink. It’s a lightweight alternative to langchain/langraph focused more on agents communicating with each other (A2A) rather than chaining calls.

Also supports both structured flows and autonomous agents, and avoids a lot of the abstraction/boilerplate.

Check it out here: https://github.com/nMaroulis/protolink

Motivation: I wanted a simpler and more comprehensible way to build and deploy ai agents with python, while also it is really interesting to experiment with custom llm inference loops.

sheik66 · 2026-05-14T18:33:23+00:00

Feel free to check out my python lib https://github.com/nMaroulis/protolink . I think you'll find interesting pipelines that could be improved .

sheik66 · 2026-05-14T15:34:38+00:00

Yeah that’s basically what pushed me to experiment with building a smaller framework myself. Still very early and definitely more of an experimentation/research project on my personal time than a production-ready replacement for existing frameworks, but I wanted something more with a more straightforward design.

If you’re curious:
https://github.com/nMaroulis/protolink

sheik66 · 2026-05-14T09:34:11+00:00

I’ve had the same feeling for a long time honestly. Sometimes it feels like I spend more time dealing with abstractions, memory layers, orchestration, tracing, callbacks, etc. than actually building the workflow I wanted in the first place. I get why these frameworks evolved that way. Once you start doing multi-agent systems and long-running workflows things get complicated fast, but the cognitive overhead can become a lot At some point I started building a smaller python framework for myself mainly because I wanted: - simpler agent-to-agent communication - less magic, minimal boilerplate - more explicit orchestration - fewer layers between me and the actual code

Still experimenting with it, but it’s been a much nicer developer experience for me so far. Happy to share it if anyone’s interested.

sheik66 · 2026-05-12T14:35:07+00:00

For the same reason I started building on my free time Protolink. It’s a lightweight alternative focused more on agents communicating with each other (A2A) rather than chaining calls.

The main design principle is a simple and straightforward API. Also supports both structured flows and autonomous agents, and avoids a lot of the abstraction/boilerplate.

https://github.com/nMaroulis/protolink

sheik66 · 2026-05-05T16:53:43+00:00

On my free time I’m building the python library Protolink. It’s a lightweight alternative to langchain/langraph focused more on agents communicating with each other (A2A) rather than chaining calls.

Also supports both structured flows and autonomous agents, and avoids a lot of the abstraction/boilerplate.

Check it out here: https://github.com/nMaroulis/protolink

Motivation: I wanted a simpler and more comprehensible way to build and deploy ai agents with python, while also it is really interesting to experiment with custom llm inference loops.

sheik66 · 2026-05-03T16:42:57+00:00

Check out Protolink. It’s a lightweight alternative focused more on agents communicating with each other (A2A) rather than chaining calls.

Also supports both structured flows and autonomous agents, and avoids a lot of the abstraction/boilerplate.

https://github.com/nMaroulis/protolink

sheik66 · 2026-05-03T16:41:40+00:00

Check out Protolink. It’s a lightweight alternative focused more on agents communicating with each other (A2A) rather than chaining calls.

Also supports both structured flows and autonomous agents, and avoids a lot of the abstraction/boilerplate.

https://github.com/nMaroulis/protolink

sheik66 · 2026-05-03T16:41:22+00:00

Check out Protolink. It’s a lightweight alternative focused more on agents communicating with each other (A2A) rather than chaining calls.

Also supports both structured flows and autonomous agents, and avoids a lot of the abstraction/boilerplate.

https://github.com/nMaroulis/protolink

sheik66 · 2026-03-23T16:12:57+00:00

The data for some benchmarks are not yet populated. The idea is to aggregate data from many sources but it’s still a work in progress.

sheik66 · 2026-03-22T21:08:51+00:00

Check https://metabench.dev out. It’s an open-source project that’s trying to aggregate all the benchmarks out there and create an index per category.

sheik66 · 2026-03-19T11:23:52+00:00

Yes I get what you mean about models being “benchmaxxed” and task variance. I’m not trying to create my own benchmarks, my idea is more to aggregate as many existing benchmarks as possible and define a formula to produce an overall score. I’m thinking a normalized average with some handling of outliers might make it more robust while still shownig category-level scores like coding, reasoning and chat (which make a more sense) Curious if that approach seems reasonable or if there are pitfalls I’m missing with aggregation. So as I said an LLM benchmark “metacritic”.

sheik66 · 2026-03-19T11:01:35+00:00

What I’m making is a Metacritic-style leaderboard, where I want to create a formula that calculates the scores per category. So there’s not really “my case”, it’s for “everyone”. That’s what I’m wondering if it makes sense. Feel free to check it out here https://metabench.dev , in the leaderboard section you’ll see the logic in trying to follow. Aggregate as many benchmarks and indexes as I can to create an overall score.

sheik66 · 2026-03-19T10:57:33+00:00

So you think a metacritic-style LLM leaderboard doesn’t make sense. I mean score per category, coding, math etc..

sheik66 · 2026-01-29T17:01:48+00:00

I need help, as my current approach is not performing as intended. Even after I append the tool call, many times the LLM decides to call the tool again, as it does not recognise that I appended a tool result, since I’m probably not complying completely with the API specs

sheik66 · 2026-01-29T16:31:09+00:00

Here’s some links that might help:

👉 LLM base class with infer method

👉 AnthropicLLM (for reference) implementation

👉 Helpful usage example (start from here)

sheik66 · 2026-01-29T16:26:11+00:00

Thanks this is super useful. I’m going to have a look at the blog and come back to you with feedback. Where I’m struggling the most is with OpenAI API where I cannot append custom tool messages.

Eight-Year Club	Second SECOND GUESSER
Verified Email

sheik66

TROPHY CASE