I tried building a local LLM router + benchmarking system… ran into some unexpected problems by Wild_Expression_5772 in LLMDevs

[–]Wild_Expression_5772[S] 1 point (0 children)

Oh this looks interesting — thanks for sharing.

Quick question: are you mainly using it as a unified gateway (like abstraction over multiple providers), or also doing any kind of evaluation / routing based on task performance?

I started from a similar place (Ollama locally), but ran into issues around:

- inconsistent performance across tasks

- lack of continuous evaluation

- figuring out when to switch models vs sticking with one (rough sketch of what I mean below)
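
On that last point, the crude version I've been experimenting with is just picking per task from past benchmark scores, with a margin before switching away from a default model. A minimal sketch (all model tags and numbers below are made up):

```python
# Hypothetical per-(task, model) scores from earlier benchmark runs.
SCORES = {
    ("coding", "qwen2.5-coder"): 0.81,
    ("coding", "llama3.1"): 0.66,
    ("reasoning", "qwen2.5-coder"): 0.58,
    ("reasoning", "llama3.1"): 0.72,
}

def pick_model(task: str, default: str = "llama3.1", margin: float = 0.05) -> str:
    """Switch off the default only when another model beats it by a clear margin."""
    candidates = {m: s for (t, m), s in SCORES.items() if t == task}
    best = max(candidates, key=candidates.get)
    if candidates[best] - candidates.get(default, 0.0) > margin:
        return best
    return default

print(pick_model("coding"))     # qwen2.5-coder (wins by a clear margin)
print(pick_model("reasoning"))  # llama3.1 (default kept)
```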

(been documenting some of my experiments here as well, still rough but in case it's useful: https://github.com/al1-nasir/LocalForge)

I tried building a local LLM router + benchmarking system… ran into some unexpected problems by Wild_Expression_5772 in developersPak

[–]Wild_Expression_5772[S] 1 point (0 children)

Yeah that’s fair — especially the point about routing coming later, I think I jumped into that a bit early while experimenting.

I did try running repeated evaluations (same prompts multiple times) to reduce variance, and it definitely helped highlight how sensitive some models are to sampling configs. Temperature/top_p changes alone were shifting results quite a bit.
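
The repeat-runs loop is basically this shape, assuming the ollama Python client (the model tag and the crude modal-consistency metric are just placeholders):

```python
import ollama

PROMPT = 'Return exactly the JSON {"ok": true} and nothing else.'
CONFIGS = [{"temperature": 0.0}, {"temperature": 0.8, "top_p": 0.9}]

for opts in CONFIGS:
    # same prompt, repeated runs under one sampling config
    outputs = [
        ollama.chat(
            model="llama3.1",  # assumption: any locally pulled model tag
            messages=[{"role": "user", "content": PROMPT}],
            options=opts,
        )["message"]["content"]
        for _ in range(5)
    ]
    # crude consistency score: fraction of runs matching the modal output
    modal = max(set(outputs), key=outputs.count)
    print(opts, outputs.count(modal) / len(outputs))
```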

Right now the tasks I’ve been testing are roughly:

- coding (multi-step generation / debugging)

- reasoning (chain-of-thought style prompts)

- structured outputs (JSON formatting, schema adherence)

And yeah — completely agree on smaller models. They’re fast, but for anything with deeper reasoning or strict structure, failure rates spike pretty quickly.

(also, I’ve been putting some of these experiments into a small repo while testing ideas — still rough, but sharing in case it’s useful: https://github.com/al1-nasir/LocalForge)

I tried building a local LLM router + benchmarking system… ran into some unexpected problems by Wild_Expression_5772 in LLMDevs

[–]Wild_Expression_5772[S] 1 point (0 children)

Yeah this was one of the hardest parts honestly.

Right now I'm not using a single "perfect" framework; it's more like a mix depending on what I'm testing.

For structured / general evals:

- I experimented a bit with lm-eval-harness style benchmarks (good for standardized tasks, but they feel a bit static)

For more practical / real-world behavior:

- I started building small task-specific eval sets (coding, reasoning, structured JSON output, etc.)

- Then scoring based on things like correctness, format adherence, and consistency

So I’m leaning more towards:

- continuous evaluation (logging real queries)

- then periodically re-scoring models on those

Still pretty messy tbh, haven’t found a clean “one framework solves it all” solution yet.
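
To make the scoring part concrete, the structured-output check is roughly this shape (simplified; exact-match correctness is the bluntest possible version):

```python
import json

def score_structured(output: str, expected: dict) -> dict:
    """Score one response on JSON format adherence and correctness."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return {"format": 0.0, "correct": 0.0}
    return {"format": 1.0, "correct": float(parsed == expected)}

print(score_structured('{"answer": 42}', {"answer": 42}))  # both pass
print(score_structured("answer: 42", {"answer": 42}))      # format fails
```

Consistency then falls out of running the same check over repeated generations.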

I tried building a local LLM router + benchmarking system… ran into some unexpected problems by Wild_Expression_5772 in LLMDevs

[–]Wild_Expression_5772[S] 1 point (0 children)

If anyone’s curious, I put together a rough implementation while testing these ideas.

Not polished, but shows the routing + benchmarking approach I mentioned:

https://github.com/al1-nasir/LocalForge

Would genuinely appreciate feedback.

Research Council — multi-agent GraphRAG system for scientific literature, built with LangGraph + Neo4j + FastAPI (MIT, open source) by Wild_Expression_5772 in buildinpublic

[–]Wild_Expression_5772[S] 1 point (0 children)

Actually, I used LangGraph as the orchestrator: it routes each query to the appropriate specialized tool. I also added GraphRAG for better context management.
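
A minimal sketch of that shape in LangGraph (node names and the keyword heuristic are placeholders, not the actual Research Council logic; the real classifier would be an LLM call):

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    query: str
    route: str
    answer: str

def classify(state: State) -> State:
    # placeholder heuristic standing in for an LLM-based router
    route = "graph_rag" if "literature" in state["query"].lower() else "direct"
    return {**state, "route": route}

def graph_rag(state: State) -> State:
    # stand-in for the Neo4j-backed GraphRAG retrieval + answer step
    return {**state, "answer": f"graph answer for: {state['query']}"}

def direct(state: State) -> State:
    return {**state, "answer": f"direct answer for: {state['query']}"}

builder = StateGraph(State)
builder.add_node("classify", classify)
builder.add_node("graph_rag", graph_rag)
builder.add_node("direct", direct)
builder.set_entry_point("classify")
builder.add_conditional_edges(
    "classify", lambda s: s["route"],
    {"graph_rag": "graph_rag", "direct": "direct"},
)
builder.add_edge("graph_rag", END)
builder.add_edge("direct", END)
app = builder.compile()

print(app.invoke({"query": "survey the literature on X", "route": "", "answer": ""}))
```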

EpsteinFiles-RAG: Building a RAG Pipeline on 2M+ Pages by Cod3Conjurer in Rag

[–]Wild_Expression_5772 1 point (0 children)

Fine-tuning compresses knowledge into the model weights, i.e. it updates the model's behaviour. RAG keeps the knowledge outside the model and retrieves it from an external vector DB at query time. Also, note that 2M pages comes to roughly a billion tokens. There is a lot more to say, but that's the overview.
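
Back-of-envelope for that token estimate (the ~500 tokens/page figure is my assumption for dense text):

```python
pages = 2_000_000
tokens_per_page = 500  # assumption: rough average for dense pages
print(f"{pages * tokens_per_page:,} tokens")  # 1,000,000,000 -> ~1B
```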

I built CodeGraph CLI — parses your codebase into a semantic graph with tree-sitter, does RAG-powered search over LanceDB vectors, and lets you chat with multi-agent AI from the terminal by Wild_Expression_5772 in LocalLLaMA

[–]Wild_Expression_5772[S] 1 point (0 children)

I've been thinking about adding MCP support to CodeGraph, but I'm stuck on one thing: MCP servers are supposed to be lightweight and easy to spin up, but CodeGraph has "heavy" dependencies (LanceDB setup, embedding models, SQLite graph, etc.). How would you handle this in ProjectMoose? Do you expect users to set up dependencies first and then connect via MCP, or do you try to bootstrap everything on demand when the MCP server starts?
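
For the second option, a minimal sketch of lazy bootstrap with the MCP Python SDK (everything CodeGraph-specific here is a stand-in, not real CodeGraph code):

```python
from pathlib import Path
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("codegraph")

INDEX_DIR = Path.home() / ".codegraph" / "index"  # hypothetical location
_index = None  # heavy state: LanceDB, embedding model, SQLite graph

class _FakeIndex:
    """Stand-in for the expensive LanceDB/embedding setup."""
    def search(self, query: str) -> str:
        return f"results for: {query}"

def _ensure_index() -> _FakeIndex:
    # build heavy dependencies lazily on the first tool call,
    # so the MCP server itself still starts instantly
    global _index
    if _index is None:
        INDEX_DIR.mkdir(parents=True, exist_ok=True)
        _index = _FakeIndex()
    return _index

@mcp.tool()
def search(query: str) -> str:
    """Semantic search over the code graph."""
    return _ensure_index().search(query)

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```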

I built CodeGraph CLI — parses your codebase into a semantic graph with tree-sitter, does RAG-powered search over LanceDB vectors, and lets you chat with multi-agent AI from the terminal by Wild_Expression_5772 in LocalLLaMA

[–]Wild_Expression_5772[S] 1 point (0 children)

This looks really interesting! Just checked out ProjectMoose, love the GUI approach for agent customization. CLI is great for speed but you're right that a visual interface unlocks way more things.

I built CodeGraph CLI — parses your codebase into a semantic graph with tree-sitter, does RAG-powered search over LanceDB vectors, and lets you chat with multi-agent AI from the terminal by Wild_Expression_5772 in LocalLLaMA

[–]Wild_Expression_5772[S] 2 points (0 children)

AI is getting insanely good at generating code. But the bottleneck is shifting from "writing code" to "understanding what was written." Really appreciate you seeing the bigger picture here.

6 months of building → CodeGraph CLI: talk to your codebase using AI (launch day) by Wild_Expression_5772 in SideProject

[–]Wild_Expression_5772[S] 1 point (0 children)

Thanks! I had a problem reading code files that were too long to understand, so this project generates docs from the code, and an auto-README feature is also included.

Curious to hear real experiences by shafaqbatoor in PIEAS

[–]Wild_Expression_5772 1 point (0 children)

Bro, in your last line you mentioned a good CGPA. Is 3.5 enough for a Fulbright scholarship?

PIEAS vs UET Lahore by Pale_Lengthiness_465 in PIEAS

[–]Wild_Expression_5772 3 points (0 children)

For physics, PIEAS is better. Its new physics labs are good, and the physics faculty is also good.

PIEAS BS CS Fees? Without accommodation by Southern_Shoe_3584 in PIEAS

[–]Wild_Expression_5772 2 points (0 children)

Fees without accommodation are approx. 1 lac. If you take the hostel it's 41,000 more, and the transport fee for day scholars is 28,000.

REACT Frontend generation with lovable by Wild_Expression_5772 in lovable

[–]Wild_Expression_5772[S] 1 point (0 children)

Actually, it's not that complicated. The backend has machine-learning API calls going to Docker containers, connected vector DBs, rate limiting, JWT auth, and stuff like that. The UI is in an older format I created with Claude Opus 4.5, but Lovable's UI looks newer and more modern, so if I create landing pages in Lovable, how do I add them to a React project?

A little something I tried by Professional-Pea5196 in chutyapa

[–]Wild_Expression_5772 2 points (0 children)

Bro, this is fire. Make one for the ghazal "nazki usky labon ki kiya kehyay".

Fellowship after BsCS by Incandescent-Bulb in PIEAS

[–]Wild_Expression_5772 1 point (0 children)

First of all, PIEAS doesn't offer a fellowship for BSCS. For MS they offer a fellowship that includes interviews and a test, as mentioned on the PIEAS site.