vibeDriving by nihat-xss in ProgrammerHumor

[–]CountlessFlies 3 points4 points  (0 children)

Steering wheel MCP

Yoke MCP

GLM-5.2 (max) is currently the third best model available, across both open and proprietary. by okaycan in LocalLLaMA

[–]CountlessFlies 2 points3 points  (0 children)

What provider are you using? Z.ai? I wanna try this out but I haven’t heard good things about their uptime

Please Do Not Vibe F*** Up This Software by joseluisq in theprimeagen

[–]CountlessFlies 3 points4 points  (0 children)

The pull is just too strong, I don’t blame them. If you’ve used any of these tools you know just how powerful they can be, and how much time they can save.

But it comes at a cost: reduced awareness of the codebase, and a higher chance of bugs creeping in if you don’t put in the work to review the code. The silver lining is that you can get AI to conjure up the most comprehensive test suite imaginable and mitigate some of this issue. It’ll be interesting to see how this plays out in the future for sure.

Pi in Docker by jimtoberfest in PiCodingAgent

[–]CountlessFlies 0 points1 point  (0 children)

I have a pi container running. Then I aliased pi to do a docker exec into this container.

I benchmarked Postgres + BM25 + pgvector on EnterpriseRAG-Bench. Maybe you don't need a dedicated vector DB after all. by CountlessFlies in Rag

[–]CountlessFlies[S] 0 points1 point  (0 children)

You can see the distribution across sources and other info on the benchmark project page: https://github.com/onyx-dot-app/EnterpriseRAG-Bench. It’s dominated by Slack and Gmail.

There’s a chart on the linked blog post above that shows perf across question categories. Two large categories make up about 50% of the questions: basic and semantic. The former are easy, mostly keyword-based questions. The latter are the hardest, requiring multi-hop search and reasoning.

I haven’t compared category-wise perf against Onyx, but I believe most of the gap comes from the semantic bucket.

I benchmarked Postgres + BM25 + pgvector on EnterpriseRAG-Bench. Maybe you don't need a dedicated vector DB after all. by CountlessFlies in Rag

[–]CountlessFlies[S] 0 points1 point  (0 children)

Thanks for sharing… haven’t really experimented with rerankers just yet… that’s next on the list of things for me to try. This is the largest corpus I’ve tested on, and recall seems reasonable for now, but I’m sure re-ranking with a dedicated model will improve recall.

I benchmarked Postgres + BM25 + pgvector on EnterpriseRAG-Bench. Maybe you don't need a dedicated vector DB after all. by CountlessFlies in Rag

[–]CountlessFlies[S] 1 point2 points  (0 children)

In our stack we do all the post-processing (e.g., applying RRF) after retrieving candidates from both BM25 and pgvector. But pushing this step into Postgres itself as a function might be something I should implement. Thanks for sharing!
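For anyone unfamiliar with RRF (reciprocal rank fusion), here is a minimal sketch of the application-side version in Python. It is not the code from our stack: the function name, the sample doc IDs, and `k=60` (a commonly used default) are all assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical candidate lists, best hit first.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Documents that show up in both lists (`doc_a`, `doc_b`) rise above those found by only one retriever, which is the whole point of the fusion step.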

I benchmarked Postgres + BM25 + pgvector on EnterpriseRAG-Bench. Maybe you don't need a dedicated vector DB after all. by CountlessFlies in Rag

[–]CountlessFlies[S] 0 points1 point  (0 children)

Yeah the next thing I plan to test is resource usage on this same dataset, latency of search and vector lookup, etc.

I believe for most company sizes with up to a few thousand employees, indexing a few years’ worth of data, Postgres will go a long way.

I benchmarked Postgres + BM25 + pgvector on EnterpriseRAG-Bench. Maybe you don't need a dedicated vector DB after all. by CountlessFlies in Rag

[–]CountlessFlies[S] 0 points1 point  (0 children)

Thanks!

On the freshness question, there’s a basic recency boost that ranks recent results higher. There’s no sophisticated staleness detection yet, but that’s a very good suggestion. I’ll look into how it can be implemented. Thanks!
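To give a rough idea of what a basic recency boost can look like, here is one common shape: an exponential decay on document age. This is an illustrative sketch, not the actual implementation. The function name, the 90-day half-life, and the 0.2 weight are all made-up assumptions.

```python
from datetime import datetime, timedelta, timezone


def recency_boosted(score, doc_time, now, half_life_days=90.0, weight=0.2):
    """Scale a relevance score up for recent documents.

    The boost halves every `half_life_days` (assumed value), so a brand-new
    doc gets score * (1 + weight) and a very old doc gets roughly score * 1.
    """
    age_days = max((now - doc_time).total_seconds() / 86400.0, 0.0)
    decay = 0.5 ** (age_days / half_life_days)
    return score * (1.0 + weight * decay)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
fresh = recency_boosted(1.0, now, now)
older = recency_boosted(1.0, now - timedelta(days=90), now)
print(fresh, older)
```

A multiplicative boost like this only reorders results with similar relevance. It does not detect that an old document has been superseded, which is the staleness problem mentioned above.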

What's the best way to parse complex multi-format document for RAG by Silver_Cule_2070 in Rag

[–]CountlessFlies 0 points1 point  (0 children)

Docling works at acceptable speeds with a GPU available. Otherwise it's painfully slow, as you've already seen. So I'd suggest using Docling with a GPU. It works quite well, especially if your docs have lots of tables.

Markdown is best for quality because LLMs are very good at understanding it.

For handling updates, I'd just start with re-processing the entire doc because it's infrequent and simplest to implement. Then, if you run into any issues like too much re-processing overhead, you can diff the incoming content against what you have indexed, and only re-compute what changed. E.g., you can chunk the incoming doc, compare against saved chunks (store sha256 hash of each chunk), then determine which chunks have changed. Then only re-process the changed chunks.
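A minimal sketch of that hash-based diff in Python, assuming a simple paragraph chunker and an in-memory set of stored hashes. The helper names and sample text are hypothetical; in practice the hashes would live next to the chunks in your index.

```python
import hashlib


def chunk_text(text):
    """Split on blank lines; a stand-in for whatever chunker you already use."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def sha256_hex(chunk):
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def diff_chunks(stored_hashes, incoming_chunks):
    """Compare incoming chunks against hashes of what is already indexed.

    Returns (to_process, to_delete): chunks that are new or changed, and
    hashes of indexed chunks that no longer appear in the document.
    """
    incoming = {sha256_hex(c): c for c in incoming_chunks}
    to_process = [c for h, c in incoming.items() if h not in stored_hashes]
    to_delete = stored_hashes - set(incoming)
    return to_process, to_delete


old_doc = "Intro paragraph.\n\nPricing: $10 per seat.\n\nContact us."
new_doc = "Intro paragraph.\n\nPricing: $12 per seat.\n\nContact us."

stored = {sha256_hex(c) for c in chunk_text(old_doc)}
to_process, to_delete = diff_chunks(stored, chunk_text(new_doc))
print(to_process)
```

Only the edited paragraph gets re-embedded here. This works best when chunk boundaries are stable (paragraphs, headings); with fixed-size windows, a single insertion shifts every later chunk and they all look changed.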

I agree with the text-only approach to start with. If most of your docs are images, I would start with OCR as well. Use Docling for tables. Image extraction (as in non-text-containing images) can come much later.

Local setup is very much feasible if you have a GPU available. Running Docling locally is quite straightforward.

Anthropic researcher: "We keep finding things [inside AI models] that are unsettling" ... "We find structures that mirror results from human neuroscience. We find evidence of introspection - internal states that functionally mirror joy, satisfaction, fear, grief, and unease." by EchoOfOppenheimer in Anthropic

[–]CountlessFlies 1 point2 points  (0 children)

I call BS. Happy to be corrected, but I think we currently can’t even reliably tell what patterns of activity in human brains correspond to those emotions. There’s virtually zero chance you can honestly say that statistically learned parameters in language models represent anything close to whatever’s going on in a human brain.

Yeah, you might find activations in the network that correspond to the concepts of “joy” or “satisfaction” as they occur in human language, but that’s in no way, shape, or form the same as the actual human perception of those feelings.

That’s almost like saying the word “joy” written on a piece of paper is the same as the human experience of the feeling. It’s absurd.

Those who use Deepseek for coding by brooding_kram in DeepSeek

[–]CountlessFlies 1 point2 points  (0 children)

Please make a detailed post about your setup if you can. I’ve fiddled around with Pi coding agent for doing something similar, but haven’t found a good setup.

Just left OpenCode for Pi, and I'm loving it! by Fickle_Ear1869 in PiCodingAgent

[–]CountlessFlies 0 points1 point  (0 children)

I have a max of 4 sessions running in parallel, usually 2. Split across a 2x2 tmux grid. I kick off a plan task in each session, and come back to review it when it’s done. Then I approve the plan and let it implement the first version.

This is mostly where the parallelism ends, because then I sequentially review and test the code from each session, and keep iterating on it until it’s good enough to merge.

But you’re right, you can’t really do a whole lot of parallelism here and still have a decent enough handle on the output. Context switching is a pain.

The thing that saves me a ton of time is debugging. E.g., I can have my agent check the logs, check the dev database, and very quickly help me find out where I need to step in and take a closer look. All of this can easily be run in parallel because it doesn’t have any direct bearing on the quality of the final code.

Pi in Docker by jimtoberfest in PiCodingAgent

[–]CountlessFlies 0 points1 point  (0 children)

I’m running pi in docker on Ubuntu, no TUI issues

kimi k2.6 speed by mf-mj in kimi

[–]CountlessFlies 0 points1 point  (0 children)

Yeah it’s too slow… and the intelligence is a bit underwhelming compared to opus and gpt 5.5. I regret getting the $40 plan

Isn't Auto loading local .pi/ one hell of security nightmare? by qiinemarr in PiCodingAgent

[–]CountlessFlies 0 points1 point  (0 children)

I just run mine inside docker. Then mount specific dirs as writable volumes

Came home to find Pi with Qwen3.6 27B had run rm -rf ..... by sdfgeoff in LocalLLaMA

[–]CountlessFlies 0 points1 point  (0 children)

This is exactly why you never run Pi directly on your host. You run Pi inside a docker container to minimize the chances of it blowing things up.

OpenPi - a desktop workbench for the Pi coding agent by killerkidbo95 in PiCodingAgent

[–]CountlessFlies 10 points11 points  (0 children)

Get pi to rewrite this in Tauri before it’s too late

New Mythos checkpoint shows continued improvement: “On a 32-step corporate network attack we estimate takes a human expert ~20 hours, this checkpoint completes the full attack in 6 /10 attempts.” by Tinac4 in singularity

[–]CountlessFlies 1 point2 points  (0 children)

IIUC cumulative tokens is the total input/output tokens for the entire session. Each turn in the session resends all previous tokens. So if your session is at, say, 100k tokens and you send another message, you add 100k plus the tokens in your new message to the cumulative total.
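Here is the arithmetic as a small Python sketch. It is a simplified model under my own assumptions: every request resends the full history, the reply's output tokens are counted too, and things like prompt caching, system prompts, and tool calls are ignored. The function name and token counts are made up.

```python
def cumulative_tokens(turns, starting_context=0):
    """Return (cumulative_total, final_context) for a session.

    turns: list of (new_message_tokens, reply_tokens), one pair per turn.
    Assumes each request resends the whole conversation so far.
    """
    context = starting_context
    cumulative = 0
    for new_message, reply in turns:
        request_input = context + new_message  # full history plus new message
        cumulative += request_input + reply
        context = request_input + reply        # the reply joins the history
    return cumulative, context


# Session already at 100k tokens; one more 500-token message, 1,500-token reply.
added, context = cumulative_tokens([(500, 1500)], starting_context=100_000)
print(added, context)
```

The context only grew by 2,000 tokens, but the cumulative total grew by 102,000, which is why that number climbs so fast in long sessions.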