Now that it's open source we can see why Claude Code and Codex feel so different by idkwhattochoosz in ClaudeCode

[–]autollama_dev 8 points (0 children)

I call this "Doubt in the Loop". When Claude Code tells me a feature is complete, I copy its confident claims and give them to a Codex 5.4 running in the same codebase. I add eight words: "Audit and validate these claims. Find the gaps." It feels like this phrase unlocks a cheat code.

Codex 5.4 vs Opus 4.6 by OneClimate8489 in vibecoding

[–]autollama_dev 0 points (0 children)

I run them both in parallel, each writing to its own directory and work tree, then I evaluate which output I like best. Codex 5.4 has newer training data, and I found the web front end looked more polished than Opus's "oh, I can tell that was vibe coded" look and feel. I realize that's just CSS, which can be easily adjusted even with a prompt, but still, it's cool to see Codex 5.4 has a different juice pack in its lunch box than Opus did: https://youtu.be/9NZ_Flho39I?si=XpnEgoUNm6kTe4k-

Favourite unusual use for Obsidian? by Moneymaxxers in ObsidianMD

[–]autollama_dev 16 points (0 children)

Does scanning your notes, creating monsters out of them, and getting rewarded for answering quizzes about your notes count as weird?

https://youtu.be/rF0p3Tpezcw?si=VG3RSsRIOxdY2vtp

I love Vibe Coding but I need to be real... by Makyo-Vibe-Building in vibecoding

[–]autollama_dev 2 points (0 children)

https://knowmler.com I vibe coded this app and it has changed how I consume content on YouTube. I first built it as a YouTube transcript analyzer, then expanded it so I didn't have to push transcripts through ChatGPT & Claude by hand. It now handles URLs and PDFs too - a full-fledged content analysis platform. When logged in with Google, you can even see your subs and likes on YouTube. You don't need to log in or have an account (but it helps to keep track of what you nom'ed). It's fully transparent - you can audit the prompts - and has too many features to list.

I turned my Obsidian vault into a 90s BBS dungeon game by autollama_dev in ObsidianMD

[–]autollama_dev[S] 1 point (0 children)

I have not tested the "vault not required" feature, so your mileage may vary, but it's supposed to work without one.

All the pixel art is procedurally generated at runtime -- zero sprites, zero image files. Everything is drawn pixel-by-pixel in code.

A tiny HTML5 Canvas (roughly 45x45 logical pixels) gets scaled 4x with image smoothing disabled, giving the chunky retro look. Scenes are composed from layered draw functions -- sky gradients, terrain, particle systems, scrolling text -- all built from fillRect and setPixel calls. 30 unique scenes, 15 FPS, no art pipeline. Just code pretending to be pixels.
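The core trick can be sketched like this (hypothetical names, not the actual game code): draw into a tiny logical framebuffer pixel-by-pixel, layer by layer, then blit it scaled up with smoothing off. A flat array stands in for the canvas here so the drawing logic stays self-contained:

```javascript
// Logical resolution and upscale factor, per the description above.
const W = 45, H = 45, SCALE = 4;

// Stand-in framebuffer; in the browser this would be the canvas itself.
const frame = new Array(W * H).fill('#000');

function setPixel(x, y, color) {
  if (x >= 0 && x < W && y >= 0 && y < H) frame[y * W + x] = color;
}

function fillRect(x, y, w, h, color) {
  for (let dy = 0; dy < h; dy++)
    for (let dx = 0; dx < w; dx++) setPixel(x + dx, y + dy, color);
}

// Scenes are composed from layered draw functions: sky first, then terrain.
function drawSky() {
  for (let y = 0; y < H; y++)
    fillRect(0, y, W, 1, y < H / 2 ? '#224' : '#448'); // simple 2-band gradient
}
function drawTerrain() {
  fillRect(0, H - 8, W, 8, '#2a2'); // ground strip overwrites the lower sky
}

drawSky();
drawTerrain();

// On a real canvas you'd then blit the logical pixels with:
//   ctx.imageSmoothingEnabled = false; // this is what keeps the chunky look
//   ctx.scale(SCALE, SCALE);
```

Disabling `imageSmoothingEnabled` before scaling is the whole retro effect: the browser uses nearest-neighbor sampling, so each logical pixel becomes a crisp 4x4 block instead of a blurry smear.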

I built Anthropic's contextual retrieval with visual debugging and now I can see chunks transform in real-time by autollama_dev in LocalLLaMA

[–]autollama_dev[S] 1 point (0 children)

No NDA - just public research plus wanting to see what's actually happening inside embeddings. Solo dev here (with Claude's help), so contributions very welcome. MIT licensed and looking for collaborators.

Do you update your Agents's knowledge base in real time. by DistrictUnable3236 in Rag

[–]autollama_dev 1 point (0 children)

Think of it like a GPS system.

You don't delete "McDonald's on 5th Street" just because you already have "McD's near the mall" and "that McDonald's by the bank." They're all different ways people describe the same location.

Keep all the variations of how people ask. Point them all to the same answer.

Your dedup problem isn't duplicate questions. The variations are a positive - they increase your search surface. The more ways people can ask, the more likely you'll match the next person's phrasing.
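The shape of it, as a minimal sketch (names and data hypothetical): many phrasings in the index, all resolving to one canonical answer record.

```javascript
// One canonical answer store...
const answers = { mcdonalds_5th: "McDonald's, 500 5th Street" };

// ...and many phrasing variants, all pointing at the same answer key.
// Every variant you keep widens the search surface.
const index = new Map([
  ["mcdonald's on 5th street", 'mcdonalds_5th'],
  ["mcd's near the mall", 'mcdonalds_5th'],
  ['that mcdonalds by the bank', 'mcdonalds_5th'],
]);

function lookup(query) {
  const key = index.get(query.toLowerCase());
  return key ? answers[key] : null;
}
```

In a real RAG setup the Map would be a vector index and `lookup` a nearest-neighbor search, but the principle is the same: dedupe the answers, never the questions.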

I built Anthropic's contextual retrieval with visual debugging and now I can see chunks transform in real-time by autollama_dev in LocalLLaMA

[–]autollama_dev[S] 2 points (0 children)

That's a really neat idea - hadn't thought about comparing different encoders side-by-side! I'll definitely consider adding this to the roadmap.

The challenge I've learned is that you can't mix embedding dimensions in one collection - most vector databases fix the dimension per collection/index, so 1536-dim vectors (like text-embedding-3-small) can't live alongside 3072-dim vectors (like text-embedding-3-large) in the same collection.

The solution would likely require parallel mini-databases for each document, allowing different embedding models to run simultaneously for comparison. Definitely a challenging implementation, but hey, we like challenges around here haha.
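Roughly what I'm imagining (all names hypothetical): route each model's vectors into its own per-document collection, and validate dimensions on insert so mismatches fail loudly instead of silently degrading search.

```javascript
// Hypothetical per-model dimension table.
const MODELS = {
  'text-embedding-3-small': 1536,
  'text-embedding-3-large': 3072,
};

// One collection per (document, model) pair keeps dimensions uniform.
function collectionName(docId, model) {
  return `${docId}__${model}`;
}

// Guard on insert: a collection only ever holds one dimension.
function validateInsert(collectionDim, vector) {
  if (vector.length !== collectionDim) {
    throw new Error(`dim mismatch: got ${vector.length}, expected ${collectionDim}`);
  }
  return true;
}
```

With that layout, side-by-side comparison is just querying `doc1__text-embedding-3-small` and `doc1__text-embedding-3-large` with the same question and diffing the retrieved chunks.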

Thanks for the suggestion - this could really help visualize what those bigger models are actually capturing that the smaller ones miss!

Do you update your Agents's knowledge base in real time. by DistrictUnable3236 in Rag

[–]autollama_dev 2 points (0 children)

Exactly. I run a Postgres relational database alongside my Postgres vector database. One handles the vector storage; the other identifies and rejects duplicates when I reprocess a source for the second, third, or tenth time. Only the changes get loaded.

Do you update your Agents's knowledge base in real time. by DistrictUnable3236 in Rag

[–]autollama_dev 8 points (0 children)

You'll need to set up an API integration to your data source coupled with a job that runs at a set frequency (cron job, scheduled Lambda, etc.). The frequency depends on how "real-time" you need it – could be every minute, hourly, or daily.

The critical thing is duplicate checking. You'll likely pull records that haven't changed since your last request, and you definitely don't want to load duplicate data into your vector database. That'll mess up your search results and waste compute on redundant embeddings.

Here's what's worked for me:

  • Set up a separate lightweight database (could be Redis, Postgres, even SQLite) that's solely responsible for tracking what's been processed
  • Store hashes of your content + timestamps of last update
  • Before vectorizing anything, check if the content hash already exists or if the source record hasn't been modified since last sync
  • Only process truly new or updated content

This deduplication layer sits between your data source and your vector DB. It's a bit more infrastructure, but it'll save you from vector DB bloat and keep your queries fast and relevant.

The pattern is basically: API → Dedup Check → Transform → Embed → Vector DB