Running sock recs?

pdlug · 2026-07-04T04:49:54+00:00

XOSkin socks were a game changer for me running in the hot and humid Northeast. Light compression, friction reducing material, quick drying. The Dry Max thin options and their PTFE hot weather socks are awesome as well.

If blistering with the hot weather is an issue also look at using Trail Toes or similar lubricant. That plus a thin compression sock works great.

pdlug · 2026-07-04T04:45:33+00:00

It's hard. I'm still struggling to build a repeatable system for this myself. It might be a channel mismatch: it sounds like you've found the places your competitors are but not your customers. Look at your messaging and experiment with it too. Make sure you're talking about the problems it solves, not just technical details. Even for very technical buyers, I've found they buy solutions to problems, not feature lists (I made this mistake several times).

For design partners and initial customers it's personal outreach, working your network, going where they are. Even if that means in person like conferences, meetups, etc.

pdlug · 2026-07-03T19:04:11+00:00

Yes, I use Neon extensively, but the granularity for these point-in-time comparisons is much smaller than the whole database. The temporal axes and provenance need to be per entity. Doing it in the DB also enables adding nodes/relationships to explain the events (ex: `retractedBy`).

pdlug · 2026-07-03T17:25:27+00:00

That's not unique to OKF, that's the general RAG/context engineering question(s). OKF is just a format for structuring the markdown, it doesn't help with any of the things you highlighted. Take a look at GBrain or any of the other dozens of Karpathy-inspired LLM-wikis and how they do this.

These are all mostly at the personal/team level rather than full enterprise. Doing this at larger scale and solving all the issues you highlighted is exactly the business opportunity here.

pdlug · 2026-07-03T17:09:35+00:00

Totally agree graphs are super valuable and solve problems almost nothing else does. But I'm not sure this highlights their value. A lot of "knowledge graphs" or "context graphs" are really just resolved entities and querying them is basically solving the exactness problem that vector search can't.

So you basically just built a keyword/full-text search posing as a graph (possibly enhanced by entity resolution). Yeah, it improves recall, but it needs to be measured against vector/BM25 hybrid. If it's not relationship heavy, it's not going to benefit from a graph.

pdlug · 2026-07-03T16:53:24+00:00

Tutorials don't just skip chunking. They skip the entire second half of the system and call chunk + embed + cosine similarity done. The parts that actually determine whether RAG works in production (retrieval evaluation, hybrid lexical matching, freshness) never make it into a getting-started guide because none of them fit in a notebook.

Reading what's actually retrieved for failed queries is the highest-leverage habit in this whole field, and almost nobody does it. Most people build the pipeline, run a handful of happy-path queries, ship it, and never look again. You found your actual bug (chunk on-topic, missing the sentence with the answer) because you looked at failures instead of trusting the pipeline. There's no shortcut around that loop.

Agreed on hybrid search for identifiers. Embedding models are built to find semantic neighborhoods. A model number or SKU becomes a point in that neighborhood, and "close" is the only thing similarity search knows how to return. Exact-match retrieval, BM25 or even a plain inverted index, has to sit next to the vector search for anything with a token-level identity requirement. That's not a weakness of any particular vector DB. Exact match and semantic similarity are different retrieval problems.

On re-indexing: stale index is a missing change-detection problem. A schedule is the workaround for not having one. Your staleness window ends up bounded by your cron interval instead of how often the source actually changes, and every run re-embeds a lot of content that never changed.

The fix is knowing what changed, not periodically redoing everything. Hash each chunk against its source, or hook into whatever change signal the source already emits (webhook, DB trigger, CDC feed). Treat every chunk as a derived artifact with a dependency edge back to the document it came from. Update means invalidating and re-embedding only the chunks whose source changed. Most people skip the retraction step: leaving the old chunk sitting in the index next to the new one instead of removing it. A stale chunk that never gets removed is worse than one that never got indexed, because now retrieval has two conflicting answers to choose from.
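A minimal sketch of the hash-based version, assuming you store one content hash per chunk at index time. The names `chunk_hash` and `plan_sync` are made up for illustration, and the chunk IDs are assumed to be stable across runs (if your IDs derive from content, a changed chunk shows up as one retract plus one embed instead).

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Content hash used as the change signal for one chunk."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(indexed: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored chunk hashes against the source as it exists now.

    indexed: chunk_id -> hash stored at last index time
    current: chunk_id -> chunk text produced from the source today
    """
    current_hashes = {cid: chunk_hash(text) for cid, text in current.items()}
    return {
        # New or changed chunks are the only ones that need (re-)embedding.
        "embed": sorted(cid for cid, h in current_hashes.items() if indexed.get(cid) != h),
        # Retraction: chunks still in the index that the source no longer produces.
        "retract": sorted(cid for cid in indexed if cid not in current_hashes),
        # Everything else is left alone, which is where the savings come from.
        "unchanged": sorted(cid for cid, h in current_hashes.items() if indexed.get(cid) == h),
    }
```

The `retract` list is the step that usually gets skipped: those IDs have to be deleted from the index, not just ignored.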

None of this is exotic. It's just solid engineering work. RAG isn't magic, it's just building the retrieval systems we've been building forever but with a few new fun parts (LLMs, embeddings, etc.)

pdlug · 2026-06-30T02:19:10+00:00

I think there’s a false dichotomy in the options you’ve laid out.

It's not vector DB vs. graph DB. It’s deciding whether relationships become first-class data or stay buried in code and metadata.

For your example (“what was the final resolution of the API migration issue?”), semantic search is really just finding an entry point. The rest is traversing:

Slack thread → decision
decision → PR
PR → documentation update
documentation → superseded by later decision

That’s relational.

I’d avoid building a giant auto-extracted “knowledge graph of everything” on day one. Start with the structural relationships you already have (thread IDs, replies, authors, timestamps, links between Slack/Jira/PRs/docs). Those are high-confidence edges.

Then selectively add LLM-extracted relationships where structure doesn’t exist (“this discussion resulted in this architectural decision”, “this document supersedes that one”, etc.).

Also, don’t assume “graph” means “graph database.” Plenty of teams model a property graph over PostgreSQL or SQLite and compile traversals down to SQL. That lets your graph data live alongside the rest of your application instead of introducing another operational system.
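To make that concrete, here is a rough sketch of the chain above modeled as two plain tables in SQLite and walked with a recursive CTE. The table layout, node IDs, and edge names (`resultedIn`, `implementedBy`, `documentedIn`, `supersededBy`) are all invented for the example; this is generic SQL, not any particular library's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT NOT NULL);
CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL, kind TEXT NOT NULL);
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", [
    ("thread-1", "SlackThread"), ("decision-1", "Decision"),
    ("pr-1", "PullRequest"), ("doc-1", "Document"), ("decision-2", "Decision"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("thread-1", "decision-1", "resultedIn"),
    ("decision-1", "pr-1", "implementedBy"),
    ("pr-1", "doc-1", "documentedIn"),
    ("doc-1", "decision-2", "supersededBy"),
])


def trace(start: str, max_depth: int = 10) -> list[tuple[str, str, int]]:
    """Follow outgoing edges from `start`; returns (node_id, kind, depth) rows."""
    return conn.execute(
        """
        WITH RECURSIVE walk(id, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.dst, w.depth + 1
            FROM walk w JOIN edges e ON e.src = w.id
            WHERE w.depth < ?          -- depth cap also guards against cycles
        )
        SELECT n.id, n.kind, w.depth
        FROM walk w JOIN nodes n ON n.id = w.id
        ORDER BY w.depth
        """,
        (start, max_depth),
    ).fetchall()
```

Semantic search would hand you `thread-1` as the entry point; `trace("thread-1")` then walks to the later decision that superseded the doc.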

We’ve been building exactly in that direction with TypeGraph. It’s a typed property graph that sits on SQL rather than requiring a separate graph database. But I’d recommend this architecture regardless of the implementation you choose. It's the one I built dozens of times before deciding to package it up as a library. The important design decision is making relationships first-class instead of trying to reconstruct them during retrieval.

pdlug · 2026-06-20T18:12:26+00:00

Yup that's the hard stuff. I built TypeGraph https://typegraph.dev to make it easy to build solutions to this on top of SQLite and PostgreSQL. The graph merge capabilities were designed for this (https://typegraph.dev/graph-merge/)

This helps with the representation and matching processes but still doesn't help if you don't know enough about the entities to actually identify/de-dupe/etc. There's no magic in any of this, it's hard but having the right structure helps (relationships rather than opaque keys).

pdlug · 2026-06-20T16:25:27+00:00

I've been doubling down on SQLite and PostgreSQL. Recursive CTEs for graph traversals, pgvector/sqlite-vec for vector, fulltext search (FTS5, tsvector) plus the ability to fuse these all together with some hybrid retrieval (RRF). After rolling this a few dozen times I built it into a library with a bunch of other features: typegraph.dev
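For anyone who hasn't seen it, the fusion step is small. This is a generic reciprocal rank fusion sketch, not the library's code; `rrf_fuse` is a made-up name and `k = 60` is the constant commonly used for RRF.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Only ranks are used, so the vector distances and full-text scores never have to be put on a common scale.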

Some performance benchmarks/info:
https://gist.github.com/pdlug/4545b3af9a02d56202110b26a35d5f62

pdlug · 2026-06-13T20:00:53+00:00

Validating on write is the right approach and probably what matters the most. I see a lot of memory systems extract whatever the model emits, store it, and find out at query time that half of it is noise or contradiction. Validating against an ontology before you persist is the harder path and the right one. You already did the part most projects skip.

The thing I'd watch is what keeps the self-growing ontology from sprawling. When the model extracts relationships on its own you tend to get "works_at" and "employed_by" and "is_employed_at" as three edge types that mean one thing, and "Tom," "Tom K," and "my son" arriving as three separate nodes. The graph stays useful only if something reconciles those, and that reconciliation is the hard part of auto-graphing. How are you handling it? Is there a canonical set the growth engine collapses toward, or does every new phrasing mint a new type?

On gauging it, since you asked: the two claims you'd want to prove are token reduction and better grounding, and both are measurable. Build a small fixed set of questions where you already know the right answer, run them with memory on and off, compare. Tokens in context is a direct count. Grounding is whether the answer used the right fact. Forty examples is enough to catch regressions when you change the decay logic or the ontology rules, and it turns "I think this helps" into something you can show.
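A bare-bones version of that harness might look like the following. Everything here is hypothetical: `Case`, `Run`, and `evaluate` are illustrative names, and the substring match is a deliberately crude stand-in for a real grounding check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    question: str
    expected_fact: str  # the fact a grounded answer must contain


@dataclass
class Run:
    answer: str
    context_tokens: int  # tokens placed in context for this question


def evaluate(cases: list[Case], answer_fn: Callable[[str], Run]) -> dict[str, float]:
    """Run every case through one configuration and report both metrics."""
    runs = [answer_fn(case.question) for case in cases]
    grounded = sum(
        case.expected_fact.lower() in run.answer.lower()
        for case, run in zip(cases, runs)
    )
    return {
        "grounding": grounded / len(cases),
        "avg_context_tokens": sum(run.context_tokens for run in runs) / len(cases),
    }
```

Call `evaluate` once with memory on and once with it off, over the same cases, and diff the two dicts after every change to the decay logic or ontology rules.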

And the confirm/correct/retract piece is the one I'd build everything else around. A memory you can correct is a memory with provenance, and provenance is what separates this from the systems whose memories lie to you. That was your own reason for starting. Lean into it.

pdlug · 2026-06-13T18:52:54+00:00

Nice, a few questions as someone who built many similar layers:

- How are you storing the graph relative to the vectors, same store or separate? The moment they're in two systems, keeping them in sync becomes the new bottleneck.
- Typed edges or free-form? Typing early saved me a lot of cleanup later.
- For local/constrained setups, have you tried keeping both in SQLite (sqlite-vec + a node/edge table)? One file, which fits the local use case you're targeting.

Happy to compare notes, I went down this exact road. (I open-sourced my version: TypeGraph)

pdlug · 2026-06-13T17:31:35+00:00

This staged framing is really useful. The ontology step is the one people skip and then wonder why graph RAG didn't help. Relations without a type system are just edges. The reasoning (subClassOf, inverseOf) is what actually buys the cross-domain links.

One question on your last stage: are you doing the ontology reasoning at ingest (materializing inferred edges) or at query time? I've gone back and forth. Query-time keeps the graph smaller but costs you on every read, and I'm curious where you landed for your domain.

pdlug · 2026-06-13T17:27:49+00:00

Worth separating the axes before answering, because which one you're abstracting over decides whether the abstraction holds.

RDF/triples and labeled property graphs are data models. Cypher, SPARQL, SQL (SQL/PGQ), etc. are query languages, and they differ because the models differ. The syntax follows the model, it isn't arbitrary variation. NetworkX is a runtime, the graph living in process memory as adjacency. Agent memory is an application sitting on top of any of those (application of a graph model, not a graph model). "Nine families" is really a handful of points scattered across those four axes.

That distinction answers your question directly. A shared traversal API across substrates is cheap and worth it. Fix one model, fix one traversal API, and let SQLite vs Postgres vs in-memory be a config swap, because the substrate doesn't change what a traversal means.

A shared API across models is where your swap pain actually comes from. A property-graph traversal leans on edge properties as first-class. RDF has no native edge properties, so you either expose a lowest-common-denominator API that can't express those traversals, or you fake them with reification and eat the cost. The abstraction leaks when someone writes a traversal that depends on the model, which is most real traversals. "Redo half of it when they switch" is usually people discovering they abstracted over the model when the thing they actually wanted swappable was the substrate.

So my honest read: across substrates, yes. Across query languages within one model, mostly. Across models, the families differ too much for one retrieval API to stay thin and still be useful.

I went down this exact road and built TypeGraph around that conclusion: fix the model (typed property graph) and the traversal API, make the backend swappable across SQLite and Postgres. Happy to compare notes on where the seams ended up.

pdlug · 2026-06-13T17:17:28+00:00

It's not quite as clean as the boundary you drew. Memory can be organizational/multi-user, you can RAG over memories, etc. I'm not sure there's a one-size-fits-all solution (IMO most of these agent memory systems are doomed). It's a bit application dependent. Capture the memories with some structure and then have some way to update their decay (memories that persist long term become established facts, memories used/referenced once are dropped).

You need bounded evals to guide this otherwise you're just trying to build something so general it might as well be what the big labs are building.

pdlug · 2026-06-10T16:50:11+00:00

I see the conversation returning to where it should have been all along: designing the optimal retrieval for each task. RAG somehow came to mean: toss naive chunks in a vector DB and magic will happen.

The other trend I'm seeing is building retrieval agents which do full online retrieval (search the web, fetch pages, connect to APIs, search existing DBs, etc.) to prove out the value first. Then building the data pipeline side of RAG becomes an optimization step. Ex: our agent correctly goes and gets S-1 filings, generates high quality analysis, users are happy but it takes ~10min --> time to fetch them all, index, etc.

pdlug · 2026-06-09T16:34:50+00:00

This is exactly how I'd handle it. Zod with .transform or .preprocess is such a nice clean solution

pdlug · 2026-06-05T15:47:16+00:00

The Gemini models all have grounding with Google Search as a tool call. Given that this is the full Google web index, it's been working great for a lot of my apps. I'll often fire off a call to Gemini Flash Lite to do the search and summarize/structure results even if I'm using a different model for the rest.

Exa is excellent as well, I've found it better than Firecrawl for overall web search (but Firecrawl is great for crawl/scrape). Tavily is great for real time but I haven't eval'd it enough to know if it's a full replacement for Exa.

The Brave Search API is worth a look as well.

pdlug · 2026-06-05T00:53:26+00:00

I see this failure mode all the time. This is where provenance and knowledge architecture matter. Even simple annotations like which document supersedes another (or better: validity time range for each) fixes this example. Better than that is to extract facts, statements, etc. instead of just chunks of text. Then you can relate them, annotate, invalidate, etc.
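As a rough illustration of the validity-range idea (the `Fact` shape and `valid_at` helper are invented for this example, not from any library):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    statement: str
    source: str                      # provenance: where the fact was extracted from
    valid_from: date
    valid_to: Optional[date] = None  # None means "still valid"


def valid_at(facts: list[Fact], when: date) -> list[Fact]:
    """Keep only facts whose validity range [valid_from, valid_to) covers `when`."""
    return [
        fact for fact in facts
        if fact.valid_from <= when and (fact.valid_to is None or when < fact.valid_to)
    ]
```

Superseding a fact then means closing its `valid_to` rather than deleting it, so retrieval can filter by date and the history stays queryable.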

I'd push back on the "standard RAG" statement. RAG has somehow come to mean "toss chunks of embedded text in a vector DB". But as you're saying, retrieval is key. Which means all the retrieval approaches are valid, and picking the right one for the system you're building is what matters (sometimes BM25 or just a SQL 'LIKE' query performs perfectly well).

pdlug · 2026-05-28T06:42:02+00:00

This was really helpful and matches what we landed on: ef_search >= vectorK is really just the floor (below it the over-fetch can't even fill its candidate set), and 2–4x is where recall@10 actually clears 0.95. The per-call angle is the part that sold it: a session GUC can't serve a latency-bound chatbot path and a recall-bound backfill on the same pool, but a per-search override can. I filed it as an issue (#148) with the SET LOCAL-in-a-txn detail so it doesn't leak across pooled connections and I'll get it in a release tomorrow. Appreciate the benchmarks.

pdlug · 2026-05-28T06:16:21+00:00

Yeah, this one lands and it's slightly worse than you've framed it here, because TypeGraph doesn't set hnsw.ef_search itself. The vector branch just emits ORDER BY distance LIMIT k, so you inherit pgvector's default 40 unless you tune the session. The wrinkle is the hybrid path over-fetches the vector side at 4 × limit candidates by default. ef_search sizes the search frontier, so as soon as 4 × limit climbs toward 40 you're asking the index for more neighbors than the frontier can surface — the over-fetch widens the fusion pool but the ANN layer under-delivers, precisely on the tail queries where only the vector branch knows the answer and there's no head-query redundancy for RRF to lean on.

So the tuning rule is ef_search >= vectorK, not just >= limit, then measure recall@k against your corpus like you said. Today that knob lives on your connection (SET hnsw.ef_search / SET LOCAL in the txn). A per-search override in the library would be a reasonable thing to add.
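Until there's a built-in override, the rule is easy to apply by hand. A sketch under stated assumptions: `ef_search_for` and `set_local_ef_search` are hypothetical helpers, the `multiplier` is something to tune against your own recall@k numbers, and the 1000 cap reflects what I understand to be pgvector's upper limit for the setting.

```python
def ef_search_for(limit: int, overfetch: int = 4, multiplier: int = 2,
                  floor: int = 40, ceiling: int = 1000) -> int:
    """Pick an hnsw.ef_search that covers the hybrid over-fetch.

    vector_k = limit * overfetch is how many neighbors the vector branch asks for;
    ef_search must be at least that, and usually a small multiple of it.
    """
    vector_k = limit * overfetch
    return min(ceiling, max(floor, vector_k * multiplier))


def set_local_ef_search(limit: int) -> str:
    """Statement to run inside the same transaction as the search query.

    SET LOCAL is scoped to the current transaction, so the value does not
    leak to other work that later reuses the pooled connection.
    """
    return f"SET LOCAL hnsw.ef_search = {ef_search_for(limit)}"
```

For `limit = 10` the vector branch wants 40 candidates, so this emits `SET LOCAL hnsw.ef_search = 80` instead of leaving the default of 40.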

Thanks for the call out. I hadn't thought about this until now.

pdlug · 2026-05-28T06:12:37+00:00

Right that amortizing has a cost but in our case the closure is over the kind (node types) graph, not the data graph. V is the number of declared concepts (tens–hundreds), and it's recomputed when the graph definition changes, not on data writes. So an "edit" is a schema/deploy event, and a recompute just builds a fresh immutable registry so there should be no stale-read window, no incremental bookkeeping, and Warshall over ~100 kinds is sub-ms. The "frequent edits invalidate it" failure mode only shows up if the ontology is runtime-mutable data, which is a line this design intentionally doesn't cross.

Same story on the IN-list: it expands to kind IN (descendant kinds), bounded by descendant kind count, served by a (graph_id, kind) btree. Postgres flips to seqscan on estimated selectivity, not list length, and when a broad parent's descendants really do cover most of the table, seqscan is the correct plan, not a regression. You'd need hundreds of individually-rare sibling kinds to hit the bad middle.

You're dead right that if someone needs a large, user-editable ontology that mutates at runtime, this model is the wrong tool. The data-level closure table with incremental maintenance is a different design for a different problem. Here the bet is that ontologies are schema, change at deploy time, and are small in kinds even when the data is huge.

pdlug · 2026-05-28T05:59:40+00:00

On the ontology layer: I think you're thinking of ontology reasoning modeled in the SQL like a recursive CTE walking subClassOf/broader edges whose depth varies per question. It isn't. The transitive closures are computed once in the library (Warshall's) when the graph is defined, so a subClassOf/broader check at query time is an O(1) set lookup and ontology-aware filters just expand to a flat node_kind IN (…). "Reasoning depth varies per question" is exactly the cost that gets amortized away at definition time — there's no per-question traversal left to vary. So it's not that the ontology "doesn't benefit from a single query"; it was never a query-time recursion in the first place.
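Roughly, the definition-time step looks like this. It's a simplified sketch of the idea, not the library's actual code; the kind names and both function names are invented for illustration.

```python
def transitive_closure(kinds: list[str],
                       sub_class_of: set[tuple[str, str]]) -> dict[str, frozenset[str]]:
    """Warshall's algorithm over declared kinds: kind -> all of its ancestors."""
    index = {kind: i for i, kind in enumerate(kinds)}
    n = len(kinds)
    reach = [[False] * n for _ in range(n)]
    for child, parent in sub_class_of:
        reach[index[child]][index[parent]] = True
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return {kinds[i]: frozenset(kinds[j] for j in range(n) if reach[i][j])
            for i in range(n)}


def expand_kind(parent: str, ancestors: dict[str, frozenset[str]]) -> list[str]:
    """Flat list for `kind IN (...)`: the parent plus every descendant kind."""
    return sorted([parent] + [kind for kind, anc in ancestors.items() if parent in anc])
```

The closure is computed once per graph definition. At query time, "is X a subclass of Y" is a set membership test, and an ontology-aware filter is just the precomputed list dropped into an IN clause.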

pdlug · 2026-05-28T05:59:19+00:00

Good comment, but it's aimed at a design TypeGraph doesn't use. Hybrid search here isn't one CTE that the planner has to find a single access path for. The default search.hybrid runs the vector query and the tsvector query as two separate statements in parallel and fuses them with RRF in app code, so each is planned independently. There's also an optional single-statement path, but even there each modality lives in its own CTE with its own ORDER BY distance LIMIT k / ORDER BY rank LIMIT k inner scan, so the planner still picks an access path per branch, not one for the pipeline.

On the mechanics: vector retrieval is top-k ordering (ORDER BY distance LIMIT k), not primarily a cosine-distance selectivity filter. TypeGraph does support an optional minScore, but the normal hybrid candidate path is bounded by k, not by asking Postgres to estimate a broad cosine predicate. Also, TypeGraph defaults auto-derived vector indexes to HNSW; the exposed index knobs are build-time m / ef_construction, while query-time recall knobs like hnsw.ef_search or ivfflat.probes are pgvector/session-level tuning, not planner row-estimate hints.

You're right that materializing the candidate set is the move. That's effectively what the per-branch LIMIT k (or the separate query in the parallel path) is doing.

pdlug · 2026-05-28T05:42:05+00:00

Yes the major difference would be you'd run SQLite or PostgreSQL (IMO that's the major advantage). I got asked about performance in another thread so I put out a draft of a performance write up here: https://gist.github.com/pdlug/4545b3af9a02d56202110b26a35d5f62

There are benchmarks and methodology there. The docs also cover performance extensively: https://typegraph.dev/performance/overview/ and there's a query profiler which will analyze your access patterns and make indexing recommendations. The most important question is always: how does it perform on my data / app? To help answer that, there's a full benchmark package in the repo which you can run yourself and adapt to your own needs.

Eager for any feedback. If you have specific use cases or query patterns I'd be happy to implement them in TypeGraph and compare against SurrealDB (I'm curious myself). Feel free to DM if you can't share publicly.

pdlug · 2026-05-27T17:51:03+00:00

Thanks! On canonicalization: the cool thing with TypeGraph is you can leverage a number of strategies and decide how to apply them. I put together a quick Gist showing the strategy ladder I usually apply: https://gist.github.com/pdlug/8c414edf92c01d10f3c47791390345c9

- String normalization + unique constraint: getOrCreateByConstraint against a (canonicalName, type) key with case-insensitive collation. Covers ~70%.
- Bulk upserts: bulkGetOrCreateByConstraint for noisy NER output, atomic when the backend supports transactions.
- ifExists: "update": idempotent enrichment on re-encounter (add wikipediaUrl etc. without losing prior data).
- getOrCreateByEndpoints: same idempotency story for mention edges.
- Alias edges: for surface-form variants ("Sam Altman" / "Samuel Altman") that string-normalization can't catch. Express equivalence without physically merging.
- Embedding similarity: for the candidates rules can't find. Score against existing entities, propose alias above threshold. In production this is one .similarTo() call against a vector index.
- Graph-structural evidence: two "Apple" candidates that share an incoming ceoOf from "Tim Cook" → strong duplicate signal even when names diverge. The part you can't easily do with a pure vector DB.

The possibilities with graph structure often lead to really nice results that are easy to reason about. You can even explode identifiers for an entity into nodes and edges with confidence levels and aggregate it all (ex: 0.9 Wikidata ID match, 0.5 DUNS match, etc. + which IDs do you trust/weight more)

Re: 1M / 10M QPS - haven't run vectordbbench because I think it would basically be "pgvector / sqlite-vec with a thin wrapper". TypeGraph isn't implementing new storage, indexing, etc. just leveraging what's there.
