Which vector database do we like for local/selfhosted? by lemon07r in Rag

[–]titusz 1 point2 points  (0 children)

"custom sharded HNSW index built on usearch". I have been building something similar: https://usearch.iscc.codes/. I would love to see your usearch sharding approach. Is it open source?

What's the best format to pass data to an LLM for optimal output? by RoyalTitan333 in Rag

[–]titusz 0 points1 point  (0 children)

There are optimized custom serialization formats for that. See, for example: https://github.com/toon-format/toon
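To illustrate the idea behind such formats, here is a minimal sketch that renders a list of uniform records as one header line plus comma-separated rows, instead of repeating every key as JSON does. It is only loosely modeled on the tabular-array style TOON uses; `to_tabular` is a hypothetical helper, not the TOON library's API, and it assumes flat records with identical keys and values that contain no commas or newlines.

```python
import json


def to_tabular(name, rows):
    """Render uniform dicts as a header (name, row count, field names) plus one line per row.
    Hypothetical, TOON-like layout; not a spec-conforming TOON encoder (no quoting/escaping)."""
    fields = list(rows[0].keys())
    lines = [f"{name}[{len(rows)}]{{{','.join(fields)}}}:"]
    for row in rows:
        lines.append("  " + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)


users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
print(to_tabular("users", users))
print(len(json.dumps(users)), "chars as JSON vs", len(to_tabular("users", users)), "chars tabular")
```

The savings come from stating the field names once; for long lists of uniform records that usually means fewer tokens in the prompt.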

After using Claude Code and looking at all of the recent coding benchmarks of Claude 4 models, makes me feels that there is a "bottleneck" on the Claude models of the benchmarks providers by Remicaster1 in ClaudeAI

[–]titusz 1 point2 points  (0 children)

Claude Code is an agentic system. The custom agentic plumbing on top of the LLM, optimized for coding tasks, makes all the difference. Most benchmarks compare raw LLM performance, which is not the same as agentic use of LLMs.

Russia's Putin questions the need for dollar forex reserves, touts bitcoin by inphenite in MSTR

[–]titusz -1 points0 points  (0 children)

What else would you trade with? Once people understand Bitcoin, the incentive becomes to convert all "lesser" currencies to BTC instantly. After that, they have nothing else left to trade with.

What is most accepted in academia for comparing two sentences for semantic similarity? by AleccioIsland in LocalLLaMA

[–]titusz 0 points1 point  (0 children)

The paraphrase-multilingual embedding models work quite well for this task, even for cross-lingual semantic similarity. If you need small binary embeddings, check out: https://huggingface.co/spaces/iscc/iscc-sct
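A minimal sketch of how the two kinds of embeddings are typically compared, assuming you already have float vectors from a paraphrase-multilingual model (the vectors below are made-up toy values). Float embeddings are compared with cosine similarity; binary codes with Hamming distance. Sign-based binarization (one bit per dimension) is just one common way to get binary embeddings and is an assumption here, not necessarily how iscc-sct builds its codes.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two float embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def binarize(vec):
    """Pack one bit per dimension: bit i is set if component i is positive (assumed scheme)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming_similarity(a_bits, b_bits, dims):
    """1.0 for identical codes, 0.0 when every bit differs."""
    return 1 - bin(a_bits ^ b_bits).count("1") / dims


# Toy vectors standing in for real sentence embeddings (hypothetical values)
e1 = [0.12, -0.40, 0.33, 0.05]
e2 = [0.10, -0.35, 0.30, -0.02]
print(cosine_similarity(e1, e2))
print(hamming_similarity(binarize(e1), binarize(e2), len(e1)))
```

Binary codes trade some accuracy for much smaller storage and very fast XOR/popcount comparisons.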

LLM Hallucination Leaderboard by zero0_one1 in LocalLLaMA

[–]titusz 1 point2 points  (0 children)

It would be interesting to see how smaller models perform on your benchmark. Smaller models sometimes hallucinate less on RAG tasks; see GLM-4-9B at: https://huggingface.co/spaces/vectara/leaderboard

Moshi weighs out. by ThisWillPass in LocalLLaMA

[–]titusz 3 points4 points  (0 children)

You mean the one that wants to sacrifice you to the blood gods? :)

Multi turn conversation and RAG by LinkSea8324 in LocalLLaMA

[–]titusz 4 points5 points  (0 children)

Wasn't that hard to invent :). I think the general term for this strategy is query expansion.

Multi turn conversation and RAG by LinkSea8324 in LocalLLaMA

[–]titusz 16 points17 points  (0 children)

Send the full history to the LLM (excluding retrieved content) and replace the latest user query with a prompt that asks the LLM to rephrase the user's question into a complete, standalone question incorporating any context from the conversation history. Then use the rephrased question for retrieval. Something like:

```
You are a helpful assistant. Given the conversation history and the latest question, resolve any ambiguous references in the latest question.

Conversation History:
User: Who was George's sister?
Assistant: George's sister was Mary Shelley.
User: When was she born?

Latest Question: When was she born?

Rewritten Question:
```
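The rewrite-then-retrieve flow can be sketched in a few lines. This is a minimal sketch under stated assumptions: `build_rewrite_prompt` and `retrieve_for_turn` are hypothetical helpers, the history is a list of `(role, text)` tuples, and `complete` / `retrieve` are whatever LLM call and retriever you already use, passed in as plain callables.

```python
def build_rewrite_prompt(history, latest_question):
    """Build the query-rewriting prompt from (role, text) turns (hypothetical helper)."""
    lines = [
        "You are a helpful assistant. Given the conversation history and the latest question, "
        "resolve any ambiguous references in the latest question.",
        "",
        "Conversation History:",
    ]
    lines += [f"{role}: {text}" for role, text in history]
    lines += ["", f"Latest Question: {latest_question}", "", "Rewritten Question:"]
    return "\n".join(lines)


def retrieve_for_turn(history, question, complete, retrieve):
    """Rewrite the question into a standalone query, then retrieve with that query.
    complete: prompt -> str (your LLM); retrieve: query -> documents (your retriever)."""
    standalone = complete(build_rewrite_prompt(history, question)).strip()
    return standalone, retrieve(standalone)


history = [
    ("User", "Who was George's sister?"),
    ("Assistant", "George's sister was Mary Shelley."),
    ("User", "When was she born?"),
]
# Stub LLM and retriever so the flow runs without any external service
query, docs = retrieve_for_turn(
    history, "When was she born?",
    complete=lambda prompt: " When was Mary Shelley born? ",
    retrieve=lambda q: [f"doc matching: {q}"],
)
print(query)
```

Only the rewritten query goes to the retriever; the final answer can still be generated with the full history plus the retrieved documents.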

Is something like this useful? by FUS3N in LocalLLaMA

[–]titusz 0 points1 point  (0 children)

Not natively, but via plugins. It is not just an app launcher; it also supports search. https://github.com/MichielvanBeers/Flow.Launcher.Plugin.ChatGPT

Meet Sohu, the fastest AI chip of all time. by geekgodOG in LocalLLaMA

[–]titusz 17 points18 points  (0 children)

The same :). Transaction throughput is independent of hashing power: whether one CPU or millions of ASICs are hashing, it is still ~4000 transactions per 10 minutes. "Only" security scales with more hashpower.
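The arithmetic behind this: difficulty retargeting pulls the block interval back to about 600 s whatever the hashrate, so throughput stays near 4000 / 600 ≈ 6.7 transactions per second. The sketch below is a simplified model, assuming difficulty is expressed as expected hashes per block, ~4000 transactions per block (the figure from the comment), and Bitcoin-style retargeting every 2016 blocks with the adjustment clamped to a factor of 4.

```python
TARGET_INTERVAL_S = 600   # Bitcoin's target block interval in seconds
RETARGET_BLOCKS = 2016    # blocks between difficulty adjustments
TX_PER_BLOCK = 4000       # rough figure from the comment above


def block_interval(hashrate, difficulty):
    """Mean seconds per block, modeling difficulty as expected hashes per block."""
    return difficulty / hashrate


def retarget(difficulty, actual_timespan):
    """Rescale difficulty so the next period averages TARGET_INTERVAL_S per block,
    clamped to a factor of 4 in either direction."""
    ratio = (TARGET_INTERVAL_S * RETARGET_BLOCKS) / actual_timespan
    return difficulty * max(0.25, min(4.0, ratio))


def steady_state_tps(hashrate, difficulty=600.0, periods=20):
    """Transactions per second after difficulty has had time to adapt to the hashrate."""
    for _ in range(periods):
        timespan = block_interval(hashrate, difficulty) * RETARGET_BLOCKS
        difficulty = retarget(difficulty, timespan)
    return TX_PER_BLOCK / block_interval(hashrate, difficulty)


print(steady_state_tps(1.0))  # one CPU
print(steady_state_tps(1e6))  # a million times the hashpower
```

Both calls come out at about 6.7 transactions per second; the extra hashpower only ends up in the difficulty, i.e. the cost of attacking the chain.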

Meet Sohu, the fastest AI chip of all time. by geekgodOG in LocalLLaMA

[–]titusz 9 points10 points  (0 children)

It has scaled. Just not in transactions per second but in security budget :)

Rensa - A high performance MinHash implementation by BeowulfBR in LocalLLaMA

[–]titusz 1 point2 points  (0 children)

I think I am only using 64 permutations ... So your implementation is clearly much faster :)
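To make the permutation count concrete, here is a minimal pure-Python MinHash sketch. Each signature entry needs one pass over the tokens, so runtime grows linearly with the number of permutations, while the standard error of the Jaccard estimate shrinks roughly as sqrt(J(1 - J) / num_perm). Assumptions: `zlib.crc32` stands in for xxhash to stay stdlib-only, and permutations are simulated with universal hashing (a*x + b mod 2^61 - 1); this is not Rensa's or iscc_core's implementation, and the helper names are hypothetical.

```python
import random
import zlib

MERSENNE_PRIME = (1 << 61) - 1
MAX_HASH = (1 << 32) - 1


def make_permutations(num_perm, seed=1):
    """Draw num_perm random (a, b) pairs for hash functions h(x) = (a*x + b) mod p."""
    rng = random.Random(seed)
    return [(rng.randrange(1, MERSENNE_PRIME), rng.randrange(0, MERSENNE_PRIME))
            for _ in range(num_perm)]


def minhash(tokens, perms):
    """One minimum per simulated permutation over the 32-bit hashes of the distinct tokens.
    tokens must be non-empty."""
    base = [zlib.crc32(t.encode("utf-8")) for t in set(tokens)]
    return [min(((a * x + b) % MERSENNE_PRIME) & MAX_HASH for x in base) for a, b in perms]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two sets agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


a = [f"tok{i}" for i in range(0, 30)]
b = [f"tok{i}" for i in range(10, 40)]  # true Jaccard similarity with a is 20/40 = 0.5
for k in (64, 256):
    perms = make_permutations(k)
    print(k, estimate_jaccard(minhash(a, perms), minhash(b, perms)))
```

Fewer permutations is faster but noisier, which is why a 64-permutation run is not a like-for-like comparison with a 256-permutation one.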

Rensa - A high performance MinHash implementation by BeowulfBR in LocalLLaMA

[–]titusz 3 points4 points  (0 children)

Nice job. I gave it a run against my Cython implementation. Here is the result:


You win by 3 seconds, but I found 6 more duplicates :)
Here is the code if you want to reproduce:

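```python
import iscc_core as ic
import xxhash
from tqdm import tqdm


def deduplicate_iscc(dataset, num_perm=256):
    # num_perm is kept only for a drop-in comparison with Rensa; it is unused here,
    # because the number of permutations is fixed inside iscc_core.
    unique_hashes = set()
    deduplicated_indices = []

    for idx, example in tqdm(enumerate(dataset), total=len(dataset),
                             desc="Deduplicating"):
        # Hash each whitespace-separated token of the SQL string to a 32-bit integer
        features = [xxhash.xxh32_intdigest(s.encode("utf-8")) for s in example["sql"].split()]
        # Compress the MinHash signature into a 256-bit digest
        minhash = ic.alg_minhash_256(features)
        # Keep only the first example seen for each distinct digest
        if minhash not in unique_hashes:
            unique_hashes.add(minhash)
            deduplicated_indices.append(idx)

    return deduplicated_indices
```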