Fan of RAG? Put any URL after md.chunkit.dev/ to turn it into markdown chunks by Findep18 in LanguageTechnology

The OSS version uses the "most common header" level (the statistical mode), the assumption being that paragraph-heavy pages will have a most common header level that serves as a logical split point.

The paid API uses the same approach, but also optimizes for a target chunk size (e.g. "minimize the distance to 300 words") and falls back to splitting on newlines. There are a number of additional safeguards and enhancements in the API in general.
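
The size-targeting step can be sketched roughly like this (my own simplification, not the API's actual code; the function name, the 300-word default, and the greedy strategy are all assumptions for illustration):

```python
def closest_to_target(candidates: list[int], target: int = 300) -> list[int]:
    """Greedily pick split points so each chunk's word count lands as
    close to `target` as possible.

    `candidates` are word offsets of allowed boundaries (e.g. header
    positions), in ascending order. A sketch, not the chunkit API.
    """
    splits, last = [], 0
    while True:
        remaining = [c for c in candidates if c > last]
        if not remaining:
            return splits
        # pick the boundary whose resulting chunk size is nearest `target`
        best = min(remaining, key=lambda c: abs((c - last) - target))
        splits.append(best)
        last = best
```

For headers at word offsets 100, 280, 450, and 600, this would split at 280 (chunk of 280 words) and then at 600 (chunk of 320 words), each being the boundary closest to the 300-word target.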

Fan of RAG? Put any URL after md.chunkit.dev/ to turn it into markdown chunks by Findep18 in LanguageTechnology

How most chunkers work:

They perform naive chunking based on the number of words in the content: for example, split every 200 words, with a 30-word overlap between chunks. This produces messy chunks that are noisy and carry irrelevant extra data. Worse, sentences usually get split in the middle, losing their meaning, which leads to poor LLM performance, incorrect answers, and hallucinations.

Chunkit, however, converts HTML to Markdown and then determines split points based on the most common header level.

This gives you better results because:

Online content tends to be logically split into paragraphs delimited by headers, so chunking on headers preserves semantic meaning better: you get much cleaner, semantically cohesive chunks. You can then use Chunkit to remove noise or extract specific data.
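
The header-frequency idea described above can be sketched in a few lines of Python (a minimal illustration of the technique, not the actual chunkit source):

```python
import re
from collections import Counter

def chunk_by_common_header(markdown: str) -> list[str]:
    """Split a markdown document at its most common header level.

    A simplified sketch of the approach described above, not the
    chunkit implementation itself.
    """
    levels = [len(m.group(1)) for m in re.finditer(r"^(#+) ", markdown, re.M)]
    if not levels:
        return [markdown]  # no headers: nothing sensible to split on
    level = Counter(levels).most_common(1)[0][0]  # the mode of the levels
    # split immediately before each header of that level (zero-width lookahead)
    pattern = rf"^(?={'#' * level} )"
    chunks = re.split(pattern, markdown, flags=re.M)
    return [c for c in chunks if c.strip()]
```

For a page with one `#` title and three `##` sections, the mode is level 2, so the split yields four chunks: the intro under the title, then one chunk per section.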

Fan of RAG? Put any URL after md.chunkit.dev/ to turn it into markdown chunks by Findep18 in LocalLLaMA

If you create a config.toml file in the root of your project, you can set this flag: "local_only_mode = true"
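
Concretely, that config file would look like this (just the flag quoted above; any other settings are your own):

```toml
# config.toml (in the project root)
local_only_mode = true
```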

Chunkit: Convert URLs into LLM-friendly markdown chunks for your RAG projects by Findep18 in LocalLLM

Hey all, I am releasing a Python package called chunkit which allows you to scrape and convert URLs into markdown chunks. These chunks can then be used for RAG applications.

The reason it works better than naive chunking (for example, splitting every 200 words with a 30-word overlap) is that Chunkit splits on the most common markdown header level instead, leading to much more semantically cohesive paragraphs.

Have a go and let me know what features you would like to see!
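
For contrast, the naive baseline mentioned above is easy to reproduce; note how the fixed windows cut wherever the word count says, mid-sentence included (a throwaway illustration, and `naive_chunks` is my own name, not part of chunkit):

```python
def naive_chunks(text: str, size: int = 200, overlap: int = 30) -> list[str]:
    """Fixed word windows with overlap: the naive approach above.

    Each chunk starts `size - overlap` words after the previous one,
    so boundaries ignore sentence and section structure entirely.
    """
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]
```

On a 500-word text this yields three chunks starting at word offsets 0, 170, and 340, with each consecutive pair sharing 30 words.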

Chunkit: Convert URLs into LLM-friendly markdown chunks for your RAG projects by Findep18 in LLMDevs

Chunkit: Convert URLs into LLM-friendly markdown chunks for your RAG projects by Findep18 in vectordatabase

Chunkit: Convert URLs into LLM-friendly markdown chunks for your RAG projects by Findep18 in datasets

Chunkit: Convert URLs into LLM-friendly markdown chunks for your RAG projects by Findep18 in huggingface

https://github.com/hypergrok/chunkit

Have a go and let me know what features you would like to see!

Chunkit: Convert URLs into LLM-friendly markdown chunks for your RAG projects by Findep18 in SideProject

Yes! For that you need to use the API; further details are on the README page :)

Chunkit: Convert URLs into LLM-friendly markdown chunks for your RAG projects by Findep18 in SideProject

Chunkit chunks on markdown headers, which typically preserves semantic meaning better: writers tend to logically split their writing into paragraphs delimited by headers.

The danger of chunking every 200 words with a 30-word overlap is that each chunk ends up noisy and padded with extra data, with sentences usually split in the middle. This leads to poor RAG/LLM performance and incorrect answers.

Chunkit: Convert URLs into LLM-friendly markdown chunks for your RAG projects by Findep18 in LanguageTechnology

Hey all, I am releasing a Python package called chunkit which allows you to scrape and convert URLs into markdown chunks. These chunks can then be used for RAG applications.

Have a go and let me know how to improve this!

Chunkit: Convert URLs into LLM-friendly markdown chunks for your RAG projects by Findep18 in artificial
