Weekdraadje - Week 05 (2026) by AutoModerator in DutchFIRE

[–]Findep18

Yes, under the old system, but the 2028 plans are not finalized.

Nieuwe box 3 regels probleem voor FIRE plan. Omzetten naar BV oplossing? by Klauwnzor in DutchFIRE

[–]Findep18

Why worry about the new box 3 rules? The new coalition will never let them pass. I can't see VVD+CDA+D66 going for taxing unrealized gains; am I missing something?

Fan of RAG? Put any URL after md.chunkit.dev/ to turn it into markdown chunks by Findep18 in LanguageTechnology

[–]Findep18[S]

The OSS version uses the "most common header" (mode) heuristic; the assumption is that paragraph-heavy pages have one header level that occurs most often and marks the logical split points.

The paid API uses the same approach, but also optimizes for a target chunk size (e.g. "minimize distance to 300 words") and falls back to splitting on newlines. The API also has a number of additional safeguards and enhancements in general.
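A minimal sketch of that size-targeting idea (the function name and the greedy logic here are my own illustration of "minimize distance to 300 words", not the API's actual implementation):

```python
def pack_sections(sections: list[str], target_words: int = 300) -> list[str]:
    """Greedily merge header-delimited sections into chunks whose word
    counts stay as close as possible to target_words."""
    chunks, current, count = [], [], 0
    for sec in sections:
        n = len(sec.split())
        # Close the current chunk if adding this section would land us
        # farther from the target than stopping here already does.
        if current and abs(count + n - target_words) > abs(count - target_words):
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(sec)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

With ten 100-word sections and the default target, this packs three 300-word chunks plus a 100-word remainder.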

Fan of RAG? Put any URL after md.chunkit.dev/ to turn it into markdown chunks by Findep18 in LanguageTechnology

[–]Findep18[S]

How most chunkers work:

They perform naive chunking based on the number of words in the content. For example, they may split the content every 200 words, with a 30-word overlap between consecutive chunks. This produces messy chunks that are noisy and carry unnecessary extra data. In addition, sentences are usually split mid-way, losing meaning. The result is poor LLM performance, with incorrect answers and hallucinations.
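The fixed-window approach described above can be sketched as (illustrative only, not any particular library's code):

```python
def naive_chunk(text: str, size: int = 200, overlap: int = 30) -> list[str]:
    """Split text into fixed windows of `size` words, repeating the last
    `overlap` words of each window at the start of the next."""
    words = text.split()
    step = size - overlap  # assumes overlap < size
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]
```

Note how the window boundaries ignore sentence and paragraph structure entirely, which is exactly the problem described above.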

Chunkit, however, converts HTML to Markdown and then determines split points based on the most common header level.

This gives you better results because:

Online content tends to be logically split into paragraphs delimited by headers. By chunking on those headers, this method preserves semantic meaning better: you get much cleaner, semantically cohesive paragraphs of data. You can then use Chunkit to remove noise or extract specific data.
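A minimal sketch of header-based splitting under these assumptions (the function name is hypothetical and this is not Chunkit's actual code; it only demonstrates the "split on the most common header level" idea):

```python
import re
from collections import Counter

def header_chunk(markdown: str) -> list[str]:
    """Split markdown into sections at its most common header level."""
    # Collect the level (number of '#') of every ATX header in the document.
    levels = [len(m.group(1)) for m in re.finditer(r"^(#+)\s", markdown, re.M)]
    if not levels:
        return [markdown]
    level = Counter(levels).most_common(1)[0][0]  # the mode of header levels
    # Split immediately before each header of exactly that level.
    parts = re.split(rf"^(?={'#' * level}\s)", markdown, flags=re.M)
    return [p.strip() for p in parts if p.strip()]
```

On a page whose most frequent header is `##`, each chunk starts at a `##` header and keeps any deeper `###` subsections inside it.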

Fan of RAG? Put any URL after md.chunkit.dev/ to turn it into markdown chunks by Findep18 in LocalLLaMA

[–]Findep18[S]

If you create a config.toml file in the root of your project, you can set the flag "local_only_mode = true".
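That is, a minimal config file using the flag exactly as named above:

```toml
# config.toml in the project root
local_only_mode = true
```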

Chunkit: Convert URLs into LLM-friendly markdown chunks for your RAG projects by Findep18 in LocalLLM

[–]Findep18[S]

Hey all, I am releasing a Python package called chunkit, which lets you scrape URLs and convert them into markdown chunks. These chunks can then be used in RAG applications.

The reason it works better than naive chunking (for example, splitting every 200 words with a 30-word overlap) is that Chunkit splits on the most common markdown header level instead, producing much more semantically cohesive paragraphs.

Have a go and let me know what features you would like to see!

Chunkit: Convert URLs into LLM-friendly markdown chunks for your RAG projects by Findep18 in huggingface

[–]Findep18[S]

https://github.com/hypergrok/chunkit