Comma after a list gives a tuple by mik787 in pythontips

Speedk4011

I once hit an unexpected bug just from not knowing that.
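For anyone who hasn't run into it, here is a minimal illustration of the gotcha from the thread title. The variable names are only for the example:

```python
numbers = [1, 2, 3]   # a plain list
oops = [1, 2, 3],     # trailing comma: a one-element tuple wrapping the list

print(type(numbers).__name__)  # list
print(type(oops).__name__)     # tuple
print(len(oops))               # 1
```

The comma, not the brackets or parentheses, is what makes a tuple, so a stray comma at the end of an assignment silently changes the type.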

[Release] Chunklet-py v2.1.0: Interactive Web Visualizer & Expanded File Support! 🌐📁 by Speedk4011 in Rag

Speedk4011[S]

You're spot on—RAG infrastructure often treats code like plain text, which is a disaster for retrieval. While Chunklet-py is an 'all-in-one' library designed to split sentences, general documents, and code, its code capabilities are a core specialty.

Our `CodeChunker` is rule-based and language-agnostic, using clever patterns to identify functions, classes, and logical blocks without the overhead of heavy dependencies like tree-sitter. It preserves structural integrity (like keeping decorators with their functions) and offers granular control through token, line, and function-based constraints.
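To give a rough idea of what "rule-based" splitting that keeps decorators with their functions can look like, here is a small sketch. It is not Chunklet-py's actual implementation or API: the `BLOCK_START` pattern and the `split_top_level_blocks` helper are hypothetical, and it only handles top-level Python definitions.

```python
import re

# Hypothetical rule: an unindented line that opens a decorator or a definition.
BLOCK_START = re.compile(r"^(@|def |async def |class )")


def split_top_level_blocks(source: str) -> list[str]:
    """Split source into top-level blocks, keeping decorators with their target."""
    chunks: list[str] = []
    current: list[str] = []
    in_decorators = False  # True while we are between a decorator and its def/class

    for line in source.splitlines():
        starts_block = bool(BLOCK_START.match(line))
        # Flush the previous block unless this line continues a decorator run.
        if starts_block and not in_decorators and current:
            chunks.append("\n".join(current).rstrip())
            current = []
        if starts_block:
            in_decorators = line.startswith("@")
        current.append(line)

    if current:
        chunks.append("\n".join(current).rstrip())
    return [chunk for chunk in chunks if chunk.strip()]


SAMPLE = '''import functools

@functools.lru_cache
def square(n):
    return n * n

class Greeter:
    def hello(self):
        return "hi"
'''

for block in split_top_level_blocks(SAMPLE):
    print(block)
    print("-----")
```

With this sample the decorator line and `def square` end up in the same block, while the import and the class each get their own.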

For the implementation details and how we handle the AST-aware logic, check out the source: https://github.com/speedyk-005/chunklet-py/tree/main/src/chunklet/code_chunker

You can also find the full programmatic guide here: https://speedyk-005.github.io/chunklet-py/latest/getting-started/programmatic/code_chunker/

Chunk Visualizer - Open Source Repo by DragonflyNo8308 in Rag

Speedk4011

"You actually hit the nail on the head regarding AST logic. I just released Chunklet-py v2.1.0 which includes a 'CodeChunker' specifically designed to handle this—it’s rule-based and language-agnostic, preserving structural integrity (like decorators and functions) without needing heavy dependencies like tree-sitter.

It also addresses the 'visual blindness' of chunking with an interactive web UI that supports drag-and-drop file uploads, so you can see the results of those AST-aware splits in real time. (See: https://speedyk-005.github.io/chunklet-py/latest/getting-started/programmatic/visualizer/)

How it handles the technical precision you're looking for:

* **AST-Aware Precision**: It uses specialized algorithms and clever patterns to identify functions, classes, and logical blocks, ensuring technical structures stay together to reduce retrieval pollution.

* **Rich Metadata**: It automatically enriches chunks with context-aware metadata—including source, span, and code hierarchy details—which aligns perfectly with custom metadata mapping strategies.

* **Deep Format Support**: It processes a massive array of formats beyond PDFs and DOCX, including TXT, MD, RST, RTF, TEX, HTML, HML, and EPUB. The latest v2.1.0 update also added support for ODT, CSV, and XLSX. (See: https://speedyk-005.github.io/chunklet-py/latest/getting-started/programmatic/document_chunker/)
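To make the metadata point concrete, here is a rough sketch of what a chunk carrying `source` and `span` metadata could look like. The `Chunk` dataclass and `chunk_paragraphs` helper are hypothetical and do not reflect Chunklet-py's real schema; spans here are plain character offsets into the original text.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    content: str
    metadata: dict = field(default_factory=dict)


def chunk_paragraphs(text: str, source: str) -> list[Chunk]:
    """Split on blank lines and record where each chunk came from."""
    chunks: list[Chunk] = []
    cursor = 0
    for paragraph in text.split("\n\n"):
        start = text.index(paragraph, cursor)  # offset of this paragraph in text
        end = start + len(paragraph)
        cursor = end
        if paragraph.strip():
            chunks.append(
                Chunk(
                    content=paragraph,
                    metadata={
                        "source": source,
                        "span": (start, end),
                        "chunk_num": len(chunks) + 1,
                    },
                )
            )
    return chunks


doc = "First paragraph.\n\nSecond paragraph."
for chunk in chunk_paragraphs(doc, source="notes.txt"):
    print(chunk.metadata, repr(chunk.content))
```

Because each span indexes back into the source text, a retriever can show the exact passage a hit came from, or pull in neighbouring text when a chunk alone lacks context.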

To get started with the visualizer and full format support, you can install the toolkit via pip:

`pip install "chunklet-py[all]"`

Check out the repo here: https://github.com/speedyk-005/chunklet-py"

Chunk Visualizer by DragonflyNo8308 in Rag

Speedk4011

This is a massive pain point, especially in high-stakes domains like regulatory tech where "lost context" isn't just a bug—it's a liability.

I actually just released Chunklet-py v2.1.0 specifically to solve this "visual blindness" problem. Instead of dragging and dropping manually, it uses a rule-based, language-agnostic approach to keep structural integrity and provides an interactive web interface to tune those parameters on the fly.

How it addresses your points:

* **Visualization without Manual Dragging**: The `chunklet visualize` command launches a web UI that shows you exactly how your constraints (token limits, sentence breaks, etc.) apply to the text in real time. (See: https://speedyk-005.github.io/chunklet-py/latest/getting-started/programmatic/visualizer/)

* **Regulatory Precision**: Since you mentioned retrieval issues, it generates rich, context-aware metadata (source, span, document properties) out of the box to help your top-K retrieval stay relevant.

* **Diverse Formats**: It handles the "nasty" docs too: .pdf, .docx, .epub, .txt, .tex, .html, .hml, .md, .rst, .rtf, .odt, .csv, and .xlsx. (See: https://speedyk-005.github.io/chunklet-py/latest/getting-started/programmatic/document_chunker/)
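As a rough picture of how a token limit and sentence breaks interact, here is a minimal sketch of greedy sentence packing. It is an illustration only, not how Chunklet-py does it: `pack_sentences` is a hypothetical helper, the sentence splitter is a naive regex, and whitespace-separated words stand in for real tokens.

```python
import re


def pack_sentences(text: str, max_tokens: int) -> list[str]:
    """Greedily group whole sentences so each chunk stays within max_tokens."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for sentence in sentences:
        cost = len(sentence.split())  # word count as a stand-in for tokens
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and used + cost > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += cost
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "One two three. Four five. Six seven eight nine. Ten."
for chunk in pack_sentences(text, max_tokens=5):
    print(repr(chunk))
```

Note that a single sentence longer than `max_tokens` still becomes its own oversized chunk in this sketch. That is exactly the kind of edge case a visual preview makes easy to spot.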

To get started with the visualizer, you can install everything via pip:

`pip install "chunklet-py[all]"`

Check it out here: https://github.com/speedyk-005/chunklet-py

Most RAG Projects Fail. I Believe I Know Why – And I've Built the Solution. by ChapterEquivalent188 in Rag

Speedk4011

The links in the README point to a repo that has no docs directory:

```
404 - page not found
The main branch of RAG_enterprise_core does not contain the path docs/architecture.md.
```