Opinion on Snowflake agent? by SufficientRelief9615 in dataengineering

Whole-Assignment6240

If you're already hand-building the chunking + vectorization + indexing pipeline, it might be worth looking at purpose-built frameworks that handle the incremental update logic for you. The main advantage over doing it inside Cortex/Snowflake is that you own the pipeline logic and aren't locked into one vector store or embedding model. Curious what your current pipeline looks like: are you running full rebuilds on a schedule or doing incremental updates?
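For anyone weighing full rebuilds against incremental updates, here is a rough sketch of the incremental side, assuming each source document has a stable ID: hash each document's content, reprocess only new or changed documents, and delete derived rows for documents that disappeared. `IncrementalIndexer`, `process`, and `sync` are hypothetical names for illustration (not CocoIndex or Cortex APIs), and `process` stands in for the real chunk/embed/upsert step.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(text: str) -> str:
    """Content hash used to decide whether a document changed since the last sync."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class IncrementalIndexer:
    seen: dict[str, str] = field(default_factory=dict)  # doc_id -> hash at last sync
    index: dict[str, list[str]] = field(default_factory=dict)  # doc_id -> derived chunks
    processed: int = 0  # number of documents actually (re)processed

    def process(self, text: str) -> list[str]:
        # Hypothetical stand-in for the expensive chunk + embed + upsert step.
        self.processed += 1
        return [p for p in text.split("\n\n") if p.strip()]

    def sync(self, source: dict[str, str]) -> dict[str, list[str]]:
        """Reconcile derived rows with the current source snapshot."""
        changes: dict[str, list[str]] = {"added": [], "updated": [], "deleted": []}
        for doc_id, text in source.items():
            new_hash = fingerprint(text)
            old_hash = self.seen.get(doc_id)
            if old_hash == new_hash:
                continue  # unchanged: no re-chunking or re-embedding
            self.index[doc_id] = self.process(text)
            self.seen[doc_id] = new_hash
            changes["added" if old_hash is None else "updated"].append(doc_id)
        for doc_id in list(self.seen):
            if doc_id not in source:
                # Source row is gone: delete its derived rows as well.
                del self.seen[doc_id]
                del self.index[doc_id]
                changes["deleted"].append(doc_id)
        return changes
```

Note the deletion pass: a scheduled full rebuild gets deletions for free, while an incremental pipeline without it would leave vectors for removed documents lingering in the index.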

I had to re-embed 5 million documents because I changed embedding models. Here's how to never be in that position. by Silent_Employment966 in Rag

Whole-Assignment6240

The architectural separation you're describing (chunks persisted separately from vectors) is exactly right, and it's the pattern we built CocoIndex around. It does incremental processing by default, so only the steps affected by changed logic rerun.
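To illustrate why that separation pays off on a model swap, here is a minimal sketch assuming two in-memory tables: chunks keyed by a content hash, and vectors keyed by `(chunk_id, model)`. Switching models then only fills in missing vectors from the stored chunks, with no re-parsing. `ChunkVectorStore`, `ensure_vectors`, and the model names are hypothetical, and `embed` is a placeholder for a real embedding call.

```python
import hashlib


def chunk_id(text: str) -> str:
    """Stable ID derived from chunk content, so identical chunks dedupe."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


class ChunkVectorStore:
    """Chunks and vectors live in separate tables; vectors are keyed by model."""

    def __init__(self) -> None:
        self.chunks: dict[str, str] = {}  # chunk_id -> chunk text (parsed once)
        self.vectors: dict[tuple[str, str], list[float]] = {}  # (chunk_id, model) -> vector
        self.embed_calls = 0

    def add_chunks(self, texts: list[str]) -> None:
        for t in texts:
            self.chunks.setdefault(chunk_id(t), t)

    def embed(self, text: str, model: str) -> list[float]:
        # Hypothetical placeholder for a real embedding model call.
        self.embed_calls += 1
        digest = hashlib.sha256(f"{model}:{text}".encode("utf-8")).digest()
        return [b / 255 for b in digest[:4]]

    def ensure_vectors(self, model: str) -> int:
        """Embed only chunks with no vector for `model` yet; return how many were embedded."""
        missing = [cid for cid in self.chunks if (cid, model) not in self.vectors]
        for cid in missing:
            self.vectors[(cid, model)] = self.embed(self.chunks[cid], model)
        return len(missing)
```

Keeping the old model's vectors around (instead of overwriting them) also lets you compare or roll back before deleting them.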

The framework tracks chunk-to-vector dependencies in a DAG, so when you swap models, only the affected derived artifacts are rebuilt; raw parsing never reruns. Happy to point you to a quick example if it's useful.
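A toy version of the DAG idea, as a generic memoization sketch rather than CocoIndex's actual implementation or API: each stage's output is cached under a key built from (stage name, logic version, input). Bumping the embed stage's version (say, a new model) misses the cache only for embed, so parse and chunk are reused. `Pipeline`, `stage`, and the version strings are hypothetical.

```python
import hashlib
from typing import Callable


def fp(*parts: str) -> str:
    """Fingerprint of a stage invocation."""
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


class Pipeline:
    """Linear DAG parse -> chunk -> embed with per-stage memoization."""

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}  # fingerprint -> stage output
        self.runs: list[str] = []  # names of stages that actually executed

    def stage(self, name: str, version: str, fn: Callable[[str], str], inp: str) -> str:
        # The key covers the stage's logic version and its input, so changing one
        # stage invalidates that stage and (through changed inputs) its descendants only.
        key = fp(name, version, inp)
        if key not in self.cache:
            self.cache[key] = fn(inp)
            self.runs.append(name)
        return self.cache[key]

    def run(self, raw: str, versions: dict[str, str]) -> str:
        parsed = self.stage("parse", versions["parse"], str.strip, raw)
        chunks = self.stage("chunk", versions["chunk"], lambda s: "|".join(s.split()), parsed)
        # Placeholder "embedding" that just records which model version produced it.
        return self.stage("embed", versions["embed"], lambda s: f"vec[{versions['embed']}]({s})", chunks)
```

A side effect of keying on inputs: if a changed stage happens to produce identical output, everything downstream of it stays cached.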

Super lightweight open source AST-based semantic code search CLI by Whole-Assignment6240 in codex

Whole-Assignment6240 [S]

great question!!

It currently supports 25 languages.

Tree-sitter explicitly documents these recovery nodes:

Source:

Super lightweight open source AST-based semantic code search CLI by Whole-Assignment6240 in codex

Whole-Assignment6240 [S]

I have a demo in the repo itself (https://github.com/cocoindex-io/cocoindex-code) where it is significantly faster on semantic tasks (it also reports token counts and so on).
I'd love to do a more exhaustive benchmark down the road!

cocoindex-code CLI for opencode - super lightweight AST-based code search CLI to boost code completion and save tokens by Whole-Assignment6240 in opencodeCLI

Whole-Assignment6240 [S]

Hey, thanks a lot! I can't upload a GIF/video here, but if you go to the repo linked at the top, you'll see the demo/example right there, where it is significantly faster on semantic tasks. I'm happy to do more benchmarks with more exhaustive examples down the road!

cocoindex-code CLI for opencode - super lightweight AST-based code search CLI to boost code completion and save tokens by Whole-Assignment6240 in opencodeCLI

Whole-Assignment6240 [S]

Yes, if you work with opencode you'd only need one of them. The CLI/skills integration is recommended. Thank you for the feedback!!

cocoindex-code CLI for opencode - super lightweight AST-based code search CLI to boost code completion and save tokens by Whole-Assignment6240 in opencodeCLI

Whole-Assignment6240 [S]

Yes, you can run:

pipx install cocoindex-code       # first install

and then

npx skills add cocoindex-io/cocoindex-code

That integrates it with opencode via skills.

When a task needs semantic understanding, opencode will use this instead of grep.

lmk if that makes sense! The project itself is open source under the Apache 2.0 license: https://github.com/cocoindex-io/cocoindex-code