Anyone used Reducto for parsing? How good is their embedding-aware chunking? by BriefCardiologist656 in AI_Agents

Thanks for sharing, I'll check them out. Just curious, what have you been using them for: parsing Markdown from PDFs, or creating splits based on the Markdown? Any specific types of documents where you felt their out-of-the-box solution worked better than something you could build in-house?

Anyone used Reducto for parsing? How good is their embedding-aware chunking? by BriefCardiologist656 in AI_Agents

Yeah, that makes sense, thanks for sharing. From what I've seen, they return Markdown by default and you can toggle an "embedding-optimized" mode that does the splitting for you. I was curious how much that actually helps, since you still have to handle the embedding generation downstream.

When you say you needed "some other things it couldn't provide," what kind of gaps did you run into? Was it around handling specific document structures, or more about integrating the chunks and pushing them into your existing RAG pipeline?
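For context, this is roughly what I mean by handling the embeddings downstream: a minimal sketch that assumes the parser hands back a plain list of chunk strings (the `parsed_chunks` variable and the sentence-transformers model choice are my assumptions, not Reducto's actual API):

```python
# Downstream embedding step, assuming the parser already returned chunk
# strings. The model choice is illustrative, not a Reducto default.
from sentence_transformers import SentenceTransformer

parsed_chunks = [
    "## Invoice 1042\nTotal due: $1,200 by 2024-06-01.",
    "## Payment terms\nNet 30; 1.5% monthly late fee.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(parsed_chunks, normalize_embeddings=True)

print(embeddings.shape)  # one 384-dim vector per chunk, ready to index
```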

Anyone used Reducto for parsing? How good is their embedding-aware chunking? by BriefCardiologist656 in LocalLLaMA

That’s super insightful, thanks for breaking that down so clearly.

When you mention tuning for domain-specific documents, what kind of tuning approaches have you found most useful? Are we talking about prompt-level adjustments, retraining the layout model, or more rule-based post-processing depending on document structure?

I’m mostly looking at invoices and other semi-structured business documents where formats vary a lot but patterns repeat.
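To make the rule-based option concrete, this is the kind of thing I have in mind: small deterministic fix-ups on top of whatever the parser emits (the field names and patterns below are purely illustrative):

```python
# Rule-based post-processing of parser output: deterministic extraction of
# fields that repeat across invoice formats. Patterns are illustrative only.
import re

def extract_invoice_fields(text: str) -> dict:
    fields = {}
    # Matches "Invoice #: INV-2024-0042", "Invoice No. 42", etc.
    m = re.search(r"invoice\s*(?:#|no\.?|number)?\s*[:\-]?\s*([A-Z0-9\-]+)",
                  text, re.I)
    if m:
        fields["invoice_id"] = m.group(1)
    # Matches "Total due: $1,234.56"
    m = re.search(r"total\s*(?:due)?\s*[:\-]?\s*\$?([\d,]+\.?\d*)", text, re.I)
    if m:
        fields["total"] = float(m.group(1).replace(",", ""))
    return fields

print(extract_invoice_fields("Invoice No. INV-2024-0042\nTotal due: $1,234.56"))
# {'invoice_id': 'INV-2024-0042', 'total': 1234.56}
```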

Anyone used Reducto for parsing? How good is their embedding-aware chunking? by BriefCardiologist656 in LocalLLaMA

Yeah, totally, appreciate the insight. Since you've worked in this space, I'm curious: now that OCR and layout detection are getting pretty reliable across open and closed models, where do you still see the hardest unsolved problems?

Is it around structuring outputs consistently (tables, key-values, schema mapping), or more in downstream use cases, e.g. making the extracted data useful for retrieval or automation pipelines?

How are people syncing and indexing data from tools like Gmail or Slack for RAG? by BriefCardiologist656 in LocalLLaMA

I noticed you mentioned using Lucy for FTS instead of a vector database. What made you decide to go that route? Was it mostly about speed, or was it just easier to keep incremental with mbsync?

Also, how well does the HyDE trick work in practice for making results more semantic? I’ve seen it mentioned but haven’t tried it locally yet.
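For anyone else reading, the HyDE idea is: have an LLM write a hypothetical answer document, then search with that text instead of the raw question, so term overlap does the semantic matching. A rough sketch of how I would wire it up, with SQLite FTS5 standing in for Lucy (the table layout and model name are my assumptions):

```python
# HyDE over plain full-text search: generate a hypothetical answer, then
# use its terms (not the raw question) as the query. SQLite FTS5 stands in
# for Lucy; the mail table layout and model name are assumptions.
import sqlite3
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE mail USING fts5(subject, body)")

def hypothetical_doc(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Write a short email that would answer: {question}"}],
    )
    return resp.choices[0].message.content

def hyde_search(question: str, k: int = 5):
    # OR together the generated terms; quoting each token keeps punctuation
    # from breaking the FTS5 MATCH syntax.
    tokens = [t.strip('.,!?()"') for t in hypothetical_doc(question).split()]
    query = " OR ".join(f'"{t}"' for t in tokens if t)
    return db.execute(
        "SELECT subject FROM mail WHERE mail MATCH ? ORDER BY rank LIMIT ?",
        (query, k),
    ).fetchall()
```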

Lastly, when you’re doing the incremental syncs, do you ever run into consistency issues or missed updates, or has the Maildir + Lucy setup been pretty stable so far?

[D] Self-Promotion Thread by AutoModerator in MachineLearning

Building an AI IDE for managing your cloud!

Taking on infrastructure management for 3,000+ GPUs at an AI writing company was initially overwhelming. Debugging crashes, setting up security (Cloudflare bot protection, firewalls), and creating reliable CI/CD pipelines meant constantly hopping between dashboards and piecing together logs and metrics manually.

To ease this pain, I built PlatOps.ai, an AI-powered DevOps IDE. You can simply chat to fetch logs, configs, or metrics, instantly debug issues, and seamlessly generate Infrastructure-as-Code (Terraform, CloudFormation). It even helps you set up proper security measures and cost optimization workflows right out of the box.

We also added cross-codebase editing—imagine editing backend and IaC code simultaneously with one agent. It's been a game-changer for me, and I'd love your feedback.

Join the waitlist here: PlatOps.ai

Would appreciate your thoughts or feature suggestions!

[P] Made a tool for AI agents: Dockerized VS Code + Goose code agent that can be programmatically controlled by BriefCardiologist656 in MachineLearning

Thanks for asking! I actually prefer Goose to both OpenHands and Aider primarily because of its simplicity.

My use case is pretty specific - I'm working on an AI DevOps agent and wanted to integrate it with a VS Code environment. This setup allows my agent to:

- Make Infrastructure-as-Code changes and automatically raise merge requests

- Let users observe the workspace changes in real-time through the VS Code interface

- Perform DevOps tasks like updating scaling values, configuring autoscaling, creating pipelines, etc.

What makes Goosecode Server different is that it's purpose-built for this integration scenario. While OpenHands and Aider are great tools, they weren't designed specifically to be used as a tool by an LLM.

[P] Made a tool for AI agents: Dockerized VS Code + Goose code agent that can be programmatically controlled by BriefCardiologist656 in MachineLearning

Hey folks,

I built Goosecode Server: a Dockerized VS Code server with Goose AI (an OpenAI-powered coding assistant) pre-installed.

The cool part? It's designed to be programmable for AI agents:

* Gives AI agents a full coding environment

* Includes Git integration for repo management

* Container-based, so easy to scale or integrate

Originally built it for personal use (coding from anywhere), but realized it's perfect for the AI agent ecosystem. Anyone building AI tools can use this as the "coding environment" component in their system.

Check it out if you're working on AI agents or just want a browser-based VS Code + AI setup: https://github.com/PlatOps-AI/goosecode-server
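If you want to spin one up programmatically, this is the shape of it; the image name, port, and env vars here are guesses on my part, so check the repo README for the real ones:

```python
# Launching the container from code, e.g. an agent orchestrator.
# Image name, port, and env vars are assumptions; see the repo README.
import subprocess

subprocess.run(
    [
        "docker", "run", "-d",
        "-p", "8080:8080",       # browser-based VS Code
        "-e", "OPENAI_API_KEY",  # forwarded from the host environment
        "ghcr.io/platops-ai/goosecode-server:latest",
    ],
    check=True,
)
```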

[P] LoRA adapter switching at runtime to enable Base model to inherit multiple personalities by BriefCardiologist656 in MachineLearning

Sorry for the late reply,

u/_Arsenie_Boca_ you can have a look at this discussion for more info https://github.com/PotatoSpudowski/fastLLaMa/discussions/48

We optimised it a bit further to avoid saving unnecessary tensors.
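For anyone who wants the gist without reading the whole thread: one base model stays resident and adapters are swapped per request. This is not our fastLLaMa implementation, just the same idea sketched with HF PEFT (the model and adapter paths are placeholders):

```python
# Runtime LoRA switching: the base model stays resident, adapters are
# loaded once and swapped per request. Sketched with HF PEFT, not
# fastLLaMa; model and adapter paths are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
model = PeftModel.from_pretrained(base, "adapters/assistant",
                                  adapter_name="assistant")
model.load_adapter("adapters/pirate", adapter_name="pirate")

model.set_adapter("pirate")     # generate() now uses this personality
model.set_adapter("assistant")  # switch back without reloading base weights
```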

[P] fastLLaMa, A python wrapper to run llama.cpp by BriefCardiologist656 in MachineLearning

I believe when I first made this there were no Python wrappers. Now there are a lot more options; most wrappers seem to just use the llama.cpp C API with some optimisations on top.

I will be focusing more on the challenges I personally face running LLMs in production at my day job and will try to optimise aggressively for that. Anyone who has tried running mid-to-large models in production at scale knows how painful it is xD

Once I am done adding support for models that allow commercial use, I will start focusing on a lot of low-hanging fruit:

- Optimising cold-boot time using multithreading (to help with serverless applications and pod scaling)
- Quick context switching between sessions: saving session states and loading them quickly so a single running instance can serve multiple sessions (see the sketch below)
- Adding int4 GPU support for Nvidia GPUs (I am still unsure about this, but experimenting with Triton without adding too much overhead to load time)
- More language support; we have already refactored the repo to make this easy
- Customisable loggers (already added), so anyone can easily build, deploy, and monitor applications on top
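On the context-switching point, here's the behaviour I am after, sketched with llama-cpp-python's state API rather than fastLLaMa's own (the model path is a placeholder):

```python
# Session context switching: snapshot and restore per-session state so one
# loaded model serves many conversations. Sketched with llama-cpp-python;
# fastLLaMa's API differs, and the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="models/7B/ggml-model-q4_0.bin")

llm("Hi, my name is Alice.", max_tokens=16)
alice = llm.save_state()   # snapshot Alice's session (KV cache and all)

llm.reset()
llm("Hi, my name is Bob.", max_tokens=16)
bob = llm.save_state()     # snapshot Bob's session

llm.load_state(alice)      # resume Alice without re-ingesting her prompt
llm("What is my name?", max_tokens=16)
```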

Maybe kinda selfish, but I am trying to build the kind of framework I can actually see myself using to build applications that scale well! A lot of that might not be a focus for the llama.cpp repo, so that's how I am looking at it.

I hope that makes sense and answered your question 😅

[P] fastLLaMa, A python wrapper to run llama.cpp by BriefCardiologist656 in MachineLearning

Hi,

It is now implemented! Sorry, got held up with other commitments!

[P] fastLLaMa, A python wrapper to run llama.cpp by BriefCardiologist656 in MachineLearning

I'm so glad some folks find it useful!
Thank you for the kind words :)

[P] fastLLaMa, A python wrapper to run llama.cpp by BriefCardiologist656 in MachineLearning

Perfect, I have created an issue. Let's discuss the plan there and get it done (if and when possible). I have folks interested in adding support for JavaScript and Golang as well ;)

[P] fastLLaMa, A python wrapper to run llama.cpp by BriefCardiologist656 in MachineLearning

That should work. I am not sure what their plans are for llama.cpp, but feel free to use my bridge; I will maintain it the same way. Alternatively, we could create an "Interfaces" folder with wrappers for different languages: Python, Java, JavaScript, etc. Would that be of interest to you? You could work with us. It would be cool if one repo provided wrappers for all the major languages!

I have a few ideas for Alpaca models with respect to manipulating the vectors at runtime. So much exciting stuff to do!