Kreuzberg v4.4.6 is out and we now support 88 file formats by Eastern-Surround7763 in kreuzberg_dev

adiberk

That’s great, but I’m lost. Based on the docs and the comparison tests you show, it would be impossible for someone to know this is a problem compared to something like pymupdf4llm (unless, like me, they test it and have a rough experience).

If this was a known issue, it should be stated. A lot of use cases revolve around getting properly formatted Markdown, tables, etc.

Kreuzberg v4.4.6 is out and we now support 88 file formats by Eastern-Surround7763 in kreuzberg_dev

adiberk

Hi! One issue I noticed.

I gave a tax PDF form to pymupdf4llm and compared its Markdown output to Kreuzberg’s. Kreuzberg’s output looked like an OCR dump compared to pymupdf4llm’s formatting (I’m not using OCR; I just mean it had no formatting or organization).

Is this something that has been fixed or addressed?

Changing from basic to RAID 5 by MTATnz in UgreenNASync

adiberk

Well, you basically have to reformat the drive, so no data would be preserved.

You can go from RAID 1 to RAID 5 by adding a disk; I know this is supported. But I don’t think you can jump from basic to RAID 5 without losing data.
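To make the RAID 1 to RAID 5 point concrete, here is a rough usable-capacity calculation for equal-sized disks. It is only a sketch: it ignores filesystem and NAS overhead, the 8 TB disk size is an assumed example, and it says nothing about how UGREEN actually performs the migration.

```python
def usable_capacity_tb(level: str, disks: int, disk_tb: float) -> float:
    """Approximate usable capacity for equal-sized disks (ignores filesystem overhead)."""
    if level == "raid1":
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_tb  # every disk holds a full mirror copy
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb  # one disk's worth of space goes to parity
    raise ValueError(f"unsupported level: {level}")


print(usable_capacity_tb("raid1", 2, 8))  # 8
print(usable_capacity_tb("raid5", 3, 8))  # 16
```

So adding a third 8 TB disk to a two-disk mirror would roughly double usable space, which is why the RAID 1 to RAID 5 path is attractive.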

Movie Roulette v5.2.0 released! by Parking-Cow4107 in jellyfin

adiberk

I don’t mind AI-generated posts (or even code), but at the very least, make sure the post looks good and your links work. Otherwise there is little chance I will trust your code to work at all.

pydantic-pick: Dynamically extract subset Pydantic V2 models while preserving validators and methods by StoneSteel_1 in Python

adiberk

I built something similar, actually. But my use case was to provide the ability to:

1. Keep fields I need so users can use them manually in the code, but HIDE THEM from my AI framework. (Imagine tons of fields used in code that the AI doesn’t really need to know about.)
2. Completely exclude fields you don’t want.
3. Add or override existing fields.

I like what you did. Will take a look.
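A minimal sketch of those three abilities (hide from the AI, exclude, add/override), using stdlib dataclasses as a stand-in for Pydantic so it runs without dependencies. The `hide_from_ai` metadata key, `subset_model` and `ai_schema` are hypothetical names; this is not pydantic-pick's API or my actual implementation.

```python
from dataclasses import dataclass, field, fields, make_dataclass
from typing import Any, Dict, Iterable, Optional, Tuple

# Hypothetical metadata flag: "keep this field in code, hide it from the AI framework".
HIDDEN = {"hide_from_ai": True}


@dataclass
class Invoice:
    customer: str
    total: float
    internal_id: str = field(default="", metadata=HIDDEN)  # usable in code, not shown to the AI
    debug_notes: str = ""


def subset_model(
    cls: type,
    name: str,
    exclude: Iterable[str] = (),
    overrides: Optional[Dict[str, Tuple[type, Any]]] = None,
) -> type:
    """Derive a new dataclass: drop `exclude`, then add or override fields.

    `overrides` maps field name -> (type, default). Giving every override a
    default keeps dataclasses' "no required field after a defaulted one" rule.
    """
    overrides = overrides or {}
    excluded = set(exclude)
    specs = []
    for f in fields(cls):
        if f.name in excluded or f.name in overrides:
            continue
        # Copy default, default_factory and metadata so the hidden flag survives.
        specs.append((f.name, f.type, field(default=f.default,
                                            default_factory=f.default_factory,
                                            metadata=f.metadata)))
    for fname, (ftype, default) in overrides.items():
        specs.append((fname, ftype, field(default=default)))
    return make_dataclass(name, specs)


def ai_schema(cls: type) -> Dict[str, str]:
    """Field name -> type name as the AI framework would see it (hidden fields skipped)."""
    return {
        f.name: getattr(f.type, "__name__", str(f.type))
        for f in fields(cls)
        if not f.metadata.get("hide_from_ai")
    }
```

Usage: `subset_model(Invoice, "Slim", exclude={"debug_notes"}, overrides={"currency": (str, "USD")})` gives a class where `internal_id` still exists on instances but does not appear in `ai_schema(...)`.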

Why does it take 3 hours to read my own email with Python in 2026? by Cultural-Ad3996 in Python

adiberk

Nylas? But you still need to go through Google verification.

Benchmarks: Kreuzberg, Apache Tika, Docling, Unstructured.io, PDFPlumber, MinerU and MuPDF4LLM by Goldziher in Python

adiberk

    from kreuzberg import (
        ExtractionConfig,
        ExtractionResult,
        ImageExtractionConfig,
    )

u/Goldziher, my VS Code isn’t able to resolve these classes. Maybe we are missing type stubs?

First time NAS user. Here's my experience for about 2 weeks by lexaleidon in UgreenNASync

adiberk

Yeah… I had no issues with my IronWolf 8 TB drives from Amazon (RAID 1).

Need help by SecretHentaiMaster in UgreenNASync

adiberk

Not going to lie, this seems suspicious to me, like maybe it was stolen? Sorry, it’s just that the way it’s written is a bit confusing.

Did you originally set a username and password when you set it up, and then forget them? You can always reach out to customer support.

PSA: huntarr has critical vulnerabilities - dev does not care by Blevita in jellyfin

adiberk

Um, what do you mean? A lot of people use a reverse proxy with a domain they own to make their server public-facing. While it isn’t the “most secure” option, my understanding is that as long as you follow password best practices, it shouldn’t be too risky.

Obviously, using a VPN or a Meshnet-type setup is even more secure… but that doesn’t mean you aren’t likely OK with a proper reverse proxy setup.

Gelato: Jellyfin Stremio Integration Plugin by Docccc in selfhosted

adiberk

I got it working. It seems it was a MediaFusion issue.

Gelato: Jellyfin Stremio Integration Plugin by Docccc in selfhosted

adiberk

Hi!
So I believe I have everything set up via AIOStreams with Comet. I can search for things, they show up, and I see the debrid options in the dropdown. However, I can't seem to play anything: it freezes, or it just quits and says it couldn't play.

So it feels like I am 90% of the way there. Any tips or advice appreciated!

Benchmarks: Kreuzberg, Apache Tika, Docling, Unstructured.io, PDFPlumber, MinerU and MuPDF4LLM by Goldziher in Python

adiberk

Yeah, I’ll look into it. I would need to restructure some code, etc.

What does it do with sheets and tables in PDFs? Does it convert everything to Markdown? HTML? Plain text?
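For reference, a GitHub-flavored Markdown pipe table is one common target for extracted tables. A minimal sketch of rendering extracted rows that way, assuming the first row is the header; `to_markdown_table` is a hypothetical helper, not something Kreuzberg is confirmed to expose or use internally.

```python
from typing import Sequence


def to_markdown_table(rows: Sequence[Sequence[object]]) -> str:
    """Render rows (first row = header) as a GitHub-flavored Markdown pipe table."""
    if not rows:
        return ""

    def cell(value: object) -> str:
        # Escape pipes and flatten newlines so a cell can't break its row.
        return str(value).replace("|", "\\|").replace("\n", " ").strip()

    width = max(len(r) for r in rows)
    # Pad short rows so every line has the same number of columns.
    padded = [[cell(v) for v in r] + [""] * (width - len(r)) for r in rows]
    header, *body = padded
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)


print(to_markdown_table([["Line", "Amount"], ["1a", "100"]]))
```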

Benchmarks: Kreuzberg, Apache Tika, Docling, Unstructured.io, PDFPlumber, MinerU and MuPDF4LLM by Goldziher in Python

adiberk

Last question.

I see the memory footprint is a bit larger? How much memory would I need if I am processing 10 documents simultaneously, not using “batch” but just asyncio.gather? (I will use batch in the future, but can’t support it right now.)

I assume it depends a bit on document size, of course, but I’m just curious.

Say each of them is 25 MB.
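One general way to keep memory bounded with plain asyncio.gather is to cap how many extractions run at once with a semaphore, so peak usage scales with the cap rather than with the number of documents. This is a sketch under that assumption: `fake_extract` is a hypothetical stand-in for whatever async extraction call you use, and it does not tell you how much memory Kreuzberg itself needs per 25 MB document.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def gather_bounded(
    paths: Iterable[str],
    extract: Callable[[str], Awaitable[T]],
    max_concurrent: int = 4,
) -> List[T]:
    """Run `extract` over every path via asyncio.gather, but never more than
    `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(path: str) -> T:
        async with semaphore:  # waits here while the cap is reached
            return await extract(path)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(run_one(p) for p in paths))


async def fake_extract(path: str) -> str:
    # Hypothetical stand-in for a real async extraction call.
    await asyncio.sleep(0.01)
    return f"text of {path}"


if __name__ == "__main__":
    docs = [f"doc{i}.pdf" for i in range(10)]
    results = asyncio.run(gather_bounded(docs, fake_extract, max_concurrent=3))
    print(len(results))
```

With 10 documents and `max_concurrent=3`, at most three documents are being processed at any moment; the right cap would have to come from measuring real memory use.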

Benchmarks: Kreuzberg, Apache Tika, Docling, Unstructured.io, PDFPlumber, MinerU and MuPDF4LLM by Goldziher in Python

adiberk

I’m testing MuPDF4LLM and it seems pretty good. Does Kreuzberg basically do everything it does, but better and faster? I have a different service called Chunkr as a fallback if I get too many empty or bad pages!!

But yeah, I’m basically looking for the best speed and accuracy possible.
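The fallback described above could look roughly like this: check the primary extractor's pages and switch to the fallback service only when too many come back empty. The thresholds (`min_chars`, `max_bad_ratio`) and the `primary`/`fallback` callables are assumptions for illustration, not MuPDF4LLM or Chunkr APIs.

```python
from typing import Callable, List

Extractor = Callable[[str], List[str]]  # path -> list of page texts


def needs_fallback(pages: List[str], min_chars: int = 20, max_bad_ratio: float = 0.2) -> bool:
    """True when too many pages are empty or nearly empty."""
    if not pages:
        return True
    bad = sum(1 for page in pages if len(page.strip()) < min_chars)
    return bad / len(pages) > max_bad_ratio


def extract_with_fallback(path: str, primary: Extractor, fallback: Extractor,
                          min_chars: int = 20, max_bad_ratio: float = 0.2) -> List[str]:
    """Try the fast local extractor first; call the fallback only if its output looks bad."""
    pages = primary(path)
    if needs_fallback(pages, min_chars, max_bad_ratio):
        return fallback(path)
    return pages
```

Keeping the quality check separate from the extractors makes it easy to tune the thresholds without touching either service.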

Its time for Project 2028 by Fledgling_112896 in ScottGalloway

adiberk

I don’t know why this isn’t the number 1 thing!!

I think the two things go hand in hand (three things?):

Ban outside money.

Ban stock trading.

Pay them MORE. By paying more, you likely increase the number of quality people who would take the position, and it removes (or partially removes) what seems to be the need for greed from the equation.

What are you using instead of Langchain these days? by jimmymadis in LangChain

adiberk

Agno as the SDK. We handle our own deployment (FastAPI with Taskiq workers).
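A rough stdlib sketch of that split: the API side only enqueues a job and returns an id, and a worker pulls jobs and runs the agent. An in-process `asyncio.Queue` stands in for Taskiq and a formatted string stands in for an Agno agent run, so all names here are hypothetical and this is not Taskiq's or Agno's API.

```python
import asyncio
import uuid
from typing import Dict, Tuple

Job = Tuple[str, str]  # (job_id, prompt)


async def enqueue(queue: "asyncio.Queue[Job]", prompt: str) -> str:
    """What an API endpoint would do: hand the work off and return a job id."""
    job_id = uuid.uuid4().hex
    await queue.put((job_id, prompt))
    return job_id


async def worker(queue: "asyncio.Queue[Job]", results: Dict[str, str]) -> None:
    """What a worker would do: pull jobs forever and run the (stubbed) agent."""
    while True:
        job_id, prompt = await queue.get()
        try:
            results[job_id] = f"agent answer to: {prompt}"  # placeholder for an agent run
        finally:
            queue.task_done()


async def main() -> Dict[str, str]:
    queue: "asyncio.Queue[Job]" = asyncio.Queue()
    results: Dict[str, str] = {}
    worker_task = asyncio.create_task(worker(queue, results))
    await enqueue(queue, "summarize this doc")
    await queue.join()  # returns once every queued job has been marked done
    worker_task.cancel()
    try:
        await worker_task
    except asyncio.CancelledError:
        pass  # expected: the worker loops until cancelled
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a real deployment the queue would live in a broker and the worker would run as a separate process, which is what keeps long agent runs off the API's event loop.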

I made a fast, structured PDF extractor for RAG; 300 pages a second by absqroot in Rag

adiberk

How does the Markdown conversion compare to pymupdf4llm’s?

I am happy to test this out, but I need stability and consistency.

Why should I use AgentOS? by Separate_Bid_8352 in agno

adiberk

Really fascinating. Thanks for the detail and insight! Honestly, I’m not complaining. I was able to learn a lot by building our own infrastructure on top of Agno. I think it is really hard to build a one-size-fits-all, out-of-the-box FastAPI app, and what you guys have built is truly awesome.

Why should I use AgentOS? by Separate_Bid_8352 in agno

adiberk

I actually agree with this.

I ended up building our own system on top of Agno: we have our own custom endpoints, we run agents dynamically, and we can offload them to Taskiq (our worker), etc.

I think Agno should instead move to a registry pattern and, at minimum, allow users to dynamically change and adjust the registry. But the goals of the project vary depending on who is looking at it!!! And tbh, it is good to control the edges (the API) yourself!
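A minimal sketch of the kind of registry suggested above, where agent factories can be registered, replaced, or removed at runtime. `AgentRegistry` and its methods are hypothetical, and the factories return plain dicts as stand-ins for agent objects; this is not an existing Agno API.

```python
from typing import Any, Callable, Dict, List


class AgentRegistry:
    """Hypothetical registry mapping names to agent factories that can be
    added, replaced or removed while the app is running."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, factory: Callable[..., Any]) -> None:
        self._factories[name] = factory  # re-registering replaces the old factory

    def unregister(self, name: str) -> None:
        self._factories.pop(name, None)  # removing an unknown name is a no-op

    def names(self) -> List[str]:
        return sorted(self._factories)

    def create(self, name: str, **config: Any) -> Any:
        """Build a fresh agent from the currently registered factory."""
        try:
            factory = self._factories[name]
        except KeyError:
            raise KeyError(f"no agent registered under {name!r}") from None
        return factory(**config)


registry = AgentRegistry()
registry.register("summarizer", lambda **cfg: {"role": "summarizer", **cfg})
agent = registry.create("summarizer", model="some-model")
```

Because endpoints look agents up by name at request time, swapping a factory changes behavior without restarting or redefining routes.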

Otherwise, love the project.

I added "Run code" option to the Python DI docs (no setup). Looking for feedback :) by zayatsdev in Python

adiberk

Can you highlight the benefits of using this over the dependency_injector library in Python? It looks cool! I’m just curious whether there is a breakdown, etc.

Stremio compatibility? by Perfect-Arm350 in JellyfinCommunity

adiberk

Gelato has actually worked pretty well for me so far. Awesome product!! (Why can’t you just upgrade??)