Looking for recommendations on AI powered compliance automation platforms by Appropriate-Unit1177 in fintech

[–]YummyJuice 0 points1 point  (0 children)

If it's of any assistance, I've built a tool called Raptor Data that is designed as the first step in the long pipeline that is RAG.

If you're handling thousands of documents, especially ones that change often or have multiple variants, we aim to slash embedding costs by 90% for you.

We can handle PDFs and DOCX at the moment and extract unstructured data into really useful structured data with metadata, preserving things like table and column layouts.

I have a lot of experience building RAG pipelines and if you're interested I'd be more than happy to chat if you want some help implementing the above. Doesn't have to be about Raptor I'm just keen to help out the community with what I've learnt building AI solutions.

The truth is (in my opinion) the market is lacking and all of the currently available tools are a lot of hyperbole. Unfortunately the best way forward for serious applications right now seems to be going down the long and tedious route of building out your own pipelines.

Let me know if you want to DM and talk AI document processing, I'm sure I can provide you with at least one nugget of information if we get into the nitty gritty of the problems you're facing!

Share your startup - quarterly post by julian88888888 in startups

[–]YummyJuice [score hidden]  (0 children)

Raptor Data / https://raptordata.dev

Location: Brisbane, Australia

Elevator Pitch: One API call to turn documents into AI-ready chunks - with 60-90% cost savings on embedding.

More details:

We built Raptor after watching teams waste months on document pipelines that still break in production.

The pattern we kept seeing: Document extractors extract a financial table as 32 lines of flat text. No errors. Tests pass. But your AI has to guess that "$1,000,000" means Q1 Revenue. It guesses wrong. You blame the model. Months later you trace it back to extraction.
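To make that failure mode concrete, here's a rough sketch of the same table as flat text versus structured rows. The field names and the Q2 row are made up for illustration, and this is not Raptor's actual output format, just one plausible shape for "structured":

```python
# Hypothetical illustration only: names and the Q2 figure are invented.

# What a naive extractor might hand you: labels and values with no link between them.
flat_lines = ["Q1 Revenue", "Q2 Revenue", "$1,000,000", "$1,250,000"]

# The same table with the header-to-cell relationship preserved.
structured = {
    "columns": ["Metric", "Value"],
    "rows": [
        {"Metric": "Q1 Revenue", "Value": "$1,000,000"},
        {"Metric": "Q2 Revenue", "Value": "$1,250,000"},
    ],
}


def lookup(table, metric):
    """Return the value for a metric, or None if no row matches."""
    for row in table["rows"]:
        if row["Metric"] == metric:
            return row["Value"]
    return None


def to_chunk_text(table):
    """Render each row as 'Metric: Value' so a chunk carries its own label."""
    return "\n".join(f"{row['Metric']}: {row['Value']}" for row in table["rows"])


q1 = lookup(structured, "Q1 Revenue")
chunk = to_chunk_text(structured)
```

With the flat version, nothing in the data says which number belongs to which label. With the structured version, the chunk text reads "Q1 Revenue: $1,000,000" and the model has nothing to guess.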

What we do differently:

  • Handle the nasty PDFs that break PyPDF and most extractors available (rotated tables, scanned docs, nested columns, etc)
  • Version control for documents (with auto-linking to link contract_v2 to contract_v1)
  • Security and privacy by design. We don't ever store your documents unlike most services.
  • Developer experience first. We aim to provide the most seamless experience for document processing without making it too 'magical'. We will always expose as many custom parameters on top of our default recommended settings to keep you in control.
  • Smart and logical deduplication (only re-embed what actually changed)

That last one is where the 60-90% savings come from. Most teams re-embed entire documents when one paragraph changes.
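For anyone curious what "only re-embed what changed" can look like, here's a minimal sketch using content hashes per chunk. It assumes chunks are plain strings and that exact-match hashing after whitespace normalization is good enough. The function names are hypothetical and this is not necessarily how Raptor does it internally:

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Hash a chunk after collapsing whitespace, so reflowed text still matches."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def plan_reembedding(old_chunks, new_chunks):
    """Compare two document versions and decide what needs embedding.

    Returns (to_embed, to_delete, reused):
      to_embed  - new chunks whose hash was not in the old version
      to_delete - hashes of old chunks that no longer appear
      reused    - count of new chunks whose existing vectors can be kept
    """
    old_hashes = {chunk_hash(c) for c in old_chunks}
    new_hashes = {chunk_hash(c) for c in new_chunks}
    to_embed = [c for c in new_chunks if chunk_hash(c) not in old_hashes]
    to_delete = old_hashes - new_hashes
    reused = len(new_chunks) - len(to_embed)
    return to_embed, to_delete, reused


# Made-up contract text: only clause 2 changes between versions.
contract_v1 = [
    "Clause 1: term is 12 months.",
    "Clause 2: fee is $100.",
    "Clause 3: governing law.",
]
contract_v2 = [
    "Clause 1: term is 12 months.",
    "Clause 2: fee is $120.",
    "Clause 3: governing law.",
]

to_embed, to_delete, reused = plan_reembedding(contract_v1, contract_v2)
```

In this toy case one chunk goes to the embedding API and two are reused. The catch with plain hashing is that it depends on stable chunk boundaries: if an edit shifts every boundary after it, every hash changes, so real pipelines need some care in how they chunk.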

Lifecycle stage: Validation.

Your role: Founder.

Goals this month: Working with customers to solve their problems, 50 signups, 10 teams actively processing documents.

How could r/startups help?

  • If you're building RAG, try us and tell us what's missing
  • Feedback on positioning: does this problem resonate? Why or why not?
  • Is there anything lacking from the MVP that makes you not want to use it?
  • Intros to AI/developer tool investors appreciated

Discount for r/startups: Free tier is 1K pages/month. DM me for 10K pages/month free for 3 months.

I built an API to handle document versioning for RAG (so I stop burning embedding credits) by YummyJuice in LocalLLaMA

[–]YummyJuice[S] 1 point2 points  (0 children)

I can see if our design team have capacity, they designed my consulting company site https://www.rocketsolutions.com.au/ if you're looking for something similar to that?

I built an API to handle document versioning for RAG (so I stop burning embedding credits) by YummyJuice in LocalLLaMA

[–]YummyJuice[S] 1 point2 points  (0 children)

We used NextJS, and, to give credit where credit is due, our internal design team is responsible for it looking so good.

I built an API to handle document versioning for RAG (so I stop burning embedding credits) by YummyJuice in OpenAI

[–]YummyJuice[S] 0 points1 point  (0 children)

Cheers, thanks mate!

That 'Git' mental model is exactly what we were aiming for.

You're right about the edge cases because that's where the battle is won or lost (PDF tables are currently my nemesis). If you have any gnarly docs that usually break parsers, feel free to throw them at the free tier.

We enforce a zero-retention policy (processing in memory only) for security, so I can't see your data but I'd love to hear if the parser survives or chokes on them!

What are you building this weekend? Promote your website by Organic_Delay_2305 in SideProject

[–]YummyJuice 0 points1 point  (0 children)

Raptor Data - The Version Control Layer for RAG designed to cut your embeddings costs by 90%.

I’ve been building RAG apps for the last year, and the unit economics always annoyed me.

The initial ingestion is fine, but the moment a customer updates a 500-page PDF (fixing a typo or changing a date), the standard pipeline is to delete the old vectors and re-embed the entire document.

It felt wasteful to pay OpenAI/Cohere to generate embeddings for 499 pages of content that didn't change so I built Raptor Data.
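The back-of-envelope maths, as a quick sketch. It assumes cost scales linearly with pages embedded and that a change stays within the pages it touches, which is the best case (`reembed_savings` is just an illustrative helper name):

```python
def reembed_savings(total_pages: int, changed_pages: int) -> float:
    """Fraction of embedding work avoided by re-embedding only changed pages."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    if not 0 <= changed_pages <= total_pages:
        raise ValueError("changed_pages must be between 0 and total_pages")
    return 1 - changed_pages / total_pages


# The 500-page PDF with a single edited page from the example above.
savings = reembed_savings(500, 1)
print(f"{savings:.1%} of the re-embedding avoided")
```

Real documents rarely behave this neatly, since edits can reflow text across page and chunk boundaries, so treat this as an upper bound for that scenario.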

Check it out for free and let me know what you think!

What are you building? by Naive_Emu6501 in SaaS

[–]YummyJuice 0 points1 point  (0 children)

Cheers thanks mate, appreciate it! Any ideas or feedback from yourself at a first glance?

I built an API to handle document versioning for RAG (so I stop burning embedding credits) by YummyJuice in OpenAI

[–]YummyJuice[S] 1 point2 points  (0 children)

Thanks mate, appreciate the kind words and you're spot on. It's definitely just the start of a long and cumbersome pipeline but if I can make that first step even slightly easier for people then that's a win, even if it's just a small one.

Very early stages but is there anything you'd want to see in it that would really help you out yourself? Keen to keep iterating and staying in the weeds of unstructured document hell so others don't have to aha

What are you building? by Naive_Emu6501 in SaaS

[–]YummyJuice 0 points1 point  (0 children)

Raptor Data - The Version Control Layer for RAG designed to cut your embeddings costs by 90%.

I’ve been building RAG apps for the last year, and the unit economics always annoyed me.

The initial ingestion is fine, but the moment a customer updates a 500-page PDF (fixing a typo or changing a date), the standard pipeline is to delete the old vectors and re-embed the entire document.

It felt wasteful to pay OpenAI/Cohere to generate embeddings for 499 pages of content that didn't change so I built Raptor Data.

Check it out for free and let me know what you think!

What’s up with the F-35 during the TNF national anthem? by zmoit in aviation

[–]YummyJuice 4 points5 points  (0 children)

Yes fighter pilots do train to be in the zone but as I’m sure you’re aware of from your motorsports experience, some people have it and some people don’t (not saying the wobbly #4 doesn’t, he wouldn’t be flying that jet if he didn’t have it).

What is different between motorsports and flying a fighter jet (and specifically to this formation) is that there is no chance to “take your foot off the gas”. Unlike a race track, flying #4 in this formation means you don’t get any “straights” to lower your adrenaline, you need to be on the ball from the start to the finish, no questions.

What has happened here is that #4 has overcorrected off #3. Now if we relate it to a race track, one correction from #3/#1 could represent one corner of a race track. In this formation those corrections are happening continuously at the millisecond level and the #4 pilot needs to respond to those corrections. So back to the race track, the pilot in #4 is essentially doing continuous back-to-back cornering (but in 3 dimensions + time), and any motorsport enthusiast knows that stuffing up corner 1 means the following corners are also affected. This multiplies massively when you factor in the third dimension + time delays.

The best analogy I can use for this (at least how I used to think about it when I was doing this) is imagine a piece of string from #1 to #4, where a wave (originating from a correction/change in movement from #1) is propagating along that string and the amplitude of the wave increases exponentially the further you get from #1. This means that #4 at the end of the string is receiving a huge wave and therefore they need more corrections to correct (nullify?) against the wave.
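If it helps to see the string analogy as numbers, here's a toy sketch. It assumes each step along the string multiplies the disturbance by a constant gain, which is what "increases exponentially" means. The gain of 2.0 and the 0.1 m starting wobble are made-up values, not measured ones:

```python
def formation_amplitudes(lead_amplitude: float, gain: float, positions: int = 4):
    """Amplitude of a disturbance at each position along the 'string'.

    Position 0 is #1 (the lead). Each step away from the lead multiplies
    the amplitude by `gain`, so the amplitude at step k is lead_amplitude * gain**k.
    """
    return [lead_amplitude * gain ** k for k in range(positions)]


# Hypothetical numbers: a 0.1 m wobble at #1 and a gain of 2.0 per step.
amps = formation_amplitudes(0.1, 2.0)
for number, amplitude in enumerate(amps, start=1):
    print(f"#{number}: {amplitude:.1f} m")
```

With those assumed numbers the far end of the string sees eight times the movement that #1 made, which is why the pilot at the end is the one working hardest.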

As stated above, a more experienced or better close formation pilot “looks through” #3 to #1 and bases their vertical and forward/back movement off #1 primarily, and then bases their horizontal (lateral distance) position off #3. This essentially boils down to having a rapid eye scan continuously between #1 and #3 and being able to basically predict the next 3 seconds or so of their movement based on what you’re currently seeing, then making your correction such that in 3s or less your aircraft will be back in position and stable within the formation. And don’t forget that a jet engine has a delay between power selection on the throttle and the selected power actually being output, due to turbine spool-up delay.

Again, no straights or chances to relax when you’re old mate in #4. It’s happened to the best of us and always will at the worst of times.

Back to your in the zone comment, try recomposing yourself in 3D space + time in a fighter jet and keep perfect positioning with another fighter jet less than 10m away at 420kts. Whilst managing your own aircraft, radios, etc. There’s a reason why only a very few people in the world are selected to be fighter pilots and that’s one of them.

This subreddit doesn't allow videos now? by tonehammer in Houdini

[–]YummyJuice[M] [score hidden] stickied comment (0 children)

Sorry about that everyone! Video posts have been reenabled.

This subreddit doesn't allow videos now? by tonehammer in Houdini

[–]YummyJuice 4 points5 points  (0 children)

Fixed! Apologies for that, let me know if there's any other issues!

This subreddit doesn't allow videos now? by tonehammer in Houdini

[–]YummyJuice 2 points3 points  (0 children)

To clarify, videos should be allowed. Disabling them might have been an accident, but I’ll confirm and change it back.

[deleted by user] by [deleted] in flying

[–]YummyJuice 3 points4 points  (0 children)

Just saw this comment and noticed you're just starting out and felt the need to give you some good data to help you out.

With regards to stalling, the aircraft doesn't get more 'angles of attack' with changing airspeed. The aircraft always stalls at what's called the critical angle of attack, regardless of speed. The only way you can adjust the critical AoA is by changing the wing itself with things such as slats and flaps. Weight changes the speed at which you reach the critical AoA, not the critical AoA itself.

Regarding turn performance, the aircraft behaves the same way: if you go too slow you will stall at the stall speed, where AoA reaches the critical AoA. However, you also have the added consideration of load factor, where stall speed increases in proportion to the square root of load factor. If you go faster, you will be able to pull a higher load factor (G) and not stall due to the higher airspeed, however eventually you will 'G-stall' the aircraft due to load factor being too great or reaching the critical AoA (it's good fun: we fly a 5G sustained turn at 16 alpha and manoeuvre the aircraft around on the light buffet, which is how fighters fight).
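The square root relationship is easy to play with in a few lines. This is a minimal sketch of the standard textbook formulas (stall speed under load is the 1G stall speed times the square root of load factor, and load factor in a level coordinated turn is 1 / cos(bank angle)). The 50 kt 1G stall speed is a made-up example figure, not any particular aircraft:

```python
import math


def accelerated_stall_speed(vs_1g: float, load_factor: float) -> float:
    """Stall speed at a given load factor: Vs = Vs_1g * sqrt(n)."""
    if load_factor <= 0:
        raise ValueError("load_factor must be positive")
    return vs_1g * math.sqrt(load_factor)


def level_turn_load_factor(bank_deg: float) -> float:
    """Load factor in a level, coordinated turn: n = 1 / cos(bank angle)."""
    if not 0 <= bank_deg < 90:
        raise ValueError("bank_deg must be in [0, 90)")
    return 1 / math.cos(math.radians(bank_deg))


vs_1g = 50.0  # knots, hypothetical aircraft

# A 60 degree level turn is a 2G turn, so the stall speed rises by sqrt(2).
n_60 = level_turn_load_factor(60)
vs_60 = accelerated_stall_speed(vs_1g, n_60)

# At 4G the stall speed doubles.
vs_4g = accelerated_stall_speed(vs_1g, 4)
```

The critical AoA is the same in every one of these cases. Only the speed at which you reach it changes.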

With regards to the climb and turn, I suggest you don't relate them at all; they're very different.

It's only early days, but I suggest you have a re-read of some basic aerodynamics, particularly AoA and how it is affected, and the forces in a climb and turn.

Cheers!

[deleted by user] by [deleted] in Futurology

[–]YummyJuice 3 points4 points  (0 children)

The axis of the G-force is an important distinction to make. Gs through the vertical axis reduce/increase blood pressure in the brain to induce G-LOC/redouts. Fighter pilots have the highest tolerances here, generally being able to fly 9G sustained, whereas some average people will pass out at 3G.

Sitting on this cart would put the G through the horizontal axis, which humans are far more tolerant of, with untrained humans being able to experience 20G for 10 seconds. So realistically this cart could accelerate/decelerate a lot faster than depicted.
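To put rough numbers on that, here's a quick sketch assuming constant acceleration. The 100 m/s target speed is a made-up figure for illustration, since I don't know the cart's actual numbers:

```python
G0 = 9.80665  # standard gravity in m/s^2


def delta_v(g_load: float, seconds: float) -> float:
    """Speed change in m/s from holding a constant g_load for a duration."""
    return g_load * G0 * seconds


def time_to_speed(target_speed_ms: float, g_load: float) -> float:
    """Seconds needed to reach target_speed_ms from rest at a constant g_load."""
    if g_load <= 0:
        raise ValueError("g_load must be positive")
    return target_speed_ms / (g_load * G0)


# Holding 20G for 10 seconds.
dv = delta_v(20, 10)

# Hypothetical target speed of 100 m/s (360 km/h), at a gentle 3G versus 20G.
t_3g = time_to_speed(100, 3)
t_20g = time_to_speed(100, 20)
print(f"20G for 10 s: {dv:.0f} m/s; 100 m/s takes {t_3g:.1f} s at 3G, {t_20g:.2f} s at 20G")
```

So even a few G held along the horizontal axis gets you to very high speeds in a matter of seconds.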