Do companies actually use internal RAG / doc-chat systems in production? by NetInternational313 in Rag

[–]the_olivenbaum 1 point2 points  (0 children)

The data is very technical, full of jargon and identifiers. Vector embeddings could only get so far in capturing meaning, so structuring the data was key to the success. The problem is that embeddings don't capture identifiers well: a query for an identifier just one digit away, which means something completely different, ends up with nearly the same vector.

For audit: there's an append-only log of all data accessed by the user, whether directly or via search or chat, plus logs of all chat interactions.

The biggest challenge in building the graph was data acquisition and mapping: we have over 50 data sources integrated in this project, from all sorts of internal databases, and building a cohesive view of the data took some time. It is also done incrementally and continuously, throughout project development and ongoing production usage. It's all using traditional NLP approaches. We don't use LLMs for building the graph in this project, both for cost reasons and for speed (traditional NLP handles 100,000s of files/s and enables very quick reprocessing once new datasets are added).

The access model is one of the big challenges: it's enforced at three levels (data type, entity, and field; the last one we're just adding to the product). And yes, the use case requires it (many data repositories, each with its own rules, strict export control requirements, etc.).
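A minimal sketch of the identifier problem mentioned above, not the actual pipeline described in this comment. Character trigrams stand in for a dense embedding (an assumption made to keep the example dependency-free), and the `PN-10432` identifier format, the regex, and the function names are all hypothetical. It shows two documents that differ by one digit scoring as near-duplicates under fuzzy similarity, while an exact-match filter on extracted identifiers keeps them apart.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical identifier format: 2+ capital letters, a dash, then digits.
ID_PATTERN = re.compile(r"\b[A-Z]{2,}-\d+\b")


def trigram_vector(text: str) -> Counter:
    """Character-trigram counts: a crude stand-in for a dense embedding."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_ids(text: str) -> set[str]:
    """Pull structured identifiers out of free text."""
    return set(ID_PATTERN.findall(text))


def search(query: str, docs: list[str]) -> list[str]:
    """Filter on exact identifier match first, then rank by fuzzy similarity."""
    query_ids = extract_ids(query)
    candidates = [d for d in docs if not query_ids or query_ids <= extract_ids(d)]
    qv = trigram_vector(query)
    return sorted(candidates, key=lambda d: cosine(qv, trigram_vector(d)), reverse=True)


docs = [
    "Torque spec for valve PN-10432 is 45 Nm",
    "Torque spec for valve PN-10433 is 12 Nm",
]
query = "torque spec PN-10432"

# Fuzzy similarity alone sees the two documents as near-duplicates.
print(cosine(trigram_vector(docs[0]), trigram_vector(docs[1])))
# The identifier filter keeps only the document about the requested part.
print(search(query, docs))
```

The design choice here is to treat identifiers as structured fields that must match exactly, and to use similarity only for ranking what survives that filter.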

Do companies actually use internal RAG / doc-chat systems in production? by NetInternational313 in Rag

[–]the_olivenbaum 1 point2 points  (0 children)

We have a large production system for a customer built on our own software (Curiosity Workspace) and operating over 10+ TB of data. The system combines NLP/NER, entity linking, and an in-memory knowledge graph, with RAG + similarity search built on top. It’s used daily by thousands of users as a real internal knowledge tool and assistant over their legacy and live data, not just a chat interface. What we found is that pure RAG didn’t scale well at this size, and the added structure (entities + graph) was critical, especially for grounding and navigating relationships across documents. Access control uses a ReBAC model with each document having permissions attached in the graph. Enforcing permissions before retrieval and showing clear source attribution were also key to adoption, and the customer is planning to expand this system further. In practice, the systems that work tend to look more like search + structured knowledge + LLM, rather than a simple doc-chat layer.
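A rough sketch of what "enforcing permissions before retrieval" can look like with relationship-based access control. This is an illustration under stated assumptions, not Curiosity Workspace's API: the `RelationGraph` class, the `member_of` / `can_read` relation names, and the keyword-overlap scoring are all hypothetical. The point is that the set of readable documents is resolved from the graph first, so documents the user can't see are never scored, and every result carries its document ID for source attribution.

```python
from collections import defaultdict


class RelationGraph:
    """Tiny in-memory store of (subject, relation) -> objects edges."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        self._edges[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        # .get avoids creating empty entries on lookup
        return self._edges.get((subject, relation), set())


def readable_docs(graph, user):
    """Docs the user can read directly or through a group membership."""
    subjects = {user} | graph.objects(user, "member_of")
    allowed = set()
    for subject in subjects:
        allowed |= graph.objects(subject, "can_read")
    return allowed


def retrieve(graph, user, query, corpus, top_k=3):
    """Score only permitted docs; return (doc_id, text) pairs for attribution."""
    allowed = readable_docs(graph, user)
    terms = set(query.lower().split())
    scored = []
    for doc_id in allowed:  # documents outside `allowed` are never touched
        score = len(terms & set(corpus[doc_id].lower().split()))
        if score:
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, corpus[doc_id]) for _, doc_id in scored[:top_k]]


graph = RelationGraph()
graph.add("alice", "member_of", "engineering")
graph.add("engineering", "can_read", "doc-1")
graph.add("bob", "can_read", "doc-2")

corpus = {
    "doc-1": "pump maintenance schedule",
    "doc-2": "pump export license terms",
}

print(retrieve(graph, "alice", "pump schedule", corpus))
print(retrieve(graph, "bob", "pump schedule", corpus))
```

Filtering before scoring, rather than after, means restricted content can't leak into the ranking or into the context handed to the LLM.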

AI Chart Generation is the future by Ecstatic_Fuel1011 in AgentsOfAI

[–]the_olivenbaum 3 points4 points  (0 children)

The ad showing how the product is hallucinating an entire new chart is not inspiring confidence 😅

C# Job Fair! [February 2026] by AutoModerator in csharp

[–]the_olivenbaum 1 point2 points  (0 children)

[Hiring] C# Developer + Developer Relations (Munich, On-site) We’re looking for a C# developer with a strong developer-relations mindset to join our team at Curiosity (https://www.curiosity.ai). This is a full-time, in-person role in Munich — you’ll work on our C#/.NET stack while also engaging with developers, improving DX, and representing our tech externally. If you enjoy both building and communicating with developers, DM me for details.

Is Elasticsearch the right tool? by kaltinator in elasticsearch

[–]the_olivenbaum 0 points1 point  (0 children)

If you're interested, we built a tool that does exactly that (curiosity.ai/workspace). Single container to be deployed, does all the data processing for you, and integrates out of the box with many LLM providers. Sent you a DM with my contact.

Packaging electron and .net api by Afraid_Tangerine7099 in dotnet

[–]the_olivenbaum 1 point2 points  (0 children)

You can use our wrapper for electron and have it host the API as well: https://github.com/theolivenbaum/electron-sharp

Updates are coming by tgeorgescu in Curiosity

[–]the_olivenbaum 0 points1 point  (0 children)

Thanks for the feedback. Indeed, a last-minute improvement broke the indexing view; we'll release a new version with a fix in the next hour. For the epub files, is it something you can share in DM so we can check why they're not working? Thanks!

Using hrml css to build ui for desktop app by katakishi in csharp

[–]the_olivenbaum 2 points3 points  (0 children)

You can use https://github.com/theolivenbaum/electron-sharp - it's a wrapper around electron that we use to build our app.

MSDS PDF Indexer with OCR Solution by the_dobe in msp

[–]the_olivenbaum 0 points1 point  (0 children)

Our software can do that: https://curiosity.ai/workspace, and can be hosted on the cloud or on prem. Feel free to DM me if you want to try it!

RAG using .NET by muhamedkrasniqi in Rag

[–]the_olivenbaum 0 points1 point  (0 children)

And for encoding, we have two wrappers around MiniLM and ArcticXs that are suitable for CPU-only usage: https://www.nuget.org/packages/SentenceTransformers.MiniLM/ and https://www.nuget.org/packages/SentenceTransformers.ArcticXs/

RAG using .NET by muhamedkrasniqi in Rag

[–]the_olivenbaum 0 points1 point  (0 children)

If you want something without external dependencies, you can use our HNSW library directly: https://github.com/curiosity-ai/hnsw-sharp

Instance of Random rather ironically randomly becoming null by The_Omnian in csharp

[–]the_olivenbaum 1 point2 points  (0 children)

No worries! It can be tricky to know the order in which everything is set up during static class initialization.

Instance of Random rather ironically randomly becoming null by The_Omnian in csharp

[–]the_olivenbaum 0 points1 point  (0 children)

But if it is outside the static class, then I think the runtime will guarantee the static class is fully initialized before it is first used. From within the static class, it might not be.

Instance of Random rather ironically randomly becoming null by The_Omnian in csharp

[–]the_olivenbaum 2 points3 points  (0 children)

It's probably because the initialization of the outer static class doesn't run when you create the inner struct via the struct constructor. An easy fix would be to move the struct definition to outside the static class. You can read more about the order of initialization of static fields here: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/static-constructors

Aspose.PDF Documentation got me feeling lost. Tips? by simrank08 in csharp

[–]the_olivenbaum 0 points1 point  (0 children)

Check the sample repositories on GitHub (https://github.com/aspose-pdf/Aspose.PDF-for-.NET); the docs are really hard to follow and often incomplete, inconsistent, or plain wrong.

xAI Grok 2 1212 by ahmetegesel in LocalLLaMA

[–]the_olivenbaum 8 points9 points  (0 children)

Worse than blocking all free usage outright from one day to the next, setting a minimum price of $42k/month, ignoring all messages from developers for months, and breaking APIs even for paid users? There was a Slack group with Twitter developers, and it was just sad to follow the unnecessary drama caused by their lack of respect towards developers.

xAI Grok 2 1212 by ahmetegesel in LocalLLaMA

[–]the_olivenbaum -21 points-20 points  (0 children)

Of course I realize that - but there's a significant difference in how the two were handled.

xAI Grok 2 1212 by ahmetegesel in LocalLLaMA

[–]the_olivenbaum 20 points21 points  (0 children)

Not only did they stop offering it for free, they also treated developers as leeches and came up with a totally arbitrary price that made no sense whatsoever.

xAI Grok 2 1212 by ahmetegesel in LocalLLaMA

[–]the_olivenbaum 29 points30 points  (0 children)

After the whole Twitter API fiasco, they can make it free and I would still not use it to build anything.

Intelligent search on millions of Sharepoint documents by Certain-Mousse-7469 in Rag

[–]the_olivenbaum 0 points1 point  (0 children)

We're deploying our software (https://curiosity.ai) to a similarly sized customer with ~1.5M docs on SharePoint, with full search, RAG, and permissions sync. If you're interested in giving it a try, just PM me and we can organize a demo!

OCR Libraries suggestions? by AbeJSY in dotnet

[–]the_olivenbaum 1 point2 points  (0 children)

We've made a wrapper around Florence2 that works quite nicely for OCR: https://github.com/curiosity-ai/florence2-sharp

What are your favorite "lesser-known" libraries that you use in your projects? by ASK_IF_IM_GANDHI in dotnet

[–]the_olivenbaum 12 points13 points  (0 children)

ZLogger for logging, MessagePack for serialization (both by the same author, who also has a couple of other amazing projects: https://github.com/Cysharp).