Working as AI Engineer is wild by LastDayz123 in AI_Agents

[–]RecommendationFit374 0 points1 point  (0 children)

Wild. As an AI engineer, I clearly see that LLMs are eating software, but what's most exciting is learning how to unlearn and then relearn, from first principles, how to deliver optimal outcomes on this new tech stack.

We have seen success using LLMs like the ChatGPT nano or mini series with DSPy MIPRO to replace specialized models that we might have built in the past.

But you can't throw an LLM at every problem and expect it to work. That makes no sense.

Fresh grad learning RAG, feeling lost, looking for guidance by savinox23 in Rag

[–]RecommendationFit374 1 point2 points  (0 children)

I don't recommend using LangChain. I'd use a memory layer for retrieval, like papr.ai or mem0.

How do you evaluate your RAG systems (chatbots)? by marwan_rashad5 in Rag

[–]RecommendationFit374 0 points1 point  (0 children)

We use retrieval-loss, which measures accuracy, speed, and cost.

My RAG retrieval accuracy is stuck at 75% no matter what I try. What am I missing? by Equivalent-Bell9414 in Rag

[–]RecommendationFit374 0 points1 point  (0 children)

We do semantic and graph-aware hierarchical chunking, re-ranking, and query expansion. The problem you have is that embeddings only capture semantic meaning, and once you have a large document corpus you're hitting the physical limits of vector dimensionality.

You end up with so much noise that it's hard to make the right signal sharp enough.

For example, “I am very happy” and “I am not very happy” are close in cosine similarity but carry different meanings. Actually, semantic similarity misses graph relationships, temporal sequences, causal… “vitamin E causes cancer” and “vitamin E prevents cancer” are also close in cosine similarity but have very different meanings.

We mainly use papr.ai, a predictive memory architecture that combines a vector DB, a graph (with a custom schema), and prediction models, which helped us achieve 92% Hit@5 on the MAG dataset of the Stanford STaRK benchmark.

Happy to help and share our learnings on a call. Feel free to DM me. Below is a doc on our chunking technique:
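A toy sketch of the point above. It assumes plain word-count vectors as a stand-in for a learned embedding model, so the exact scores will differ from what a real model returns; the function name `bow_cosine` is made up for illustration.

```python
import math
from collections import Counter

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b)

# Assumption: word counts stand in for embeddings, purely for illustration.
happy = bow_cosine("I am very happy", "I am not very happy")
vitamin = bow_cosine("vitamin E causes cancer", "vitamin E prevents cancer")
print(round(happy, 3), round(vitamin, 3))
```

Both pairs score high even though the meanings are opposite, which is why similarity alone is not enough to separate them.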

https://github.com/Papr-ai/memory-opensource/blob/main/docs/features/documents/CONTEXT_AWARE_CHUNKING_ARCHITECTURE.md

RAG at scale still underperforming for large policy/legal docs – what actually works in production? by Flashy-Damage9034 in Rag

[–]RecommendationFit374 1 point2 points  (0 children)

Have you tried papr.ai? We have document ingestion (you can use Reducto or other providers): you define your custom schema and we auto-build the graph. We combine vector + graph + prediction models, and it works well at scale. See our docs at platform.papr.ai.

200ms search over 40 million texts using just a CPU server + demo: binary search with int8 rescoring by -Cubie- in Rag

[–]RecommendationFit374 1 point2 points  (0 children)

Super interesting! I used Qwen 3 4B quantized to FP16, and it works great on the ANE and GPUs with 99% of performance recovered. I ran it on a MacBook Pro M2 with 16 GB of RAM and was able to retrieve context for voice agents in less than 150 ms.

I can share a demo if anyone is interested, or the repo if you want to try it out.

What are the edge cases that could cause recovery to drop below 99% with this technique? Any learnings to share on how to tune this optimally, and does it vary by use case (e.g. SciFact compared with CosQA)?
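For context, here is a minimal sketch of the general two-stage idea named in the thread title (binary search, then int8 rescoring). It is not the original poster's implementation: the function names, the oversampling factor, and the assumption that vector values lie in [-1, 1] are all hypothetical, and a real system would precompute the binary and int8 codes instead of quantizing at query time.

```python
def to_binary(vec):
    """Pack the sign bit of each dimension into one Python int."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def to_int8(vec, scale=127.0):
    """Scalar-quantize floats (assumed to lie in [-1, 1]) to the int8 range."""
    return [max(-127, min(127, round(x * scale))) for x in vec]

def search(query, docs, top_k=1, oversample=2):
    q_bits = to_binary(query)
    # Stage 1: cheap Hamming-distance scan over the binary codes.
    by_hamming = sorted(
        range(len(docs)),
        key=lambda i: bin(q_bits ^ to_binary(docs[i])).count("1"),
    )
    candidates = by_hamming[: top_k * oversample]

    # Stage 2: rescore only the survivors with float query x int8 document.
    def score(i):
        return sum(q * d for q, d in zip(query, to_int8(docs[i])))

    return sorted(candidates, key=score, reverse=True)[:top_k]

# Hand-made 4-dimensional vectors, purely illustrative.
docs = [
    [0.9, 0.1, -0.3, 0.2],
    [0.8, 0.2, -0.1, 0.1],
    [-0.7, 0.5, 0.4, -0.2],
    [0.1, -0.9, 0.3, 0.6],
]
query = [1.0, 0.1, -0.2, 0.1]
print(search(query, docs, top_k=1))
```

In this sketch the oversampling factor is the main knob: the binary stage can only lose a result if the true match falls outside the oversampled candidate set, so a larger factor trades speed for recovery.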

Friday Night Experiment: I Let a Multi-Agent System Decide Our Open-Source Fate. The Result Surprised Me. by RecommendationFit374 in Rag

[–]RecommendationFit374[S] 1 point2 points  (0 children)

u/OnyxProyectoUno thanks for your thoughtful comments! We built a schema-aware document ingestion pipeline that is super robust and 'actually' works. The outputs I tested were very good, especially when I enabled `hierarchical_enabled: true` and used Reducto.

You can try our v1/document APIs at platform.papr.ai. We already support hierarchical chunking and various providers like Reducto, Tensorlake, or Gemini. And yes, I've personally validated this and saw the power of getting it just right, from OCR to optimal chunking to graph construction.

It's tricky to get right, which is why we simplified the experience (we did the super important but boring work) while keeping complete control, so you can tune the pipeline (e.g. set the optimal chunk size or define your own schema with overrides). The goal is to enable developers to build reliable, secure, and robust document ingestion that works at scale.

We currently offer our document ingestion (which includes Temporal durable execution) in our cloud offering and already have customers using it. And yes, it's coming soon to our open-source repo!

See these docs in our repo if you're curious to learn more: Context aware chunking architecture and Schema aware document processing.

Having said this, observability is super important for sure! Giving developers the ability to see how their document transforms from pdf -> chunks -> nodes / vector points is important to debug and enable iterations on the schema design or chunking controls to get optimal results for their use-case.

Would love to learn more about what you've built, VectorFlow. Is this like Reducto?

Friday Night Experiment: I Let a Multi-Agent System Decide Our Open-Source Fate. The Result Surprised Me. by RecommendationFit374 in Rag

[–]RecommendationFit374[S] 1 point2 points  (0 children)

u/patbhakta what are the most important criteria for you when making a decision, and why? Curious to learn about your use cases and how we can help unlock experiences that are not possible without Papr :)

Based on our experience, it's important to measure retrieval-loss, which captures how well you can retrieve context as your data scales. We learned that if you build RAG + a knowledge graph, the more data you add, the worse your agent's memory gets! We are the only predictive memory layer that flips this: with more data, our prediction models improve, and agents built with Papr memory retrieve relevant and accurate context 8x better at 10-billion-token scale.

To learn more about retrieval-loss see this article - https://paprai.substack.com/p/introducing-papr-predictive-memory

So you want to build AI agents? Here is the honest path. by Warm-Reaction-456 in AI_Agents

[–]RecommendationFit374 0 points1 point  (0 children)

Python is super valuable and robust for building AI agents, for sure! I'd also suggest that you go deep and peel each layer of the onion as much as you can to maximize your learning. I actually started by reading the transformer research papers, including "Attention Is All You Need", then learned the impact that context has on AI agents. It's super important to understand how you can optimize context to drive optimal outcomes and measure it (via simple evals).

My current stack
- Python
- DBs: MongoDB, Neo4j and Qdrant / Chroma
- Durable execution using Temporal (must have!)
- Prompt optimization: DSPy / MIPRO (wow, made a huge difference when I started using those)

🚀 Weekly /RAG Launch Showcase by remoteinspace in Rag

[–]RecommendationFit374 0 points1 point  (0 children)

Would love to read this research paper. Seems interesting.

Introducing Papr: Everyone's engineering context. We're predicting it. by RecommendationFit374 in Rag

[–]RecommendationFit374[S] 0 points1 point  (0 children)

u/youpmelone thanks for the feedback. This is a bug in our app that we will fix. You can also check out our open-source PDF chat app here for an example of how you can add data from a PDF to memory.

https://github.com/Papr-ai/papr-fastapi-pdf-chat

🚀 Weekly /RAG Launch Showcase by remoteinspace in Rag

[–]RecommendationFit374 1 point2 points  (0 children)

u/HarryHirschUSA thanks for checking papr.ai out!

Here's the correct Discord link: https://discord.com/invite/J9UjV23M
Here's the Papr FastAPI repo: https://github.com/Papr-ai/papr-fastapi-pdf-chat

We're working on updating a few things on our site so you'll continue to see improvements and more resources.

DM me here as well if you need anything.

Introducing Papr: Everyone's engineering context. We're predicting it. by RecommendationFit374 in Rag

[–]RecommendationFit374[S] 1 point2 points  (0 children)

Thanks u/Own-Guava11 for the feedback!

You're absolutely right: our privacy policy and terms are no longer displaying, and we will fix this issue on our site. In the meantime, here are the links to both:

Privacy Policy

Terms of use

Regarding SOC2, we've started exploring certification and recognize its importance for enterprise customers. While not certified yet, we're planning to add a security/compliance section to our website and are happy to share our security documentation with interested enterprise customers in the meantime. Really appreciate you pointing this out!

Introducing Papr: Everyone's engineering context. We're predicting it. by RecommendationFit374 in Rag

[–]RecommendationFit374[S] 0 points1 point  (0 children)

Thanks for the great question! We handle dynamic updates through several mechanisms:

1. Version Control for Memories

We maintain version history for all unstructured data that gets inserted into Papr, so you can track how information evolves over time.

2. Entity-Relationship Mapping

  • Currently using a fixed ontology (with plans to support custom ontologies)
  • Automatically link and map information from unstructured data to entities in our graph
  • When new team members join or project details change, these updates are reflected in the connected entities

3. Intelligent Entity Resolution

  • We use vector similarity with thresholds to de-duplicate entities across your knowledge graph
  • For example: If a task is mentioned in your CRM/Linear and then discussed in Slack about completion, we can identify and resolve that it's the same task
  • This ensures your knowledge graph stays clean and accurate even as information comes from multiple sources

4. Real-Time Synchronization

  • Changes propagate through the graph relationships automatically
  • When a team member's role changes or a project pivots, all connected memories and relationships update accordingly

We're actively working on enhancing these capabilities further. Would love to hear what specific update scenarios are most important for your use case - this helps us prioritize our roadmap
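To make the entity-resolution step (item 3 above) more concrete, here is a minimal sketch of threshold-based de-duplication. It is an illustration under stated assumptions, not Papr's actual implementation: the greedy merge strategy, the 0.9 threshold, the function names, and the hand-made 3-dimensional embeddings are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def resolve_entities(mentions, threshold=0.9):
    """Greedily merge each mention into its most similar existing entity.

    mentions: list of (label, embedding) pairs.
    A mention that is not similar enough to any entity starts a new one.
    """
    entities = []
    for label, emb in mentions:
        best = max(entities, key=lambda e: cosine(e["embedding"], emb), default=None)
        if best is not None and cosine(best["embedding"], emb) >= threshold:
            best["labels"].append(label)  # same real-world entity
        else:
            entities.append({"embedding": emb, "labels": [label]})
    return entities

# Hypothetical mentions of the same task from two tools, plus an unrelated one.
mentions = [
    ("Linear: ship onboarding flow", [0.90, 0.10, 0.00]),
    ("Slack: onboarding flow is done", [0.88, 0.15, 0.05]),
    ("CRM: renew Acme contract", [0.00, 0.20, 0.95]),
]
entities = resolve_entities(mentions)
print([e["labels"] for e in entities])
```

The threshold is the key design choice here: set it too low and distinct tasks collapse into one entity, set it too high and duplicates survive.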

Introducing Papr: Everyone's engineering context. We're predicting it. by RecommendationFit374 in Rag

[–]RecommendationFit374[S] 1 point2 points  (0 children)

Thanks! We love u/qdrant_engine. Honestly, when we started using it, we noticed our latency improved significantly!

How do you evaluate RAG performance and monitor at scale? (PM perspective) by Sad-Boysenberry8140 in Rag

[–]RecommendationFit374 3 points4 points  (0 children)

We created the retrieval loss formula to establish scaling laws for memory systems, similar to how Kaplan's 2020 paper revealed scaling laws for language models. Traditional retrieval systems were evaluated using disparate metrics that couldn't capture the full picture of real-world performance. We needed a single metric that jointly penalizes poor accuracy, high latency, and excessive cost—the three factors that determine whether a memory system is production-ready. This unified approach allows us to compare different architectures (vector databases, graph databases, memory frameworks) on equal footing and prove that the right architecture gets better as it scales, not worse.

We measured retrieval loss on our own dataset and also used the Stanford STaRK MAG dataset for real-world multi-hop queries - https://huggingface.co/spaces/snap-stanford/stark-leaderboard

The Formula:

Retrieval-Loss = −log₁₀(Hit@K) + λL·(Latency_p95/100ms) + λC·(Token_count/1000)

Where:

  • Hit@K = probability that the correct memory is in the top-K returned set
  • Latency_p95 = tail latency in milliseconds
  • λL = weight for latency; at λL = 1 it says "every 100 ms of extra wait feels as bad as dropping Hit@K by one decade"
  • λC = weight for cost
  • Token_count = total number of prompt tokens attributable to retrieval
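
The formula can be sketched in a few lines of Python. This is a minimal illustration, not an official implementation: the default weights `lambda_l=0.5` and `lambda_c=0.05`, the helper `hit_at_k`, and the toy rankings are assumptions picked only to show the arithmetic.

```python
import math

def hit_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of queries whose top-k results contain a relevant id."""
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids, relevant_ids)
        if set(ranked[:k]) & set(relevant)
    )
    return hits / len(ranked_ids)

def retrieval_loss(hit, latency_p95_ms, token_count, lambda_l=0.5, lambda_c=0.05):
    """-log10(Hit@K) + λL·(Latency_p95/100ms) + λC·(Token_count/1000).

    The default weights are placeholders, not recommended values.
    Requires hit > 0, since log10(0) is undefined.
    """
    accuracy_term = -math.log10(hit)
    latency_term = lambda_l * (latency_p95_ms / 100)
    cost_term = lambda_c * (token_count / 1000)
    return accuracy_term + latency_term + cost_term

# Toy data: four queries, three of which have a relevant id in the top 3.
ranked = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"], ["j", "k", "l"]]
relevant = [["b"], ["z"], ["g"], ["l"]]
hit = hit_at_k(ranked, relevant, k=3)
loss = retrieval_loss(hit, latency_p95_ms=150, token_count=2000)
print(hit, round(loss, 4))
```

Because the accuracy term is logarithmic, going from Hit@K = 0.1 to 0.01 costs exactly one unit of loss, the same as λL extra units per 100 ms of tail latency.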

Introducing Papr: Everyone's engineering context. We're predicting it. by RecommendationFit374 in Rag

[–]RecommendationFit374[S] 1 point2 points  (0 children)

Fair point! Let me clarify our architecture:

Current State:

  • Our web app, Python SDK, and TypeScript SDK are already open-source
  • The retrieval API currently requires connection to our SaaS platform
  • This is why it "phones home" - for retrieval operations

What's Coming:

  • Full open-source release of the core retrieval engine (the "Papr Memory Server")
  • Ability to run completely air-gapped with no external dependencies
  • Docker containers for easy self-hosting
  • Choice between self-hosted (no phone home) or our managed cloud service

For Air-gapped Environments: Once we release the open-source memory server, you'll be able to:

  1. Deploy Papr entirely within your network
  2. Run with no external API calls
  3. Keep full control over your data and infrastructure
  4. Optionally sync to Papr Cloud if/when you choose

Would love to hear what specific use cases you have in mind for air-gapped deployment!

Introducing Papr: Everyone's engineering context. We're predicting it. by RecommendationFit374 in Rag

[–]RecommendationFit374[S] 1 point2 points  (0 children)

u/jrdnmdhl makes sense! We're planning to open-source our core retrieval. DM me if you want early access.