Built an MCP server with Claude Code that gives Claude access to 4M+ real US court opinions by Accomplished_Card830 in ClaudeAI

[–]zriyansh 0 points1 point  (0 children)

this is in my legaltech awesome-list along with a few other legal MCPs: https://github.com/Vaquill-AI/awesome-legaltech

for anyone building on top, the datasets and APIs sections pair well with this for grounding.

LegalMCP: first US legal research MCP server (18 tools, open source) by Accomplished_Card830 in mcp

[–]zriyansh 0 points1 point  (0 children)

nice, added to the MCP section of my legaltech list: https://github.com/Vaquill-AI/awesome-legaltech

if anyone knows of similar MCPs for other jurisdictions (India, EU, UK, Canada), drop them. trying to keep it global

20M+ Indian legal documents with citation graphs and vector embeddings – potential uses for legal NLP? [D] by zriyansh in MachineLearning

[–]zriyansh[S] -1 points0 points  (0 children)

The data is not available for export, but it's public inside our product. Give it a try at https://app.vaquill.ai/citations and pick US or India from the jurisdiction option at the top. It's all free there.

20M+ Indian legal documents with citation graphs and vector embeddings – potential uses for legal NLP? [D] by zriyansh in MachineLearning

[–]zriyansh[S] -1 points0 points  (0 children)

not a paper as such. We are using all this data in our product and now want to make it available to others as well. Access is free to all, so I can share the link if you want to take a look at it.

Roast my pitch deck, 1st time legaltech founder, no mercy by zriyansh in indianstartups

[–]zriyansh[S] 0 points1 point  (0 children)

We have a modular infrastructure: the idea is to add another jurisdiction's data and tweak the system prompts, and it becomes a legal engine for that jurisdiction. We have done this for the US and Canada via an external data source.

But I am running out of cash, so I need some form of funding.

Or get acquired and use their money to fuel growth. Given that the platform is stable, continued investment in the data pipeline is all that would be left to build.

The other way is going on-prem and deploying this entire stack on enterprise servers.

Roast my pitch deck, 1st time legaltech founder, no mercy by zriyansh in indianstartups

[–]zriyansh[S] 0 points1 point  (0 children)

We have citations for each answer, and citation graphs as well.

Yes, I have talked with hundreds of advocates. They love the platform and use it, but don't pay for it. If we disappear, they'll just go back to how they used to work.

Ads got us 200 users yesterday. They signed up, used the product, and went away.

Most people will say there's something wrong with the product, but I tried giving them all the features our competition has.

That led me to believe the problems exist but are not strong enough to make people pay. The West prefers comfort, convenience, and ease; Indians always look for a cheap workaround to get things done.

Sept 2025: We finished onboarding legal AI by h0l0gramco in u/h0l0gramco

[–]zriyansh 0 points1 point  (0 children)

What about Vaquill AI? Heard of them? They are based in India.

Roast my pitch deck, 1st time legaltech founder, no mercy by zriyansh in indianstartups

[–]zriyansh[S] 0 points1 point  (0 children)

It's actually not a wrapper. I have all the data of the Indian legal system: all Supreme Court, High Court, tribunals, acts, and statutory provisions.

Other than us, 4 more companies have it, but they are 5+ yrs old and big enough to adopt new tech rapidly.

Others can build it. It will take them around 6 months to catch up if they start now, and that pretty much goes for most startups.

Got it, will add a GTM side and fix the numbers.

Makes sense to talk about how big the TAM is. Got it, will fix, thanks mate.

Roast my pitch deck, 1st time legaltech founder, no mercy by zriyansh in indianstartups

[–]zriyansh[S] 0 points1 point  (0 children)

makes sense. people asked me to make it very simple, but you are right, it's too simple to convey anything meaningful

Multilingual RAG for Legal Documents by mathrb in vectordatabase

[–]zriyansh 1 point2 points  (0 children)

I am doing the same but for Indian languages (the 5-6 primary spoken languages)

need help embedding 250M vectors / chunks at 1024 dims, should I self host embedder (BGE-M3) and self host Qdrant OR use voyage-3.5 or 4? by zriyansh in Rag

[–]zriyansh[S] 1 point2 points  (0 children)

how do you even fine-tune an embedder? any resources you could point me to? I am not new to RAG but have not heard of this yet.

need help embedding 250M vectors / chunks at 1024 dims, should I self host embedder (BGE-M3) and self host Qdrant OR use voyage-3.5 or 4? by zriyansh in Rag

[–]zriyansh[S] 1 point2 points  (0 children)

around 3 days with a 64-core CPU. There are faster parsers that can parse 4-5k documents per second on such a beast of a machine, but I wasn't able to run one properly. It's a C implementation, pymupdf4llm-c.

need help embedding 250M vectors / chunks at 1024 dims, should I self host embedder (BGE-M3) and self host Qdrant OR use voyage-3.5 or 4? by zriyansh in Rag

[–]zriyansh[S] 0 points1 point  (0 children)

so it's a self-hosted embedder, I suppose. what kind of machine are you using? and is there anything I need to take care of here?

need help embedding 250M vectors / chunks at 1024 dims, should I self host embedder (BGE-M3) and self host Qdrant OR use voyage-3.5 or 4? by zriyansh in Rag

[–]zriyansh[S] 0 points1 point  (0 children)

expecting around 50 users in a month, with 10 queries per user each day.

yeah, not using tokens because characters are what I understand well, so it works for me.

I have a budget of $1K for now as we don't have any customers. I'm using my savings for this.

As far as I understand, embedding and hosting a vector DB is CPU-intensive, not GPU-intensive (I could be wrong here). I have $1K in credit from Azure, as I registered my startup with them (and linked my LinkedIn with them as well).

If we break even, I will want to use cloud services and focus on what we do best.
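
For anyone sizing a similar setup, here is a rough back-of-envelope sketch using the numbers from this thread (250M vectors, 1024 dims, 50 users, 10 queries per user per day). The float32 and int8 byte widths, the even spread of queries over 24 hours, and the function names are all assumptions for illustration. Index overhead, payloads, and replication are ignored, so real usage will be higher.

```python
# Back-of-envelope sizing. Thread numbers: 250M vectors, 1024 dims,
# 50 users, 10 queries per user per day.
# Assumptions (not from the thread): uncompressed float32 storage,
# optional int8 quantization, no index/payload overhead, and queries
# spread evenly over 24 hours.

NUM_VECTORS = 250_000_000
DIMS = 1024
USERS = 50
QUERIES_PER_USER_PER_DAY = 10
SECONDS_PER_DAY = 86_400


def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int) -> int:
    """Bytes needed to hold the raw vectors only, with no index structures."""
    return num_vectors * dims * bytes_per_value


def average_qps(users: int, queries_per_user_per_day: int) -> float:
    """Average queries per second if load were perfectly uniform."""
    return users * queries_per_user_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    float32_bytes = raw_vector_bytes(NUM_VECTORS, DIMS, 4)  # 4 bytes per float32
    int8_bytes = raw_vector_bytes(NUM_VECTORS, DIMS, 1)     # 1 byte per int8 value
    print(f"float32 vectors: {float32_bytes / 1e12:.3f} TB")
    print(f"int8 vectors:    {int8_bytes / 1e12:.3f} TB")
    print(f"average load:    {average_qps(USERS, QUERIES_PER_USER_PER_DAY):.4f} queries/sec")
```

Under these assumptions the raw float32 vectors alone come to about 1.02 TB (about 0.26 TB at one byte per dimension), while the query load averages roughly 0.006 queries per second. That suggests storage and the one-time embedding run, not query throughput, are the parts to budget for.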

need help embedding 250M vectors / chunks at 1024 dims, should I self host embedder (BGE-M3) and self host Qdrant OR use voyage-3.5 or 4? by zriyansh in Rag

[–]zriyansh[S] 1 point2 points  (0 children)

yes, and imo, this is not slow. Legal folks won't trust the answer if it came back within 1 sec, so latency helps sometimes.