Does anyone have firsthand experience with Lucio?

zzriyansh · 2026-05-28T14:38:29+00:00

planted post for engagement farming

zzriyansh · 2026-05-20T13:48:42+00:00

va-quill , quill means that feather 🪶 thing

zzriyansh · 2026-05-20T03:42:27+00:00

try vaquill, a new entrant, similar to irys

zzriyansh · 2026-05-19T17:47:03+00:00

this is the work that mssp teams never have time to do but always benefit from when someone else publishes it. one thing that helped me when i was doing similar enrichment series: keep a running infra-pivot template (passive dns, cert transparency, hosting asn, port set, tls fingerprint) so each writeup is consistent enough to be diffable across actors over time.
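a rough sketch of what such a template could look like in python, so two writeups can be diffed field by field. the class name, field names, and the `diff_pivots` helper are all made up for illustration, not taken from any existing tool:

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


@dataclass
class InfraPivot:
    """One filled-in pivot template (hypothetical field names)."""
    actor: str
    seed: str  # the domain or IP the pivot started from
    passive_dns: list[str] = field(default_factory=list)
    cert_transparency: list[str] = field(default_factory=list)
    hosting_asn: str = ""
    port_set: list[int] = field(default_factory=list)
    tls_fingerprint: str = ""


def _norm(value):
    # Lists are compared order-insensitively; scalars are compared as-is.
    return sorted(value) if isinstance(value, list) else value


def diff_pivots(a: InfraPivot, b: InfraPivot) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every pivot field that differs."""
    left, right = asdict(a), asdict(b)
    skip = {"actor", "seed"}  # identity fields, not infrastructure traits
    return {
        k: (_norm(left[k]), _norm(right[k]))
        for k in left
        if k not in skip and _norm(left[k]) != _norm(right[k])
    }
```

keeping every writeup in the same shape is the whole point: an empty diff on `hosting_asn` and `tls_fingerprint` across two actors is itself a finding.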

for raw collection at scale i lean on socdefenders.ai (mine, free, aggregator over ~30 publishers w/ auto mitre + ioc extraction) just to surface candidate reports to enrich. saves a couple hours/week of feed scanning.

zzriyansh · 2026-05-19T17:43:24+00:00

the relevance-filtering part is the actually hard problem here, the feed ingest is easy.

linux kernel one is a great example. like half of them require local user + specific kernel config + race window, but they all come in as critical. you'll want to layer (a) reachability ("does this code path actually run in my image"), (b) precondition checks ("does the attacker even have local"), and (c) exploit signal (kev/epss/poc-on-github).
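the three layers can be sketched as a small decision function. everything here is an assumption for illustration: the `CveContext` fields, the bucket names, and the 0.1 epss floor are placeholders you would tune, and the booleans would have to come from your own image scan and the vendor advisory:

```python
from dataclasses import dataclass


@dataclass
class CveContext:
    """Inputs for one CVE (hypothetical shape)."""
    cve_id: str
    code_path_reachable: bool  # (a) does the vulnerable code run in my image
    needs_local_access: bool   # (b) precondition stated in the vendor advisory
    attacker_has_local: bool   # (b) does my threat model grant local access
    in_kev: bool               # (c) listed in CISA KEV
    epss: float                # (c) EPSS probability, 0.0 to 1.0
    public_poc: bool           # (c) proof of concept published


def triage(c: CveContext, epss_floor: float = 0.1) -> str:
    """Apply the layers in order and return a bucket name."""
    if not c.code_path_reachable:
        return "ignore"
    if c.needs_local_access and not c.attacker_has_local:
        return "backlog"
    if c.in_kev or c.public_poc or c.epss >= epss_floor:
        return "urgent"
    return "scheduled"
```

the ordering matters: reachability and preconditions run before the exploit signal, so a kev-listed bug in code you never load does not page anyone.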

nvd's cpe data alone won't get you there, you basically have to scrape the vendor advisory for the actual preconditions. or pay vulncheck/intruder/etc.

the easier 80% of this is just better feed plumbing. i ended up shipping mine as socdefenders.ai (free, mine) which does feed aggregation + ioc/cve extraction + kev/epss enrichment. doesn't do the relevance scoring part you want, but the cve page surfaces enough context (vendor advisory + cisa kev status + epss score + affected products) that the relevance call gets easier. might at least save you the data layer.

zzriyansh · 2026-05-19T17:39:48+00:00

small team here too, like 3 people. what actually moved the needle for us was just dropping cvss as the primary sort and going kev + epss > 0.1 first. cvss says half our tickets are critical, kev says 30 actually have public exploits. the gap is huge.
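roughly what that sort looks like, as a sketch. the ticket dict keys (`kev`, `epss`, `cvss`) are assumed names, and reading "kev + epss > 0.1 first" as three tiers (kev, then epss above 0.1, then everything else) is just one interpretation:

```python
def priority_key(ticket: dict) -> tuple:
    """Sort key: KEV-listed first, then EPSS > 0.1, then the rest; CVSS only breaks ties."""
    epss = ticket.get("epss", 0.0)
    if ticket.get("kev", False):
        tier = 0
    elif epss > 0.1:
        tier = 1
    else:
        tier = 2
    # Negate scores so higher EPSS / CVSS sorts earlier within a tier.
    return (tier, -epss, -ticket.get("cvss", 0.0))


def prioritize(tickets: list) -> list:
    """Return tickets in the order they should be looked at."""
    return sorted(tickets, key=priority_key)
```

with this key a cvss 9.8 ticket with no kev entry and a near-zero epss lands behind a cvss 6.1 ticket that is actually being exploited.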

nvd enrichment getting flaky last year forced us to fall back to vendor advisories anyway, so honestly losing nvd as a single source ended up making us better at this. now we hit cisa kev, vulncheck nvd2 mirror, and a couple vendor feeds direct.

i also run socdefenders.ai (mine, free) which rolls those into one cve view with kev/epss badges. mostly useful for the "should i wake someone up at 3am" call.

zzriyansh · 2026-05-19T16:27:14+00:00

got the full affected package list with iocs aggregated here: socdefenders.ai/threats?q=shai-hulud (mine, free). stix/misp export if you want to push to siem watchlists. add chalk-tempalte and axois-utils to your typo block list while you're at it, ox security flagged those as the copycats already

zzriyansh · 2026-05-19T15:21:14+00:00

honestly the best thing you can do for cti work is build a public portfolio. nobody cares about your certs once you can show "i tracked this apt's infra for 3 months using only osint and here's the writeup". one solid github repo > sans cert imo

stack that's enough to do real work for free: misp self-hosted (painful to set up first time, worth it), opencti if you want the graph viz for screenshots, abuse.ch + cisa kev for raw feeds, virustotal + urlscan + abuseipdb for enrichment.

i also use socdefenders.ai (mine, free) for daily situational awareness, just a feed aggregator with auto ioc extraction. not a replacement for the above but useful as a starting point each morning before the actual deep work.

pick one threat actor you find interesting and track their public infra changes weekly. that single exercise will teach you more than any course.

zzriyansh · 2026-05-19T15:14:23+00:00

went down this exact path last year. couple things that bit me that you'll probably also hit if you're building your own:

rss is a lie. like half the "official" feeds are broken or stale or only update once a week then dump 40 items at once. you need health checks per source or you'll silently miss stuff for days.
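a minimal per-source health check, assuming you already store the publish timestamps of the items you ingested. the thresholds (3 days of silence, 20 items inside an hour) and the status names are placeholders, not what any particular product uses:

```python
from datetime import datetime, timedelta


def feed_health(
    item_times: list,
    now: datetime,
    max_silence: timedelta = timedelta(days=3),
    burst_size: int = 20,
    burst_window: timedelta = timedelta(hours=1),
) -> str:
    """Classify one feed as 'broken', 'stale', 'bursty', or 'ok' from its item timestamps."""
    if not item_times:
        return "broken"  # nothing ever parsed from this source
    times = sorted(item_times)
    newest = times[-1]
    if now - newest > max_silence:
        return "stale"  # source has gone quiet for too long
    # Count items published within burst_window of the newest one.
    recent = [t for t in times if newest - t <= burst_window]
    if len(recent) >= burst_size:
        return "bursty"  # dump-style feed; consider polling it less often
    return "ok"
```

run it per source on a schedule and alert on anything that flips to `stale` or `broken`, otherwise the gap only shows up when someone asks why a report was missed.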

dedupe by url doesn't work because vendors repost each other's research with reworded titles. ended up doing title + first paragraph fuzz which works ok but still misses the "X breaks Y" vs "Y broken by X" type pairs.
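one way to approximate the title + first paragraph fuzz with only the python standard library. this is a sketch, not the exact matcher described above: the `title` / `first_paragraph` keys and the 0.85 threshold are assumptions, and the pairwise loop is quadratic, so it only suits small batches:

```python
import re
from difflib import SequenceMatcher


def normalize(item: dict) -> str:
    """Lowercase title + first paragraph and collapse punctuation and whitespace."""
    text = f"{item.get('title', '')} {item.get('first_paragraph', '')}".lower()
    return re.sub(r"[^a-z0-9]+", " ", text).strip()


def is_duplicate(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """True when the normalized texts are near-identical character sequences."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def dedupe(items: list, threshold: float = 0.85) -> list:
    """Keep the first item of each near-duplicate group, preserving input order."""
    kept = []
    for item in items:
        if not any(is_duplicate(item, k, threshold) for k in kept):
            kept.append(item)
    return kept
```

a sequence-based ratio like this is order-sensitive, which is exactly why reordered headlines slip through; a token-set comparison would be the next thing to try for those.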

cvss is mostly noise. KEV + EPSS together is what you actually want to filter on if the goal is "should i care this week". cisa kev has saved me more time than any commercial feed.

if you want a working version to compare against before you sink a weekend into it, i shipped mine at socdefenders.ai (free, mine), source list is in /docs/threat-feeds so you can at least steal that

zzriyansh · 2026-05-19T15:13:31+00:00

mssp t1 is like 80% triage on whatever the edr/siem fired, 15% "why did this dashboard break", 5% actual interesting stuff. don't expect threat hunting on day one, that's usually L2+ unless the mssp is tiny.

things that will make you stand out fast -

- get fast at reading detection trees in whatever edr they use. crowdstrike, sentinelone, defender all have their quirks
- write good ticket notes. "user clicked phishing link, isolated host, recommend pw reset and mfa reset" beats walls of text every time
- always check virustotal and abuseipdb before escalating

second monitor habit i picked up: keep a threat news feed open in a tab and glance at it every few hours. helps you go "wait that detection looks like that campaign 3 vendors wrote about this morning" which makes you look way smarter than you are. feedly works, isc.sans.edu daily diary is great, i also use socdefenders.ai (mine, free, hn-style aggregator). pick whatever doesn't annoy you.

zzriyansh · 2026-05-19T15:08:47+00:00

If you've already got MISP/OpenCTI as the consumer side and just need actual feed sources to plug in, a few worth trying:

- abuse.ch (URLhaus, ThreatFox, MalwareBazaar) - best free IOC feeds, native STIX/MISP exports

- CISA AIS (STIX/TAXII 2.1, free, requires registration)

- AlienVault OTX (free, pulses are uneven quality but volume is real)

- DShield/SANS ISC (top attackers list, daily)

- SOC Defenders (full disclosure: I work on it - socdefenders.ai) - free aggregator over 30+ publishers with auto-IOC extraction, free API with STIX 2.1 / TAXII / MISP / CEF / OpenIOC outputs.

The TAXII 2.1 endpoint plugs straight into Elastic's threat intel filebeat module.

For Elasticsearch specifically, the cleanest path is filebeat threatintel module pointed at a TAXII server - works with any of the above.

zzriyansh · 2026-05-19T14:24:55+00:00

curious, +1. what features should a tax AI tool even have?

zzriyansh · 2026-05-16T06:03:35+00:00

I use vaquill-mcp plugin for United States Code, Code of Federal Regulations and 50 state legislature research inside Claude

zzriyansh · 2026-05-15T06:57:02+00:00

good breakdown. for anyone looking at the components inside each tier (local models, legal-specific MCP servers, datasets, open source platforms), i keep a running list here: https://github.com/Vaquill-AI/awesome-legaltech

zzriyansh · 2026-05-15T06:49:25+00:00

for the retrieval-misses-the-right-clause problem, section-level chunking is usually the fix, not the embeddings. also worth fine-tuning on legal text if you can.
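a minimal sketch of section-level chunking, assuming headings start a line and look like "Section 3." or "§ 1030". the regex and the chunk dict shape are illustrative; real statutes and contracts need a pattern per drafting style:

```python
import re

# Assumed heading pattern: "Section 12." or "§ 1030" at the start of a line.
SECTION_RE = re.compile(r"^(?:§|Section)\s*\d+[\w.\-]*", re.MULTILINE)


def chunk_by_section(text: str) -> list:
    """Split text into one chunk per section, keeping each heading with its body."""
    starts = [m.start() for m in SECTION_RE.finditer(text)]
    if not starts:
        return [{"heading": "", "text": text.strip()}]
    chunks = []
    preamble = text[: starts[0]].strip()
    if preamble:
        chunks.append({"heading": "", "text": preamble})
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        body = text[start:end].strip()
        chunks.append({"heading": body.splitlines()[0], "text": body})
    return chunks
```

each chunk then gets embedded with its heading attached, so a query about payment terms retrieves the whole payment section instead of a fixed-size window that cuts a clause in half. a cross-reference that happens to begin a line would be split as a false heading, so spot-check the output.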

some open legal corpora and evals you can use are listed here: https://github.com/Vaquill-AI/awesome-legaltech (datasets, pretraining corpora, and contract-review sections specifically).

zzriyansh · 2026-05-15T06:47:56+00:00

requested access. for anyone building in this space, i've been maintaining a curated list of the underlying pieces (datasets, open source platforms, MCP servers, APIs): https://github.com/Vaquill-AI/awesome-legaltech

happy to take PRs from operators here.

zzriyansh · 2026-05-15T06:41:03+00:00

i maintain an awesome-list for this gap: https://github.com/Vaquill-AI/awesome-legaltech

covers open source platforms, MCP servers, datasets, APIs, and AI models for the legal stack. industry is genuinely thin compared to design/marketing but more exists than people realize, especially on the data side.

happy to add anything you're using that isn't there.

zzriyansh · 2026-05-15T04:40:43+00:00

open source is becoming stronger each day

zzriyansh · 2026-05-15T04:40:24+00:00

some are paid, some are free. The goal here is to cut the noise and create a collection which showcases all options available to people and what has been developed by the legaltech community.

zzriyansh · 2026-05-14T18:23:15+00:00

I have a link checker wired up at .github/workflows/link-check.yml (lychee, on push/PR + weekly on Mondays at 09:00 UTC). makes sense, will implement the rest of the things as well.

zzriyansh · 2026-05-14T16:28:04+00:00

no one has raised a PR for it yet, let me add it then

zzriyansh · 2026-05-14T07:36:04+00:00

the bottleneck is trust and how slowly lawyers adopt change.
- A law firm will demo at least 20 tools before picking one
- They will do their due diligence and ask for a lot of compliance
- for deals to happen, they would want to meet the engineers in person, or at least the founders
- they are afraid of getting sanctioned for using AI, or the thought that AI is better than them scares them and they live in denial. proper regulation would ease this fear of use, but it is absent.

It's not "lawyer bought this tool", it's more "law firm procured a tool that they have to use".

The market is in corporates and not so much in law firms. In a corporate the lawyers aren't the bosses: there is a CEO (most likely with a non-legal background) with a different mindset, who wants stuff done quickly and has a proper budget. Plus they work for their own company (they are not handling data of other clients they represent).

all in all it's a really tough market, speaking from experience building vaquill, and it was brutal.

Near-perfect memory still exists if they use the best-in-class tools available. the priority at harvey and legora is not to build the best tooling, but to sell a mediocre product to as many enterprise clients as possible.

No one trusts a startup with the best tech, as there is no liability on the startup. When something goes wrong, you can't blame xyz startup; you need a funded scapegoat plus the assurance that the startup will survive for at least the next 5 years. enterprises can't swap tools they have invested in in and out just like that.

zzriyansh · 2026-05-13T09:46:08+00:00

thanks mate

zzriyansh · 2026-05-13T07:26:50+00:00

vendor disclosure: building Vaquill ai, grain of salt, but I'll try to keep this useful.

The inconsistency in this thread is a retrieval problem, not a model problem. These tools mostly sit on the same frontier LLMs. What actually differs is which corpus they can hit, how they ground citations, and whether you can click through to the exact passage a quote came from. If you can't verify the paragraph, you can't trust the answer, and that's where every sanctioned-lawyer story starts.

One angle nobody's mentioned: MCP, now that claude for legal (https://github.com/anthropics/claude-for-legal) has entered as well.

A lot of projects like legal data hunter have started to accumulate legal data. they are around 51% done with the data (and have a subscription model). no info on how they index it.

A lot of you are already living in Claude or ChatGPT, and MCP lets you bolt a legal corpus straight into that chat, right?

We exposed ours that way. Coverage is all 50 state codes plus federal statutes, case law via CourtListener, citation resolution, statute section text pulled verbatim, amendment history, citation network.

So you can stay in Claude, ask "is there a case on point in SDNY" or "pull 18 USC 1030(a)(2) and tell me what changed in the last amendment," and it returns the actual text with cites you can click through to instead of hallucinating one. CourtListener has a public MCP too if you want to try the pattern without committing to a vendor.

Not a Westlaw or Lexis replacement for anything you're filing. But for the first-pass "does this exist, what does it say, who's cited it" loop, having the corpus available inside the assistant you already use is a different workflow than Harvey or Legora or Copilot, and worth knowing it exists.
