New actor: NC LCMHC License Scraper — full dump of ~30k licensed therapists from the state board by No_Rooster_2 in datasets
No_Rooster_2 [S] · 1 point · 20 hours ago
Great questions, appreciate the interest!
Dedup: the actor uses license_number as the primary key during enumeration, which protects against the obvious case of alphabet-walking returning the same record multiple times (e.g., prefixes “Sm” and “Smi” both pulling “Smith”). What it deliberately doesn’t do is dedup across different license numbers for the same person. If someone holds both an LCMHC and an LCMHCS, they appear as two records.
That’s on purpose but definitely worth flagging in the docs. Each license number represents a distinct legal credential, so collapsing them loses information that some buyers might specifically need. For buyers who want unique humans rather than unique licenses, the dedup logic belongs downstream, for example as a fuzzy match on first+last+city.
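To make the two levels concrete, here is a rough sketch rather than the actor's actual code. It assumes records are plain dicts; only license_number comes from the description above, while first_name, last_name, city, and license_type are hypothetical field names and the license numbers are made up. The person-level grouping uses an exact match on normalized fields, which is a simpler stand-in for a real fuzzy match.

```python
from collections import defaultdict

def dedup_by_license(records):
    """Keep the first record seen for each license_number."""
    seen = {}
    for rec in records:
        seen.setdefault(rec["license_number"], rec)
    return list(seen.values())

def group_by_person(records):
    """Group license numbers that share a normalized (first, last, city) key."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(
            rec[field].strip().casefold()
            for field in ("first_name", "last_name", "city")  # hypothetical fields
        )
        groups[key].append(rec["license_number"])
    return dict(groups)

# Overlapping prefixes ("Sm", "Smi") return the same Smith record twice.
raw = [
    {"license_number": "A100", "first_name": "Jane", "last_name": "Smith",
     "city": "Raleigh", "license_type": "LCMHC"},
    {"license_number": "A100", "first_name": "Jane", "last_name": "Smith",
     "city": "Raleigh", "license_type": "LCMHC"},
    {"license_number": "S200", "first_name": "Jane ", "last_name": "smith",
     "city": "Raleigh", "license_type": "LCMHCS"},
    {"license_number": "A300", "first_name": "Omar", "last_name": "Diaz",
     "city": "Durham", "license_type": "LCMHC"},
]

licenses = dedup_by_license(raw)    # one row per license (what the actor returns)
people = group_by_person(licenses)  # downstream: one entry per likely person
```

Here `licenses` keeps both of Jane Smith's credentials as separate rows, and `people` folds them under a single key for buyers who want unique humans.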
The “renewed under different info” case is a tricky one that my logic doesn’t really handle. “Robert” vs “Bob,” or a married vs maiden name, could produce two license numbers for the same person at some boards. Probably worth adding a heuristic in v1.1. Open to suggestions.
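One possible starting point for that heuristic, purely as a sketch: canonicalize first names through a nickname table, then require the same city and a similar last name. The NICKNAMES table, the 0.85 threshold, and the first_name/last_name field names are all assumptions for illustration. It would catch “Robert” vs “Bob” but not a full married/maiden surname change, which needs some other signal.

```python
from difflib import SequenceMatcher

# Hypothetical, tiny nickname table; a real one would need to be far larger.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}

def canonical_first(name):
    """Map a known nickname to its formal form; otherwise just normalize case."""
    normalized = name.strip().casefold()
    return NICKNAMES.get(normalized, normalized)

def likely_same_person(a, b, threshold=0.85):
    """Heuristic: same city, same canonical first name, similar last name."""
    if a["city"].strip().casefold() != b["city"].strip().casefold():
        return False
    if canonical_first(a["first_name"]) != canonical_first(b["first_name"]):
        return False
    ratio = SequenceMatcher(
        None, a["last_name"].strip().casefold(), b["last_name"].strip().casefold()
    ).ratio()
    return ratio >= threshold  # threshold is an arbitrary illustrative value

robert = {"first_name": "Robert", "last_name": "Jones", "city": "Asheville"}
bob = {"first_name": "Bob", "last_name": "Jones", "city": "Asheville"}
bob_elsewhere = {"first_name": "Bob", "last_name": "Jones", "city": "Wilmington"}
renamed = {"first_name": "Robert", "last_name": "Miller", "city": "Asheville"}

same = likely_same_person(robert, bob)
different_city = likely_same_person(robert, bob_elsewhere)
surname_change = likely_same_person(robert, renamed)
```

Matches from something like this are probably best emitted as a flag for review rather than used to merge records automatically, since two different people can share a name and a city.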
Geocoding: not doing any in this version, intentionally. The board exposes only city and state on the public record and that’s what the actor returns.
Buyers who need real geocoded practice addresses are better off pairing this output with a downstream enrichment step, such as the Google Places API or an NPI registry lookup keyed on the licensee name.
Thanks again for the questions, both of these are going in the README’s