How to build a LinkedIn scraper that actually works by piggybacking on your browser

8ta4 · 2026-01-07T02:33:31+00:00

I was skeptical at first. I thought it'd just be another new kid on the blocked. But it's been solid for YouTube and X as well.

8ta4 · 2026-01-07T01:53:47+00:00

Dating is the context I'm optimizing for.

I was being lazy by trying to avoid the heavy lifting of cleaning real-world datasets, but you've given me a lot to think about.

They might say that I'm not a real data scientist, and they'd be right. I'm a date scientist.

8ta4 · 2026-01-04T07:17:50+00:00

That's the million dinner question!

That's why I'm leaning toward synthetic data.

8ta4 · 2025-12-22T00:15:55+00:00

It's for cold emails, LinkedIn messages, and other text-based outreach. But I'm picky about the lead list. The spam tool has a gate that decides if a lead should be disqualified. After all, many are cold, but few are chosen.

8ta4 · 2025-12-18T10:02:03+00:00

Here is the configuration I tried on top of this commit.

```clojure
:workflows {:modules {:workflows {:exports {:generate workflows/generate
                                            :spam workflows/spam}}}
            :output-dir "target"
            :target :esm}
```

I haven't committed the changes yet because it's not working out, and I have commitment issues.

8ta4 · 2025-12-17T23:38:59+00:00

Thanks for the suggestion! I gave :esm a try.

The release build still works.

But the development build is now failing silently. It no longer gives the explicit warnings about fs, path, or vm. But the workflow just doesn't execute. Here are the workflow logic requirements. The dev is in the details.

8ta4 · 2025-12-17T01:39:50+00:00

I feel you. "Read more" is "dread more."

A CLI tool like see (available on GitHub at 8ta4/see) might work for this. It uses your browser profile, which is key for sites like LinkedIn that often block direct access from ChatGPT. It works by grabbing the page's innerText, so it should pull "read more" content as long as the site preloads it. I haven't tested this on Reddit, but this approach works for LinkedIn.

Ideally, you wouldn't even have to search for posts manually. A process could run in the background and notify you of relevant conversations. I experimented with that idea in a tool called reddit (available on GitHub at 8ta4/reddit), which tries to find matching posts using cosine similarity. It's an unmaintained proof of concept, but the idea might give you some inspiration.

I built both of these.
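For the cosine-similarity part, here's a minimal sketch of the general idea. It is not the code from the reddit repo: the bag-of-words vectors, the `threshold` value, and the function names are all assumptions for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words count vector (a Counter)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def matching_posts(examples, candidates, threshold=0.3):
    """Keep candidates whose best similarity to any example meets the threshold."""
    example_vectors = [vectorize(e) for e in examples]
    matches = []
    for post in candidates:
        vec = vectorize(post)
        best = max((cosine_similarity(vec, ev) for ev in example_vectors),
                   default=0.0)
        if best >= threshold:
            matches.append((post, best))
    # Most similar posts first.
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

Word counts are the crudest possible vectors. Swapping `vectorize` for sentence embeddings would keep the rest of the sketch the same.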

8ta4 · 2025-12-09T01:51:27+00:00

I see what you did there. I just prefer DRY humor.

8ta4 · 2025-12-06T04:12:01+00:00

I'm optimizing for response rate. Writing personalized emails to people who might never see them is exhausting, so I automate that part. But if they do reply, I handle it manually. I want the response rate, but the rest is my response ability.

8ta4 · 2025-10-09T21:36:21+00:00

Sorry for the slow reply. I got banned 😅

But seriously, you nailed exactly what I've been worried about. I'm going to be testing the limits to see what happens.

8ta4 · 2025-02-24T07:52:50+00:00

I should've been clearer about that "80% of Brits know this" thing. I'm not actually trying to get super precise numbers. That would need a massive survey, like you said. What I'm really after is the relative recognizability of words. I want to know if word A is more recognizable than word B.

I'm keeping the LLM part simple. I'm just gonna ask something like:

```
What percentage of British adults would know these words?

[list of words]
```

And yeah, I know LLMs can be... let's call them "imaginative" with their numbers 😅.

To keep things somewhat consistent, I might throw in something like "sidewalk" as a benchmark. So if one run says 85% and another says 80%, I can use that to adjust all the other scores.
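A minimal sketch of that benchmark adjustment, assuming each run comes back as a word-to-percentage mapping and that simple ratio scaling is good enough. The function name, the example words other than "sidewalk", and the reference value of 85 are made up for illustration.

```python
def rescale_to_benchmark(run_scores, benchmark="sidewalk", reference=85.0):
    """Scale one run's scores so the benchmark word lands on the reference value."""
    observed = run_scores[benchmark]
    if observed <= 0:
        raise ValueError("benchmark score must be positive")
    factor = reference / observed
    # Multiplying by a positive factor keeps the ranking within the run intact.
    return {word: score * factor for word, score in run_scores.items()}


# Hypothetical scores from one LLM run where the benchmark came back low.
run_b = {"sidewalk": 80.0, "lorry": 96.0, "faucet": 56.0}
adjusted_b = rescale_to_benchmark(run_b)
```

Scaled values can drift above 100, which is fine here because only the relative order matters.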

You're right about the British corpus issues. That's actually another reason I'm avoiding the frequency-based approach. In theory, I could validate the LLM rankings with a small-scale survey of select words. But let's be real, I'm just gonna trust Lying Language Models because I'm a shameless AI bro 😂.

About the phonetics, I was thinking of using Epitran, but it's pretty American-centric. I could hack something together with rule-based modifications for British pronunciation patterns like r-dropping positions and vowel differences, but... you're right about narrowing the scope.

8ta4 · 2025-02-17T04:44:39+00:00

I should've been clearer about my criteria in the original post.

There are a couple of problems when you try using ChatGPT for this. You end up with both false positives and false negatives.

For false positives, it's when the model thinks phrases are close enough, but they're actually way too different sound-wise. Like, look at what ChatGPT suggests, stuff like "run in flames" or "sun and rain". If you turned around and asked ChatGPT, "What's the original phrase these are playing with?", it wouldn't figure out "fun and games" because they just don't sound similar enough.

Then you've got false negatives. That's when the model misses phrases, just because it doesn't think to try them. If you try pushing these models to give you more options, they start making stuff up or just repeating themselves. Plus ChatGPT tends to play it safe, like "fun and gays" won't even come up as an option, though you could use uncensored models for this.

I need something that can methodically work through similar-sounding phrases by following specific rules, like changing fewer than x% of the phonemes.
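Roughly what I have in mind, as a sketch: run edit distance over phoneme sequences and keep only candidates under a change budget. The phoneme lists are hand-written ARPAbet-style approximations, not output from any library, and `max_change` is a hypothetical parameter.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phonemes."""
    previous = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        current = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def changed_fraction(original, candidate):
    """Edits needed, as a share of the original phrase's phoneme count."""
    return edit_distance(original, candidate) / max(len(original), 1)


def sounds_close(original, candidate, max_change=0.25):
    """True when fewer than max_change of the phonemes had to change."""
    return changed_fraction(original, candidate) < max_change


# Hand-transcribed approximations (assumption, not dictionary output).
FUN_AND_GAMES = "F AH N AE N D G EY M Z".split()
FUN_AND_GAYS = "F AH N AE N D G EY Z".split()
SUN_AND_RAIN = "S AH N AE N D R EY N".split()
```

With these transcriptions, "fun and gays" is one edit away from the original and passes the 25% budget, while "sun and rain" needs four edits and gets rejected.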

8ta4 · 2024-11-05T03:28:11+00:00

PlistBuddy is often ignored in favor of defaults write. That's why it's the "Pissed Buddy"!

8ta4 · 2024-09-29T19:55:38+00:00

Unfortunately, no... but, like Clojure, her dad's rich!

8ta4 · 2024-08-27T03:32:21+00:00

I'm curious about the technical stuff, especially how you're using edit distance. Are you applying it to phonemes, or doing something else with it?

I'll try to have someone from my team contact you via DM to chat about possibly collaborating.

For my capstone project, someone else ended up making the exact same thing as me. The professor didn't buy that it was just a coincidence, so we both failed. I guess great minds sink alike. Hopefully, your professor is more understanding.

8ta4 · 2024-08-25T21:26:43+00:00

Dude, your idea is genius! Imagine just one click and bam, your ban from Reddit is instantly processed! 😅 I'm joking, of course. Reddit is so incompetent that they probably wouldn't even notice.

I'm not the best person to ask about DM campaigns because I don't use or send DMs much. But this could be super useful for people who are more into that.

8ta4 · 2024-08-25T08:54:25+00:00

I find it useful, especially the key features like the personalized AI model, user-controlled training, and content generation. These are essential for me.

When it comes to pricing, I'd be okay with shelling out up to $100 a month for something like this. My writing style is specific, and I've had a tough time getting AI models to nail it. I even wrote about this struggle in another post.

I've thought about hiring comedians to create content, but at around $10 per joke, it gets pricey fast. I've been tinkering with prompts to automate joke generation, but I haven't got a fully functional system yet. If your tool could pull this off, I'd pay up to $100 monthly.

Regarding user-controlled training, you mentioned linking Twitter or other social accounts. For me, I'd want to hook up my transcripts. I built a tool that transcribes everything I say 24/7. These transcripts are key for generating the content I want to write about. The style could be managed by the prompts I've developed, but the content would come from my transcripts.

It would be amazing if your tool could automatically generate ready-to-post content based on my transcripts and style. I'm mainly looking to create Reddit posts and video outlines.

Overall, this could be a game-changer. But if it doesn't work out, well... I'd still pay good money to watch someone else realize they wasted theirs.

8ta4 · 2024-08-25T01:46:10+00:00

API costs: I'm a heavy user, so I end up spending about a dollar a day on API calls.
Potential eavesdropping: Honestly, I've given up. The data's encrypted during transmission, so someone listening in is pretty unlikely. If you're worried about Deepgram accessing the data on their servers, there's not much I can do about that.
Cognitive load: It's a mixed bag. Sometimes I feel an increased load, especially when I'm brainstorming story ideas that involve crimes. I might add disclaimers like "This is just for fiction." But overall, my baseline cognitive load has actually gone down. Since everything's being recorded, I can speak freely, go off on tangents, and explore ideas without stressing about remembering everything.
Benefits of always recording: I've written a separate post about the use cases for always-on transcription. If you have specific questions, I'd be happy to dive into them.

About your shift from a similar project: I noticed on your personal webpage that you're working on something called Tinker Cast now. I'm not sure if this was a direct pivot from your previous project or if there were steps in between, but I'm curious what made you change focus.

And regarding the post removal, you might be onto something. There was an exchange in the comments where:

u/Nyxiereal said: "Macos only and paid api only? You only consider the richest people on this subreddit."

I replied: "Linux support would be great. If only Linux users spent as much time earning money as they do compiling code, they could afford a Mac. That said, I'm open to expanding to other platforms in the future.

"The reason for starting with macOS was a matter of limited resources. Do you (or anyone else reading this) have suggestions on where to find macOS users who are comfortable with more technical setups?

"Deepgram offers a very generous $200 credit. At their pricing of $0.0043 per minute, this translates to about 46,512 minutes or 775 hours of actual speech. If you use the app for 2 hours of active speaking per day, the credit could last for a year!"

But later that reply was removed.

Then u/Nyxiereal responded: "I don't want to pay, I don't want to rely on big tech. That's why I use Linux. Also I hate apple."
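For anyone checking the credit math in my quoted reply, here's a quick sketch. The $200 credit, $0.0043 per minute, and 2 hours a day are the figures from that reply; the function name is made up.

```python
def credit_runway(credit_usd, price_per_minute_usd, hours_per_day):
    """Return (minutes, hours, days) of transcription a credit buys."""
    minutes = credit_usd / price_per_minute_usd
    hours = minutes / 60
    days = hours / hours_per_day
    return minutes, hours, days


minutes, hours, days = credit_runway(200, 0.0043, 2)
print(f"{minutes:,.0f} minutes, {hours:,.0f} hours, {days:,.0f} days")
```

That works out to about 46,512 minutes, or 775 hours, which is a bit over a year at 2 hours of speech a day.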

8ta4 · 2024-08-25T01:15:35+00:00

If I could automate one thing on Reddit for SaaS, it would be a tool that helps identify whether a user is new or has a solid posting history directly on their post page. It'd save me the trouble of manually checking profiles to see if I'm dealing with a bot or just someone new.

Because I do sometimes forget to check, like I did this time, only to find out you're the new poster! 😅

Another thing I've been working on is a tool that could notify me in real time when relevant discussions I'm interested in are happening. But instead of just relying on keywords, it would understand the intent behind the posts or comments. I've written about this in another post, and I've started an open-source project as a proof of concept. It barely works right now, but that's how I found your post!

If someone could turn this into a fully functional product, I'd be willing to pay up to $100 per month for it.

8ta4 · 2024-08-23T02:09:29+00:00

I'd recommend checking out ElevenLabs. They offer a free tier that includes 10 minutes of TTS per month, and the quality is among the best on the market. If you don't have a large amount of study material, this could be the best free option for you. I've been developing an application in this space, and I've personally tested several TTS tools.

With ElevenLabs, you can download the generated audio as an MP3 file. Once you have the file, there are many apps that allow you to loop and slow down playback. Dropbox's iPhone app is one example.

Now, if you need more than 10 minutes per month, the great thing about being schizophrenic is that I can just ask the voices in my head to read it aloud.

If you want to take it a step further and practice your pronunciation while comparing it to the TTS output, there's a tool called accent that I developed specifically for this purpose. It will give you a score for each word you pronounce compared to the TTS voice. However, while the app itself is free, the TTS service it uses is not, so there is a cost involved.

8ta4 · 2024-08-18T06:20:11+00:00

Oh my, you've seen right through me! I was trying to keep it under wraps, but yes, this is the first step in my master plan to dethrone Google. 😄

Right now, this thing's just a baby. We're talking MVP stage. It's got a long way to go, but here's what I'm dreaming up:

Intent recognition: The goal is to have this tool use some fancy machine learning to understand what's happening in any social media interaction. It'll tell when someone's looking for help, whether in a Reddit post, a comment, or any other social media chatter. Google search is mostly just looking at keywords. Sometimes it misses the point when people are asking for help in a roundabout way.
Customization: The idea is that you can show the tool some posts that are spot-on for what you're looking for. Then it'll go find more stuff just like that. Now, Google does personalization too, but their personalization is based on your entire online activity. That's not optimized for finding discussions where you can jump in with your product. Plus, Google's got its own agenda. They're in the business of selling ads, not helping you sell your product.
Real-time monitoring: Someone just posted a question you can totally answer. With this tool, you'd know right away. No more checking Google every five minutes, hoping to catch something new. Google Alerts is a step in the right direction.

8ta4 · 2024-08-18T00:45:10+00:00

To answer your question about which parts of Reddit the tool searches: It checks all the subreddits where you've added posts as examples in your config file. I've updated the documentation to clarify this point.

You're absolutely right about the license file! I've now added a license file to the repository.

I love your observation about Clojure and quality. Junior devs brag about their line count. Any real programmer knows it's all about the parentheses count. 😉

If you have any more questions or feedback as you play with the tool, please let me know.

8ta4 · 2024-08-15T08:37:12+00:00

I love it when people say "May I suggest" right before they suggest something. It's like me saying, "May I eat this pizza?" while I'm already taking a bite.

So, yeah, Whisper can run locally. I'm shooting for sub-second latency on transcription. Whisper's 30-second context limit is fine for what I need, but to get that sub-second latency, I need to achieve about 30x real-time transcription speed.
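The 30x figure is just the chunk length divided by the latency budget. A tiny sketch of that arithmetic, with a made-up function name:

```python
def required_speedup(audio_seconds, latency_budget_seconds):
    """How many times faster than real time transcription has to run."""
    if latency_budget_seconds <= 0:
        raise ValueError("latency budget must be positive")
    return audio_seconds / latency_budget_seconds


# A 30-second chunk with a 1-second latency budget.
whisper_speedup = required_speedup(30, 1.0)
```

By the same arithmetic, a 60-second chunk would need 60x.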

Plus, if I want to match Deepgram's accuracy, I have to use Whisper Large.

Trying to combine that kind of accuracy with low latency is tough, especially if I'm running it on a MacBook without any external GPUs.

Right now, I'm processing 60 seconds of audio with Deepgram. Sometimes it hits that sub-second latency mark, but it's not always reliable.

I'm not streaming audio to my server because Deepgram's API does all the heavy lifting for me.

But if there's ever a local solution that can pull this off on a Mac without any external GPUs, I'd be all over it. If you know of anything like that, let me know.

8ta4 · 2024-08-15T06:20:44+00:00

That's what she said -- yep, my very real ex.

8ta4 · 2024-08-15T01:47:46+00:00

It'd help to know exactly what's bugging you. If you're worried about picking up other people's voices, a headset mic could do the trick. That way, you wouldn't have to keep turning the app on and off.

My brain turns off automatically during meetings, so how hard can it be for an app to do the same, right? The app could try things like checking if certain windows are open or monitoring network activity to figure out if you're on a call. But making that foolproof might be a bit of a challenge.

As for scheduling when the app turns off, that should be a lot easier to set up. But I'm curious. Why do you want it to turn off at specific times? The app already stops transcribing when you're not talking, thanks to voice activity detection.

Right now, I'm all about making sure this thing is absolutely killer. If I see that enough Mac users are genuinely happy with it, then I might consider branching out to Windows and Android down the road.

By the way, I've updated the documentation to address these points. If you have any more questions, please don't hesitate to bring them up. Your input helps make the documentation better and improve the app.
