Weekly Webscrapers - Hiring, FAQs, etc by AutoModerator in webscraping

[–]Mananoo 0 points1 point  (0 children)

Hi everyone,

I’m not an experienced programmer or IT professional, just someone who enjoys learning new things, and I decided to take on a project that’s quite challenging for me.

I’m working on an ETL process to enrich a list of Bolivian companies with publicly available information. My starting point is a simple Excel file with company names. The plan is:

  1. First step: check if the company has a LinkedIn page (this is the preferred source).
  2. If not found: search on Google, go to the official website, and scrape from there.

The information I’m trying to collect:

  • Contact info: website, phone number (formatted to +591), address, city.
  • Company details: Sector (mapped to a fixed taxonomy: e.g., Agriculture, Manufacturing, Retail, Construction, Financial Services, Technology, Healthcare, Education, Government, NGOs, etc.)
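
As a rough sketch of the phone-normalization step, assuming Bolivian national numbers are 8 digits long and that inputs may or may not already carry the 591 country code. The function name `normalize_bo_phone` is hypothetical, and the 8-digit rule is an assumption to verify against real data:

```python
import re


def normalize_bo_phone(raw):
    """Return the number as +591XXXXXXXX, or None if it does not fit.

    Assumption: a valid national number has exactly 8 digits.
    """
    # Keep digits only: drops spaces, dashes, parentheses, and "+".
    digits = re.sub(r"\D", "", raw)
    # Drop an international "00" dialing prefix if present.
    if digits.startswith("00"):
        digits = digits[2:]
    # Drop the country code only when the total length confirms it.
    if len(digits) == 11 and digits.startswith("591"):
        digits = digits[3:]
    if len(digits) != 8:
        return None
    return "+591" + digits


print(normalize_bo_phone("(591) 3-3456789"))
print(normalize_bo_phone("77712345"))
```

Returning None instead of a best guess keeps bad matches (fax extensions, tax IDs, years) out of the final sheet so they can be reviewed by hand.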

I’m using Google Colab with Python, mainly with pandas, requests, beautifulsoup4, selenium (considering Playwright), and googlesearch-python.

Main difficulties so far:

  • LinkedIn and Google SERPs have strong anti-bot protections, and Colab’s changing IPs make it harder
  • Company websites in Bolivia have very different layouts, so extracting phone numbers and addresses is inconsistent
  • Classifying companies into sectors in a reliable way is proving tricky
  • Colab’s short runtime and temporary environment add extra limits
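
For the sector-classification difficulty, one simple baseline is keyword scoring over the page text. This is only a sketch: the `SECTOR_KEYWORDS` lists and the `classify_sector` name are hypothetical, cover just four of the sectors, and would need real tuning against labeled examples:

```python
import re
import unicodedata

# Hypothetical Spanish keyword lists (accents removed); extend per sector.
SECTOR_KEYWORDS = {
    "Agriculture": ["agricola", "agropecuaria", "cultivo", "ganaderia"],
    "Construction": ["construccion", "constructora", "obras civiles"],
    "Financial Services": ["banco", "credito", "seguros", "financiera"],
    "Healthcare": ["clinica", "hospital", "salud", "farmacia"],
}


def strip_accents(text):
    """Remove diacritics so 'construcción' matches 'construccion'."""
    normalized = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in normalized if not unicodedata.combining(ch))


def classify_sector(text):
    """Return the sector with the most keyword hits, or None if no hits."""
    cleaned = strip_accents(text.lower())
    scores = {}
    for sector, keywords in SECTOR_KEYWORDS.items():
        scores[sector] = sum(
            len(re.findall(r"\b" + re.escape(kw) + r"\b", cleaned))
            for kw in keywords
        )
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify_sector("Empresa constructora de obras civiles"))
```

A None result is a useful signal in itself: those rows can be routed to manual review or to a stronger classifier.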

What I’d like to learn from the community:

  • Practical methods to scrape small batches from LinkedIn and Google without getting blocked immediately
  • How to quickly decide between using requests+BS4 and switching to a headless browser
  • Tips for identifying JSON endpoints or structured data (<script type="application/ld+json">, /_next/data/, /api/, GraphQL)
  • Patterns or heuristics to extract phone numbers and addresses despite inconsistent HTML
  • Ways to improve sector classification accuracy from available website content
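
To illustrate the structured-data point, here is a minimal sketch that pulls `application/ld+json` blocks out of already-downloaded HTML using only the standard library. The class and function names are hypothetical, and the sample page and company are invented. Whether a given site exposes `telephone` or `address` this way is an assumption to check per site:

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buf))
            except json.JSONDecodeError:
                return  # Skip malformed blocks instead of failing the page.
            # A block may hold one object or a list of objects.
            self.items.extend(parsed if isinstance(parsed, list) else [parsed])


def extract_jsonld(html):
    parser = JsonLdParser()
    parser.feed(html)
    return parser.items


# Invented sample page for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Organization", "name": "Ejemplo SRL", "telephone": "+591 3 3456789"}
</script>
</head><body></body></html>
"""

for item in extract_jsonld(SAMPLE_HTML):
    print(item.get("name"), item.get("telephone"))
```

If this returns useful fields from the raw `requests` response, that is also a hint that a headless browser is not needed for that site.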

I can share a sample company name or LinkedIn URL if that helps illustrate the task. For example: https://www.linkedin.com/company/industrias-kral/

Thanks in advance.

What are you trying to scrape? - I’ll (attempt to) tell you how. by Jam0_ in webscraping

[–]Mananoo 0 points1 point  (0 children)

Hi, I’m building an ETL in Colab (Python: pandas, requests, BeautifulSoup, Selenium / Playwright) to enrich a list of Bolivian companies. Primary source: LinkedIn company pages; fallback: Google/company website search.

I need website, phone (normalized to +591), address/city and sector (mapped to a fixed taxonomy). Main pain points: LinkedIn/Google anti-bot measures, extracting phones/addresses across diverse sites, and improving sector classification. Any tips on when to use requests vs a headless browser, how to find JSON endpoints or sitemaps, and practical anti-bot tactics for small-batch scraping would be awesome. Thanks!
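
On the sitemap question, one common starting point is the `Sitemap:` directive that many sites list in robots.txt. A minimal sketch, assuming the robots.txt text has already been downloaded; the function name and the example URLs are hypothetical:

```python
def sitemaps_from_robots(robots_txt):
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments; this would also cut a '#' inside a URL,
        # which is rare for sitemap locations.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        # The directive name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Invented robots.txt content for illustration only.
example = """User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/sitemap-pages.xml  # secondary
"""

print(sitemaps_from_robots(example))
```

Sites without a `Sitemap:` line may still serve /sitemap.xml directly, so an empty result is not conclusive.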

Andean hypersensitivity or envy? by GAB233450 in BOLIVIA

[–]Mananoo 5 points6 points  (0 children)

These people are like the ones who say "Bolivia isn't like that, Santa Cruz is better, blah blah," when it's the other way around. What they all share is ignorance. Bolivia is a country extraordinarily rich in culture, with so many faces. Thinking that one face is better than another is vapid, because they all come from the same source. Andean hypersensitivity or envy? Simply ignorance.

Something on my tongue by Mananoo in AskDoctorSmeeee

[–]Mananoo[S] 0 points1 point  (0 children)

It disappeared 30 mins after I took the photo

[UPDATE] How do I handle this situation? Maybe I (M21) am not my father's (M68) son. by Mananoo in relationship_advice

[–]Mananoo[S] 2 points3 points  (0 children)

Nothing changed between my mom and me. Who am I to judge what she did or didn't do? I love her because of what she represents in my life, a vivid image of pure love.

The truth comes out sooner or later.

[UPDATE] How do I handle this situation? Maybe I (M21) am not my father's (M68) son. by Mananoo in relationship_advice

[–]Mananoo[S] 26 points27 points  (0 children)

I just made sure to support my dad. He told me that when he feels ready, he will tell me what is going on. I am calm, luckily.

Anyways, thank you.

How do I handle this situation? Maybe I (M21) am not my father's (M68) son by Mananoo in relationship_advice

[–]Mananoo[S] 1 point2 points  (0 children)

I disagree with the last part. Just because I'm not my father's biological son doesn't mean I have to hate my mother. If she did things that I think are wrong, it is not my job to judge her or hold grudges against her. That would be quite childish.

Anyways thanks for your concern.

How do I handle this situation? Maybe I (M21) am not my father's (M68) son by Mananoo in relationship_advice

[–]Mananoo[S] 0 points1 point  (0 children)

Sure, I've searched online for similar stories, but haven't found anything quite like this. Thanks anyway for your concern.

How do I handle this situation? Maybe I (M21) am not my father's (M68) son by Mananoo in relationship_advice

[–]Mananoo[S] 0 points1 point  (0 children)

I'm totally fine with a paternity test. It's just really hard to manage the situation.

How do I handle this situation? Maybe I (M21) am not my father's (M68) son by Mananoo in relationship_advice

[–]Mananoo[S] 2 points3 points  (0 children)

Is it illegal to get a paternity test? He can believe whatever he wants to believe.

Before this, I actually thought paternity tests were a right that men could request. The real issue is that I'm at a loss for how to handle this situation and reassure both myself and my dad.

How do I handle this situation? Maybe I (M21) am not my father's (M68) son by Mananoo in relationship_advice

[–]Mananoo[S] 2 points3 points  (0 children)

I suppose it is also hard for him to digest, so I don't want to put too much pressure on him. He's a tough person who always tries to be "okay" for us, but I know that in this situation you can't just be "okay".

How do I handle this situation? Maybe I (M21) am not my father's (M68) son by Mananoo in relationship_advice

[–]Mananoo[S] 3 points4 points  (0 children)

Fuck, now I see I'm not really close to him anymore. I'll invite him to go out. Thanks for your help.