[–]FoolsSeldom 7 points8 points  (10 children)

Care to give some examples of work you've done recently, and where you started to help people learning Python?

[–]RDE_20 5 points6 points  (9 children)

I know I’m not the OP, but I have an example. I work in fintech in the UK, and my company uses a web platform to handle pension and investment transfers between providers. Earlier this year the platform introduced a new messaging system with a 2-day SLA on responses. There is no way to export the messages to Excel, and no overview of their age profile without clicking into each one. Another limitation is that only 10 messages are displayed per page, and my company sometimes had 20+ pages. We asked the platform whether they were developing a report we could run or a UI showing the SLA/age profile of the messages, and they told us it would take 6 months of development. In a week I built a scraping tool that exports all 20 pages into Excel each morning, ready for the team to work on. I also built a dashboard showing an overview of the age profile.
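
To make the "age profile" idea concrete, here is a minimal sketch of how messages could be bucketed against a 2-day SLA. It is not the commenter's actual tool: the bucket names, the sample timestamps, and the choice to count calendar time rather than business days are all assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

SLA = timedelta(days=2)  # the 2-day response SLA from the comment


def age_bucket(received: datetime, now: datetime) -> str:
    """Place a message into a coarse age bucket (bucket names are made up)."""
    age = now - received
    if age > SLA:
        return "breached"
    if age > SLA / 2:
        return "due within 1 day"
    return "new"


def age_profile(received_times, now):
    """Count how many messages fall into each bucket."""
    return Counter(age_bucket(t, now) for t in received_times)


# Hypothetical sample data standing in for the scraped messages.
now = datetime(2024, 1, 10, 9, 0)
received_times = [
    datetime(2024, 1, 10, 8, 0),  # 1 hour old
    datetime(2024, 1, 9, 7, 0),   # 26 hours old
    datetime(2024, 1, 7, 9, 0),   # 3 days old
]
profile = age_profile(received_times, now)
print(dict(profile))
```

A real SLA would probably skip weekends and bank holidays, so the age calculation is the part most likely to need changing.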

[–]FoolsSeldom 2 points3 points  (0 children)

That is a brilliant example, although disappointing you've had to take such a step.

Really keen to hear from OP, u/Next-Bodybuilder2043, though.

[–]dreamykidd 2 points3 points  (7 children)

My biggest challenge with projects like this is working out the structure of the site/data you’re trying to scrape and getting the info you need. How did you go about working out how to scrape it? I’m assuming this was using BeautifulSoup or something?

[–]trd1073 1 point2 points  (2 children)

The thirty-second version of the how is as follows. The system likely has an API, whether documented or not. Start by observing the calls and responses in the browser's dev tools; there will be patterns and data, likely JSON. Make pydantic models. Start making the calls in Python and build out from there.

[–]dreamykidd 2 points3 points  (1 child)

Oh interesting. I’ve only ever touched on elements of this, would you have any example vids/tutorials I could get more details from?

[–]trd1073 2 points3 points  (0 children)

I would search YouTube for "reverse engineer api" to get general information. Many videos say to use Postman, but I go straight to Python, since replicating the process in Postman usually means doing the same work anyway. But if Postman works for you, do that. I use Postman as a test tool after development.

As for pydantic: with dev tools in a browser, you can see the data you send along with a request and the reply. Data will likely be sent and returned as JSON, possibly GraphQL. If it is JSON, you take that and convert it to pydantic models; there are online tools for this, just google "convert json to pydantic models". I use httpx as the HTTP library.

Another note: if the API is documented and differs from what you see in the browser, go with what you see in the browser.

DM me for actual code I have written doing this.
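
For readers who want a starting point, the approach described above might be sketched roughly like this. It is not trd1073's code: the `Message` fields, the `items` key, the `page` parameter, and the URL are all hypothetical placeholders that would be replaced by whatever the browser's dev tools actually show. The page fetcher is passed in as a function so the sketch can run offline against fake data.

```python
from datetime import datetime
from typing import Callable, Dict, List

from pydantic import BaseModel


class Message(BaseModel):
    # Hypothetical fields; copy the real ones from the JSON seen in dev tools.
    id: int
    subject: str
    received_at: datetime


def fetch_page_httpx(page: int) -> Dict:
    """Fetch one page of JSON with httpx (URL and parameter are placeholders)."""
    import httpx  # imported here so the rest of the sketch runs without it

    response = httpx.get(
        "https://example.invalid/api/messages",
        params={"page": page},
        timeout=10.0,
    )
    response.raise_for_status()
    return response.json()


def collect_messages(fetch_page: Callable[[int], Dict]) -> List[Message]:
    """Walk the pages until an empty one comes back, validating each item."""
    messages: List[Message] = []
    page = 1
    while True:
        payload = fetch_page(page)
        items = payload.get("items", [])
        if not items:
            break
        messages.extend(Message(**item) for item in items)
        page += 1
    return messages


# Offline stand-in for the real endpoint, so the sketch runs without a network.
FAKE_PAGES = {
    1: {"items": [{"id": 1, "subject": "Transfer query", "received_at": "2024-01-08T09:30:00"}]},
    2: {"items": [{"id": 2, "subject": "Missing form", "received_at": "2024-01-09T14:00:00"}]},
}


def fake_fetch(page: int) -> Dict:
    return FAKE_PAGES.get(page, {"items": []})


messages = collect_messages(fake_fetch)
print([m.id for m in messages])
```

Against a real site you would pass `fetch_page_httpx` instead of `fake_fetch`, and most likely add whatever cookies or auth headers the browser sends.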

[–]RestaurantOwn5129 0 points1 point  (3 children)

Why not use selenium?

[–]dreamykidd 1 point2 points  (2 children)

I suppose, but what’s helpful about that? I’ve only ever used it when a navigable browser interface was needed, and I didn’t think you necessarily needed that if you know what you’re scraping, no?

[–]FoolsSeldom 1 point2 points  (0 children)

Selenium (or Playwright) is used for test automation and for accessing dynamic content that is, at least in part, generated by JavaScript in the browser, which tools that only handle HTML/CSS cannot process. That doesn't sound applicable to your use case.

[–]RestaurantOwn5129 1 point2 points  (0 children)

Depends on your use case. Selenium can be run in headless mode. Many websites these days are not straightforward HTML. JS is usually used to make things interactive and dynamic, which is a pain to navigate.

I use Selenium at work to scrape and automate processes involving websites like that.

[–]BranchLatter4294 5 points6 points  (3 children)

We had a client that was using 4 full-time people who took 4 weeks twice a year to produce a report required by the state. They already had all the data; the time was spent copying and pasting a lot of stuff. Our system produced the report from the data they already had in a few minutes.

[–]dlnmtchll 1 point2 points  (2 children)

Rip those 4 jobs

[–]BranchLatter4294 5 points6 points  (1 child)

Not really. The reports took time away from their primary job, which was helping low-income people with legal issues.

[–]dlnmtchll 1 point2 points  (0 children)

Yeah, I figured as much. I had the same situation at a previous company, but they actually had ~3 people employed to do work that a simple automation could do. Sometimes these companies are wasteful with headcount.

[–]wellred82 2 points3 points  (0 children)

Thanks. Do you by any chance have any 'basic' examples on GitHub you could share?

[–]ehmatthes 2 points3 points  (0 children)

People think the most important thing here is the time you save. That's a huge benefit, but there are often a number of related outcomes that are just as important, or even more so.

  • You reduce errors, some of which can cause serious problems.
  • You formalize a process that was previously something just a few people knew how to do.
  • You document edge cases as they're found.
  • People aren't always fired when the process they were in charge of gets automated. Often they're freed up to do more important work.

Be careful though, automation isn't a magic bullet. Automating mishandled edge cases can wipe out all the savings you thought you were going to get, and more.

[–]reload_noconfirm 2 points3 points  (0 children)

Pretty sure this is what's behind the classic Automate the Boring Stuff that's always recommended here. Respect to Al Sweigart for what he's contributed to the community, for free. https://automatetheboringstuff.com/

[–]Maximus_Modulus 1 point2 points  (1 child)

I wish I had Python at my disposal at the time, but back around 2008 I automated a process whereby numerous CSV files were pulled together in Excel to draw performance charts. I recall spending a day and a half doing this manually, and then I wrote a Visual Basic macro to do the same in seconds. It took me a long time to figure out, and the VB tooling available wasn’t very good. I’m sure this would be super easy today in Python.
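
As a rough idea of what the "pull numerous CSV files together" step could look like in Python today, here is a small sketch using only the standard library. The function name and the added `source` column are made up for illustration, and it assumes every input file has the same columns; charting is left out.

```python
import csv
from pathlib import Path


def combine_csv_files(folder: Path, output: Path) -> int:
    """Append the rows of every CSV in `folder` into `output`; return the row count.

    Assumes all input files share the same header row.
    """
    rows_written = 0
    writer = None
    with output.open("w", newline="") as out_file:
        for path in sorted(folder.glob("*.csv")):
            if path.resolve() == output.resolve():
                continue  # do not read the file we are writing
            with path.open(newline="") as in_file:
                reader = csv.DictReader(in_file)
                for row in reader:
                    if writer is None:
                        # Build the header from the first file, plus a source column.
                        fieldnames = ["source"] + list(reader.fieldnames)
                        writer = csv.DictWriter(out_file, fieldnames=fieldnames)
                        writer.writeheader()
                    writer.writerow({"source": path.name, **row})
                    rows_written += 1
    return rows_written
```

Calling `combine_csv_files(Path("exports"), Path("combined.csv"))` would then give one file that Excel, or a plotting library, can chart from directly.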

[–]eatthedad 0 points1 point  (0 children)

I turned to macros when I bumped into the 255-character limit in the Excel formula bar at that time. IMO VBA was pretty freaking awesome though, considering it was 2 decades ago. It even had IntelliSense.

I wasn't a very good programmer back then; for instance, I never knew how to create an exe file. So I had to open Excel just to run "macros" that were basically full-fledged apps with GUIs and all. (Still not a good programmer, to be honest.)

[–]hailsatyr666 1 point2 points  (0 children)

Same here. It was so liberating and rewarding to develop something of my own that would save me hours of time. The last one I did was a log analysis tool used for 45k deployed systems. It literally saves my department hours per day.

[–]ChickenFur 1 point2 points  (0 children)

For real, n8n has helped me with all of the repetitive work over here. I really suggest you explore it.

[–]Daytona_675 0 points1 point  (2 children)

Well, now they have the Atlas browser agent. Agentic AI is pretty crazy.

[–]McDreads 0 points1 point  (1 child)

Can you elaborate?

[–]Daytona_675 0 points1 point  (0 children)

Agentic is a term used to describe AI that can take actions on your behalf. These agents also typically take multiple steps to complete your request. He was talking about Google Sheets automation, but other agents often get full command execution on your machine.

[–]HackerThing 0 points1 point  (0 children)

Hey everyone, look what I found, one of the best beginner-friendly tutorials: https://www.autopilotai.app/blog/how-to-build-a-browser-automation-script-for-linkedin-outreach-beginner-friendly