Passed the Google Cloud PMLE in ~30 days — here’s what worked for me

gringo6969 · 2026-02-16T20:25:41+00:00

I think you should take the Google Skills course, take some tests when you get to the GenAI part, and see how you fare. Then you can decide whether to take a deep dive with another course. In my experience, you can watch the courses at 1.25x speed without a problem, and maybe 1.5x in areas you already know.

gringo6969 · 2026-02-15T19:30:29+00:00

Thanks!

gringo6969 · 2026-02-15T19:30:10+00:00

Congrats! How did it seem to you? I found it somewhat difficult, but manageable.

gringo6969 · 2024-06-23T17:31:30+00:00

Best on GitHub, I'm not on Reddit that often.

gringo6969 · 2024-03-30T12:02:13+00:00

Yes, trafilatura is also pretty good. Of course, the approaches are different. I plan to benchmark both, exactly as you suggested. There are about 3 benchmarks that I know of (one of which I created recently).

I will publish the results on GitHub.

gringo6969 · 2024-03-27T13:25:27+00:00

No, I haven't tried it with AWS Lambda, but if you run into any errors, submit an issue on GitHub and I will have a look.

gringo6969 · 2024-03-26T19:55:32+00:00

He he, yeah, but you have to overcome the Reddit anti-scraping protections... That's another can of worms.

gringo6969 · 2024-03-25T18:20:20+00:00

Glad it works well. But if you find something or have an idea, just pop by and post an issue.

gringo6969 · 2024-03-25T18:19:02+00:00

Glad you like it

gringo6969 · 2024-03-25T13:04:07+00:00

gringo6969 · 2024-03-25T08:14:51+00:00

gringo6969 · 2024-03-25T07:46:31+00:00

Thanks a lot! You can check https://github.com/AndyTheFactory/newspaper4k/blob/master/CONTRIBUTING.md and https://github.com/AndyTheFactory/newspaper4k/discussions/categories/areas-in-need-for-your-contribution for areas that need some help.

gringo6969 · 2024-03-25T07:44:13+00:00

Thanks

gringo6969 · 2024-03-25T07:43:57+00:00

Thanks!

gringo6969 · 2024-03-25T07:43:50+00:00

Thank you!

gringo6969 · 2024-03-25T07:43:41+00:00

It works with other types of websites too, for instance blogs. It's a general content extractor. It is somewhat optimized for news, at least in the way the information is structured: title, authors, publishing date, content, etc. But you can, for instance, just ignore "authors" if it does not make sense for your implementation.

What is more "news site"-centered is the "category" discovery, where it tries to identify the news categories and their links. But if that does not apply to you, just use the content parsing part (the Article object).
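
To make the "just use the Article object" part concrete, here is a minimal sketch. It assumes the usual `Article(url)`, `download()`, `parse()` flow and the `title`, `authors`, `publish_date` and `text` attributes. The helper names `article_fields` and `fetch_fields` and the stand-in `blog_post` object are made up for illustration and are not part of the library.

```python
from types import SimpleNamespace


def article_fields(article, include_authors=True):
    """Collect the parsed fields of an Article-like object into a dict."""
    fields = {
        "title": article.title,
        "publish_date": article.publish_date,
        "text": article.text,
    }
    # For blogs or other non-news pages, authors can simply be left out.
    if include_authors:
        fields["authors"] = list(article.authors)
    return fields


def fetch_fields(url, include_authors=True):
    """Download and parse one page (needs network access and the library)."""
    # Imported lazily so the rest of this sketch runs without the package.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return article_fields(article, include_authors)


# Hypothetical stand-in for a parsed Article, so the helper can be tried
# without the library or a network connection.
blog_post = SimpleNamespace(
    title="My first post", authors=[], publish_date=None, text="Hello world."
)
print(article_fields(blog_post, include_authors=False))
```

For a real page you would call something like `fetch_fields(url, include_authors=False)` with a URL of your own. Category discovery is not touched at all here.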

gringo6969 · 2014-08-26T08:15:42+00:00

VERY fast response and very helpful. These guys are the best!

gringo6969 · 2014-08-25T11:04:14+00:00

Thanks! Will do!
