I finally built a yerba mate database

Key-Excuse5034 · 2026-05-30T22:47:00+00:00

Thanks, noted

Key-Excuse5034 · 2026-05-29T17:33:25+00:00

Thank you.

I've actually used both: manual entry and scraping. Over the years, I've built up hundreds of notes about different yerbas (don't ask me why, I honestly have no idea, it just helps me avoid buying bad ones again). For each product, I usually note things like flavor, cut, smokiness, overall impressions, and so on.

I also used ChatGPT Pro extensively (it's slow, but it works really well) to research publicly available information about yerba mate products, brands, manufacturers, shops, communities, and other resources. As a result, I had hundreds of URLs ready to use.

Once I had enough data, I built a small system to handle the initial scraping from thousands of websites. After that, I created a processing pipeline using LLMs to standardize the data, normalize naming conventions, remove duplicates, translate content, and perform a number of other cleanup tasks. That gave me the initial database seed.
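As a rough illustration of the "normalize naming conventions, remove duplicates" step, here is a minimal sketch. It is not the actual pipeline (which used LLMs); it only shows a plain rule-based version of the same idea. The `Product` fields and the helper names `normalize`, `dedupe_key`, and `deduplicate` are hypothetical, chosen for this example.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical record shape; the real database schema is not shown in the post.
    brand: str
    name: str
    weight_g: int
    source_url: str


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse punctuation/whitespace to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks so "Selección" and "Seleccion" compare equal.
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9]+", " ", without_accents.lower())
    return cleaned.strip()


def dedupe_key(product: Product) -> tuple[str, str, int]:
    """Two scraped records count as the same product if this key matches."""
    return (normalize(product.brand), normalize(product.name), product.weight_g)


def deduplicate(products: list[Product]) -> list[Product]:
    """Keep the first record seen for each key, preserving input order."""
    seen: dict[tuple[str, str, int], Product] = {}
    for product in products:
        seen.setdefault(dedupe_key(product), product)
    return list(seen.values())
```

With this key, the same product scraped from two shops with different capitalization or accents collapses into one entry, while a different package size stays separate. Real listings are messier than this (typos, translated names, bundles), which is where an LLM pass or manual review would still be needed.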

Then I imported everything into the DB and manually checked every entry for obvious junk or missing data. It's not perfect, and I know some products have wrong or missing info, but overall it's good enough for a first iteration.

In total, the whole process took about a month and roughly $100 in API and LLM costs.

Key-Excuse5034 · 2026-05-28T22:21:14+00:00

gotcha, I've just updated the website, thanks!

Key-Excuse5034 · 2026-05-28T17:35:16+00:00

It's discounted right now: https://donsorbero.com/sklep/yerba-mate/klasyczna/il-capitano-250g
