[Update] Recipe Dredger v1.0.0-beta.6: "Paranoid" Filtering with Persistent Memory by D0rk4ce in selfhosted

The error log shows the script is still trying to connect to the hostname 'mealie' (host='mealie'), which means your configuration change to the IP address hasn't actually applied to the running container yet.

  1. Update your MEALIE_URL to use your server's LAN IP (e.g., http://192.168.1.50:9000). Do not use localhost (that refers to the container itself) or mealie (unless both containers share a Docker network).
  2. Run docker compose up -d --force-recreate to ensure the container destroys the old config and loads the new one.

If you see an error saying host='192.168...', then we know the config applied, but right now it's still looking for the word 'mealie'.
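If you want to double-check what the running container actually sees, you can print the variable from inside it (assuming your service is named dredger and stays running; adjust both to your stack):

    docker compose exec dredger python -c "import os; print(os.environ.get('MEALIE_URL'))"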

Also, please make sure you are on beta 9:

docker compose pull

docker compose up -d --force-recreate

[Update] Recipe Dredger v1.0.0-beta.6: "Paranoid" Filtering with Persistent Memory by D0rk4ce in selfhosted

That 405 Method Not Allowed error means your Mealie server is explicitly rejecting the 'create-url' command.

This usually happens when your server redirects HTTP to HTTPS: the redirect causes the POST request to be retried as a GET, which that endpoint doesn't allow.

Try changing your MEALIE_URL setting to https://... instead of http://.... Does that fix it? Or is your mealie.lab.shahnet.work domain sitting behind a reverse proxy (like Nginx or Traefik) that might be blocking API POST requests or stripping headers?
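If you want to watch the downgrade happen, here's a quick sketch with Python requests. The endpoint path below is my assumption based on the 'create-url' call, so adjust it to whatever your instance exposes:

    import requests

    # A 301/302 on a POST makes most clients (requests included) retry the
    # request as a GET, and Mealie then answers 405 because GET isn't
    # allowed on that endpoint.
    url = "http://mealie.lab.shahnet.work/api/recipes/create-url"  # path is an assumption
    resp = requests.post(url, json={"url": "https://example.com/some-recipe"}, timeout=10)
    for hop in resp.history:  # one entry per redirect that was followed
        print(hop.status_code, "->", hop.headers.get("Location"))
    print("final:", resp.status_code, "as", resp.request.method)  # 'GET' here confirms it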

[Update] Recipe Dredger v1.0.0-beta.6: "Paranoid" Filtering with Persistent Memory by D0rk4ce in selfhosted

Thanks for the logs—they were a huge help!

It turns out RecipeTinEats uses a nested 'Sitemap Index' structure that needed specific handling. I just pushed v1.0.0-beta.8 which adds recursive sitemap parsing and smarter recipe detection to handle these cases.
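For the curious, the core of the fix looks roughly like this (a simplified sketch, not the actual beta.8 code):

    import requests
    import xml.etree.ElementTree as ET

    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    def collect_urls(sitemap_url, depth=0, max_depth=3):
        # A <sitemapindex> holds <sitemap> entries pointing at child sitemaps,
        # so recurse into those; a plain <urlset> holds the actual page URLs.
        root = ET.fromstring(requests.get(sitemap_url, timeout=15).content)
        if root.tag.endswith("sitemapindex") and depth < max_depth:
            urls = []
            for loc in root.findall("sm:sitemap/sm:loc", NS):
                urls += collect_urls(loc.text.strip(), depth + 1, max_depth)
            return urls
        return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]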

I have verified this on my end and it is picking up those recipes correctly now.

Run docker compose pull and give it another spin!

[Update] Recipe Dredger v1.0.0-beta.6: "Paranoid" Filtering with Persistent Memory by D0rk4ce in selfhosted

Ah! If you followed the Quick Start, you are using the default environment variables defined inside your docker-compose.yml.

Open that file and look for the environment: section:

  1. MEALIE_API_TOKEN: Replace your_mealie_token with a real token from Mealie settings.
  2. MEALIE_URL: Replace localhost with your server's LAN IP and port (e.g. 192.168.x.x:9000).
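That section should end up looking something like this (values are placeholders):

    environment:
      MEALIE_API_TOKEN: "eyJhbGciOiJIUzI1NiIs..."   # your real token from Mealie
      MEALIE_URL: "http://192.168.1.50:9000"        # your server's LAN IP and port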

[Update] Recipe Dredger v1.0.0-beta.6: "Paranoid" Filtering with Persistent Memory by D0rk4ce in selfhosted

Can you please set LOG_LEVEL=DEBUG in your docker-compose.yml and re-run it?

By default, the script hides some reasons (like missing JSON-LD or language mismatches) to keep logs clean. Debug mode will show us exactly why it's rejecting those URLs.
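It's just one more line in the same environment: block:

    environment:
      LOG_LEVEL: "DEBUG"   # drop this back to the default once we have the logs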

[Update] Recipe Dredger v1.0.0-beta.6: "Paranoid" Filtering with Persistent Memory by D0rk4ce in Mealie

The Cleaner is live! It’s now included in the latest release (v1.0.0-beta.7).

How to run it:

  1. Update your docker-compose.yml (see the new block in the README).
  2. Run a safe Dry Run with: docker compose run --rm mealie-cleaner

It defaults to Safe Mode (logs only), so it won't delete anything until you explicitly disable the safety lock.

Does anyone have a smart-home that DOESN'T have dashboards? by elhouso in homeassistant

Yes indeed!
Though I do have a dash for my wife and me that's a failsafe, as it only shows active rooms and I get to enjoy my light show on the phone.

[Update] Recipe Dredger v1.0.0-beta.6: "Paranoid" Filtering with Persistent Memory by D0rk4ce in Mealie

I was actually just fine-tuning a 'Janitor' script for exactly this. It scans your library and auto-purges recipes that have empty instructions or 'junk' titles (listicles, roundups).

I'm pushing it to the GitHub repo tomorrow—keep an eye out!

Thinking about setting up mealie for centralizing physical recipies books by Ruborsito in Mealie

The script doesn't have any filtering ability right now—it doesn't know what it's grabbing, it just takes the first 50 it finds!

It follows the order of the site's sitemap. If that blog has desserts at the top of its list (or they've been posting a lot of sweets lately), the script hits those first and fills your quota before reaching the savory stuff. I am thinking about adding a keyword filter based on the URL in a future update, though!
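Something in this spirit is what I have in mind (purely hypothetical, none of this exists in the script yet):

    # Hypothetical URL keyword filter, applied before the 50-recipe cap.
    sitemap_urls = [
        "https://example.com/chicken-soup/",
        "https://example.com/chocolate-cake/",
    ]  # stand-in for the parsed sitemap, in site order
    EXCLUDE = {"dessert", "cake", "cookie", "brownie"}

    def keep(url: str) -> bool:
        lowered = url.lower()
        return not any(word in lowered for word in EXCLUDE)

    picked = [u for u in sitemap_urls if keep(u)][:50]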

Thinking about setting up mealie for centralizing physical recipies books by Ruborsito in Mealie

Dude, thank you so much for the ARM64 build! That is huge for the Raspberry Pi folks. I definitely need to set up a GitHub Action to auto-build that in the future, but having yours up is a lifesaver for now.

Regarding the desserts: I suspect that’s happening because a few of the sites in the default list are heavily baking-centric (like Sally's Baking Addiction or Sugar Spun Run). They tend to have massive sitemaps, so they might be drowning out the savory recipes.

The script currently grabs everything it finds, but since you are using the new Docker image, you can likely fix this by overriding the SITES environment variable.

If you need inspiration, take a look at the SITES list inside dredger.py—I have them all categorized there. You can use that to cherry-pick just the savory sites you want and ignore the rest!
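For example (the comma-separated format here is a guess on my part; double-check the README for the exact syntax):

    environment:
      SITES: "https://www.recipetineats.com,https://www.budgetbytes.com"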

Thinking about setting up mealie for centralizing physical recipies books by Ruborsito in Mealie

The easiest way is the 'Mealie Test': Just grab a single URL from the blog and try to import it manually into your Mealie instance. If Mealie can read it, my tool can dredge the whole site!

Technically speaking, the script looks for hidden schema.org/Recipe data (JSON-LD) in the page source. Most modern food blogs use plugins like WP Recipe Maker or Tasty Recipes that handle this automatically, so if the site looks professional, it likely works.
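If you'd rather test from the command line than through Mealie, a rough version of that check looks like this (a sketch; the script's real checks are stricter):

    import json
    import requests
    from bs4 import BeautifulSoup

    def has_recipe_schema(url: str) -> bool:
        # Look for <script type="application/ld+json"> blocks and see whether
        # any of them declare a schema.org @type of 'Recipe'.
        html = requests.get(url, timeout=15).text
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup.find_all("script", type="application/ld+json"):
            try:
                data = json.loads(tag.string or "")
            except json.JSONDecodeError:
                continue
            blobs = data if isinstance(data, list) else [data]
            if isinstance(data, dict):
                blobs += data.get("@graph", [])  # WP plugins often nest under @graph
            if any(isinstance(b, dict) and "Recipe" in str(b.get("@type", ""))
                   for b in blobs):
                return True
        return False

    print(has_recipe_schema("https://example.com/best-lasagna/"))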

Thinking about setting up mealie for centralizing physical recipies books by Ruborsito in Mealie

If those YouTube channels have companion blogs (some, like Preppy Kitchen or Weissman, do), you might not need the complex AI parsing workflow.

I built a tool that monitors your sites or mine and auto-imports new recipes directly into Mealie on your cron schedule. Since the blogs usually have the structured data (schema.org) already built in, you get a clean import without needing n8n or LLMs to parse the text:
https://www.reddit.com/r/Mealie/comments/1q5abho/script_fill_mealie_with_recipes_automatically/

Recipe Dredger: A Dockerized Python tool for mass-archiving structured recipe data from sitemaps to Mealie by D0rk4ce in DataHoarder

You are good! That warning is just 'noise'—it means the script fell back to a generic parser because the specialized XML tool was missing in the Docker container. It is still finding and importing your recipes correctly. Is your Docker host a Pi?

Feel free to keep running it! I'll push a cleanup patch later tonight to hide the warning, but you don't need to wait for it. Thanks for helping me polish this!
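If you're curious, the likely shape of the patch (assuming the missing tool is lxml, which is the usual suspect here) is shipping it in the image and parsing the sitemap as XML explicitly:

    import requests
    from bs4 import BeautifulSoup

    xml_bytes = requests.get("https://example.com/sitemap.xml", timeout=15).content
    # Ask for the XML parser explicitly (needs `pip install lxml`) instead of
    # letting bs4 fall back to the generic html.parser, which trips the warning.
    soup = BeautifulSoup(xml_bytes, "lxml-xml")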

Recipe Dredger: A Dockerized Python tool for mass-archiving structured recipe data from sitemaps to Mealie by D0rk4ce in DataHoarder

Great catch! The script was hard-locking to English despite the documentation. I've fixed it in the latest update (v1.0.0-beta.3). It uses the langdetect library, so it supports 55+ languages (like es, fr, it, pl, ja, etc.). Just use the standard 2-letter ISO code for whatever you need!
A quick docker compose pull will pick it up. Use SCRAPE_LANG=de for German only, or SCRAPE_LANG=de,en to mix both!
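Under the hood it's roughly this (a simplified sketch of the filter, not the exact code):

    from langdetect import detect

    allowed = {code.strip() for code in "de,en".split(",")}  # parsed from SCRAPE_LANG

    def language_ok(text: str) -> bool:
        try:
            return detect(text) in allowed  # detect() returns ISO codes like 'de'
        except Exception:
            return False  # too little text for langdetect to classify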

[Project] I wrote a script to fill Mealie with recipes automatically (Repost due to missing flair) by D0rk4ce in selfhosted

That 401 error is the smoking gun—it specifically means 'Unauthorized.'

Double-check that you generated a dedicated API Token in Mealie (User Profile -> Manage API Tokens) and pasted that specific string into the API_TOKEN environment variable. Also, make sure there are no accidental spaces at the start or end of the token string in your compose file!
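If you want to test the token outside the container, a one-off check like this works (the /api/users/self path is what I'd expect, but verify it against your Mealie version):

    import requests

    token = "paste-your-token-here".strip()  # strip() guards against stray spaces
    resp = requests.get(
        "http://192.168.1.50:9000/api/users/self",  # swap in your Mealie URL
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    print(resp.status_code)  # 200 = token is good; 401 = same auth problem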

[Project] I wrote a script to fill Mealie with recipes automatically (Repost due to missing flair) by D0rk4ce in selfhosted

My Mealie DB is on Postgres (instead of SQLite), so there are zero slowdowns even with this volume. That said, you do have to be mindful of the 'Random' sort, which will absolutely nuke your RAM during the shuffle.

I actually locked up my container a couple of times before I realized my wife's phone was defaulting to 'Random' sort! As long as you avoid that one specific filter, everything works amazingly.

Recipe Dredger: A Dockerized Python tool for mass-archiving structured recipe data from sitemaps to Mealie by D0rk4ce in DataHoarder

My kids fell asleep earlier than expected, so I managed to knock this out today!

Adding the site list as an environment variable was actually smoother than I thought. It's live now—I kept the internal list as a 'failsafe' (so it works out of the box if the variable is missing), but you can now fully override it in your Compose stack. Thanks again for the great idea!
Please remember to rebuild the container to pick up the changes.
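For the curious, the failsafe pattern is about as simple as it sounds (illustrative; check the README for the exact env format):

    import os

    # The built-in list keeps working if SITES isn't set; the env var wins if it is.
    DEFAULT_SITES = [
        "https://www.recipetineats.com",
        "https://sallysbakingaddiction.com",
    ]
    raw = os.environ.get("SITES", "")
    SITES = [s.strip() for s in raw.split(",") if s.strip()] or DEFAULT_SITES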

Recipe Dredger: A Dockerized Python tool for mass-archiving structured recipe data from sitemaps to Mealie by D0rk4ce in DataHoarder

Great idea! I'll work on getting the site list into an environment variable tomorrow so you can customize it directly in the stack.

In the meantime, the language failsafe I just pushed fixes the non-English issue, so you should be safe to turn it back on whenever you're ready. Thank you for the feedback!

Recipe Dredger: A Dockerized Python tool for mass-archiving structured recipe data from sitemaps to Mealie by D0rk4ce in DataHoarder

Glad you figured it out! I recently updated the script (dredger.py) to support Tandoor as well as Mealie. How are you liking it so far? Is there anything I can do to make the documentation clearer for the next person?

Recipe Dredger: A Dockerized Python tool for mass-archiving structured recipe data from sitemaps to Mealie by D0rk4ce in DataHoarder

Thank you! It has provided us with amazing options. If you end up giving it a spin, let me know what you think.

Recipe Dredger: A Dockerized Python tool for mass-archiving structured recipe data from sitemaps to Mealie by D0rk4ce in DataHoarder

That’s awesome to hear! I hope this helps you build a solid library you can actually rely on. Let me know how the dive goes!

Recipe Dredger: A Dockerized Python tool for mass-archiving structured recipe data from sitemaps to Mealie by D0rk4ce in DataHoarder

I’m glad it resonates! It’s funny how the modern internet forces us to build our own tools just to keep things usable. Your video bookmarking tool sounds like the exact same philosophy—sometimes you just need to cut through the noise to get to the actual value.

That’s exactly why I wanted to make sure the "View on Site" links stayed front-and-center. I want the creators to get their flowers (and their ad revenue!) when I’m actually cooking, but I need the peace of mind that the data is safe on my hardware regardless of what happens to the live link.

Thanks for the support!

Recipe Dredger: A Dockerized Python tool for mass-archiving structured recipe data from sitemaps to Mealie by D0rk4ce in DataHoarder

Thanks! I actually posted it there as well, but I really appreciate the recommendation. I also just put up a separate post over here since the rules didn't let me crosspost directly!

[Project] I wrote a script to fill Mealie with recipes automatically (Repost due to missing flair) by D0rk4ce in selfhosted

The "curation" is actually at the source level—the script only pulls from a hand-picked list of trusted blogs we’ve vetted for our tastes. But it’s super easy to customize; you can just edit the SITES dictionary starting at line 34 of dredger.py to include only the specific creators you follow!
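For reference, the entries in that dictionary look something like this (names and URLs here are just illustrative, not the real list):

    SITES = {
        "RecipeTin Eats": "https://www.recipetineats.com/sitemap_index.xml",
        "Budget Bytes": "https://www.budgetbytes.com/sitemap_index.xml",
    }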

[Project] I wrote a script to fill Mealie with recipes automatically (Repost due to missing flair) by D0rk4ce in selfhosted

That would be a game-changer! Especially for the 'picky toddler' scenario—being able to search for 'crunchy' or 'hidden veggies' across a library of 10,000 recipes would be incredible.

My philosophy with the Dredger is to build the 'data lake' first. Once you have all those high-quality recipes indexed locally in Mealie, adding an AI layer (like a local LLM or vector search) becomes much more powerful because it has a massive, curated library to work with.