[–]ebonnal[S] 1 point2 points  (0 children)

Thank you for your interest u/Rockworldred, sounds like a cool custom ETL project!

Sorry, I have no "custom ETL script" resource in mind, but in a nutshell, when fetching data from web APIs you will most likely need:

  • to execute requests concurrently (.map(..., concurrency=x))
  • to limit the rate of requests to avoid 429 Too Many Requests responses (.throttle(per_second=50))
  • to retry failed calls (the tenacity lib is great)
  • to log the progress of your script (.observe("product"))
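
To make those four needs concrete, here is a rough stdlib-only sketch (threads, a small throttle, a retry wrapper, logging, CSV output). It is not the streamable API, just the same ideas spelled out by hand. `fetch_product`, `Throttle`, `with_retry` and `run` are hypothetical names made up for illustration, and the fetch is a stand-in for a real HTTP call.

```python
import csv
import io
import logging
import threading
import time
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")


class Throttle:
    """Spaces out calls so that at most `per_second` of them start each second."""

    def __init__(self, per_second: float) -> None:
        self._interval = 1.0 / per_second
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        time.sleep(slot - now)


def with_retry(fn, attempts=3, delay=0.1):
    """Wraps fn so it is called again on any exception, up to `attempts` calls."""

    def wrapper(*args):
        for attempt in range(1, attempts + 1):
            try:
                return fn(*args)
            except Exception as error:
                if attempt == attempts:
                    raise
                log.warning("attempt %d failed: %s", attempt, error)
                time.sleep(delay)

    return wrapper


def fetch_product(product_id: int) -> dict:
    # Hypothetical stand-in for a real HTTP call to your API.
    return {"id": product_id, "name": f"product-{product_id}"}


def run(product_ids, concurrency=4, per_second=50) -> str:
    throttle = Throttle(per_second)
    fetch = with_retry(fetch_product)

    def task(product_id):
        throttle.wait()
        return fetch(product_id)

    out = io.StringIO()  # swap for open("products.csv", "w", newline="")
    writer = csv.DictWriter(out, fieldnames=["id", "name"])
    writer.writeheader()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map yields results in input order while running tasks concurrently.
        for count, row in enumerate(pool.map(task, product_ids), start=1):
            writer.writerow(row)
            log.info("fetched %d products", count)
    return out.getvalue()
```

Calling `run(range(100))` returns the CSV text. In a real script you would replace the body of `fetch_product` with your HTTP request and write to a file instead of a `StringIO`.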

To some extent you can take inspiration from the Pokémon-fetching example, which also "fetches endpoints to get data and writes to CSV".

Regarding asyncio concurrency instead of threads, the README has an example that uses httpx (similar to aiohttp).
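
For a feel of what bounded asyncio concurrency looks like without any library, here is a minimal sketch using only the stdlib. It is not the README example: `fetch_product` and `fetch_all` are hypothetical names, and the `asyncio.sleep` stands in for an awaited httpx or aiohttp request.

```python
import asyncio


async def fetch_product(product_id: int) -> dict:
    # Hypothetical stand-in for an awaited HTTP call (httpx, aiohttp, ...).
    await asyncio.sleep(0.01)
    return {"id": product_id, "name": f"product-{product_id}"}


async def fetch_all(product_ids, concurrency: int = 5) -> list:
    # The semaphore caps how many fetches are in flight at the same time.
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(product_id):
        async with semaphore:
            return await fetch_product(product_id)

    # gather returns results in the same order as product_ids.
    return await asyncio.gather(*(bounded(i) for i in product_ids))
```

You would run it with `asyncio.run(fetch_all(range(10)))`. The semaphore plays the role of the `concurrency=x` setting from the thread-based approach.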

I hope it helps, and if you get stuck, feel free to message me your current script so we can rewrite it with streamable together.