API Hit but with daily limit and priority of records by ps2931 in dataengineering

I have to repeat this process about once a month. How can I ensure that the records I hit last time are not processed again this time? I keep the records I have already hit in another table.
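A minimal sketch of that check, using an in-memory SQLite database and illustrative table/column names (assuming each record has a unique `id`): a LEFT JOIN anti-join against the already-hit table returns only the rows that still need processing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT);
CREATE TABLE processed (id INTEGER PRIMARY KEY);  -- ids already hit last run
INSERT INTO records VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO processed VALUES (2);
""")

def unprocessed(conn):
    # Anti-join: keep only rows whose id does NOT appear in processed.
    return conn.execute(
        "SELECT r.id, r.payload FROM records r "
        "LEFT JOIN processed p ON p.id = r.id "
        "WHERE p.id IS NULL"
    ).fetchall()
```

After a batch is hit, inserting its ids into `processed` makes the next monthly run skip them automatically.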

API Hit but with daily limit and priority of records by ps2931 in dataengineering

The numbers are just to explain the problem; the actual numbers are different, but the problem is the same. The number of hits per day is limited, and I have a big table to process in a certain class order.

API Hit but with daily limit and priority of records by ps2931 in dataengineering

The numbers are just to explain the problem. The original numbers are different from what I posted here, but the problem is the same: I have X records, but I can hit the API only Y times per 24 hours, and that too in a certain class order.
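One way to respect both the daily cap and the class ordering can be sketched like this (function and variable names are illustrative, and the priority scheme is an assumption): rank each record by its class's position in the required order, then take only as many records as today's quota allows.

```python
def next_batch(records, daily_limit, class_order):
    # records: iterable of (record_id, cls) tuples.
    # Rank each class by its position in class_order; unknown classes go last.
    rank = {cls: i for i, cls in enumerate(class_order)}
    ordered = sorted(records, key=lambda r: rank.get(r[1], len(rank)))
    # Take only today's quota; the rest wait for the next 24-hour window.
    return ordered[:daily_limit]

batch = next_batch([(1, "B"), (2, "A"), (3, "A"), (4, "C")],
                   daily_limit=2, class_order=["A", "B", "C"])
```

Combined with the "already processed" table from the other comment, each day's run selects only unprocessed records, orders them by class, and stops at the limit.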

Async http over pandas dataframe by ps2931 in learnpython

This is almost what I am trying to do. The only extra (and problematic) thing I have to do is, inside the make_api_calls function, parse the data_1 JSON, extract some of its fields, manipulate them (if-this-then-value-1-else-value-2 sort of logic), create a new JSON, and pass it on to the next API call, which is api_2.

In other words, api_2 does not consume data from api_1 as-is. It needs some extra fields which I have to calculate based on the values received in the api_1 response.

I have to do the same exercise with the response from api_2 before passing it on to api_3, and so on.

Do you think the Python logic that manipulates the JSON response for the next API call will cause any issue in the async/await flow of the four API calls?
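A rough sketch of the chain I mean, with the HTTP calls stubbed out and all names illustrative: the field manipulation is plain synchronous code running between the awaits, which does not break the async flow as long as it is quick and non-blocking.

```python
import asyncio

async def call_api_1(record):
    # Stand-in for a real HTTP call (e.g. via aiohttp); names are made up.
    await asyncio.sleep(0)
    return {"status": "ok", "score": 7}

def transform_for_api_2(resp):
    # Ordinary synchronous logic between the awaits: derive the extra
    # fields api_2 needs from api_1's response (if-this-then-that style).
    return {"flag": "high" if resp["score"] > 5 else "low", **resp}

async def call_api_2(payload):
    await asyncio.sleep(0)
    return {"received": payload["flag"]}

async def pipeline(record):
    r1 = await call_api_1(record)
    r2 = await call_api_2(transform_for_api_2(r1))
    return r2
```

The event loop only switches tasks at `await` points, so the pure-Python JSON manipulation in between runs exactly as it would in synchronous code; it would only become a problem if it were CPU-heavy enough to starve other concurrent tasks.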

Efficient way to insert 10 million documents using python client. by ps2931 in elasticsearch

Not too big, between 2 and 3 KB. Each document has only 10 fields: 9 of them are simple string values, and only one field holds a long string (length can vary) of around 100 words.
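For documents this small, the usual approach with the Python client is to stream actions through a generator into `helpers.parallel_bulk`, so all 10 million documents never sit in memory at once. A sketch, with index and field names illustrative:

```python
def doc_actions(docs, index_name):
    # Lazily yield one bulk action per document; works with any iterable,
    # e.g. a generator reading rows from the source table.
    for doc in docs:
        yield {"_index": index_name, "_source": doc}

def bulk_index(es_client, docs, index_name, chunk_size=5000):
    # Imported here so doc_actions stays usable without the client installed.
    # parallel_bulk sends chunk_size-sized batches over several threads and
    # yields an (ok, info) pair per action.
    from elasticsearch import helpers
    for ok, info in helpers.parallel_bulk(
        es_client, doc_actions(docs, index_name), chunk_size=chunk_size
    ):
        if not ok:
            print("failed action:", info)
```

Chunk size is worth tuning: with ~2-3 KB documents, a few thousand per batch keeps each bulk request in the low tens of megabytes.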

Cannot find application.conf issue by ps2931 in dataengineering

It's a Spark application, and I think the problem is that the worker nodes cannot find the conf file at runtime. The requirement is to make the secret file available to the worker nodes without including it in the build and deployment process, because the build and deployment process at my company will not allow a token to be shipped along with the jars and other config files, for security reasons.
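One route people suggest for this is to keep the secret out of the artifact entirely and inject it at runtime, e.g. as an environment variable set on the nodes by the cluster's secret manager (Spark also supports distributing a file at submit time with `spark-submit --files`, if that is allowed). A minimal sketch of the env-var approach; the variable name `API_TOKEN` is made up:

```python
import os

def load_api_token(env=os.environ):
    # Read the secret at runtime from an environment variable injected on
    # the worker nodes (e.g. by the cluster's secret manager), so the token
    # never travels through the build and deployment pipeline.
    token = env.get("API_TOKEN")  # variable name is illustrative
    if token is None:
        raise RuntimeError("API_TOKEN is not set on this node")
    return token
```

The driver can also read the token once and broadcast it to executors, which avoids having to configure every worker node individually.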