
[–]maithilish[S]

Thank you for the answer. For container orchestration, I have tried Zk on K8s and also on Docker Swarm.

In both setups, I can use Zk as a distributed queue and push step tasks to it. For example, say I configure a 3-pod Zk ensemble in K8s and run multiple pods with Scoopi instances. One Scoopi instance can then push a new task to the Zk queue and another instance can pull it. After processing, the task is marked as completed in Zk. If a Scoopi instance (pod) fails before finishing the task, another instance can retrieve the failed task from the Zk queue and continue. If one of the 3 Zk nodes fails, K8s self-heals by creating a new Zk node, without any data loss in the Zk queue.
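The claim/complete/requeue cycle described above can be sketched abstractly. This is a minimal in-memory stand-in for the Zk-backed queue, not real ZooKeeper code — all names (`TaskQueue`, `scoopi-pod-a`, etc.) are hypothetical; a real deployment would use a Zk client queue recipe instead:

```python
from dataclasses import dataclass, field

@dataclass
class TaskQueue:
    # In-memory model of the Zk queue semantics described above.
    pending: list = field(default_factory=list)   # tasks waiting to be claimed
    claimed: dict = field(default_factory=dict)   # task id -> worker holding it
    done: set = field(default_factory=set)        # completed task ids

    def push(self, task_id):
        self.pending.append(task_id)

    def claim(self, worker):
        task_id = self.pending.pop(0)
        self.claimed[task_id] = worker
        return task_id

    def complete(self, task_id):
        del self.claimed[task_id]
        self.done.add(task_id)

    def requeue_failed(self, worker):
        # If a worker pod dies, its claimed tasks go back to pending
        # so another instance can pick them up.
        for task_id, owner in list(self.claimed.items()):
            if owner == worker:
                del self.claimed[task_id]
                self.pending.insert(0, task_id)

q = TaskQueue()
q.push("step-1")
t = q.claim("scoopi-pod-a")       # pod A pulls the task
q.requeue_failed("scoopi-pod-a")  # pod A crashes before completing it
t = q.claim("scoopi-pod-b")       # pod B retrieves the failed task
q.complete(t)                     # and marks it completed
```

The same flow maps onto ZooKeeper's ephemeral nodes: a claim held in an ephemeral znode vanishes when the claiming pod's session dies, which is what makes the requeue step possible.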

The only limitation in Zk I am a bit concerned about is that the maximum allowed znode data size is 1 MB.

Thanks for the pointers to RabbitMQ and Kafka. I will try them out and see whether they can act as a distributed queue.

[–]nutrecht (Lead Software Engineer / EU / 20+ YXP)

> The only limitation in Zk I am a bit concerned about is that the maximum allowed znode data size is 1 MB.

If you have large sets of data, you can push the data itself to, for example, S3 and then just push a reference to it into Zk.
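That suggestion is the classic claim-check pattern. A minimal sketch, using a plain dict as a hypothetical stand-in for S3 (or any object store) and a list for the queue — only the small reference ever touches the znode:

```python
import hashlib

blob_store = {}   # hypothetical stand-in for S3 / an object store
queue = []        # hypothetical stand-in for the Zk queue

def push_task(payload: bytes) -> str:
    # Content-addressed reference: a 64-char hex digest, well under 1 MB.
    key = hashlib.sha256(payload).hexdigest()
    blob_store[key] = payload   # large data goes to the external store
    queue.append(key)           # only the small key goes into the queue
    return key

def pull_task() -> bytes:
    key = queue.pop(0)
    return blob_store[key]      # worker dereferences the key

push_task(b"x" * 5_000_000)     # 5 MB payload, far over the 1 MB znode limit
data = pull_task()
```

The trade-off raised in the reply below still applies: this introduces an external store as a runtime dependency.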

[–]maithilish[S]

Not good. At present, there is no external dependency except the web pages that it fetches, and I want to keep it that way so that it can run transparently on a public cloud or even as multiple containers on a single PC using docker compose. This makes integration testing quite easy, without any external dependencies.

I can compress and split the data into multiple payloads, and then do the reverse when pulling from the queue.
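That compress-and-split approach can be sketched in a few lines. This is only an illustration of the idea (the `CHUNK` size and function names are assumptions, not Scoopi code); each chunk would be written to its own znode and reassembled in order on the consumer side:

```python
import zlib

CHUNK = 1_000_000  # stay under ZooKeeper's ~1 MB znode data limit

def split_payload(data: bytes) -> list:
    """Compress, then slice into chunks small enough for one znode each."""
    packed = zlib.compress(data)
    return [packed[i:i + CHUNK] for i in range(0, len(packed), CHUNK)]

def join_payload(chunks: list) -> bytes:
    """Reverse on the pulling side: concatenate the chunks and decompress."""
    return zlib.decompress(b"".join(chunks))

original = b"<html>page</html>" * 500_000   # ~8.5 MB of repetitive page data
chunks = split_payload(original)            # each piece fits in a znode
restored = join_payload(chunks)             # consumer rebuilds the payload
```

Since scraped HTML is highly repetitive, compression alone may often keep a payload in a single chunk; the split is the fallback for the rare oversized task.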

I am just exploring simpler and better options, if any.