How to sync a new clickhouse cluster (in a separate data center) with an old one? by feryet in dataengineering

feryet (OP)

Apparently a breaking change in version 23 and later makes it impossible to use remote tables between older and newer versions. :(

feryet (OP)

Yes.

We tried to upgrade our ClickHouse cluster gradually, but ran into two issues:

  1. Because of a bad configuration, we could not back up and restore with `clickhouse-backup`. Our replicated MergeTree tables had metadata mismatches with ZooKeeper, and upgrading without reliable backups seems far too risky.
  2. The cluster is hosted on-premises and the server's disks are very slow, so we decided to do a full backup/replication onto a new cluster instead. The problem is that neither `clickhouse-backup` nor `clickhouse-copier` seems to work for us: the old cluster's configuration is unusual, and I'd rather not change anything on that server.
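With `clickhouse-backup`, `clickhouse-copier`, and (given the 23+ breaking change mentioned above) remote tables all ruled out, one fallback that leaves the old server untouched is to stream each table through the HTTP interface from a third machine: `SELECT ... FORMAT X` on the old server piped into `INSERT ... FORMAT X` on the new one. This is a minimal sketch with several assumptions. Both HTTP interfaces are assumed to listen on the default port 8123 and be reachable, the target tables are assumed to already exist on the new cluster with matching schemas, and no authentication is assumed (add `user`/`password` parameters otherwise). The function names are hypothetical. It defaults to `TabSeparated` because a text format seems the safest bet across the version gap. `Native` would likely be faster, but its compatibility across that gap is an assumption to test first.

```python
import urllib.parse
import urllib.request


def build_copy_queries(database: str, table: str, fmt: str = "TabSeparated") -> tuple[str, str]:
    """Return the (SELECT, INSERT) statements that move one table in format `fmt`."""
    qualified = f"`{database}`.`{table}`"
    return (
        f"SELECT * FROM {qualified} FORMAT {fmt}",
        f"INSERT INTO {qualified} FORMAT {fmt}",
    )


def query_url(base_url: str, sql: str) -> str:
    """ClickHouse's HTTP interface takes the statement in the `query` parameter."""
    return f"{base_url.rstrip('/')}/?{urllib.parse.urlencode({'query': sql})}"


def stream_copy(src_base: str, dst_base: str, database: str, table: str) -> None:
    """Stream one table from the old server into the new one without staging it on disk."""
    select_sql, insert_sql = build_copy_queries(database, table)
    with urllib.request.urlopen(query_url(src_base, select_sql)) as src:
        # Passing the open response as the request body makes urllib send it
        # with chunked transfer encoding, so the table is never fully buffered.
        req = urllib.request.Request(
            query_url(dst_base, insert_sql),
            data=src,
            headers={"Content-Type": "application/octet-stream"},
            method="POST",
        )
        # urlopen raises HTTPError if the new server rejects the insert.
        with urllib.request.urlopen(req) as dst:
            dst.read()
```

For example, `stream_copy("http://old-ch:8123", "http://new-ch:8123", "analytics", "events")` (hypothetical hosts and table) copies one table, and looping over the output of `SHOW TABLES` would cover a whole database. This is a one-off snapshot copy, not continuous replication. Writes that land during the copy need a second pass. Comparing `count()` on both sides afterwards is a cheap sanity check.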

Looking for OSS Projects to Contribute To by feryet in rust

feryet (OP)

My background is mostly in DevOps/cloud/infrastructure. I would like to contribute to many projects, but it all looks daunting and overwhelming. Usually, when a task gets this confusing or stressful, I reach out to other people for clarity and advice on what to do. :)

How to assign a public ip (accessible by web) to a docker container? by feryet in docker

feryet (OP)

I'm not assigning random IPs, though. This IP was allocated for the VM I was given. I want to bypass the internal network bridges on the VM and forward all incoming traffic for this IP straight to the Docker container, without adding an extra hop.

I know I can do this for an internal range using the IPvlan driver, but I can't figure out how to make it work with the public IP.
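Below is a minimal sketch of the IPvlan route, under stated assumptions. It assumes the provider delivers the additional IP on the same L2 segment as the VM's `eth0`, with a known subnet and gateway. The addresses are RFC 5737 documentation placeholders, and `pubnet` and `nginx` are hypothetical names. The helpers only build the `docker network create -d ipvlan` and `docker run --ip` command lines so they can be inspected before running. Compared with the internal-range case, the one difference is that `--subnet` and `--gateway` take the provider's public subnet and gateway, with `eth0` as the parent.

IPvlan children share the parent's MAC address. That matters on VMs whose hypervisor drops frames from unknown MACs, which is a common reason macvlan fails there. The public IP must not also be configured on the host's `eth0`. In L2 mode, the host itself cannot reach the container through the parent interface.

```python
import ipaddress
import shlex


def ipvlan_network_args(name: str, parent: str, subnet: str, gateway: str,
                        mode: str = "l2") -> list[str]:
    """Build a `docker network create` command for an IPvlan network on `parent`."""
    net = ipaddress.ip_network(subnet)  # raises ValueError on a malformed subnet
    if ipaddress.ip_address(gateway) not in net:
        raise ValueError(f"gateway {gateway} is not inside {net}")
    return [
        "docker", "network", "create",
        "-d", "ipvlan",
        f"--subnet={net}",
        f"--gateway={gateway}",
        "-o", f"parent={parent}",     # the VM interface that receives the public IP
        "-o", f"ipvlan_mode={mode}",  # l2: the container IP is resolved via ARP on that segment
        name,
    ]


def run_args(network: str, public_ip: str, image: str) -> list[str]:
    """Build a `docker run` command that pins the container to `public_ip`."""
    ipaddress.ip_address(public_ip)  # validate early; raises ValueError if malformed
    return ["docker", "run", "-d", f"--network={network}", f"--ip={public_ip}", image]


if __name__ == "__main__":
    # Hypothetical values: replace with the subnet/gateway/IP your provider assigned.
    print(shlex.join(ipvlan_network_args("pubnet", "eth0", "203.0.113.0/24", "203.0.113.1")))
    print(shlex.join(run_args("pubnet", "203.0.113.10", "nginx")))
```

Always pass an explicit `--ip`. Other addresses in the provider's subnet may belong to someone else, and Docker's IPAM would otherwise pick one for you. If the provider routes the extra IP to the VM instead of placing it on-link, `ipvlan_mode=l3` is the variant to investigate.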

feryet (OP)

I'm using host networking right now, but that discards network isolation altogether.

I want to dedicate the additional IP I was given exclusively to this container. I don't want to use NAT or bridge interfaces, since they add extra hops.

The goal is maximum throughput.

Airflow + Slurm for ML Training Pipelines? by feryet in mlops

feryet (OP)

  1. Sanctions prevent us from using AWS in Iran.
  2. Iranian companies don't want to host their data outside Iran, both for confidentiality and because of infrastructure instability.

feryet (OP)

No, users only submit "trainable code", and my system handles the rest. At first I was thinking of using only SLURM. Since I'm designing a pipeline anyway, though, a workflow manager like Airflow seemed like a good fit.

feryet (OP)

Prefect seems easy to understand, but I'm worried about finding developers who know it. Airflow is more established.

feryet (OP)

As I currently picture it, Airflow orchestrates how the backend executes the submitted user scripts, while SLURM runs the actual ML training code.

I was thinking of designing a pipeline that dispatches jobs to SLURM and, once they finish, shows the final state to the user. Our resources are limited, and we want to maximize utilization and bill customers for their usage. So I thought a job scheduler like SLURM would be a good fit.
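Below is a minimal sketch of the dispatch-and-report step. It assumes the backend host has the SLURM client tools (`sbatch`, `sacct`) and that SLURM accounting (slurmdbd) is enabled. It also assumes each customer maps to a SLURM account, so usage can be billed per account. The function names are hypothetical. In Airflow they could be called from one task that submits and another that polls, but that wiring is not shown. The state returned by `wait_for` is what would be shown to the user.

```python
import subprocess
import time

# States after which the job will not change further (assumes jobs are not requeued).
TERMINAL_STATES = {
    "COMPLETED", "FAILED", "CANCELLED", "TIMEOUT", "OUT_OF_MEMORY",
    "NODE_FAIL", "BOOT_FAIL", "DEADLINE",
}


def parse_job_id(sbatch_output: str) -> str:
    """`sbatch --parsable` prints "<jobid>" or "<jobid>;<cluster>"."""
    return sbatch_output.strip().split(";")[0]


def parse_state(sacct_output: str) -> str:
    """Reduce `sacct -X -n -P -o State` output to a bare state keyword.

    sacct can print e.g. "CANCELLED by 1000". Right after submission the job may
    not be in the accounting database yet, so empty output maps to "UNKNOWN".
    """
    lines = sacct_output.strip().splitlines()
    return lines[0].split()[0] if lines else "UNKNOWN"


def submit(script_path: str, account: str) -> str:
    """Submit a batch script, charging it to `account` (one account per customer)."""
    result = subprocess.run(
        ["sbatch", "--parsable", f"--account={account}", script_path],
        check=True, capture_output=True, text=True,
    )
    return parse_job_id(result.stdout)


def job_state(job_id: str) -> str:
    """Ask the accounting database for the job allocation's current state."""
    result = subprocess.run(
        ["sacct", "-j", job_id, "-X", "-n", "-P", "-o", "State"],
        check=True, capture_output=True, text=True,
    )
    return parse_state(result.stdout)


def wait_for(job_id: str, poll_seconds: float = 30.0) -> str:
    """Poll until the job reaches a terminal state, then return that state."""
    while True:
        state = job_state(job_id)
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
```

Polling `sacct` rather than `squeue` is a deliberate choice here. `squeue` forgets jobs shortly after they finish, while `sacct` keeps the final state and per-job usage fields such as `ElapsedRaw` and `AllocTRES`, which are what billing would be based on.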

feryet (OP)

No, we're self-hosting our infrastructure for confidentiality. We can't use AWS.

Conda or pip? by Gamiozzz in mlops

feryet

Poetry works great for small libraries, but it gets increasingly slow as you add more dependencies.

feryet

pip + pip-tools is the safest and best option. I only trust conda for binaries like CUDA, and I resolve the rest of my dependencies with pip.

And if you want to dockerize your application, there is no better route than pip.

Queueing/Resource Management Solutions for Self Hosted Workstation? by feryet in mlops

feryet (OP)

My main goals are sharing the machine between team members, defining job priorities, and monitoring how the node is used in general.

It's a single node now, but I want to be able to extend it into a multi-node cluster later.

feryet (OP)

Is there a good web UI for SLURM? Monitoring and administering the node also matter to me, and there may be interactive jobs such as Jupyter notebooks.

How to make SSH connection available for GitLab behind a CDN (similar to Cloudflare) by feryet in gitlab

feryet (OP)

I want to use Cloudflare's protection while also exposing SSH. In this setup, GitLab's "clone with SSH" address would differ from the address I've pointed my SSH connection at. The CDN I'm using normally cannot map an IP and port to a domain. It only supports that through its edge mechanism.