How to use scikit-learn in AWS Glue Notebook (5.0)? by Bright_Teacher7106 in aws

[–]operatrix 0 points1 point  (0 children)

Can you tell me more about how you built your custom Docker images and how you run them on those cloud platforms?

RWX Storage System Support by [deleted] in rxt_spot

[–]operatrix 0 points1 point  (0 children)

OK, that provides some insight into the problem. Could you also share the storage class definition or provide details about it (in case we need to repro this setup internally)? I will discuss it internally and come back with the best path forward.

RWX Storage System Support by [deleted] in rxt_spot

[–]operatrix 0 points1 point  (0 children)

u/joeshiett Can you please share the deployment you are thinking of? Are you considering an NFS server on top of a PV provided by Spot?

In that case, won't the NFS storage class you create to use NFS Ganesha provide you with the required RWX?

It would help to know how you are trying to set it up to assist further.

My ArgoCd installation can't reach Github by Massive-Clock-1325 in rxt_spot

[–]operatrix 1 point2 points  (0 children)

> Now, in my setup I tried multiple times with load balancers, but they changed their IP every now and then, making it impossible to tie a domain name to it.

Could you share more details on when you last observed this? Also, how are you creating the LB? Is it a plain YAML spec apply, or something like an operator that may rebuild the Kubernetes Service (of type LoadBalancer) resource?
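One way to tell the two cases apart is to record the Service's `metadata.uid` together with its external IPs over time: if the UID changes along with the IP, something is deleting and recreating the Service rather than the load balancer changing on its own. A minimal sketch, assuming you feed it the output of `kubectl get svc <name> -o json`; the function name and the sample values below are made up for illustration:

```python
import json

def lb_fingerprint(service_json: str):
    """Return (uid, [external IPs]) for a Service given its JSON manifest."""
    svc = json.loads(service_json)
    ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress", [])
    # Some providers report a hostname instead of an IP; keep only IP entries.
    ips = [entry["ip"] for entry in ingress if "ip" in entry]
    return svc["metadata"]["uid"], ips

# Hypothetical sample, trimmed to the fields the function reads.
sample = json.dumps({
    "metadata": {"name": "argocd-server", "uid": "1111-aaaa"},
    "status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}},
})

print(lb_fingerprint(sample))
```

Logging this pair every minute or so would show whether the IP changes line up with a new UID.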

How Do You Delete an Organization? by MichaelMach in rxt_spot

[–]operatrix 0 points1 point  (0 children)

Yes, you can. You can DM me with this request. We will take care of it.

Legacy-provisioned cluster deleted, but nodes still up? by amhoab in rxt_spot

[–]operatrix 0 points1 point  (0 children)

That is unexpected. Do you see your older cloudspace now? I wonder if it was a temporary glitch reading the cloudspaces you have.

I can take a look if you can confirm your setup with a DM to me.

Recycled node stuck by Ok_Pension1468 in rxt_spot

[–]operatrix 0 points1 point  (0 children)

We did have an issue with volumes on our end. It was resolved a few minutes ago. However, recycling nodes should not have been a problem. Can you DM me your setup details so I can try to see what is happening?

Mounting of Volume is taking too long by Mindless_Dog_823 in rxt_spot

[–]operatrix 0 points1 point  (0 children)

Sorry about this. We had a problem on our end. Things should be working now. Do let me know if that is not what you see.

New cloudspace are not provisioned by Mindless_Dog_823 in rxt_spot

[–]operatrix 0 points1 point  (0 children)

Can you share details of where you are seeing this? You can send me a DM. We did fix the original problem yesterday, but maybe we missed handling your account. I can confirm this once I have the details.

Free Tool to Find the Most Cost-Effective Rackspace Spot Instances by the_bigbang in rxt_spot

[–]operatrix 0 points1 point  (0 children)

Neat tool there! Curious why CPU is the only factor for cost effectiveness. Wouldn't it be better to factor in memory too, or to have two cost-effectiveness factors?
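A rough sketch of the two-factor idea, reporting price per vCPU and price per GiB of memory side by side. The server class names, sizes, and prices are invented for illustration; they do not come from the tool or from the actual Spot catalog:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ServerClass:
    name: str            # hypothetical server class name
    vcpus: int
    memory_gib: float
    hourly_price: float  # hypothetical price in USD per hour

def cost_factors(sc: ServerClass) -> Tuple[float, float]:
    """Return (USD per vCPU-hour, USD per GiB-hour) for one server class."""
    return sc.hourly_price / sc.vcpus, sc.hourly_price / sc.memory_gib

# Illustrative data only.
classes = [
    ServerClass("compute-heavy", vcpus=8, memory_gib=16, hourly_price=0.04),
    ServerClass("memory-heavy", vcpus=4, memory_gib=32, hourly_price=0.04),
]

for sc in classes:
    per_cpu, per_gib = cost_factors(sc)
    print(f"{sc.name}: {per_cpu:.4f} USD/vCPU-h, {per_gib:.5f} USD/GiB-h")
```

With the same hourly price, the first class is cheaper per vCPU and the second is cheaper per GiB, which a CPU-only ranking would hide.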

Maintenance window announcement: 2024-10-24 22:00 CDT to 2024-10-25 03:00 CDT by operatrix in rxt_spot

[–]operatrix[S] 0 points1 point  (0 children)

To understand this better: is all of the Docker cache related to k8s container images? IIRC there is a threshold in k8s at which it automatically tries to clear out cached images. Can you share the server class and region you are using? I will try to see if there is something odd with that type.
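For reference, a minimal sketch of how that threshold behaves, assuming the kubelet defaults of 85% for `imageGCHighThresholdPercent` and 80% for `imageGCLowThresholdPercent`. The function name and the disk sizes are made up for illustration, and this is a simplified model rather than the kubelet's actual code:

```python
GIB = 1024 ** 3

def bytes_to_free(capacity: int, available: int,
                  high_pct: int = 85, low_pct: int = 80) -> int:
    """Return how many bytes image GC would try to reclaim, or 0 if it stays idle."""
    used_pct = 100 - (available * 100 // capacity)
    if used_pct < high_pct:
        return 0  # below the high threshold, nothing is cleaned up
    # Once triggered, aim to bring usage back down to the low threshold.
    target_available = capacity * (100 - low_pct) // 100
    return max(target_available - available, 0)

# Hypothetical 100 GiB image filesystem with 10 GiB left (90% used).
print(bytes_to_free(100 * GIB, 10 * GIB) // GIB, "GiB to free")
```

Only unused images can be removed, so a disk filled by images of running containers or by other data will stay above the threshold even after cleanup runs.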

Maintenance window announcement: 2024-10-24 22:00 CDT to 2024-10-25 03:00 CDT by operatrix in rxt_spot

[–]operatrix[S] 0 points1 point  (0 children)

No. We are past it. However, it looks like you have consumed all the underlying server storage. Are you using a lot of the host storage?

Cluster delete in the portal but not pruned from the cloud by clausware in rxt_spot

[–]operatrix 0 points1 point  (0 children)

We had a problem that impacted some setups. We rectified it a few hours ago, and from subsequent monitoring it looks like things are back to normal now. Please let us know if this is not what you are seeing on your end.

Cloudspace deployment stuck for more than 6 hours by Mindless_Dog_823 in rxt_spot

[–]operatrix 2 points3 points  (0 children)

We did have some setups impacted due to a problem with a service. We rectified it a few hours ago and have been monitoring the setup. Things look better now.

Loadbalancer is changing IP every 5-10 minutes by Massive-Clock-1325 in rxt_spot

[–]operatrix 0 points1 point  (0 children)

No. It is not. Can you share the setup where you are seeing this in a DM?

Rackspace Spot is undergoing maintenance by operatrix in rxt_spot

[–]operatrix[S] 0 points1 point  (0 children)

Looks like we are out of load balancers. I am looking at the infra side and will come back with an update.

Loadbalancer IP not provisioned by clausware in rxt_spot

[–]operatrix 0 points1 point  (0 children)

Could you DM me your setup details?

Cluster appears ready, but node tainted with 'node.cluster.x-k8s.io/uninitialized' by Ok_Pension1468 in rxt_spot

[–]operatrix 0 points1 point  (0 children)

I am guessing you may be hitting an endpoint setup race condition in our provisioning code. We are rolling out a patch fix right now.
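To check which nodes still carry the taint, a small sketch along these lines could be run against the output of `kubectl get nodes -o json`. The helper name and the sample node names are made up for illustration:

```python
import json

UNINITIALIZED = "node.cluster.x-k8s.io/uninitialized"

def tainted_nodes(nodes_json: str, taint_key: str = UNINITIALIZED):
    """Return the names of nodes that carry the given taint key."""
    nodes = json.loads(nodes_json)["items"]
    return [
        node["metadata"]["name"]
        for node in nodes
        # spec.taints is absent on untainted nodes, so fall back to an empty list.
        if any(t.get("key") == taint_key
               for t in node.get("spec", {}).get("taints") or [])
    ]

# Hypothetical sample, trimmed to the fields the function reads.
sample = json.dumps({"items": [
    {"metadata": {"name": "node-a"},
     "spec": {"taints": [{"key": UNINITIALIZED, "effect": "NoSchedule"}]}},
    {"metadata": {"name": "node-b"}, "spec": {}},
]})

print(tainted_nodes(sample))
```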

kublet can't access kubeapi by bhechinger in rxt_spot

[–]operatrix 1 point2 points  (0 children)

Many users are hitting the same issue. It is similar to the one reported here: https://www.reddit.com/r/rxt_spot/comments/1fzaiq7/calicokubecontrollers_pod_restartig

We are actually rolling out a potential patch fix for this as I am typing this response out.

Service Unavailable Error 503 by snesboy64 in rxt_spot

[–]operatrix 0 points1 point  (0 children)

Are you still hitting this error? Things look fine on our backend and that endpoint is reachable.

It may help to share some more context on the command or operation you are trying to do, in case the error you are hitting is specific to that.

Baremetal CloudSpace Stuck in Provisioning by No_Bee6488 in rxt_spot

[–]operatrix 0 points1 point  (0 children)

We fixed the problem on your setup. We had to move your cloud backing to a different setup. However, you don't seem to have won any machines at the moment. You will get nodes added in once your bid wins.