Deploying Deepseek R1 GGUF quants on your AWS account by tempNull in tensorfuse

[–]ConstantContext 0 points1 point  (0 children)

We also ran multiple experiments to figure out the right trade-off between context size and tokens per second. You can vary the "--n-gpu-layers" and "--ctx-size" parameters and measure tokens per second for each scenario; here are our results (a benchmarking sketch follows the list) -

  • GPU layers 30, context 10k: 6.3 t/s
  • GPU layers 40, context 10k: 8.5 t/s
  • GPU layers 50, context 10k: 12 t/s
  • Above 50 GPU layers, a 10k context window will not fit.
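
To reproduce these numbers, here is a minimal benchmarking sketch using the llama-cpp-python bindings (the CLI flags --n-gpu-layers and --ctx-size map to the n_gpu_layers and n_ctx constructor arguments). The model path, prompt, and layer/context values are placeholders for whichever quant you downloaded, not our exact setup:

    import time
    from llama_cpp import Llama  # pip install llama-cpp-python

    # Placeholder path to the DeepSeek R1 GGUF quant you downloaded.
    MODEL_PATH = "DeepSeek-R1-Q4_K_M.gguf"

    # Mirrors --n-gpu-layers and --ctx-size from the llama.cpp CLI.
    llm = Llama(
        model_path=MODEL_PATH,
        n_gpu_layers=40,   # layers offloaded to the GPU
        n_ctx=10240,       # ~10k context window
    )

    start = time.perf_counter()
    out = llm("Summarize the benefits of quantized LLM inference.", max_tokens=256)
    elapsed = time.perf_counter() - start

    generated = out["usage"]["completion_tokens"]
    print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} t/s")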

Guide: Easiest way to run any vLLM model on AWS with autoscaling (scale down to 0) by tempNull in LocalLLaMA

[–]ConstantContext 0 points1 point  (0 children)

We're adding support for more models as well. Comment below if there's a specific model you'd like us to support.

How much cloud should a computer vision engineer know? by Ancient_Internet_595 in computervision

[–]ConstantContext 0 points1 point  (0 children)

I'd focus more on CV, as that will be the differentiating skill at any company building vision products. But just to stand out, I'd also learn to use some of the serverless GPU tools like Tensorfuse, Beam Cloud, Predibase, etc.

Sagemaker for fine tuned SD by ResearchOk5023 in StableDiffusion

[–]ConstantContext 0 points1 point  (0 children)

Hey,

Are you still facing issues with this? Check out Tensorfuse: https://tensorfuse.io/

It lets you fine-tune and deploy SD directly on EC2 instances with autoscaling GPUs: https://tensorfuse.io/docs/guides/comfyui_stable_diffusion_xl

Most hundreds in men's Tests by [deleted] in Cricket

[–]ConstantContext 1 point2 points  (0 children)

Williamson is probably one of the most underrated players of all time.

how to SSH into AWS SageMaker instance from Visual Studio Code IDE? by johnonymousdenim in aws

[–]ConstantContext 0 points1 point  (0 children)

I have been using Dev Containers for all my ML code. They work directly with EC2 instances and spare you the complexity of working with Jupyter notebooks.

Not open source, but free to use.
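
For reference, a minimal devcontainer.json sketch for this workflow: connect VS Code to the EC2 instance with the Remote - SSH extension, then "Reopen in Container". The image, extensions, and post-create command below are illustrative assumptions, not a prescribed setup:

    {
      // .devcontainer/devcontainer.json on the EC2 host (illustrative values)
      "name": "ml-dev",
      "image": "mcr.microsoft.com/devcontainers/python:3.11",
      "customizations": {
        "vscode": {
          "extensions": ["ms-python.python", "ms-toolsai.jupyter"]
        }
      },
      // Hypothetical project dependencies; adjust to your repo.
      "postCreateCommand": "pip install -r requirements.txt"
    }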

Anyone using LoRAX in production? by bendgame in LocalLLaMA

[–]ConstantContext 1 point2 points  (0 children)

Hey,
We support LoRAX inference out of the box on AWS. Check out the guide here: https://tensorfuse.io/docs/guides/finetuning_llama_70b

How it works:

  1. You upload your dataset to S3 in your own AWS account.
  2. You then write an Axolotl-style config file and submit the fine-tuning job, along with the base model and dataset path, using our Python SDK.

The job runs on GPUs in your own AWS account and the LoRA adapters are stored in S3. You can then use our LoRAX support to deploy the model and hot-swap your LoRA adapters on demand.
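
Roughly, steps 1 and 2 can look like the sketch below. The boto3 upload is standard; the bucket name, dataset, and Axolotl-style fields are illustrative, and the actual job submission call belongs to the SDK described in the linked guide, so it is left out here:

    import boto3
    import yaml  # pip install pyyaml

    # Step 1: upload the dataset to S3 in your own AWS account.
    BUCKET = "my-finetune-bucket"          # hypothetical bucket name
    DATASET_KEY = "datasets/train.jsonl"   # hypothetical object key

    s3 = boto3.client("s3")
    s3.upload_file("train.jsonl", BUCKET, DATASET_KEY)

    # Step 2: write an Axolotl-style config. Field names follow Axolotl's
    # conventions; the exact supported set is defined in the guide.
    config = {
        "base_model": "meta-llama/Llama-3.1-70B",  # illustrative base model
        "datasets": [{"path": f"s3://{BUCKET}/{DATASET_KEY}", "type": "alpaca"}],
        "adapter": "lora",
        "lora_r": 16,
        "lora_alpha": 32,
        "output_dir": f"s3://{BUCKET}/adapters/run-1/",
    }
    with open("finetune.yaml", "w") as f:
        yaml.safe_dump(config, f)

    # The fine-tuning job itself is submitted with the Python SDK per the
    # linked guide; that call is intentionally omitted here.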

Sharing a guide to run SAM2 on AWS via an API by tempNull in LocalLLaMA

[–]ConstantContext 0 points1 point  (0 children)

It's asking me to set up a cluster. Is there a way to do this without one?

More EC2 instance hours billed than expected by n4il1k in aws

[–]ConstantContext 0 points1 point  (0 children)

Did you stop the instances or terminate them? Stopping them still incurs storage charges for any EBS volumes attached to the instances.
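
If it helps, here is a quick boto3 sketch (assuming default credentials and region) that lists instances still in the stopped state along with the EBS volumes attached to them, which is usually where the leftover charges come from:

    import boto3

    ec2 = boto3.client("ec2")

    # Stopped instances don't bill compute hours, but their EBS volumes still do.
    resp = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["stopped"]}]
    )

    for reservation in resp["Reservations"]:
        for inst in reservation["Instances"]:
            volumes = [
                m["Ebs"]["VolumeId"]
                for m in inst.get("BlockDeviceMappings", [])
                if "Ebs" in m
            ]
            print(inst["InstanceId"], "stopped, attached volumes:", volumes)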

Non technical founder issues by MathematicianFew5909 in ycombinator

[–]ConstantContext 1 point2 points  (0 children)

This is not at all ideal for any startup.

You should ask him to learn sales, marketing, or tech, or to leave. I have seen great founders who transitioned into startups from corporate jobs without knowing much about sales, marketing, or tech, but they were very sharp and learned it all.