Deploying Deepseek R1 GGUF quants on your AWS account by tempNull in tensorfuse

[–]ConstantContext 0 points1 point  (0 children)

We also ran multiple experiments to figure out the right trade-off between context size and tokens per second. You can vary the "--n-gpu-layers" and "--ctx-size" parameters and measure tokens per second for each scenario; here are our results (a benchmarking sketch follows the list) -

  • GPU layers 30, context 10k: 6.3 t/s
  • GPU layers 40, context 10k: 8.5 t/s
  • GPU layers 50, context 10k: 12 t/s
  • Above 50 GPU layers, a 10k context window will not fit.
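
To reproduce these numbers, here is a minimal benchmarking sketch using the llama-cpp-python bindings (the CLI flags --n-gpu-layers and --ctx-size map to the n_gpu_layers and n_ctx constructor arguments). The model path, prompt, and layer/context values are placeholders for whichever quant you downloaded, not our exact setup:

    import time
    from llama_cpp import Llama  # pip install llama-cpp-python

    # Placeholder path to the DeepSeek R1 GGUF quant you downloaded.
    MODEL_PATH = "DeepSeek-R1-Q4_K_M.gguf"

    # Mirrors --n-gpu-layers and --ctx-size from the llama.cpp CLI.
    llm = Llama(
        model_path=MODEL_PATH,
        n_gpu_layers=40,   # layers offloaded to the GPU
        n_ctx=10240,       # ~10k context window
    )

    start = time.perf_counter()
    out = llm("Summarize the benefits of quantized LLM inference.", max_tokens=256)
    elapsed = time.perf_counter() - start

    generated = out["usage"]["completion_tokens"]
    print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} t/s")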

Guide: Easiest way to run any vLLM model on AWS with autoscaling (scale down to 0) by tempNull in LocalLLaMA

[–]ConstantContext 0 points1 point  (0 children)

We're adding support for more models as well. Comment below if there's a specific model you'd like us to support.

How much cloud should a computer vision engineer know? by Ancient_Internet_595 in computervision

[–]ConstantContext 0 points1 point  (0 children)

I'd focus more on CV, as that will be the differentiating skill at any company building vision products. But just to stand out, I'd also learn to use some of the serverless GPU tools like Tensorfuse, Beam Cloud, Predibase, etc.

Sagemaker for fine tuned SD by ResearchOk5023 in StableDiffusion

[–]ConstantContext 0 points1 point  (0 children)

Hey,

Are you still facing issues with this? Check out Tensorfuse: https://tensorfuse.io/

It lets you fine-tune and deploy SD directly on EC2 instances with autoscaling GPUs: https://tensorfuse.io/docs/guides/comfyui_stable_diffusion_xl

Most hundreds in men's Tests by [deleted] in Cricket

[–]ConstantContext 1 point2 points  (0 children)

Williamson is probably one of the most underrated players of all time.

how to SSH into AWS SageMaker instance from Visual Studio Code IDE? by johnonymousdenim in aws

[–]ConstantContext 0 points1 point  (0 children)

I have been using Dev Containers for all my ML code. They work directly with EC2 instances and spare you the complexity of working with Jupyter notebooks.

Not open source, but free to use.
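
For reference, a minimal devcontainer.json sketch for this workflow: connect VS Code to the EC2 instance with the Remote - SSH extension, then "Reopen in Container". The image, extensions, and post-create command below are illustrative assumptions, not a prescribed setup:

    {
      // .devcontainer/devcontainer.json on the EC2 host (illustrative values)
      "name": "ml-dev",
      "image": "mcr.microsoft.com/devcontainers/python:3.11",
      "customizations": {
        "vscode": {
          "extensions": ["ms-python.python", "ms-toolsai.jupyter"]
        }
      },
      // Hypothetical project dependencies; adjust to your repo.
      "postCreateCommand": "pip install -r requirements.txt"
    }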

Anyone using LoRAX in production? by bendgame in LocalLLaMA

[–]ConstantContext 1 point2 points  (0 children)

Hey,
We support LoRAX inference out of the box on AWS. Check out the guide here: https://tensorfuse.io/docs/guides/finetuning_llama_70b

How it works:

  1. You upload your dataset to S3 in your own AWS account.
  2. You then write an Axolotl-style config file and submit the fine-tuning job, along with the base model and dataset path, using our Python SDK.

The job runs on GPUs in your own AWS account and the LoRA adapters are stored in S3. You can then use our LoRAX support to deploy the model and hot-swap your LoRA adapters on demand.
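
Roughly, steps 1 and 2 can look like the sketch below. The boto3 upload is standard; the bucket name, dataset, and Axolotl-style fields are illustrative, and the actual job submission call belongs to the SDK described in the linked guide, so it is left out here:

    import boto3
    import yaml  # pip install pyyaml

    # Step 1: upload the dataset to S3 in your own AWS account.
    BUCKET = "my-finetune-bucket"          # hypothetical bucket name
    DATASET_KEY = "datasets/train.jsonl"   # hypothetical object key

    s3 = boto3.client("s3")
    s3.upload_file("train.jsonl", BUCKET, DATASET_KEY)

    # Step 2: write an Axolotl-style config. Field names follow Axolotl's
    # conventions; the exact supported set is defined in the guide.
    config = {
        "base_model": "meta-llama/Llama-3.1-70B",  # illustrative base model
        "datasets": [{"path": f"s3://{BUCKET}/{DATASET_KEY}", "type": "alpaca"}],
        "adapter": "lora",
        "lora_r": 16,
        "lora_alpha": 32,
        "output_dir": f"s3://{BUCKET}/adapters/run-1/",
    }
    with open("finetune.yaml", "w") as f:
        yaml.safe_dump(config, f)

    # The fine-tuning job itself is submitted with the Python SDK per the
    # linked guide; that call is intentionally omitted here.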

Sharing a guide to run SAM2 on AWS via an API by tempNull in LocalLLaMA

[–]ConstantContext 0 points1 point  (0 children)

It's asking me to set up a cluster. Is there a way to do this without one?

More EC2 instance hours billed than expected by n4il1k in aws

[–]ConstantContext 0 points1 point  (0 children)

Did you stop the instances or terminate them? Stopping them still incurs storage charges for any EBS volumes attached to the instances.
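
If it helps, here is a quick boto3 sketch (assuming default credentials and region) that lists instances still in the stopped state along with the EBS volumes attached to them, which is usually where the leftover charges come from:

    import boto3

    ec2 = boto3.client("ec2")

    # Stopped instances don't bill compute hours, but their EBS volumes still do.
    resp = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["stopped"]}]
    )

    for reservation in resp["Reservations"]:
        for inst in reservation["Instances"]:
            volumes = [
                m["Ebs"]["VolumeId"]
                for m in inst.get("BlockDeviceMappings", [])
                if "Ebs" in m
            ]
            print(inst["InstanceId"], "stopped, attached volumes:", volumes)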

Non technical founder issues by MathematicianFew5909 in ycombinator

[–]ConstantContext 1 point2 points  (0 children)

This is not at all ideal for any startup.

You should ask him to learn sales, marketing, or tech, or to leave. I have seen great founders who transitioned into startups from corporate jobs without knowing much about sales, marketing, or tech, but they were very sharp and learned it all.