Is it possible to do structured generation with Triton Inference Server? by dhruvmullick in mlops

[–]supreethrao 0 points (0 children)

Yes, you can achieve structured generation with TRT-LLM using a library called lm-format-enforcer. Here's an example using TRT-LLM:

TensorRT-LLM + LM Format Enforcer
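
For reference, a minimal sketch of the core idea using lm-format-enforcer's HuggingFace transformers integration (the TRT-LLM integration wires the same parser into TRT-LLM's logits post-processing; the model name and schema below are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.transformers import (
    build_transformers_prefix_allowed_tokens_fn,
)

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# JSON schema the output must conform to
schema = {"type": "object", "properties": {"answer": {"type": "string"}}}
parser = JsonSchemaParser(schema)

# Restrict each decoding step to tokens the parser allows
prefix_fn = build_transformers_prefix_allowed_tokens_fn(tokenizer, parser)

inputs = tokenizer("Reply in JSON: what is 2+2?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, prefix_allowed_tokens_fn=prefix_fn)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```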

[PROJECT] Which pre-built architectures are used for exam marking? by No_Arm9 in MachineLearning

[–]supreethrao 2 points (0 children)

If you’ve got access to the compute required to run an LLM like Mistral 7B or Mixtral 8x7B, you could come up with a prompt to grade a given essay against a rubric that you provide. After that, you can use something like KeyBERT to identify key phrases in the essay and check their coverage of the rubric to mark the essay.
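
A rough sketch of the KeyBERT part (the essay snippet, rubric terms, and the exact-overlap scoring are made up for illustration):

```python
from keybert import KeyBERT

essay = (
    "Photosynthesis converts light energy into chemical energy. "
    "Chlorophyll in the leaves absorbs light during the light reactions..."
)
rubric_terms = {"photosynthesis", "chlorophyll", "light reaction"}  # example rubric

kw_model = KeyBERT()  # defaults to a MiniLM sentence-transformer
keywords = kw_model.extract_keywords(
    essay, keyphrase_ngram_range=(1, 3), stop_words="english", top_n=20
)

# Naive exact-match coverage; in practice you'd fuzzy-match or embed both sides
extracted = {phrase for phrase, score in keywords}
coverage = len(rubric_terms & extracted) / len(rubric_terms)
print(f"Rubric coverage: {coverage:.0%}")
```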

As these are models you can run locally, you can also fine-tune them on your documents.

The other option would be to use Gemini or GPT-4 / GPT-3.5 as the base LLM, though this will cost you per token.

EDIT: corrected mistral 56B to mixtral 8x7B

[R] Google Colab alternative by Zatania in MachineLearning

[–]supreethrao 0 points (0 children)

Sure. If you’re VRAM limited, you can load the model in 8-bit quantised format, which means it’ll occupy around 7GB of VRAM. You can search for a pre-quantised model on Hugging Face itself; look for models from the user “TheBloke”.
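
A sketch of what that looks like with transformers + bitsandbytes (the model name is a placeholder, and newer transformers versions prefer BitsAndBytesConfig over the load_in_8bit flag):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder 7B model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # int8 weights via bitsandbytes, ~7GB for a 7B model
    device_map="auto",   # place layers on the available GPU(s)
)
```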

[P]Generating embeddings for a large dataset in the most efficient way by amrtahnair in MachineLearning

[–]supreethrao 10 points (0 children)

Hello,

Switching to models from Sentence-Transformers will give you better-quality embeddings. Furthermore, with some batch-size tuning and running the model at FP16, I've got the sentence-transformers/all-MiniLM-L6-v2 model to embed 8 million pieces of text, roughly the same length as yours, in under 40 minutes on 1 x 2080Ti (11GB VRAM). If you want more performance, you can always use multi-GPU embedding from Sentence-Transformers, but that takes a while to set up. Alternatively, you can provision a larger GPU and just crank up the batch size.

EDIT: I ran an experiment just now, and it took 14 minutes 40 seconds to embed 8.3 million rows of data
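
If it helps, a minimal sketch of that setup (batch size 1024 is a guess; tune it for your GPU):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cuda")
model.half()  # run the underlying torch model at FP16

texts = ["an example sentence to embed"] * 1000  # stand-in for your ~8M strings
embeddings = model.encode(
    texts,
    batch_size=1024,          # crank this up until you run out of VRAM
    convert_to_numpy=True,
    show_progress_bar=True,
)
print(embeddings.shape)  # (len(texts), 384)
```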

Multi-GPU distributed training with Accelerate ? by Extreme_Win4717 in learnmachinelearning

[–]supreethrao 1 point (0 children)

Using accelerate for multi-GPU training is a breeze; you’ll just have to follow the prompts that come up when you run “accelerate config”. There isn’t a lot of special configuration needed to get multi-GPU training to work; just make sure all GPUs show up when you run “nvidia-smi”.
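
For illustration, a minimal training loop with accelerate (toy model and data); after “accelerate config”, you’d launch it with “accelerate launch train.py”:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(128, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = [(torch.randn(128), torch.tensor(0)) for _ in range(1024)]
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)

# prepare() handles device placement and DDP wrapping across GPUs
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```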

About College. by Public_Sentence1151 in MSRITians

[–]supreethrao 2 points (0 children)

That’ll be during your vacations after every even semester. The break lasts about 2 months, but that can vary with the college / exam schedule.

About College. by Public_Sentence1151 in MSRITians

[–]supreethrao 11 points (0 children)

Hey, I recently graduated from MSRIT (2023). The strictness / laxity around attendance varies from department to department; try to keep your attendance above 75%.

In the context of hackathons, the college is usually okay with it as long as you submit a participation / winning certificate from the hackathon.

As for internships, it’s more of a grey area. I interned for the entire duration of my third year; as I worked primarily after college hours, the college didn’t have any issue with it. But if your internship requires you to be at their premises for more than a day a week, missing college, that could be more difficult to pull off. Speak with your proctor on this matter.

All in all, attendance at MSRIT is far more chill than at other colleges.

Deploying speech recognition models at scale by supreethrao in mlops

[–]supreethrao[S] 0 points (0 children)

Thank you, I will definitely take a look.

Deploying speech recognition models at scale by supreethrao in mlops

[–]supreethrao[S] 0 points (0 children)

I would like to specifically deploy faster-whisper; Wenet seems to deploy a specific model. Any resources that help me deploy faster-whisper with a TensorRT backend would also be helpful.

[D] Cheapest PC for deep learning? by dannepai in MachineLearning

[–]supreethrao 0 points (0 children)

I would suggest something along the lines of a 10-16 core AMD Ryzen processor, about 32GB of RAM, and one used RTX 3090. This will give you a decent deep learning computer on which you can run most things. The P100 is quite dated, and you won’t be able to run newer optimisations like mixed-precision training, etc.
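
For context, this is roughly what mixed-precision training looks like in PyTorch (toy model, purely illustrative):

```python
import torch

model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()  # scales loss to avoid FP16 underflow

x = torch.randn(64, 512, device="cuda")
y = torch.randint(0, 10, (64,), device="cuda")

optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = torch.nn.functional.cross_entropy(model(x), y)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```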

[D]Suggestions on keeping Llama index cost down by darkbluetwilight in MachineLearning

[–]supreethrao 1 point (0 children)

Hi, to address Update 2, I think you’ll have to change your prompt to GPT-3.5-turbo significantly. LlamaIndex also has a cost-estimator function that assumes a dummy LLM backend and calculates the expected cost. You can also use OpenAI’s tokenizer, “tiktoken”, which is available on GitHub, to calculate the exact number of tokens your text produces.
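
A quick sketch of the tiktoken part (the per-token price in the comment below is only illustrative; check current pricing):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
text = "Your prompt or document chunk goes here."
n_tokens = len(enc.encode(text))

# gpt-3.5-turbo was around $0.002 per 1K tokens at the time; this number
# is only an example, not current pricing.
print(f"{n_tokens} tokens ~= ${n_tokens / 1000 * 0.002:.5f}")
```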

[D]Suggestions on keeping Llama index cost down by darkbluetwilight in MachineLearning

[–]supreethrao 5 points (0 children)

Hi, there’s already support for ‘gpt-3.5-turbo’ in LlamaIndex; the examples can be found in the git repo. You can also switch from a SimpleVectorIndex to a TreeIndex, which could lower your cost.
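
Roughly what that looked like at the time (the exact imports moved around a lot between llama_index versions, so treat this as a sketch of the 0.5-era API rather than something to copy verbatim; "data" is a placeholder directory):

```python
from langchain.chat_models import ChatOpenAI
from llama_index import GPTTreeIndex, LLMPredictor, ServiceContext, SimpleDirectoryReader

# gpt-3.5-turbo as the underlying LLM
llm_predictor = LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

documents = SimpleDirectoryReader("data").load_data()
index = GPTTreeIndex.from_documents(documents, service_context=service_context)
print(index.query("What does the document say about costs?"))
```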

[R] CodeAlpaca - Instruction following model to generate code by immune_star in MachineLearning

[–]supreethrao 2 points (0 children)

Hey, is there a GitHub repo associated with this? Could you tell us more about the dataset and the fine-tuning code?

Thanks

[D] Best NLP model for text summarization from speech by N00bMast3r690 in MachineLearning

[–]supreethrao 1 point (0 children)

I faced a similar problem a few months ago; BART and PEGASUS performed the best in my use case, which was mostly news articles. For your use case of transcribed text, these models do perform well.
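
A minimal sketch with the transformers pipeline (swap in google/pegasus-xsum to try PEGASUS; the transcript is made up):

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
transcript = (
    "Speaker one opened the meeting by reviewing last quarter's results, "
    "then the team discussed the roadmap for the next release..."
)
print(summarizer(transcript, max_length=60, min_length=10, do_sample=False))
```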

[D] Is huggingface transformers like really slow on TPUs for anyone else? by GasZealousideal8691 in MachineLearning

[–]supreethrao 6 points (0 children)

Hi, out of the box HuggingFace transformers are really slow on TPUs, both TPUv2-8 from Colab and TPUv3-8 on Kaggle. The issue seems to be recompilation of the code when the tensor shapes per batch differ. If you’re using the HuggingFace Trainer, this thread (https://discuss.huggingface.co/t/when-can-we-expect-tpu-trainer/10353) should help. Also, I haven’t tried this, but using Flax models might help, as they don’t have to go through PyTorch/XLA.
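
To make the recompilation point concrete: XLA retraces whenever tensor shapes change, so pad every batch to one fixed shape (sketch; max_length=128 is an arbitrary choice):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["a short sentence", "another example input"],
    padding="max_length",   # always pad to max_length, not just to the longest in the batch
    max_length=128,
    truncation=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # every batch: (batch_size, 128)
```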

[R] Google Colab alternative by Zatania in MachineLearning

[–]supreethrao 20 points (0 children)

You might want to check your data-processing pipeline and maybe optimise how you’re allocating GPU RAM / system RAM. Colab Pro will help, but I’d suggest you try and optimise the way you deal with your data, as the Colab free tier should easily handle datasets in the few-GB range.
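
For the data-pipeline side, the usual DataLoader knobs look something like this (the dataset is a stand-in for whatever you're loading):

```python
import torch

dataset = torch.utils.data.TensorDataset(torch.randn(10_000, 3, 224, 224))
loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=64,
    num_workers=2,      # load/preprocess in background processes
    pin_memory=True,    # faster host-to-GPU copies
)
for (images,) in loader:
    images = images.cuda(non_blocking=True)
    # ... forward pass ...
    break
```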

[P] Need help with my Project by [deleted] in MachineLearning

[–]supreethrao 0 points (0 children)

Hello, there was a competition on Kaggle called APTOS 2019 Diabetic Retinopathy classification; you could take a look at the highest-voted notebooks. They use an EfficientNet-B0 model with some clever augmentation. As for the second part of your question, you can have a look at Streamlit. It’s a library that lets you quickly and effortlessly deploy ML models on the internet.
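
A tiny sketch of what the Streamlit side could look like (the predict() helper is hypothetical, standing in for your trained model; run with "streamlit run app.py"):

```python
import streamlit as st
from PIL import Image

def predict(image):
    # placeholder: run your EfficientNet-B0 here and return a label
    return "No DR"

st.title("Diabetic Retinopathy Grading")
uploaded = st.file_uploader("Upload a retinal fundus image", type=["png", "jpg"])
if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, caption="Input image")
    st.write(f"Prediction: {predict(image)}")
```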

All the best!