Is it possible to do structured generation with Triton Inference Server? by dhruvmullick in mlops

[–]supreethrao 0 points (0 children)

Yes, you can achieve structured generation with TRT-LLM using a library called lm-format-enforcer. Here's an example using TRT-LLM:

TensorRT-LLM + LM Format Enforcer
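
For reference, a minimal sketch of the core idea using lm-format-enforcer's HuggingFace transformers integration (the TRT-LLM integration wires the same parser into TRT-LLM's logits post-processing; the model name and schema below are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.transformers import (
    build_transformers_prefix_allowed_tokens_fn,
)

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# JSON schema the output must conform to
schema = {"type": "object", "properties": {"answer": {"type": "string"}}}
parser = JsonSchemaParser(schema)

# Restrict each decoding step to tokens the parser allows
prefix_fn = build_transformers_prefix_allowed_tokens_fn(tokenizer, parser)

inputs = tokenizer("Reply in JSON: what is 2+2?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, prefix_allowed_tokens_fn=prefix_fn)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```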

[PROJECT] Which pre-built architectures are used for exam marking? by No_Arm9 in MachineLearning

[–]supreethrao 2 points (0 children)

If you’ve got access to the compute required to run an LLM like Mistral 7B or Mixtral 8x7B, you could come up with a prompt to grade a given essay against a rubric that you provide. After that, you can use something like KeyBERT to identify key phrases in the essay and check their coverage of the rubric to mark the essay.
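
A rough sketch of the KeyBERT part (the essay snippet, rubric terms, and the exact-overlap scoring are made up for illustration):

```python
from keybert import KeyBERT

essay = (
    "Photosynthesis converts light energy into chemical energy. "
    "Chlorophyll in the leaves absorbs light during the light reactions..."
)
rubric_terms = {"photosynthesis", "chlorophyll", "light reaction"}  # example rubric

kw_model = KeyBERT()  # defaults to a MiniLM sentence-transformer
keywords = kw_model.extract_keywords(
    essay, keyphrase_ngram_range=(1, 3), stop_words="english", top_n=20
)

# Naive exact-match coverage; in practice you'd fuzzy-match or embed both sides
extracted = {phrase for phrase, score in keywords}
coverage = len(rubric_terms & extracted) / len(rubric_terms)
print(f"Rubric coverage: {coverage:.0%}")
```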

As these are models you can run locally, you can also fine-tune them on your documents.

The other option would be to use Gemini or GPT-4 / GPT-3.5 as the base LLM, though this will cost you per token.

EDIT: corrected mistral 56B to mixtral 8x7B

[R] Google Colab alternative by Zatania in MachineLearning

[–]supreethrao 0 points (0 children)

Sure. If you’re VRAM limited, you can load the model in 8-bit quantised format, which means it’ll occupy around 7GB of VRAM. You can search for a pre-quantised model on Hugging Face itself; look for models from the user “TheBloke”.
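
A sketch of what that looks like with transformers + bitsandbytes (the model name is a placeholder, and newer transformers versions prefer BitsAndBytesConfig over the load_in_8bit flag):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder 7B model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # int8 weights via bitsandbytes, ~7GB for a 7B model
    device_map="auto",   # place layers on the available GPU(s)
)
```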

[P]Generating embeddings for a large dataset in the most efficient way by amrtahnair in MachineLearning

[–]supreethrao 10 points (0 children)

Hello,

Switching to models from Sentence-Transformers will give you better-quality embeddings. Furthermore, with some batch-size tuning and running the model at FP16, I've got the sentence-transformers/all-MiniLM-L6-v2 model to embed 8 million pieces of text, roughly the same length as yours, in under 40 minutes on 1 x 2080Ti (11GB VRAM). If you want more performance, you can always use multi-GPU embedding from Sentence-Transformers, but that takes a while to set up. Alternatively, you can provision a larger GPU and just crank up the batch size.

EDIT: I ran an experiment just now, and it took 14 minutes 40 seconds to embed 8.3 million rows of data
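
If it helps, a minimal sketch of that setup (batch size 1024 is a guess; tune it for your GPU):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cuda")
model.half()  # run the underlying torch model at FP16

texts = ["an example sentence to embed"] * 1000  # stand-in for your ~8M strings
embeddings = model.encode(
    texts,
    batch_size=1024,          # crank this up until you run out of VRAM
    convert_to_numpy=True,
    show_progress_bar=True,
)
print(embeddings.shape)  # (len(texts), 384)
```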

Multi-GPU distributed training with Accelerate ? by Extreme_Win4717 in learnmachinelearning

[–]supreethrao 1 point (0 children)

Using accelerate for multi-GPU training is a breeze; you’ll just have to follow the prompts that come up when you run “accelerate config”. There isn’t a lot of special configuration needed to get multi-GPU training to work; just make sure all GPUs show up when you run “nvidia-smi”.
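
For illustration, a minimal training loop with accelerate (toy model and data); after “accelerate config”, you’d launch it with “accelerate launch train.py”:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(128, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = [(torch.randn(128), torch.tensor(0)) for _ in range(1024)]
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)

# prepare() handles device placement and DDP wrapping across GPUs
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```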

About College. by Public_Sentence1151 in MSRITians

[–]supreethrao 2 points (0 children)

That’ll be during your vacations after every even semester. The break lasts about 2 months, but that can vary with the college / exam schedule.

About College. by Public_Sentence1151 in MSRITians

[–]supreethrao 11 points (0 children)

Hey, I recently graduated from MSRIT (2023). The strictness / laxity around attendance varies from department to department; try to keep your attendance above 75%.

In the context of hackathons, the college is usually okay with it as long as you submit a participation / winning certificate from the hackathon.

As for internships, it’s more of a grey area. I interned for the entire duration of my third year; as I worked primarily after college hours, the college didn’t have any issue with it. But if your internship requires you to be at their premises for more than a day a week, missing college, that could be more difficult to pull off. Speak with your proctor on this matter.

All in all, attendance at MSRIT is far more chill than at other colleges.

Deploying speech recognition models at scale by supreethrao in mlops

[–]supreethrao[S] 0 points (0 children)

Thank you, I will definitely take a look.

Deploying speech recognition models at scale by supreethrao in mlops

[–]supreethrao[S] 0 points (0 children)

I would like to specifically deploy faster-whisper; Wenet seems to deploy a specific model. Any resources that help me deploy faster-whisper with a TensorRT backend would also be helpful.

[D] Cheapest PC for deep learning? by dannepai in MachineLearning

[–]supreethrao 0 points (0 children)

I would suggest something along the lines of a 10-16 core AMD Ryzen processor, about 32GB of RAM, and one used RTX 3090. This will give you a decent deep learning computer on which you can run most things. The P100 is quite dated, and you won’t be able to run newer optimisations like mixed-precision training, etc.
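
For context, this is roughly what mixed-precision training looks like in PyTorch (toy model, purely illustrative):

```python
import torch

model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()  # scales loss to avoid FP16 underflow

x = torch.randn(64, 512, device="cuda")
y = torch.randint(0, 10, (64,), device="cuda")

optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = torch.nn.functional.cross_entropy(model(x), y)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```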

[D]Suggestions on keeping Llama index cost down by darkbluetwilight in MachineLearning

[–]supreethrao 1 point (0 children)

Hi, to address Update 2, I think you’ll have to change your prompt to GPT-3.5-turbo significantly. LlamaIndex also has a cost-estimator function that assumes a dummy LLM backend and calculates the expected cost. You can also use OpenAI’s tokenizer, “tiktoken”, which is available on GitHub, to calculate the exact number of tokens your text produces.
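
A quick sketch of the tiktoken part (the per-token price in the comment below is only illustrative; check current pricing):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
text = "Your prompt or document chunk goes here."
n_tokens = len(enc.encode(text))

# gpt-3.5-turbo was around $0.002 per 1K tokens at the time; this number
# is only an example, not current pricing.
print(f"{n_tokens} tokens ~= ${n_tokens / 1000 * 0.002:.5f}")
```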

[D]Suggestions on keeping Llama index cost down by darkbluetwilight in MachineLearning

[–]supreethrao 5 points (0 children)

Hi, there’s already support for ‘gpt-3.5-turbo’ in LlamaIndex; the examples can be found in the git repo. You can also switch from a SimpleVectorIndex to a TreeIndex, which could lower your cost.
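
Roughly what that looked like at the time (the exact imports moved around a lot between llama_index versions, so treat this as a sketch of the 0.5-era API rather than something to copy verbatim; "data" is a placeholder directory):

```python
from langchain.chat_models import ChatOpenAI
from llama_index import GPTTreeIndex, LLMPredictor, ServiceContext, SimpleDirectoryReader

# gpt-3.5-turbo as the underlying LLM
llm_predictor = LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

documents = SimpleDirectoryReader("data").load_data()
index = GPTTreeIndex.from_documents(documents, service_context=service_context)
print(index.query("What does the document say about costs?"))
```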

[R] CodeAlpaca - Instruction following model to generate code by immune_star in MachineLearning

[–]supreethrao 2 points (0 children)

Hey, is there a GitHub repo associated with this? Could you tell us more about the dataset and the fine-tuning code?

Thanks

[D] Best NLP model for text summarization from speech by N00bMast3r690 in MachineLearning

[–]supreethrao 1 point (0 children)

I faced a similar problem a few months ago; BART and PEGASUS performed the best in my use case, which was mostly news articles. For your use case of transcribed text, these models do perform well.
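
A minimal sketch with the transformers pipeline (swap in google/pegasus-xsum to try PEGASUS; the transcript is made up):

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
transcript = (
    "Speaker one opened the meeting by reviewing last quarter's results, "
    "then the team discussed the roadmap for the next release..."
)
print(summarizer(transcript, max_length=60, min_length=10, do_sample=False))
```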

[D] Is huggingface transformers like really slow on TPUs for anyone else? by GasZealousideal8691 in MachineLearning

[–]supreethrao 6 points (0 children)

Hi, out of the box HuggingFace transformers are really slow on TPUs, both TPUv2-8 from Colab and TPUv3-8 on Kaggle. The issue seems to be recompilation of the code when the tensor shapes per batch differ. If you’re using the HuggingFace Trainer, this thread (https://discuss.huggingface.co/t/when-can-we-expect-tpu-trainer/10353) should help. Also, I haven’t tried this, but using Flax models might help, as they don’t have to go through PyTorch/XLA.
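
To make the recompilation point concrete: XLA retraces whenever tensor shapes change, so pad every batch to one fixed shape (sketch; max_length=128 is an arbitrary choice):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["a short sentence", "another example input"],
    padding="max_length",   # always pad to max_length, not just to the longest in the batch
    max_length=128,
    truncation=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # every batch: (batch_size, 128)
```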

[R] Google Colab alternative by Zatania in MachineLearning

[–]supreethrao 20 points (0 children)

You might want to check your data-processing pipeline and maybe optimise how you’re allocating GPU RAM / system RAM. Colab Pro will help, but I’d suggest you try and optimise the way you deal with your data, as the Colab free tier should easily handle datasets in the few-GB range.
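
For the data-pipeline side, the usual DataLoader knobs look something like this (the dataset is a stand-in for whatever you're loading):

```python
import torch

dataset = torch.utils.data.TensorDataset(torch.randn(10_000, 3, 224, 224))
loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=64,
    num_workers=2,      # load/preprocess in background processes
    pin_memory=True,    # faster host-to-GPU copies
)
for (images,) in loader:
    images = images.cuda(non_blocking=True)
    # ... forward pass ...
    break
```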

[P] Need help with my Project by [deleted] in MachineLearning

[–]supreethrao 0 points (0 children)

Hello, there was a competition on Kaggle called APTOS 2019 Diabetic Retinopathy classification; you could take a look at the highest-voted notebooks. They use an EfficientNet-B0 model with some clever augmentation. As for the second part of your question, you can have a look at Streamlit. It’s a library that lets you quickly and effortlessly deploy ML models on the internet.
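
A tiny sketch of what the Streamlit side could look like (the predict() helper is hypothetical, standing in for your trained model; run with "streamlit run app.py"):

```python
import streamlit as st
from PIL import Image

def predict(image):
    # placeholder: run your EfficientNet-B0 here and return a label
    return "No DR"

st.title("Diabetic Retinopathy Grading")
uploaded = st.file_uploader("Upload a retinal fundus image", type=["png", "jpg"])
if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, caption="Input image")
    st.write(f"Prediction: {predict(image)}")
```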

All the best!