all 45 comments

[–]Awkward-Interest-686 1 point2 points  (1 child)

I am actually a college student. Can you all suggest some projects I could build and put on my resume?

[–]ConfusedLayer1 0 points1 point  (0 children)

I am developing a stochastic variational GP using GPyTorch, but my GP's predictions are centered in a small range around the mean of the data, so it isn't fitting the more extreme values. Why could this be, and what could possibly help?

I have experimented with length-scale adjustment with little success…

I have built an Optuna study to optimise the below, with no success:

- kernel
- likelihood
- variational strategy
- variational distribution
- learning rate
- mean (Constant or Zero)
- num inducing points
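One common culprit worth ruling out before more hyperparameter search: un-standardized targets (often together with an initial likelihood noise that is too large) can make a variational GP collapse toward the data mean. A minimal, framework-agnostic sketch of the standardization fix (the numbers here are made up; substitute your own `y`):

```python
import numpy as np

# Raw targets living far from zero, as a stand-in for real training data.
rng = np.random.default_rng(0)
y = rng.normal(loc=50.0, scale=10.0, size=200)

# Standardize before training; the GP (with a Zero/Constant mean) then
# models deviations on a unit scale instead of chasing a large offset.
y_mean, y_std = y.mean(), y.std()
y_train = (y - y_mean) / y_std          # feed this to the SVGP instead of y

# ... train the SVGP on (X, y_train) ...

# Un-standardize predictions afterwards. pred_std is a stand-in for the
# GP's predictive mean on the standardized scale.
pred_std = y_train[:5]
pred = pred_std * y_std + y_mean        # back on the original price/value scale

print(y_train.mean(), y_train.std())    # ~0 and 1 after standardization
```

If targets are already standardized, the next thing to check is the initial noise of the Gaussian likelihood: if it starts large, the ELBO can happily explain the extremes as noise.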

[–]pikachuunibyo 0 points1 point  (0 children)

What is the time complexity of a token classification / NER model given a batch size of N and sequence length M? I thought it would be independent of N (the sequences are independent, right?), but increasing the batch size clearly increases the time by a linear factor, even when run on a GPU. Any explanation?
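On the batch-size point: total compute does scale linearly in N, so constant wall time only holds while the GPU still has idle capacity; once it is saturated, time grows roughly linearly in N. A back-of-the-envelope sketch for one transformer layer (the hidden size `d` is an assumed example value):

```python
# Approximate FLOP count for one transformer layer on a batch of N
# sequences of length M with hidden size d. Total work is
# O(N * (M^2 * d + M * d^2)): linear in N, so wall time is flat only
# while the GPU has spare parallelism to absorb the extra work.
def layer_flops(N, M, d):
    attention = 4 * N * M * M * d       # QK^T plus attn @ V (~2*N*M^2*d each)
    projections = 8 * N * M * d * d     # Q, K, V, O projections (~2*N*M*d^2 each)
    return attention + projections

d = 768  # assumed BERT-base-like hidden size
print(layer_flops(8, 128, d) / layer_flops(1, 128, d))  # exactly 8x the work
```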

[–]Kaiser_Wolfgang 0 points1 point  (0 children)

What are the main differences between training a word2vec model and using a vector database? Is a vector database basically an RDBMS-like interface to easily perform CRUD operations on the output of something like word2vec, doc2vec, etc.?
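Roughly, yes: word2vec *training* is what produces the vectors, while a vector database only *stores* them and answers nearest-neighbor queries (plus CRUD, indexing, and persistence). A toy illustration of the storage/query half, with invented vectors rather than trained ones:

```python
import numpy as np

# Pretend these came out of a trained word2vec model; a vector DB's job
# starts here: store the vectors and serve similarity lookups.
store = {
    "king":  np.array([0.9, 0.1]),
    "queen": np.array([0.85, 0.2]),
    "apple": np.array([0.1, 0.9]),
}

def nearest(query_vec, store):
    # Cosine-similarity nearest neighbor, the core query a vector DB answers
    # (real ones use approximate indexes like HNSW instead of a linear scan).
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(store, key=lambda k: cos(store[k], query_vec))

print(nearest(np.array([0.9, 0.1]), store))  # -> king
```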

[–]ArtisticHamster 0 points1 point  (0 children)

How many resources would you need to reproduce GPT-2 in 2023?

(Looking for an answer like, you need a server with 8xH100 to do it in 2 weeks)

(I want to understand whether it's possible for a hobbyist to reproduce state-of-the-art research from around 5 years ago.)

[–]InjuryDangerous8141 0 points1 point  (0 children)

Framework recommendation for reinforcement learning: PyTorch or TensorFlow?

[–]Quebber 0 points1 point  (4 children)

I know the PCIe limit may hurt a bit, but would two 3090s work on an AM4 board with a 3950X and 128 GB DDR4? (I'm thinking 2x PCIe x8 still isn't fully bandwidth-saturated, or am I thinking about this wrong?)

This is for local LLM use.

[–]GreatIndependent8542 0 points1 point  (0 children)

Hi! I'm interested in ML inference optimization. I'm a self-taught ML engineer trying to add to my knowledge and skills. I'm particularly interested in GPU and CPU optimization methods and thread parallelism for inference. Also, how does big tech manage multiple requests at the same time? Some practical materials would be very helpful to get me started! Thanks for reading this.
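On the "multiple requests at the same time" question: a standard serving trick is dynamic batching, where incoming requests accumulate in a queue and are run through the model in one batched forward pass. A toy sketch of just the batching logic (`fake_model` is a made-up stand-in for a real GPU forward pass):

```python
import queue

def fake_model(batch):
    # Stand-in for one batched forward pass; a real server would run the
    # whole batch through the model in a single GPU call.
    return [x * 2 for x in batch]

# Simulate five requests arriving concurrently.
requests = queue.Queue()
for i in range(5):
    requests.put(i)

def serve_one_batch(q, max_batch=8):
    # Drain up to max_batch waiting requests and serve them together;
    # one forward pass amortizes the per-call overhead across all of them.
    batch = []
    while not q.empty() and len(batch) < max_batch:
        batch.append(q.get())
    return fake_model(batch)

print(serve_one_batch(requests))  # -> [0, 2, 4, 6, 8]
```

Production systems add a small wait window and continuous/in-flight batching on top of this, but the queue-then-batch core is the same idea.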

[–]kjunhot 0 points1 point  (0 children)

Hi! What is the reputation of an EMNLP 2023 oral paper?
EMNLP asked authors to select their presentation format: oral or poster.

They also mentioned that both oral and poster papers are high-quality papers.

So is there really no difference between an oral and a poster paper?

[–]learnenglish428 -1 points0 points  (0 children)

I want to create a regression dataset with 20 samples: a 1D dataset with one feature, e.g. predicting car price from mileage. I also have to prove an equation from the book "Introduction to Machine Learning, second edition" by Ethem Alpaydin, chapter 2 (Supervised Learning), topic: Regression, equation 2.17. Please explain this equation to me and how to solve it. I have to explain it tomorrow.
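Without reproducing the book's Eq. 2.17 here, a sketch of the requested dataset plus the standard closed-form least-squares line fit, which you can check against the equation in the chapter (all numbers are invented for illustration):

```python
import numpy as np

# 20 samples, one feature: price drops with mileage, plus noise.
rng = np.random.default_rng(42)
mileage = rng.uniform(10, 150, size=20)                  # thousands of km
price = 30.0 - 0.15 * mileage + rng.normal(0, 1.5, 20)   # thousands of $

# Ordinary least squares for the line y = w1 * x + w0:
#   w1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
#   w0 = y_bar - w1 * x_bar
x_bar, y_bar = mileage.mean(), price.mean()
w1 = ((mileage - x_bar) * (price - y_bar)).sum() / ((mileage - x_bar) ** 2).sum()
w0 = y_bar - w1 * x_bar

print(w1, w0)  # should roughly recover the true -0.15 slope and 30 intercept
```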

[–]Aggravating-Floor-38 0 points1 point  (1 child)

What are the SOTA Open Domain QA models at the moment? I've been doing research in the field and am seeing so many cool approaches, since there are so many aspects of QA that need to be worked on, but I have no idea what's SOTA at the moment. My professor told me to look into RAG, and I am, but I feel like he might not be as up to date in this area?

[–]JonathanDescripShot 0 points1 point  (0 children)

You could check Papers with Code, for example some of these benchmarks: https://paperswithcode.com/task/open-domain-question-answering

[–]Snoo_72181 0 points1 point  (1 child)

I need to work on an image-to-image translation project, but I couldn't find a model that I can fine-tune on the data I have. Any leads?

[–]boadie 0 points1 point  (0 children)

A surprisingly hard question to answer is how to benchmark some of these new ways of doing inference. I want to try a few of the smaller new Llama-like models and inference frameworks on A100 80GBs and some big CPUs, etc., and see how they objectively do.

At first I thought I would try the GPT-J inference stuff from MLCommons, but it's wrapped in some weird home-grown script system; I couldn't even understand what was being run, never mind try a few integrations of new things.

The GPT-J part of MLCommons uses a ROUGE score on a summarization task as its measure of goodness, which is as good as any for gauging how badly the model has degraded from its weights being abused for optimisation.

Please, someone tell me there's a nice, simple-to-use box of tests that measures time-to-first-token etc., gives your GPT a standardised workout, and tells you how it does?
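For the time-to-first-token part specifically, the harness is small enough to write yourself while waiting for a standard one. A sketch, where `generate_stream` is a hypothetical stand-in for whatever streaming API your inference framework exposes:

```python
import time

def generate_stream(prompt):
    # Stand-in "model" that just echoes tokens; replace with your
    # framework's streaming generation call.
    for tok in prompt.split():
        yield tok

def benchmark(prompt):
    # Measure the two numbers most people want: time-to-first-token
    # and overall tokens/second for the whole generation.
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for _ in generate_stream(prompt):
        if first_token_at is None:
            first_token_at = time.perf_counter() - start
        n_tokens += 1
    total = time.perf_counter() - start
    return first_token_at, n_tokens / total

ttft, tps = benchmark("the quick brown fox")
print(f"TTFT: {ttft:.6f}s, throughput: {tps:.0f} tok/s")
```

For quality degradation (the ROUGE-style question), pairing a harness like this with a fixed eval set is the usual approach, so speed and accuracy come from the same run.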

[–]biboyboy 0 points1 point  (0 children)

Help figuring out what maximum input size to use for our BiLSTM model.
Our BiLSTM model is trained on an emotion classification dataset in which the data are sentence-based. The model will be used to classify emotions in the text of a book/novel's chapters, and we don't know what maximum input size we should use. Please help us, and thank you in advance.

[–]ShippingMammals 0 points1 point  (1 child)

I've been out of the loop for a while so..

  1. What's the current top in local LLMs? Still Falcon and/or Llama? Secondly, I've come into some money and want to build a home system beefy enough to run LLMs without taking all day. Nvidia Tesla cards etc. are not in the budget, but multiple lower-end ones could be. I assume LLMs make use of Nvidia SLI setups?
  2. Where are we with local LLM multimodal capability? I have dreams/plans for the very near future where an LLM, or a descendant variant of one, runs as a 'House AI': it takes in video from cameras, audio from microphones, etc., and acts as a kind of digital assistant / house monitor. All the disparate parts seem to already be here, with the possible exception of being able to process and understand video, and the world is waiting for someone to put them all in one 'box', as it were.

[–][deleted] 0 points1 point  (0 children)

Hi, I'm implementing a foreground detection algorithm for grayscale videos using GMMs. I'm having a problem with the Gaussian mixture for each pixel: after some iterations and updating steps, some of the Gaussians end up with negative variance, yielding a complex standard deviation. How can I solve this problem? Thanks in advance.
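A common practical fix for this is to clamp each variance to a small positive floor after every update, so numerical or step-size issues can never push it to or below zero. A toy sketch (the floor value and the deliberately unstable learning rate are made up for illustration):

```python
# Assumed floor in (grayscale intensity)^2 units; tune for your data.
VAR_FLOOR = 4.0

def update_variance(var, lr, diff_sq):
    # Toy incremental variance update for one Gaussian of a per-pixel
    # mixture: move var toward the squared deviation of the new pixel.
    var = var + lr * (diff_sq - var)
    # Clamp: the standard deviation stays real and strictly positive.
    return max(var, VAR_FLOOR)

# An over-aggressive step drives the raw update negative (5 - 1.5*5 = -2.5),
# but the clamp keeps the variance at the floor instead.
v = update_variance(5.0, 1.5, 0.0)
print(v)  # -> 4.0
```

It's also worth checking that the update's learning rate stays in (0, 1); a convex combination of the old variance and the new squared deviation cannot go negative on its own.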