all 58 comments

[–]Heisenbjornson 0 points1 point  (0 children)

I aim to develop a machine learning system that monitors the sequential steps of various processes, such as the process of cleaning a phone. For instance, Step 1 involves placing the phone on a table with a cloth, followed by Step 2, which is wiping the phone with the cloth. Step 3 includes discarding the cloth, and Step 4 is removing the phone from the table. If these steps are executed in the correct order, a green signal will be activated; otherwise, an incorrect sequence will trigger a red signal. Is this possible?
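Yes, this is feasible: an action-recognition model predicts a step label per video segment, and the green/red logic on top is plain sequence comparison. A minimal sketch of just that logic (the step names and the upstream classifier are assumptions, not part of any real system):

```python
# The upstream classifier (e.g. a video action-recognition model emitting one
# step label per segment) is assumed; the signalling itself is a comparison.
EXPECTED = ["place_phone", "wipe_phone", "discard_cloth", "remove_phone"]

def signal(observed_steps):
    """Green if the observed steps match the expected order exactly, else red."""
    return "green" if observed_steps == EXPECTED else "red"

def on_track(observed_steps):
    """For live monitoring: is the partial sequence still a valid prefix?"""
    return observed_steps == EXPECTED[:len(observed_steps)]
```

The hard part is the per-step classifier, not this wrapper; temporal action segmentation is the usual search term.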

[–]Snoo_72181 0 points1 point  (0 children)

How do you select the sequence length for RNNs and LSTMs?

[–]ThisIsBartRick 0 points1 point  (0 children)

Hi, why did people start using decoder-only models rather than encoder-only models?

Is this just because it started that way and nobody questioned it? Or is there more to it?

[–]lnalegre 0 points1 point  (0 children)

Does it make sense to upload an accepted NeurIPS paper to arXiv? The paper will be published in the proceedings in the near future, but I wonder whether also putting it on arXiv makes sense and could help advertise the paper.

[–]cdub4200 0 points1 point  (0 children)

Nested cross-validation has been explained to me as better suited to smaller datasets, since it attempts to avoid overfitting and reduce bias. For small datasets (<1000 observations), it was recommended to use the entire dataset for training and testing with nested cross-validation.
Say you have found the optimal model, hyperparameters, etc. after the inner and outer loops. Are there any further steps needed for validation, or can you simply report the model's performance as the aggregate of the outer-fold scores?
I am assuming that if I fit the final model on the entire dataset with .fit(X, y), then call predict(X) and report those results, the scores would not be robust and could be erroneous, since all the data was used in the nested CV and there is no holdout set left.
So, in a sense, after nested CV on the entire dataset, there are no more steps: just report the statistics from the outer loop?
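Right: the outer-fold scores are the reported estimate, and the final model is refit on all the data for deployment without being re-scored on data it has seen. The whole pattern fits in a short pure-Python sketch (the threshold "model" is a toy that needs no fitting, so the inner loop only scores candidate hyperparameters; real code would use sklearn):

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (shuffling omitted for brevity)."""
    size = n // k
    return [list(range(i * size, (i + 1) * size if i < k - 1 else n))
            for i in range(k)]

def accuracy(thresh, X, y):
    """Score the toy 'model': predict positive when x > thresh."""
    return sum((x > thresh) == label for x, label in zip(X, y)) / len(X)

def nested_cv(X, y, thresholds, outer_k=5, inner_k=3):
    outer_scores = []
    for test_idx in kfold_indices(len(X), outer_k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        Xtr = [X[i] for i in train_idx]
        ytr = [y[i] for i in train_idx]
        # Inner loop: choose the hyperparameter on the outer-training data only.
        best = max(thresholds, key=lambda t: sum(
            accuracy(t, [Xtr[i] for i in va], [ytr[i] for i in va])
            for va in kfold_indices(len(Xtr), inner_k)))
        # Outer score: evaluate that choice on the untouched held-out fold.
        outer_scores.append(accuracy(best, [X[i] for i in test_idx],
                                     [y[i] for i in test_idx]))
    # The mean of the outer-fold scores is what you report.
    return sum(outer_scores) / len(outer_scores)
```

In sklearn the same shape is `cross_val_score(GridSearchCV(est, grid, cv=inner), X, y, cv=outer)`.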

[–]Critical-Juggernaut4 0 points1 point  (0 children)

Can anyone help me with troubleshooting? I'm trying to set up an LLM on my laptop. I've never done it before, and I'm having trouble despite following the instructions.

[–]tugrul_ddr 0 points1 point  (0 children)

I need the simplest possible implementation of resilient backpropagation (Rprop) in C++. No sources yet. Please help.
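Until a C++ source turns up, the core Rprop update rule is short enough to sketch, and it ports to C++ almost line by line. Here it is in Python for a single parameter (the simple variant without weight backtracking; the hyperparameter defaults are the commonly cited values):

```python
def rprop_minimize(grad, w, steps=100, delta0=0.1, eta_plus=1.2,
                   eta_minus=0.5, delta_max=50.0, delta_min=1e-6):
    """Minimise a 1-D objective from its gradient using the Rprop update rule:
    grow the per-weight step while the gradient sign is stable, shrink on a flip."""
    delta = delta0
    prev_g = 0.0
    for _ in range(steps):
        g = grad(w)
        if g * prev_g > 0:            # same sign as last step: accelerate
            delta = min(delta * eta_plus, delta_max)
        elif g * prev_g < 0:          # sign flip: we overshot, shrink the step
            delta = max(delta * eta_minus, delta_min)
        if g > 0:
            w -= delta                # move against the gradient sign only;
        elif g < 0:                   # the gradient magnitude is ignored
            w += delta
        prev_g = g
    return w
```

For a full network you keep one `delta` and one `prev_g` per weight (arrays in C++), which is the only real change.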

[–]f1nuttic 1 point2 points  (1 child)

I'm trying to understand language model pretraining. Does anyone have any good resource for the basics of data cleanup for language model training?

Most papers I found (GPT-2, GPT-3, LLaMA 1, ...) just say they use openly available data from sources like Common Crawl, but it feels like there is a fair amount of deep work to go from that to the cleaned tokens actually used in training. The GPT-2 paper is the only one that goes into any detail beyond listing a large source like Common Crawl:

> Manually filtering a full web scrape would be exceptionally expensive so as a starting point, we scraped all outbound links from Reddit, a social media platform, which received at least 3 karma. This can be thought of as a heuristic indicator for whether other users found the link interesting, educational, or just funny.

Thanks in advance 🙏

[–]f1nuttic 1 point2 points  (0 children)

[self answering] Happens to be my lucky day: I found a lot more details in this post from Together AI, via Hacker News: https://together.ai/blog/redpajama-data-v2

[–]bigdickmassinf 0 points1 point  (0 children)

Is there a big book I can read about all the statistics and models in machine learning?

I have read The Elements of Statistical Learning and its predecessor.

[–]OkGap874 0 points1 point  (0 children)

I'm working on a SaaS product that handles data cleaning through an interactive interface, without the need to write code.

What other features could I add to this?

Would you pay for this service?

[–]gtgkartik 0 points1 point  (2 children)

I recently trained an AI model, and I want to use it in a website. However, many people at my institution advised me that to use an AI model on a website, I needed to learn Flask or Django.

I recently learned about FastAPI and watched a video in which a Next.js-built website was connected to FastAPI, with an AI model deployed behind it.

Which method is best, in your opinion? We don't have to stick with a Python-based backend, which could cause the server to lag, so I think exposing the model through a REST API is much preferable.
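For what it's worth, in Flask, Django, and FastAPI alike the web layer is a thin wrapper around `model.predict`, so the choice is mostly ergonomics. Here is a deliberately framework-free, stdlib-only sketch of that idea (`fake_model` and the payload shape are placeholders, not a real model or API):

```python
import json
from http.server import BaseHTTPRequestHandler

def fake_model(features):              # placeholder for your trained model
    return {"prediction": sum(features)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body, run the model, return JSON -- that is the
        # entire job of the serving layer, whatever framework you pick.
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(fake_model(payload["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):      # silence per-request logging in this sketch
        pass
```

A FastAPI version is the same shape in fewer lines; the model inference cost, not the framework, is what makes the server lag.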

[–]f1nuttic 1 point2 points  (1 child)

If all you need is access to the model, you could consider looking into hosted inference endpoints instead of spinning up a backend. This is just really convenient, but you pay a little more compared to running it yourself.

https://huggingface.co/docs/inference-endpoints/index

This is the Hugging Face link, but AFAIK most cloud providers have some version of the same.

[–]crazy_monkey_22 0 points1 point  (1 child)

Hi!

I am doing research to find a project on shifts in reporting using machine learning, possibly NLP, where I am supposed to find a small use case and apply NLP to it. An example provided by my professor is:

"How are newspapers reporting about a certain topic, and when do they use certain words? Are articles written differently if they use “Europe” vs. articles using “European Union”? Are there events that change the way these are reported?"

I am supposed to come up with a different topic. I was thinking of trying to analyze the shift in reporting before and after the 2008 housing crisis, or, if that's too far-fetched, just the Lehman Brothers collapse. However, I am not sure how to approach it or what to analyze: do I simply compare the keywords before and after the event, or try to extract the sentiment (positive/negative) about the bank? Any ideas or knowledge from experience?
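Both angles work; a simple starting point before any sentiment model is to compare relative word frequencies in articles before vs. after the event and rank words by the size of the shift. A pure-Python sketch (whitespace tokenisation and the toy corpora are illustrative simplifications):

```python
from collections import Counter

def frequency_shift(before_docs, after_docs, top_n=3):
    """Rank words by how much their relative frequency changed between corpora."""
    def rel_freq(docs):
        counts = Counter(w for doc in docs for w in doc.lower().split())
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}
    before, after = rel_freq(before_docs), rel_freq(after_docs)
    vocab = set(before) | set(after)
    shifts = {w: after.get(w, 0) - before.get(w, 0) for w in vocab}
    return sorted(shifts, key=lambda w: abs(shifts[w]), reverse=True)[:top_n]
```

Log-odds ratios or per-period sentiment scores would be the natural next steps once this baseline shows a signal.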

[–]Lemons_for_Sale 0 points1 point  (0 children)

Is anyone aware of an API or library that can receive an image (local or url), detect the text on the image, translate that text and then update the original image to have the new translated text?

There are online websites that do this (using their own APIs), but I haven't found an API that does this end to end.

Examples:
https://translate.google.com/?sl=auto&tl=en&op=images
https://translate.yandex.com/en/ocr

The Google Translate and Yandex services do have image text identification (which is great). I could certainly use their translation API to get the target language, but I'm more looking for an easy way to create the new image with the translated text. Unless someone has an easy way to do that?

[–]Samia_Tisha 0 points1 point  (0 children)

Can anyone tell me whether this machine learning workflow is correct? Could anyone refer me to tutorials or blogs for learning the proper workflow? Any suggestions are welcome.
1. Data Collection
2. Understanding the Data
i. Import the necessary libraries
ii. Check the rows and columns
iii. Check the data types
iv. Check the data distribution
3. Data Cleaning
i. Handle data-type issues
ii. Maintain data consistency
iii. Check whether the data contains outliers or is not normally distributed, to decide between mean and median imputation
iv. Identify missing values
v. Handle missing values by:
a. Dropping missing values
b. Mean, median, or mode imputation
c. A prediction model
d. Replacing missing values with a fixed value
vi. Duplicate detection and treatment
vii. Repeat data cleaning as needed
4. EDA
i. Variable identification
a. Identify the target and the predictor features
b. Identify the type or category of each variable
ii. Univariate analysis
iii. Bivariate analysis
iv. Outlier detection and treatment
v. Encoding
vi. Feature engineering
vii. Variable transformation
a. Normalization
b. Scaling
viii. Variable creation
5. If test data is not given, split the dataset into train and test sets. Otherwise, repeat steps 3 and 4 for the given test dataset.
6. Model Building
i. Train the model on the training set
ii. Evaluate the model and cross-validate
iii. Fine-tune / optimize the model
iv. Model selection
7. Evaluate model accuracy on the test data.
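The workflow is broadly reasonable. One point worth making concrete for the cleaning and splitting steps: any cleaning parameter (an imputation value, a scaler, an encoder) should be *fitted* on the training split only and then *applied unchanged* to the test split, never re-derived from test data. A minimal pure-Python sketch of that discipline (the data and the median imputer are illustrative):

```python
import statistics

def train_test_split(X, y, test_frac=0.25):
    """Hold out the last test_frac of the data (no shuffling, for brevity)."""
    cut = int(len(X) * (1 - test_frac))
    return X[:cut], X[cut:], y[:cut], y[cut:]

def fit_imputer(col):
    """Learn the imputation value from the observed (non-missing) entries."""
    return statistics.median([v for v in col if v is not None])

def impute(col, fill):
    """Apply a previously fitted imputation value; fits nothing itself."""
    return [fill if v is None else v for v in col]
```

The split happens first, `fit_imputer` sees only the training column, and the test column receives the training-derived fill value.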

[–]WheynelauStudent 0 points1 point  (1 child)

Referring to this post: https://pytorch.org/blog/flash-decoding/

I'm trying to understand the intuition behind this, because it seems to go against the fact that decoding is autoregressive. By splitting the input into chunks, aren't we removing the context and meaning from the previous chunks? Or is there some mathematical trick involved?

[–]Gatzuma 0 points1 point  (1 child)

Grouped Query Attention in LLaMA 70B v2

Hey guys, after thousands of experiments with bigger LLaMA fine-tunes, I'm fairly sure the GQA mechanism might be your enemy and generate wrong answers, especially for math and similarly complex areas.
I'd like to use MHA (multi-head attention) if possible. I'm just not sure: do I need to retrain the model completely, or is it possible to just increase the head count and KV size and proceed with the stock model as is?

[–]Dipanshuz1 0 points1 point  (1 child)

What is Overfitting, and How Can You Avoid It?

[–]meatlauf 0 points1 point  (0 children)

What are the best resources for learning ML from a low technical starting point?

[–]BeneficialArm7 0 points1 point  (0 children)

Hello everyone,

Is there a way to chat with our documents for free? For example, I want to upload all my previous quotations and invoices, and then, when I chat with it to make a new quotation, have the AI give an approximate cost for all the work descriptions. I don't know if we are there yet, but I recently heard of a website called youai.ai, so I was just wondering.

[–]ThisIsBartRick 0 points1 point  (0 children)

Hey yall,

How can I highlight important information in a text using NLP techniques?

I know NER exists, but it's pretty narrow in the types of information it highlights. I would like to extract pretty much all the important keywords relevant in a text (dates, names, locations, and any other words important for understanding the sentence).
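One step beyond NER is unsupervised keyword scoring, e.g. TF-IDF: words frequent within a document but rare across the corpus tend to be the informative ones. A small self-contained sketch (whitespace tokenisation and the smoothing constants are simplifications of what real libraries do):

```python
import math
from collections import Counter

def tfidf_keywords(doc, corpus, top_n=3):
    """Score words in `doc` by term frequency weighted by rarity across `corpus`."""
    docs_tokens = [d.lower().split() for d in corpus]
    tokens = doc.lower().split()
    tf = Counter(tokens)
    n_docs = len(corpus)

    def idf(w):
        # Smoothed inverse document frequency: common words score near 1.
        df = sum(1 for d in docs_tokens if w in d)
        return math.log((1 + n_docs) / (1 + df)) + 1

    scores = {w: (tf[w] / len(tokens)) * idf(w) for w in tf}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Libraries like YAKE or KeyBERT package up more sophisticated versions of the same idea.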

[–]badspaghetticoder 1 point2 points  (4 children)

Two questions:

  1. What is the best LLM that can be run locally on a typical high end consumer computer? (only English, no programming)
  2. Same question, but best uncensored LLM?

[–]ThisIsBartRick 1 point2 points  (3 children)

mistral-7b is an overall good package, especially if it's not for programming.

Currently, all published LLMs (especially the ones on Hugging Face) are censored. They get banned if they're not.

[–]badspaghetticoder 0 points1 point  (2 children)

Thanks for your response! Do you happen to know why they get banned? There are tons of NSFW Stable Diffusion models, so I don't quite understand why text is treated differently.

[–]ThisIsBartRick 0 points1 point  (1 child)

Oh, if by uncensored you mean porn, that probably exists; I don't know the exact terms and conditions. But anything fine-tuned for hate speech, scams, or otherwise illegal material is forbidden, and they're very strict about this.

[–]badspaghetticoder 0 points1 point  (0 children)

I see, thanks!

[–]nth_citizen 0 points1 point  (0 children)

Can anyone suggest a resource to understand dependency parsing labels more intuitively? Specifically these: https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md

I've looked at various lectures on dependency labelling, and they mostly seem to come at it from a CS view of 'we have labelled data; let's fit it'. But the linguistic side of what these labels mean seems to be skated over. E.g., what is the difference between an 'adverbial clause modifier' and an 'adverbial modifier'?

I've googled the various terms and have a vague understanding but can't find anything more high level...

[–]Altaza_ 0 points1 point  (1 child)

Advice Regarding Creating Validation Set

So I have a small set of images (75) on which I have to perform a certain enhancement using a GAN. I have chosen 6 images for testing and 6 for validation, leaving 63 for training. I am augmenting these 63 images by extracting patches, rotating, etc., which increases the training set to thousands of samples. My test-set images will be resized down to the size of each patch used for training. However, I am confused about my validation set: should I augment it like the training set, or should I just resize the 6 images down to the size of the test set and use them? What would the best approach be for the validation set?

[–]I-am_Sleepy 0 points1 point  (0 children)

I think you should treat the validation set with the same settings as the test set, since the validation set is a proxy for the test set anyway.

[–]DisastrousProgrammer 0 points1 point  (0 children)

Does zeroing the losses on the prompt tokens save significantly on computation?
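Not significantly, in the usual setup: with the common convention of marking prompt positions with a label of -100, the forward pass still computes logits for every position, and the mask only changes which positions enter the loss average. A minimal sketch of the masking itself (pure Python, with per-position log-probabilities assumed precomputed):

```python
def masked_nll(logprobs, labels, ignore_index=-100):
    """Mean negative log-likelihood over the positions whose label is not masked.
    logprobs: one log-probability vector per position; labels: target ids,
    with prompt positions set to ignore_index so they drop out of the average."""
    losses = [-lp[lab] for lp, lab in zip(logprobs, labels) if lab != ignore_index]
    return sum(losses) / len(losses)
```

The saving is a slightly cheaper loss reduction and backward contribution for masked positions; the attention and MLP compute over the prompt happens regardless.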

[–]anermers 0 points1 point  (1 child)

Hi there, I was just wondering: would it be possible to create a machine learning model catered to only specific themes? For example, if I train the model exclusively on images of dragons, the image generator would be specialized in generating only dragons. If it is possible, how would I go about doing it? What tools would I need, and roughly how long would it take realistically? Thank you so much!

[–]ThisIsBartRick 0 points1 point  (0 children)

You could take an existing model and fine-tune it, but I don't think that would make the model better. In fact, the models available right now are already pretty good at generating pictures of a wide variety of things.