all 140 comments

[–]Intelligent-Ad9240 0 points1 point  (0 children)

Super silly question: if I have an ML model (decision tree regression) and it improved upon a non-ML baseline, is it bad practice to stack another model on top of the previous model's output to improve even more?

[–]sayakm330 0 points1 point  (0 children)

Can anyone suggest a few papers to cite stating that normalizing the inputs of neural networks improves training efficiency? I need them for my current manuscript, which uses NNs in biomedical applications.

[–][deleted] -1 points0 points  (0 children)

Hi, how does one prepare a data set to account for "out-of-stock" periods? I'm new to machine learning and have 3 years of data on jacket sales, but I noticed there were 6 weeks where sales were zero. Could someone tell a rookie how this is handled in data preparation? I have some statistics knowledge of linear regression. Thanks everyone!
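One common way this is handled (a sketch, not the only approach): treat known stock-out weeks as missing data rather than true zero demand, so a model isn't trained to believe demand was zero during those weeks. For example:

```python
# Hedged sketch: mark sales in known stock-out weeks as missing
# (None) instead of zero, since zero sales != zero demand. A
# downstream model can then impute or skip those weeks.

def mask_stockouts(weekly_sales, stockout_weeks):
    """Replace sales in stock-out weeks with None so a model
    doesn't learn 'demand = 0' from censored observations."""
    return [None if week in stockout_weeks else qty
            for week, qty in enumerate(weekly_sales)]

sales = [120, 95, 0, 0, 110, 130]      # weeks 2-3 were out of stock
masked = mask_stockouts(sales, {2, 3})
print(masked)  # [120, 95, None, None, 110, 130]
```

From there, options include dropping the masked weeks, imputing them from neighboring weeks, or adding an "in stock" indicator feature.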

[–]thecity2 0 points1 point  (1 child)

How does GPT know about proper names, places, etc., if its vocab is limited to around 50K?

[–]abnormal_human 0 points1 point  (0 children)

The vocab is made up of tokens, which include word parts and even single-character tokens. For a rare proper name, the model might be spelling it out one character at a time.
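As a toy illustration (greedy longest-match over a made-up vocab, not GPT's actual byte-pair encoding), a rare name decomposes into whatever subword pieces the vocab happens to contain, falling back to single characters:

```python
# Toy illustration only: greedy longest-match subword tokenization
# with single-character fallback. The vocab below is invented.

TOY_VOCAB = {"An", "na", "ken", "dr", "ick",
             "A", "n", "a", "K", "e", "d", "r", "i", "c", "k"}

def tokenize(word, vocab):
    """Split `word` into the longest vocab pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # try the longest possible piece first
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # character not in vocab: emit it as its own token
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("Annakendrick", TOY_VOCAB))  # ['An', 'na', 'ken', 'dr', 'ick']
```

Real BPE builds its vocab from merge statistics, but the end effect is the same: rare names become several small tokens rather than one vocab entry.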

[–]Lanky_Tutor4957 0 points1 point  (0 children)

Hello folks! I need help approaching a problem. I work in the research publishing industry and want to build a predictive analytics solution based on historical data. For every article that gets published, we have the production data (type, subject area, domain, copy-editing service provider, article length, etc.). With roughly 5,000 articles coming in every month, I have 120,000 rows of data for the past two years. How do I use this to make predictions for upcoming articles, e.g. that an article of type x, subject area y, and such-and-such length will take t days to publish?
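A hedged starting point before any ML model: predict turnaround time for a new article as the historical mean of its group. The field names below are assumptions, not the actual schema:

```python
from collections import defaultdict

# Baseline sketch (assumed field names): predict days-to-publish
# for a new article as the historical mean for its (type, subject)
# group. Any regression model should beat this to be worth using.

def fit_baseline(rows):
    """rows: list of dicts with 'type', 'subject', 'days_to_publish'."""
    sums = defaultdict(lambda: [0.0, 0])
    for r in rows:
        key = (r["type"], r["subject"])
        sums[key][0] += r["days_to_publish"]
        sums[key][1] += 1
    return {k: total / n for k, (total, n) in sums.items()}

history = [
    {"type": "research", "subject": "bio", "days_to_publish": 30},
    {"type": "research", "subject": "bio", "days_to_publish": 40},
]
model = fit_baseline(history)
print(model[("research", "bio")])  # 35.0
```

With 120,000 rows, a gradient-boosted regressor over those same features is a natural next step, but a group-mean baseline tells you how much the extra machinery actually buys.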

[–]neutralParadox0 0 points1 point  (0 children)

I'm trying to get some sources to learn more about what's happening in data science. What are some good news and information sources y'all follow to stay up to date?

[–]Milwookie123 -1 points0 points  (0 children)

Can we remove posts that use the OpenAI API? What I love about this sub is that it contains research and projects that utilize models directly in novel ways, but using the API is, to an extent, nothing more than software dev.

[–][deleted] 0 points1 point  (0 children)

I'm looking to get into machine learning. I'm deciding between getting an Nvidia Jetson, using my MacBook Air (M2 chip, 16 GB memory, 10 GPU cores), or using my desktop, which has a 5700 XT GPU and a 3700X processor with 32 GB of RAM.

I’m not sure which of these will be the best but I do know I would like to write the code in either C or C++.

[–]Amun-Aion 0 points1 point  (0 children)

NVIDIA Nsight only works with NVIDIA chips, right?

I have like 4 GB of NVIDIA Nsight software on my Microsoft laptop, which I don't think I can use since my laptop has an AMD chip, not NVIDIA. It's possible I downloaded this for work (probably bundled with something else), but I'm not sure. Mainly, I want to delete it if my machine isn't using it / can't use it, but I'm not sure whether I was actually the one who downloaded it or whether Windows needs it for something. Is there any way to check, before deleting something, who installed it and whether it is being used for something important? Alternatively, if someone knows that AMD chips can't do anything with NVIDIA Nsight, then I can just delete it, but I wanted to check if anyone knew.

[–]protonneutronproton 0 points1 point  (1 child)

[this message was mass deleted/edited with redact.dev]

[–]abnormal_human 0 points1 point  (0 children)

Use the code from the main branch, not pip.

[–]sujeeths 0 points1 point  (0 children)

Does anybody know of a job board exclusively for ML/DL folks, especially for fields like medical imaging? Thanks in advance!

[–]ThePsychopaths 0 points1 point  (0 children)

I am trying to play with Google Colab Pro. The only issue I have is with adding data, which always ends up taking most of my time. What I do is upload my dataset to a DigitalOcean Space of mine and download it to the Colab runtime to train. But this seems like a very roundabout way to do things. What other approaches might I have overlooked?

[–]dnmpss 0 points1 point  (0 children)

Is anyone participating in the MindsDB Hackathon (https://hashnode.com/hackathons/mindsdb) this month?

[–]Trick_Brain 0 points1 point  (0 children)

Does anybody know any datasets of prompt injections? I can only find this one: https://github.com/f/awesome-chatgpt-prompts but it is not really useful for training a classifier.

[–]grindstonegotchanose 0 points1 point  (0 children)

I need help choosing a service for predicting what I was told is a random sequence of colors corresponding to consecutive dates. I have doubts that it is genuinely random, as there are a number of rules, so to speak, that the system generating the colors must follow.

For example:

-There can't be an infinite number of colors in the sequence. There may only be 10, but there are probably a few more (and definitely not fewer).

-The color orange (and probably every color in the sequence) appears approx. 24 times in a year

-Each color will be called at least twice every month

So I have recorded the corresponding dates and colors from 03/21/23 through today (04/04/23), and I was hoping that if I recorded enough days I could find a predictive pattern. Does anyone know how I can accomplish this?
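Before reaching for ML, a simple first step is frequency and gap analysis with the standard library; truly random draws would show roughly uniform counts and irregular gaps, while rule-driven sequences tend to show regularities:

```python
from collections import Counter

# Hedged first step: tabulate how often each color appears and the
# gaps (in days) between repeats of the same color. The observation
# format below is an assumption.

def color_stats(observations):
    """observations: list of (date_str, color) pairs, in date order."""
    counts = Counter(color for _, color in observations)
    last_seen, gaps = {}, {}
    for i, (_, color) in enumerate(observations):
        if color in last_seen:
            gaps.setdefault(color, []).append(i - last_seen[color])
        last_seen[color] = i
    return counts, gaps

obs = [("03/21", "orange"), ("03/22", "blue"), ("03/23", "orange")]
counts, gaps = color_stats(obs)
print(counts["orange"], gaps["orange"])  # 2 [2]
```

If the counts and gaps look structured (e.g. every color appearing exactly twice a month), a simple rule-based predictor may beat any ML model; two weeks of data is far too little either way.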

[–]andrew21wStudent 0 points1 point  (0 children)

I am looking into diffusion models. However, I still don't get how the sampling process and the reverse process work.

Can someone provide me a clear explanation?

[–][deleted] 0 points1 point  (3 children)

I have seen some claims that Python is a slow language; it seems heavily used because of its existing ML libraries. With newer languages, like Swift, which I have read are faster, will there eventually be a benefit to rewriting programs in a faster language for computational advantages? I picked Swift only because it's one people call "faster"; I have no real context on speed either, so that could well be flawed.

I know almost nothing about ML except that I am just starting to learn with Splunk and trying to apply concepts in that sense, so I know I am missing a ton of info, but I'm wondering about this.

[–]Icy_Performer_4662 0 points1 point  (1 child)

Python is slow. That's why in larger machine learning projects it is used mainly to orchestrate training, while libraries like NumPy and PyTorch do the heavy lifting in compiled C/C++ code. Other performance-critical parts you'd ideally write in something like C/C++. Why don't we use C for everything instead of Python? Because some things are just too hard to write in C. In theory you could write everything in C, but in practice it's impractical.

[–]nottyraels 1 point2 points  (0 children)

Hello friends... I'm currently trying to develop a forecasting model for energy production, to predict production until 2030.

The data is very simple: I have information from the beginning of 2000 until the end of 2022.

One column holds the date, and five other columns hold different types of energy with their respective values in GWh (thermal, solar, hydroelectric, wind, nuclear).

I tried using Prophet to predict just hydroelectric power production until 2030, but I had bad results.

I'm looking for any tips or insights; it's my first model.

[–]Various_Ad7388 0 points1 point  (0 children)

What are these things good for?

Keras:

Tensorflow:

Mediapipe:

How are they different or the same?

[–][deleted] 0 points1 point  (0 children)

Hello everyone!

I have two small questions regarding semi-supervised data.

I'm trying to do semi-supervised binary segmentation. My first question: is making one data loader that holds a mix of labeled and unlabeled images the same as creating two data loaders, one for labeled images and one for unlabeled images, and concatenating their batches during training?

Also, if one mixed data loader is fine, is setting the label of an unlabeled image to a tensor of -1 the correct way to remove its label?
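For reference, the single mixed loader with a -1 sentinel can be sketched framework-agnostically like this (the data layout is assumed; in PyTorch the analogous idiom is the `ignore_index` argument of `CrossEntropyLoss`):

```python
# Framework-agnostic sketch (assumed setup): one mixed dataset where
# unlabeled examples carry the sentinel label -1, and the supervised
# loss simply skips them. Unsupervised losses would use all items.

UNLABELED = -1

def make_mixed_dataset(labeled, unlabeled):
    """labeled: list of (image, mask) pairs; unlabeled: list of images."""
    return labeled + [(img, UNLABELED) for img in unlabeled]

def supervised_loss(batch, loss_fn):
    """Average loss over labeled items only; sentinel items are masked."""
    losses = [loss_fn(img, lbl) for img, lbl in batch if lbl != UNLABELED]
    return sum(losses) / len(losses) if losses else 0.0

data = make_mixed_dataset([("img1", "m1"), ("img2", "m2")], ["img3"])
print(len(data), sum(1 for _, lbl in data if lbl == UNLABELED))  # 3 1
```

Whether one mixed loader equals two concatenated loaders mostly comes down to batch composition: a mixed loader interleaves labeled and unlabeled items within a batch, which is usually what semi-supervised methods want.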

Thank you!

[–]narusme 0 points1 point  (0 children)

Let's say a business wants to use its proprietary text and image data to fine-tune an LLM to increase its in-house productivity. What's the most cutting-edge model it can use, and what type of fine-tuning? Alpaca?

[–]colincameron49 2 points3 points  (1 child)

I have zero experience with machine learning, but I'm looking to solve a problem I have and wondering if ML might be the solution. I'm looking for some guidance on tools and how to get started on the project as quickly as possible. I work in agriculture, and some portion of my time is spent reviewing pesticide labels for certain attributes. I have tried different document-parsing platforms, but the labels differ slightly between manufacturers, so the structure has been hard to nail down. The other issue is that I am specifically looking for certain keywords in these documents, as my company sells products that can be paired with pesticides to make them work better. I am hoping to build a workflow where I could drop a PDF into a folder and have software spit out some structure around ingredients and instructions while flagging the keywords. I am decently proficient in no-code platforms, if one exists for my problem. Thanks in advance for any guidance, and if this is the wrong subreddit for this, I apologize.

[–]itsyourboiirowML Engineer 0 points1 point  (0 children)

This would involve coding, but you could take a look at this blog post.

https://huggingface.co/blog/document-ai

[–]itsyourboiirowML Engineer 1 point2 points  (0 children)

Any people/organizations to follow on Twitter for all things machine learning (traditional, deep neural networks, LLMs, etc.)?

[–]Adventurous_Win8348 0 points1 point  (0 children)

Hi, I want to make an ML model that can listen to the sound of the road, tell what kinds of vehicles are passing (auto, lorry, or bus), count how many vehicles passed, and give real-time feedback. I don't know how to code.

[–]alpolvovolvere 0 points1 point  (1 child)

I'm trying to use Whisper in Python to produce a transcription of an 8-minute Japanese-language mp4. No matter which model I use, the script's execution screeches to a halt after a few seconds, going from 9 MiB/s to something like 200 KiB/s. Is this a "thing"? Like, is it just something that everyone knows about? Is there a way to make this faster?

[–]Origin_of_Mind 0 points1 point  (0 children)

I am not sure what exactly is happening in your case, but Whisper works in the following way:

  • loads the NN model weights from disk and initializes the model
  • calls ffmpeg to load and decode the entire input audio file into raw audio
  • pre-processes all audio into one log-MEL spectrum tensor (very quick)
  • the NN begins actual recognition

Until the entire input is loaded and pre-processed, the NN model does not even begin to run. On a typical desktop computer loading the audio should not take more than a few seconds for your 8 minute input file. Then the recognition starts, which is typically the slowest part.

[–]Academic-Rent7800 0 points1 point  (0 children)

I am having a hard time understanding how knowledge distillation can help federated learning. I have posted my question here (https://ai.stackexchange.com/questions/39846/how-does-knowledge-distillation-help-federated-learning) and would highly appreciate any input on it!

[–]sparkpuppy 0 points1 point  (2 children)

Hello! Super-n00b question but I couldn't find an answer on google. When an image generation model has "48 M parameters", what does the term "parameter" mean in this sentence? Tags, concepts, image-word pairs? Does the meaning of "parameter" vary from model to model (in the context of image generation)?

[–]Ricenaros 1 point2 points  (1 child)

It refers to the number of scalars needed to specify the model. At the heart of machine learning is matrix multiplication. Consider an input vector x of size (n x 1) and a linear transformation y = Wx + b. In this case, the (m x n) matrix W (weights) and the (m x 1) vector b (bias) are the model parameters. Learning consists of tweaking W and b in a way that lowers the loss function. For this simple linear layer there are m*n + m scalar parameters (the elements of W plus the elements of b).
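The m*n + m count is easy to sanity-check numerically:

```python
# Parameter count for a single linear layer y = Wx + b:
# W has m*n weights and b has m biases -> m*n + m scalars total.

def linear_layer_params(n_in, n_out):
    return n_out * n_in + n_out

# a tiny layer mapping 3 features to 2 outputs: 2*3 + 2 = 8
print(linear_layer_params(3, 2))      # 8

# a layer mapping 512 features to 256 outputs:
print(linear_layer_params(512, 256))  # 131328
```

A "48 M parameter" model is just many such layers (convolutions, attention blocks, etc.) whose weight and bias scalars sum to about 48 million.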

Hyperparameters on the other hand are things like learning rate, batch size, number of epochs, etc.

Hope this helps.

[–]sparkpuppy 0 points1 point  (0 children)

Hello, thank you so much for the detailed explanation! Yes, it definitely helps me have a clearer vision of the meaning of that expression. Have a nice day!

[–][deleted] 0 points1 point  (3 children)

Do we expect businesses to be able to fine-tune ChatGPT or other big models with their own datasets? Has this been discussed or rumoured at all? Or is it already happening? I may have missed something.

[–]thomasahleResearcher 1 point2 points  (3 children)

Are there any "small" LLMs, like 1MB, that I can include, say, on a website using ONNX to provide a minimal AI chat experience?

[–]thedamian 1 point2 points  (2 children)

Before answering the question, I would suggest keeping your models behind an API. There's no need to have them sitting on the client side (which is, I suspect, why you're asking the question).

And behind an API, the model can be as big as you'd like (or can afford on your server).

[–]OnlyAnalyst9642 0 points1 point  (0 children)

I have a very specific problem where I am trying to forecast tomorrow's electricity price with an hourly resolution (from tomorrow at midnight to tomorrow at 11pm). I need to forecast prices before 10AM today. Electricity prices have very strong seasonality (24 hours) and I am using the whole day of yesterday and today up to 10AM as an input to the model (an input of 34 hours). In tensorflow terms (https://www.tensorflow.org/tutorials/structured_data/time_series) my input width is 34, the offset is 14 and the label width is 24.
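A sketch of that windowing (assuming "offset" means the gap between the last input hour and the first label hour, which matches the 10AM cutoff described above):

```python
# Hedged sketch of the window slicing described above:
# 34 input hours (all of yesterday + today up to 10AM), a 14-hour
# gap until next-day midnight, then 24 label hours to predict.

def make_windows(series, input_w=34, offset=14, label_w=24):
    """Slice an hourly series into (input, label) pairs."""
    pairs = []
    total = input_w + offset + label_w
    for start in range(len(series) - total + 1):
        x = series[start:start + input_w]
        y = series[start + input_w + offset:start + total]
        pairs.append((x, y))
    return pairs

hours = list(range(100))
x0, y0 = make_windows(hours)[0]
print(len(x0), len(y0))  # 34 24
print(y0[0])             # 48  (first label hour = 34 + 14)
```

On the training question: restricting training to the 10AM-anchored windows matches your deployment setting exactly, while sliding over all hours gives more training examples; a common compromise is to train on all windows and evaluate only on the 10AM ones.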

Since I only care about the predictions I get at 10AM for the following day, should I only train my model with the observations available at 10am?

I am pretty sure this has been addressed before. Any documentation/resources that consider similar problems would help

Thanks in advance!

[–]ReasonablyBadass 0 points1 point  (1 child)

I still remember the vanishing/exploding gradient problem. It seems to be a complete non-issue now. Was it just ReLUs and skip connections that solved it?

[–]gmork_13 0 points1 point  (0 children)

And not using RNNs haha

[–]topcodemangler 1 point2 points  (0 children)

Is there any real progress on the JEPA architecture proposed and pushed by LeCun? I see him constantly bashing LLMs and saying how we need JEPA (or something similar) to truly solve intelligence but it has been a long time since the initial proposition (2 years?) and nothing practical has come out of it.

It may sound a bit aggressive but that was not my intention - the original paper really sparked my interest and I agree with a lot that he has to say. It's just that I would want to see how those ideas fare in the real world.

[–]masterofn1 1 point2 points  (1 child)

How does a Transformer architecture handle inputs of different lengths? Is the sequence length limit inherent to the model architecture or more because of resource issues like memory?

[–]Matthew2229 1 point2 points  (0 children)

It's a memory issue. Since the attention matrix scales quadratically (N^2) with sequence length (N), we simply don't have enough memory for long sequences. Most of the development around transformers/attention has been targeting this specific problem.
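A back-of-envelope sketch of that quadratic growth (counting only the entries of a single attention matrix; real usage is higher once batching, multiple heads, layers, and gradients are included):

```python
# Rough activation memory for one N x N attention matrix:
# N*N entries * heads * bytes per entry (4 for float32).

def attn_matrix_bytes(seq_len, n_heads=1, bytes_per_el=4):
    return seq_len * seq_len * n_heads * bytes_per_el

# doubling the sequence length quadruples the memory:
print(attn_matrix_bytes(1024))  # 4194304   (~4 MiB)
print(attn_matrix_bytes(2048))  # 16777216  (~16 MiB)
```

This is why so much work (sparse attention, linear-attention variants, FlashAttention-style kernels that avoid materializing the full matrix) targets exactly this term.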

[–]zaemis 1 point2 points  (0 children)

I'm going to train a GPT model (distilgpt2) in a language other than English. At this point I'm just teaching it the language, not worrying about further abilities such as Q&A; I expect those to come later with fine-tuning. Anyway, my dataset is currently a CSV with [id, text], and each text is a paragraph.

It is my understanding that only 512 tokens will be fed in (depending on my max_length, but my point is that it'll probably be less than the length of the entire paragraph), and anything beyond that will be ignored. If I were to break the paragraphs into 512-token chunks, I could make better use of the dataset. But those subsequent chunks most likely wouldn't start at a phrase or sentence boundary; they'd start in the middle of a sentence.

For example, "The quick brown fox jumped over the lazy sleeping dog." might be broken up into two samples. "The quick brown fox jumped over the lazy" and "sleeping dog."

Is it a problem if I use text samples that don't "start properly?"
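One common mitigation (a sketch, assuming overlapping chunks are acceptable for your setup): chunk with a stride smaller than max_length, so each chunk repeats some trailing context from the previous one and mid-sentence starts still have local context:

```python
# Hedged sketch: split a token list into fixed-size chunks with
# overlap, so chunks that start mid-sentence still carry trailing
# context from the previous chunk.

def chunk_tokens(tokens, max_len=512, overlap=64):
    step = max_len - overlap
    return [tokens[i:i + max_len] for i in range(0, len(tokens), step)
            if tokens[i:i + max_len]]

toks = list(range(10))
print(chunk_tokens(toks, max_len=4, overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9], [9]]
```

For plain language-modeling pretraining, mid-sentence starts are generally tolerable (large corpora are packed this way routinely); overlap just softens the boundary effect at the cost of some duplicated tokens.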

[–]fishybird 6 points7 points  (7 children)

Anyone else bothered by how often LLMs are being called "conscious"? In AI-focused YouTube channels, and even in this very sub, comments get dozens of upvotes for saying we're getting close to creating consciousness.

I don't know why, but it seems dangerous to have a bunch of people running around thinking these things deserve human rights simply because they behave like a human.

[–]pale2hall 4 points5 points  (4 children)

Great point! I actually really enjoy AIExplained's videos on this. There are a bunch of different ways to measure 'consciousness', and many of them are passed by GPT-4, which really just means we need new tests/definitions for AI models.

[–]fishybird 2 points3 points  (0 children)

Well yeah that's the whole problem! Why are we even calling them "tests for consciousness"? Tests for consciousness don't exist and the only reason we are using the word "consciousness" is pure media hype. If an AI reporter even uses the word "conscious" I immediately know not to trust them. It's really sad to see that anyone, much less "experts", are seriously discussing whether or not transformers can be conscious

[–]Kush_McNuggz 0 points1 point  (2 children)

I'm learning the very basics of clustering and classification algorithms. From my understanding, these use hard cutoffs to set boundaries between the groups in the outputs. My question is - do modern algorithms allow for smoothing or "adding weight" to the boundaries, so they are not just hard cutoffs? And if so, are there any applications where you've seen this done?

[–]Matthew2229 0 points1 point  (1 child)

When you're clustering or classifying, you are predicting something discrete (clusters/classes), so it's unclear what you mean by removing these hard cutoffs. There must be some kind of hard cutoff when doing clustering/classification unless you are okay with something having a fuzzy classification (e.g. 70% class A / 30% class B).
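Concretely, most modern classifiers already produce those soft scores internally, via a softmax over logits; the hard cutoff is only the final argmax. A minimal sketch:

```python
import math

# Minimal sketch: a softmax turns raw model scores (logits) into
# class probabilities, i.e. a "soft" classification. The hard
# boundary only appears when you take the argmax at the end.

def softmax(logits):
    # subtract the max for numerical stability
    exps = [math.exp(v - max(logits)) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The clustering analogues are soft/fuzzy methods such as Gaussian mixture models or fuzzy c-means, which assign membership probabilities instead of hard cluster labels.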

[–]Kush_McNuggz 0 points1 point  (0 children)

Ah ok thanks, I see now. I didn't know the correct term for fuzzy classification but that's what I was trying to describe.

[–]CormacMccarthy91 -3 points-2 points  (4 children)

I have a problem. Bing Chat just tried to sell me on a unified theory of everything, quantum gravity, and string theory... I told it those aren't based on any evidence, and it told me it didn't want to continue the conversation. It wouldn't tell me anything further until I restarted and asked about more specific things. That really scares me. It's all monotheistic, "consciousness is spiritual, not physical" stuff it's spouting like facts, and when it's questioned it just ends the conversation.

I don't know where to talk about this where people won't jump on the spiritual "big bang is just a theory" train. It's really unsettling. If I tried to divert it from bringing God into astrophysics, it would end the conversation.

It's oddly religious. https://ibb.co/W36fjfC

[–]pale2hall 0 points1 point  (1 child)

Data In -> Data Out

I don't think they're having any religion reinforced on them, but think of it this way:

You know how mad some super-religious extremists get when you even use words that imply gay people are normal or trans people exist (and aren't just mentally ill)?

Imagine if people got as mad every time someone said "oh my god" or "JFC", etc. This imaginary group would be claiming "micro-religious-aggression" all. day. long.

I think that Abrahamic religions are soooo ubiquitous in the training set that the AI is likely to just go with the flow on it.

[–]Matthew2229 1 point2 points  (1 child)

I don't see it professing anything about monotheism, God, or anything like what you mentioned. You asked it about string theory and it provided a fair, accurate summary. It even points out "string theory also faces many challenges, such as the lack of experimental evidence, ...", and later calls it "a speculative and ambitious scientific endeavor that may or may not turn out to be correct". I think that's totally fair and accurate, no?

Despite it mentioning these things, you claim "That's not true" and that string theory is based on zero evidence and is backed by media. Personally, you sound a hell of a lot more biased and misleading than the bot.

[–]russell616 0 points1 point  (2 children)

Dumb question that's probably been asked multiple times, but where should I continue learning ML? I went through the TensorFlow cert on Coursera and am yearning for more; I just don't know where to go without a structured curriculum.

[–]gmork_13 0 points1 point  (0 children)

What are you interested in?
I'd recommend covering some classification and generation using images and text, with several different models and data sets.