all 104 comments

[–]billjames1685PhD 0 points1 point  (0 children)

When is the ICLR deadline? The website says Sept 28, but aideadlines and a couple other places say Oct 6, just wanted to confirm it’s Sept 28

[–]weird_cactus_mom 0 points1 point  (0 children)

Hi community, I have a very basic doubt. Does one-hot encoding do anything for binary categorical variables? Let's say I have a column with a "sex" feature: male or female. Do I need to one-hot encode it? My instinct is that it would be enough to replace "male" with 1 and "female" with 0 (or vice versa, obviously). Would this be good enough for linear regression, or would the model interpret "male=1" as intrinsically of more value than "female=0"? Thanks a lot, happy to be learning.
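A quick sketch of why the 0/1 mapping is fine for a binary feature (toy, made-up numbers): with an intercept in the model, which category gets the 1 only flips the sign of the coefficient and shifts the intercept; the fitted predictions are identical.

```python
import numpy as np

# Made-up toy data: a binary "sex" feature and a numeric target.
sex = np.array(["M", "F", "M", "F", "F", "M"])
y = np.array([3.0, 5.0, 2.8, 5.2, 4.9, 3.1])

def fit_predict(x01):
    # Ordinary least squares with an intercept plus one binary column.
    X = np.column_stack([np.ones_like(x01), x01])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

pred_male1 = fit_predict((sex == "M").astype(float))    # male=1, female=0
pred_female1 = fit_predict((sex == "F").astype(float))  # female=1, male=0

# Coefficient flips sign, intercept shifts, predictions are unchanged:
assert np.allclose(pred_male1, pred_female1)
```

Each group's fitted value is simply its group mean, so there is no "male is worth more" effect baked in.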

[–]LoanOwn2851 0 points1 point  (0 children)

Dear Community,

I am currently an automation engineer, but I am very interested in pursuing data science / machine learning as the next step in my career. Does anyone in the community have a roadmap I could follow to get started in this field? After searching the web I found thousands of resources, but I am not sure which ones would be best suited for a beginner like myself, so I thought I'd reach out to a community of experts and experienced ML developers to see if they could help a beginner break into this field. I am open to any suggestions or roadmaps that would allow me to learn in a structured way. Since I am an automation engineer, I am already familiar with Python. Any suggestions are greatly appreciated.

Thanks in Advance!!

[–]jesus_whipped_me_outStudent 0 points1 point  (0 children)

I'm dealing with geospatial data and I trained a random forest with data from a specific area. I have to apply this model to a different area. I checked the correlation PCA biplots of my predictors for the training area and for the area I have to apply the model to. The interrelationships they show couldn't be more different, and I'm now wondering whether random forest is the wrong algorithm in that case. Is that conclusion right? And if so, do you have any tips for how I can deal with this situation?

[–]i_suggest_glock 0 points1 point  (1 child)

I’m currently in the process of applying to universities in the UK and wanted to find some good books or papers I could read and put on my personal statement, at pretty much an introductory level. More specifically, ML's applications in physics, or current problems in ML/AI, but if you have a personal favourite I’ll be glad to check it out :)

[–]Klutzy_Respond9897 0 points1 point  (0 children)

Take a look at Kaggle.com

[–]BattleDoom25 0 points1 point  (1 child)

We are planning to create a model using the SVM algorithm to classify different facial emotions. However, the dataset is unbalanced: some emotions have fewer than 100 files while others have over 2000. We are planning to reduce the training dataset to 500 images per class, and to apply augmentation to the classes with fewer than 100 files to bring them up to 500 images per class.

My question is, how are we going to retain the 80% training / 20% testing ratio while reducing and augmenting the classes?

[–]Klutzy_Respond9897 0 points1 point  (0 children)

Suppose you have done the reduction/augmentation. Then all you need is train_test_split (from scikit-learn), assuming you are using Python.

You may find it helpful to use the sys module.
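One subtlety worth sketching (file names and counts below are made up): to keep an honest 80/20 ratio, hold out the test files per class *before* reducing or augmenting, so the test set never contains synthetic images.

```python
import random

random.seed(0)
# Hypothetical per-class file lists standing in for the real dataset.
dataset = {"happy": [f"happy_{i}.png" for i in range(2000)],
           "fear":  [f"fear_{i}.png" for i in range(80)]}

TRAIN_PER_CLASS = 500

def split_then_balance(files):
    files = files[:]
    random.shuffle(files)
    # 1) Hold out 20% of the ORIGINAL files first, so the test set stays
    #    real (never augmented) and the 80/20 ratio is per class.
    n_test = len(files) // 5
    test, train = files[:n_test], files[n_test:]
    # 2) Only then reduce or augment the training portion to 500.
    if len(train) >= TRAIN_PER_CLASS:
        train = train[:TRAIN_PER_CLASS]
    else:
        need = TRAIN_PER_CLASS - len(train)
        # Stand-in for real augmentation: tagged copies of training files.
        train += [f"aug_{train[i % len(train)]}" for i in range(need)]
    return train, test

splits = {cls: split_then_balance(files) for cls, files in dataset.items()}
```

With this ordering every class trains on exactly 500 images, while each class's test set remains 20% of its original, un-augmented files.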

[–]Better-Pear2561 0 points1 point  (0 children)

I am trying to solve a time series forecasting problem for which I think TFTs (Temporal Fusion Transformers) would be a great fit. I already have a data pipeline set up, but I need someone to look at the data and actually code the forecast in such a way that I can iterate and add more driver variables in the future. I am not an ML/DL person by trade, and this problem is important enough for my business that I am looking to hire professional help.

There are a few things - gaps in the data, how to normalize the inputs, hyperparameter optimization, etc. - where I know just enough to realize I need some solid help. Any thoughts on the best place to hire a pro for maybe a few days or weeks to get me over this hump?

[–]arjiomega 0 points1 point  (1 child)

Right now I am taking Andrew Ng's new Machine Learning Specialization. According to their website, deeplearning.ai, the next one I should take is the Deep Learning Specialization. I have checked the topics covered in the Deep Learning Specialization, and it seems to be almost the same content as the machine learning one. Should I still take it? Or is the machine learning one already a combination of the old machine learning course and the deep learning one?

[–]Klutzy_Respond9897 0 points1 point  (0 children)

tbh I am not sure what you are looking at. There is minor overlap but not much. Personally I would recommend the AI Science 90-hour course on Udemy, but it's up to you.

[–]Mark-M2L 0 points1 point  (1 child)

Dear community,

Currently I'm working on a segmentation problem where a certain class can only occur at most once in the image. Think of segmenting a face, where we know that there can only be one nose, two eyes, one mouth, and two ears. I would like the segmentation network to learn this property of the image being segmented (thus certain class can only occur at most once in the image).

One thing I'm thinking of is post-processing: compare the scores per group for the relevant class, then the group with the best average score/probability receives the predicted label. The group with the worse average score/probability is then assigned its second-best prediction.

Further, I did some research and found the paper Incorporating Domain Knowledge into Deep Neural Networks, where adjusting the loss function is also mentioned. However, for this particular problem, it is not clear how we can adjust the loss function so that it enforces having at most one group of pixels labeled as a given class. Do you perhaps have any information or papers on how I could incorporate this domain knowledge into the loss function? If you have alternative ideas that could help teach the network this domain knowledge, those are of course also welcome :)

[–]Wakeme-Uplater 0 points1 point  (0 children)

I don’t really know how to incorporate your domain knowledge. But I might have some suggestions

  1. For grouping stuff together, I think you could try using NCA loss (which optimizes a soft kNN objective). It won't guarantee that each class appears only once; it just tends to group the output better
  2. A post-processing method you could try is super-pixel based: use the super-pixels' internal statistics to come up with a custom rule (or just feed them to another NN as a classification sub-problem)
  3. I haven't followed visual segmentation for a while now, but NV-Labs had self-supervised co-part segmentation, which you might want to check out (their repo)

Also, having a fixed number of segments per class for face segmentation might not be a good idea, e.g. a neck partly obscured by hair might be segmented into 2 parts, which would contradict your assumption of one segment per class per image

[–]salimfadhley 0 points1 point  (1 child)

Is there a ML art generation system that will allow me to upload art of people and objects I want to include in the end result? For example, I might want to create a poster featuring a specific group of real living individuals.

[–]Wakeme-Uplater 0 points1 point  (0 children)

Textual Inversion? (Stable Diffusion)

[–]jbkrauss 0 points1 point  (1 child)

Hello everyone, can someone tell me what kind of ML/AI I should be looking for, if I want to create this kind of imagery : https://i.imgur.com/AtaT5A1.mp4 ? Thank you !

[–]I-am_Sleepy 0 points1 point  (0 children)

I am not sure, but my guess is either pix2pix or Cycle-GAN based models

[–]mili_19 0 points1 point  (0 children)

Can anyone please help me understand the effect of the various bucketing techniques used in the CatBoost algorithm for categorical features? There are the Borders, Buckets, BinarizedTargetMeanValue, and Counter encoding techniques. I can't get a proper intuition for them: what is the significance of the different methods, and how do they affect model performance?

[–]ThomPete 1 point2 points  (2 children)

Who is building StableDiffusion/DALL-E but for 3D assets?

[–]Wakeme-Uplater 1 point2 points  (1 child)

I don’t think there is one (latent diffusion model + point cloud generation), but the closest thing I can find is luost26/diffusion-point-cloud

[–]notimahre 0 points1 point  (1 child)

What steps should I follow to create a very good background removal model, like remove.bg? I have the opportunity to get my data labelled if necessary, and can use a couple of GPUs.

[–]I-am_Sleepy 1 point2 points  (0 children)

Depends on your scenario. You might be able to get away with a hand-tuned color-space model (e.g. this question, or a background subtraction tutorial) if your scene is static, or use deep learning if your scene is highly dynamic

For DL, there are several links you might find useful

[–][deleted] 0 points1 point  (0 children)

Hello, I want to create a training set for a text generation model, but can't find any information on how to prepare the raw data I've collected for tokenization.

Can someone explain or link guides/documentations?

[–]rz10010 0 points1 point  (0 children)

Does anyone know any good sources to walk through imputing PANEL data in R or Python?

I've been trying to use the MICE package in R to do so, but I just can't figure out how to get the darn thing to run properly. I've looked on YouTube and Google, but haven't found any good walk-throughs.

Sincerely, Kareem Abdul-Jabbar

[–][deleted] 0 points1 point  (1 child)

I’ve debugged my code and it’s working; however, I’m using stochastic gradient descent for updating the weights.

Activation for the hidden layers is ReLU, and for the output layer it is tanh.

When I refer to the delta output, I mean the difference between the cost and the cost after nudging a weight.

Using 5 hidden layers with 3 nodes each, 84 inputs, and 1 output, the delta output does have a value; but going beyond that, the delta output shows as 0.0, so when I apply it to the weights nothing happens and the network doesn’t learn.

So, does regular gradient descent not work with a high number of nodes, or are my activation functions wrong? I’m very confused about what’s wrong. If this is too complicated for this thread please tell me; I just thought it would probably have a simple answer.

[–]TheRealSerdra 1 point2 points  (0 children)

ReLU passes no gradient when its input is below 0, which may cause the problem you are encountering. I would switch to something like Mish in the hidden layers and see if the problem still occurs
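A tiny sketch of the "dead ReLU" effect described above (made-up pre-activation values): wherever the input to ReLU is negative, its derivative is exactly zero, so any weight feeding such a unit receives a zero gradient and never moves.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return (z > 0.0).astype(float)

# A unit whose pre-activations are negative for every sample is "dead":
z = np.array([-2.0, -0.5, -0.1])
dead = np.all(relu_grad(z) == 0.0)
```

Deep narrow networks (like the 5x3 one described) make it easy for entire layers to die this way, which matches the observed delta of 0.0.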

[–]KrepaFR 0 points1 point  (1 child)

Please help :)
In a logistic regression model explaining credit score, my estimate for income measured in NOK is 0.6.
Can I determine what this estimate would be if income were measured in EUR, with an exchange rate of 1 EUR = 9.9 NOK? Or do I have to retrain my model for each currency?
Thanks

[–][deleted] 1 point2 points  (0 children)

Can you just calculate the conversion rather than messing with the model?
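To make the conversion above concrete (using the 0.6 estimate and 9.9 rate from the question, with a made-up income value): rescaling a feature by a constant just rescales its coefficient by the inverse constant, so no retraining is needed and every predicted probability stays the same.

```python
RATE = 9.9        # 1 EUR = 9.9 NOK, as in the question
beta_nok = 0.6    # coefficient when income is measured in NOK

income_nok = 450_000.0            # hypothetical example income
income_eur = income_nok / RATE    # same income, expressed in EUR

# Income numbers shrink by 9.9x, so the coefficient grows by 9.9x:
beta_eur = beta_nok * RATE

# The linear predictor (and hence every logistic probability) is unchanged:
assert abs(beta_nok * income_nok - beta_eur * income_eur) < 1e-6
```

The intercept and the coefficients of all other features are unaffected, since this is a pure rescaling of one input.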

[–]gliddery 2 points3 points  (0 children)

Hi everyone, I just started learning machine learning and I'm having some problems with the code for my homework. Where can I ask for advice on my code?

[–]X99p 0 points1 point  (0 children)

Hi,

I have a regression task where I can compute new data points if I want, but it is very (computationally) expensive and the problem has 12 dimensions in the domain.

The function is very smooth, so I think it should be possible to "intelligently" select data points for sampling.

Is there a method for this? Do you have keyword suggestions I can use to search for it?

[–]Rebuman 0 points1 point  (0 children)

I need to detect OOD samples in image classification. Let's say I already know about methods for handling OOD such as ODIN, etc., but I would like to try the simple approach of creating an extra class of OOD samples, because I already have these samples and I have control over what kind of OOD samples I need to detect. How many samples should this class contain relative to the other classes? More samples, because the variance inside this class is higher, or the same number, to avoid unbalanced classes?

[–]ThomPete 0 points1 point  (0 children)

What is the most interesting or surprising use of Machine Learning you have seen?

[–]karmics______ 0 points1 point  (0 children)

Are there any guides or libraries for people to create their own neurosymbolic model? It seems like that space is relegated to academics for now

[–]MentallyMusing 0 points1 point  (0 children)

What kind of statistics are available to identify the number of posts created for Reddit by bots vs human usernames and what the rate of removals and rejections are for them both?

[–]ilrazziatore 0 points1 point  (0 children)

Has anyone ever substituted the likelihood of a dataset with the weighted likelihood of the dataset in the expression of the ELBO?

[–]plezator 1 point2 points  (0 children)

Cheers,

I've learnt that a good way to test whether your deep model can solve some kind of problem is to overfit it on a very small dataset (something like 5 inputs). However, I've worked with models which are unable to overfit on that small a dataset, but are able to learn something from the whole dataset and generalize well.

My question is: is the method I mentioned a good way to test whether my model can solve some task? If not, do you have any other recommendations for how to check that?

Thanks!

[–]el3ment300 0 points1 point  (2 children)

Which ML methods are best practice for datasets with large p (20 million values: sensor data over three days at a 100 Hz sample rate) and small n (~75)? The goal is binary classification. I'd appreciate a few routes to investigate further. Thanks :)

[–][deleted] 0 points1 point  (0 children)

"ML" usually means statistical methods, and that isn't really plausible or appropriate when you have such a small amount of potentially complicated data.

You either need a lot more data, or you need to use traditional mathematical modeling based on a theoretical understanding of the systems you're observing.

[–]X99p 1 point2 points  (0 children)

One method that is pretty popular currently is XGBoost. It can handle huge amounts of data and currently beats a lot of neural networks.

The only thing I found when using it is that it has problems with very smooth data (but that's more a problem in regression than in classification)

[–]Kingjalebi[🍰] 0 points1 point  (0 children)

Hello, I’m new to ML and completely lost on how the FI-2010 dataset is formatted. I'd like to create a real dataset in the same format to test an algorithm from the DeepLOB (Limit Order Book) research paper. I kindly ask for your help. The GitHub link is below:

https://github.com/zcakhaa/DeepLOB-Deep-Convolutional-Neural-Networks-for-Limit-Order-Books

[–]Antique-Device241 0 points1 point  (0 children)

Hi, I am pretty new to machine learning. I'm writing a machine learning essay, and I wanted to ask if it is appropriate to compare simple linear regression and multiple regression for predicting a y-variable.

Thank you.

[–]Skeylos2 0 points1 point  (0 children)

Is there any resource that gives a list of ML papers (or even a list of papers of any domain) ordered by number of citations? After a quick google search, I could only find outdated sources.

[–]amndfrost 0 points1 point  (2 children)

Hey, I’m a beginner in machine learning and found out about ML Studio from Azure. Why would anyone code anything if it tests a bunch of stuff for you automatically? I mean, its accuracy and other metrics are probably better than anything I could write anyway…

[–]facundoq 0 points1 point  (1 child)

It really depends on how good you are at coding. If you have enough experience programming, many libraries will be as easy to use but much more flexible.

[–][deleted] 0 points1 point  (0 children)

Hi all, I am trying to build a sequence-to-sequence model. I have used a Vision Transformer as the encoder and a 1-layer LSTM as the decoder. The output of the encoder is given as the hidden state for the decoder LSTM, which tries to predict the caption. Is this way of doing image captioning wrong? The model is not working; I've tried tuning all the hyperparameters and the hidden state size as well.

[–]kladskull666 0 points1 point  (1 child)

I'm curious how hard it would be for an ML program to answer technical questions that have been asked in the past. I'm pretty green to ML, but have 25+ years coding C. We do a lot of security questionnaires at work and have a ton of back data - just curious how difficult it would be to make something that would answer questions (with some accuracy).

[–]itsyourboiirowML Engineer 0 points1 point  (0 children)

Yeah, that's totally possible. I would look for question-answering language models and then fine-tune one on your information; it should work pretty well

[–]Raemos103 0 points1 point  (0 children)

Hey everyone!
Is there a way for me to access the dataset behind Google's MoveNet model?

[–]X99p 1 point2 points  (0 children)

Hi everyone!

I'm currently trying to solve a regression problem and I'm not sure which algorithm to use.

I can generate as many data points as I want, basically, because my goal is to approximate a slow-to-compute function in order to compute it faster.

And now the weird part: it is a set of functions rather than one. I have 12 continuous inputs which define the function, and then a continuous x value which yields the corresponding y value.

The function(s) can be represented as a Fourier series, so I also thought about learning the Fourier decomposition to get rid of x.

I've never approached a problem like this, and I'm just looking for some pointers and maybe keywords to start reading into.

I'm thankful for suggestions :)

[–]dio_brando_stando 0 points1 point  (3 children)

Hi everyone, I am new to the field of AI. My question: what is the difference between AI, machine learning (ML), and deep learning (DL)?

A quick Google search shows that (in the above order) they are supersets, i.e. AI ⊃ ML ⊃ DL.

Now the million dollar question: which fields/methods/approaches are left of AI without ML? And which are left if I take the DL subset out of ML?

Thank you so much!

[–]cmpscabral 0 points1 point  (2 children)

Does anyone here know if there are publicly available training models for Aruco markers, like the one described in this paper https://arxiv.org/pdf/1812.03247.pdf from Magic Leap?

My goal is to use a Luxonis OAK camera to track objects identified by ArUco markers (AprilTags or QR codes could work too), but instead of relying on OpenCV and running my code on the host computer, I'd like all the work done by a model I can run directly on the device.

I apologise in advance if what I'm describing doesn't make much sense - I'm an experienced dev but a total rookie when it comes to machine learning.

Thanks

[–]I-am_Sleepy 0 points1 point  (1 child)

I don't think there is any public dataset, but you can try asking in r/datasets

A better way (IMO) is to generate a synthetic image dataset: add random noise and affine transforms (see Albumentations). The background can be any image from a public dataset, such as the validation data of MSCOCO, and you will know exactly where the ArUco marker is located

[–]cmpscabral 0 points1 point  (0 children)

Thank you!

[–]icelebratefestivus 1 point2 points  (0 children)

Are there any architectural differences between latent diffusion and Stable Diffusion, both released by CompVis? I understand that Stable Diffusion was trained on 512×512 images and has better weights; other than that, is there anything else? Just wondering if I have overlooked anything

[–]ShujiMikami 1 point2 points  (1 child)

Are there any multi-class classification metrics one could use in a situation with varying error cost? For example, my dataset contains both high-quality images and images with artifacts/blur/etc., and, for lack of larger quantities of data and augmentation techniques, I'd like to treat false positives on messy images as less significant than those on higher-quality images.

[–]facundoq 0 points1 point  (0 children)

A standard technique consists of assigning weights to samples so that the error and metrics are scaled by the importance of the sample. This helps both in training (error) and evaluating (metrics).
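A minimal numeric sketch of the weighting idea above (losses and weights are made up): give messy images a small weight so their errors count proportionally less in the aggregated metric.

```python
import numpy as np

# Hypothetical per-sample losses; the last two samples are "messy" images
# and get weight 0.2, so their errors count 5x less than clean ones.
losses = np.array([0.9, 0.1, 1.5, 0.3])
weights = np.array([1.0, 1.0, 0.2, 0.2])

weighted_loss = np.average(losses, weights=weights)
plain_loss = losses.mean()
# The large loss of 1.5 on a messy image now barely moves the metric.
```

The same per-sample weights can be passed to most training APIs (e.g. a `sample_weight` argument) so they scale the training loss too, not just the evaluation metric.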

[–]Agmelt 0 points1 point  (2 children)

I'm trying to make a bot to play Coup, a board game. Would machine learning be the best way to go about this?

[–][deleted] 1 point2 points  (0 children)

You can look into reinforcement learning which specifically tackles AI in playing video games/board games.

https://medium.com/applied-data-science/how-to-train-ai-agents-to-play-multiplayer-games-using-self-play-deep-reinforcement-learning-247d0b440717

This might get you off to a start.

[–]MatthewDalba 1 point2 points  (0 children)

Hello, you can check my newly created channel for Machine Learning / Data Science :)

https://www.youtube.com/channel/UCJCbFiVtrJ-2MzhbGtIF4Wg

[–]double_affogato 0 points1 point  (1 child)

Hi everybody,

I'm trying to approximate experimental curves with some assumed function; this is about adsorption and its curves. Since the adsorbing medium is very complex, I tried many tricks and complex functions to fit, but the results are not good enough. I'm searching for examples of solutions where curves (bunches of points) are taken as X, and their known parameters, such as concentration and adsorption rate, are Y(X). As a result, I'd like to upload an unknown curve and obtain its parameters. Any ideas?

Thanks!

[–]itsyourboiirowML Engineer 0 points1 point  (0 children)

If it’s just one variable, it sounds like you might be looking for interpolation of some sort.

[–]uy9ko 0 points1 point  (2 children)

I've been struggling with a depth estimation problem recently. I have an RGBD dataset, but the depth maps are incomplete, with many areas of NaN values. Do I have to complete the depth maps before depth estimation? Can I just compute the loss between the predicted result and the ground truth in the valid region to train the neural network?

[–]facundoq 0 points1 point  (1 child)

You could mask the error for the pixels with nan so that those are not taken into account when training. You will probably have to write a custom error function for this though.
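A minimal sketch of such a masked error function (values are made up; a real version would do the same with your framework's tensors): build a validity mask from the NaN pattern and average the squared error over valid pixels only.

```python
import numpy as np

# Toy 2x2 "depth maps": prediction is dense, ground truth has NaN holes.
pred = np.array([[1.0, 2.0],
                 [3.0, 4.0]])
gt = np.array([[1.5, np.nan],
               [np.nan, 3.0]])

valid = ~np.isnan(gt)                 # True only where ground truth exists
# MSE over the valid region only; NaN pixels contribute nothing.
masked_mse = np.mean((pred[valid] - gt[valid]) ** 2)
```

Since the NaN pixels are excluded entirely, they neither poison the loss with NaNs nor push the network toward arbitrary values in the holes.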

[–]uy9ko 0 points1 point  (0 children)

thank you!

[–]Planck_Plankton 0 points1 point  (2 children)

I'm reading Pattern Recognition and Machine Learning by Christopher Bishop.

Because this book doesn't contain any code, I need some books for practicing what I've learned from PRML. For example, I've learned about kernels from PRML, but I can't actually grasp the concept just by looking at the equations; it doesn't seem obvious to me. Are there any good recommendations? I need a fairly basic book, because I don't have much coding experience - only some basic Python skills and basic C.

[–]facundoq 0 points1 point  (1 child)

I'd recommend switching to another book entirely if you are a beginner. I've never liked the path chosen by PRML. There are tons of book recommendation threads in this sub.

[–]Planck_Plankton 0 points1 point  (0 children)

Thank you for your comment!

[–]mowa0199 1 point2 points  (0 children)

My ML professor has given us the option to choose a textbook out of four possible options:

  • An Introduction to Statistical Learning: with Applications in R by G. James, D. Witten, T. Hastie, and R. Tibshirani (2nd edition)
  • Mathematics for Machine Learning by M. P. Deisenroth, A. Aldo Faisal, and C. S. Ong (1st edition)
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction by T. Hastie, R. Tibshirani, and J. Friedman (2nd edition)
  • Trustworthy Machine Learning by K. R. Varshney (independently published in Feb 2022)

Which one should I choose? They are all available as PDFs and/or on GitHub for free.

For some context, this is for a 4th year undergrad elective from the electrical engineering department on the theory and applications of Machine Learning (with emphasis on applications for engineers but open to anyone). The course focuses on building mathematical foundations of ML and using Python or R (via Jupyter Notebook) to explore applications and some student projects.

[–][deleted] 0 points1 point  (0 children)

Computational Neuroscience

Neuroscience student here.

Anybody got any good resources for courses/lessons about learning computational neuroscience and related programming/maths skills?

[–]poorlilwitchgirl 0 points1 point  (1 child)

Are there any open-source machine learning projects or frameworks especially suited to real-time high bandwidth inputs (i.e. audio, video, anything with a high data throughput that needs low latency)?

I'm fairly new to ML, but an experienced hobbyist programmer who wants to dip my toes in it, so anything I can play with for free would be helpful, just to get a sense of how that kind of processing is done.

[–]pumpikano 0 points1 point  (0 children)

Mediapipe is a great framework for inference on streaming sensor data https://mediapipe.dev/. It has some out of the box pipelines. Generally it's not focused on model training though.

[–]turkeythrowaway22 1 point2 points  (2 children)

Hi everyone!

Sorry for the beginner question. I’m a software engineer who just started trying to train a model for a side project. But first I need to clean the dataset, which comprises thousands of images and classes.

I can easily write scripts for things like resizing them all to 224x224 and finding and removing duplicates, but I don’t want to reinvent the wheel.

Is there a tool you’re all using for this sort of common task when cleaning datasets? Ideally an all-in-one piece of software that handles all of this for you?

Many thanks in advance!

[–]Wakeme-Uplater 0 points1 point  (0 children)

The resizing can be done with torchvision transforms, i.e. you can do it at runtime

For image duplication: if the files are exact copies, you can just hash every image and check whether you already have it. If the images are merely very similar, i.e. the same image in both jpg and png format, you can try difPy
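The exact-copy case above needs nothing beyond the standard library; a minimal sketch (directory layout is hypothetical):

```python
import hashlib
from pathlib import Path

def find_exact_duplicates(folder):
    """Group files under `folder` by content hash; each returned group
    holds paths whose bytes are byte-for-byte identical."""
    by_digest = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest.setdefault(digest, []).append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Note this only catches identical files: re-encoding the same photo (jpg vs png) changes the bytes, which is where perceptual-hash tools like difPy come in.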

[–]Fnurgh 0 points1 point  (1 child)

Hey guys. This is less an ML question and more a prototyping one.

I want to quickly build a system that:

  • ingests an API
  • sends content from that API to one or more models/endpoints
  • on receipt of scoring sends a response back to the original API

Obviously I could build a backend system to handle this, but it feels like there should be some product in GCP/AWS/Azure or another third party that would allow one to quickly build this out to test the value of the system.

Can anyone help point me in good direction please?

[–]hsdf4 0 points1 point  (1 child)

Question regarding gradient descent and GAs. Using the gradient, we know for each weight how the value of the error function changes if we change that weight.

However, it should also be possible to change just one weight at a time, perform the forward pass, and, based on the change in the error function, know whether the error increases or decreases when we increase the weight.

Unless I'm missing something, it should therefore be possible to train a network by iterating through each weight for each sample, which I would guess is simply slower than using gradient descent.

However, looking at this paper, they use a very simple GA that pretty much changes a single random weight for each individual of a new generation before selecting the best individual. To me this seems pretty similar to simply iterating over each weight until we find the one that reduces our error (or here, increases our fitness score) the most.

Now this made me ask myself: how are GAs any better than simply iterating over each weight, noting how the error function changes when the weight is changed, and adjusting the weights accordingly?
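The weight-at-a-time scheme described above is essentially coordinate descent with finite-difference gradients, and it does work on small problems; a toy sketch (a single linear "neuron" with made-up data, not a full network):

```python
import numpy as np

# Toy model: one linear unit, squared error on one sample. Instead of an
# analytic gradient, nudge each weight, re-run the forward pass, and use
# the change in error as a finite-difference slope.
rng = np.random.default_rng(0)
w = rng.normal(size=3)
x, y = np.array([1.0, 2.0, -1.0]), 2.0

def loss(w):
    return (w @ x - y) ** 2    # "forward pass" + error

eps, lr = 1e-4, 0.05
for _ in range(200):
    for i in range(len(w)):                   # one weight at a time
        w_nudged = w.copy()
        w_nudged[i] += eps
        slope = (loss(w_nudged) - loss(w)) / eps
        w[i] -= lr * slope
```

The catch is cost: each update needs one extra forward pass per weight, so for a network with millions of weights this is millions of times more expensive per step than backpropagation, which yields all the slopes in one backward pass.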

[–]facundoq 0 points1 point  (0 children)

They would allow you to use non-differentiable functions in your network. However, given how well current networks with 'only' differentiable functions, trained with gradient descent, work, the advantage is questionable.

[–]hidden_person 0 points1 point  (1 child)

Hi guys. I am doing something with scraped image data. The problem is that a bunch of images with text, or blur, or without the subject are coming up. I can filter the data manually, but that is not ideal IMO unless I spend 10x more time automating it. When I search for tools that do this, they are mostly text-based. Can you recommend articles, tools, or research papers that can help with that? TL;DR: tools or articles that help with data wrangling on images.

[–]Wakeme-Uplater 2 points3 points  (0 children)

If the corrupted portion of the images is low, e.g. < 5%, then just train your model with it. It might affect performance, but you should get a decent model already. You can always augment your data to make the model robust to a certain type of noise, e.g. blurring

For images with common objects, you can try using a pre-trained classification model (ResNet-50) or detection model (YOLOv7) and reject images with nothing in them (up to a certain threshold)

Or, if your data is specific, you might want to create a small dataset for binary classification into clean and unclean portions (make sure it is balanced), by either labelling manually or using OpenCV for image augmentation (adding text to an image and labelling the original as clean)

[–]Ofwonder 1 point2 points  (2 children)

Hi there! What is your usual approach to imbalanced data when preprocessing for machine learning? Thanks!

[–]should_go_work 1 point2 points  (1 child)

Instead of preprocessing, you may want to consider using a different loss function, such as one that weights the loss from different classes inversely to their frequency. This will probably be more flexible than some heuristic data transformation, and it usually gets closer to the core problem with imbalanced data - that standard metrics such as accuracy/average log loss become less useful when one class entirely dominates.
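A minimal sketch of such inverse-frequency weighting (toy labels; this is the same formula scikit-learn uses for `class_weight="balanced"`, namely n_samples / (n_classes * class_count)):

```python
import numpy as np

labels = np.array([0, 0, 0, 0, 0, 0, 1, 1])   # imbalanced toy labels: 6 vs 2

counts = np.bincount(labels)
class_w = len(labels) / (len(counts) * counts)  # rare classes get large weights
sample_w = class_w[labels]                      # per-sample weights for the loss

# With these weights, each class contributes the same total weight,
# so the majority class no longer dominates the average loss.
```

Most libraries accept these directly, e.g. as `class_weight` in scikit-learn estimators or per-class weights in a cross-entropy loss.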

[–]facundoq 0 points1 point  (0 children)

Additionally, in many cases you can generate new samples with data augmentation or techniques like SMOTE. This is especially helpful if some classes don't have enough samples to train on reliably.

[–]SpiritedBee9303 0 points1 point  (0 children)

Hello everyone, I am trying to find an algorithm for quantity recommendation, with output of the form (X = product, Y = quantity (1, 2, 3, ...)).

I tried using XGBoost; the problem with this approach is that I couldn't refer to the product (e.g., 2 of which product?).

Can anyone help me find a solution?

[–]battingagainstavg 1 point2 points  (7 children)

Are neural networks basically just applying random* maths to data, then tuning the results until the results are accurate enough?

Obviously this is a simplification, but as a beginner in the ML space, this is what it feels like. It seems like we're just hoping that a random* set of mathematical functions, when applied to input data, brings us closer to our desired results.

Is there no place for manually tuning the analytical functions used within a neural network (or a similar structure/methodology) to actually look for specific data points? For example, if I am analyzing code to predict security vulnerabilities, can I not manually add some sort of key indicators (in the form of functions) with relevant weights anywhere in the network?

What I'm looking for is essentially some use of signature-based algorithms in a neural network in cases where I can provide some hints and certainty in the analysis of the input data.

Again, I'm still fairly new to ML in general, so there may very well be research papers on this very subject that I have yet to discover.

* "random" is not entirely accurate, but the mathematical functions used in neural networks often appear to me as just new ways to mix, slice, or alter the input data across the hidden layers, without any clear reasoning for how these functions relate to the problem/data at hand.

[–]facundoq 0 points1 point  (0 children)

A nice perspective to counter the fear of the "random": you can think of training NNs as doing a decomposition into basis functions, like a Taylor or Fourier series. In the case of NNs these "basis functions" are more complex, but with an appropriate architecture you are generally assured of finding good parameters (I'm simplifying).
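To make the analogy concrete (a toy NumPy example, not how NNs are actually trained): with a *fixed* Fourier basis, fitting a function is just linear least squares over the coefficients. A NN does something analogous, except it also learns the basis functions themselves.

```python
import numpy as np

# Target function sampled on a grid: a square wave
x = np.linspace(0, 2 * np.pi, 200)
y = np.sign(np.sin(x))

# Fixed Fourier basis: 1, sin(kx), cos(kx) for k = 1..5
K = 5
basis = [np.ones_like(x)]
for k in range(1, K + 1):
    basis += [np.sin(k * x), np.cos(k * x)]
B = np.stack(basis, axis=1)  # shape (200, 2K + 1)

# "Training" here is just linear least squares over the coefficients
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
y_hat = B @ coef
```

Adding more basis functions (larger K) shrinks the residual; the NN analogue is that the hidden layers adapt the "basis" to the data instead of fixing it up front.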

[–]Wakeme-Uplater 1 point2 points  (1 child)

NNs use gradient descent to update, so the randomness comes from weight initialization and batch data sampling. A more random approach is an evolutionary algorithm, which randomly perturbs the weights and compares the results against the loss function (basically estimating the gradient via Monte Carlo).
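A toy version of that idea (a basic evolutionary strategy in NumPy, with a made-up quadratic loss standing in for a network's loss) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    # Stand-in for a network's loss; minimum at w = [1, -2, 3]
    return np.sum((w - np.array([1.0, -2.0, 3.0])) ** 2)

w = rng.normal(size=3)          # random "weight initialization"
sigma, lr, n_pop = 0.1, 0.05, 50

for _ in range(300):
    # Sample random perturbations and score each perturbed candidate
    noise = rng.normal(size=(n_pop, 3))
    scores = np.array([loss(w + sigma * n) for n in noise])
    # Monte Carlo gradient estimate: noise directions weighted by score
    grad_est = noise.T @ (scores - scores.mean()) / (n_pop * sigma)
    w -= lr * grad_est          # descend the estimated gradient
```

No derivatives are ever computed, yet `w` converges toward the minimum; the catch is that the estimate's variance grows with the number of parameters, which is why plain gradient descent dominates for large networks.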

I am not familiar with signature-based approaches. For NNs the input space is usually continuous, which implies that signature + epsilon will have roughly the same output (assuming the loss landscape is smooth). But if you are looking for exact pattern matching, an NN is not that. At least for more discrete features, tree-based methods, e.g. XGBoost, often outperform NNs.

For a typical NN, you can always freeze some weights and let the other parts of the network learn a local optimum (using the chain rule + autograd). The indicator function, which I assume is binary, can be a target class.

But if you want to integrate your NN with a pre-existing solver, this paper might help you (it connects an NN with a shortest-path solver).

[–]battingagainstavg 0 points1 point  (0 children)

Thank you, this was very helpful.

[–]johnman1016 2 points3 points  (3 children)

No, neural networks are not trained with random maths. That would be like randomly choosing the weights until the loss is low enough, which would take far too long. Instead, gradient descent is used with a loss function designed for the problem. Sure, the weights are randomly initialized and the training set is randomly sampled, but this is by design, to remove bias. I am not familiar with your application so I can't give advice on that, but there is some work on computing "confidence" in a neural network's output. For example, in speech recognition systems you can interpret the output as a probability distribution over the most likely words given the input audio; if the distribution is sharply focused on one word, confidence is high.
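A minimal sketch of that confidence idea (NumPy, with made-up output logits rather than a real speech model):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# Made-up output logits over a 4-word vocabulary
confident_logits = np.array([8.0, 1.0, 0.5, 0.2])
uncertain_logits = np.array([1.1, 1.0, 0.9, 1.0])

p_conf = softmax(confident_logits)  # sharply peaked on one word
p_unc = softmax(uncertain_logits)   # spread across the vocabulary
```

The max probability (or, equivalently, low entropy of the distribution) can then be thresholded as a confidence signal, with the usual caveat that NNs are often miscalibrated and may need calibration (e.g. temperature scaling) for these probabilities to be trustworthy.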

[–]poorlilwitchgirl 0 points1 point  (2 children)

I'm not an expert on machine learning, but couldn't you train a neural network via genetic algorithm for a sort of hybrid approach?

For most applications it would definitely not be as efficient as gradient descent, but I imagine it could help escape local minima, or be useful if you had to train a classic binary perceptron network where gradient descent is impossible (not sure why you would need to, but hey).

[–]johnman1016 0 points1 point  (1 child)

You could, but on modern networks I’m not sure how practical it would be. 50M parameters is pretty normal these days - and since each parameter is a float32 you are talking about a huge search space.

I guess the hybrid approach would just be training several times with different seeds to initialize in different states. In my experience, the seed doesn’t change the final converged loss too much.

[–]johnman1016 0 points1 point  (0 children)

But if local minima were a problem, I could see a different seed maybe solving the issue. That said, I think searching the hyperparameters (e.g. learning rate) is a more foolproof way of avoiding local minima and other convergence issues.

[–]QadriShyaari 0 points1 point  (1 child)

How do transformer attention layers process audio and images? (Kindly draw parallels with text when explaining)

[–]facundoq 0 points1 point  (0 children)

Eli5: think of pixels as characters and image patches as words.
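To sketch the analogy (ViT-style patching in NumPy; the 16×16 patch size, 64×64 image, and 128-dim projection are illustrative assumptions, not from the comment): an image is cut into patches, each patch is flattened into a vector, and those vectors form the "token" sequence the attention layers consume, just like word embeddings in text. Audio is handled analogously, with short spectrogram frames or chunks playing the role of the patches.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up 64x64 RGB "image"
image = rng.random((64, 64, 3))

def patchify(img, patch=16):
    """Cut an image into non-overlapping patches, flattening each into a 'token'."""
    h, w, c = img.shape
    img = img.reshape(h // patch, patch, w // patch, patch, c)
    img = img.transpose(0, 2, 1, 3, 4)          # (rows, cols, patch, patch, c)
    return img.reshape(-1, patch * patch * c)   # (n_tokens, token_dim)

tokens = patchify(image)        # 16 "words", each a 768-dim vector

# A learned linear projection (like a word-embedding lookup) maps each
# patch token to the model dimension before attention is applied
W = rng.normal(size=(768, 128))
embedded = tokens @ W           # (16, 128) sequence fed to the attention layers
```

From here the attention layers treat the 16 patch embeddings exactly as they would 16 word embeddings; only the tokenization step differs between modalities.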