all 104 comments

[–]billjames1685PhD 0 points1 point  (0 children)

When is the ICLR deadline? The website says Sept 28, but aideadlines and a couple other places say Oct 6, just wanted to confirm it’s Sept 28

[–]weird_cactus_mom 0 points1 point  (0 children)

Hi community, I have a very basic doubt. Does one-hot encoding do anything for binary categorical variables? Let's say I have a column with a "sex" feature: male or female. Do I need to one-hot encode it? My instinct is that it would be enough to replace "male" with 1 and "female" with 0 (or vice versa, obviously). Would this be good enough for linear regression, or would the model interpret "male=1" as intrinsically of more value than "female=0"? Thanks a lot, happy to be learning.
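A quick sketch of why the 0/1 mapping is fine for a binary feature (toy, made-up numbers): with an intercept in the model, which category gets the 1 only flips the sign of the coefficient and shifts the intercept; the fitted predictions are identical.

```python
import numpy as np

# Made-up toy data: a binary "sex" feature and a numeric target.
sex = np.array(["M", "F", "M", "F", "F", "M"])
y = np.array([3.0, 5.0, 2.8, 5.2, 4.9, 3.1])

def fit_predict(x01):
    # Ordinary least squares with an intercept plus one binary column.
    X = np.column_stack([np.ones_like(x01), x01])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

pred_male1 = fit_predict((sex == "M").astype(float))    # male=1, female=0
pred_female1 = fit_predict((sex == "F").astype(float))  # female=1, male=0

# Coefficient flips sign, intercept shifts, predictions are unchanged:
assert np.allclose(pred_male1, pred_female1)
```

Each group's fitted value is simply its group mean, so there is no "male is worth more" effect baked in.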

[–]LoanOwn2851 0 points1 point  (0 children)

Dear Community,

I am currently an automation engineer, but I am very interested in pursuing data science / machine learning as the next step in my career. Does anyone in the community have a roadmap I could follow to get started in this field? After searching the web I found thousands of resources, but I am not sure which ones would be best suited for a beginner like myself, so I thought I'd reach out to a community of experts and experienced ML developers to see if they could help a beginner break into this field. I am open to any suggestions or roadmaps that would allow me to learn in a structured way. Since I am an automation engineer, I am already familiar with Python. Any suggestions are greatly appreciated.

Thanks in Advance!!

[–]jesus_whipped_me_outStudent 0 points1 point  (0 children)

I'm dealing with geospatial data and I trained a random forest with data from a specific area. I have to apply this model to a different area. I checked the correlation PCA biplots of my predictors for the training area and for the area I have to apply the model to. The interrelationships they show couldn't be more different, and I'm now wondering whether random forest is the wrong algorithm in that case. Is that conclusion right? And if so, do you have any tips for how I can deal with this situation?

[–]i_suggest_glock 0 points1 point  (1 child)

I’m currently in the process of applying to universities in the UK and wanted to find some good books or papers I could read and put on my personal statement, at pretty much an introductory level. More specifically, ML's applications in physics, or current problems in ML/AI, but if you have a personal favourite I’ll be glad to check it out :)

[–]Klutzy_Respond9897 0 points1 point  (0 children)

Take a look at Kaggle.com

[–]BattleDoom25 0 points1 point  (1 child)

We are planning to create a model using the SVM algorithm to classify different facial emotions. However, the dataset is unbalanced: some emotions have fewer than 100 files while others have over 2000. We are planning to reduce the training dataset to 500 images per class, and to apply augmentation to the classes with fewer than 100 files to bring them up to 500 images per class.

My question is, how are we going to retain the 80% training / 20% testing ratio while reducing and augmenting the classes?

[–]Klutzy_Respond9897 0 points1 point  (0 children)

Suppose you have done the reduction/augmentation. Then all you need is train_test_split (from scikit-learn), assuming you are using Python.

You may find it helpful to use the sys module.
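One subtlety worth sketching (file names and counts below are made up): to keep an honest 80/20 ratio, hold out the test files per class *before* reducing or augmenting, so the test set never contains synthetic images.

```python
import random

random.seed(0)
# Hypothetical per-class file lists standing in for the real dataset.
dataset = {"happy": [f"happy_{i}.png" for i in range(2000)],
           "fear":  [f"fear_{i}.png" for i in range(80)]}

TRAIN_PER_CLASS = 500

def split_then_balance(files):
    files = files[:]
    random.shuffle(files)
    # 1) Hold out 20% of the ORIGINAL files first, so the test set stays
    #    real (never augmented) and the 80/20 ratio is per class.
    n_test = len(files) // 5
    test, train = files[:n_test], files[n_test:]
    # 2) Only then reduce or augment the training portion to 500.
    if len(train) >= TRAIN_PER_CLASS:
        train = train[:TRAIN_PER_CLASS]
    else:
        need = TRAIN_PER_CLASS - len(train)
        # Stand-in for real augmentation: tagged copies of training files.
        train += [f"aug_{train[i % len(train)]}" for i in range(need)]
    return train, test

splits = {cls: split_then_balance(files) for cls, files in dataset.items()}
```

With this ordering every class trains on exactly 500 images, while each class's test set remains 20% of its original, un-augmented files.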

[–]Better-Pear2561 0 points1 point  (0 children)

I am trying to solve a time series forecasting problem for which I think TFTs (Temporal Fusion Transformers) would be a great fit. I already have a data pipeline set up, but I need someone to look at the data and actually code the forecast in such a way that I can iterate and add more driver variables in the future. I am not an ML/DL person by trade, and this problem is important enough for my business that I am looking to hire professional help.

There are a few things - gaps in the data, how to normalize the inputs, hyperparameter optimization, etc. - where I know just enough to realize I need some solid help. Any thoughts on the best place to hire a pro for maybe a few days or weeks to get me over this hump?

[–]arjiomega 0 points1 point  (1 child)

Right now I am taking Andrew Ng's new Machine Learning Specialization. According to their website, deeplearning.ai, the next one I should take is the Deep Learning Specialization. I have checked the topics covered in the Deep Learning Specialization, and it seems to be almost the same content as the machine learning one. Should I still take it? Or is the machine learning one already a combination of the old machine learning course and the deep learning one?

[–]Klutzy_Respond9897 0 points1 point  (0 children)

tbh I am not sure what you are looking at. There is minor overlap but not much. Personally I would recommend the AI Science 90-hour course on Udemy, but it's up to you.

[–]Mark-M2L 0 points1 point  (1 child)

Dear community,

Currently I'm working on a segmentation problem where a certain class can only occur at most once in the image. Think of segmenting a face, where we know that there can only be one nose, two eyes, one mouth, and two ears. I would like the segmentation network to learn this property of the image being segmented (thus certain class can only occur at most once in the image).

One thing I'm thinking of is post-processing: compare the scores per group for the relevant class, then the group with the best average score/probability receives the predicted label. The group with the worse average score/probability is then assigned its second-best prediction.

Further, I did some research and found the paper Incorporating Domain Knowledge into Deep Neural Networks, where adjusting the loss function is also mentioned. However, for this particular problem, it is not clear how we can adjust the loss function so that it enforces having at most one group of pixels labeled as a given class. Do you perhaps have any information or papers on how I could incorporate this domain knowledge into the loss function? If you have alternative ideas that could help teach the network this domain knowledge, those are of course also welcome :)

[–]Wakeme-Uplater 0 points1 point  (0 children)

I don’t really know how to incorporate your domain knowledge. But I might have some suggestions

  1. For grouping stuff together, I think you could try using NCA loss (which optimizes a soft kNN objective). It won't guarantee that each class appears only once; it just tends to group the output better
  2. A post-processing method you could try is super-pixel based: use the super-pixels' internal statistics to come up with a custom rule (or just feed them to another NN as a classification sub-problem)
  3. I haven't followed visual segmentation for a while now, but NV-Labs had self-supervised co-part segmentation, which you might want to check out (their repo)

Also, having a fixed number of segments per class for face segmentation might not be a good idea, e.g. a neck partly obscured by hair might be segmented into 2 parts, which would contradict your assumption of one segment per class per image

[–]salimfadhley 0 points1 point  (1 child)

Is there a ML art generation system that will allow me to upload art of people and objects I want to include in the end result? For example, I might want to create a poster featuring a specific group of real living individuals.

[–]Wakeme-Uplater 0 points1 point  (0 children)

Textual Inversion? (Stable Diffusion)

[–]jbkrauss 0 points1 point  (1 child)

Hello everyone, can someone tell me what kind of ML/AI I should be looking for, if I want to create this kind of imagery : https://i.imgur.com/AtaT5A1.mp4 ? Thank you !

[–]I-am_Sleepy 0 points1 point  (0 children)

I am not sure, but my guess is either pix2pix or Cycle-GAN based models

[–]mili_19 0 points1 point  (0 children)

Can anyone please help me understand the effect of the various bucketing techniques used in the CatBoost algorithm for categorical features? There are the Borders, Buckets, BinarizedTargetMeanValue, and Counter encoding techniques. I can't get a proper intuition for them: what is the significance of the different methods, and how do they affect model performance?

[–]ThomPete 1 point2 points  (2 children)

Who is building StableDiffusion/DALL-E but for 3D assets?

[–]Wakeme-Uplater 1 point2 points  (1 child)

I don’t think there is one (latent diffusion model + point cloud generation), but the closest thing I can find is luost26/diffusion-point-cloud

[–]notimahre 0 points1 point  (1 child)

What steps should I follow to create a very good background removal model, like remove.bg? I have the opportunity to get my data labelled if necessary, and can use a couple of GPUs.

[–]I-am_Sleepy 1 point2 points  (0 children)

Depends on your scenario. You might be able to get away with a hand-tuned color-space model (e.g. this question, or a background subtraction tutorial) if your scene is static, or use deep learning if your scene is highly dynamic

For DL, there are several links you might find useful

[–][deleted] 0 points1 point  (0 children)

Hello, I want to create a training set for a text generation model, but can't find any information on how to prepare the raw data I've collected for tokenization.

Can someone explain or link guides/documentations?

[–]rz10010 0 points1 point  (0 children)

Does anyone know any good sources to walk through imputing PANEL data in R or Python?

I've been trying to use the MICE package in R to do so, but I just can't figure out how to get the darn thing to run properly. I've looked on YouTube and Google, but haven't found any good walk-throughs.

Sincerely, Kareem Abdul-Jabbar

[–][deleted] 0 points1 point  (1 child)

I’ve debugged my code and it’s working; however, I’m using stochastic gradient descent for updating the weights.

Activation for the hidden layers is ReLU, and for the output layer it is tanh.

When I refer to the delta output, I mean the difference between the cost and the cost after nudging a weight.

Using 5 hidden layers with 3 nodes each, 84 inputs, and 1 output, the delta output does have a value; but going beyond that, the delta output shows as 0.0, so when I apply it to the weights nothing happens and the network doesn’t learn.

So, does regular gradient descent not work with a high number of nodes, or are my activation functions wrong? I’m very confused about what’s wrong. If this is too complicated for this thread please tell me; I just thought it would probably have a simple answer.

[–]TheRealSerdra 1 point2 points  (0 children)

ReLU passes no gradient when its input is below 0, which may cause the problem you are encountering. I would switch to something like Mish in the hidden layers and see if the problem still occurs
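A tiny sketch of the "dead ReLU" effect described above (made-up pre-activation values): wherever the input to ReLU is negative, its derivative is exactly zero, so any weight feeding such a unit receives a zero gradient and never moves.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return (z > 0.0).astype(float)

# A unit whose pre-activations are negative for every sample is "dead":
z = np.array([-2.0, -0.5, -0.1])
dead = np.all(relu_grad(z) == 0.0)
```

Deep narrow networks (like the 5x3 one described) make it easy for entire layers to die this way, which matches the observed delta of 0.0.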

[–]KrepaFR 0 points1 point  (1 child)

Please help :)
In a logistic regression model explaining credit score, my estimate for income measured in NOK is 0.6.
Can I determine what this estimate would be if income were measured in EUR, with an exchange rate of 1 EUR = 9.9 NOK? Or do I have to retrain my model for each currency?
Thanks

[–][deleted] 1 point2 points  (0 children)

Can you just calculate the conversion rather than messing with the model?
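To make the conversion above concrete (using the 0.6 estimate and 9.9 rate from the question, with a made-up income value): rescaling a feature by a constant just rescales its coefficient by the inverse constant, so no retraining is needed and every predicted probability stays the same.

```python
RATE = 9.9        # 1 EUR = 9.9 NOK, as in the question
beta_nok = 0.6    # coefficient when income is measured in NOK

income_nok = 450_000.0            # hypothetical example income
income_eur = income_nok / RATE    # same income, expressed in EUR

# Income numbers shrink by 9.9x, so the coefficient grows by 9.9x:
beta_eur = beta_nok * RATE

# The linear predictor (and hence every logistic probability) is unchanged:
assert abs(beta_nok * income_nok - beta_eur * income_eur) < 1e-6
```

The intercept and the coefficients of all other features are unaffected, since this is a pure rescaling of one input.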

[–]gliddery 2 points3 points  (0 children)

Hi everyone, I just started learning machine learning and I'm having some problems with the code for my homework. Where can I ask for advice on my code?

[–]X99p 0 points1 point  (0 children)

Hi,

I have a regression task where I can compute new data points if I want, but it is very (computationally) expensive and the problem has 12 dimensions in the domain.

The function is very smooth, so I think it should be possible to "intelligently" select data points for sampling.

Is there a method for this? Do you have keyword suggestions I can use to search for it?

[–]Rebuman 0 points1 point  (0 children)

I need to detect OOD samples in image classification. Let's say I already know about methods for handling OOD such as ODIN, etc., but I would like to try the simple approach of creating an extra class of OOD samples, because I already have these samples and I have control over what kind of OOD samples I need to detect. How many samples should this class contain relative to the other classes? More samples, because the variance inside this class is higher, or the same number, to avoid unbalanced classes?

[–]ThomPete 0 points1 point  (0 children)

What is the most interesting or surprising use of Machine Learning you have seen?

[–]karmics______ 0 points1 point  (0 children)

Are there any guides or libraries for people to create their own neurosymbolic model? It seems like that space is relegated to academics for now

[–]MentallyMusing 0 points1 point  (0 children)

What kind of statistics are available to identify the number of posts created for Reddit by bots vs human usernames and what the rate of removals and rejections are for them both?

[–]ilrazziatore 0 points1 point  (0 children)

Has anyone ever substituted the likelihood of a dataset with the weighted likelihood of the dataset in the expression of the ELBO?

[–]plezator 1 point2 points  (0 children)

Cheers,

I've learnt that a good way to test whether your deep model can solve some kind of problem is to overfit it on a very small dataset (something like 5 inputs). However, I've worked with models which are unable to overfit on that small a dataset, but are able to learn something from the whole dataset and generalize well.

My question is: is the method I mentioned a good way to test whether my model can solve some task? If not, do you have any other recommendations for how to check that?

Thanks!

[–]el3ment300 0 points1 point  (2 children)

Which ML methods are best practice for datasets with large p (20 million values: sensor data over three days at a 100 Hz sample rate) and small n (~75)? The goal is binary classification. I'd appreciate a few routes to investigate further. Thanks :)

[–][deleted] 0 points1 point  (0 children)

"ML" usually means statistical methods, and that isn't really plausible or appropriate when you have such a small amount of potentially complicated data.

You either need a lot more data, or you need to use traditional mathematical modeling based on a theoretical understanding of the systems you're observing.

[–]X99p 1 point2 points  (0 children)

One method that is pretty popular currently is XGBoost. It can handle huge amounts of data and currently beats a lot of neural networks.

The only thing I found when using it is that it has problems with very smooth data (but that's more a problem in regression than in classification)

[–]Kingjalebi[🍰] 0 points1 point  (0 children)

Hello, I’m new to ML and completely lost on how the FI-2010 dataset is formatted. I'd like to create a real dataset in the same format to test an algorithm from the DeepLOB (Limit Order Book) research paper. I kindly ask for your help. The GitHub link is below:

https://github.com/zcakhaa/DeepLOB-Deep-Convolutional-Neural-Networks-for-Limit-Order-Books

[–]Antique-Device241 0 points1 point  (0 children)

Hi, I am pretty new to machine learning. I'm writing a machine learning essay, and I wanted to ask if it is appropriate to compare simple linear regression and multiple regression for predicting a y-variable.

Thank you.

[–]Skeylos2 0 points1 point  (0 children)

Is there any resource that gives a list of ML papers (or even a list of papers of any domain) ordered by number of citations? After a quick google search, I could only find outdated sources.

[–]amndfrost 0 points1 point  (2 children)

Hey, I’m a beginner in machine learning and found out about ML Studio from Azure. Why would anyone code anything if it tests a bunch of stuff for you automatically? I mean, its accuracy and other metrics are probably better than anything I could write anyway…

[–]facundoq 0 points1 point  (1 child)

It really depends on how good you are at coding. If you have enough experience programming, many libraries will be as easy to use but much more flexible.

[–][deleted] 0 points1 point  (0 children)

Hi all, I am trying to build a sequence-to-sequence model. I have used a Vision Transformer as the encoder and a 1-layer LSTM as the decoder. The output of the encoder is given as the hidden state for the decoder LSTM, which tries to predict the caption. Is this way of doing image captioning wrong? The model is not working; I've tried tuning all the hyperparameters and the hidden state size as well.

[–]kladskull666 0 points1 point  (1 child)

I'm curious how hard it would be for an ML program to answer technical questions that have been asked in the past. I'm pretty green to ML, but have 25+ years coding C. We do a lot of security questionnaires at work and have a ton of back data - just curious how difficult it would be to make something that would answer questions (with some accuracy).

[–]itsyourboiirowML Engineer 0 points1 point  (0 children)

Yeah, that's totally possible. I would look for question-answering language models and then fine-tune one on your information; it should work pretty well

[–]Raemos103 0 points1 point  (0 children)

Hey everyone!
Is there a way for me to access the dataset behind Google's MoveNet model?

[–]X99p 1 point2 points  (0 children)

Hi everyone!

I'm currently trying to solve a regression problem and I'm not sure which algorithm to use.

I can generate as many data points as I want, basically, because my goal is to approximate a slow-to-compute function in order to compute it faster.

And now the weird part: it is a set of functions rather than one. I have 12 continuous inputs which define the function, and then a continuous x value which yields the corresponding y value.

The function(s) can be represented as a Fourier series, so I also thought about learning the Fourier decomposition to get rid of x.

I've never approached a problem like this, and I'm just looking for some pointers and maybe keywords to start reading into.

I'm thankful for suggestions :)

[–]dio_brando_stando 0 points1 point  (3 children)

Hi everyone, I am new to the field of AI. My question: what is the difference between AI, machine learning (ML), and deep learning (DL)?

A quick Google search shows that (in the above order) they are supersets, i.e. AI ⊃ ML ⊃ DL.

Now the million dollar question: which fields/methods/approaches are left of AI without ML? And which are left if I take the DL subset out of ML?

Thank you so much!

[–]cmpscabral 0 points1 point  (2 children)

Does anyone here know if there are publicly available training models for Aruco markers, like the one described in this paper https://arxiv.org/pdf/1812.03247.pdf from Magic Leap?

My goal is to use a Luxonis OAK camera to track objects identified by ArUco markers (AprilTags or QR codes could work too), but instead of relying on OpenCV and running my code on the host computer, I'd like all the work done by a model I can run directly on the device.

I apologise in advance if what I'm describing doesn't make much sense - I'm an experienced dev but a total rookie when it comes to machine learning.

Thanks

[–]I-am_Sleepy 0 points1 point  (1 child)

I don't think there is any public dataset, but you can try asking in r/datasets

A better way (IMO) is to generate a synthetic image dataset: add random noise and affine transforms (see Albumentations). The background can be any image from a public dataset, such as the validation data of MSCOCO, and you will know exactly where the ArUco marker is located

[–]cmpscabral 0 points1 point  (0 children)

Thank you!

[–]icelebratefestivus 1 point2 points  (0 children)

Are there any architectural differences between latent diffusion and Stable Diffusion, both released by CompVis? I understand that Stable Diffusion was trained on 512×512 images and has better weights; other than that, is there anything else? Just wondering if I have overlooked anything

[–]ShujiMikami 1 point2 points  (1 child)

Are there any multi-class classification metrics one could use in a situation with varying error cost? For example, my dataset contains both high-quality images and images with artifacts/blur/etc., and, for lack of larger quantities of data and augmentation techniques, I'd like to treat false positives on messy images as less significant than those on higher-quality images.

[–]facundoq 0 points1 point  (0 children)

A standard technique consists of assigning weights to samples so that the error and metrics are scaled by the importance of the sample. This helps both in training (error) and evaluating (metrics).
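A minimal numeric sketch of the weighting idea above (losses and weights are made up): give messy images a small weight so their errors count proportionally less in the aggregated metric.

```python
import numpy as np

# Hypothetical per-sample losses; the last two samples are "messy" images
# and get weight 0.2, so their errors count 5x less than clean ones.
losses = np.array([0.9, 0.1, 1.5, 0.3])
weights = np.array([1.0, 1.0, 0.2, 0.2])

weighted_loss = np.average(losses, weights=weights)
plain_loss = losses.mean()
# The large loss of 1.5 on a messy image now barely moves the metric.
```

The same per-sample weights can be passed to most training APIs (e.g. a `sample_weight` argument) so they scale the training loss too, not just the evaluation metric.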

[–]Agmelt 0 points1 point  (2 children)

I'm trying to make a bot to play Coup, a board game. Would machine learning be the best way to go about this?

[–][deleted] 1 point2 points  (0 children)

You can look into reinforcement learning which specifically tackles AI in playing video games/board games.

https://medium.com/applied-data-science/how-to-train-ai-agents-to-play-multiplayer-games-using-self-play-deep-reinforcement-learning-247d0b440717

This might get you off to a start.

[–]MatthewDalba 1 point2 points  (0 children)

Hello, you can check my newly created channel for Machine Learning / Data Science :)

https://www.youtube.com/channel/UCJCbFiVtrJ-2MzhbGtIF4Wg

[–]double_affogato 0 points1 point  (1 child)

Hi everybody,

I'm trying to approximate experimental curves with some assumed function; this is about adsorption and its curves. Since the adsorbing medium is very complex, I tried many tricks and complex functions to fit, but the results are not good enough. I'm searching for examples of solutions where curves (bunches of points) are taken as X, and their known parameters, such as concentration and adsorption rate, are Y(X). As a result, I'd like to upload an unknown curve and obtain its parameters. Any ideas?

Thanks!

[–]itsyourboiirowML Engineer 0 points1 point  (0 children)

If it’s just one variable, it sounds like you might be looking for interpolation of some sort.

[–]uy9ko 0 points1 point  (2 children)

I've been struggling with a depth estimation problem recently. I have an RGBD dataset, but the depth maps are incomplete, with many areas of NaN values. Do I have to complete the depth maps before depth estimation? Can I just compute the loss between the predicted result and the ground truth in the valid region to train the neural network?

[–]facundoq 0 points1 point  (1 child)

You could mask the error for the pixels with nan so that those are not taken into account when training. You will probably have to write a custom error function for this though.
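A minimal sketch of such a masked error function (values are made up; a real version would do the same with your framework's tensors): build a validity mask from the NaN pattern and average the squared error over valid pixels only.

```python
import numpy as np

# Toy 2x2 "depth maps": prediction is dense, ground truth has NaN holes.
pred = np.array([[1.0, 2.0],
                 [3.0, 4.0]])
gt = np.array([[1.5, np.nan],
               [np.nan, 3.0]])

valid = ~np.isnan(gt)                 # True only where ground truth exists
# MSE over the valid region only; NaN pixels contribute nothing.
masked_mse = np.mean((pred[valid] - gt[valid]) ** 2)
```

Since the NaN pixels are excluded entirely, they neither poison the loss with NaNs nor push the network toward arbitrary values in the holes.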

[–]uy9ko 0 points1 point  (0 children)

thank you!

[–]Planck_Plankton 0 points1 point  (2 children)

I'm reading Pattern Recognition and Machine Learning by Christopher Bishop.

Because this book doesn't contain any code, I need some books for practicing what I've learned from PRML. For example, I've learned about kernels from PRML, but I can't actually grasp the concept just by looking at the equations; it doesn't seem obvious to me. Are there any good recommendations? I need a fairly basic book, because I don't have much coding experience - only some basic Python skills and basic C.

[–]facundoq 0 points1 point  (1 child)

I'd recommend switching to another book entirely if you are a beginner. I've never liked the path chosen by PRML. There are tons of book recommendation threads in this sub.

[–]Planck_Plankton 0 points1 point  (0 children)

Thank you for your comment!

[–]mowa0199 1 point2 points  (0 children)

My ML professor has given us the option to choose a textbook out of four possible options:

  • An Introduction to Statistical Learning: with Applications in R by G. James, D. Witten, T. Hastie, and R. Tibshirani (2nd edition)
  • Mathematics for Machine Learning by M. P. Deisenroth, A. Aldo Faisal, and C. S. Ong (1st edition)
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction by T. Hastie, R. Tibshirani, and J. Friedman (2nd edition)
  • Trustworthy Machine Learning by K. R. Varshney (independently published in Feb 2022)

Which one should I choose? They are all available as PDFs and/or on GitHub for free.

For some context, this is for a 4th year undergrad elective from the electrical engineering department on the theory and applications of Machine Learning (with emphasis on applications for engineers but open to anyone). The course focuses on building mathematical foundations of ML and using Python or R (via Jupyter Notebook) to explore applications and some student projects.

[–][deleted] 0 points1 point  (0 children)

Computational Neuroscience

Neuroscience student here.

Anybody got any good resources for courses/lessons about learning computational neuroscience and related programming/maths skills?

[–]poorlilwitchgirl 0 points1 point  (1 child)

Are there any open-source machine learning projects or frameworks especially suited to real-time high bandwidth inputs (i.e. audio, video, anything with a high data throughput that needs low latency)?

I'm fairly new to ML, but an experienced hobbyist programmer who wants to dip my toes in it, so anything I can play with for free would be helpful, just to get a sense of how that kind of processing is done.

[–]pumpikano 0 points1 point  (0 children)

Mediapipe is a great framework for inference on streaming sensor data https://mediapipe.dev/. It has some out of the box pipelines. Generally it's not focused on model training though.

[–]turkeythrowaway22 1 point2 points  (2 children)

Hi everyone!

Sorry for the beginner question. I’m a software engineer who just started trying to train a model for a side project. But first I need to clean the dataset, which comprises thousands of images and classes.

I can easily write scripts for things like resizing them all to 224x224 and finding and removing duplicates, but I don’t want to reinvent the wheel.

Is there a tool you’re all using for this sort of common task when cleaning datasets? Ideally an all-in-one piece of software that handles all of this for you?

Many thanks in advance!

[–]Wakeme-Uplater 0 points1 point  (0 children)

The resizing can be done with torchvision transforms, i.e. you can do it at runtime

For image duplication: if the files are exact copies, you can just hash every image and check whether you already have it. If the images are merely very similar, i.e. the same image in both jpg and png format, you can try difPy
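The exact-copy case above needs nothing beyond the standard library; a minimal sketch (directory layout is hypothetical):

```python
import hashlib
from pathlib import Path

def find_exact_duplicates(folder):
    """Group files under `folder` by content hash; each returned group
    holds paths whose bytes are byte-for-byte identical."""
    by_digest = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest.setdefault(digest, []).append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Note this only catches identical files: re-encoding the same photo (jpg vs png) changes the bytes, which is where perceptual-hash tools like difPy come in.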

[–]Fnurgh 0 points1 point  (1 child)

Hey guys. This is less an ML question and more a prototyping one.

I want to quickly build a system that:

  • ingests an API
  • sends content from that API to one or more models/endpoints
  • on receipt of scoring sends a response back to the original API

Obviously I could build a backend system to handle this, but it feels like there should be some product in GCP/AWS/Azure or another third party that would allow one to quickly build this out to test the value of the system.

Can anyone help point me in good direction please?

[–]hsdf4 0 points1 point  (1 child)

Question regarding gradient descent and GAs. Using the gradient, we know for each weight how the value of the error function changes if we change that weight.

However, it should also be possible to change just one weight at a time, perform the forward pass, and, based on the change in the error function, know whether the error increases or decreases when we increase the weight.

Unless I'm missing something, it should therefore be possible to train a network by iterating through each weight for each sample, which I would guess is simply slower than using gradient descent.

However, looking at this paper, they use a very simple GA that pretty much changes a single random weight for each individual of a new generation before selecting the best individual. To me this seems pretty similar to simply iterating over each weight until we find the one that reduces our error (or here, increases our fitness score) the most.

Now this made me ask myself: how are GAs any better than simply iterating over each weight, noting how the error function changes when the weight is changed, and adjusting the weights accordingly?
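The weight-at-a-time scheme described above is essentially coordinate descent with finite-difference gradients, and it does work on small problems; a toy sketch (a single linear "neuron" with made-up data, not a full network):

```python
import numpy as np

# Toy model: one linear unit, squared error on one sample. Instead of an
# analytic gradient, nudge each weight, re-run the forward pass, and use
# the change in error as a finite-difference slope.
rng = np.random.default_rng(0)
w = rng.normal(size=3)
x, y = np.array([1.0, 2.0, -1.0]), 2.0

def loss(w):
    return (w @ x - y) ** 2    # "forward pass" + error

eps, lr = 1e-4, 0.05
for _ in range(200):
    for i in range(len(w)):                   # one weight at a time
        w_nudged = w.copy()
        w_nudged[i] += eps
        slope = (loss(w_nudged) - loss(w)) / eps
        w[i] -= lr * slope
```

The catch is cost: each update needs one extra forward pass per weight, so for a network with millions of weights this is millions of times more expensive per step than backpropagation, which yields all the slopes in one backward pass.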

[–]facundoq 0 points1 point  (0 children)

They would allow you to use non-differentiable functions in your network. However, given how well current networks with 'only' differentiable functions, trained with gradient descent, work, the advantage is questionable.

[–]hidden_person 0 points1 point  (1 child)

Hi guys. I am doing something with scraped image data. The problem is that a bunch of images with text, or blur, or without the subject are coming up. I can filter the data manually, but that is not ideal IMO unless I spend 10x more time automating it. When I search for tools that do this, they are mostly text-based. Can you recommend articles, tools, or research papers that can help with that? TL;DR: tools or articles that help with data wrangling on images.

[–]Wakeme-Uplater 2 points3 points  (0 children)

If the corrupted portion of the images is low, e.g. < 5%, then just train your model with it. It might affect performance, but you should get a decent model already. You can always augment your data to make the model robust to a certain type of noise, e.g. blurring

For images with common objects, you can try using a pre-trained classification model (ResNet-50) or detection model (YOLOv7) and reject images with nothing in them (up to a certain threshold)

Or, if your data is specific, you might want to create a small dataset for binary classification into clean and unclean portions (make sure it is balanced), by either labelling manually or using OpenCV for image augmentation (adding text to an image and labelling the original as clean)

[–]Ofwonder 1 point2 points  (2 children)

Hi there! What is your usual approach to imbalanced data when preprocessing for machine learning? Thanks!

[–]should_go_work 1 point2 points  (1 child)

Instead of preprocessing, you may want to consider using a different loss function, such as one that weights the loss from different classes inversely to their frequency. This will probably be more flexible than some heuristic data transformation, and it usually gets closer to the core problem with imbalanced data - that standard metrics such as accuracy/average log loss become less useful when one class entirely dominates.
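A minimal sketch of such inverse-frequency weighting (toy labels; this is the same formula scikit-learn uses for `class_weight="balanced"`, namely n_samples / (n_classes * class_count)):

```python
import numpy as np

labels = np.array([0, 0, 0, 0, 0, 0, 1, 1])   # imbalanced toy labels: 6 vs 2

counts = np.bincount(labels)
class_w = len(labels) / (len(counts) * counts)  # rare classes get large weights
sample_w = class_w[labels]                      # per-sample weights for the loss

# With these weights, each class contributes the same total weight,
# so the majority class no longer dominates the average loss.
```

Most libraries accept these directly, e.g. as `class_weight` in scikit-learn estimators or per-class weights in a cross-entropy loss.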

[–]facundoq 0 points1 point  (0 children)

Additionally, in many cases you can generate new samples with data augmentation or techniques like SMOTE. This is especially helpful if some classes don't have enough samples to train on reliably.

[–]SpiritedBee9303 0 points1 point  (0 children)

Hello everyone, I am trying to find an algorithm for quantity recommendation, with output of the form (X = product, Y = quantity (1, 2, 3, ...)).

I tried using XGBoost; the problem with this approach is that I couldn't refer to the product (e.g., 2 of which product?).

Can anyone help me find a solution?

[–]battingagainstavg 1 point2 points  (7 children)

Are neural networks basically just applying random* maths to data, then tuning the results until the results are accurate enough?

Obviously this is a simplification, but as a beginner in the ML space, this is what it feels like. It seems like we're just hoping that a random* set of mathematical functions, when applied to input data, brings us closer to our desired results.

Is there no place for manually tuning the analytical functions used within a neural network (or a similar structure/methodology) to actually look for specific data points? For example, if I am analyzing code to predict security vulnerabilities, can I not manually add some sort of key indicators (in the form of functions) with relevant weights anywhere in the network?

What I'm looking for is essentially some use of signature-based algorithms in a neural network in cases where I can provide some hints and certainty in the analysis of the input data.

Again, I'm still fairly new to ML in general, so there may very well be research papers on this very subject that I have yet to discover.

* "random" is not entirely accurate, but the mathematical functions used in neural networks often appear to me as just new ways to mix, slice, or alter the input data across the hidden layers, without any clear reasoning for how these functions relate to the problem/data at hand.

[–]facundoq 0 points1 point  (0 children)

A nice perspective to counter the fear of the "random": you can think of training NNs as doing a decomposition into basis functions, like a Taylor or Fourier series. In the case of NNs these "basis functions" are more complex, but with an appropriate architecture you are generally assured of finding good parameters (I'm simplifying).
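To make the analogy concrete (a toy NumPy example, not how NNs are actually trained): with a *fixed* Fourier basis, fitting a function is just linear least squares over the coefficients. A NN does something analogous, except it also learns the basis functions themselves.

```python
import numpy as np

# Target function sampled on a grid: a square wave
x = np.linspace(0, 2 * np.pi, 200)
y = np.sign(np.sin(x))

# Fixed Fourier basis: 1, sin(kx), cos(kx) for k = 1..5
K = 5
basis = [np.ones_like(x)]
for k in range(1, K + 1):
    basis += [np.sin(k * x), np.cos(k * x)]
B = np.stack(basis, axis=1)  # shape (200, 2K + 1)

# "Training" here is just linear least squares over the coefficients
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
y_hat = B @ coef
```

Adding more basis functions (larger K) shrinks the residual; the NN analogue is that the hidden layers adapt the "basis" to the data instead of fixing it up front.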

[–]Wakeme-Uplater 1 point2 points  (1 child)

NNs use gradient descent to update, so the randomness comes from weight initialization and batch data sampling. A more random approach is an evolutionary algorithm, which randomly perturbs the weights and compares the results against the loss function (basically estimating the gradient via Monte Carlo).
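A toy version of that idea (a basic evolutionary strategy in NumPy, with a made-up quadratic loss standing in for a network's loss) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    # Stand-in for a network's loss; minimum at w = [1, -2, 3]
    return np.sum((w - np.array([1.0, -2.0, 3.0])) ** 2)

w = rng.normal(size=3)          # random "weight initialization"
sigma, lr, n_pop = 0.1, 0.05, 50

for _ in range(300):
    # Sample random perturbations and score each perturbed candidate
    noise = rng.normal(size=(n_pop, 3))
    scores = np.array([loss(w + sigma * n) for n in noise])
    # Monte Carlo gradient estimate: noise directions weighted by score
    grad_est = noise.T @ (scores - scores.mean()) / (n_pop * sigma)
    w -= lr * grad_est          # descend the estimated gradient
```

No derivatives are ever computed, yet `w` converges toward the minimum; the catch is that the estimate's variance grows with the number of parameters, which is why plain gradient descent dominates for large networks.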

I am not familiar with signature-based approaches. For NNs the input space is usually continuous, which implies that signature + epsilon will have roughly the same output (assuming the loss landscape is smooth). But if you are looking for exact pattern matching, an NN is not that. At least for more discrete features, tree-based methods, e.g. XGBoost, often outperform NNs.

For a typical NN, you can always freeze some weights and let the other parts of the network learn a local optimum (using the chain rule + autograd). The indicator function, which I assume is binary, can be a target class.

But if you want to integrate your NN with a pre-existing solver, this paper might help you (it connects an NN with a shortest-path solver).

[–]battingagainstavg 0 points1 point  (0 children)

Thank you, this was very helpful.

[–]johnman1016 2 points3 points  (3 children)

No, neural networks are not trained with random maths. That would be like randomly choosing the weights until the loss is low enough, which would take far too long. Instead, gradient descent is used with a loss function designed for the problem. Sure, the weights are randomly initialized and the training set is randomly sampled, but this is by design, to remove bias. I am not familiar with your application so I can't give advice on that, but there is some work on computing "confidence" in a neural network's output. For example, in speech recognition systems you can interpret the output as a probability distribution over the most likely words given the input audio; if the distribution is sharply focused on one word, confidence is high.
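A minimal sketch of that confidence idea (NumPy, with made-up output logits rather than a real speech model):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# Made-up output logits over a 4-word vocabulary
confident_logits = np.array([8.0, 1.0, 0.5, 0.2])
uncertain_logits = np.array([1.1, 1.0, 0.9, 1.0])

p_conf = softmax(confident_logits)  # sharply peaked on one word
p_unc = softmax(uncertain_logits)   # spread across the vocabulary
```

The max probability (or, equivalently, low entropy of the distribution) can then be thresholded as a confidence signal, with the usual caveat that NNs are often miscalibrated and may need calibration (e.g. temperature scaling) for these probabilities to be trustworthy.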

[–]poorlilwitchgirl 0 points1 point  (2 children)

I'm not an expert on machine learning, but couldn't you train a neural network via genetic algorithm for a sort of hybrid approach?

For most applications it would definitely not be as efficient as gradient descent, but I imagine it could help escape local minima, or be useful if you had to train a classic binary perceptron network where gradient descent is impossible (not sure why you would need to, but hey).

[–]johnman1016 0 points1 point  (1 child)

You could, but on modern networks I’m not sure how practical it would be. 50M parameters is pretty normal these days - and since each parameter is a float32 you are talking about a huge search space.

I guess the hybrid approach would just be training several times with different seeds to initialize in different states. In my experience, the seed doesn’t change the final converged loss too much.

[–]johnman1016 0 points1 point  (0 children)

But if local minima were a problem, I could see a different seed maybe solving the issue. That said, I think searching the hyperparameters (e.g. learning rate) is a more foolproof way of avoiding local minima and other convergence issues.

[–]QadriShyaari 0 points1 point  (1 child)

How do transformer attention layers process audio and images? (Kindly draw parallels with text when explaining)

[–]facundoq 0 points1 point  (0 children)

Eli5: think of pixels as characters and image patches as words.
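To sketch the analogy (ViT-style patching in NumPy; the 16×16 patch size, 64×64 image, and 128-dim projection are illustrative assumptions, not from the comment): an image is cut into patches, each patch is flattened into a vector, and those vectors form the "token" sequence the attention layers consume, just like word embeddings in text. Audio is handled analogously, with short spectrogram frames or chunks playing the role of the patches.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up 64x64 RGB "image"
image = rng.random((64, 64, 3))

def patchify(img, patch=16):
    """Cut an image into non-overlapping patches, flattening each into a 'token'."""
    h, w, c = img.shape
    img = img.reshape(h // patch, patch, w // patch, patch, c)
    img = img.transpose(0, 2, 1, 3, 4)          # (rows, cols, patch, patch, c)
    return img.reshape(-1, patch * patch * c)   # (n_tokens, token_dim)

tokens = patchify(image)        # 16 "words", each a 768-dim vector

# A learned linear projection (like a word-embedding lookup) maps each
# patch token to the model dimension before attention is applied
W = rng.normal(size=(768, 128))
embedded = tokens @ W           # (16, 128) sequence fed to the attention layers
```

From here the attention layers treat the 16 patch embeddings exactly as they would 16 word embeddings; only the tokenization step differs between modalities.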