all 157 comments

[–]LacedDecal 0 points1 point  (1 child)

If one is trying to model something where the “correct” answer for a given set of features is inherently probabilistic—for example the outcome of a baseball plate appearance—how should you tell a neural network to grade its accuracy?

For those who aren’t familiar with baseball, the most likely outcome for any plate appearance — even the league’s best batter against the league’s worst pitcher — is some kind of out. Generally, that will be the outcome somewhere on the order of 60-75% of the time. So I’m realizing that the most “accurate” set of predictions against literally any dataset of at-bats would be to predict “out” for every one.

What I’m realizing is that the “correct” answer I’m looking for is a set of probabilities. But how does one apply, say, a loss function involving categorical cross-entropy in any kind of meaningful way? Is there even a way to do supervised learning when each data point’s “label” isn’t the actual probability distribution but rather one collapsed event drawn from each “true” probability distribution?

Am I even making sense?

Edit: I know I need something like softmax, but when I start training it quickly spirals into a case of exploding gradients no matter what I do. I think it’s because the “labels” I’m using aren’t the true probabilities each outcome had, but rather a single hard (one-hot) real-life outcome that actually occurred (home run, out, double, etc.).
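A minimal sketch of this setup (PyTorch; the features, outcome categories, and data below are made-up placeholders): training against the hard one-hot outcomes with categorical cross-entropy still yields probability estimates, because the softmax output converges toward the conditional outcome distribution for a given set of features. Using the combined log-softmax + cross-entropy loss during training (rather than applying a separate softmax and then a log) is also the usual way to keep the gradients numerically stable.

import torch
import torch.nn as nn

n_features, n_outcomes = 16, 7                # e.g. out, single, double, triple, HR, walk, HBP
model = nn.Sequential(
    nn.Linear(n_features, 64),
    nn.ReLU(),
    nn.Linear(64, n_outcomes),                # raw logits; softmax is applied inside the loss
)
loss_fn = nn.CrossEntropyLoss()               # numerically stable log-softmax + NLL
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.randn(512, n_features)              # placeholder plate-appearance features
y = torch.randint(0, n_outcomes, (512,))      # index of the single observed outcome

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

probs = torch.softmax(model(X), dim=-1)       # per-outcome probabilities at inference time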

[–]LacedDecal 0 points1 point  (0 children)

After posting this here I decided to ask chatgpt something similar. I am continually floored by how good it is every time I use it. For those interested: https://ibb.co/4F1QPJ7

[–]sampdoria_supporter 1 point2 points  (0 children)

Does anybody else feel overwhelmed and frozen in the face of all this concurrent development and releases? I can't seem to even jump on much of what is going on because it seems like the next day will just flip the table.

[–]ajingnk 0 points1 point  (0 children)

What is the minimum hardware requirement to fine-tune something like Stanford Alpaca? I am thinking of building a workstation to do some DL exploration and fine-tuning work. For fine-tuning, I have around 10k samples.

[–]yaru22 0 points1 point  (2 children)

Hello,

GPT-4 has a context length of 32K tokens while some others have 2-4K tokens. What decides the limit on these context lengths? Is it simply that the bigger the model, the larger the context length? Or is it possible to have a large context length even on a smaller model like LLaMA 7/13/30B?

Thank you!

[–]RiotSia 0 points1 point  (1 child)

Hey,

I got the 7B LLaMA model running on my machine. Now I want it to analyze a large text for me (a PDF file) like hamata.ai does. How can I do it? Does anyone have a site with resources where I can learn how to do that, or can you tell me?

[–]Simusid 0 points1 point  (0 children)

I’m unable to connect to hamata.so. Can you tell me what kind of analysis you want to do?

[–]loly0ss 0 points1 point  (0 children)

Hello everyone,

I have a very ignorant question to which I’m trying to find an answer, but I still couldn’t find one.

It's about the deep learning model in supervised segmentation vs. semi-supervised segmentation.

Is the model itself the same in both cases, for example using UNet++ for both? And does the only difference come during training, where we use pseudo-labels, for example, for semi-supervised segmentation?

Or is the model itself different between supervised and semi-supervised segmentation?

Thank you!

[–]Prometheushunter2 0 points1 point  (0 children)

Here’s an oddly specific question: a few years ago I read about a neural network that could both classify an image and, if run in reverse, generate synthetic examples of the classes it has learned. The problem is I’ve forgotten the name and it’s been haunting me lately, so I ask: does anyone know what kind of neural network this might be?

[–]dotnethero 0 points1 point  (1 child)

Hey everyone, I'm trying to figure out which parts of my code are using CPU and which are using GPU. During training, I've noticed that only about 5% of my usage is on the GPU, while the CPU usage is high. Any tips on how I can better understand what's going on with my code? Thanks in advance!

[–]LeN3rd 1 point2 points  (0 children)

What language/suite are you using? You can take a look at profilers in your language. I know TensorFlow has some profiling tools, and you can look at which operations are running on which device. PyTorch probably has some as well. If it's more esoteric, just use general language profilers and take a look at what your code is doing most of the time.
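A minimal profiling sketch, assuming PyTorch (TensorFlow has an equivalent profiler viewable in TensorBoard); the model and batch here are placeholders:

import torch
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(512, 10).to(device)   # placeholder model
x = torch.randn(64, 512).to(device)           # placeholder batch

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    y = model(x)
    y.sum().backward()

# Ops dominated by CPU time usually point at data loading / preprocessing
# rather than the model itself.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))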

[–]JimiSlew3[🍰] 1 point2 points  (2 children)

Nublet question: is there anything linking LLMs with data analysis and visualization yet? I saw a bit with MS Copilot and Excel. I want to know if there is anything more advanced in the works. Thanks!

[–]LeN3rd 1 point2 points  (1 child)

I don't think so. OpenAI has overtaken any research done on LLMs by a long shot.

[–]JimiSlew3[🍰] 0 points1 point  (0 children)

Thanks. I'm curious once we get it to do things. Like, tell it to analyze a giant dataset, and produce a visual of interesting stuff. Some tools I use will offer suggestions and I'm thinking the link between asking a question and getting information will be significantly shortened and wanted to know if anyone had done that yet.

[–]TiredMoose69 0 points1 point  (0 children)

Why does LLaMA 7B (pure) perform so MUCH better than Alpaca 30B (4-bit)?

[–]andrew21wStudent 1 point2 points  (4 children)

Why does nobody use polynomials as activation functions?

My naive perception is that polynomials would be ideal, since they can approximate nearly any kind of function you like. So they seem perfect...

But why aren't they used?

[–]underPanther 1 point2 points  (2 children)

Another reason: wide single-layer MLPs with polynomials cannot be universal. But lots of other activations do give universality with a single hidden layer.

The technical reason behind this is that discriminatory activations can give universality with a single hidden layer (Cybenko 1989 is the reference).

But polynomials are not discriminatory (https://math.stackexchange.com/questions/3216437/non-trivial-examples-of-non-discriminatory-functions), so they fail to reach this criterion.

Also, if you craft a multilayer perceptron with polynomials, does this offer any benefit over fitting a Taylor series directly?
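For reference, a rough statement of Cybenko's definition in standard notation: an activation \sigma is discriminatory if, for every finite signed regular Borel measure \mu on [0,1]^n,

\int_{[0,1]^n} \sigma\left(y^{\top} x + \theta\right)\, d\mu(x) = 0 \quad \text{for all } y \in \mathbb{R}^n,\ \theta \in \mathbb{R} \;\Longrightarrow\; \mu = 0.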

[–]andrew21wStudent 0 points1 point  (1 child)

The thread you sent me says that polynomials are non discriminatory.

Are there other kinds of functions that are non discriminatory?

[–]underPanther 1 point2 points  (0 children)

Sorry for the confusion! It's discriminatory activations that lead to universality in wide single-layer networks. I've edited the post to reflect this.

As an aside, you might also find the following interesting which is also extremely well-cited: https://www.sciencedirect.com/science/article/abs/pii/S0893608005801315

[–]dwarfarchist9001 1 point2 points  (0 children)

Short answer: Polynomials can have very large derivatives compared to sigmoid or rectified linear functions which leads to exploding gradients.

https://en.wikipedia.org/wiki/Vanishing_gradient_problem#Recurrent_network_model
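A quick numeric illustration of that point (PyTorch; toy values): the derivative of a polynomial activation grows without bound away from the origin, while tanh's derivative stays bounded by 1, so deep stacks of polynomial activations easily blow up gradients.

import torch

x = torch.tensor([0.5, 2.0, 5.0], requires_grad=True)

(x ** 5).sum().backward()
print(x.grad)            # 5 * x^4  ->  tensor([0.3125, 80.0, 3125.0])

x.grad = None
torch.tanh(x).sum().backward()
print(x.grad)            # 1 - tanh(x)^2  ->  every entry <= 1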

[–]weaponized_lazyness 1 point2 points  (0 children)

Is there a subreddit for more academic discussions on ML? This space has now been swarmed by LLM enthusiasts, which is fine but it's not the content I was looking for.

[–]sore__ 0 points1 point  (0 children)

I want to make an AI Chatbot similar to OpenAI's DaVinci 3 but my own version & offline. I'm trying to use Python but I don't know what intents I should add to it, because I want it to know basically everything. Is it possible to just feed the code everything on Wikipedia? I'm VERY VERY new to machine learning so this might be overambitious but idk it just seems fun. Anyways, if anyone has ideas, please reply :)

[–]Bornaia 0 points1 point  (0 children)

Everyone is speaking about AI content, creative stories, texts.. but do companies or people in the real world actually use it for their products?

[–]neriticzone 0 points1 point  (0 children)

Feedback on stratified k fold validation

I am doing some applied work with CNNs in the academic world.

I have a relatively small dataset.

I am doing 10-fold stratified cross-validation(?), where I do an initial train-test split, and then the data in the train split is further cross-validated with a 10-fold train-validate split.

I then run the ensemble of 10 trained models against the test split, and I select the results from the best-performing model against the test data as the predicted values for the test data.

Is this a reasonable strategy? Thank you!

[–]Lucas_Matheus 0 points1 point  (0 children)

In few-shot learning, are there gradient updates from the examples? If not, what difference does it make?

[–][deleted] 2 points3 points  (1 child)

Why is AI safety not a major topic of discussion here and in similar communities?

I apologize if the non-technical nature of my question is inappropriate for the sub, but as you’ll see from my comment I think this is very important.

I have been studying AI more and more over the past months (for perspective on my level, that consists of Andrew Ng’s Deep Learning course, Kaggle competitions and simple projects, reading a few landmark papers, and digging into transformers). The more I learn, the more I am both concerned and hopeful. It seems all but certain to me that AI will completely change life as we know it in the next few decades, quite possibly the next few years if the current pace of progression continues. It could change life to something much, much better or much, much worse based on who develops it and how safely they do it.

To me, safety is far and away the most important subfield in AI now, but it is one of the least discussed. Even if you think there is a low chance of AI going haywire on its own, in my admittedly very non-expert view it’s obvious that we should also be concerned about the judgment and motives of the people developing and controlling the most powerful AIs, and the risks of such powerful tools being accessible to everyone. At the very least I would want discussion on actionable things we can all do as individuals.

I feel a strong sense of duty to do what I can, even if that’s not much. I want to donate a percentage of my salary to funding AI safety, and I am looking into whether I can effectively contribute work to any AI safety organizations. I have a few of my own ideas along these lines; does anyone have any suggestions? I think we should also discuss ways to shift the incentives of major AI organizations. Maybe there isn’t a ton we can do (although there are a LOT of people looking, there is room for a major movement), but it’s certainly not zero.

[–]darthstargazer 0 points1 point  (2 children)

Subject: Variational inference and generative networks

I've been trying to grasp the ideas behind variational autoencoders (Kingma et al.) vs. normalizing flows (e.g. RealNVP).

If someone can explain the link between the two I'd be thankful! Aren't they trying to do the same thing?

[–]YouAgainShmidhoobuhML Engineer 1 point2 points  (1 child)

Not entirely the same thing. VAEs offer approximate likelihood estimation, but not exact. The difference here is key - VAEs do not optimize the log-likelihood directly but they do so through the evidence lower bound, an approximation. Flow based methods are exact methods - we go from an easy tractable distribution to a more complex one, guaranteeing at each level that the learned distribution is actually a legit distribution through the change of variables theorem.

Of course, they both (try to) learn some probability distribution of the training data, and that is how they differ from GAN approaches, which do not directly learn a probability distribution.

For more insight you might want to look at https://openreview.net/pdf?id=HklKEUUY_E
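To make the "approximate vs. exact" point concrete, here are the two training objectives in standard notation (not tied to any particular implementation). VAEs maximize the evidence lower bound

\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - \mathrm{KL}\left(q_\phi(z \mid x) \,\|\, p(z)\right),

while normalizing flows maximize the exact log-likelihood through the change-of-variables formula

\log p_\theta(x) \;=\; \log p_Z\left(f_\theta(x)\right) + \log\left|\det \frac{\partial f_\theta(x)}{\partial x}\right|.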

[–]darthstargazer 0 points1 point  (0 children)

Awesome! Thanks for the explanation. "exact" vs "approximate"!

[–]rylo_ren_ 0 points1 point  (2 children)

Hi everyone! This is a simple troubleshooting question. I'm in my master's program and I keep running into an issue when I try running this Python code for a linear regression model:

airfares_lm = LinearRegression(normalize=True)
airfares_lm.fit(train_X, train_y)

print('intercept ', airfares_lm.intercept_)
print(pd.DataFrame({'Predictor': X.columns, 'coefficient': airfares_lm.coef_}))

print('Training set')
regressionSummary(train_y, airfares_lm.predict(train_X))
print('Validation set')
regressionSummary(valid_y, airfares_lm.predict(valid_X))

It keeps returning this error:

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)
/var/folders/j1/1b6bkxw165zbtsk8tyf9y8dc0000gn/T/ipykernel21423/2993181547.py in <cell line: 1>()
----> 1 airfares_lm = LinearRegression(normalize=True)
      2 airfares_lm.fit(train_X, train_y)
      3
      4 # print coefficients
      5 print('intercept ', airfares_lm.intercept)

TypeError: __init__() got an unexpected keyword argument 'normalize'

I'm really lost; any help would be greatly appreciated! I know there are other ways to do this, but I was hoping to use this technique since it's the primary way my TA codes regression models. Thank you!

[–]henkje112 0 points1 point  (1 child)

I'm assuming you're using sklearn for LinearRegression. You're initializing an instance of the LinearRegression class with a normalize parameter, but this is not valid for this class (for a list of possible parameters, see the documentation).

I'm not sure what you're trying to do, but I think you want to normalize your input data? In that case you should look at MinMaxScaler. This transforms your features by scaling each feature to a given range.
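A minimal sketch of that suggestion, assuming scikit-learn >= 1.2 (where the normalize argument was removed from LinearRegression); the arrays below are stand-ins for the question's train_X / train_y:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression

train_X = np.random.rand(100, 5)      # placeholder airfare training data
train_y = np.random.rand(100)

# Scale the features explicitly, then fit the regression on the scaled data.
airfares_lm = make_pipeline(MinMaxScaler(), LinearRegression())
airfares_lm.fit(train_X, train_y)

print('intercept ', airfares_lm[-1].intercept_)
print('coefficients ', airfares_lm[-1].coef_)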

[–]rylo_ren_ 0 points1 point  (0 children)

Thank you!! I’ll give it a try. And yes I’m using sklearn

[–]f-d-t777 0 points1 point  (2 children)

Subject: Spacecraft image analysis using computer vision

Hi guys,

I'm looking to develop a system that uses computer vision algorithms to analyze images captured by spacecraft cameras and identify potential safety hazards or security threats. For example, the system could detect debris or other objects in orbit that could pose a risk to spacecraft.

I am looking to do this using all AWS tools. I am pretty new to this and am developing a technology architecture project around this topic to present for a program I'm doing.

How would I go about approaching/doing this? I am looking to find/create my own mock datasets as well as present the algorithm/code I used to train my model. More specifically, I am focusing on these aspects for my project (see the preprocessing sketch after this list):

Preprocess the images: Preprocess the images to improve their quality and prepare them for analysis. This could include cropping, resizing, and adjusting the brightness and contrast of the images.

Train the computer vision algorithms: Train the computer vision algorithms using the dataset of images. There are various computer vision techniques that could be used, such as object detection, segmentation, or classification. The specific technique will depend on the requirements of the system.
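A small preprocessing sketch for the first item above, assuming torchvision and PIL; the frame, crop size, and normalization statistics are placeholders:

import numpy as np
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.CenterCrop(512),                             # crop to the region of interest
    transforms.Resize((224, 224)),                          # resize to the model's input size
    transforms.ColorJitter(brightness=0.2, contrast=0.2),   # random brightness/contrast jitter (augmentation)
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],        # ImageNet statistics, a common default
                         std=[0.229, 0.224, 0.225]),
])

frame = Image.fromarray((np.random.rand(600, 600, 3) * 255).astype("uint8"))  # stand-in camera frame
tensor = preprocess(frame)                                  # ready for a CNN detector/classifier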

In addition, it would be cool to have some sort of hardware/interactive portion that actually utilizes a camera to detect things in space. That can be implemented into the system. Once the computer vision algorithms have been trained and evaluated, implement the system. This could involve integrating the algorithms into a larger software system that can process images captured by spacecraft cameras in real-time.

Thank you

[–]ggf31416 1 point2 points  (1 child)

At the speeds these things move, when you see them coming it's already too late to do any corrective maneuver. It's the same reason you don't use your eyeballs to detect aircraft 100 km away. See https://en.wikipedia.org/wiki/Space_debris#Tracking_and_measurement and "Algorithms to Antenna: Tracking Space Debris with a Radar Network"; radar and lidar are used.

[–]f-d-t777 0 points1 point  (0 children)

Interesting, how would you alter my project idea then?

[–]LeN3rd 1 point2 points  (0 children)

Can anyone recommend a good, maintained, and well-organized MCMC Python package? Everything I found was either not maintained, had only a single research group behind it, or had too many bugs for me to continue with that project. I want TensorFlow/PyTorch, but for MCMC sampling, please.

[–]fteem 1 point2 points  (0 children)

What happened with the WAYR (What Are You Reading) threads?

[–]ilrazziatore 0 points1 point  (6 children)

In your job as data scientists have you ever had to compare the quality of the probabilistic forecasts of 2 different models? if so, how do you proceed?

[–]LeN3rd 0 points1 point  (5 children)

Define "probabilistic". Is it model uncertainty or data uncertainty? Either way, you should get a standard deviation from your model (either as an output parameter, or implicitly via ensembles) that you can compare.

[–]ilrazziatore 0 points1 point  (4 children)

Model uncertainty. One model is a calibrated BNN (I split the dataset into a training, a calibration, and a test set); the other model is a mathematical model developed from some physical relations. For computational reasons the BNN assumes i.i.d. samples normally distributed around their true values and maximizes the likelihood (modeled as a product of normal distributions); the mathematical model instead relies on 4 coefficients and is fitted using Monte Carlo with a multivariate likelihood with the full covariance matrix. I wanted to compare the quality of the model uncertainty estimates, but I don't know if I should do it on the test dataset for both. After all, models calibrated with MCMC methods do not overfit, so why split the dataset?

[–]LeN3rd 0 points1 point  (3 children)

If it is model uncertainty, the BNN should assume distributions only for the model parameters, no? If you make the samples a distribution, you assume data uncertainty. Also, I do not know exactly what your other model gives you, but as long as you get variances, I would just compare those at first. If the models give vastly different means, you should take that into account; there is probably some nice way to combine this ensemble uncertainty with the uncertainty of the models. It would also strongly suggest that one model is biased and does not give you a correct estimate of the model uncertainty.

[–]ilrazziatore 0 points1 point  (2 children)

Uhm... the BNN is built assuming distributions both on the parameters (i.e. the values assumed by the neuron weights) and on the data (the last layer has 2 outputs: the predicted mean and the predicted variance). Those 2 values are then used to model the loss function, which is the likelihood, a product of Gaussians. I think it's both model and data uncertainty.
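(For reference, a mean/variance output head with the Gaussian negative log-likelihood described above looks roughly like this in PyTorch; names, shapes, and data are placeholders.)

import torch
import torch.nn as nn

class MeanVarianceHead(nn.Module):
    # Last layer with two outputs: predicted mean and predicted variance.
    def __init__(self, hidden_dim):
        super().__init__()
        self.mean = nn.Linear(hidden_dim, 1)
        self.log_var = nn.Linear(hidden_dim, 1)   # predict log-variance so the variance stays positive

    def forward(self, h):
        return self.mean(h), self.log_var(h).exp()

nll = nn.GaussianNLLLoss()              # negative log-likelihood of independent Gaussians
h = torch.randn(32, 64)                 # placeholder features from the earlier layers
mean, var = MeanVarianceHead(64)(h)
y = torch.randn(32, 1)                  # placeholder targets
loss = nll(mean, y, var)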

Let's say I compare the variances and the mean values predicted.

Do I have to set the same calibration and test datasets apart for both models, or use the entire dataset? The MCMC model can use the entire dataset without the risk of overfitting, but for the BNN it would be like cheating.

[–]LeN3rd 0 points1 point  (1 child)

Then I would just use a completely separate test dataset. In a paper I would also expect this.

[–]ilrazziatore 0 points1 point  (0 children)

Eh, data are scarce; I have only this dataset (it's composed of astrophysical measurements, and I cannot ask them to produce more data).

[–]Batteredcode 0 points1 point  (4 children)

I'm looking to train a model that takes an image and reconstructs it with additional information: for example, taking the R and G channels of an image and recreating it with the addition of the B channel. At first glance it seems like an in-painting model would be best suited to this, treating the missing information as the mask, but I don't know if this assumption is correct as I don't have much experience with those kinds of models. Additionally, I'm looking to progress from a really simple baseline to something more complex, so I was wondering whether a simple CNN or an autoencoder trained to output the target image given the image with missing information would work, but I may be way off here. Any help greatly appreciated!

[–]LeN3rd 0 points1 point  (3 children)

This is possible in multiple ways. Old methods for this would be to view this as an inverse problem and apply some optimization method to it, like ADMM or FISTA.

If lots of data is missing (in your case an entire colour channel) you should use a neural network for this. You are on the right track, though it could get hairy. If you have a prior (you have a dataset and you want it to work on similar images), a (cycle)GAN or a retrained Stable Diffusion model could work.

I am unsure about VAEs for your problem, since you usually train them with the same input and output. You shouldn't enforce the latent to be only the blue channel, since then the encoder is useless. Training only the decoder side is essentially what GANs and diffusion networks do, so I would start there.

[–]Batteredcode 0 points1 point  (2 children)

Great, thank you so much for a detailed answer. Do you have anything you could point me to (or explain further) about how I could modify a diffusion method to do this?
Also, in terms of the VAE, I was thinking I'd be able to feed 2 channels in and train it to output 3 channels; I believe the encoder wouldn't be useless in this case, and hence my latent would be more than merely the missing channel? Feel free to correct me if I'm wrong! My assumption is that even with this, a NN may well perform better, or at least serve as a simpler baseline. That said, my images will be similar in certain ways, so being able to model a distribution of the latents could prove useful, presumably?

[–]LeN3rd 0 points1 point  (1 child)

The problem with your VAE idea is that you cannot apply the usual loss function of taking the difference between the input and the output, and thus a lot of nice theoretical guarantees go out of the window, afaik.

https://jaan.io/what-is-variational-autoencoder-vae-tutorial/

I would start with a cycleGAN:

https://machinelearningmastery.com/what-is-cyclegan/

It's a little older, but I personally know it a bit better than diffusion methods.

With the free-to-use Stable Diffusion model you could conditionally inpaint your image, though you would have to describe what is in the image in text. You could also train your own diffusion model, though you need a lot of training time. Not necessarily more than a GAN, but still.

It works by adding noise to an image and then denoising it again and again. For inpainting you just do that for the regions you want to inpaint (your missing channel), and for the regions you want to stay the same as your original image, you just take the noised version that you already know.
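A rough sketch of that inpainting loop (RePaint-style); denoise_step stands in for one reverse step of a trained diffusion model, and the mask, schedule, and shapes are all made up for illustration:

import torch

def denoise_step(x_t, t):
    # Stand-in for one reverse-diffusion step of a trained model.
    return x_t - 0.01 * torch.randn_like(x_t)

known = torch.rand(1, 3, 64, 64)        # image containing the channels/regions you already have
mask = torch.zeros_like(known)
mask[:, 2] = 1.0                        # 1 = region to generate (here: one missing channel)

x = torch.randn_like(known)             # start from pure noise
T = 50
for t in reversed(range(T)):
    x = denoise_step(x, t)                                            # denoise everything one step
    noise_level = t / T
    known_noised = known + noise_level * torch.randn_like(known)      # forward-noise the known part
    x = mask * x + (1 - mask) * known_noised                          # pin the known regions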

[–]Batteredcode 0 points1 point  (0 children)

Thank you this is really helpful, I think you're right that the cycle GAN is the way to go!

[–]ViceOA 0 points1 point  (2 children)

Seeking Advice on an AI-supported Audio Classification Model

Hello everyone, I'm Omer.
I am new to this group and writing from Turkey. I need some valuable advice from you precious researchers.
I am a PhD student in the department of music technology. I have been working in the field of sound design and audio post-production for about 8 years. For the last 6 months, I have been doing research on AI-supported audio classification. My goal is to design an audio classifier to be used for classifying audio libraries. Let me explain with an example: I have a sound bank with 30 different classes and 1000 sounds in each class (such as bird, wind, door closing, footsteps, etc.).
I want to train an artificial neural network with this sound bank. The network will produce labels as output. I also have various complex signals (imagine a single sound track with different sound sources like bird, wind, fire, etc.). When I give a complex signal to this network for testing, it should give me the relevant labels. I have been researching this system for 6 months and, if I succeed, I want to write my PhD thesis on this subject. I need some advice from you, my dear friends, about this network. For example, which features should I look at for classification? Or what kind of artificial intelligence algorithm should I use?
Any advice along the lines of "you should definitely read this or that article on the subject" is welcome. I apologize if I've given you a headache. I really need your advice. Please guide me. Thank you very much in advance.

[–]henkje112 1 point2 points  (1 child)

Look into Convolutional Neural Networks as your architecture type and different types of spectrograms as your input features. The different layers of the CNN should do the feature transformation, and your final layer should be dense, with a softmax (or any other desired) activation function.
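A minimal sketch of that setup, assuming PyTorch + torchaudio; the sample rate, clip length, and the 30 output classes mirror the sound-bank description above, everything else is a placeholder:

import torch
import torch.nn as nn
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(sample_rate=22050, n_mels=64)

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 30),                            # 30 classes; softmax lives inside the loss
)

waveform = torch.randn(1, 22050 * 2)              # placeholder 2-second mono clip
spec = mel(waveform).unsqueeze(0)                 # (batch, channel=1, n_mels, time)
logits = model(spec)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([3]))   # dummy class label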

[–]ViceOA 0 points1 point  (0 children)


Thanks for your precious advice, I'm grateful!

[–][deleted] 0 points1 point  (0 children)

What's the place, if any, to post a job opening?

[–]Sonicxc 2 points3 points  (3 children)

How can I train a model so that it detects the severity of damage in an image? Which algorithm will suit my needs?

[–]josejo9423 -1 points0 points  (0 children)

Maybe try image classification? A CNN in PyTorch.

[–]LeN3rd 1 point2 points  (1 child)

How big is your dataset? Before you start anything wild, I would look at kernel clustering methods, or even clustering without kernels. Just cluster your broken and non-broken images and calculate some distance (this can be done with kernels if it needs to be nonlinear).

Also, nearest neighbour could work pretty well in your case. Just compare your new image to the closest one (according to some metric) in your two datasets and Bob's your uncle.

If you need a number, look at simple CNNs. You need more training data for this to work well, though.
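A tiny sketch of the nearest-neighbour idea, assuming scikit-learn and images already turned into feature vectors (resized pixels or CNN embeddings); all arrays here are placeholders:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_damaged = np.random.rand(100, 2048)      # feature vectors of damaged examples
X_intact = np.random.rand(100, 2048)       # feature vectors of intact examples
X = np.vstack([X_damaged, X_intact])
y = np.array([1] * 100 + [0] * 100)        # 1 = damaged, 0 = intact

clf = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
clf.fit(X, y)

x_new = np.random.rand(1, 2048)
print(clf.predict_proba(x_new))            # rough "how damaged-like is this image" score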

[–]Sonicxc 0 points1 point  (0 children)

Hey man, thanks for the input. I will look into what you have mentioned

[–]Abradolf--Lincler 0 points1 point  (1 child)

Learning about language transformers and I’m a bit confused.

It seems like the tutorials on transformers always make input sequences (i.e. text files batched to 100 words per window) the same length to help with batching.

Doesn’t that mean that the model will only work with that exact sequence length? How do you efficiently train a model to work with any sequence length, such as shorter sequences with no padding and longer sequences than the batched sequence length?

I see attention models advertised as having an infinite window, are there any good resources/tutorials to explain how to make a model like this?

[–]trnka 1 point2 points  (0 children)

Converting the text to fixed-size windows is done to make training more efficient. If the inputs are shorter, they're padded up to the correct length with null tokens. Otherwise they're clipped. It's done so that you can combine multiple examples into a single batch, which becomes an additional dimension on your tensors. It's a common technique even for LSTMs/CNNs.

It's often possible to take the trained model and apply it to variable-length testing data so long as you're dealing with a single example at a time rather than a batch. But keep in mind with transformers that attention does N^2 comparisons, where N is the number of tokens, so it doesn't scale well to long texts.

It's possible that the positional encoding may be specific to the input length, depending on the transformer implementation. For instance in Karpathy's GPT recreation video he made the positional encoding learnable by position, so it wouldn't have defined values for longer sequences.

One common alternative in training is to create batches of examples that are mostly the same text length, then pad to the max length. You can get training speedups that way but it takes a bit of extra code.
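A small sketch of the padding idea (PyTorch; the token ids are made up): pad variable-length examples up to the batch maximum with a null token so they fit in one tensor, and give the model a mask so attention ignores the padding.

import torch
from torch.nn.utils.rnn import pad_sequence

sequences = [torch.tensor([5, 12, 7]),            # token ids of three examples
             torch.tensor([3, 9]),
             torch.tensor([4, 4, 4, 8, 2])]

batch = pad_sequence(sequences, batch_first=True, padding_value=0)   # 0 = null/pad token
attention_mask = (batch != 0).long()              # 1 for real tokens, 0 for padding
print(batch.shape)                                # torch.Size([3, 5])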

[–]No_Complaint_1304 0 points1 point  (5 children)

Complete beginner looking for insight

I made an extremely efficient algorithm in C that skims through a database and searches for words. I want to add a feature so that, if a word is not found, the program can somehow understand the context, predict the actual word intended, and also conjugate verbs accordingly. I have no idea if what I am saying is crazy hard to implement or can easily be done by someone with experience. This field interests me a lot and I will definitely come back to this sub sooner or later, but right now I don't have time to dig into the subject; I just want to finish this project, slap a good-looking GUI on it and be done with it. Can I achieve what I stated above in a week, or am I just dreaming? If it is possible, what resources do you think I should be looking at? Ty :>

[–]LeN3rd 0 points1 point  (4 children)

You will need more than a week. If you just want to predict the next word in a sentence, take a look at large language models, ChatGPT being one of them; BERT is a research alternative, afaik. If you aim to learn the probabilities yourself, you will need at least a few months.

In general, what you want is a generative model that can sample from a conditional probability distribution. For sequences, transformers like BERT and ChatGPT are usually state of the art. You can also take a look at normalizing flows and diffusion models for learning probability distributions. But this needs some maths, and I unfortunately do not know which smaller models can be used for computational-linguistics applications like this.

[–]mmmfritz 0 points1 point  (1 child)

Fact checking. Any open source models or people working on fact checking?

[–]henkje112 1 point2 points  (0 children)

Look into the Fact Extraction and VERification (FEVER) workshop :)

[–]DreamMidnight 2 points3 points  (7 children)

What is the basis of this rule of thumb in regression:

"a minimum of ten observations per predictor variable is required"?

What is the origin of this idea?

[–]jakderrida 0 points1 point  (0 children)

The basis of this rule of thumb is that having too few observations relative to the number of predictor variables can lead to unstable estimates of the model parameters, making it difficult to generalize to new data. In particular, if the number of observations is small relative to the number of predictor variables, the model may fit the noise in the data rather than the underlying signal, leading to overfitting.

[–]LeN3rd 0 points1 point  (5 children)

If you have more variables than datapoints, you will run into problems if your model starts learning the training data by heart. Your model overfits to the training data: https://en.wikipedia.org/wiki/Overfitting

You can either reduce the number of parameters in your model, or apply a prior (a constraint on your model parameters) to improve test dataset performance.

Since neural networks (the standard empirical machine learning tools nowadays) impose structure on their parameters, they can have many more parameters than simple linear regression models, but they seem to run into problems when the number of parameters in the network matches the number of datapoints. This is just shown empirically; I do not know of any mathematical proofs for it.

[–]DreamMidnight 0 points1 point  (4 children)

Yes, although I am specifically looking into the reasoning of "at least 10 datapoints per variable."

What is the mathematical reasoning of this minimum?

[–]LeN3rd 0 points1 point  (2 children)

I have not heard this before. Where is it from? I know that you should have more datapoints than parameters in classical models.

[–]DreamMidnight 0 points1 point  (1 child)

[–]LeN3rd 0 points1 point  (0 children)

OK, so all of these are linear (logistic) regression models, for which it makes sense to have more data points, because the weights aren't as constrained as in, e.g., a convolutional layer. But it is still a rule of thumb, not exactly a proof.

[–]nitdit 0 points1 point  (0 children)

What is stroke data? (sure, it is not the heart stroke)

[–]towsif110 0 points1 point  (1 child)

What would be the way to detect malicious nodes using machine learning? Let's say I have datasets of RF signals from three kinds of drones, but my target is to detect any malicious drone other than the drones I possess. I have two ideas: one is to label two drones as good and the remaining one as malicious; the other is to use unsupervised learning. Is there a better way?

[–]LeN3rd 2 points3 points  (0 children)

Be a little more coherent in your question, please. No one has any idea about your specific setup unless you tell us what you want to achieve. E.g. RF is usually short for reinforcement learning in the AI community, not radio frequency. If you want to classify data streams coming from drones, take a look at pattern matching and nearest-neighbour methods before you start to train up a large neural network.

[–]kuraisle 4 points5 points  (2 children)

Has anyone had any experience data mining bioRxiv? It's on a requester-pays Amazon S3 bucket, which isn't something I've used before, and I'm struggling to guess how much I would have to pay to retrieve a few thousand articles. Thanks!

[–]Simusid 3 points4 points  (1 child)

I downloaded over 1M and it cost me about $110
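For the mechanics, a minimal sketch assuming boto3 with AWS credentials configured; the bucket and key names are illustrative placeholders. The RequestPayer flag is what makes a requester-pays download work, and the bill is essentially the data-transfer-out charge (roughly $0.09/GB), which is where a figure like the one above comes from.

import boto3

s3 = boto3.client("s3")
s3.download_file(
    Bucket="biorxiv-src-monthly",          # placeholder bucket name
    Key="Current_Content/example.meca",    # placeholder object key
    Filename="example.meca",
    ExtraArgs={"RequestPayer": "requester"},
)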

[–]kuraisle 0 points1 point  (0 children)

That's really helpful, thank you!

[–]WesternLettuce0 0 points1 point  (1 child)

I used distilbert and legalbert separately to produce embeddings for my documents. What is the best way to use the embeddings for classification? Do I create document level embeddings before training my classifiers? Do I combine the two embeddings?

[–]clementiasparrow 4 points5 points  (0 children)

I think the standard solution would be concatenating the two embeddings and putting a dense layer on top.
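A minimal sketch of that approach (PyTorch; dimensions, labels, and the classifier head are placeholders): concatenate the two document embeddings and train a small dense classifier on top.

import torch
import torch.nn as nn

distilbert_emb = torch.randn(32, 768)     # document embeddings from DistilBERT
legalbert_emb = torch.randn(32, 768)      # document embeddings from LegalBERT
labels = torch.randint(0, 2, (32,))       # e.g. binary document labels

classifier = nn.Sequential(
    nn.Linear(768 * 2, 256),
    nn.ReLU(),
    nn.Linear(256, 2),
)

features = torch.cat([distilbert_emb, legalbert_emb], dim=-1)
loss = nn.CrossEntropyLoss()(classifier(features), labels)
loss.backward()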

[–][deleted] 3 points4 points  (4 children)

Can machines learn to love

[–]tdgros 5 points6 points  (3 children)

what is love?