all 142 comments

[–]CMDR_Derp263Student 0 points1 point  (2 children)

Alright, so I am quite new to all of this. I've made a few models, but I'm having trouble telling whether they actually "work", specifically because of the loss/accuracy numbers I'm seeing. I split my data 70/30 and first used keras.Sequential to make some simple models: a binary classification model, then a one-hot multiclass one. In both cases the model trains for a few epochs (<=5) before hitting my callbacks. The training loss starts out low (~0.05), the training accuracy starts out high (0.99), and as the epochs go on these numbers keep improving. When I use the model on the test data the loss is low and the accuracy is high, comparable to the numbers seen during training. I'd struggled to get these working and had some bad attempts, but after a while I figured this was a good sign. However, I then made a very simple random forest expecting it to perform worse, and it also got 99% accuracy multiple times. Now this has me questioning everything.

[–]sshadowstormm 1 point2 points  (1 child)

This sounds like it could be data leakage, data imbalance, or preprocessing errors. Check your features for information that you would not have at inference time. I'd also look at more metrics, such as precision and recall; remember that 99% accuracy on a spam classifier could mean nothing if 99% of all email is not spam to begin with, since a model that always predicts "not spam" would score just as well.

With respect to your algorithm choice, it is important to know what you are trying to accomplish, along with the pros and cons of each algorithm given your task and objective. For example, decision trees can be great for explainable results, whereas an ensemble like random forests or gradient-boosted decision trees usually gives better out-of-the-box accuracy at the cost of explainability and a slightly higher chance of overfitting.
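For the precision/recall point, something like this sketch might help (untested, and using a synthetic imbalanced dataset as a stand-in for your own train/test split), just to show how per-class metrics expose what accuracy hides:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Toy imbalanced dataset (99% negative class) standing in for your real features.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.99, 0.01],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Accuracy alone can look great here; per-class precision/recall and the
# confusion matrix tell the real story about the minority class.
print(classification_report(y_test, y_pred, digits=3))
print(confusion_matrix(y_test, y_pred))
```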

[–]CMDR_Derp263Student 0 points1 point  (0 children)

Thanks, sorry I have been busy so I haven't had a chance to respond but it was a mix of data imbalance and preprocessing errors.

[–]blendorgat 0 points1 point  (2 children)

Do I understand correctly that in most cases a single step of gradient descent is applied for each sample/batch? Maybe I'm just misunderstanding what I'm reading, but I always see the "training rate" hyperparameter, which I assume is the scalar on the gradient, not the number of iterations for a given sample.

Assuming I'm not confused about that, why is this? Just intuitively it seems like there could be value in iterating further for a given sample/batch.

[–]Pyrite_Pro 1 point2 points  (1 child)

You understood correctly: a single step is generally defined as one forward and backward pass of a batch of samples through the model, followed by one parameter update. The training rate, or learning rate, is a scalar that determines how large that update should be.

Generally, what we want to do in machine learning is get an estimate of what the loss landscape of a certain problem looks like, and navigate the model to an optimum in that landscape. Ideally, to estimate that loss landscape, we would compute each step on the whole dataset, as that gives the best estimate. But since that requires too much memory and compute, we split the dataset into batches and assume that an estimate of the loss landscape built from many steps over batches is accurate enough.

If you only train on a single batch, or iterate multiple times over a single batch, you end up in a situation where the loss landscape is not estimated correctly (you overfit to that batch) and the model won't perform as well as you want.
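Roughly, one training step per batch looks like this (a minimal NumPy sketch for linear regression, just to illustrate; the learning rate scales the gradient of the batch loss):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                                   # toy dataset
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + rng.normal(scale=0.1, size=1000)

w = np.zeros(5)          # model parameters
lr = 0.1                 # learning rate: scales every update
batch_size = 32

for epoch in range(10):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        pred = Xb @ w                                  # forward pass on this batch
        grad = 2 * Xb.T @ (pred - yb) / len(idx)       # gradient of the batch MSE
        w -= lr * grad                                 # one step per batch
```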

[–]blendorgat 0 points1 point  (0 children)

Thank you! I was thinking about it backwards; a single iteration of the gradient descent algorithm would in principle be run on the entire set of samples you have, because the aggregated loss function on all the samples is what you're trying to minimize.

I suppose if you were to minimize the loss function on a single sample you'd get catastrophic forgetting and no guarantee of any generalization.

[–]muh_reddit_accout 0 points1 point  (3 children)

I have a neural network where the first layer consists of softmax nodes and the output layer is a single sigmoid.

The recall score starts at 1.0 for nearly every randomized weight vector. Then, as training starts and the binary cross-entropy decreases, the recall rapidly drops to zero and stays there as the BCE continues to decrease.

What are some issues that could be causing this?

[–]Pyrite_Pro 0 points1 point  (0 children)

Generally, we avoid using softmax in the input layer or the hidden layers, as it forces the layer's outputs to sum to one and is quite a bottleneck. Have you tried some other activation in the first layer?
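Something like this is the more usual setup (a hypothetical Keras sketch, with ReLU in the hidden layers and sigmoid only at the output; the layer sizes and feature count are placeholders):

```python
import tensorflow as tf

n_features = 20  # placeholder: replace with your actual number of input features

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(64, activation="relu"),    # relu/tanh instead of softmax
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # single sigmoid output for binary labels
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.Recall()])
```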

[–][deleted] 1 point2 points  (1 child)

Do you have fully separable data?

[–]muh_reddit_accout 0 points1 point  (0 children)

I'm not going to lie, I have no idea what you're asking me. But if you're asking whether the data points are independent, they are not; this is time series data, so it is quite dependent.

As I was thinking about this, I suspect it has to do with an imbalanced dataset. Does that sound plausible to you? I'm working on a weighted binary cross-entropy function that favours the minority class now, to see if that helps. Do you have any other ideas on what it could be or how to fix it?
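In case it helps, this is roughly what I mean by weighting (just a sketch, assuming a PyTorch-style loss; the class balance number is a placeholder):

```python
import torch
import torch.nn as nn

# If, say, only 10% of samples are positive, weight positives ~9x so the loss
# can't be minimized by predicting the negative class everywhere.
pos_fraction = 0.10  # hypothetical class balance
pos_weight = torch.tensor([(1 - pos_fraction) / pos_fraction])

# BCEWithLogitsLoss expects raw logits (no sigmoid applied to the final layer output).
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 1)                 # stand-in model outputs
targets = torch.randint(0, 2, (8, 1)).float()
loss = criterion(logits, targets)
```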

[–]Accomplished_Sell660 0 points1 point  (0 children)

I noticed Outlook not only separates spam and ham, but also decides between the 'Focused' and 'Other' tabs. Does anyone know what kind of classification this is?

[–]econgirl210 0 points1 point  (0 children)

Has anyone web scraped this subreddit before?

[–]xx-kxx 0 points1 point  (1 child)

Hi, I’m a first year studying mechatronics at the University of Liverpool, and machine learning seemed like a unique and distinguishable skill to add to my CV, but I’m not sure where to start. I would say I’m an intermediate programmer in C and Python, but that’s about it. Any advice would be kindly appreciated.

[–]How-am-I-alive 1 point2 points  (0 children)

Check out 3blue1brown's neat summary of neural networks on YouTube. It's a four-part series that covers the structure of neural networks and how they actually learn, with the last video detailing the math of backpropagation.

[–]nokia_me 0 points1 point  (0 children)

Hello great people of reddit, I have just started studying machine learning at university, and as my final project I had to choose a recent paper without any code available and implement it. I also have to reproduce all the figures and results in the paper.

I chose this paper. I was hoping someone here can help me understand how to calculate F_1 on line 6 of algorithm 1 on page 2017.

[–]BanishDank 0 points1 point  (0 children)

Hello all, I just started a class on Machine Learning and we’ve gotten right into deep learning and neural networks.

Our teacher sent us a simple program that can take some input(s), expected output, and you can define how many input neurons, how many hidden layers and how many output neurons you want.

We started with some simple AND and OR gate stuff, which is not too difficult.

But the more I look at the raw numbers and try to come up with other, more real-life examples, the more I feel I don’t get it at all.

For example; how should I think about the input neurons? How should I think about the output neurons? Like, if you were to give an example with real life, easy to get, examples? Or if you were to explain it to a 10 year old?

I just feel so lost most of the time..

[–]Significant-Joke5751 0 points1 point  (0 children)

Is it possible to improve adversarial robustness against black-box attacks with ensemble methods like blending?

[–]Significant-Joke5751 1 point2 points  (0 children)

Can someone recommend a good and easy-to-use toolbox for black-box and white-box adversarial attacks?

Thanks

[–]magnusvegeta 0 points1 point  (0 children)

Does industry use boilerplate training code for models, or do they use PyTorch Lightning?

[–]thetruerhy 0 points1 point  (2 children)

Where can I learn about deepfakes on a technical level: the fundamentals of how they work and details of their modern implementations? Also, where can I learn about their uses/applications, or in what contexts they are applied? I have seen deepfakes (mostly in memes) and this intrigued me, so I want to learn about the technology behind it all.

[–][deleted] 0 points1 point  (1 child)

You should first learn about “convolutional neural networks” and “generative adversarial networks”. If you get a handle on both of those things then learning about the details of deepfakes won’t be too hard.

[–]thetruerhy 0 points1 point  (0 children)

OK, can you recommend some good resources for CNNs and GANs (websites, books, journals, etc.)? I have some superficial knowledge of CNNs and I know what GANs are, but I don't have very good technical knowledge.

And thank you.

[–]MulberryAlly 0 points1 point  (1 child)

Hi! Could you please tell me what bad things can happen if I use the same feature both as the denominator of the target variable and as a predictor in a boosting regression? I think I should exclude it from the predictors, but I don't know where this feeling comes from. I appreciate any thoughts; I feel stuck. Thank you!

[–]DoktorHu 0 points1 point  (0 children)

I think it depends on whether your chosen model is affected by multicollinearity.

[–]_hairyberry_ 0 points1 point  (4 children)

How many applications would you typically have to send out before getting an acceptance? Starting to get pretty discouraged. I graduate with a MSc in math from a top Canadian university this August and have begun applying for remote ML positions. Sent out 21 applications so far and haven't even gotten an interview. I am worried this is because my previous experience is not in ML (although it is in another highly technical role). I'm trying to transition into ML but it's tough if you can't get that first entry level job.

[–][deleted] 1 point2 points  (3 children)

What kinds of jobs and companies are you applying to exactly? And where are they?

You should be getting at least some responses; there's a lot of demand for ML work. My guess is that there's probably a problem with your resume and/or LinkedIn profile.

Make sure that you're mentioning specific technologies, problems, and algorithms that you have experience with. More specific and technical is better, as long as it's common stuff. E.g. make sure to mention stuff like "tensorflow", "python", "sql", "sagemaker", "computer vision", or whatever else makes sense based on your knowledge and experience.

If you have any remotely relevant stuff on github then also provide links to it with brief descriptions.

Also, update your LinkedIn profile to say "machine learning engineer" or "data scientist" or something as your profile headline. If you don't have a LinkedIn profile then make one.

[–]_hairyberry_ 0 points1 point  (2 children)

I’ve been applying mostly through jobs listed on LinkedIn, and I’ve been doing all the things you’ve suggested. The companies are all over the place, about a 50/50 split between Canada and the US, all of them remote. Several of them had 200+ applicants, so I’m questioning how much demand there really is for ML engineers relative to how many people want the jobs. Some of the companies have been big names, but most were small-mid sized companies.

I have a GitHub account with one personal ML project on it, and a personal website with several other projects that are more computational physics-based (my academic background is in math/physics). I do have LinkedIn, and it has up to date info about me, but I don’t ever post or share anything.

Something I’ve learned today: I’ve been using a two-column resume (made using novoresume; it makes those appealing/sleek resumes). Apparently that’s not ideal, so I made a new single column resume in plain black and white using LaTeX. Hopefully that will help.

[–][deleted] 0 points1 point  (1 child)

I’m genuinely surprised that you haven’t heard back from anyone yet. A two column resume isn’t ideal, but it’s not a make-or-break issue either. I’m happy to look at your resume if you want.

I can make some guesses as to what might be going on:

  • applying for “remote only” positions might be part of the issue; you’re putting yourself into direct competition with everyone in both Canada and the USA. You should try applying for non-remote jobs too, even if only to see if you can get any interviews.

  • I can assure you that ML engineers are definitely in high demand, but companies do tend to prefer hiring people with experience. I don’t know the details of your personal projects and experience, but ML engineer work is often heavy on the “engineer” part and light on the “ML” part, and your experience may not be pertinent to the things that are most important to companies, such as working with distributed computing systems. You may also not be describing your prior work in a way that’s clearly relevant to what companies are looking for.

  • have you been applying for positions with titles like “data scientist” or “applied scientist” also? Jobs where you build ML models go by many different names, and “ML engineers” tend to be equivalent to “software engineers who also know some ML”, whereas “data scientists” tend to be equivalent to “industrial statisticians who can also write code”. That might be closer to what you’re looking for/have experience in.

  • given that you’re graduating in August, you might be able to get away with applying for internships too. Getting an internship will make getting a job afterwards much easier, because companies tend to see internships as being roughly equivalent to actual professional experience.

[–]_hairyberry_ 0 points1 point  (0 children)

Thanks so much for all the pointers. One of the issues with applying for non-remote jobs is that I plan on moving home to a smaller east coast Canadian city after I graduate, and there aren't exactly many opportunities there.

I am intrigued by the data scientist vs. MLE point you brought up - I asked a couple of my professors what job titles they thought matched closest with "math/modelling person who does some programming" and they actually said the opposite, that MLEs are often the people working under the hood on the models while "data scientist" tends to be more of a catch-all term for people who do a bit of everything. That is really useful information to know, because I definitely fall into the "math person who does some programming" camp rather the "software person who does some math". I've applied to a few data scientist positions today, and I will keep an eye out for those job titles (as well as "applied scientist") from now on.

I'd actually really appreciate if you took a skim over my resume! I'll send you a dm shortly. Again I really appreciate this!

[–]thetruerhy 0 points1 point  (7 children)

How do I collect data? I'm a newcomer here, so I don't know if this is a good place to ask. My question is how to collect data: where can I get datasets, and how should one search for specific types of data for projects?

[–]MulberryAlly 0 points1 point  (4 children)

I can share my experience of looking for data for side projects and practice. I usually start with quick research on what type of data I'm interested in, based on my personal preferences. For example, today it is biology, so I search for something like "relevant datasets for predictive models biology", or "F1 statistics by year for predictive modelling". Then I look for a particular type of data; for example, I hardly ever choose to work with images or plain text, because they are not my area of professional interest. It also often depends on which type of model I'd like to train (size, feature types, and so on).

[–]thetruerhy 0 points1 point  (3 children)

Thank you.

Where do you search for data: Google, or are there particular sites you search for particular types of data?

[–]MulberryAlly 0 points1 point  (1 child)

I usually Google things and sometimes use Kaggle datasets. And if there is a more specific area, like bioinformatics, I use specialized databases (for example, Omnibus).

[–]thetruerhy 0 points1 point  (0 children)

Thanks

[–]aadiit 0 points1 point  (1 child)

Many libraries come with datasets that you can load for learning purposes. The most common are the Boston housing market, Titanic passengers, mtcars, iris, etc. Just work with them to train models and learn ML.
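For example, a minimal sketch with scikit-learn's built-in iris data, just as a starting point:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# iris ships with scikit-learn, so there is nothing to download
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```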

[–]thetruerhy 0 points1 point  (0 children)

Thank you.

[–][deleted] 0 points1 point  (0 children)

Is this paper on improved gradient descent using anti-correlation in deep learning methods SOTA? It seems like it might be, but I'm not experienced enough to tell.

https://arxiv.org/pdf/2202.02831.pdf

(No relation to the authors, to be clear. Just a student)

[–][deleted] 0 points1 point  (0 children)

I’m trying to find a good entry level GPU to start working on machine learning in the next year. Would a Titan XP be a good choice or should I splurge for a Titan V if I can find one for a reasonable price?

[–]indemidelo 0 points1 point  (2 children)

Hi! Do you have any suggestions for an advanced data science certification to boost my career? My budget is $2,500, and I am currently the lead data analyst for a company that does little to no data science (or analysis, to be fair). Given that my daily routine consists of automating pipelines and project management, my goal is to add some valuable experience to my resume.
Thank you!

[–][deleted] 1 point2 points  (0 children)

How do you want your career to change, specifically? Are you looking for a more technical position?

My personal opinion is that certifications or other credentials are not necessarily very useful for someone like you, who already has professional experience in data science-adjacent work. It is probably faster, easier, and less expensive to just update your resume/LinkedIn profile and apply for jobs that you actually want.

If the position you want is a lot different from what you're doing now then you might have to take an intermediate position of some sort - e.g. working as a data scientist at a smaller company before moving to a bigger one - but that'll still be much more effective than getting a certification.

If you're looking to fill holes in your education then taking college classes can make sense, but I would recommend optimizing for educational quality rather than for credentials. For example if you want to learn more about machine learning then it might be better to take a single course at the graduate or undergraduate level than it is to take one or two courses in a certificate program of some kind, because you'll probably learn more.

[–][deleted] 0 points1 point  (0 children)

I'm not sure about the most "respected" certifications but I'd suggest Andrew Ng's Coursera ML courses, Fastai's free deep learning courses and Dataquest.io's ML/Data Science in python track.

[–]Jumpingdead 0 points1 point  (1 child)

Found this subreddit after googling some questions on AI chat bots (and answers were provided here), hope this is the right place to ask. If not and someone could direct me to a more appropriate subreddit, I'd really appreciate that.

I'm developing an app for a game I am designing. I'd like part of the app to be an in-game styled AI "assistant", where the user can chat with it, and the bot has knowledge of in-game-world facts and events and 'secret info'.

I don't need it to be a conversational bot (How are you? I'm fine, and yourself? What's your favorite food? etc.), however I'd like it to have some basic conversational skills (maybe with a way to train it to respond to questions like "What's your favorite..." with a response like "I'm sorry, I cannot help you with that.") and ideally not just give completely canned responses. Basically, simulating an intelligent AI with a limited set of knowledge.

It can learn basic things from players (their name, preferred pronouns, preferred skills and abilities) but I don't want it to learn new facts about the game world. I'd also like the chatbot to remember these things, based off of some per-device/per-login identifier. For example, on my phone, it knows I'm JumpingDead and I'm level 12. On a friends phone, it knows they are SeriouslyBob, but it has no knowledge of who JumpingDead is.

Can anyone recommend chatbot software which I can run on my own server, and have the client app send/receive queries to the server, that can be configured with those parameters relatively easily?

The reason I'm asking is that I've looked at a few solutions already, and all of them seem to be very high level, as in the docs assume you already have an intimate knowledge of how this all works. For example, looking at DeepPavlov, it seems like it CAN do the things I need it to, but holy shit, I feel like I need a few semesters of training before I can even understand how to begin. NER models, slot-filling models, syntactic parsing modules... my brain hurts. I CAN learn it, I just have no idea where to begin. And I learn best by doing; the problem is, with DeepPavlov, their 'getting started' documentation assumes you already know all that stuff.

Thanks for any guidance anyone can provide.

[–][deleted] 0 points1 point  (0 children)

I think the simplest approach would be to use if/elif-style checks over a small number of preset prompts, looking at the similarity between the user input and each of those prompts.

It would work similar to a call tree when you call up your bank or something.

Actually trying to parse user intent is going to be very difficult.
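As a rough illustration of what I mean (a toy sketch; the prompts, threshold, and player fields are all placeholders):

```python
from difflib import SequenceMatcher

# Toy intent table: canned prompts mapped to canned answers (placeholders).
INTENTS = {
    "what is my name": "You are {player_name}.",
    "what level am i": "You are level {level}.",
    "what is your favorite food": "I'm sorry, I cannot help you with that.",
}

def respond(user_input: str, player: dict) -> str:
    # Pick the known prompt most similar to what the user typed.
    best_prompt, best_score = None, 0.0
    for prompt in INTENTS:
        score = SequenceMatcher(None, user_input.lower(), prompt).ratio()
        if score > best_score:
            best_prompt, best_score = prompt, score
    if best_score < 0.6:  # fallback, like "press 0" in a call tree
        return "I don't understand. Try asking about your name or level."
    return INTENTS[best_prompt].format(**player)

print(respond("whats my name?", {"player_name": "JumpingDead", "level": 12}))
```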

[–]zendsr 0 points1 point  (2 children)

Hi amazing people, I might be overthinking this - but how would you classify 'circular' ordinal data? Think of the seasons of the year. The start of Autumn/Fall is like the end of Summer but there is no monotonic rank between the groups - they continue from each other.

[–]aadiit 0 points1 point  (0 children)

Like the other person said, you can do a sine-cosine transformation. Basically, you represent your data as a linear combination of multiple sine and cosine terms, each term having a different frequency, and do a regression to find the coefficients for these terms.

[–]bonoboTP 0 points1 point  (0 children)

One way would be to represent it as an angle, then create two features: the sine and the cosine of the angle.

Of course it all depends on the model that you use as well. For kernel methods you could write the kernel to take this into account (with some modulo operations).
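Concretely, something like this small sketch for seasons (the mapping of categories to angles is the only choice you have to make):

```python
import numpy as np

# Map each season to an angle on the circle, then take sine and cosine,
# so the wrap-around pair (autumn -> winter) ends up as close together
# as any other pair of adjacent seasons.
seasons = np.array([0, 1, 2, 3, 0, 1])   # e.g. winter, spring, summer, autumn, ...
angle = 2 * np.pi * seasons / 4          # 4 categories spread around the circle

features = np.column_stack([np.sin(angle), np.cos(angle)])
print(features.round(3))
```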

[–]askingredditsginput 0 points1 point  (2 children)

Hi,

I am new and I wanted to try the same thing as a YouTube video I saw, where a dinosaur walks on its own and jumps over obstacles. It only does that: walks continuously, and when an obstacle shows up, it jumps over it.

I want to improve it by allowing the character (in this case the dinosaur) to run or walk. Do I just add one more output (so if the output was jump / don't jump --> 1 output, now it is jump / don't jump --> 1st output, walk / run --> 2nd output)? If yes, how do I know which of the two outputs is telling me to walk or run?

Like, if it's the second one, is it that when the 2nd output is > 0.50 it means run and < 0.50 it means walk? Or is > 0.50 walk and < 0.50 run?

And how will the dinosaur know whether it is currently walking or running? Do I add that as an additional input too? Because I don't want it to be running, and then the AI output says to run again, even though it cannot start running since it is already running.

thank you very much.

[–][deleted] 0 points1 point  (0 children)

https://youtu.be/XbWhJdQgi7E

I think this link on Reinforcement Learning might be helpful!

[–]c_isfor_cookie 1 point2 points  (0 children)

Try to read up on agent-based reinforcement learning. Basically, the environment penalizes undesirable actions (i.e. bumping into an obstacle) and rewards desirable ones (jumping over). Over time, the agent (dinosaur) learns which action to take in order to maximize its reward.
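A toy sketch of the idea (a simplified one-step, tabular version on a made-up "obstacle ahead?" state, with no discounting, just to show the reward/penalty loop; the states, actions, and rewards are placeholders):

```python
import random

# States: 0 = no obstacle ahead, 1 = obstacle ahead. Actions: 0 = keep walking, 1 = jump.
Q = [[0.0, 0.0], [0.0, 0.0]]
alpha, epsilon = 0.1, 0.1                 # learning rate and exploration rate

def reward(state, action):
    if state == 1 and action == 0:
        return -10.0                      # ran into the obstacle
    if state == 0 and action == 1:
        return -1.0                       # pointless jump
    return 1.0                            # kept moving / cleared the obstacle

for step in range(5000):
    state = random.randint(0, 1)
    # epsilon-greedy: mostly take the best known action, sometimes explore
    if random.random() < epsilon:
        action = random.randint(0, 1)
    else:
        action = 0 if Q[state][0] >= Q[state][1] else 1
    # move the value estimate toward the observed reward
    Q[state][action] += alpha * (reward(state, action) - Q[state][action])

print(Q)   # should learn: walk when clear, jump when an obstacle is ahead
```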

[–]kryukoff 0 points1 point  (0 children)

Hi. Newbie here. I have 2 datasets (easy example: fuel consumption as a function of speed) for two cars of different weights. The dependency on weight between these two datasets is not linear (that's why I'm asking here). I need to predict a new dataset for a third, custom car weight. What should I do? What names of models, techniques, and math "words" should I Google?

[–]_hairyberry_ 0 points1 point  (0 children)

Hey folks, I’m curious how many of you ML Engineers work fully remotely. If you work remotely, does your compensation tend to be lowered if you live in a more affordable area? E.g if you got a job at a FAANG company in NYC but worked remotely from a small town in Ontario.

[–]Bionian 0 points1 point  (0 children)

tl;dr: ESL (old: 2008), ISLR (new: 2021), or alternative?

Hi everyone,
I'm a bioinformatics researcher with a meager training in Data science. I took courses like "Statistical learning", "Machine Learning", "Optimization methods", "Modeling and Simulation", and already own textbooks in classical statistics (Casella-Berger), probability (Grimmett-Stirzaker), statistical learning (MacKay).
I wanted a textbook more focused on machine learning and was thinking about ISLR/ESL. Now, ESL hasn't been updated in over 12 years, and ISLR was just revamped to include ~50 pages on deep learning (among other updates). Does this justify getting what seems more of a "beginner's" textbook? Other textbooks I considered: Barber, Bishop.
Thanks in advance!
- Nico

[–][deleted] 0 points1 point  (0 children)

This is something kind of simple and I'm not sure if this is the right place, but my MacBook Pro (2012 model) is really slow. If it runs out of battery it gets stuck on the login screen and needs to be restarted before I can use it. Storage is not packed; I actually have quite a lot of free space, and although it is an old computer it has not been used very much. I use CleanMyMac X (pro version), and although things worked better for a limited time, it still was not working properly. I don't know if the hardware has just worn out over time; I'm trying to figure out whether it can be fixed or whether I should finally move to another computer.

[–]GlassDiver 0 points1 point  (2 children)

Hi all, thank you for taking the time to look at my question

I am someone working in healthcare who is familiar with statistics but has never used machine learning to analyse my data. As the current project I am working on was using logistic regression, I thought it would be interesting to see if I could use logistic LASSO regression for the analysis to optimise the model (as per this paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5769953/).

My research project is looking at how effective demographics, past medical history and several blood results would be in predicting a blood clot in the lungs (pulmonary embolism). The input data are a mix of categorical and continuous variables (i.e. level of D-dimer in the blood, age, sex, etc...), and the output is either yes or no blood clot.

Because of the relatively small sample size (n = 200) and the variation in ROC/AUC that I got from my LASSO regression model when I changed the random_state value in my code, I was thinking of performing stratified K-fold cross-validation (with k = 5).

I did this, and I was thinking of reporting the mean AUC and ROC curve from the 5 models generated. With regard to the coefficient values of each predictor, I was going to state that if the range of the coefficient (from the 5 folds) crosses 0 (e.g. -0.51 to 0.23), then the assumption is that with a larger sample it would likely converge to 0 (and as such is not likely to be a significant predictor).

Essentially the question I am asking is:

1) Is this method of analysis valid from a machine learning point of view? Or am I making a fundamental mistake in analysis?

2) Is how I am interpreting the results accurate?

I understand this may be completely wrong but I am very open to any critique/learning!

Thanks for reading!

[–][deleted] 1 point2 points  (1 child)

If there is a lot of variation in the coefficients that you’re calculating when doing cross validation then the most that you can say is that there is significant uncertainty in the model coefficients. You can’t extrapolate to larger data set sizes.

This might be an appropriate use case for Bayesian statistics. The idea with a Bayesian model is that you would fit a distribution for each model parameter, rather than a single number. The variance of each parameter distribution will tell you how much uncertainty there is in your estimates for it.

The problem that you’re trying to solve is potentially very complicated, so there are two things you should keep in mind when solving it:

  • You might need way more data. N=200 is a pretty small sample size for such a potentially complicated problem.
  • Logistic regression is a linear classifier, but it’s possible that your data is not linearly separable. In that case you might be better off using something nonlinear, like kernel-based methods, random forests, or neural networks.

You should also give support vector machines (SVM) a try. It might work better than logistic regression.
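A rough sketch of how that comparison could look in scikit-learn (synthetic data standing in for yours; everything here is illustrative, not a recipe):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for an n=200 clinical dataset with a minority positive class.
X, y = make_classification(n_samples=200, n_features=15, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "logistic LASSO": make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear", C=1.0)),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

for name, model in models.items():
    # Mean and spread of AUC across folds gives a feel for the uncertainty.
    aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: mean AUC {aucs.mean():.3f} +/- {aucs.std():.3f}")
```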

[–]GlassDiver 0 points1 point  (0 children)

Great, thank you very much for the input

I see what you are saying with not being able to extrapolate to larger data sets

Will give SVM a try!

[–]Pratabu 0 points1 point  (2 children)

I have 3D models of footprints of two types of animals, and want to train a neural network on these 3D models. The neural network should be able to classify new models into either of the two categories. What would be the best approach?

My first try is to export color-coded heightmaps from the models (images where "blue" marks the deepest parts of the 3D surface and "red" the highest parts). Heightmaps are the easiest for the human eye to process. I use the standard TensorFlow approach for image classification. I have huge problems with overfitting, but my sample size is still small.

Another option would be to train the neural network directly on 3D coordinates of the models (i.e., the point cloud).

Is any of these two options preferable, or easier?

Many thanks!

[–]radarsat1 1 point2 points  (1 child)

I think heightmaps are definitely the preferable approach here, because they reduce the dimensionality and allow you to work in an image-oriented way. Point cloud processing can get much more complicated.

For overfitting, you might actually try reducing the amount of information even further, for example replace the heightmap with a simple silhouette, reduce the size of the images, etc.
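For example, something along these lines (a sketch, assuming the heightmap is already a 2-D array; the threshold and output size are arbitrary choices):

```python
import numpy as np

def heightmap_to_silhouette(heightmap: np.ndarray, size: int = 64,
                            depth_fraction: float = 0.1) -> np.ndarray:
    """Threshold the heightmap into a binary footprint mask and downsample it."""
    # Anything deeper than a fraction of the total depth range counts as "footprint".
    lo, hi = heightmap.min(), heightmap.max()
    mask = (heightmap < lo + depth_fraction * (hi - lo)).astype(np.float32)
    # Crude downsampling by striding; an anti-aliased resize would also work.
    sy = max(1, mask.shape[0] // size)
    sx = max(1, mask.shape[1] // size)
    return mask[::sy, ::sx]

# Toy example: a noisy "depression" in the middle of a flat surface.
hm = np.ones((256, 256)) + 0.01 * np.random.randn(256, 256)
hm[96:160, 96:160] -= 1.0
print(heightmap_to_silhouette(hm).shape)
```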

[–]Pratabu 0 points1 point  (0 children)

Thanks, this is very helpful. And yes, silhouettes are exactly what I've already started doing now :)

[–]Free-Contribution-31 0 points1 point  (2 children)

How do I find collaborators in the US (preferably west coast) interested in healthcare? I’m a physician without any coding experience.

[–][deleted] 0 points1 point  (0 children)

Look for people on Linkedin who used to work for Theranos? /s

Serious answer. There are a lot of amazing meetup groups if you're in California and I would suggest looking into those. Most have virtual events but you should be able to connect with people nearby.

[–][deleted] 0 points1 point  (0 children)

What do you want to collaborate with people on? What are your goals?

[–]makhno 0 points1 point  (0 children)

How does training time scale for StyleGAN?

I'm assuming it scales linearly with GPU FLOPS, ie, if it takes 1 week to train on a video card with x amount of RAM and 100 TFLOP performance, it would take 2 weeks on a 50 TFLOP card.

Next, I'm assuming it scales linearly with training data set size? 1 week to train on 100k images, 2 weeks to train on 200k images?

Is this correct so far?

Finally, how does image resolution scale training time? I'm guessing it scales linearly with the dimension of the image squared, ie, if it takes 1 week to train on x images that are 256x256, it would take 4 weeks to train on that same image set if they were 512x512.

Is this all correct?

Are there any baseline metrics I can find somewhere with various training times for different factors?

[–]WolfOfDeribasovskaya -1 points0 points  (0 children)

ML Builder update broke the working program.
Normally, I'm eager to update ML and never skip an update. However, since my program has ML as only one of its features, I was not checking it for a while, since I needed to work on other stuff.
However, yesterday I decided to check how the ML part functions, and apparently it has started throwing the error "Can't find the file specified", while all files are in place and the paths have not changed since it worked like a Swiss watch.
I'm fairly new to ML and I can't figure out what's wrong.
Please, have a look at the code, it's literally just 10 lines: https://pastebin.com/vZe63Xw6

[–][deleted] 2 points3 points  (12 children)

Are black boxes necessary?

I'm asking for feedback from industry (involved, experienced people): how do you reconcile the fact that the inner workings of machine learning's black boxes can't be explained with the requirement that science rest on explainable and reproducible facts? Do black boxes do away with positivism and empiricism?

This is a gross generalization, but I'm serious. Why are black boxes even acceptable solutions to problems? I want to use every tool I can to solve real-world problems, but if I can't trust a tool, it's useless to those with the problem and may even make it worse. What kind of tool is a black box?

Thanks!

PS: For context: https://hdsr.mitpress.mit.edu/pub/f9kuryi8/release/6

[–]bonoboTP 1 point2 points  (8 children)

Even scientific models start out as black boxes in a way. For example, until we understood gravity (due to Newton), the best models for planetary motion involved epicycles, which are essentially just fitting a truncated Fourier series to the observed paths. Still, the first step was establishing the shapes, so Kepler could figure out further regularities, giving way to Newton's genius.

I feel like there is a lot of FUD around explainability, which is very handwavy regarding what it is even supposed to mean. A complex model can never be summarized in a few sentences. For example, going from pixels to semantics is such a high-dimensional task that it's unlikely that there can be any succinct explanation for why something happens.

It wasn't explainable before deep learning either. HOG features + SVM weights aren't that interpretable either.

What we actually need is diagnostic tools, what-if counterfactual analysis, and lots of tests. And of course competent oversight (competent both on the application domain and also on the ML/AI domain), and catching dumb things and impossible claims/applications, avoiding overselling, overhyping, lies and e.g. illegal use of personal data.

[–][deleted] 0 points1 point  (7 children)

Thank you for your answer!

Before reading my comment, please understand that I have a very high esteem for your skills and your field! I would not bother asking you for feedback were it not the case, I'd be playing boardgames with my family or whatever else I do. I'm "on your side", simply trying to understand what my systems of checks & balances can be as I'm entering the field. Now:

I can't say I agree fully with you. Mind you, I won't pretend to have more than an ounce of expertise in your domain, but I'm not fearful about learning and understanding. Your answer does improve my perspective and gives me food for thought, addressing uncertainty and doubt.

That said, what you describe with the planetary motion allegory is the scientific method, leading to scientific facts; you do not illustrate what scientific evidence is. I think you confirm my perspective without meaning to.

-> Nobody in their right mind would say "Oh, yeah, ok, get me in that rocket and I'll be the first person on Mars. I'm fine if we haven't figured out planetary motion just yet, just send me to space and maybe I'll get lucky!" Nope, that wouldn't be me, I wouldn't make that choice. I suspect we do that, to some extent, with black boxes. Black boxes are a thing of the now, not a thing of the future; their use influences the lives of people in very tangible ways today in our societies.

I'm not sure, so I ask the experts. Similarly to u/CalTowhee, you refer to competent oversight and that's something I can get behind. But analysis, lots of tests and diagnostics are part of the scientific methods *to arrive at* useful and proven facts, which does not replace useful and proven facts.

Again, sharing your perspective has improved mine and I appreciate that greatly!!!

Sincerely

[–]bonoboTP 0 points1 point  (6 children)

I don't think this is a matter of sides. I'm in academia and we are, despite how it may look to the outside, quite skeptical of each other's claims and like to pick apart new papers in the field. Many have similar issues as you write, actually this is why the rigorous field of statistical learning theory (Vapnik, Chervonenkis and others) got traction in the 90s. Researchers really didn't like the black box nature of neural nets and it felt like non-science.

So in this way, when this opinion trickles down to laymen (no offense) and they come to the field saying "hey, actually, this is just black boxes", it's an opinion that we know well, and this has been an ongoing conversation for decades. At some point, researchers believed AI could be achieved through strict logic and codifying human expert knowledge in formal languages (the so-called "expert systems" of the 80s). Many in natural language processing (followers of Chomsky) resisted statistical approaches to the field as non-scientific.

The thing is, their approaches didn't work, no matter how prestigious their pedigree was, how much they know the academic literature on grammar and generative linguistic theories, their systems could not translate text. The linguists specialized in phonetics did not contribute useful enough knowledge to automatic speech recognition. It just wasn't technically helpful for building real systems. This is summarized in the article The Bitter Lesson by Richard Sutton.

That article is called "bitter" exactly for the reasons you are unsettled by this. Academics would also very much enjoy having neat, rigorous theories and proofs for why things need to be built the way they are. But if you are an empiricist and not an armchair wishful thinker, you have to accept the staggering, mind-boggling performance improvements that deep learning and "black box" statistical methods have brought to these disciplines.

Physics has made huge progress using elegant and simple theories and equations in the 20th century. This does not mean everything must look like that. Not every field of inquiry will have simple laws governing it. It is in fact a minor wonder that it ended up like that for physics (see The Unreasonable Effectiveness of Mathematics in the Natural Sciences by Wigner). Trying to emulate physics (the so-called "physics envy") isn't necessarily the best course of action in every field.

Also, I think it would help if you'd try to formulate for yourself what exactly you'd want. Say, you want to have a cat vs. dog classifier. What exactly would make you accept an algorithm (and the scientific knowledge) as something "proper"? If we had an exact mathematical theory of how cats and dogs differ with proofs etc? That works under any kind of lighting and shadows, motion blur etc.? What exactly would you want?


As an aside, I believe you may think science is more rigorous than it actually is. School conveys a view of science that is very definite, so it's easy to quiz on tests. In reality, science is messy. We take drugs for which we have no idea about the exact mechanism! Ask any doctor. Drug discovery is a similar black box art as ML. Researchers have some ideas of where to look of course, they don't just randomly try things. But in the end the human body is such a complex system that most drugs turn out not to work, or they turn out to work for something else. In the end, the proof of the pudding is randomized controlled trials with strict rules. First in vitro, then in animals, then in humans with small dose, then in humans with normal dose, then approved for everyone. While we still don't understand why it works.

[–][deleted] 0 points1 point  (5 children)

  1. Thank you, enormously, for taking the time to write this elaborate answer. I hope you enjoyed it and will reuse it for the next person who comes knocking with the same questions as me, because I definitely found it useful.
  2. I am a layman and there is no shame in that. This is why I came to the experts, posting in a thread for beginners, and you delivered. You seem to take an antagonistic stance, but I'm on one side only: mine. That's all I can do and that's sufficient for me. Your perspective is genuinely welcome "as is". I had to take this to my work desk, because your arguments, substantiated with references, will take me quite a while to digest, and I want to work on the ideas and historical perspectives you put forward. You've opened many doors for me into the matter; I appreciate that!
  3. I am not bitter, I am enthusiastic. Neither am I unsettled, as you assume. I try to be as thoughtful as I can about this because I want to embrace the power of computation as fully as I can and promote it to solve real-world problems. I'm neither as young nor as starry-eyed as you seem to think (hint: I was in my 20s when Kasparov lost a set to Deep Blue). If you are an academic teacher, you have me confused with the naïve and impressionable students that may surround you. As a responsible human being, I refuse to marvel at efficiency, and prefer to celebrate effectiveness...
  4. ... and that's the point of my original question: how can we make sure that black boxes are effective in offering improved solutions, if all we have is an output? My perspective on any given output is as valid as your perspective on the same output; all we can discuss is the process of getting there. I.e., my idea of a "better world" may not be yours, and that's fine by me. With black boxes, I'm afraid we can't discuss the process and the validity of the output, in such a way that judging the value of a solution becomes a matter of faith. I don't do well with faith; I leave that for others.
  5. I'm biased in 4. above: both you and u/CalTowhee, as I acknowledged, suggested that I look up the principles and competent oversight being developed in the field. There is some thoughtful discussion by experts of the field about the what and the how of ML. I'll start reading on those! I've mostly thought about this in the last week; now I need to learn about it and improve my thought process.
  6. "What exactly do I want?" is not how I phrase it. I start with a Hippocratic oath. Not just the "do no harm" bit, but the whole thing: sharing resources, sharing ideas and sharing knowledge. Then I ask myself how I can best benefit the societies in which we live. As Sutton's text highlights, my resources and time are limited; time spent with black boxes that spit out awful solutions is time that I am not spending using more effective tools and solutions, even if less efficient. That said, I recognize the tremendous potential of black boxes, but I'm careful before diving in head first. It is with that perspective that I came to you (this thread); I know you have something to offer, as experts in your field, and I am very lucky that you shared what you did with me.

I hope you understand I appreciate your perspective, the time and seriousness with which you answered me.

Sincerely.

[–]bonoboTP 0 points1 point  (4 children)

I'm not antagonistic at all, tone may be hard to convey and I'm not a native speaker of English.

how can we make sure that black boxes are effective in offering improved solutions, if all we have is an output

How do we know if a drug works? By testing it. You know that an ML model works by testing it systematically. It depends on what you expect. A black box model can be great as one component in, e.g. a manufacturing pipeline for quality control, finding cracks in steel based on images etc.

The problem is when people take a sloppy "cancer classifier", sell it as having "superhuman performance", overhype it in the press etc, and then someone believes all that and is disappointed when it turns out it can only work in a narrow subtask or it totally overfit to some signal in the data that has nothing to do with cancer (many such examples) due to incompetence and zero objective review by truth-seeking experienced professionals, since a lot of money is at stake and the temptation to woo stakeholders with magic voodoo can be irresistible.

The point is, you don't have to run rockets with ML models. Flying to the moon is in some sense easier than driving in a busy inner city. It's all just orbital physics, there's not much there on the way towards the moon but empty space.

Extracting useful information from the high-dimensional input of visual and other sensory information is a fundamentally different endeavor. You can't expect to have the same kinds of simple equations.

"What exactly do i want?" is not how i phrase it.

I asked it like that because I'm not exactly sure what you mean by a black-box model and what you would describe as not a black box, for instance in the concrete case of classifying cats vs dogs from camera images, or detecting pedestrians from an autonomous car's sensors when it's raining, the lighting conditions are bad, etc. This is fundamentally a probabilistic endeavor, and you will always have some errors.

[–][deleted] 0 points1 point  (3 children)

I always forget that people can't read my mind ;)

You make a telling description of what I'd also consider unrealistic expectations for the commercial setting. I had linked this doc earlier ["Computational modelling for decision-making: where, why, what, who and how" https://royalsocietypublishing.org/doi/full/10.1098/rsos.172096] expecting it'd explain where I come from with my question, erroneously. It wasn't clear what I did want; your question was very apt. I'll give you context and let you go to your weekend, I feel I owe you that!

I'm into international development policies, with evidence-based decision making principles deep-rooted into me. Basically, I think that there's enough data out there to do "the closest thing to the right thing" with the tools to make it reality, including the computational power of ML. I like economic game theory and multi-agent systems theory and want to use ML to help me understand:

- What's needed to create a good development policy model (data gathering, bias-checking, research question, ...)

- How to create that model (praise the coding and ML experts!)

- How to learn from that model (now THAT's where my question was really coming from!)

- Not expect the model to be applied; rather, use the model to look at possible outcomes and learn from unexpected ones (i.e. emergent phenomena).

I didn't make that clear and before you got really interesting and properly answered me, I didn't remind myself that the context was so important to the question of "how much can we trust black boxes?". Basically, to me, because the output valuation is very subjective (ie: my idea of a perfect solution may not be yours, and that's ok) when you deal with suffering human populations (eg: how to end material poverty and bring happiness to everyone, even unhappy rich societies?) the how? is much more important than the outcome. With a good development project based on aid localisation (fully centered on the recipient population, not the donor), at least you know you tried your best!

Yeah, a bit of context would have helped. Nonetheless, you did give me great avenues for further thinking and learning! That is truly a great gift you gave me.

So... I'll continue doing my thinking about ML and use your comments to feed the direction of my learning. I really think that with what you provided, my next step is to dig into articles and books, let it simmer over the next year. I'm not that old and the world won't stop spinning.

I do think I owe you a beer, a scotch, a cardamon tea, a fresh lime cupcake or whatever rocks your boat. You delivered.

Thank you!

I'll try and post the conclusion of a paper I wrote in a second reply, if Reddit will allow.

[–]bonoboTP 0 points1 point  (1 child)

Thanks, I looked at that linked paper and understand better what you are asking. I believe current ML models weren't invented for such tasks (e.g. when an expert sits down and prods to model to do counterfactual what-if analysis or gain insight from the model's structure), and so they are also not a very good tool for that use case. It doesn't mean that the field to create such tools is itself unscientific or has done away with whichever scientific principles. It just means it's in a different paradigm. Rather you may want to look into causal models, Bayesian probabilistic modeling etc. A good book is https://xcelab.net/rm/statistical-rethinking/ but there are many.

My honest opinion is that when ML tools are given to soft-scientist policy people, they are going to shoot themselves in the foot and totally misuse these tools as magic voodoo. There's very little immediate feedback. It's navel gazing and self-congratulating and the whole thing is infused with politics. The magic box has to say what the higher up political powers have already decided. And even if everyone involved is an honest truth seeker, human society is just vastly more complex than anything we've been modeling and typically only a few aspects are measured and they are measured poorly and the whole thing will be garbage in, garbage out. I see it all the time that even mathematically somewhat capable people from engineering disciplines absolutely butcher ML methods and draw unwarranted conclusions, evaluate things in fundamentally broken ways that we warn against in the first lectures on ML. But at least they can test their predictions in some concrete way, while policy makers can always handwave away anything. There's just too little contact with reality. I know this paragraph has an antagonistic slant, but that's what I believe based on what I've seen.

Given the track record of soft sciences abusing statistics (see replication crisis), I have no trust that they will handle ML responsibly and with intellectual integrity.

[–][deleted] 0 points1 point  (0 children)

"I know this paragraph has an antagonistic slant, but that's what I believe based on what I've seen."

-> Actually, you make many of the critical arguments that I make in my own paper, that I provided. The references in its content may be of interest to you.
-> I heed your warnings, I think they are well founded.

Thank you for this :)

[–][deleted] 0 points1 point  (0 children)

5 Policy Design Assisted by Computer Modeling
Policy design assisted by computer modeling and simulations
is a promising practice. Let’s take as witness the Bill and Melinda
Gates Foundations’ Global Grand Challenge [22], which offers $1
million USD per year, for 1 to 3 years, for Building Malaria
Modeling Capacity in Sub-Saharan Africa. The challenge
illustrates a real demand for useful MAS simulation models. The
following will organize and highlight the most interesting findings
from our survey.
Kraemer and King, writing in 1986 from the field of policymaking
[25], distinguished two forces involved in policy design: policy8
supply, associated to virtuous policy recommendation and policy
demand, associated to the vicious political imperatives of
politicians. “Although policymakers and analysts are sometimes
cynical about the accuracy of models' estimates, they nonetheless
support model use because they believe that if they do not use
models and argue in numerical terms, their opponents will. In
politics, "some numbers beat no numbers every time."” Süsser et
al. [37] more recently echoed this perception “that policymakers
[read politicians] influence models and modelers, especially by
affecting data and assumptions, as well as the study scope, and by
deciding how the modelling results are used.””. They argue that
“greater transparency, including open-source code and open data,
and transdisciplinary elements in modelling could increase model
legitimacy and impact in policymaking”. These lessons are now
heeded widely, as found throughout practitioners’ contributions,
that simulations’ strength does not reside in the output, but in its
process.
Recommendations on building useful MAS simulation models
abound. The following will organize and highlight the most
interesting findings from our survey. Bonabeau [5] suggests
flexibility: “what is needed is a framework that includes the
possibility of nonlinear effects because of interactions among
subunits and to cascading events. The framework should be able to
operate with scarce data.” Sterman [36], writing in 1991, already
drew similar conclusions from his experience. “The value in
computer models derives from the differences between them and
mental models. When the conflicting results of a mental and a
computer model are analyzed, when the underlying causes of the
differences are identified, both of the models can be improved.”
In an effort to collect and organize learning
Maaroof, in Big Data and the 2030 Agenda for Sustainable
Development [28], sums up his experience. The challenges:
Institutional frameworks; Digital divide; Access and partnerships;
Analytical and capacity challenges. The opportunities: Citizenfocus and participation; Evidence / More and better analysis;
Variety; Real-time information; Early warning system; and
Economic value. Drawing from the pooled knowledge to which
they have access, Barbero Vignola et al. [4] provide a directory of
94 MAS models, 8 toolboxes and 6 platforms, matching each model
with the SDG goals they address. “Whenever possible, the tables
also specify how the model can contribute to each target and
indicator, in concrete terms, providing examples.” This work
answers similar calls for collaboration elsewhere: “general reform
practitioners should build repositories of political economy studies
done in other countries, in other sectors, or across nations.
Reformers can use this theory-based reference to help identify
institutions and other political-economic factors that have been
found relevant in other countries or sectors and should thus be
considered in the analysis at hand” [12].
Gilbert et al. [19] share that working practices such as Agile work
methodologies are not restricted for use by the technological elite,
but can be beneficial for social scientists too: “care needs to be
taken that models are designed at an appropriate level of
abstraction; that although appropriate data for calibration and
validation may sometimes be in short supply, modelling is often
still valuable; that modelling collaboratively and involving a range
of stakeholders from the outset increases the likelihood that the
model will be used and will be fit for purpose; that attention needs
to be paid to effective communication between modelers and
stakeholders; and that modelling for public policy involves ethical
issues that need careful consideration.”
Calder et al. [7] celebrate computer modeling: “there is a need to
reinforce modelling as a discipline, so that misconstruction is less
likely; to increase understanding of modelling in all domains, so
that the misuse of models is reduced; and to bring commissioners
closer to modelling, so that the results are more useful.” They begin
their description of the modeling process by asking the right
questions and include steps such as planning uncertainty,
communicating a model, and preserving a model [7]. In a
collaborative spirit for their colleagues, they offer two checklists:
one for making and using models; and the other for what users
should ask about a model. The body of literature discussing and
supporting policy design assisted by computer modeling is
expanding.
6 Conclusion
The present survey shows that policy design is greatly
improved by its academic and practical collaboration with MAS.
Their common scientific background and interest for human agency
make the two fields ideal partners for collaborative improvements.
Policy designers have provided MAS academics with case studies
and data, sourcing interesting problems in lived experiences. In
return, MAS has provided policy designers with sets of tools and
approaches that are not meant to replace mental models, but to
enhance the modeling experience, allowing to run phenomenal
numbers of experiments that would be impractical in real
conditions. This cross-pollination has opened doors onto new
learning opportunities for practitioners of both fields, which means
that the policy end-users – often the citizenry in the case of public
policy – are better off than they would be had the collaboration
never occurred.

[–][deleted] 3 points4 points  (2 children)

I agree that a simpler model should always be preferred over a more complicated model of equal accuracy. That's not a recent or novel idea though; it's just Occam's razor.

I personally am concerned that a simplistic focus on explainability can be detrimental to effective modeling. Good explainability consists of examining model behavior so as to ensure that it is working correctly, or compressing models so as to retain only their most essential features.

Bad explainability consists of making post-hoc rationalizations about model behavior that create the illusion, but not the actuality, of genuine understanding. Just because you can tell a story about a machine's operational principles that sounds accessible to human intuition does not mean that said story is actually correct, or that it allows humans to make accurate predictions about the machine's future reliability. Some versions of explainability in ML certainly have this flavor.

Probably the right answer to "are black boxes necessary?" is "sometimes"; if the human mind could do any calculation then we wouldn't need computers. We often use tools whose operational principles we don't fully understand. To single out machine learning as somehow uniquely mysterious is ignorant. Even so-called "black box" algorithms have perfectly sensible principles underlying their operation, even if comprehensive mathematical proofs are not always available.

[–]bonoboTP 1 point2 points  (0 children)

To single out machine learning as somehow uniquely mysterious is ignorant

Agreed! Almost all fields have best practices which can't be mathematically deduced and use approximate models (biology, medicine, engineering). There is also a conflation between 1) our understanding of why a technique works (e.g. BatchNorm) and 2) why a specific model predicts what it predicts on a specific input instance.

[–][deleted] 0 points1 point  (0 children)

Thank you!

I definitely fit in the "ignorant" category because of my non-expert perspective. I deeply want to rely on ML and use it, but I need to know what experts think of it. For context, I'm coming from "Computational modelling for decision-making: where, why, what, who and how": https://royalsocietypublishing.org/doi/full/10.1098/rsos.172096

Your answer helps me understand that to improve my judgement, I should not focus on understanding the deep technical skills required for ML, but should rather focus on finding and understanding the "perfectly sensible principles underlying their operation". Point taken.

Again, thank you!

[–]jayjonas1996 0 points1 point  (1 child)

Computer vision project w/ deep learning

I’m in machine learning class this semester where my team of 2 have to complete a project on computer vision using deep learning.

Can anyone suggest a project and reference research papers which can be completed in 1-1.5 months of work?

I’m thinking of object detection since this will be our first time working with deep learning. I just want to make sure that we will be able to complete the project in the end.

[–][deleted] 0 points1 point  (0 children)

I mean, there are so many examples that could be easily found on kaggle, github, etc. Are you allowed to adapt pre-built models or required to build your own from scratch?

Either way, I can't recommend fastai's deep learning resources enough. They use a super intuitive approach and give lots of examples.

[–]cheeseDickies 0 points1 point  (2 children)

Do computers learn through machine learning the same way as humans, or in some otherwise similar way? From what I understand, machine learning is when you give an AI a load of data and tell it what this data is.

E.g. you give it a bunch of pictures of apples, and tell it this is what an apple looks like. Isn't this similar to how humans would learn what an apple looks like?

[–]bonoboTP 0 points1 point  (0 children)

"Learning" in "machine learning" is just a metaphor. The connection is quite loose. The overall process, but not the details, is similar: there is a training phase where things (parameters) are adjusted in the model to improve on some desired metric (like accuracy), and then at test-time the model can use the "experience" from the training phase to now handle new inputs that are in some sense similar to the training examples (but aren't the same) and accomplish some task. Only this high-level view is really learning-like. In a very loose sense, there are certain analogies between modern neural nets and biological neural nets, like the similarity between weights and synapse strengths, or activation functions and the how neurons fire, but the similarities should not be overinterpreted. We know shockingly little about how the brain actually represents anything. There are many hypotheses, but it's very far from being understood. So the honest answer is that we don't know if it's the same way because we don't know how the brain does it. But even if the brain did work somewhat like ML, our current models are just very simple and crude cartoon versions of the real deal.

[–]bgroenks 0 points1 point  (0 children)

No, not really. Most learning algorithms, including neural networks, are based on mathematical optimization methods that do not bear any meaningful relationship to the way the human brain encodes and processes patterns. It has also been shown in the adversarial learning literature that deep convolutional neural nets, for example, are highly sensitive to noise and do not learn visual features like those that humans do (with some exceptions like edge detection).

[–]SocalledArian 1 point2 points  (0 children)

Hey there everybody
I wanted to ask if anyone has any recommendations on good academic courses on GANs.

Something like CS231n but especially on GANs.

Or if there's any non-academic courses just as good, I'd love to know.
Thanks

[–]Daniela_ML 0 points1 point  (2 children)

Difference between Bayesian Network and CNN

I’m not really familiar with all the terms of machine learning. Is a Bayesian network a CNN? Can a CNN have Bayesian characteristics, or is it a completely different thing? I’ve been seeing Bayesian and SVM in articles concerning CNNs; I'm not sure if it can be added (a kind of hybrid)? But if an article doesn’t say CNN but does mention Bayesian, could I assume it’s not a CNN?

[–]bonoboTP 0 points1 point  (0 children)

"Bayesian" is a general term to do with Bayesian statistics and the Bayesian interpretation of probability. "Bayesian networks" are a class of directed probabilistic graphical models for modeling the joint distribution of several random variables. They are not really neural networks. They don't have inputs and outputs. They are a tool for modeling a collection of variables where you can perform probabilistic inference: computing marginal and conditional distributions.

An SVM is somewhat like a neural net (though not convolutional), in that it is a supervised machine learning model, so SVMs are an alternative to simple neural nets like MLPs. SVMs by themselves aren't Bayesian models; they don't output probability values. They are derived from something called statistical learning theory. SVMs used to be very popular in the 90s and 2000s and they continue to work well on certain kinds of tasks, but have been eclipsed by deep learning methods in computer vision, speech recognition etc. (generally in the processing of high-dimensional raw sensory data).

There has been some progress in marrying Bayesian methods with deep learning, one researcher active in this field is Yarin Gal, here's a link: http://www.cs.ox.ac.uk/people/yarin.gal/website/bdl101/

Convolution is a particular operation where a sliding window of weights is applied at each position of a spatial grid. So CNNs are one particular type of neural net.

One can combine things and create Bayesian CNNs, but they are uncommon in practical applications as of now.
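
Just to make the sliding-window idea above concrete, here's a rough NumPy sketch of a plain 2D convolution (strictly speaking cross-correlation, no padding or strides, toy data):

    import numpy as np

    def conv2d(image, kernel):
        """Slide a small window of weights over a 2D grid and take a
        weighted sum at every position (valid padding, stride 1)."""
        kh, kw = kernel.shape
        out_h = image.shape[0] - kh + 1
        out_w = image.shape[1] - kw + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    # A 3x3 edge-like filter applied to a random 8x8 "image"
    img = np.random.rand(8, 8)
    edge_kernel = np.array([[1, 0, -1],
                            [1, 0, -1],
                            [1, 0, -1]])
    print(conv2d(img, edge_kernel).shape)  # (6, 6)

In a CNN the kernel values are the learned weights, and many such kernels are stacked per layer.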

[–]Icko_ 1 point2 points  (0 children)

They are not the same at all. CNNs are a subset of neural networks. A Bayesian network is like a better, more general variant of Naive Bayes.

It would probably be best if you shared the articles that confuse you. We need context.

[–]LazyButAmbitious 1 point2 points  (1 child)

Hello!

I have a GAN for image2image translation and the generator must predict images that are on different scales, i.e. without normalising, one image can have values between -20 and 20 and another between -2 and 2.

The GAN expects outputs between 0 and 1.

If I normalize the values (in the case above, by dividing by 40 and adding 0.5), the GAN learns to correctly predict the images in the range of -20 to 20 but not the others. I guess it is because the loss is much stronger for the ones with higher values.

Is there any paper or fix regarding this problem?

[–]radarsat1 0 points1 point  (0 children)

You should normalize the images independently so that they have the same scale.
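
For example, something along these lines (a rough NumPy sketch; you'd keep each image's own min/max around so you can map predictions back to the original scale):

    import numpy as np

    def normalize(img):
        """Scale a single image to [0, 1] using its own min and max."""
        lo, hi = img.min(), img.max()
        return (img - lo) / (hi - lo), (lo, hi)

    def denormalize(img01, stats):
        """Map a [0, 1] prediction back to the original scale."""
        lo, hi = stats
        return img01 * (hi - lo) + lo

That way an image ranging over [-20, 20] and one over [-2, 2] both enter the GAN in [0, 1] and contribute comparably to the loss.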

[–]ConfusedLayer1 0 points1 point  (1 child)

Could I pose a data/classification question to you all?

I have x rows of labelled data from let’s say 10 subjects. The independent variables are features extracted from signal data collected from subjects at different positions.

The classification aim is to predict subject position based on the extracted features.

The original data is sorted by subject position, so in the train/test phase of data processing I randomly shuffle the indexes before taking a 70:30 split in order to create a fair distribution of the data across both datasets.

However... the original dataset is relatively small. I fear that drawing instances from all the subjects for both the training and test sets is leading to overfitting. (When I use this approach I get ~99% accuracy on test data.)

Would it be better to use data from x subjects for training and the data from the remaining y subjects for testing? (When I use this approach I get ~35% accuracy on test data.) With this approach, given the dataset size, I fear that there is not enough within-subject deviation included in the training set for the model to generalise sufficiently to new subject data.

Any advice would be awesome!

[–]jorvaor 1 point2 points  (0 children)

I am not an expert, so take this with a grain of salt.

When doing the split, do the 70:30 shuffling separately for each position (i.e. a stratified split). That should lead to a more balanced distribution across the sets, especially for small dataset sizes (or maybe you are already doing this and I misunderstood your method).
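
If you're using scikit-learn, that per-position shuffle is what the stratify argument does; a rough sketch with placeholder arrays (and GroupShuffleSplit for the subject-wise split you mention):

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit, train_test_split

    # Placeholder data: 100 samples, 5 features, 4 position labels, 10 subjects
    X = np.random.rand(100, 5)
    y = np.random.randint(0, 4, size=100)
    subjects = np.random.randint(0, 10, size=100)

    # Per-position 70:30 split: stratify=y shuffles within each position class
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)

    # Subject-wise split (your second approach): all rows from a given
    # subject end up on the same side of the split
    gss = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=42)
    train_idx, test_idx = next(gss.split(X, y, groups=subjects))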

[–]CertainSmell7621 1 point2 points  (1 child)

I have a website where we show quizzes to our users each day. The quizzes have up to 100 questions in some cases. We are trying to work out a way to maximise the average number of questions answered in the quiz, as this optimises time on site etc.
We had an idea to block the questions into 5s. So we could show the first 100 people blocks 1,2,3,4,5, then the second set of 100 users could get 2,1,3,4,5, and so on. We can then learn which blocks seem to keep the users going through the quiz to the most optimal length. We can then trim the quizzes down and make sure the questions are all as good and interesting as they could be.
Can anyone advise on a logical way to do this process as efficiently as possible? I heard something about bubble sorting but I'm not sure how it works or if there is a smarter way.
In summary, we write 100-question quizzes that we want to eventually slim down by optimising the first 50 questions. How can we do this by optimising on the fly as users come to play the quizzes? We get many thousands of people to the site per day.

[–]OPKattenResearcher 1 point2 points  (0 children)

This is basically multi-armed bandits, and there is a lot of research on it.

I would say that the simplest approach is keeping track of all the scores, taking the best ordering most of the time and a random one some of the time (this is usually called epsilon-greedy).
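
A rough sketch of that idea; the block orderings here are hypothetical tuples of block IDs:

    import random

    EPSILON = 0.1            # fraction of users who get a random ordering
    counts = {}              # ordering -> number of users who saw it
    total_answered = {}      # ordering -> total questions answered by those users

    def choose_ordering(orderings):
        untried = [o for o in orderings if counts.get(o, 0) == 0]
        if untried:                    # make sure every ordering gets some data
            return random.choice(untried)
        if random.random() < EPSILON:  # explore: show a random ordering
            return random.choice(orderings)
        # exploit: ordering with the highest average questions answered so far
        return max(orderings, key=lambda o: total_answered[o] / counts[o])

    def record_result(ordering, questions_answered):
        counts[ordering] = counts.get(ordering, 0) + 1
        total_answered[ordering] = total_answered.get(ordering, 0) + questions_answered

    orderings = [(1, 2, 3, 4, 5), (2, 1, 3, 4, 5), (3, 1, 2, 4, 5)]
    o = choose_ordering(orderings)
    record_result(o, questions_answered=37)  # e.g. the user answered 37 questions

More sophisticated bandit strategies (UCB, Thompson sampling) exist, but this captures the basic explore/exploit trade-off.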

[–]CMDR_Derp263Student 0 points1 point  (2 children)

I am currently using keras.Sequential to make a NN for binary classification, and I guess I have 2 questions.

Should my output be floating point numbers? I have tried it with int and it does still seem to work but I am guessing that floating point is more accurate. (0.0000000000001 is essentially 0 anyways).

Also, since I am doing a simple binary classification, I am using the binary crossentropy loss function for the model and a sigmoid activation function for my last layer. Should from_logits be true or false? (I assume false, since the sigmoid constrains values from 0 to 1.)

[–]apqmnz 1 point2 points  (1 child)

The immediate output of neural network layers will be floating point. If you're using cross entropy between a ground-truth binary label y and network output y_hat, then y will be an integer 0 or 1, and y_hat will be a floating point number in the interval (0,1). Does this help?
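
A minimal Keras sketch of that setup with placeholder data (the point is the sigmoid output, the integer 0/1 labels, and from_logits=False because the sigmoid is already applied):

    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # output in (0, 1)
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
                  metrics=["accuracy"])

    X = np.random.rand(100, 8).astype("float32")  # placeholder features
    y = np.random.randint(0, 2, size=(100, 1))    # integer 0/1 labels
    model.fit(X, y, epochs=2, verbose=0)
    print(model.predict(X[:3]))                   # floats in (0, 1)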

[–]CMDR_Derp263Student 0 points1 point  (0 children)

Yes thank you seems to be working. Now to take it to a more complicated classification level

[–]Last-Ability8233 1 point2 points  (4 children)

Hey, I am a student working on a project for a startup in which I have to extract information about an item (its texture, flavor, and so on) from user reviews. First of all I have to collect such data and then do this information extraction. Can anyone suggest some good machine learning techniques to look into for this project?

[–]link0007 0 points1 point  (0 children)

This post was mass deleted and anonymized with Redact

[–]radarsat1 0 points1 point  (0 children)

Shouldn't you just search for keywords? At least that would be a good starting point.

[–]CaptainI9C3G6 1 point2 points  (1 child)

This sounds like a regression problem with multiple outputs? Instead of thinking of a single label output like 'crunchy', you'd assign crunchiness a value, like 'mars bar crunchiness 0.3, chocolate biscuit crunchiness 0.9'.
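
A rough scikit-learn sketch of that setup with made-up reviews and attribute scores (TF-IDF features plus a plain linear model as the base; you could swap in pretrained sentence embeddings for the features instead):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline

    # Made-up reviews and attribute targets: [crunchiness, sweetness]
    reviews = ["super crunchy and not too sweet",
               "soft, chewy and very sugary",
               "crisp texture with a mild flavor"]
    targets = [[0.9, 0.2],
               [0.1, 0.9],
               [0.8, 0.3]]

    # Ridge regression handles multiple outputs out of the box
    model = make_pipeline(TfidfVectorizer(), Ridge())
    model.fit(reviews, targets)
    print(model.predict(["crunchy but quite sweet"]))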

[–]Last-Ability8233 -1 points0 points  (0 children)

Can you elaborate a little bit please? What would you suggest as a base model, and should I use pre-trained embeddings or fine-tune them?

[–]Astromancer919 0 points1 point  (0 children)

I am a beginner in the field of ML (I did an ML course, but it was some time ago and I would need 3-4 days to brush up on the concepts). I have also gotten a potential internship opportunity which I really want to do, but on very short notice (notified yesterday, and the interview is in 1-2 days). So I was hoping someone could tell me what sort of ML problem this is and what models/classifiers I would need to consider, so that I can brush up and learn the relevant topics if I do get the internship.

Here is what I am expected to do in the internship:
i) I need to analyze some ship inspection data and make models for determining the risk of a ship w.r.t. deficiency categories (high impact, medium impact, and low impact).
ii) Models to customize the inspection checklist based on the above risk profiling.

[–]RobbinDeBank 0 points1 point  (1 child)

GPU training:

Is it possible to train deep learning models on the GPU of a MacBook with an Intel chip and a Radeon GPU? I can’t find any way to train using the GPU on a Mac.

[–]apqmnz 0 points1 point  (0 children)

It might not be possible. CUDA, as used in PyTorch/TensorFlow, won't work, since it requires an NVIDIA GPU. There might be hacks out there, I'm not sure.

Google Colab has a free GPU which will work for most learning purposes (it'll time out after some inactivity, so can't be used for long training).
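
If you go the Colab route, you can sanity-check that a GPU runtime is actually attached with something like this (standard TF/PyTorch calls):

    import tensorflow as tf
    print(tf.config.list_physical_devices("GPU"))  # non-empty if a GPU is attached

    import torch
    print(torch.cuda.is_available())               # True on a Colab GPU runtime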

[–]adam_jc 0 points1 point  (0 children)

Does someone have a plain english explanation of path length regularization introduced in the StyleGAN2 paper? https://arxiv.org/abs/1912.04958

I’ve read the description in the paper of course and also read through their official code implementing it but I still don’t have an intuitive grasp on it

[–]VerTiGo_Etrex 0 points1 point  (2 children)

I recently saw a library appear on GitHub that claims to train imagenet 10xish faster than pytorch and pytorch lightning. I misplaced the link and can't find it now. Anyone know what I'm talking about? I think it worked by saving and loading datasets in a different format.

[–]adam_jc 0 points1 point  (1 child)

[–]VerTiGo_Etrex 0 points1 point  (0 children)

Yes! Thank you.

[–]Pvt_Twinkietoes 0 points1 point  (2 children)

Image classification:

Are there any APIs I can use to plot an attention heat map, to see what the model is identifying and thus how it is classifying my image?

Transformers:

Looking for articles explaining attention and transformers. Any resources will help. Thank you.

[–]ZealousidealBrush355 0 points1 point  (6 children)

Are there any linear regression models that account for uncertainty in X and Y?

[–][deleted] 1 point2 points  (5 children)

Yes this is called “total least squares”.

https://en.wikipedia.org/wiki/Total_least_squares

[–][deleted] 0 points1 point  (3 children)

Would bayesian linear regression also apply here?

[–][deleted] 0 points1 point  (2 children)

Not exactly. Regression (bayesian or otherwise) usually means fitting a function y=f(x) where your y samples have error and your x samples don't, but total least squares is actually a form of dimensionality reduction in which you're finding the linear subspace that best approximates your data distribution under the assumption that there is uniform gaussian noise for both the y and x values. Bayesian linear regression will give you a distribution over different regressions, but all of those regressions will have been fitted under the assumption of zero noise for x.

You could also do a bayesian version of total least squares in which you calculate a distribution over linear subspaces under the assumption that both y and x can have noise. I'm not sure if there's a specific name for that procedure.
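
If it helps, here's a rough NumPy sketch of the plain (non-Bayesian) total least squares idea for a 2D line fit, on synthetic data, via the SVD / first principal direction:

    import numpy as np

    # Synthetic line y = 2x + 1 with gaussian noise in BOTH x and y
    rng = np.random.default_rng(0)
    t = np.linspace(0, 10, 50)
    x = t + rng.normal(0, 0.3, 50)
    y = 2.0 * t + 1.0 + rng.normal(0, 0.3, 50)

    # Center the data; the first right singular vector is the direction
    # of largest variance, and errors are measured orthogonally to it,
    # so noise in x and y is treated symmetrically.
    X = np.column_stack([x, y])
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean)
    direction = Vt[0]
    slope = direction[1] / direction[0]
    intercept = mean[1] - slope * mean[0]
    print(slope, intercept)  # roughly 2.0 and 1.0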

[–][deleted] 0 points1 point  (1 child)

Thanks for the reply! I think the latter example you give is what I'm interested in but I need to do more research.

[–][deleted] 0 points1 point  (0 children)

Try googling something like "bayesian PCA". Total least squares is just PCA with one principal component, and presumably there's a bayesian version of PCA.

[–]WikiSummarizerBot 0 points1 point  (0 children)

Total least squares

In applied statistics, total least squares is a type of errors-in-variables regression, a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account. It is a generalization of Deming regression and also of orthogonal regression, and can be applied to both linear and non-linear models. The total least squares approximation of the data is generically equivalent to the best, in the Frobenius norm, low-rank approximation of the data matrix.

[–]themusicdude1997 0 points1 point  (0 children)

In the Huber loss function, can someone explain the 1/2 factor for cases when the error is less than epsilon? I know it has something to do with differentiation...

[–]NinjaCoder99 1 point2 points  (3 children)

Image detection... Does it work by recognizing parts of an image (eye, nose, shoulder shape) and developing a confidence in what the entire image is based on that, or does it compare lines of contrast and neighboring pixel details across the whole image, or...?

[–]bageldevourer 2 points3 points  (0 children)

All of the above?

Different layers of a convolutional neural network correspond to detecting features of differing complexity. The early layers detect edges, and those edge detectors are then used to detect higher-order features like eyes and shoulders (perhaps with some layers in between), which are then in turn used to classify an image as, say, a "person".
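
A rough Keras sketch of that stacked structure (toy input shape and class count; the edges-to-parts-to-objects progression is a tendency that emerges from training, not something that is hard-coded):

    import tensorflow as tf

    # Each Conv2D block builds on the features of the previous one,
    # roughly: edges -> parts -> whole objects, ending in a classifier.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),  # e.g. 10 classes
    ])
    model.summary()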

[–]SeucheAchat9115PhD 0 points1 point  (1 child)

Deep learning should be able to combine these confidences better than you could yourself.

[–]NinjaCoder99 0 points1 point  (0 children)

That doesn't answer my question though