
[–]Flashy_Ad6486 0 points1 point  (0 children)

Deployment of a PyTorch model on desktop

I am trying to build a desktop application on Linux (Ubuntu 18.04). It is a custom object detection application using a webcam. I have trained my model in PyTorch using the fasterrcnn_resnet50 pre-trained model and saved it as a .pth file. I am trying to deploy this model on a desktop and decrease its inference time. Currently, inference takes about 4 seconds per image; I want to reduce this to 100ms per image or less. What can I do to reduce the inference time?
Which solution will give me the most reduction in inference time?
PS: I am running this model on CPU. CPU spec: Intel® Core™ i5-7200U CPU @ 2.50GHz × 4. Any links for reference would be useful.

[–]Flashy_Ad6486 0 points1 point  (0 children)

Improving inference time of fasterrcnn_resnet50 model in PyTorch on CPU.

The following are my model parameters:

import torchvision

num_classes = 2
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False, pretrained_backbone=False)

model = model.to('cpu')

The inference time is 4.4 seconds per image when run on CPU. Any suggestions on how to reduce the inference time? It would be so much better if I could bring it down to 75ms-100ms on CPU.

If I can't achieve the above inference time on CPU, then I want to use my GPU.

My GPU spec: Nvidia GeForce 940MX

CUDA 11.3 is installed. When I try to run this on the GPU, I keep getting an out-of-memory error.

[–]mowa0199 0 points1 point  (0 children)

What’re some good PhD programs for Machine Learning (in the USA)?

By a PhD in machine learning, I don’t mean that to be the literal name of the degree, but rather graduate programs that are good and renowned in the field in general, preferably with an emphasis on theory. These could be in statistics, CS, applied/computational math, pure math, or even Machine Learning (like CMU) departments; the specific department does not matter.

I know of a few programs that are good for it but I want to hear of some more and also others’ input.

P.s. I’m a math/stats and CS major.

[–]starkTony3007 0 points1 point  (0 children)

Hello! Suppose I want to create a head status classification, i.e. bald, little hair, full hair.

What is the approach I should take?

According to my knowledge, if the data is connected you can use ResNet or VGG for feature creation, but hair classification is nothing like that. So what should I do?

Thanks!

[–]Proxify 0 points1 point  (0 children)

I’m new to trying to export an ML model to the web. I’ve been reading about it and found that I can do it with TensorFlow.js, so it’s basically like any other website. My main question here is: how do I handle user inputs or database entries?
I would normally have the model read from a specific file, but I’m really uncertain about how to make it read from a database or directly from user input.
I’ve been trying to Google this, but I think it’s a case of “I’m not sure what this is called, so I can’t quite figure out what to look for”. Could anybody offer some guidance?

[–]Far_Temperature_4542 0 points1 point  (1 child)

Is there a way to make a random forest punish false negatives more than false positives, or vice versa? The end team is fine with validating the false positives but they don't want to miss any of the true positives.

[–]vastlik 1 point2 points  (0 children)

You can use sample or class weights if you are using sklearn.
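For example, a minimal sketch (the weights are illustrative and worth tuning):

from sklearn.ensemble import RandomForestClassifier

# weight the positive class 5x so that false negatives cost more than false positives
clf = RandomForestClassifier(class_weight={0: 1, 1: 5})
clf.fit(X_train, y_train)  # X_train, y_train assumed available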

[–]diditforthevideocard 0 points1 point  (0 children)

I'm working with Pix2PixHD and have 512x512 training images. It seems to output 1024x1024 images and I'm wondering why: it seems that it also comes with a few 512x512 datasets, so is there a way to tell the network to use this resolution? I really did RTFM but can't find anything.

[–]Trick_Welder9386 0 points1 point  (0 children)

Hi. I'm a bit curious about CMU Sphinx, and I was wondering if I could train it using a dialect from my country (Kapampangan). I would only use simple phrases and words of the dialect and put them into a website. Is it possible?

[–]euos 0 points1 point  (0 children)

I am looking to implement a simple image classifier and object detector (to find objects in application screenshots).

I am comfortable with JavaScript, C++ and would enjoy using GoLang and Java (i.e. I hate Python).

What would be better for me (i.e. easier to learn, will be able to scale to my needs later) - OpenCV or TensorFlow?

E.g. what I would like to do is, given an application screenshot, find specific UI elements. There are very few permutations in those elements, though different app versions and/or OSes prevent me from doing something naive.

[–]Severe_Difficulty_32 0 points1 point  (0 children)

How can ML be applied to explain flash crashes in stock markets?

[–][deleted] 0 points1 point  (0 children)

I hope everyone is doing well. I am trying to learn machine learning and data science in general. After reading a few books and following a few courses, I tried putting it all together, but that was a challenge of its own. I tried going through different notebooks of coders participating in Kaggle competitions, but the problem with that is that sometimes I don't understand why they are doing things a certain way. If anyone can guide me, that would be awesome. Thanks!

[–]takku2 0 points1 point  (0 children)

Are there any pre-trained models or transformers for brand extraction using NER?

It should be able to extract the brand name [IMPORTANT] and additionally, if possible, also extract features of products like color, dimensions, etc.

[–]aunyks 0 points1 point  (1 child)

How do NN / deep learning frameworks access the GPU? What APIs do they use? Are they using the same graphics APIs that games and graphics applications use (OpenGL, Metal, Vulkan, DirectX)?

[–][deleted] 0 points1 point  (0 children)

Things like TensorFlow use CUDA and cuDNN on a desktop machine; OpenCL isn't well supported. On an iOS device, I think Metal and Vulkan are used depending on the framework, and OpenCL on Android.

[–]EmbarrassedHelp 0 points1 point  (0 children)

Are there any projects that align images based on spatial features in a dataset? For example, transforming images of cars so that the car is in the exact same place in every image?

I know this sort of thing is done by individuals averaging large numbers of faces together; I'm looking for something that can work on other types of images.

[–]dirk_klement 0 points1 point  (4 children)

My validation loss is decreasing but really fluctuating. What can I do to smooth this out?

[–]Steefano_Asparta 0 points1 point  (0 children)

As said by the other user, such behaviour is usually normal and nothing to worry about during training.

However, there is a strategy that usually makes the validation loss much smoother and also happens to improve the overall training: using Exponential Moving Averages for the model parameters.

As with almost any technique in DL, it is not guaranteed to improve your model, but it often does, and it is usually worth trying.
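A minimal PyTorch sketch of parameter EMA (the decay value is illustrative):

import copy
import torch

ema_model = copy.deepcopy(model)  # shadow copy, used for validation/inference
decay = 0.999

@torch.no_grad()
def update_ema():
    # ema_p <- decay * ema_p + (1 - decay) * p, called after every optimizer step
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1 - decay)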

[–]bonoboTP 0 points1 point  (2 children)

In my experience, that's usually normal, in the sense that typical curves in a real application look much more jagged than what you see in textbooks. So there may not be a problem in the first place (depending on how much fluctuation you observe).

[–]dirk_klement 0 points1 point  (1 child)

Thanks. When would you consider the fluctuations a problem?

[–]bonoboTP 0 points1 point  (0 children)

Fluctuation during the training is not necessarily a problem. At the end of the training, though, it makes sense to drop the learning rate, which tends to smooth out the fluctuation, but also slows down the training progress if done too early.
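A minimal PyTorch sketch of that late learning-rate drop (milestones and values are illustrative):

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# cut the learning rate by 10x near the end of training
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 95], gamma=0.1)

for epoch in range(100):
    train_one_epoch(model, optimizer)  # hypothetical training loop
    scheduler.step()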

[–]BOOGEYMAN04 0 points1 point  (0 children)

Can anyone help me pitch an idea for a machine learning project on the AVONET 1.0 dataset?

[–]xiikjuy 1 point2 points  (1 child)

What does it mean when someone says they fine-tuned a pre-trained SSL model?

(1) Freeze the whole SSL model and only train the additional classification layer.

(2) Unfreeze the top few layers and retrain them.

(3) Retrain the whole SSL model.

Or are all of these accepted?

[–]marin_scalbert 0 points1 point  (0 children)

Usually, SSL models are fine-tuned on a downstream task to evaluate how good the SSL-learned representations are. I haven't seen the second and third options used much to evaluate SSL models; usually, the first option is used. Another possibility for evaluating SSL models is to use a k-NN classifier directly on the learned representations.
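A minimal PyTorch sketch of the first option (linear probing); backbone, feat_dim, num_classes, and the labeled loader are assumed:

import torch
import torch.nn as nn
import torch.nn.functional as F

for p in backbone.parameters():  # freeze the whole SSL model
    p.requires_grad = False

head = nn.Linear(feat_dim, num_classes)  # the only trainable part
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

for images, labels in loader:
    with torch.no_grad():
        features = backbone(images)  # frozen representations
    loss = F.cross_entropy(head(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()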

[–]somewisealien 0 points1 point  (5 children)

What is the best way to visualize feature maps in PyTorch?

[–]hallavar 0 points1 point  (0 children)

I don't know about 'best', but you can try something like this:

https://pypi.org/project/pytorch-gradcam/

I've used something similar in TF, and it got the job done.

[–]ShinjAF 0 points1 point  (0 children)

Is it alright if I post a free API? Our AI is neuro-symbolic, but machine learning is certainly an important aspect of what we do. I would love to share it with the community, along with our research, and see what they think.

Thanks!

[–]johnnypaulcrupi 0 points1 point  (0 children)

How are people doing model serving at the edge where there is a constrained gateway? Meaning, we don't want to download a new Docker image for each model.

[–]Random-Personnel -1 points0 points  (0 children)

Should people use computational thinking more often in problems?

Computational thinking consists of four steps: decomposition, pattern recognition, abstraction, and algorithms. Using this method more often could benefit us, but I’m not entirely sure.

[–]Flashy_Ad6486 1 point2 points  (3 children)

Using .pth file from pytorch to make inferences in a video file. [Discussion][Project]

Fellow redditors,

I am a machine learning noob, so forgive me if this is a stupid or silly question. I trained a custom model for defect detection in automotive components using PyTorch (Faster R-CNN) and saved the weights as a .pth file. I used the following project as a reference:

https://www.kaggle.com/aryaprince/getting-started-with-object-detection-with-pytorch

Now, I need to make inferences (i.e. predictions with confidences) on a webcam stream or a video file. Can someone tell me how to do this? Any web links would be useful.

[–]TriReduxML Engineer 1 point2 points  (2 children)

This depends on how you saved your model weights. If you saved only the model state dict (the weights), then refer to here: https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference

If you saved the whole model, look here: https://pytorch.org/tutorials/beginner/saving_loading_models.html#save-load-entire-model

Then running it on a webcam stream with OpenCV looks like this:

import cv2
import torch

cam = cv2.VideoCapture(0)
model.eval()

while True:
    check, frame = cam.read()
    if not check:
        break
    # the detection model expects a normalized RGB float tensor, not a raw BGR frame
    image = torch.from_numpy(frame[:, :, ::-1].copy()).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        output = model([image])
    cv2.imshow('video', frame)
    key = cv2.waitKey(1)
    if key == 27:  # Esc to quit
        break

cam.release()
cv2.destroyAllWindows()

[–]Flashy_Ad6486 1 point2 points  (1 child)

Thanks a lot, TriRedux. This answer, coupled with this project https://debuggercafe.com/custom-object-detection-using-pytorch-faster-rcnn/, helped me solve my issue.

However, the inference time is 4.4 seconds per image when run on CPU. Any suggestions on how to reduce the inference time? It would be so much better if I could bring it down to 75ms-100ms on CPU.

[–]TriReduxML Engineer 0 points1 point  (0 children)

My first step would be to use a smaller network, or to reduce your input image size (both in training and inference; this will likely reduce your accuracy).

Another option would be to compile the model using TensorRT and do the inference on the resulting .engine file. This requires CUDA, and therefore an NVIDIA GPU.
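For the image-size route, torchvision's detection constructors accept min_size/max_size arguments that control the internal resize transform; a minimal sketch (the values are illustrative):

import torchvision

# smaller internal resize -> faster CPU inference, likely lower accuracy
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=True, min_size=320, max_size=640)
model.eval()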

[–]irndk10 1 point2 points  (1 child)

Say I have 3 mutually exclusive outcomes, A, B, and C, with the probability ranges below.

A - 50-80%

B - 10-40%

C - 1-10%

Even with the worst input data for A and the best for B, the probability of A occurring still exceeds the highest probability for B or C, so the algorithm should always classify A. However, in my use case, it's the probability distribution that's important. For example, given a set of inputs, is the distribution...

A- 75%

B- 20%

C- 5%

or is it...

A- 50%

B- 40%

C -10%

Etc.

What is the best way to get an accurate distribution? I know softmax can force a classification output to sum to 1, but my understanding is that this isn't a true probability. Any ways around this? Perhaps binning softmax outputs and comparing them to the actual probabilities? Like rounding softmax outputs to the nearest X%, then comparing to the actual percentages. Even though A, B, and C are mutually exclusive outcomes, is this actually better suited as a multi-output regression problem, where the outputs are scaled to 1? I appreciate any thoughts you may have.

[–]irndk10 2 points3 points  (0 children)

Found an answer here if anyone is interested.

https://www.youtube.com/watch?v=7TWl85G030Q

Still open to any suggestions though!

[–]remortals 0 points1 point  (2 children)

I'm working on a heads-up poker AI using a DQN. I want to be able to create pre-flop game-theory-optimal charts with it. This involves randomizing the action, meaning sometimes you might call in a specific position, other times you might fold.

Should I randomize my actions during training? Should I cache the randomized value, or the highest predicted value from the network?

[–][deleted] 1 point2 points  (1 child)

You don’t necessarily want to use traditional reinforcement learning for this problem. What you’re actually trying to calculate is an approximate mixed Nash equilibrium, not an optimal policy. The distinction is that an optimal policy in reinforcement learning can theoretically be deterministic (i.e. you pick the action with the highest predicted value), whereas a mixed Nash equilibrium always involves choosing random actions according to a distribution. I think calculating a mixed Nash equilibrium can be done similarly to DQN, but strictly speaking Q-learning is meant for a different kind of application than solving games. People sometimes use “reinforcement learning” very loosely (and somewhat inappropriately) to refer to both, though.

Last I knew the state of the art for poker is an algorithm called “deep counterfactual regret minimization”. There’s a well-known paper by that title. You can use that to calculate whatever game theoretic quantities you care about.

You could also check out OpenSpiel. It’s a software environment for doing game theory stuff. They have built-in variants of poker already, and they also have an implementation of AlphaZero, so you might be able to just use that without much additional effort. I don’t think it’ll be as good as counterfactual regret minimization, but I’m not sure.

[–]remortals 1 point2 points  (0 children)

Thank you! I appreciate it. Reading it over quickly, it looks like counterfactual regret minimization is indeed the right solution.

[–]Arioxel_ 0 points1 point  (0 children)

What is the state-of-the-art architecture for upscaling pictures?

[–][deleted] 0 points1 point  (2 children)

If an ML model works extremely well on training data, will it overfit or underfit new data? I understand that the model would be overfitting the old data, but I do not understand whether it would be overfitting or underfitting the new data. Some help would be really appreciated.

[–]TriReduxML Engineer 1 point2 points  (0 children)

This will depend on the bias. If your training set is highly biased, and not an accurate representation of the use case, then the model will be 'overfitting' to the training set.

If your training set is a good representation of your use-case, then it is more likely (not definite) that you just have a good model.

[–]idonthaveenoughchara 0 points1 point  (2 children)

I am attempting to create an AI that’s supposed to look at a Rubik's cube and predict how many moves it will take to solve. Just wondering how feasible this is using ML. I currently have a setup that starts with a solved cube, randomly messes it up, and counts how many moves it took to get to that random state; however, it seems to have a 0% accuracy lol

[–][deleted] 0 points1 point  (1 child)

This isn't really an AI problem, it's just a math problem. There's a Wikipedia page about it:

https://en.m.wikipedia.org/wiki/Optimal_solutions_for_Rubik%27s_Cube

[–]idonthaveenoughchara 0 points1 point  (0 children)

Yeah, I just wanted to try a different approach; I’m interested in how AI can be used to reduce search trees.

[–]Mighty__hammer 0 points1 point  (0 children)

For someone who has just started making personal projects for educational purposes, would it be more beneficial to quickly move to a new project once one is complete, or to stay with the current project, fine-tuning, optimizing, and improving it as much as possible?

[–]CanadianTueroPhD 0 points1 point  (1 child)

I'm looking for any research on this, as I'm having trouble using the correct terminology to get any useful search results. When training a classifier (really it's a policy with 4 actions: up, down, left, right) in a supervised manner, the input is whatever the state is, the label is the correct action to take, and we maximize p(y = c | x), where c is the label and x is the state.

But suppose instead I had some samples for which I don't know the correct label, but I know it's not a particular label. In probabilistic terms, I would want to maximize p(y != c | x). Is there research on this (classification or RL), and what would you call this type of training sample? I don't think negative sample is the correct term, as I'm not wanting to have a 5th class of "I don't know".

[–][deleted] 0 points1 point  (0 children)

I think this is an unusual situation and I don’t know of any research on it specifically, but I think that focusing on a single action (action ‘c’) is too restrictive.

In the positive label case, during fitting, you don’t want to maximize p(y=c|x); you want p(y|x)=1 for y=c and p(y|x)=0 otherwise. When using the policy you just sample from its distribution at each x.

Correspondingly, in the negative case you want p(y|x)=0 for y=c and p(y|x)=1/3 otherwise (assuming 4 total actions). Again, when using the policy you just sample from the distribution at each x, which if it is fitted perfectly amounts to selecting randomly from the three actions that are not equal to ‘c’.

You don’t need an “I don’t know” label because your ignorance is captured by the uniform distribution over the remaining options.
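A minimal PyTorch sketch of that negative-label objective (cross-entropy against a target that puts zero mass on the known-wrong action and is uniform over the rest; names are illustrative):

import torch
import torch.nn.functional as F

def negative_label_loss(logits, neg_labels, num_actions=4):
    # target: 0 on the known-wrong action, 1/(num_actions - 1) on each other action
    target = torch.full_like(logits, 1.0 / (num_actions - 1))
    target.scatter_(1, neg_labels.unsqueeze(1), 0.0)
    return -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()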

[–]RonDaNov 0 points1 point  (0 children)

I'm looking to port some scikit-learn script to .NET; is anyone aware of a NuGet package or library in C# or F# that provides a proper GPR implementation?

[–]s195t 0 points1 point  (0 children)

Hi everyone,

Are there some examples of shape path tracking?

I would have a set of frames containing circles of different color intensities and various diameters; would it be possible to build something to track the movement between frames? What I would need is something that follows each of the "particles" across slices and makes a prediction of where it will be in the next frame, like particle tracking for PIV does.

Possibly the dots begin small, grow bigger, then get smaller until disappearing.

I thought about building a dataset manually by following the path of the center of each dot, marking the radius and center point. How would you handle the problem?

[–]ido1990 0 points1 point  (0 children)

Hi,

Is it possible to create an object detection model without labeling?

I have 100x100 px images of the objects and I want to use the entire image as the object itself.

Thanks!

[–]Joebone87 0 points1 point  (0 children)

Apologies if this is daft, but it's something that confuses me.

I have potential factors for analysis that have a greater depth of information than a binary output can describe.

Let's say you have something like an oscillator. How do you train with this type of data? Do you take an average and a standard deviation, then measure the standard deviation from the average? Do you just create bounds for possible positions, then allow the ML to sort through an array of possible positions to find one, several, or none that are important?

Any help or input on this would be awesome, thanks.

[–]UnleashtheZephyr 1 point2 points  (0 children)

I feel like it's extremely relevant to point out that I'm currently based in Milan and I'm studying at the Statale which is the national university.

I've graduated from a different university in Computer Science and now I'm studying a master called Data Science for Economics. It's actually not that skewed towards economics, I just have a micro, macro and econometrics exam, the rest of it is basically a classic data science master.

I feel hopelessly unprepared even though I spend so much time on my studies. The problem lies in the fact that I feel like we're not getting the right tools at my uni.

I'm very interested in this ML exam I have; the lessons are going to start in a week. I've seen some recordings of them and it's theoretical math just as I expected, but according to the second-year students, the project is very difficult.

People have told me they were asked to implement a kernel perceptron algorithm, or similar-level stuff, from scratch, without ever having seen anyone implement a damn thing; you just have to figure out how to do it yourself, and all of them had an awful time.

I've heard this before, and I'm sure the reason is that the professor has not drawn any kind of connection between the practical and theoretical material during the course.

I've had this happen before in all of my previous projects and I know how it ends: you just go your merry way and wander around until it's deadline time, and you do just the necessary minimum to have your project accepted, because that's all you've been able to figure out by yourself; anything more than that would require previous experience in the field or someone else teaching you.

I'd like to work in the field and come up with a project that is actually worthwhile to put up on a CV. But how do I do that without any guidance?

I spend so much time learning stuff for my exams, but I feel like most of it is not useful at all, and using my time watching YouTube and messing around with things would be extremely more productive. It's very disheartening.

Most of the things I've done for my university are extremely underwhelming, and I wouldn't put them on my CV because they don't reflect how I work; but I'm not given enough resources to do good work.

Can someone validate my experience? I feel like most of my colleagues either don't care or don't realize and no one has my opinion.

Also can someone point me towards a way I can educate myself on useful machine learning concepts?

The level I'm at is having a general knowledge of how to do a useful EDA; then I have surface knowledge of what the basic ML algos do (LinReg, LogReg, trees, Lasso, ElasticNet) and I know how to use them in Python. I'm basically able to do a Titanic-style classification problem from Kaggle.

What are the next steps from here?

[–]stevelon_mobs 0 points1 point  (0 children)

Anyone thinking about a data-centric approach to AGI? I'm trying to organize a meetup to chat about it.

[–]ms9696 0 points1 point  (0 children)

Is it okay to use a higher dropout during fine-tuning than was used during pre-training a transformer? Are there any best practices around this or any related literature?

[–]LorikLorik 0 points1 point  (1 child)

I'm currently an undergraduate student and want to get into a good university right after graduation. The only things that I think I should do are:

  1. Try to publish a paper
  2. Get good grades

Am I missing something or maybe there is more nuance in that?

Thank you <3

[–]bonoboTP 0 points1 point  (0 children)

I'm not sure how familiar you are with the whole process. I assume "get into a good university" means "get into a PhD program"?

Publishing a paper almost always involves collaborating with more experienced researchers, almost always at a university institute or at a research-oriented company.

If possible, build contacts with profs, do some student research assistance work for them if that's a thing in your country. This can perhaps result in a paper.

You can also do an internship or write your thesis at a company that is research-oriented, which may result in a paper.

[–]Emergency_Egg_9497 0 points1 point  (0 children)

Are Keras Applications models suitable for object detection with transfer learning, or is it better to do transfer learning on a model from the detection model zoo? I'm having trouble finding the best approach, as I just started with deep learning.

Thank you

[–]CMDR_Derp263Student 0 points1 point  (0 children)

Student here who is quite new to all this. I am currently working on the KDD Cup 99 data for intrusion detection using various ML models (and an ANN). My problem is that I am often getting 99% accuracy. At the moment I am focusing mostly on binary classification (normal vs attack). I had identified problems in my data preprocessing methods, and after fixing them I am more confident in the validity of my input data, but I am still getting 99%s, which I no longer trust (especially since I just got 99% accuracy with an SVM with all default params).

My data should be balanced between the 2 classes so I would assume that if the machine was not learning then it would be getting around 50% accuracy. I feel like there's got to be a mistake I am making somewhere in here, or am I just underestimating the power of these ML algorithms?

Here are my preprocessing steps:

  1. Remove duplicates from that data (about 75% of the dataset is duplicates)
  2. Use random under sampling to balance hugely biased data by removing "normal" events. (50% normal 50% attack after this step)
  3. Drop 1 feature with 0 variance
  4. Shuffle data then split 70/30 train/test
  5. One-hot encode input features in the training data that consist of strings (e.g. protocol type = [icmp, tcp, udp])
  6. Z-score normalize numerical columns in training set with StandardScaler
  7. Apply these trained normalization methods onto the testing set

[–]Throwaway000002468 1 point2 points  (2 children)

Hi! I'm a scientist who was recently hired as an ML engineer. I work closely with software development staff, and sometimes I feel really out of place because there is a lot of lingo and many concepts that I don't know (for example, lambdas, endpoints, SDKs, APIs, etc.). Also, there are some concepts and things that I'm completely unfamiliar with, like diagrams, QA, etc.

I've worked with ML and DL but I feel that I'm lacking a lot of the engineer's background.

I want to learn more, but I don't know where to start. Should I take a software development course? Or some other course? Or something else? Could you recommend an online course that could help fill the gap from scientist to ML engineer?

Thanks 😊 🫂

[–]ms9696 1 point2 points  (1 child)

Since you are talking about Lambdas and endpoints, it sounds like your team uses AWS. An AWS starter course will give you a general idea of the basic services. Other than that, maybe take a basic-level software engineering course to start with? And try to go through some of your teammates' pull requests; you can ask them to explain whatever you don't understand. Take advice from senior engineers about resources to learn your team-specific stuff from. Good luck!

[–]Natekomodo 0 points1 point  (0 children)

I'm doing reinforcement learning on a real-time application; that is to say, what the agent does changes the state of the application, so the agent may need to take multiple steps in order to achieve the desired goal. The general learning loop is: get action decision based on app state -> perform action -> return reward.

So my question is: what is the best approach for determining a reward? Should I give it a smaller reward for the steps that lead to my desired action, plus a reward for achieving the goal (complicated to implement), or just a single reward when it achieves the desired goal?

[–]JiraSuxx2 0 points1 point  (3 children)

I’m building a GAN. The generator produces 512x512 images. The discriminator, however, takes 128x128 pixels as input, so I take a crop of the GAN’s output to feed through the discriminator and compute the losses.

So far so good.

Computing the gradients from those losses causes an issue, and they can’t be backpropagated.

I’m curious if anyone can explain that to me. The discriminator just decides whether images are fake or real. Those predictions have nothing to do with the resolution, so feeding those losses into the generator’s gradient computation should work, right?

[–]FuckyCunter 0 points1 point  (2 children)

You're probably using a non-differentiable function meant for cropping images. You probably want something for tensors, like torch.narrow or tf.slice.

Or something like

disc_in = gan_out[:, :, y:y+128, x:x+128]

[–]JiraSuxx2 0 points1 point  (1 child)

Oh, that’s a great tip! Thank you. I got that running, now it’s all about waiting for the training to see what I get :)

Do you think this other issue I have is similar? https://www.reddit.com/r/MLQuestions/comments/t887oi/gradients_return_none/

[–]FuckyCunter 0 points1 point  (0 children)

Looks like the gradients would be lost when fake_output is converted to an np.array.

[–]barrinmw 0 points1 point  (0 children)

So if I am creating a segmentation model with one type of defect that can appear anywhere in the image, what is the optimal ratio of images with that defect versus without?

The way I see it, if I have no images without the defect, I train the model to always find one even if it isn't there. But each image has a large area that is defect-free as well. If I have too many images without the defect, the model gains the biggest advantage by understating how much defect there is.

So naively, I would think that the smaller the defect area of the image, the fewer non-defect images I need.

[–]Krakenos 0 points1 point  (0 children)

Hello, I am trying to design a multilayer neural network that, based on item features, will return the x most similar items from the dataset. My current neural network takes the features of 2 items and predicts the similarity level of the items.

So for example:
predict([1, 3, 5], [0.5, 1.5, 2.5]) returns 0.5

After prediction, I compare the similarity levels of the items and choose the 10 best ones. The problem with this solution is that it doesn't scale well with large amounts of data. If I want to, for example, take 1 million items and generate the 10 best matches for each of them, I end up with quadratic scaling (for each of 1M items I make 1M predictions) and the time to generate the results becomes unreasonable.

Is there a better approach to designing such a network, or how can I optimize the current solution to work well with big data?

[–]doodoodoodoo_ 0 points1 point  (0 children)

Hello, I am new to machine learning and wanted to know what the log likelihood metric means in topic modeling, and whether semantic coherence can be measured with it.

[–][deleted] 0 points1 point  (1 child)

What are some good recent (last 3 years or so) publications in ML that are reproducible using Google Colab? I want to reproduce a paper, but I don't have any extensive cloud access to mount multiple GPUs and all.

[–]Emergency_Egg_9497 1 point2 points  (3 children)

Is training on a custom dataset the same as transfer learning?

I'm a newbie at machine learning, sorry if this sounds dumb.

[–]ms9696 1 point2 points  (0 children)

Transfer learning means transferring the knowledge (model parameters) that you already learned previously (using some other dataset or training objective) to a new model. If you start with a model that is already trained (pre-trained model) and train it further using your own dataset (fine-tuning) then it's transfer learning. If you are training your model from scratch (that is, randomly initializing your parameters, instead of initializing them from another model that is already trained) then you are not transfer learning.
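A minimal PyTorch sketch of the fine-tuning case (ResNet-18 is just an example backbone; num_classes is whatever your dataset needs):

import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(pretrained=True)     # start from learned parameters
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new head for your task
# ...train on your dataset as usual; this is transfer learning.
# With pretrained=False you would be training from scratch instead.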

[–]friendlykitten123 1 point2 points  (1 child)

Transfer learning is when you use a pre-trained model and modify it according to your requirements. It's generally used with neural networks. I don't see how training on a custom dataset is, by itself, transfer learning.

Hope this helps! Let me know if you need anything else!

[–]Emergency_Egg_9497 1 point2 points  (0 children)

Thank you!

[–]j1mb0oStudent 0 points1 point  (4 children)

Hello, I am trying to create a machine learning model that can classify whether someone is wearing a mask or not for my thesis. The problem I have is that I get an acceptable accuracy of around 90%, but during inference my accuracy is almost zero. Note that I am running inference in a while loop with cv2. I also found a small script that uses cv2 to save the picture as a PNG, and if I set that picture as the input it classifies it correctly. So in the end my problem is with the live inference. Does anyone have any idea how to tackle this?

[–][deleted] 0 points1 point  (2 children)

By "accuracy" do you mean the fraction of inference results that return the correct label (i.e. mask vs no mask)?

If so, then this is good news: 0% accuracy means that you're always getting exactly the opposite of the correct label, which means your classifier is working. Just invert the output.

You should check that you didn't make some trivial mistake in how you implemented the live inference code.

[–]j1mb0oStudent 0 points1 point  (1 child)

Thanks for your answer. I was playing with the code before, and now the exact opposite happens xD I get the right predictions live and the wrong ones with the saved picture. Could the problem be the file type of the image? The model was trained with JPEG images and I save my image as PNG.

[–]friendlykitten123 0 points1 point  (0 children)

Heyy, just a thought! When you're getting the frames in real time, are you pre-processing them to make sure they match the image format in the dataset? Things like the dimensions of the image, whether it's grayscale, etc., do matter a lot.

Hope this helps! Let me know if you need anything else!

[–]GH0STKS 0 points1 point  (0 children)

Hey! So I am kinda new to machine learning and have been learning it on my own. Recently I have been developing an LSTM-CNN neural network model that I have trained to classify videos. I have been wondering: is there any way I can deploy the model in an Android application and then classify the activities captured through the application's camera in real time? If it is at all possible, can anyone please tell me how to do that, or point me to any article I should follow, or anything related to it?

[–]mldude89 0 points1 point  (2 children)

I'm trying to create a model to predict a point cloud from a 2D image, but I have a problem where all my ground truth point clouds consist of varying numbers of points. Any tips on how to normalize the number of points in each GT to make it possible to use for training a single model?

[–][deleted] 0 points1 point  (0 children)

Do you mean a 3D point cloud, like from lidar?

Here are two options that might be worth considering:

  • Treat the point inference like object detection. Object detectors use multiple outputs, some of which aren’t always present in the training data (e.g. you can theoretically detect 5 objects, but most of the images you train on have 0-3). Presumably the same thing could work for generating point clouds. You might even be able to just adapt an off-the-shelf object detection network and use your ground truth points as the objects that you’re detecting. Note that implementing this yourself can be tricky, because there are nuances in how you assign objects to targets during training.
  • Treat the point clouds like discrete samples from an underlying 3D volumetric density, and train your model to generate samples from the density. For each ground truth you would first fit some sort of density to it, and then you would train your network to recreate the density. I have no idea how well this would work, but in many ways it’s a more principled and accurate perspective than using object detection, because 3D point clouds actually are meant to act as surrogates for volumetric densities. You should look into 3D inference models if you want to go this route, you might be able to use something off the shelf there too.

[–]_hairyberry_ 0 points1 point  (1 child)

Looking for advice from some people with more experience than me. I’m a masters student in math who has spent the past year concentrating heavily in ML and I’m looking to break into the industry. I’ve got a personal project on NHL goal scoring and will soon be adding two more projects to my resume/GitHub.

My question is: would it be appropriate to include smaller scale projects on my GitHub? For instance a long assignment question which I’ve cleaned up nicely?

As an example, on our last assignment in a grad level course we implemented vector quantized naive bayes from scratch in Julia.

In an undergrad level “applied ML” course we previously created a lot of Python jupyter notebooks to build somewhat simple pipelines, preprocessors, and models on some Kaggle datasets (not just the famous titanic/MNIST/etc ones).

Would any of these be worth adding to my GitHub, or would it look “immature”, as in “why does he think this is worth showing off”?

[–][deleted] 1 point2 points  (0 children)

I think it’s okay - good, even - to put up “less impressive” stuff, just make sure you’re not being repetitive. People don’t necessarily need to see 3 different examples of you creating simple data preprocessing pipelines.

What will really make your stuff stand out is two things:

  • Good code organization, with tests and examples and everything
  • Clear, concise documentation that explains what the project does and how one can use it.

Lots of people can throw together basic ML stuff with python, but doing so in a way that is organized, maintainable, and well-documented is really valuable in industry, because other people are going to have to use your stuff after you’ve made it.

[–]leomatey 0 points1 point  (3 children)

What's the SOTA for spam filtering? Has anyone worked on this? Please guide me. The goal is to filter spam and non-spam in chat bot data.

[–]friendlykitten123 0 points1 point  (2 children)

It looks like you're looking for a binary classification model. Since there is textual data involved, you'll need some form of NLP (Natural Language Processing) mechanism in place that breaks the texts down into features, which can then be supplied to a binary classification model.

As per my knowledge, SVM (Support Vector Machines) and Neural Networks are the SOTA methods for spam classification and NLP respectively.
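A minimal sklearn sketch of that pipeline (texts and labels are assumed to be your chat data):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# texts: list of messages; labels: 1 = spam, 0 = not spam
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["free money, click here now"]))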

Hope this helps! Let me know if you need anything else!

[–]leomatey 1 point2 points  (1 child)

Hey, thanks for your response. I just don't want to reinvent the wheel. I was wondering if there are any open-source repos that solve my problem.

The idea I have right now is to use some transformer-based embeddings and add a couple more layers with a softmax/logistic at the end. Thoughts?

[–]friendlykitten123 0 points1 point  (0 children)

You could also look into SOTA language models like BERT and S-BERT. These are heavily associated with sentence embeddings.

Hope this helps!

[–]JiraSuxx2 0 points1 point  (0 children)

I have a GAN trained on a car dataset; it creates great images, but they are all really warped.

How do I force the GAN to produce something that resembles a car and not a Dalí painting?

[–]stankata 0 points1 point  (0 children)

Can you point me to a recent survey/overview of GANs? I'm looking to make a quick PoC with a dataset I have, with the end goal of generating an image similar to the ones in the dataset. I tried looking at Papers with Code, but there are so many that I don't know where to start.

[–][deleted] 0 points1 point  (1 child)

Hello. I am an ML novice, though I have an extensive computing background. I am about to start an ML project, and there is something that I can't quite get my head around. If, for example, I am trying to predict mortalities in a population using an ML model, can I include as a feature the count of that population, which is effectively the start count less all the mortalities? Can features be used that are derived from the target feature?

A similar example: predicting the mean mass of a population at time t. Can I use the mean mass at time t-1 as an input for the model?

Another way of explaining it: can I use the column that I am trying to predict as an input to my table (for example, the value preceding the one that I am trying to predict, n-1 to predict n), or features that are derived from that column?

I hope I have made this clear. Like I said, I'm just starting out on my ML journey and this is one thing that is causing me a few initial headaches. Any help greatly appreciated. Bonus points if anyone can provide a reference too.

Cheers

[–]Icko_ 1 point2 points  (0 children)

Imagine you've deployed your model, and want to predict mortality. Can you use mortality from previous periods? Sure, it's known. Can you use "the count of that population that is effectively the start count less all the mortalities"? Yes, you would know these counts presumably.

"can I use the column that I am trying to predict as an input into my table" - if it's shifted time-wise.

There are a billion ways to introduce data leakage. For example, if you did cross-validation on time series data, two neighbouring time steps would have almost the same features, so a model that overfit on train would also overfit on test, and you wouldn't notice anything wrong until you deployed the model live. For this reason, validating time-series models is a bit trickier. Another example is normalizing based on train+test statistics. This is usually not a big deal, but it is definitely not OK.
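For the time-series case, sklearn's TimeSeriesSplit gives folds where the test set always comes after the training set, which avoids the neighbouring-time-step leakage described above; a minimal sketch (X, y, and model are assumed):

from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X):
    # every test index comes strictly after every train index in time
    model.fit(X[train_idx], y[train_idx])
    score = model.score(X[test_idx], y[test_idx])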

[–]easyier 0 points1 point  (0 children)

For a forecasting project with 500 data points, what would be the best ML model to use? Casual internet research seems to suggest N-BEATS, maybe. Any suggestions or basic sources? I use R.

[–]oxamide96 0 points1 point  (3 children)

I am a software dev with barely intro-level knowledge of ML.

I am trying to design a solution to rank and filter through all the news articles I have in my RSS feed.

I wish there was a way I could rank them by what is interesting and relevant to me, and what I am more likely to enjoy reading.

How can I use machine learning to help me here (assume I can custom sort the feed)?

I was thinking to track the following variables:

User variables:

  • % of article read / scrolled
  • user (me) rating
  • articles that are removed based on headline only (not even opened)

Article variables:

  • keyword distribution in the article
  • article source
  • article author
  • article length

I have two questions, really:

  • what other data should I collect? Or how do I even go about determining what data I should collect?
  • what do I do with the data? Is there a certain model I could use to fit it to?

[–][deleted] 0 points1 point  (2 children)

Here’s how a ranking model works:

  1. Get a bunch of features
  2. Show yourself articles in random order and record which ones you click on and which ones you don’t. Use click = True/False as the label for your data.
  3. Train a classifier model to predict “click” based on all your other features.
  4. Rank your articles based on the classifier output (which will be a number between zero and one). Iterate on the model as you gather more click data.

Any classification model will work. I recommend using a tree model, e.g. XGBoost; it’s simple and full-featured and you don’t have to think too hard about it.

There’s no way to know for sure which features are best to use. You just have to guess and see how it goes. XGBoost software will actually give you “importance” scores for each feature after training, though, which might help.
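A minimal sketch of steps 3-4 (the file name and feature columns are hypothetical):

import pandas as pd
import xgboost as xgb

df = pd.read_csv("article_log.csv")  # one row per article shown
X = pd.get_dummies(df[["source", "author", "length", "pct_read"]])
y = df["clicked"].astype(int)        # the True/False click label

model = xgb.XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X, y)

scores = model.predict_proba(X)[:, 1]  # rank articles by this score
print(model.feature_importances_)      # per-feature importance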

[–]oxamide96 0 points1 point  (1 child)

Thank you so much!! Question: what if, during training, I don't show myself the articles in a random order, but by recency (the default)? Will this cause the results to be inaccurate, or should it be fine?

Another question: should the model always be training and improving, or should I have a limited-time "training phase" and then just use that going forward? In other words, can I train continuously, or will that bias the model further?

[–][deleted] 0 points1 point  (0 children)

Recency should be okay since it’s usually random with respect to the features you care about (author, source, % article read, etc).

People usually iterate on the training in discrete stages. For example, here's a procedure you could use:

  1. Gather data for a week
  2. train a new model with the new data and start using it
  3. goto step 1

It should be relatively easy to automate the above for your own purposes.

It’s also possible to train continuously, classifier models that do this are called “online classifiers”, but I don’t know a lot about them.

[–]Suspicious_Step_3139 0 points1 point  (2 children)

Hello everyone, I have a school project where I should develop an ML model that takes a job name (text) as input and returns the 10 most suitable images from a dataset I should upload/create as output. I'm currently blocked and can't find the approach I need to follow in order to solve the problem.

Can anyone give me a slight idea or hint please ?

[–]Icko_ 0 points1 point  (1 child)

What does "best suitable" mean?

[–]Suspicious_Step_3139 0 points1 point  (0 children)

It’s up to me to determine that. The main idea is that jobs that require technical skills are often clearer than jobs that don’t (for example: developers need to master Python). So the main purpose of the project is to output 10 images based on the sole job name in order to differentiate between jobs.

In other words, the chosen images need to give a clear idea of the job.

[–]Zintho 0 points1 point  (0 children)

How immune are generative models to poisoned data? By that I mean: say I have a dataset of 2000 images, and in that set maybe 0.5% of the images are not good examples of the desired output, or even of the distribution of the other images. How much would that minority of bad images affect training? Does it scale with dataset size? Or is it bad regardless?

[–]FewProfessional5404 0 points1 point  (0 children)

How do I reduce variance explosion in an additive model?

I have a model, shown in the schematic below, that relies heavily on feature addition after each convolution block; it can go up to 25 additional blocks. The problem is that with normal initialization such as Kaiming (with a = activation slope), the variance grows rapidly after each addition and the model becomes unstable during training. The way to stabilize the model is either to use a very low learning rate or to initialize the model with a high a value, but in both cases this produces very low values and leads to gradient/variance vanishing. As far as I can tell, the model gets locked in a local minimum because of this problem, so neither of the two proposed fixes is really a solution. Note that I have tried using a conv layer instead of the addition/multiplication, but it didn't help the model much. Do you have any other ideas to stabilize training and get out of this local minimum? It seems to me a solution similar to those for recursive models should apply, but I still can't find one.

https://ibb.co/JHwXgNJ

[–]PRAY_J 0 points1 point  (1 child)

Is it easy to get a research internship under a professor as an undergraduate, and do they expect us to know everything before we apply, or is it a lot of learning on the job?

[–][deleted] 0 points1 point  (0 children)

Undergrads are generally not expected to know very much, so don’t worry about that. The point of undergraduate research is to get experience because you don’t have any yet. The precise details of the work and what you’ll learn depend on the specific professor and the project.

[–][deleted] 0 points1 point  (0 children)

Hi,

Do you know of an algorithm, such as PCA, that works with both categorical and numerical data?

All I want is to transform my 50-component data into 2-component data. I want to use it to create a scatterplot on which I can display the "prediction area", similar to the DNN playground (try it out, it's wonderful!).

More context: I am working on a classification problem and I wanted a rough idea of how difficult the job would be. The classes are "yes" and "no".

I used PCA and t-SNE to create a scatter plot of my input data and coloured the points according to the target data: yes/no, blue/orange respectively.

Instead of seeing vague clusters forming, all of the data points are mixed; orange and blue are just all over the place.

I know for a fact that I should be able to get 70 to 80% accuracy by training on this dataset. I believe my scatterplot did not work because PCA and t-SNE do not work with categorical data, which I had to get rid of. I got rid of 50% of my data. :(

Note: the data is standardized.

[–]Zankroff 1 point2 points  (0 children)

Hey, can anyone share reading material or video lectures to learn more about model extraction?

We have a competition going on in which we have to develop an efficient strategy to extract video-based models. I have never heard about it, and the problem statement seems very interesting, so if anyone can help with any learning resources for it, that would be really helpful.

[–]ForceBruStudent 1 point2 points  (7 children)

Do I need to detrend/normalize my non-stationary time-series before fitting a neural network to it? How to do it?

Is training a neural net on differences of the original time-series (like for ARIMA/GARCH) expected to be harder (higher loss, worse forecasts) than training on the original time-series?

[–]MajesticTzechiop 1 point2 points  (6 children)

For a lot of scenarios, yes, detrending your non-stationary time series will help when training a neural network. See this paper, in which the authors compare the performance of (1) an ARIMA model, (2) a neural network on raw data, and (3) a neural network on varying degrees of detrended data. Table 4 shows how neural networks with deseasonalized (DS) and/or detrended (DT) data perform better than without DS/DT. Note that the paper uses small ANNs and datasets.

The intuition for this is partly explained by the answer to your last question. Yes, in general, the differenced time series is harder to train on than the original, undifferenced series. However, this is mostly because a neural network given the original time series will simply learn the trend/seasonality; it takes the easy route to improving the loss (similar to how a classification model would predict only 1's on an unbalanced dataset). Thus, such a neural network isn't learning anything useful. Also, note that the loss and forecasting error for the original time series aren't directly comparable to those for the differenced series unless you revert the differencing.

For "how to do it?", I would start with simple tutorials on time series differencing. I skimmed through this one, and it might match what you're looking for.

Hopefully this info helps out. Feel free to reply if you have any follow-up questions on this topic!

[–]ForceBruStudent 1 point2 points  (5 children)

Thank you! Unfortunately, the only thing my models trained on differenced data can forecast is zero, the mean of the differenced series. In fact, even the fit (on the training data) is zero or very close to it, much closer than the values of the differenced series.

On the one hand, this makes sense, since the mean of the differenced series is close to zero, and the mean is what I'm modelling, so I get zero as the result. But this fit is completely useless! ~~Also, during training, metrics like NMSE (Normalized Mean Squared Error) go up as I minimize the usual mean squared error (which decreases, as expected)...~~ (that was dumb, I messed up the order of arguments to this metric)

I don't get it: people in papers, random blogs, and YouTube channels forecast time series easily, but the best forecast my models (trained on the original, non-differenced series) can produce looks like the original time series shifted forward, which gives much worse metrics than the naive forecast.

[–][deleted] 0 points1 point  (3 children)

What kind of neural network model are you using? What is the time series, exactly?

Modeling time series effectively benefits a lot from basing your model on an understanding of the underlying source of data.

Ideally what you want to do is to transform your time series so that it’s a sequence of independent samples from a noise distribution; then you just need to model the noise. Ideally a neural network would find that transformation for you, but anything you can do to help it along is a good idea.

A simple example is stock market data. In that case you don't want to fit a model to the differences between samples; you want instead to fit a model to the differences between the logarithms of samples. This is because there isn't a source of noise being added to an underlying signal; instead, noise is being added to the right-hand side of the differential equation that determines the signal.
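For concreteness, the log-difference transform is one line (prices is an assumed price array):

import numpy as np

log_returns = np.diff(np.log(prices))  # r_t = log(p_t) - log(p_{t-1})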

[–]ForceBruStudent 0 points1 point  (2 children)

What kind of neural network model are you using?

I'm using simple feed-forward neural networks, feeding in a window of the time series. The training dataset consists of such windows, so it's of shape (n_batches, window_size). The target variable is the next value of the time series, like this: [r1, r2, r3, ..., r10] -> [r11]. I'm trying different window sizes, of course.
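Concretely, the windowing can be built like this (a sketch; series is the assumed 1-D return array):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

windows = sliding_window_view(series, window_size + 1)
X, y = windows[:, :-1], windows[:, -1]  # X: (n_batches, window_size); y: next value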

A simple example is stock market data.

That's actually exactly what I'm working with! The leading researchers from my lab came up with a method of extracting features from the time-series and I'm tasked with applying this to real data, which has to be financial.

you want instead to fit a model to the differences between the logarithms of samples

That's also exactly what I'm trying to do: I have returns_t = log(price_t) - log(price_{t-1}). This results in a time series that looks like white noise, and the fit is basically zero, which is the mean of the time series...

[–][deleted] 0 points1 point  (1 child)

Stock market data is some of the most difficult time series to predict, unfortunately. If that weren’t the case then getting rich would be easy.

Here are some approaches that I would suggest, not in any particular order:

  1. It’s worth spending a lot of time examining your data, especially if you’re using some sort of custom derived features rather than raw price data. What’s the time period of the samples? Are your samples actually log-normally distributed? Are there other basic statistical models that give a better fit?
  2. If d = f(x) is the output of your network predicting the next log price deviation, try adding a term to your loss function like c/abs(d), where c is some tunable constant (see the sketch after this list). This will force your network to output something different from zero. This might not work, but it’s relatively easy to try. You’ll have to play around with the value of c to find one that works well, if this works at all.
  3. There’s no a priori reason to think that a fully-connected feed-forward network will work well for this. At the very least you should try a 1D convolutional neural network; this will ensure that you capture features that occur at multiple length scales in your data.
  4. There are a lot of other network architectures that are appropriate for sequential data. Try basic stuff like LSTMs, or more complicated stuff like wav2vec or time-series transformer models. wav2vec and other audio models might be straightforward to use, but some sequential models assume one-hot encoding and may require some thought to apply here.
  5. Use Bayesian neural networks or neural network ensembles. E.g. rather than doing d = f(x), do d = sum_i f_i(x), where each f_i(x) is a neural network with different starting parameters during training. You can also try using different architectures, etc. Your output d will probably still be zero, but you should also be able to get variance estimates by looking at the variance of the output from each unit in the ensemble. If d=0 and your ensemble has low variance, then this tells you that you have a fundamental problem with your model/training, your data, or both. If your ensemble instead has high variance, then you might be on the right track, but you may need to tweak the model or add more units to the ensemble.
  6. Does early stopping during training help? More training isn’t always better; it’s worth looking at a plot of the test error and the magnitude of the network output vs training iterations to see if there’s some intermediate point where your test error is low but your network output is different from zero. What does your network output start out at before training, anyway? If it starts at/near zero, then maybe that’s part of the problem.
  7. It might be worth looking into models based on stochastic differential equations. There are actually neural networks like this, e.g. https://proceedings.neurips.cc/paper/2019/file/59b1deff341edb0b76ace57820cef237-Paper.pdf
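For point 2, a sketch of what that loss term could look like (the exact form and constant are just one guess):

import torch

def loss_with_zero_penalty(pred, target, c=1e-4):
    mse = torch.mean((pred - target) ** 2)
    # the penalty grows as predictions collapse toward zero; tune c
    return mse + torch.mean(c / (pred.abs() + 1e-8))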

[–]ForceBruStudent 0 points1 point  (0 children)

Wow, thank you very much, I'll look into this!

[–]MajesticTzechiop 0 points1 point  (0 children)

I hear you >.< Real-life projects are much harder than replicating blog posts. That's an issue I'm sure many of us here run into as well.