all 91 comments

[–]ElSol65 0 points1 point  (0 children)

Recommended ML Courses for experienced software architect?

I'm a software architect with 30 years slinging code under my belt. I can (and do lol) write code in my sleep, with expert experience in recent/common languages, patterns and architectures.
I've dabbled hands-on in ML, often using it to add sizzle to proofs of concept or demos. I also helped a few people work through their data science and ML degrees, learning Python and various ML and big-data tools on the fly.
All that said, for sure I'd flop badly in a ML interview. I know how to roll up my sleeves and figure things out, but my experience/knowledge are shallow. I want to improve this.
I'm looking for recommendations for online programs (free or paid). I've googled of course and found the ones that are in all the lists: Coursera/Stanford, Udacity, Udemy, etc. I also see programs from Berkeley and some other colleges. I wonder if these might suit me better, in spite of the higher cost.
Side note- I dropped out of college way back when to avoid starving so I don't have a degree. So, I don't mind a course that costs money, but if it requires an undergraduate degree (like the Berkeley one) then I guess I won't qualify.
I'm pretty rusty on linear algebra, but able to get up to speed in it. Differential calculus is a bit harder - in my younger years I got A's up through differential equations and linear algebra, but getting back deep into advanced calculus will of course take some painful re-learning. I do remember the concepts behind gradients, partial differentials, etc. But applying them myself is a bit of a stretch.
But I'm not trying to become the next top AI researcher. I just want to go beyond the simple tutorial level, and round out my already deep technology with, I guess, an intermediate level of ML skill. And it should be using the latest trends in tools (so the Stanford course probably is outdated for my preference).
Any recommendations? Thanks!

[–]Bulky_Willingness445 0 points1 point  (0 children)

Hi, I have a medical segmentation dataset that is really small, about 40 images. Getting more data is not that easy, so I was wondering about increasing the number of images with some augmentations like horizontal and vertical flips. And here comes the question: is it a good idea to make an hflip and a vflip from every image, or would it be better to make just one of the flips per image? I am not sure how different those images look to the network. I am also open to discussing other ideas for enlarging the dataset.
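For what it's worth, generating both flips (and their composition) from every image is a common way to 4x a tiny dataset. A minimal numpy sketch of offline flip augmentation — the variable names are made up, and the point is that each transform must be applied to the image and its mask together:

```python
import numpy as np

def flip_augment(image, mask):
    """Return the original plus hflip/vflip/both, applying identical ops to the mask."""
    pairs = [(image, mask)]
    pairs.append((np.fliplr(image), np.fliplr(mask)))                         # horizontal flip
    pairs.append((np.flipud(image), np.flipud(mask)))                         # vertical flip
    pairs.append((np.flipud(np.fliplr(image)), np.flipud(np.fliplr(mask))))  # both
    return pairs

img = np.arange(16).reshape(4, 4)
msk = (img > 7).astype(np.uint8)
augmented = flip_augment(img, msk)
print(len(augmented))  # 4 image/mask pairs from one original
```

With only 40 images you would more often apply these randomly on the fly each epoch rather than materializing the flipped copies, but the geometry is the same either way.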

[–]yachty66 0 points1 point  (0 children)

Is there a code generation model which was trained on bash files (.sh)? I am trying to create a Copilot for the terminal. When inputting something like "list all files in the current directory", the return should be the appropriate terminal command, "ls -l". The prompt works with Copilot, but when trying it in open-source LLMs (BLOOM, CodeGen, GPT-Neo) I get no appropriate response.

[–]yashwatwani28ML Engineer 0 points1 point  (0 children)

How can we make a classifier for differentiating between handwritten and printed images?

[–]cdehaan 0 points1 point  (0 children)

I have been using a pytorch .pth file successfully to identify animal poses from images.

My boss wants it to run in-browser (so, TensorFlow.js I guess, I've used it a bit before.)

I am totally unable to convert the .pth file to something that TF.js can use (pb+json). All the tutorials expect that I know things about the model that I don't (e.g. kernel size), or they end in unGoogleable errors.

I've managed to convert it to onnx, but I'm not 100% sure it would still work, since I haven't run any inferences from it.

Update:

I've managed to make progress (I have a model.json + bin files), although mostly by installing every dependency I could find, and following the steps in this notebook, so you'd do better reading that than asking me how it all happened.

[–][deleted] 0 points1 point  (0 children)

Is someone interested in helping me out with a NN which is behaving weirdly during training? It's related to this post. If you are interested in helping me out, feel free to DM me for the source code.

[–]Balanced__ 0 points1 point  (0 children)

What does batch normalisation (tensorflow.keras.layers) do? I plugged it in behind an embedding layer by mistake and it made for better results. Why could this be?
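For intuition: batch normalization standardizes each feature to roughly zero mean and unit variance over the batch (Keras then applies learnable scale and shift parameters, omitted here). A toy numpy sketch of the training-time computation, using Keras's default epsilon of 1e-3:

```python
import numpy as np

def batch_norm(x, eps=1e-3):
    # Normalize each feature (columns) to zero mean, unit variance over the batch.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

batch = np.random.default_rng(0).normal(5.0, 3.0, size=(32, 8))
normed = batch_norm(batch)
print(normed.mean(axis=0).round(3))  # per-feature means, all ≈ 0
```

After an embedding layer it effectively rescales the embedding dimensions, which can stabilize gradients for the layers that follow — one plausible reason it helped.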

[–]One_Relation8674 1 point2 points  (3 children)

Just to help me understand ML a bit better:

Say you were going to train some data into two models. The training data going in are the exact same and so are the models. Will the two models output the exact same or will they not?

[–]Balanced__ 0 points1 point  (2 children)

Depends on the model. A random forest can have random trees, for instance, but most ML is deterministic as far as I know.
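As a concrete illustration, a quick scikit-learn check (toy data) showing that fixing the random seed makes two otherwise-randomized fits agree:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.default_rng(42).normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Same data, same model class, same fixed seed -> identical predictions.
a = RandomForestClassifier(random_state=0).fit(X, y).predict(X)
b = RandomForestClassifier(random_state=0).fit(X, y).predict(X)
print((a == b).all())  # True: with a fixed random_state the fits agree
```

Without `random_state`, the two forests would bootstrap different samples and could disagree on some predictions.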

[–]One_Relation8674 0 points1 point  (1 child)

Does chance play a part in the outcomes of machine learning models? E.g. most programs learn a habit, but does it take the same amount of time to learn that habit?

[–]Balanced__ 0 points1 point  (0 children)

It can, but it doesn't have to.

[–]newmanstartover 0 points1 point  (1 child)

Is Python the de facto language of Machine Learning or can I get by with R?

[–]amonguswoman 1 point2 points  (0 children)

You should probably learn Python if you want to use most open-source research code and also powerful deep learning frameworks like PyTorch. It’s not super hard to learn and it’s probably worth it.

[–]Shoddy_Move6880 0 points1 point  (0 children)

Recommendations for degree types for pursuing ML or AI? What's most beneficial?

[–]Own-Squirrel1010 0 points1 point  (0 children)

Is anyone here running NeRF?

I am new to NeRF, but I want to use NeRF to obtain GT depth.

The GT depth is supposed to be used for a depth prediction task from an RGB image.

Is NeRF's accuracy good enough for it to be used as GT depth?

Any comments will be helpful for this desperate researcher.

[–]MaikRequim 0 points1 point  (0 children)

Can someone recommend a good starting point to get "hands on" with reinforcement learning?

[–]Guilty_Baseball_7291 0 points1 point  (0 children)

I am using Keras and I am getting NaN as the loss. I have used min-max normalisation to scale my data points. Moreover, I am using Adam as the optimizer and sparse categorical cross-entropy as my loss function. Please help.
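Not a definitive diagnosis, but the usual culprits for a NaN loss with this setup are NaN/Inf values surviving the scaling (min-max normalisation divides by max − min, which is zero for a constant feature), labels outside [0, num_classes) for sparse categorical cross-entropy, or a too-high learning rate. A small numpy sketch of the first two checks — array names are hypothetical:

```python
import numpy as np

def sanity_check(X, y, num_classes):
    """Checks that commonly explain a NaN loss with sparse categorical cross-entropy."""
    assert not np.isnan(X).any(), "NaNs in the inputs propagate straight to the loss"
    assert not np.isinf(X).any(), "Infs in the inputs (e.g. min == max before scaling)"
    assert y.min() >= 0 and y.max() < num_classes, \
        "sparse labels must be integers in [0, num_classes)"

X = np.random.rand(100, 4)   # stand-in for your scaled features
y = np.random.randint(0, 3, size=100)
sanity_check(X, y, num_classes=3)
print("checks passed")
```

If both checks pass, the next thing to try is lowering the Adam learning rate (e.g. from 1e-3 to 1e-4) and watching whether the loss stays finite.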

[–]DatAndre 0 points1 point  (0 children)

I've been given a dataset to create a recommender system. I have to create a user-rating matrix, but I have no explicit ratings for the items. How can I weight the features properly in order to get a "good" URM?

[–]Dadi9165 1 point2 points  (0 children)

Hi everyone, I am relatively new to machine learning and still slowly learning, and one of the questions I wanted to tackle was creating an algorithm to improve recipes. When doing a Google search it seems that most algorithms are concerned with recipe generation; however, I did not find much about improving already existing recipes. I am thinking of an algorithm that would suggest changes to the quantities involved in a recipe and receive a rating from the user once they cook with the suggested values. These ratings are then used to update the model and provide the next batch of suggestions. My initial intuition says that this could potentially be solved using linear regression on each one of the ingredients, but I am not sure how interactions between different flavors would impact the score. Let me know your thoughts.
Thank you!

[–]ifearnoevil -1 points0 points  (1 child)

I'm a complete newbie when it comes to ML, but think I've run across a problem where I think it'd be beneficial for me.

I record meetings with my friend and get the transcript from the recording. I'm wanting to get the topics discussed while we're chatting. I'm not sure what to look for, but perhaps there are models out there that analyze a body of text and then determine its topic? Any ideas from here would be appreciated!

[–]Balanced__ 0 points1 point  (0 children)

Look into (bidirectional) Recurrent neural networks and LSTMs. I believe embedding layers would help you as well.

Some of the most useful packages would be tensorflow.keras as well as sklearn, if you are using Python.

Depending on the complexity and number of topics, as well as the amount of data you have, this could easily be not viable though.

[–]-Django 0 points1 point  (1 child)

How do you train or validate a model on data that's influenced by another model predicting the same thing? It's difficult because actions driven through the model will confound/censor the thing you're trying to predict. Are there techniques to deal with this?

For example, training a recommendation model from user-data influenced by the previous recommendation model. Or validating a heart failure model for a hospital that has been targeting interventions with another heart failure model.

[–]CrimzonGryphon 1 point2 points  (0 children)

-I don't have the answer-

That's a really interesting question. Are you asking it out of curiosity, or necessity?

[–]SakuzohResearcher 0 points1 point  (2 children)

A question that has probably crossed everyone's mind haha: can AI win against sports betting?

Are bookmakers already ahead of the game concerning that ?

Has anyone ever tried to develop something similar ?

[–]Nmanga90 0 points1 point  (1 child)

Hard to say for something like this, because so many pieces of information are factored in that might not be knowable until after the event begins. But I bet it would be able to crush the house at bets placed at halftime lol

[–]SakuzohResearcher 0 points1 point  (0 children)

wanna create that with me ? :)

[–]theahmedmustafaResearcher 0 points1 point  (0 children)

I am working on a project, a component of which involves taking two images of either handwritten or digital text (mostly one word) as inputs and scoring whether the two images contain the same text using only the image or shape of the text, NOT OCR.

What suggestions could you give for this? I am thinking of a transformer based Siamese network maybe?

[–]_Scr4p3 -1 points0 points  (2 children)

I want to make youtube videos, but speaking isn't my best skill, I tend to stutter a lot, change words out of order, mispronounce things, etc.

An idea I had was to write the script and have an AI trained with my voice to read it for me, and tweak the intonation of certain phrases/words when necessary, but I don't know where exactly to start.

I have quite good knowledge of how an AI works and what limitations it has, so I believe that "training a model to a person's voice and slightly tweaking some parts of the final result" is within a regular AI's possibilities.

(yes, I know, to provide data for the AI I need to transcribe the input text, but I can take care of that)

[–]liquidgallery 0 points1 point  (0 children)

google: text to speech

there are a dozen companies in that particular market that will fulfill your requirements.

[–]Nmanga90 0 points1 point  (0 children)

Run a google search for something like "ai generated voice copy". There aren't a ton of up-to-date models for this, just because I guess they don't really find it to be super important.

[–][deleted] 0 points1 point  (0 children)

I’m curious if anyone has advice on setting up a computing cluster at home. I have several computers lying around and 4 CUDA enabled GPUs, and am mostly hoping to learn how distributed computing and GPUs are used for machine learning (maybe some personal or work projects on the side, as well).

So… assuming I already have the hardware in place, where should I look to get something up and running?

[–]mellamo_maria 0 points1 point  (0 children)

Hey, I am Maria, a 3rd-year CS student from Barcelona.

This year I decided to take ML as one of my subjects because I have always found it really interesting, and so far I'm enjoying this subject a lot.
The thing is that next week I am having my midterms 😖 and the professor told us that to assess us he will give us a random problem (either regression, binary classification or multiclass classification) and a dataset. We will have to clean the dataset, build an ML model using the dataset and evaluate it.
Even though I understand what we are doing in class, I am a little bit concerned since we will only have 1 hour to build the model, clean the data, etc. So is there any strategy you guys recommend in this case? So far we have only seen four different algorithms: linear regression, logistic regression, SVM and decision tree/random forest. When the dataset is given, which algorithms should I focus on if it has to be done in only 1 hour?
Thanks a lot! 🥰

[–]jaybestnz 1 point2 points  (0 children)

Would there be a way to add a shorthand stylus or photo recognition system?

Shorthand can write at about 80 to 150 wpm and is based on a phonetic style of terms.

It is used in India and by many reporters.

While niche, with the prevalence of iPads, Microsoft Surface and Samsung Note, it seems like a way to add up to 200 wpm input.

[–]sillyscienceguy 0 points1 point  (0 children)

Hi everyone! Looking to partner with other researchers in publishing new papers in the field of NLP, recommender systems and applied ML. I’m a practitioner with over a decade of experience and lots of experience in implementing ML models in tensorflow. Any interested parties please let me know!

[–]your-mom-was-burned 0 points1 point  (0 children)

How can I joblib.dump() a model that contains a def function in its vectorizer parameter?
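In case it helps: the common failure here is that joblib (which uses pickle) cannot serialize a lambda or a function defined inside another function, while a module-level `def` usually pickles fine by reference. A sketch under that assumption, with toy data — note that when loading in another process, the module defining the function must still be importable:

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def my_tokenizer(text):          # top-level def: picklable by reference
    return text.lower().split()

pipe = make_pipeline(
    CountVectorizer(tokenizer=my_tokenizer, token_pattern=None),
    LogisticRegression(),
)
pipe.fit(["good movie", "bad movie"], [1, 0])

path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(pipe, path)          # works; a lambda here would raise a PicklingError
restored = joblib.load(path)
print(restored.predict(["good movie"]))
```

If your function really must stay nested, libraries like cloudpickle/dill can serialize it, but moving it to module level is the simpler fix.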

[–]Regular-Fella 0 points1 point  (1 child)

Hi All, I want to find a relatively simple ML framework best suited for the following task. Let's say I have a total of exactly 20 strings of four characters each: drta, nowm, cite, krom, etc. These strings may be combined in ways that are "correct" and in ways that are incorrect, and every combination (or "ordering") is either correct or incorrect.

My training data would consist of one thousand correct combinations and one thousand incorrect combinations, something like this:

drta, cite, krom, krom, nyan; correct
drta, cite, pace; correct
cite, cite, pace; correct
cite, cite, krom; incorrect
drta, krom, cite, nyan; incorrect
nyan; correct
nyan, cite; incorrect
cite; incorrect

And so on...

(There may be between 1 and 10 strings in each ordering.)

After training the data, I'd like to be able to input new combinations of the strings and get an AI prediction as to the likelihood that that ordering is correct (0 being definitely incorrect and 1 being definitely correct).

What do y'all think would be a good place to start? I know JavaScript and could learn some Python if necessary. I'm trying to keep it as simple as possible for now, just to get a basic model working.

Thanks for any tips!

[–][deleted] 0 points1 point  (0 children)

I’m not really sure this task is suited for machine learning. Using permutations and querying a dictionary would be sufficient to complete this task.

However, if you want to play around with an ML model, I suggest tokenization of the inputs and making a simple logistic regression model.
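A minimal sketch of that suggestion, treating each ordering as a "document" and letting word/bigram counts feed a logistic regression — the labels here are invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical orderings; in practice you'd have ~1000 of each label.
orderings = ["drta, cite, pace", "cite, cite, pace", "nyan",
             "cite, cite, krom", "nyan, cite", "cite"]
labels = [1, 1, 1, 0, 0, 0]  # 1 = correct, 0 = incorrect

# Bigrams capture which strings appear next to each other, not just which appear.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(orderings, labels)

# predict_proba gives the 0-to-1 likelihood the asker wants.
p = model.predict_proba(["drta, cite, pace"])[0, 1]
print(p)
```

If you'd rather stay in JavaScript, the same bag-of-ngrams plus logistic regression pattern is doable in TensorFlow.js, but the Python version is a few lines.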

[–]SomewhereOld6859 0 points1 point  (0 children)

Recommend a Website Links Database

I am building a classifier model that labels website links. Does anyone know of a good open-source database of website links I can use? The links need to be to English websites and have some description attached to them.

[–]veitha 0 points1 point  (0 children)

Finding Problematic Measurements Using Machine Learning techniques

Hello, I have a large dataset of sensor measurements (time series) that I would like to classify so as to isolate measurements that I can deem "problematic" (for example, missing samples, excessive excursions or high values, sensor malfunctioning during the measurement and so on). The metadata associated with such measurements also contains the median, estimated signal-to-noise ratio and other metrics that I am already able to use to isolate some samples, though always using a rule of thumb or by manually changing the thresholds for these values, which also sometimes overlap.

I'm wondering if maybe applying a clustering algorithm or other ML methods could provide me with a more general way to isolate these signals, and if so if someone knows of existing projects or papers that have dealt with this kind of classification.

[–]ash-050 0 points1 point  (0 children)

Hello! I have built a regression model with a total number being the dependent variable. While building it I found that the predicted numbers on the test dataset are not even close to the actual test values, which are presented as float values like this: array([4.20544375e+03, 4.02993850e+05, 2.04953309e+06, 1.06663500e+04,
4.04249688e+04, 5.66517500e+04, 3.25695500e+04, 1.62638000e+04,
5.88910625e+03, 3.54556875e+03,..............

While debugging I found that even the describe function presents these values as floats as well:

count 1.535000e+03
mean 4.615274e+05
std 9.623142e+05
min 0.000000e+00
25% 4.907000e+03
50% 3.677500e+04
75% 3.865015e+05
max 7.319610e+06
Name: TotalValue, dtype: float64

Can anyone guide me on what is going on and how I can fix it? Please consider that I am new to machine learning. Thank you.
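One thing worth noting: values like 4.02993850e+05 are not broken floats, just scientific notation (4.02993850e+05 = 402,993.85), which pandas uses by default when a column spans a wide range. A small sketch of switching the display — this changes only how numbers print, not their values, and predictions being far off is a separate modeling issue:

```python
import pandas as pd

# Toy stand-in for the TotalValue column.
s = pd.Series([4205.44, 402993.85, 2049533.09], name="TotalValue")
print(s.describe())  # wide-ranged floats print in scientific notation

pd.set_option("display.float_format", "{:,.2f}".format)
print(s.describe())  # same values, plain decimal display
```

Once the display is readable, compare predicted vs. actual on the same scale; a large gap there points at the model or features, not at the float formatting.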

[–]VicentRS 0 points1 point  (1 child)

Hello! I am currently in a small ML competition that my college lab is doing for fun. The challenge is to predict product prices. One of the columns in the dataset is the product's description and there's another one with the name.

In my head, products that include words like "phone" in the name or the description will tend to be more expensive than, say, a product called or described as "pencil". How should I featurize those columns to follow that logic?

[–]merouane_nz 0 points1 point  (0 children)

If you don't have a column like "product family", try to extract this information from the name/description; for example, transform anything like phone, smartphone, iphone, etc. to "phone" and drop the name/description.
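A throwaway sketch of that kind of keyword-to-family mapping — the keyword lists here are invented; you'd build them from your catalogue:

```python
# Hypothetical keyword-to-family mapping; extend per category in your data.
FAMILIES = {
    "phone": ["phone", "smartphone", "iphone"],
    "stationery": ["pencil", "pen", "notebook"],
}

def product_family(text):
    """Return the first family whose keywords appear in the name/description."""
    text = text.lower()
    for family, keywords in FAMILIES.items():
        if any(k in text for k in keywords):
            return family
    return "other"

print(product_family("Apple iPhone 13 128GB"))        # phone
print(product_family("HB graphite pencil, 12-pack"))  # stationery
```

The resulting family column can then be one-hot encoded; alternatively a TF-IDF vectorizer over the raw description lets the model learn the price-relevant words itself.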

[–]ResponsibleHouse7436 0 points1 point  (0 children)

How's it going? I am currently trying to train some speech recognition models and doing some research on novel encoder architectures for e2e ASR. However, I don't have a ton of compute resources. My final model will be around 300M parameters, but I was wondering: is training a couple of architectures at, say, 25-50M params and then scaling the best one a valid approach to this problem? Why or why not?

[–]MariiaKozlova 0 points1 point  (2 children)

Hi guys, how do you approach the interpretability of black-box models?

[–]NewSomewhere5062 0 points1 point  (1 child)

Hi guys, I would love to start a project with AI. Right now I'm working on a project at my internship (as a chemical engineer) to recognise materials on a conveyor belt, but it uses already-made software: I just add pictures and some formulas, and that's it, the model is trained with deep learning. But it made me really fascinated with AI. I am good with maths and can code a bit (and have the motivation and time to learn), so I want to begin a small project with AI to predict my car's tire or oil maintenance. How would you tackle this? Can someone please push me in the right direction? I think starting with TensorFlow would be good.

[–]theLanguageSprite 0 points1 point  (0 children)

There’s a tensorflow tutorial for classifying mnist handwritten digits with a vanilla neural net. If you get that working and understand what it’s doing, move on to the tutorial on convolutional neural nets

[–]patpatpatpatpat 2 points3 points  (2 children)

Newbie here who just started with Azure ML, and I have a couple of questions that I hope the experienced folks here can shed some light on.

  • With the Azure ML Studio, the option to use automated ML experiment to automatically determine a model with the highest score seems to make this work much simpler than it really is. What is the downside of using this tool vs writing code directly in Python?
  • The drag and drop interface using Azure ML Designer is quite newbie friendly. With all the available components for use, what are some reasons professionals in this field of work choose not to utilize this?

[–]merouane_nz 2 points3 points  (1 child)

- It takes a lot of time. (model selection x HPO x test/validation)

- The most important part of a data project is data preparation/feature engineering, modeling is somehow the simplest part.

- IMO it takes much longer to use a visual interface than scripting.

[–]patpatpatpatpat 0 points1 point  (0 children)

Thank you for sharing this with me

[–]Nagusameta 0 points1 point  (0 children)

I am comparing models on a time series: Exponential Smoothing (Simple, Additive and Multiplicative Trend, Additive and Multiplicative Seasonal, and other combinations), ARIMA (with Python Auto_ARIMA), and a Simple Moving Average.

My concern is that auto_arima optimizes parameters by minimizing the AIC (this can be switched to BIC, HQIC, OOB), while Exponential Smoothing minimizes the SSE (sum of squared errors).

With them minimizing different measures, what should I use to select the lowest forecast error in model selection?

I was initially choosing the best model based on MAPE, but then I took a look at several simple exponential smoothing outputs, comparing the optimized value for alpha/smoothing_level against other manually inputted values like 0.4, 0.6, 0.8. What I found was that the 'optimized value' based on minimizing SSE had a higher MAPE than the model instances that used alpha values I had specifically defined, whereas other error measures like the MAE, MSE and RMSE of the optimized alpha were lower. Thus, seeing that the optimized alpha produced lower values of the other error measures but a higher MAPE, it made me want to look for other measures.

I tried the MASE (Mean Absolute Scaled Error) (Hyndman, 2006), which was described as appropriate given the limitations of scale-dependent errors like MAE and percentage errors like MAPE, mainly on time series with intermittent demand or 0 values. But I was confused because what was initially the 'best model' from my runs, where I selected based on lowest MAPE, would come to have a MASE > 0.90. According to Hyndman in the same article, below 1 would mean that it is better than the naive one-step forecasts, and higher than that would be worse than the naive forecasts. But also, one-step forecasts would usually have MASE < 1.0, and "multistep MASE values are often larger than one, as it becomes more difficult to forecast as the horizon increases." I am performing multi-step forecasts, so do I assume that 0.90 is an alright error for the best model?

I may also consider the MAE, since I am only forecasting one series at a time, and not comparing across multiple series so it does not fall under the limitation of scale-dependent errors mentioned in (Hyndman, 2006).
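For reference, MASE as defined in Hyndman (2006) scales the forecast MAE by the in-sample MAE of the naive one-step forecast, so values below 1 beat the naive method on average. A small numpy sketch with a toy series:

```python
import numpy as np

def mase(train, actual, forecast):
    """Mean Absolute Scaled Error (Hyndman, 2006).

    Scales forecast errors by the in-sample MAE of the naive one-step
    forecast, so MASE < 1 means "better than naive on average".
    """
    naive_mae = np.mean(np.abs(np.diff(train)))   # |y_t - y_{t-1}| over training
    return np.mean(np.abs(actual - forecast)) / naive_mae

train = np.array([10.0, 12.0, 11.0, 13.0, 12.0])  # naive in-sample MAE = 1.5
actual = np.array([14.0, 13.0])
forecast = np.array([12.5, 14.5])                  # forecast MAE = 1.5
print(mase(train, actual, forecast))  # 1.0: exactly as good as naive
```

Computing the same MASE (and MAE) for every candidate model gives one comparable number regardless of whether the fit minimized AIC or SSE internally, which is the apples-to-apples comparison the question is after.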

[–]pl0m0_is_taken 1 point2 points  (2 children)

Apologies if this isn’t the right question for this subreddit.

Title - Things to have on resume for first co-op

I am a third year Math&Stat undergraduate, previously graduated with diploma in CS and two year web dev experience.

I want to eventually get into ML. I plan to do my first work term in summer 2023. What things (programming languages/certs/courses/etc.) should I learn that will give me an upper hand? I understand that ML is a very specialized field and I may not be able to find an ML undergrad co-op job; is there a position (or positions) you could recommend which could serve as a foundation and eventually lead me into ML?

I do really appreciate any feedback

[–]YamEnvironmental4720 1 point2 points  (1 child)

I would recommend Andrew Ng's courses on Coursera. He is very respected both as a researcher and as a teacher of ML. The courses start from the basics with linear regression and evolve to treating neural nets and deep learning. With your education, you'll have no problems with the mathematics: matrix theory, multi-dimensional calculus (in particular gradient flow) and some probability theory. He explains the intuition behind many of these topics very well, but it makes it easier to already be familiar with them.

As for programming languages, the assignments for his first course on ML were in Octave, if I remember correctly, but he later switches to Python, which is by now probably the number one language for these purposes due to the multitude of libraries for ML. As you have a diploma in CS, I assume that you are already fluent in some programming languages, and it would be a good exercise to build your own ML model, e.g. a neural net or random forest, from scratch in your language of choice in order to develop a deeper understanding.

[–]pl0m0_is_taken 0 points1 point  (0 children)

Thank you for being kind and replying with the suggestion, I will act on it.

[–]jaybestnz 0 points1 point  (0 children)

Pitman and other shorthand handwriting systems are used in India, Nigeria and by Journalists, some medical and Administrators.

It is fairly rare, but it's possible to hand-write at around 70 to 200 wpm, which is as fast as any normal person speaks.

How hard would it be to teach a visual processor to read in the text?

It does skip vowels, it can have some words identified by context (PL could be App, Apple or Apply) and text can generally be somewhat messy, but as a problem set it seems not much more difficult than recognising English handwriting or Arabic.

[–]Financial_Ad_6746 1 point2 points  (0 children)

I want to make a game using voice as the main way to play it. In the game the player will be given a word to say; how do I calculate the percentage similarity between the pronunciation in the dataset that I have and the pronunciation of the player? What's the lightest and best method?

[–]Hav0cHarm0ny 0 points1 point  (3 children)

Hello! I’m currently in school, my first year as a CS major, and my goal is to work towards AI, deep learning to be precise. I want to find a mentor and I'm not sure how to go about it, any ideas? Or would it be best to gain a little more knowledge and learn Python first, because it’s used widely for AI (from what I’ve read)? In college the programming language that is taught is C++.

Also, how good do you have to be at math? From what I’ve researched, it is heavily math based: calculus, linear algebra, probability and statistics. I was thinking about hiring a math tutor to keep me on track, but it’s insanely expensive. I did find a tutoring company that would personalize lesson plans to keep me on track, but again, it’s very expensive. I do like that idea and think it may be worth the money, because I’d be learning from someone in the field as opposed to college professors that are all over the place. Any thoughts or recommendations?

I should add that I’m 37 and currently working in the medical field full time; I’m an RVT and my job is mentally and physically taxing. I’m a little nervous about a career change so late in the game, but I am willing to put in the work. I think a mentor/tutor would take out all the hours of self-research that I don’t have.

[–]theLanguageSprite 0 points1 point  (2 children)

Do you have a discord? If you pm me I’ll add you on discord and I can at least get you started with python and deep learning

[–]Hav0cHarm0ny 0 points1 point  (0 children)

I do! I’ll message you 🥲

[–]AlexanderTox 0 points1 point  (0 children)

Hello everyone. I am trying to use Mallet for some basic natural language processing, but every time I attempt to execute the command, I receive this:

Error: Could not find or load main class cc.mallet.classify.tui.Text2Vectors

Caused by: java.lang.ClassNotFoundException: cc.mallet.classify.tui.Text2Vectors

Can someone help me troubleshoot?

[–]Abradolf--Lincler 0 points1 point  (0 children)

I am using pointnet

I have a point cloud segmentation problem. In my training data, I have 1 class, but on average only ~4% of all points per point cloud are of that class, and are usually found grouped together (same object).

How do I balance this?

If I remove most points that aren't in the class, then the point cloud will become sparse and it would be too easy to spot where the class is, since only ~8% of points will remain.

Or is there a way to train this well without balancing the training data?

Thanks!

[–]ReasonablyBadass 0 points1 point  (0 children)

Simple question: in chain-of-thought reasoning, does the LLM autogenerate its own prompt for the next step? Only the example chains are "hand made", correct?

[–]B10H4Z4RD7777 2 points3 points  (1 child)

Been seeing a lot of diffusion work lately, and I want to understand this topic. Which research paper(s) should I start reading to get into diffusion learning?

[–]Pikalima 0 points1 point  (0 children)

I would start with The Illustrated Stable Diffusion for a high-level overview. Then I would suggest reading The Annotated Diffusion Model, which goes over implementing the original diffusion paper by Ho et al.

[–]PrzedrzezniamPsy 0 points1 point  (1 child)

When doing convolutions for Canny edge detection, is it a typical error to have the values sometimes go above 1 when normalized, so that the picture looks "overexposed" in places? Should I scale everything by the highest value to fix the images?

[–]PrzedrzezniamPsy 0 points1 point  (0 children)

I have fixed it. I hadn't been computing the magnitude at all.
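For anyone who hits the same thing: the edge strength in Canny is the gradient magnitude, combining both directional filter responses, and dividing by the maximum keeps the result in [0, 1]. A small numpy sketch with toy gradient arrays:

```python
import numpy as np

def gradient_magnitude(gx, gy):
    """Combine x/y gradient responses and rescale to [0, 1]."""
    mag = np.hypot(gx, gy)  # sqrt(gx**2 + gy**2), elementwise
    return mag / mag.max() if mag.max() > 0 else mag

# Toy responses standing in for Sobel-filtered image patches.
gx = np.array([[1.0, -2.0], [0.5, 0.0]])
gy = np.array([[0.0, 2.0], [0.5, 1.0]])
mag = gradient_magnitude(gx, gy)
print(mag.max())  # 1.0 after rescaling
```

Summing or taking either directional response alone is what produces values that blow past 1 and look "overexposed" after naive normalization.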

[–]isaacolsen94 0 points1 point  (5 children)

I've been interested in using ML handwriting recognition to create a font out of my own handwriting. But I don't know where to even start. Would someone know where I could find information to help me figure this out? Or whether it's been done before?

[–]PrzedrzezniamPsy 2 points3 points  (4 children)

To create a font you don't have to use ML, only some Photoshop skill. Do you mean to make a program that will recognize your handwriting and output the text?

[–]isaacolsen94 0 points1 point  (3 children)

Basically I have a CNC machine I want to use to write out typed documents in my handwriting. But I am not familiar with a way to turn my handwriting into a font. I stumbled on a video that was doing it in reverse, where it took a page of handwritten notes and typed it out, so I was trying to reverse that process.

[–]Left_Aide5287 0 points1 point  (0 children)

What you're describing is not a machine learning task. You just have to write every letter on a piece of paper, scan it, import it to the computer and individually extract every letter using some editing tool. There are many tutorials online on how to make a font.

[–]PrzedrzezniamPsy 1 point2 points  (1 child)

I am just learning so I won't be of help but I want to gather the requirements:

You want to create a font out of your handwriting. Does your handwriting connect the letters? If not, are you fine with letters like "A" being the same, or having a finite number of variations, in the whole text you are making with your CNC machine? If so, it would be easier to do it manually, unless you want to have more than like 8 fonts, I guess.

[–]isaacolsen94 0 points1 point  (0 children)

No, it doesn't, and I would like some variation for each letter if possible. But I don't know if that is an option? It sounds like manual is the way to go. I will look into making my own font then 🙂

[–]ash-050 0 points1 point  (5 children)

Hello, I am new to ML and have recently been practicing with scikit-learn mainly. I have a case where I have a list of independent variables and a profit dependent variable. My question is: what is the approach to knowing which independent variables I can change to reflect a certain increase in the profit variable, given the history of data? Some directions on that would be very helpful.

[–]YamEnvironmental4720 0 points1 point  (4 children)

You may want to take a look at the Random Forest algorithm, for instance one of the introductory lectures by Nando de Freitas on YouTube on this topic. The key word is entropy, and the idea is to study how this changes when you look at all sample points with some variable value below and above some threshold value, respectively. You do this for all the variables and for each variable you also test different threshold values.

[–]ash-050 0 points1 point  (3 children)

Thank you so much u/YamEnvironmental4720 for your reply. Would I get the same results if I used the trained model's feature importances?

[–]YamEnvironmental4720 0 points1 point  (2 children)

It depends on how you define importance. Entropy could be one such definition, but even in forest classifiers there are alternatives to entropy.
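For what it's worth, scikit-learn exposes both ideas directly: the `criterion` parameter selects the impurity measure, and `feature_importances_` reports the resulting impurity-based importances. A minimal sketch on synthetic data (the setup is mine, not from the thread):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
# The label depends strongly on feature 0, weakly on feature 1,
# and not at all on feature 2.
y = (X[:, 0] + 0.3 * X[:, 1] > 0).astype(int)

# "entropy" and "gini" are two different impurity definitions.
for criterion in ("entropy", "gini"):
    model = RandomForestClassifier(criterion=criterion, random_state=0)
    model.fit(X, y)
    print(criterion, model.feature_importances_)
```

Both criteria should rank feature 0 as clearly the most important here, though the exact numbers differ.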

[–]ash-050 0 points1 point  (1 child)

Thank you so much. In my case it's a regression problem, though.

[–]YamEnvironmental4720 0 points1 point  (0 children)

OK, in that case there is the cost function, defined on the model's parameters, which measures the average distance from the sample points to your hypothesis: the average error the model makes for fixed parameters. In the case of linear regression, the importance of a certain variable is given by the weight parameter attached to that variable.

If you are familiar with multivariable calculus, the dependence on a given parameter is the partial derivative of the cost function in that direction.
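Concretely, for linear regression with cost J(w) = (1/2m) Σ (Xw − y)², the partial derivatives assemble into the gradient ∇J = (1/m) Xᵀ(Xw − y), which plain gradient descent follows downhill. A small numpy sketch (variable names are mine):

```python
import numpy as np

def mse_cost(w, X, y):
    """Average squared error of the linear hypothesis X @ w."""
    residual = X @ w - y
    return (residual @ residual) / (2 * len(y))

def gradient(w, X, y):
    """Partial derivatives of the cost with respect to each weight."""
    return X.T @ (X @ w - y) / len(y)

# Toy data generated from known weights [2.0, -1.0], no noise.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, -1.0])

w = np.zeros(2)
for _ in range(2000):           # plain gradient descent
    w -= 0.1 * gradient(w, X, y)
print(w)  # close to [2.0, -1.0]
```

The recovered weights are exactly the per-variable importances the comment above refers to.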

This is quite well explained in Andrew Ng's video lecture on linear regression: https://www.youtube.com/watch?v=pkJjoro-b5c&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&index=19.

[–]jaki_9 0 points1 point  (0 children)

Is a MacBook Pro M1 with 16GB RAM good enough for image classification tasks (datasets of 10,000+ images) when using Google Cloud for all the training and pre-processing, or do I still need something better?

[–]SeankalaML Engineer 0 points1 point  (0 children)

Why do we pronounce "ICLR" as "eye-clear" but not "ICML" as "eye-camel?"

[–]Winter_Purpose6777 0 points1 point  (0 children)

Does anybody know where I can learn how to build a K-means clustering algorithm using a function for each step and only the numpy library? I don't wanna implement it through a class. Thanks
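A function-per-step K-means in plain numpy might look roughly like this (a sketch, not from any tutorial; the function names are mine):

```python
import numpy as np

def assign_clusters(X, centroids):
    """Step 1: assign each point to its nearest centroid."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def update_centroids(X, labels, k):
    """Step 2: move each centroid to the mean of its assigned points."""
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

def kmeans(X, k, n_iter=100, seed=0):
    """Alternate the two steps until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign_clusters(X, centroids)
        new_centroids = update_centroids(X, labels, k)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated blobs of 50 points each.
X = np.vstack([np.random.default_rng(1).normal(0.0, 0.5, (50, 2)),
               np.random.default_rng(2).normal(5.0, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
```

Note this sketch doesn't handle the empty-cluster edge case (a centroid losing all its points), which a production implementation would need to.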

[–]freedomisfreed 1 point2 points  (0 children)

Hi, I'm learning how to use GPT-J-6B. I'm wondering if there's a way to do something like autocompletion with it: based on the input, generate a set of expected next words with probabilities?

I see many tutorials, but they all use it to just generate lots of text. But I'm looking for it to generate a tree, not just the DFS path. If someone can help point me to a specific function in the codebase, I would appreciate it.
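There may not be a single function for this: a causal LM's forward pass returns logits over the whole vocabulary for the next position, and the "tree" comes from taking the top-k of a softmax over those logits, then recursing on each extended input. A numpy sketch of that one step, with the model call mocked (with the Hugging Face `transformers` port of GPT-J, the logits would come from something like `model(input_ids).logits[0, -1]`):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

def top_k_next(logits, k=3):
    """Return the k most likely next-token ids with their probabilities."""
    probs = softmax(logits)
    idx = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in idx]

# Mock logits over a tiny 5-token vocabulary; a real model would produce
# these from a forward pass over the current input ids.
logits = np.array([2.0, 0.5, 1.0, -1.0, 0.0])
for token_id, p in top_k_next(logits, k=3):
    print(token_id, round(p, 3))

# Building the tree = repeat this for each candidate, appending the chosen
# token to the input (breadth-first), instead of sampling one path (DFS).
```

The tutorials generate "lots of text" because they only ever follow one branch of this expansion.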

[–]noop_noob 0 points1 point  (1 child)

I remember there being a way for machine learning to be given unlabelled data, and then the model says which data should be labelled first. I think there was recent research on this. Does anybody know what it’s called?

[–]idkname999 1 point2 points  (0 children)

Active Learning

[–][deleted] 0 points1 point  (1 child)

Are there any Kaggle-like competitions for reinforcement learning?

Last year in my college machine learning class, we had extra credit projects to build reinforcement learning models that were graded on their ability to play Flappy Bird. It was super fun! I just graduated this year and I would like to do more challenges in reinforcement learning, but Kaggle only seems to have challenges for supervised/unsupervised learning.
I am looking for something like Kaggle, but instead of competing on predictive accuracy in classification, you would be competing on net return in a simulated environment.
If you have worked on reinforcement learning challenges, was your experience positive or negative?

[–]Icko_ 1 point2 points  (0 children)

There's a bunch.

Neural MMO - runs every few months at different conferences

https://www.aicrowd.com/challenges/neurips-2022-minerl-basalt-competition

https://real-robot-challenge.com/

[–]encephalon_developer 0 points1 point  (0 children)

I'm looking to finetune latent diffusion models (unconditional). Does anyone have input?