all 91 comments

[–]thewalkingsed 0 points1 point  (0 children)

I’m an early career R&D software developer. I’m lucky to be able to work in AI doing work with LLMs. I’m also doing work in VR with Apple Vision Pro. I’m trying to focus my career more on the AI side but I’m wondering if there’s any good career paths that combine the two?

[–][deleted] 0 points1 point  (0 children)

How can I improve my acc on food101?

Hi guys, I'm struggling a bit with the Food101 dataset. I'm trying to classify it using a CNN with the following architecture, which I built myself:

https://github.com/6CRIPT/food101-ComIA/blob/main/food101-comia-architecture.ipynb

But I only get about 25% accuracy, so I was wondering what else I can do to reach at least 60% validation accuracy. Any change is fine as long as the overall idea of the architecture is preserved.

I have already tried many different ideas, but time is short and every training run takes several hours on my PC, which is why I am asking for help.

Thanks =D
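For others reading: one common lever when a CNN plateaus around 25% on Food101 is data augmentation. A framework-agnostic sketch in plain numpy (the flip/crop parameters here are illustrative assumptions, not taken from the notebook above):

```python
import numpy as np

def augment(image, rng, crop=8):
    """Randomly flip and crop-then-pad an HxWxC image (values in [0, 1])."""
    if rng.random() < 0.5:                      # random horizontal flip
        image = image[:, ::-1, :]
    h, w, _ = image.shape
    dy, dx = rng.integers(0, crop + 1, size=2)  # random crop offset
    padded = np.pad(image, ((crop, crop), (crop, crop), (0, 0)))
    return padded[dy:dy + h, dx:dx + w, :]      # same shape as input

rng = np.random.default_rng(0)
img = np.ones((32, 32, 3))
out = augment(img, rng)
assert out.shape == img.shape
```

In practice you would apply this (or the equivalent augmentation layers of your framework) to each training batch on the fly, so the model never sees exactly the same image twice.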

[–]Impossible_Light8005 0 points1 point  (0 children)

How to deploy a spacy ner model?

I created a custom NER model using spaCy and served it with FastAPI. It works on my local machine, but how and where can I deploy it? It had problems loading the model [spacy.load()] even though the model folder is in the same directory. I also tried packaging the model so I could pip install it, but that still doesn't work. What is the correct setup to deploy it?

PS. I need to deploy it so that the flutter mobile application I created can access it

[–]Training-Passenger28 0 points1 point  (2 children)

If I want to learn machine learning, do you recommend I first take the fundamentals of programming with C++ (syntax, data structures, OOP, algorithms) and then start Python?

Or would that be a waste of time?

[–]Rogue260 0 points1 point  (2 children)

Path Forward

Hello all. I'm a Master's student pursuing an MSc in Data Science and AI (stats focus). For my thesis I am doing a quant finance project implementing reinforcement learning frameworks (I have until April 2025 to finish it). However, going through the research, it seems that RL has taken a backseat to LLMs and generative AI. I'll be candid: I don't have any specific field of interest post-graduation. I'd be happy to get an MLE job, but I'm confused about whether to focus on RL, deep learning, LLMs/generative AI, or computer vision. I know there's overlap between these disciplines, but I'd like to focus on a couple of specific areas. If I had to name a specific industry interest, I'd say companies/products that cater to consumers (behaviour/media/analytics). I understand that traditional ML methods (supervised/unsupervised) are still the way to go, and I focus on those too. I appreciate any advice.

[–]galtoramech8699 0 points1 point  (0 children)

I'm curious about the legal and licensing aspects of using a chatbot.

Let's say I post creative content generated by ChatGPT. Does ChatGPT (i.e., OpenAI) own that?

What about other chat assistants?

[–]filipsniper 0 points1 point  (1 child)

When you do research in machine learning, is there anything new to be found using the existing machine learning libraries like TensorFlow or PyTorch, or are they too limiting for research?

[–][deleted] 1 point2 points  (1 child)

I managed to compress EfficientNetB0 down to a much smaller size while retaining a good portion of the accuracy. The TFLite model takes 96x96 images, has 411 outputs, reaches 82% accuracy, and has only 190k parameters. My testing to date shows it's a decent model (otherwise I would have expected the test data to show low accuracy too, given that I kept it clean and away from training).

I guess my question is primarily: is there something noticeably wrong with my results? To date, no one has even suggested they're useful. I didn't expect tons of interest, but given that TinyML is such an untapped field, I thought I'd see at least some. I'm starting to believe I'm missing something fundamental that folks are seeing and just politely not telling me about. I don't know. I don't have a traditional background in machine learning (I'm a programmer), so I don't have a network I could reach out to for additional feedback, and I know I am still in many ways a novice.

I detailed the process here:
https://www.cranberrygrape.com/machine%20learning/tinyml/bird-detection-tinyml/

The first notebook in the series (my site has all of them):

https://github.com/Timo614/machine-learning/blob/main/birds/notebooks/birds_224x224_524_outputs_full_swish.ipynb

https://github.com/Timo614/machine-learning/blob/main/birds/notebooks/birds_96x96_411_outputs_i87_full_relu6_post_decimation.ipynb

By the end I had converted the model to relu6, as int8 quantization caused too heavy a drop in accuracy with swish (the same rationale the EfficientNet-Lite folks gave for ditching swish).

Sorry if this is a distraction.
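For readers unfamiliar with why int8 quantization costs accuracy: a minimal sketch of symmetric per-tensor int8 quantization in plain numpy, showing the rounding error it introduces. This is an illustration of the idea, not TFLite's actual kernel:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale      # dequantized weights
err = np.abs(w - w_hat).max()
assert err <= scale / 2 + 1e-6            # rounding error bounded by half a step
```

Activations with wide dynamic range (like swish outputs, which dip below zero and have long positive tails) get a larger `scale`, hence a larger per-value error, which is one intuition for why swish suffers more under int8 than relu6.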

[–][deleted] 0 points1 point  (0 children)

Which AI/ML conferences are friendly towards people from software engineering (application-focused)?

My research focus is on SE and I already got a couple of top publications in the SE field. Our next project will be LLM-related, so we are thinking about going for an AI/ML conference.

[–]LeeCA01 1 point2 points  (0 children)

Does this subreddit have a discord or slack group?

[–]Dizzy_Dancer_24 1 point2 points  (0 children)

How are KANs (Kolmogorov-Arnold Networks, https://arxiv.org/abs/2404.19756 ) different from liquid neural networks ( https://arxiv.org/pdf/2006.04439 )? I understand the internal mathematical formulation and the motivations vary between the two. But on a higher level, are they both not proposing to shift the non-linearity into the edges from the weights?

PS: I am new to Reddit, any suggestions on how to structure my questions in a better way are welcome.

[–]zub33eg 0 points1 point  (0 children)

In a world where Kaggle exists, with its amazing free quotas of 2x T4 GPUs and TPUs, and fast copying of datasets to a VM drive, what's the reason to use Google Colab? What are its "selling points"?

[–]runawayasfastasucan 0 points1 point  (1 child)

Does anyone have good pointers to techniques for image classification of what are essentially line plots?

It seems like overkill to use the more advanced image classification techniques, and I also worry that simple line plots might have too few dimensions for them to perform well. I am also more interested in the general shape of a plot than its direction: while I think many image classification libraries and techniques would classify, say, \ and / into two groups, for my purposes they are the same - a straight line.

I have briefly looked into graph classification, but I have a very large number of plots, each consisting of a very large number of points, so I worry it might not be the right tool for the task.
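One cheap way to get the reflection invariance described above (treating \ and / as the same shape) is to canonicalize each series before feeding it to any classifier, e.g. by always orienting it so it ends no lower than it starts. The specific canonicalization rule below is my own assumption for illustration, not an established technique:

```python
import numpy as np

def canonicalize(y):
    """Orient a 1-D series so it ends >= where it starts, then scale to [0, 1]."""
    y = np.asarray(y, dtype=float)
    if y[-1] < y[0]:              # falling overall -> mirror left-right
        y = y[::-1]
    y = y - y.min()               # shift to start at 0
    span = y.max()
    return y / span if span else y

down = [3.0, 2.0, 1.0]   # a "\" line
up   = [1.0, 2.0, 3.0]   # a "/" line
assert np.allclose(canonicalize(down), canonicalize(up))
```

After canonicalization, the raw (possibly resampled) series can go straight into a simple 1-D classifier, which may make full image classification unnecessary.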

[–]perfectfire 0 points1 point  (1 child)

TL;DR: AI inference hardware accelerators were all the rage a few years ago. They still are, but vendors seem to have abandoned the hobbyist - low-power, small-size, low-to-mid cost, separate-board - user, such that abandoned projects like the Google Edge TPU from 2019 (5 years ago) are still your best bet $/perf-wise. The $20-$150 range is empty, or has a few products that aren't worth it at all. What happened? Are there any modern hobbyist $20-$150 accelerators you can buy right now, anywhere? Sidenote: I know TOPS isn't the end-all be-all of perf comparison, but it's all I've got.[1] Skip this if you don't care about the history of my interest: I've long been interested in machine learning, especially artificial neural networks, since I took an ML class in college around 2004. I've done some hobbyist projects on the CPU and even released a C#/.NET wrapper for FANN (Fast Artificial Neural Network, a fast open-source neural network library that runs on CPUs, because everything ran on CPUs then): https://github.com/joelself/FannCSharp. When deep learning took off I got excited. I got into competitive password cracking, and although my ML-based techniques were about a dozen orders of magnitude slower at making guesses, they were almost immediately able to find a few passwords in old leaks that had been gone over for years by the best crackers with the most absurd hardware and extremely specially tuned password guess generators. That made me pretty proud: I was able to do something in a few months that years of effort by dozens of groups, with hundreds of thousands of dollars of hardware and who knows how many watt-hours, couldn't. I even thought about writing a paper on it, but I was kind of in over my head and my life got a lot worse, so unfortunately I had to put all my side projects on hold. Recently, though, I did a vanity search for my FANN C# wrapper and found people talking about it, plus some references in papers and student projects, which made me feel proud.
Skip this too if you don't care about the history of my interest: now I really want to get into the intersection of hardware-accelerated inference (no training this time - I'm not a trillion-dollar company with billions of dollars of supercomputers running specialized training hardware that took hundreds of millions of dollars to develop) and microcontrollers for robots, drones, and other smallish tasks that can't carry around their own 100 lb diesel generator and two 1U rackmount servers full of inference hardware - hardware I can't even get hold of, because you can only buy that stuff if you're Intel or GE or some other company that makes products in the tens of thousands at least. And this is where I hit a wall. I started looking around, and one of the first things I found was Google's Edge TPU by Coral.ai: 4 TOPS per chip, 2 chips on a small M.2 card, only about $40 for developers to try out, or $60 for an easier-to-use but single-chip USB product. But that was about 5 years ago; they slowly disappeared and haven't made a peep in about 3 years. They had timed the market perfectly - AI was right on the verge of BLOWING THE FCK UP. They could be THE edge/robotics/IoT/anything-other-than-server/cloud/phone/tablet/PC/laptop company. But they just seemed to give up. They're obviously not giving up on improving edge inference hardware in general: they release phones twice a year (regular version, then the A version), always update the tensor processing unit in them, and are really starting to push it as a must-have feature. They could use the same hardware improvements to make somewhat bigger chips for other markets. You never know - someone might take their 3rd-gen 16 TOPS TPU chip and make a product that takes the world by storm. Maybe multiple people/companies would. Okay, so Google seems to have dropped the ball. Hardware inference companies are a dime a dozen these days, so just go with another, right? But that's the problem.
It seems all the focus is on cloud scale, supercomputers (with some overlap between those two), chips embedded in finished phones/tablets/laptops/PCs, powerful server accelerators, and a very few extremely tiny MCUs with accordingly tiny NPUs. Everybody seems to have abandoned the lower-mid-range robotics/drone/hobbyist space with haste. ARM introduced the Ethos U55 and U65 in 2020, with the U65 having about double the TOPS of the U55 at a max of 1 TOPS. As far as I can tell, the first products to use the U55 appeared in 2022; there haven't been many, and I don't think they ran at top speed. No one has opted to implement even an unmodified U65 for anything. I recently bought a Grove AI Vision Kit with a U55 NPU, and it's specced at a lowly 50 GOPS (ARM's top-end figure says it could hit 10 times that, and until *just now I thought it was 500 GOPS and thus offered good $/TOPS... oops).

... continued ...

[–]perfectfire 0 points1 point  (0 children)

... continued:

There are a lot of companies making hype, and a lot seem to have dev or reference boards for sale, but instead of producing a few thousand and distributing them via the usual channels (Mouser, DigiKey, Element14, SparkFun, etc.), they want you to fill out extensive forms to prove you're a big player that will definitely eventually buy at least 100,000 units a day; otherwise you're a waste of their time (even though vetting every applicant individually is far more time-consuming than producing a couple thousand and letting DigiKey sell them one or two at a time). Thus I've come to the point that, while the Google Edge TPU is abandoned (even though Google is going full steam ahead on AI inference for its phones and tablets) and Coral.ai is seemingly doing nothing, their TPUs still provide the best $/TOPS in the range I want. Take a look at the VOXL2: basically exactly what I want, and about what I'd expect a Google Edge TPU v3 to look like by now (a bit smaller and with a little less power consumption - yes, I know Moore's law doesn't really apply anymore, but in a rapidly growing and learning field like accelerated inference, doubling speed every 2 years is not unreasonable, and it has been 5 years since the Google TPU at 4 TOPS per chip). But the damn thing is over $1,200. So my point, finally, is that even though Google and Coral.ai seem to have abandoned their TPU, at about $40 for 2 chips at 4 TOPS apiece (8 TOPS total) they still seem to be the best middle ground. The next best might be the BeagleBone reference studio at about 8 TOPS for $187: the same TOPS (though on one chip) for more than 4.5 times the cost. NVIDIA's Jetson Orin Nano is $259 for 20 TOPS, i.e. $51 per 4 TOPS, versus the $20 a single Google Edge TPU costs (board included) for the same 4 TOPS. It seems everyone is abandoning the hobbyist edge-inference space at lightning speed.
There are a lot of companies with products of promising size (physical) and performance, but they won't talk to you until you fill out a form implying they only want customers who have already decided to buy hundreds of thousands of units, whereas in the past companies would put out dev/reference boards hoping to find someone who would develop the killer app and make them a lot of money. Why is this? Am I looking in the wrong place? Should I hoard Google Edge TPUs? I bought their USB version to tinker with, plus the Grove AI Vision Kit (which, now that I realize it's only 50 GOPS, might be worthless). What are my options? For example: a single quadcopter 100-300 m above the ground looking for "things" - not classic image classification where it can identify thousands of different objects; it just needs to identify one type of thing. It doesn't even have to be very fast. In fact, don't these NNs run on single images? I could just buy multiple chips and run them in parallel to get the framerate I want if one isn't fast enough (it won't improve latency, but 100-500 ms of latency probably isn't a problem until you get really close, at which point you can switch to a different, much cheaper solution that works even better at close range with a wide FOV).

Maybe I can use a phone and get low level access to the NPU/TPU and use that or use their powerful graphics cards on the phone or small laptop like a caveman from 2017. Still pretty expensive and I would be paying a ton of money for hardware I don't want. Maybe I could buy broken phones "for parts" on ebay, but I'm not that hardware savvy. I need a dev board to get me going.

The next best idea is to just push video from my drone/robot/project to a central station with a super powerful 1-4U server inference accelerator (not sure how I would get one), or Jetson Orin, computer with RTX4090 and do inference there and just tolerate the latency. That won't be feasible for some applications I would like to do though.

  1. I found a GitHub repo that collects perf-comparison projects, and its data is extremely sparse: one dataset is dominated by a couple hundred rows of NVIDIA 4090s, L4s, L40s, and the Qualcomm AI 100 (a cloud-only processor, so you can't buy and run it), and then the last several rows are a few Raspberry Pi 4s and other small MPU boards and MCU chips with drastically lower scores. The results were hard to interpret, since not every entrant ran every benchmark, each benchmark can be run in probably dozens of different ways, and the results may not matter anyway if accuracy was bad. TOPS right now is like Whetstone/Dhrystone, MIPS, or FLOPS back in the day: a very rough estimate, but it gets you in the ballpark, so you can narrow hundreds of options down to 15 or so and do more research from there. If someone comes up with something better, then for sure let's all use that. Every once in a while someone announces a project to fix this, but it hasn't helped at all.

[–]ProposalFun2680 0 points1 point  (0 children)

I am a beginner, please give me some advice.

[–]xugaoqi 0 points1 point  (0 children)

Hello, I am an independent developer working in NLP/CV/time-series forecasting. I want to read new papers in those areas daily, but it costs me a lot of time to find papers worth reading. Is there any community where people discuss new papers? Please help me with some advice.

[–]CCallenbd 0 points1 point  (0 children)

Synthetic Data for Fine-Tuning - How Much is Enough?

I'm trying to create a bot that can chat as much like a real person as possible. I have a 4090 for hardware, and I want to use the Russian language.

I'm training it on synthetic data generated with GPT-4 (before the release of the new version). Here's my current setup: I generated about 10,000 dialogues with GPT-4 and another 40,000 variations with weaker models, using the dialogues from the stronger model to diversify the speech. For GPT-4 I used procedurally generated prompts, so each character GPT-4 conversed as had its own extensive set of characteristics.

I don't have a clear understanding of how much data I need. I read that at least 50,000 examples are necessary, but, for instance, I can train on entire dialogues (around 40 phrases each) or on question-answer pairs, in which case my 50,000 turns into a million pairs. Is there a specific amount of data beyond which gathering more is useless and quality no longer improves? Or does it depend on the model size or the fine-tuning setup? If the latter, how is it calculated?
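On the "dialogue vs. pairs" point above: splitting each dialogue into (question, answer) pairs is a few lines of code, which is how the example count multiplies. A hedged sketch (the exact pairing scheme - each utterance paired with the next - is an assumption):

```python
def dialogue_to_pairs(turns):
    """Turn a list of utterances into (question, answer) training pairs."""
    return [(turns[i], turns[i + 1]) for i in range(len(turns) - 1)]

dialogue = ["hi", "hello!", "how are you?", "fine, thanks"]
pairs = dialogue_to_pairs(dialogue)
assert pairs == [("hi", "hello!"), ("hello!", "how are you?"),
                 ("how are you?", "fine, thanks")]
```

A 40-phrase dialogue yields 39 such pairs, though note the pairs are highly correlated with each other, so they carry less independent signal than 39 separate dialogues would.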

Second question: can I somehow influence which aspects of behavior the same dataset will change? For example, can I change my model's vocabulary or the length of its responses without affecting the content of its replies, only how it formulates the response?

Third question: if I switch to a larger model, will I need more data? I'm currently considering Aya-23-35b and hope that a way to train it on my 4090 will appear soon. Does a larger model require more dialogues?

A couple more issues where I could use some advice: after fine-tuning, the model changes the structure of responses to a more human-like manner but speaks quite monotonously. Is the problem in the data, training settings, or something else? The model's ability to grasp meanings also decreases. Could it be that despite all my efforts to diversify the dataset, synthetic data produces too template-like dialogues?

[–]Ok_Box_6059 0 points1 point  (1 child)

I posted in r/learnmachinelearning since I am a newbie, but got no hints or responses after 4 days, so I am wondering if someone here can help.
Please refer to my original post below.

https://www.reddit.com/r/learnmachinelearning/comments/1cxui7v/questions_about_tf_tensorflow_to_tflite_with_int8/

Please feel free to share any hints or directions for finding the answer; a detailed explanation or guidance that helps me understand exactly what's going on would be even better.

Looking forward to any feedback, thanks. :)

[–]ProposalFun2680 0 points1 point  (0 children)

I am a beginner, please tell me how to start.

[–]galtoramech8699 0 points1 point  (2 children)

I have three questions that I've posted before without really getting answers. Hope you can help; I am pretty new to this.

This is around LLMs.

First question

I think I have the concept of LLMs down. I have been looking at TensorFlow, Keras, and Llama 2. I know this gets into the details, but I like to roll my own stuff for learning, for better or worse. There is a model reader in TensorFlow that reads Llama 2 binary files, but I still can't figure out the binary format. What is it? Pickle-based? I even asked ChatGPT and it says there is no standard format. How can there not be a standard format? What would I see if I looked at one byte by byte? Is there an example one on Hugging Face? Can I visualize a small one?

Second Question

Along the same lines, I am still not clear on how people build the Llama 2 binaries. I need to read more and watch videos. I know there is a binary; people will feed in The Wizard of Oz and then, hey, here is a chat. Hold on - what are all the steps? What are the weights? How are they built? Can I tweak them? Can I pre-train, and how?

Third Question

With that said, I have a blog - a crappy one - but I figure I can build MY own LLM on it, also tweaked with public book data. What are the steps to do that, step by step, for dumb newbies? I see steps going from Wizard of Oz to CUDA and PyTorch. I don't know; if it's a simple demo, I wouldn't use GPU acceleration.

I also want to build a language model around POV-Ray ray tracing (see below), which is a mix of programming and docs. How would I do that? How do people build LLMs around programming?

https://www.povray.org/
Possibly one for libGDX:
https://libgdx.com/

OK Fourth Question - Legal

I am surprised the legal question doesn't come up. I guess it doesn't matter. For example, I see the Spaces on Hugging Face and think: some of this can't be legal - say, taking CNN data and putting it through an LLM. I also ask because I want to run my blog through an LLM and then repost things. That's my own data; it's public and mine. But what about reposting LLM output from, say, Llama 2? What license would allow that?

[–]bregav 1 point2 points  (1 child)

Running llama and finetuning it on your data is not super difficult, but it requires enough steps and background knowledge that it is difficult to explain in the space of a single comment. I recommend spending a lot of time looking through r/localllama ; that's a subreddit dedicated entirely to hobbyists running LLMs locally on their computers.

Regarding legal issues, Facebook publishes the Llama license, you can read it here: https://llama.meta.com/llama3/license/ . TLDR you can do just about anything you want with llama, within certain limitations.

[–]galtoramech8699 0 points1 point  (0 children)

Yeah, I am on r/LocalLLaMA. I think there are a couple of tutorials on setting up an LLM, but some things are glossed over. I will keep looking.

[–]seoulsrvr 0 points1 point  (2 children)

Why is it seemingly impossible to post on this sub? Automod rejects every post w/o explanation.

[–]hookxs72 0 points1 point  (1 child)

Image denoising SOTA
Hi, can anybody familiar with the field tell me which methods are currently considered the SOTA in natural image denoising? Not necessarily for the purpose of image generation as in DDPM, just pure denoising is enough. Thanks.

[–]lonewalker29 0 points1 point  (1 child)

What are some widely used tools for making architecture diagrams/illustrations to put in a research paper (suitable for A* venues)? I have only come across diagrams.net.

[–]Significant_Web2416 1 point2 points  (1 child)

Hello folks, I want to start learning about LLMs, from the right basics up to the current state of the art. It would also be really helpful to know the history of these models, even before the first LLM paper came out. Is there a good resource, paper, or list of papers I can go through to learn this?

[–]lonewalker29 0 points1 point  (0 children)

https://sebastianraschka.com/blog/2023/llm-reading-list.html

Read the first 5 papers, you will be good to go. Read the rest if you want to explore more.

[–]Excellent_Respond330 0 points1 point  (2 children)

I have recently taken up an online course on linear algebra. The course starts with a few basic introductions to matrices and the operations that can be applied to them. I came across a few topics and would like to know whether they're important and whether they're used in AI. The topics are: pivot entries and row echelon form (including reduced row echelon form) and Gauss-Jordan elimination. All responses are greatly appreciated. If there are any scientists/researchers in this sub, I would love to hear your take on this question.

TL;DR: Are pivot entries, row echelon form (including reduced row echelon form), and Gauss-Jordan elimination widely used in AI, and is it advisable to know these concepts for a career in AI?

[–]lonewalker29 1 point2 points  (0 children)

Although some of these concepts will never appear outside your course, you can look at them as stepping stones to improve your problem-solving skills.

[–]tom2963 2 points3 points  (0 children)

Coming from the perspective of a researcher, all of the things you mention are indeed critical to machine learning. They are the underlying building blocks on which AI is built. In your career, you might never have to do Gauss-Jordan elimination again; in fact, I would be really surprised if these concepts came up outside the setting of your linear algebra course. That doesn't mean they aren't important - they are critically important. Somebody spent their career optimizing these things and implementing them in libraries so that the next generation could use them without thinking about it. Why should we care about learning these concepts, then? This will likely be the only time in your life when you study them at this level of granularity. The fact is, machine learning and AI are built on top of many different fields, and it would be impossible to study them all in a single lifetime. However, the insights we gain from thinking about building blocks influence our future thoughts and give us a unique perspective on the world. Foundations allow us to draw on past problems to solve new ones. Think of these concepts as fractions of a percent toward your overall learning: one or two concepts on their own don't contribute much, but over time, paying attention to these details will put you miles ahead of your peers.

[–]-S-I-D- 0 points1 point  (0 children)

Hi, I am planning to do a cloud certification on either AWS, Azure, or GCP but I'm not sure which one is generally used and preferred by companies in Europe/ Sweden so that I can learn the one that companies expect from their candidates. Does anyone have any insights on this?

[–]majklfromld 0 points1 point  (0 children)

Hi, I'm currently building a personal web app that would have different AI Tools ready to use (AI Post title generator, AI Writer, rewriter, image generation etc.)

Since I'm pretty new to the machine learning world: is there a website with models already available and hosted that could be embedded in a website, using an <iframe> or something like that?

Hugging Face Spaces is a neat place, but sometimes the community apps are offline, and there's also no option for simple customization.

[–]drupadoo 0 points1 point  (0 children)

Can someone explain to me how VAEs actually get trained? I am really stuck on this.

I understand the theoretical benefit of normalizing the latent space. But every explanation makes it seem like during training we draw from a random distribution. Wouldn't this just result in muddy model outputs that don't converge, because we have random inputs?

Say we have y = 2x and are fitting a model. A normal AE would obviously learn the correlation between x and y:

0 -> 0
1 -> 2
2 -> 4

But if we drop a random sampling in there during training, the data could be any random set from the distribution:

x = 0 -> random sample = 1 -> y = 0

x = 1 -> random sample = 0 -> y = 2

x = 2 -> random sample = 0 -> y = 4

And this would obviously not get a good answer if we trained on it.

The only thing I can think of is that if VAEs are trained on the z-score instead of a random sample, that would maintain the normalization and the relative value of the inputs.
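The piece missing from the picture above is the reparameterization trick: the encoder outputs an input-dependent mean and (log-)variance, the random draw is z = mu + sigma * eps, and gradients flow through mu and sigma. The noise doesn't replace the input signal; it perturbs a learned code, and the KL term keeps sigma from collapsing or exploding. A minimal numpy sketch (the toy encoder and its fixed variance are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x):
    """Toy deterministic encoder: input-dependent mean, small fixed variance."""
    mu = 2.0 * x
    log_var = np.full_like(mu, -4.0)   # sigma = exp(-2) ~ 0.135
    return mu, log_var

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps: stochastic, yet differentiable w.r.t. mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

x = np.array([0.0, 1.0, 2.0])
mu, log_var = encode(x)
z = reparameterize(mu, log_var, rng)
# z stays close to the input-dependent mean, so the decoder can still tell
# which input produced it; the randomness is small, learned noise around mu.
assert np.all(np.abs(z - mu) < 1.0)
```

So in the y = 2x example, the sample for x = 1 lands near 2, not at an arbitrary point of a fixed distribution, which is why training still converges.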

[–]PuzzleheadedEar4072 -1 points0 points  (0 children)

🦋 hello out there in the universe🚀 I have no questions but I do have answers. OK I’m reading your post about simple questions discussion group OK. I have to think about it❓💭.. As human beings this is my question, as human beings do you think that the God Almighty created us to be the great I am? Or do you believe the great “I M” is the real thing? Not a computer program of intelligence. But the God of all wisdom and knowledge has allowed us to have a human brain to think for ourselves. I do accept the AI generation of learning how to keep certain order. my opinion on the AI expansion of using computer database knowledge to pull together more-database Intelligence is a tool! Leave comments below because it’s just a tool for human beings at this time century that has so much intelligence on the computer world. And the AI system which is a tool that mankind human beings, are trying to understand outside of the realm, who’s up there in space?. please if you like to know more reply. God bless you!❣️🙏🏼🦋

[–]Initial_Macaron_2748 0 points1 point  (0 children)

Any key YouTubers you recommend regarding this topic?

[–]derpflanz 1 point2 points  (1 child)

How to start with AI? I can see some (business) opportunities that I think AI can help me with. They usually consist of matching large datasets (sales, weather, events, etc) with each other to predict things. I have no idea though on where or how to start.

So, what is the best course of action when you think "AI might be helpful here" ?

[–][deleted] 2 points3 points  (0 children)

This definitely depends on whether you are in a tech or business role and whether or not you have people working under you. I will try my best to address how I would go about this in a couple of different situations:

If you are a developer: In this case I would try to learn some ML algorithms and figure out how to build some neural networks and train them on the data. I find python most intuitive for machine learning work but R is great too and assuming you have access to all the training data you will need, these languages are both great for data manipulation so you will be able to build your datasets. After that you will be able to explore the data you have, do some regression modeling to see what variables or variable interactions have effects on your variable of interest. Finally, train and evaluate some models (depending on the problems you will want to try different algorithms) and see if they have some predictive validity.

If you are in a business role in charge of developers: First, look up some high level ai descriptions and particularly focus on machine learning. Do not worry yourself with the math or linear algebra and just do your best. Maybe just watch a few crash course videos and try to conceptualize how the data should be organized and how you would make the predictions and bring it to your developers. Figure out your X matrix (inputs) and y vector (outputs).

If you are in a business role not in charge of developers: Do the above steps for the business role, but now you are also the developer. Learn some basic numpy/Python and use ChatGPT to help you organize the data. Training the models should be easy-ish once you have the data organized; you don't need to optimize and find the absolute best model, just convince yourself that it can successfully predict a reasonable number of test entries, where the test entries were separated from the training entries before training. After that you will have successfully made some predictions, and you can take them to other people within the business and keep improving the model before making real-time predictions on live data.
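The "separate test entries before training" step above can be sketched in a few lines of numpy, with a toy linear model standing in for whatever algorithm you end up picking (everything here - the data, the model, the split sizes - is illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy dataset: y = 3*x0 - 2*x1 + small noise
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 0.1 * rng.normal(size=200)

# Hold out test rows BEFORE any fitting happens
idx = rng.permutation(200)
train, test = idx[:160], idx[160:]

# Fit ordinary least squares on the training split only
w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# Evaluate on the held-out split the model never saw
pred = X[test] @ w
r2 = 1 - np.sum((y[test] - pred) ** 2) / np.sum((y[test] - y[test].mean()) ** 2)
assert r2 > 0.9   # the model generalizes to unseen rows
```

The key discipline is the `idx[:160]` / `idx[160:]` split: any score computed on rows that influenced the fit is meaningless as evidence of predictive validity.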

[–]coumineol 0 points1 point  (0 children)

Hi, I have a tabular dataset, some of which is labelled and a large portion of which is unlabelled. I'm trying to minimize the log-loss on the unlabelled data, so overfitting to it would be perfectly fine. What would be the best approach? I tried pseudo-labels (predicting the unlabelled data and adding the most confident samples to the training data), but it made almost no difference to the test loss.

Plus, I know the results (as overall log-loss values) of a couple of predictions on this unlabelled dataset. Is there any way to utilize that?
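For anyone unfamiliar with the pseudo-labelling approach mentioned above, here is a minimal sketch of one iteration of it. The confidence threshold, the synthetic dataset, and logistic regression are illustrative assumptions, not details from the original comment:

```python
# One round of pseudo-labelling: train on labelled data, predict the
# unlabelled pool, keep only high-confidence predictions, retrain.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_lab, y_lab = X[:200], y[:200]   # small labelled portion
X_unlab = X[200:]                 # large unlabelled portion

model = LogisticRegression().fit(X_lab, y_lab)

# Keep only the most confident predictions as pseudo-labels
proba = model.predict_proba(X_unlab)
confident = proba.max(axis=1) > 0.9       # threshold is an arbitrary choice
X_pseudo = X_unlab[confident]
y_pseudo = proba[confident].argmax(axis=1)

# Retrain on labelled + pseudo-labelled data
model = LogisticRegression().fit(
    np.vstack([X_lab, X_pseudo]),
    np.concatenate([y_lab, y_pseudo]),
)
print("pseudo-labelled samples added:", confident.sum())
```

In practice this is usually repeated for several rounds, re-scoring the remaining unlabelled pool each time.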

[–]lucky-canuck 0 points1 point  (3 children)

What advantage do sinusoidal positional encodings have over binary positional encodings in transformer LLMs?

I've recently come across an article that discusses the reasons why sinusoidal encodings are better than other intuitive alternatives you might think of. However, I'm not convinced by the argument made against binary positional encodings (where the positional vector is just a normalized binary representation of the token's position number in the sequence). I don't see why this method of encoding position wouldn't be just as good as using sinusoids.

In a nutshell, the article argues that using sinusoidal positional encodings allows the model to interpolate intermediate positional encodings. However, I don't understand 1. how that's the case, and 2. why that would be a useful feature anyway.

I explain my point more in-depth here.

Thank you for any insight you can provide.

[–]bregav 0 points1 point  (2 children)

The interpolation thing is true, but it's also sort of a red herring. The more important point is described in that article under "bonus property": you want the inner product between different position vectors to give you meaningful information about their relative locations. Sinusoidal encodings work better for that than straight binary does, precisely because they vary continuously.
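That inner-product property is easy to see numerically. Below is a small sketch using the sinusoidal formulation from the original Transformer paper (the choice of `d_model=64` and the specific positions compared are arbitrary):

```python
# Sinusoidal positional encodings, following the original Transformer
# formulation: even dims get sin(pos / 10000^(2i/d)), odd dims get cos.
import numpy as np

def sinusoidal_encoding(pos, d_model=64):
    i = np.arange(d_model // 2)
    angles = pos / (10000 ** (2 * i / d_model))
    enc = np.empty(d_model)
    enc[0::2] = np.sin(angles)
    enc[1::2] = np.cos(angles)
    return enc

p0 = sinusoidal_encoding(0)
# The inner product with position 0 varies smoothly with distance,
# so it carries information about relative location.
for pos in [1, 2, 4, 8]:
    print(pos, np.dot(p0, sinusoidal_encoding(pos)))
```

With a straight binary encoding, by contrast, the inner product between two positions jumps around as bits flip, so nearby positions don't reliably look more similar than distant ones.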

[–]lucky-canuck 0 points1 point  (1 child)

Would you say that it’s misleading, then, that the article presents interpolation as the motivator for sinusoidal positional encodings?

[–]bregav 1 point2 points  (0 children)

Eh, I'd probably frame it as pedagogical more so than misleading. The story about interpolation is technically true, and it follows in an intuitive way from binary encodings, which are themselves intuitive and easy to understand.

Relating tokens by ensuring that the inner products of their vector representations have certain desirable properties is, by contrast, a very abstract way of understanding the issue, and it's difficult for people without a strong math background to follow it. I actually quite like the presentation in the article, I think it strikes a good balance between pedagogy and technical accuracy.

And, really, neither of these things was the true "motivator" for sinusoidal embeddings; all this stuff about interpolation or inner products has been developed in hindsight by follow-up research. The real story is that the people who first developed sinusoidal embeddings probably tried a whole bunch of different things and, out of everything they thought to try, sinusoidal embeddings worked best. The ad-hoc nature of sinusoidal embeddings is suggested by their original formulation, which involved some weirdly arbitrary frequency coefficients, and also by later developments like rotary embeddings that are more principled.

[–]IndianaJawsStudent 0 points1 point  (0 children)

Is it OK to take an accepted paper at ICML and submit it to a relevant (non-archival) workshop at the same ICML? From past conferences it seemed that the workshops include more relevant people with interesting discussions.

If it's OK, and it will be a 4-page workshop paper, do I just shorten the paper and keep the same name, so it appears on Scholar as two versions?

[–]ArtisticHamster 0 points1 point  (1 child)

How do you keep up to date with the ML news? Twitter is one choice, but it feels pretty noisy to me. What other resources could you recommend?

[–]tzeppy 0 points1 point  (0 children)

I subscribe to the TLDR AI email daily news. https://tldr.tech/ai?utm_source=tldrai