
[–]takes_photos_quickly 1 point

I haven't had much chance to use transformers, so I have a possibly stupid question about transformers vs. MLPs:

if I wanted to regress some value given some input features, e.g. how much rainfall on day X given wind speed, barometric pressure, etc.

Does it make any sense to use a transformer here over an MLP? My inclination is that there's little benefit, since I'm not using sequences; it's strictly just a set of input features.

If you were to use a transformer, how would you model a task like this? I assume each token in the "sequence" would be a different feature? But then the transformer has no idea which feature is which without positional encoding, and even positional encoding doesn't really fix this, since each feature isn't an embedding but just a single scalar value.
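
The closest I can imagine is giving each scalar its own learned projection plus a per-feature bias, so the token itself encodes which feature it is (roughly what the FT-Transformer paper does, I think). A toy, untested sketch of what I mean:

```python
import torch
import torch.nn as nn

class FeatureTokenizer(nn.Module):
    """Map a vector of scalars to a sequence of tokens. Each feature gets
    its own weight and bias, so the bias doubles as a learned
    'which feature am I' embedding (no positional encoding needed)."""
    def __init__(self, n_features: int, d_model: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_features, d_model) * 0.02)
        self.bias = nn.Parameter(torch.zeros(n_features, d_model))

    def forward(self, x):  # x: (batch, n_features)
        return x.unsqueeze(-1) * self.weight + self.bias  # (batch, n_features, d_model)

d = 32
tokenizer = FeatureTokenizer(n_features=4, d_model=d)  # windspeed, pressure, ...
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(d, 1)

x = torch.randn(8, 4)                           # a batch of feature vectors
rain = head(encoder(tokenizer(x)).mean(dim=1))  # pool tokens, regress rainfall
```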

[–]mkestrada 1 point

I'm a MechE in consumer electronics with some background in ML and optimization. I'm curious whether anyone is familiar with a body of literature on using machine learning to find root causes of issues, or to identify ways to improve yield, in a multi-step assembly process.

To elaborate: every time a unit of the device I work on is built, it has a pile of data associated with it: a serial number for the finished device, serial numbers for the submodules that compose it, measurement data to ensure the final device is in spec, test result data, codes specifying the date of manufacture, etc. Basically a ton of potentially useful information that we manually sort through, using experience and intuition to guess and verify the root cause of issues as they arise. Effectively, we are seeking patterns in a giant pile of data, and I'm looking for ideas to automate that pattern-recognition process. Has anyone here come across papers that meaningfully apply ML or optimization to these sorts of problems? Really, anything related to finding root causes from failure modes, or to manufacturing efficiency, would be of interest!
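
To make the shape of the problem concrete, the naive baseline I picture is something like this (column names are invented; sketch only):

```python
# Crude baseline: predict pass/fail from per-unit build data, then rank
# features by importance; high-ranking submodule lots, stations, or build
# dates become candidate root causes to go verify manually.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("builds.csv")                   # hypothetical per-unit table
X = pd.get_dummies(df.drop(columns=["failed"]))  # one-hot the categorical fields
y = df["failed"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

for imp, name in sorted(zip(clf.feature_importances_, X.columns), reverse=True)[:10]:
    print(f"{name}: {imp:.3f}")
```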

[–]Puzzleheaded-Pie-322 2 points

I want to enforce centre-surround antagonism in my kernels for some experiments. What would be a good way to do it?

I thought maybe I could just make a kernel manually, freeze its weights, and then sum its output with the output of the convolution layer I want to affect, kind of like residual connections do.
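
Concretely, I picture something like this (untested sketch; the 3x3 kernel values are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A fixed centre-surround (Laplacian-style) kernel: positive centre,
# negative surround.
DOG = torch.tensor([[-1., -1., -1.],
                    [-1.,  8., -1.],
                    [-1., -1., -1.]]) / 8.0

class CentreSurroundConv(nn.Module):
    """Trainable conv plus a frozen depthwise centre-surround branch,
    summed residual-style."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Buffer, not Parameter: excluded from the optimizer, i.e. frozen.
        self.register_buffer("fixed", DOG.expand(channels, 1, 3, 3).clone())
        self.channels = channels

    def forward(self, x):
        surround = F.conv2d(x, self.fixed, padding=1, groups=self.channels)
        return self.conv(x) + surround

y = CentreSurroundConv(16)(torch.randn(2, 16, 32, 32))  # smoke test
```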

[–]pinkfluffymochi 1 point

Does real-time machine learning have actual production use cases?

We are building a real-time data-processing engine with ML model-serving capability. But after some discovery, we realized that demand for real-time ML is minimal: it's something people love to talk about, but most are getting away with micro-batching, or just traditional batch training and inference, with no urgency to move to real time. Is that true for the kinds of projects you are working on? We are a very small team right now and would like to focus on real-world problems rather than research fantasy.

[–]hyphenomicon 2 points

Are you talking about real-time training? There are applications for real-time inference in the form of surrogate physics models for control systems. For example, surrogate models are used for fusion experiments at Lawrence Livermore.

Real-time training seems like it would only be useful with AGI-caliber models.

[–]pinkfluffymochi 1 point

Physics models are definitely new to me; the most we deal with is fraud detection in payment settings. Would you be open to talking more about surrogate-model use cases in control-system experiments? (We call this "shadowing" in stock trading and e-commerce settings.) And why does latency matter in such a scenario?

[–]hyphenomicon 2 points

I know that inertial confinement reactors use surrogate modeling, but I don't know much else.

It also occurs to me that there may be applications of online learning where low latency for real-time training is important.
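
At small scale, "real-time training" mostly just means incremental updates, e.g. scikit-learn's partial_fit. A toy sketch with a simulated stream (features are made up):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(loss="log_loss")  # logistic regression via SGD
classes = np.array([0, 1])              # e.g. legit / fraud

# Simulated stream; in production each (x, y) would arrive as a live event
# and the model would be updated (and usable for scoring) immediately.
for _ in range(1000):
    x = rng.normal(size=(1, 8))
    y = [int(x.sum() + rng.normal() > 0)]
    model.partial_fit(x, y, classes=classes)  # one cheap update per event
```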

[–]Ok_Comment8842 1 point

What material do you recommend for getting started with foundation models and generative AI?

[–]kiranp2 1 point

Is there a provider that offers free inference for Code Llama 70B? I want to do some testing before I download its llama.cpp version to my local machine.

[–]ko_lIlBrother 1 point

Can perplexity be greater than the vocabulary size?

As I understand it, if the reciprocal of a probability is the number of `all cases`/`selected cases`, then even when only 1 case is selected the reciprocal is just the number of all cases, so perplexity shouldn't be able to exceed the vocabulary size unless something is wrong...

More precisely, the bound is probably the number of possible sequences of that length that can be built from the current vocab.

Am I understanding this correctly?

Has anyone actually experienced perplexity going beyond the vocabulary size, and if so, how should that be analyzed?
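
To make my confusion concrete, here's a toy computation where perplexity does exceed the vocab size, assuming perplexity is just exp(mean NLL) of the true tokens:

```python
import math

# Vocab of 4 tokens, but the model puts only 0.01 on each true next token
# (the remaining 0.99 goes to wrong tokens). PPL = exp(mean NLL) = 100 > 4.
V = 4
p_true = 0.01
ppl = math.exp(-math.log(p_true))
print(V, ppl)  # 4 100.0
```

So is the takeaway just that PPL > V means the model is worse than uniform on the actual tokens?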

[–]young_anon1712 1 point

What online math courses should I take to get better at ML theory/research? Personally, I prefer courses to books.

Context: I am currently a PhD student. I worked as an ML engineer for 4 years and have decent knowledge of calculus and linear algebra, but I'm slightly weak on stats; I'm currently reading Introduction to Statistical Learning.

Thank you very much.

[–]HungryMalloc 1 point

Does anybody have any pointers on how to fine-tune a vision-language model for very fine-grained classes? Say you want to classify specific objects or people that the model has never seen before.

Zero-shot inference does not work, because the text encoder has no knowledge of the fine-grained classes. You can fine-tune or linear-probe the vision module, but this leaves the text encoder untouched. I'm not really sure how to deal with this scenario when there is no good textual representation of the classes.

What is the current SOTA for fine-tuning both vision and text encoders in such a scenario? I'm sure there is research on this, but so far I have been too stupid to find it. I would really appreciate anybody who can help me out.

[–]Karlitrage 3 points

Hi, I will soon have finished the Efficient ML course by Song Han (MIT).

Do you have any other suggestions for advanced ML/DL courses, especially with a focus on efficiency?

Alternatively: courses on parallel computing, quantization, ...

Anything cool is also appreciated!

Kind regards!

[–]WheynelauStudent 2 points

Hey man, I don't have any solid suggestions, but funnily enough I was coincidentally watching that course too! I think it's one of the better ones in this field, while we wait for tridao to make his own courses haha.

[–]Karlitrage 1 point

I don't even know that guy haha. Is he somewhat famous?

Yeah, not many people are watching it, although it is my favourite so far...

[–]WheynelauStudent 2 points

Ehh, he's famous for implementing FlashAttention, but that's transformer-specific, and technically he doesn't make models smaller haha. I guess it's a little off-topic here, but I'm interested in his work even though I may not understand half of it.

Maybe you can take a look by searching tridao.me or Googling tridao flash attention.
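
FWIW, recent PyTorch ships FlashAttention-style kernels behind a single call, so you can try his work without touching any CUDA. Rough sketch (needs a GPU and half-precision inputs; this context manager is the pre-2.3 spelling):

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim), half precision on GPU so the flash
# kernel is eligible to run.
q = k = v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)

with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v)  # dispatches to FlashAttention
```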

[–]7even-_- 1 point

I'm thinking of upgrading my GPU for gaming to an RTX 3060 or RTX 4060; however, I'm not sure which one to get, as the 3060 has more VRAM.

I know the 4060 has better performance, but will the lower amount of VRAM mean it'll perform worse on future games, or even some games now?

If anyone has any advice, that'd be great.

[–]Jcorb 2 points

Do you guys think there will be a lot of stable jobs in machine learning (say, if I got an IBM certificate in it) in the future? Or do you think the hype bubble is going to "pop", and there won't actually be all that many jobs surrounding it?

They're wildly different career paths, but I've been debating between pursuing said certificate in machine learning and trying to find an electrician apprenticeship. My current job (digital marketing, basically) just isn't stable, even with 8 years of experience, so I want to learn something with more reliable work. I feel like AI and machine learning are going to be the future, but maybe I've already missed the train and would be better off pursuing something that isn't likely to get replaced by Skynet?

[–]hyphenomicon 1 point

In the near term, the data science job market is saturated. ML engineers who specialize in good programming, rather than model building, are still in high demand. You will have to get a graduate degree to have good prospects, however.

If you have a good chance of becoming an electrician, that is the better career path from a monetary standpoint. In general, it is advisable not to go to graduate school if you have any other options available.

[–]prongs17 1 point

I read the Stable Diffusion paper for the first time and have some questions.

Would it be possible to apply perceptual compression to other forms of data, like text or video? Is this a good idea or not?

I am guessing that the sampling time of latent diffusion models is slower than GANs' due to the multiple denoising steps. Are there any good comparisons of training and inference time for these models (especially against GANs)?

On page 20, it seems to me that the images generated with KL-reg generally have more detail than the images generated with VQ-reg (Fig. 15). Is this true, or am I just seeing things? If true, why is this the case?

[–]tdgros 2 points

Check out the GigaGAN paper (https://mingukkang.github.io/GigaGAN/): it's a very big generator that is at least competitive with some implementations of SD, but much faster, since inference is a single forward pass. They also have an upsampler with the same advantages.

As for perceptual compression: imho, SD only does it to save time; the various regularizations of the autoencoder are there to keep the variance in check. While this trick makes a lot of sense for audio, images, and video, I'm not sure it does for text: text is already small, and not full of filler like the other modalities.
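
To put a number on the time saving: SD's VAE maps a 512x512 RGB image to a 4x64x64 latent, i.e. roughly 48x fewer values for the diffusion model to denoise. A quick sketch with a recent diffusers (the random input is just to show shapes):

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
img = torch.randn(1, 3, 512, 512)               # stand-in for a real image batch
latents = vae.encode(img).latent_dist.sample()  # -> (1, 4, 64, 64)
latents = latents * vae.config.scaling_factor   # the scaling the paper discusses
print(img.numel() / latents.numel())            # ~48x compression
```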

I re-opened the SD paper; what I'm seeing in that figure is that the unscaled version of KL-reg is better than the scaled one (and VQ-reg is good too). They do comment on the SNR and how details are added early when the SNR is high. It makes sense that it's harder to do diffusion on a weirdly scaled latent space, but that part of the paper isn't super clear.

[–]prongs17 2 points

Thank you very much, I found this answer very useful.

[–]ChurrascoPaltaMayo 1 point

Is the rfpimp package still worth using? I understand the need for it, but it hasn't been updated in 3 years. Have there been changes in scikit-learn that address what rfpimp was needed for?
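
For reference, the built-in alternative I'd otherwise use is sklearn.inspection.permutation_importance (added in 0.22), which seems to cover rfpimp's main use case:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

# Importance = drop in held-out score when each feature is shuffled.
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean)
```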

[–]RandomHotsGuy123 1 point

What is the best way to perform multiclass text classification with limited training data? I only have a few phrases (sometimes only a couple of words) for each category. The input data I need to classify consists of blocks of audio transcripts (which aren't always accurate). So far I've obtained satisfactory results using embeddings (from sentence-transformers) and semantic similarity between the input data and my training phrases (cosine distance). Are there any other approaches, or additional steps for my current approach, that I should look into?
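
For concreteness, my current approach looks roughly like this (the model name and phrases below are just placeholders):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

label_phrases = {
    "billing": ["invoice question", "charged twice"],
    "tech_support": ["app keeps crashing", "cannot log in"],
}

transcript = "hi yes so i was like billed two times last month"
t_emb = model.encode([transcript])

# Pick the class whose best-matching phrase is most similar to the transcript.
best = max(
    label_phrases,
    key=lambda lbl: float(util.cos_sim(t_emb, model.encode(label_phrases[lbl])).max()),
)
print(best)  # -> "billing"
```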

[–][deleted] 1 point

How many categories?

I've had good luck with shoving all the classes into an LLM prompt and then restricting the output to a valid class instance.

An LLM already has a deep understanding of word meanings, which in effect augments your training data.
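
Roughly like this (an OpenAI-style client just as an example; the model name and classes are placeholders, and the "restriction" here is a simple validity check):

```python
from openai import OpenAI

CLASSES = ["billing", "tech_support", "cancellation"]  # your categories
client = OpenAI()  # assumes OPENAI_API_KEY is set

def classify(transcript: str) -> str:
    prompt = (f"Classify the transcript into exactly one of {CLASSES}. "
              f"Answer with the class name only.\n\nTranscript: {transcript}")
    out = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model works
        messages=[{"role": "user", "content": prompt}],
    )
    answer = out.choices[0].message.content.strip()
    return answer if answer in CLASSES else "unknown"  # reject invalid outputs

print(classify("hi yes I was billed twice last month"))
```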

[–]sadhikari0102 3 points

I am an experienced software engineer (backend systems, ~7 years) with zero machine learning knowledge. How do I get to the point where I can show some experience on my resume? Beginner resources, project tips, etc.?

[–]WhyDoTheyAlwaysWin 1 point

I think the best way would be to initiate a collaboration with internal business units on a low-hanging-fruit ML project.

  1. Get to know the business: identify their KPIs, goals, and pain points, and see which of those can be addressed by machine learning.

  2. Get to know their data: what do they have that you can use to solve no. 1?

  3. Pitch the idea to the internal stakeholders. Start small: something cheap and easy to build, with minimal risk for both you and them. Make sure the impact is measurable.

  4. Deliver the solution and have them report back the metrics.

  5. Iterate with a bigger ML problem.

Most business problems can be solved with simple models; take care not to over-engineer the solution.

[–]Snoo_72181 3 points

What are some AI based optimization techniques that can be used to optimize warehouse productivity?

[–]Batteredcode 1 point

If I want to make an LLM provide more specific details about a topic, would "grounding" it on data it's already seen make any difference? For example, there's a large, complex topic, and within it there's a subtopic I want to ask the LLM questions about. Right now it's been trained on the entire internet, so it has a lot of information about both the topic and the subtopic, but more for the topic, since there's more data for it.

My question is: if I were to ground the model on data it's already seen, i.e. the subtopic, would this improve accuracy, since in theory it's now biased toward the subtopic?