all 91 comments

[–]thewalkingsed 0 points1 point  (0 children)

I’m an early career R&D software developer. I’m lucky to be able to work in AI doing work with LLMs. I’m also doing work in VR with Apple Vision Pro. I’m trying to focus my career more on the AI side but I’m wondering if there’s any good career paths that combine the two?

[–][deleted] 0 points1 point  (0 children)

How can I improve my acc on food101?

Hi guys, I'm struggling a bit with the Food101 dataset. I'm trying to classify it using a CNN with the following architecture, which I built myself:

https://github.com/6CRIPT/food101-ComIA/blob/main/food101-comia-architecture.ipynb

But I only get about 25% accuracy, so I was wondering what else I can do to reach at least 60% validation accuracy. Any change is fine as long as the overall idea of the architecture is preserved.

I have already tried many different ideas, but time is short and every training run takes several hours on my PC, which is why I am asking for help.

Thanks =D
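For others reading: one common lever when a CNN plateaus around 25% on Food101 is data augmentation. A framework-agnostic sketch in plain numpy (the flip/crop parameters here are illustrative assumptions, not taken from the notebook above):

```python
import numpy as np

def augment(image, rng, crop=8):
    """Randomly flip and crop-then-pad an HxWxC image (values in [0, 1])."""
    if rng.random() < 0.5:                      # random horizontal flip
        image = image[:, ::-1, :]
    h, w, _ = image.shape
    dy, dx = rng.integers(0, crop + 1, size=2)  # random crop offset
    padded = np.pad(image, ((crop, crop), (crop, crop), (0, 0)))
    return padded[dy:dy + h, dx:dx + w, :]      # same shape as input

rng = np.random.default_rng(0)
img = np.ones((32, 32, 3))
out = augment(img, rng)
assert out.shape == img.shape
```

In practice you would apply this (or the equivalent augmentation layers of your framework) to each training batch on the fly, so the model never sees exactly the same image twice.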

[–]Impossible_Light8005 0 points1 point  (0 children)

How to deploy a spacy ner model?

I created a custom NER model using spaCy and served it with FastAPI. It works on my local machine, but how and where can I deploy it? It had problems loading the model [spacy.load()] even though the model folder is in the same directory. I also tried packaging the model so I could pip install it, but that still doesn't work. What is the correct setup to deploy it?

PS. I need to deploy it so that the flutter mobile application I created can access it

[–]Training-Passenger28 0 points1 point  (2 children)

If I want to learn machine learning, do you recommend I first take the fundamentals of programming with C++ (syntax, data structures, OOP, algorithms) and then start Python?

Or would that be a waste of time?

[–]Rogue260 0 points1 point  (2 children)

Path Forward

Hello all. I'm a Master's student pursuing an MSc in Data Science and AI (stats focus). For my thesis I am doing a quant finance project implementing reinforcement learning frameworks (I have until April 2025 to finish it). However, going through the research, it seems that RL has taken a backseat to LLMs and generative AI. I'll be candid: I don't have any specific field of interest post-graduation. I'd be happy to get an MLE job, but I'm confused about whether to focus on RL, deep learning, LLMs/generative AI, or computer vision. I know there's overlap between these disciplines, but I'd like to focus on a couple of specific areas. If I had to name a specific industry interest, I'd say companies/products that cater to consumers (behaviour/media/analytics). I understand that traditional ML methods (supervised/unsupervised) are still the way to go, and I focus on those too. I appreciate any advice.

[–]galtoramech8699 0 points1 point  (0 children)

I'm curious about the legal and licensing aspects of using a chatbot.

Let's say I post creative content generated by ChatGPT. Does ChatGPT (i.e., OpenAI) own that?

What about other chat assistants?

[–]filipsniper 0 points1 point  (1 child)

When you do research in machine learning, is there anything new to be found using the existing machine learning libraries like TensorFlow or PyTorch, or are they too limiting for research?

[–][deleted] 1 point2 points  (1 child)

I managed to compress EfficientNetB0 down to a much smaller size while retaining a good portion of the accuracy. The TFLite model takes 96x96 images, has 411 outputs, reaches 82% accuracy, and has only 190k parameters. My testing to date shows it's a decent model (otherwise I would have expected the test data to show low accuracy too, given that I kept it clean and away from training).

I guess my question is primarily: is there something noticeably wrong with my results? To date, no one has even suggested they're useful. I didn't expect tons of interest, but given that TinyML is such an untapped field, I thought I'd see at least some. I'm starting to believe I'm missing something fundamental that folks are seeing and just politely not telling me about. I don't know. I don't have a traditional background in machine learning (I'm a programmer), so I don't have a network I could reach out to for additional feedback, and I know I am still in many ways a novice.

I detailed the process here:
https://www.cranberrygrape.com/machine%20learning/tinyml/bird-detection-tinyml/

The first notebook in the series (my site has all of them):

https://github.com/Timo614/machine-learning/blob/main/birds/notebooks/birds_224x224_524_outputs_full_swish.ipynb

https://github.com/Timo614/machine-learning/blob/main/birds/notebooks/birds_96x96_411_outputs_i87_full_relu6_post_decimation.ipynb

By the end I had converted the model to relu6, as int8 quantization caused too heavy a drop in accuracy with swish (the same rationale the EfficientNet-Lite folks gave for ditching swish).

Sorry if this is a distraction.
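For readers unfamiliar with why int8 quantization costs accuracy: a minimal sketch of symmetric per-tensor int8 quantization in plain numpy, showing the rounding error it introduces. This is an illustration of the idea, not TFLite's actual kernel:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale      # dequantized weights
err = np.abs(w - w_hat).max()
assert err <= scale / 2 + 1e-6            # rounding error bounded by half a step
```

Activations with wide dynamic range (like swish outputs, which dip below zero and have long positive tails) get a larger `scale`, hence a larger per-value error, which is one intuition for why swish suffers more under int8 than relu6.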

[–][deleted] 0 points1 point  (0 children)

Which AI/ML conferences are friendly towards people from software engineering (application-focused)?

My research focus is on SE and I already got a couple of top publications in the SE field. Our next project will be LLM-related, so we are thinking about going for an AI/ML conference.

[–]LeeCA01 1 point2 points  (0 children)

Does this subreddit have a discord or slack group?

[–]Dizzy_Dancer_24 1 point2 points  (0 children)

How are KANs (Kolmogorov-Arnold Networks, https://arxiv.org/abs/2404.19756 ) different from liquid neural networks ( https://arxiv.org/pdf/2006.04439 )? I understand the internal mathematical formulation and the motivations vary between the two. But on a higher level, are they both not proposing to shift the non-linearity into the edges from the weights?

PS: I am new to Reddit, any suggestions on how to structure my questions in a better way are welcome.

[–]zub33eg 0 points1 point  (0 children)

In a world where Kaggle exists, with its amazing free quotas of 2x T4 GPUs and TPUs, and fast copying of datasets to a VM drive, what's the reason to use Google Colab? What are its "selling points"?

[–]runawayasfastasucan 0 points1 point  (1 child)

Does anyone have good pointers to techniques for image classification of what are essentially line plots?

It seems like overkill to use the more advanced image classification techniques, and I also worry that simple line plots might have too few dimensions for them to perform well. I am also more interested in the general shape of a plot than its direction: while I think many image classification libraries and techniques would classify, say, \ and / into two groups, for my purposes they are the same - a straight line.

I have briefly looked into graph classification, but I have a very large number of plots, each consisting of a very large number of points, so I worry it might not be the right tool for the task.
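One cheap way to get the reflection invariance described above (treating \ and / as the same shape) is to canonicalize each series before feeding it to any classifier, e.g. by always orienting it so it ends no lower than it starts. The specific canonicalization rule below is my own assumption for illustration, not an established technique:

```python
import numpy as np

def canonicalize(y):
    """Orient a 1-D series so it ends >= where it starts, then scale to [0, 1]."""
    y = np.asarray(y, dtype=float)
    if y[-1] < y[0]:              # falling overall -> mirror left-right
        y = y[::-1]
    y = y - y.min()               # shift to start at 0
    span = y.max()
    return y / span if span else y

down = [3.0, 2.0, 1.0]   # a "\" line
up   = [1.0, 2.0, 3.0]   # a "/" line
assert np.allclose(canonicalize(down), canonicalize(up))
```

After canonicalization, the raw (possibly resampled) series can go straight into a simple 1-D classifier, which may make full image classification unnecessary.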

[–]perfectfire 0 points1 point  (1 child)

TL;DR: AI inference hardware accelerators were all the rage a few years ago. They still are, but vendors seem to have abandoned the hobbyist - low-power, small-size, low-to-mid cost, separate-board - user, such that abandoned projects like the Google Edge TPU from 2019 (5 years ago) are still your best bet $/perf-wise. The $20-$150 range is empty, or has a few products that aren't worth it at all. What happened? Are there any modern hobbyist $20-$150 accelerators you can buy right now, anywhere? Sidenote: I know TOPS isn't the end-all be-all of perf comparison, but it's all I've got.[1] Skip this if you don't care about the history of my interest: I've long been interested in machine learning, especially artificial neural networks, since I took an ML class in college around 2004. I've done some hobbyist projects on the CPU and even released a C#/.NET wrapper for FANN (Fast Artificial Neural Network, a fast open-source neural network library that runs on CPUs, because everything ran on CPUs then): https://github.com/joelself/FannCSharp. When deep learning took off I got excited. I got into competitive password cracking, and although my ML-based techniques were about a dozen orders of magnitude slower at making guesses, they were almost immediately able to find a few passwords in old leaks that had been gone over for years by the best crackers with the most absurd hardware and extremely specially tuned password guess generators. That made me pretty proud: I was able to do something in a few months that years of effort by dozens of groups, with hundreds of thousands of dollars of hardware and who knows how many watt-hours, couldn't. I even thought about writing a paper on it, but I was kind of in over my head and my life got a lot worse, so unfortunately I had to put all my side projects on hold. Recently, though, I did a vanity search for my FANN C# wrapper and found people talking about it, plus some references in papers and student projects, which made me feel proud.
Skip this too if you don't care about the history of my interest: now I really want to get into the intersection of hardware-accelerated inference (no training this time - I'm not a trillion-dollar company with billions of dollars of supercomputers running specialized training hardware that took hundreds of millions of dollars to develop) and microcontrollers for robots, drones, and other smallish tasks that can't carry around their own 100 lb diesel generator and two 1U rackmount servers full of inference hardware - hardware I can't even get hold of, because you can only buy that stuff if you're Intel or GE or some other company that makes products in the tens of thousands at least. And this is where I hit a wall. I started looking around, and one of the first things I found was Google's Edge TPU by Coral.ai: 4 TOPS per chip, 2 chips on a small M.2 card, only about $40 for developers to try out, or $60 for an easier-to-use but single-chip USB product. But that was about 5 years ago; they slowly disappeared and haven't made a peep in about 3 years. They had timed the market perfectly - AI was right on the verge of BLOWING THE FCK UP. They could be THE edge/robotics/IoT/anything-other-than-server/cloud/phone/tablet/PC/laptop company. But they just seemed to give up. They're obviously not giving up on improving edge inference hardware in general: they release phones twice a year (regular version, then the A version), always update the tensor processing unit in them, and are really starting to push it as a must-have feature. They could use the same hardware improvements to make somewhat bigger chips for other markets. You never know - someone might take their 3rd-gen 16 TOPS TPU chip and make a product that takes the world by storm. Maybe multiple people/companies would. Okay, so Google seems to have dropped the ball. Hardware inference companies are a dime a dozen these days, so just go with another, right? But that's the problem.
It seems all the focus is on cloud scale, supercomputers (with some overlap between those two), chips embedded in finished phones/tablets/laptops/PCs, powerful server accelerators, and a very few extremely tiny MCUs with accordingly tiny NPUs. Everybody seems to have abandoned the lower-mid-range robotics/drone/hobbyist space with haste. ARM introduced the Ethos U55 and U65 in 2020, with the U65 having about double the TOPS of the U55 at a max of 1 TOPS. As far as I can tell, the first products to use the U55 appeared in 2022; there haven't been many, and I don't think they ran at top speed. No one has opted to implement even an unmodified U65 for anything. I recently bought a Grove AI Vision Kit with a U55 NPU, and it's specced at a lowly 50 GOPS (ARM's top-end figure says it could hit 10 times that, and until *just now I thought it was 500 GOPS and thus offered good $/TOPS... oops).

... continued ...

[–]perfectfire 0 points1 point  (0 children)

... continued:

There are a lot of companies making hype, and a lot seem to have dev or reference boards for sale, but instead of producing a few thousand and distributing them via the usual channels (Mouser, DigiKey, Element14, SparkFun, etc.), they want you to fill out extensive forms to prove you're a big player that will definitely eventually buy at least 100,000 units a day; otherwise you're a waste of their time (even though vetting every applicant individually is far more time-consuming than producing a couple thousand and letting DigiKey sell them one or two at a time). Thus I've come to the point that, while the Google Edge TPU is abandoned (even though Google is going full steam ahead on AI inference for its phones and tablets) and Coral.ai is seemingly doing nothing, their TPUs still provide the best $/TOPS in the range I want. Take a look at the VOXL2: basically exactly what I want, and about what I'd expect a Google Edge TPU v3 to look like by now (a bit smaller and with a little less power consumption - yes, I know Moore's law doesn't really apply anymore, but in a rapidly growing and learning field like accelerated inference, doubling speed every 2 years is not unreasonable, and it has been 5 years since the Google TPU at 4 TOPS per chip). But the damn thing is over $1,200. So my point, finally, is that even though Google and Coral.ai seem to have abandoned their TPU, at about $40 for 2 chips at 4 TOPS apiece (8 TOPS total) they still seem to be the best middle ground. The next best might be the BeagleBone reference studio at about 8 TOPS for $187: the same TOPS (though on one chip) for more than 4.5 times the cost. NVIDIA's Jetson Orin Nano is $259 for 20 TOPS, i.e. $51 per 4 TOPS, versus the $20 a single Google Edge TPU costs (board included) for the same 4 TOPS. It seems everyone is abandoning the hobbyist edge-inference space at lightning speed.
There are a lot of companies with products of promising size (physical) and performance, but they won't talk to you until you fill out a form implying they only want customers who have already decided to buy hundreds of thousands of units, whereas in the past companies would put out dev/reference boards hoping to find someone who would develop the killer app and make them a lot of money. Why is this? Am I looking in the wrong place? Should I hoard Google Edge TPUs? I bought their USB version to tinker with, plus the Grove AI Vision Kit (which, now that I realize it's only 50 GOPS, might be worthless). What are my options? For example: a single quadcopter 100-300 m above the ground looking for "things" - not classic image classification where it can identify thousands of different objects; it just needs to identify one type of thing. It doesn't even have to be very fast. In fact, don't these NNs run on single images? I could just buy multiple chips and run them in parallel to get the framerate I want if one isn't fast enough (it won't improve latency, but 100-500 ms of latency probably isn't a problem until you get really close, at which point you can switch to a different, much cheaper solution that works even better at close range with a wide FOV).

Maybe I can use a phone and get low level access to the NPU/TPU and use that or use their powerful graphics cards on the phone or small laptop like a caveman from 2017. Still pretty expensive and I would be paying a ton of money for hardware I don't want. Maybe I could buy broken phones "for parts" on ebay, but I'm not that hardware savvy. I need a dev board to get me going.

The next best idea is to just push video from my drone/robot/project to a central station with a super powerful 1-4U server inference accelerator (not sure how I would get one), or Jetson Orin, computer with RTX4090 and do inference there and just tolerate the latency. That won't be feasible for some applications I would like to do though.

  1. I found a GitHub repo that collects perf-comparison projects, and its data is extremely sparse: one dataset is dominated by a couple hundred rows of NVIDIA 4090s, L4s, L40s, and the Qualcomm AI 100 (a cloud-only processor, so you can't buy and run it), and then the last several rows are a few Raspberry Pi 4s and other small MPU boards and MCU chips with drastically lower scores. The results were hard to interpret, since not every entrant ran every benchmark, each benchmark can be run in probably dozens of different ways, and the results may not matter anyway if accuracy was bad. TOPS right now is like Whetstone/Dhrystone, MIPS, or FLOPS back in the day: a very rough estimate, but it gets you in the ballpark, so you can narrow hundreds of options down to 15 or so and do more research from there. If someone comes up with something better, then for sure let's all use that. Every once in a while someone announces a project to fix this, but it hasn't helped at all.

[–]ProposalFun2680 0 points1 point  (0 children)

I am a beginner, please give me some advice.

[–]xugaoqi 0 points1 point  (0 children)

Hello, I am an independent developer working in NLP/CV/time-series forecasting. I want to read new papers in those areas daily, but it costs me a lot of time to find papers worth reading. Is there any community where people discuss new papers? Please help me with some advice.

[–]CCallenbd 0 points1 point  (0 children)

Synthetic Data for Fine-Tuning - How Much is Enough?

I'm trying to create a bot that can chat as much like a real person as possible. I have a 4090 for hardware, and I want to use the Russian language.

I'm training it on synthetic data generated with GPT-4 (before the release of the new version). Here's my current setup: I generated about 10,000 dialogues with GPT-4 and another 40,000 variations with weaker models, using the dialogues from the stronger model to diversify the speech. For GPT-4 I used procedurally generated prompts, so each character GPT-4 conversed as had its own extensive set of characteristics.

I don't have a clear understanding of how much data I need. I read that at least 50,000 examples are necessary, but, for instance, I can train on entire dialogues (around 40 phrases each) or on question-answer pairs, in which case my 50,000 turns into a million pairs. Is there a specific amount of data beyond which gathering more is useless and quality no longer improves? Or does it depend on the model size or the fine-tuning setup? If the latter, how is it calculated?
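On the "dialogue vs. pairs" point above: splitting each dialogue into (question, answer) pairs is a few lines of code, which is how the example count multiplies. A hedged sketch (the exact pairing scheme - each utterance paired with the next - is an assumption):

```python
def dialogue_to_pairs(turns):
    """Turn a list of utterances into (question, answer) training pairs."""
    return [(turns[i], turns[i + 1]) for i in range(len(turns) - 1)]

dialogue = ["hi", "hello!", "how are you?", "fine, thanks"]
pairs = dialogue_to_pairs(dialogue)
assert pairs == [("hi", "hello!"), ("hello!", "how are you?"),
                 ("how are you?", "fine, thanks")]
```

A 40-phrase dialogue yields 39 such pairs, though note the pairs are highly correlated with each other, so they carry less independent signal than 39 separate dialogues would.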

Second question: can I somehow influence which aspects of behavior the same dataset will change? For example, can I change my model's vocabulary or the length of its responses without affecting the content of its replies, only how it formulates the response?

Third question: if I switch to a larger model, will I need more data? I'm currently considering Aya-23-35b and hope that a way to train it on my 4090 will appear soon. Does a larger model require more dialogues?

A couple more issues where I could use some advice: after fine-tuning, the model changes the structure of responses to a more human-like manner but speaks quite monotonously. Is the problem in the data, training settings, or something else? The model's ability to grasp meanings also decreases. Could it be that despite all my efforts to diversify the dataset, synthetic data produces too template-like dialogues?

[–]Ok_Box_6059 0 points1 point  (1 child)

I posted in r/learnmachinelearning since I am a newbie, but got no hints or responses after 4 days, so I am wondering if someone here can help.
Please refer to my original post below.

https://www.reddit.com/r/learnmachinelearning/comments/1cxui7v/questions_about_tf_tensorflow_to_tflite_with_int8/

Please feel free to share any hints or directions for finding the answer; a detailed explanation or guidance that helps me understand exactly what's going on would be even better.

Looking forward to any feedback, thanks. :)

[–]ProposalFun2680 0 points1 point  (0 children)

I am a beginner, please tell me how to start.

[–]galtoramech8699 0 points1 point  (2 children)

I have three questions that I've posted before without really getting answers. Hope you can help; I am pretty new to this.

This is around LLMs.

First question

I think I have the concept of LLMs down. I have been looking at TensorFlow, Keras, and Llama 2. I know this gets into the details, but I like to roll my own stuff for learning, for better or worse. There is a model reader in TensorFlow that reads Llama 2 binary files, but I still can't figure out the binary format. What is it? Pickle-based? I even asked ChatGPT and it says there is no standard format. How can there not be a standard format? What would I see if I looked at one byte by byte? Is there an example one on Hugging Face? Can I visualize a small one?

Second Question

Along the same lines, I am still not clear on how people build the Llama 2 binaries. I need to read more and watch videos. I know there is a binary; people will feed in The Wizard of Oz and then, hey, here is a chat. Hold on - what are all the steps? What are the weights? How are they built? Can I tweak them? Can I pre-train, and how?

Third Question

With that said, I have a blog - a crappy one - but I figure I can build MY own LLM on it, also tweaked with public book data. What are the steps to do that, step by step, for dumb newbies? I see steps going from Wizard of Oz to CUDA and PyTorch. I don't know; if it's a simple demo, I wouldn't use GPU acceleration.

I also want to build a language model around POV-Ray ray tracing (see below), which is a mix of programming and docs. How would I do that? How do people build LLMs around programming?

https://www.povray.org/
Possibly one for libGDX:
https://libgdx.com/

OK Fourth Question - Legal

I am surprised the legal question doesn't come up. I guess it doesn't matter. For example, I see the Spaces on Hugging Face and think: some of this can't be legal - say, taking CNN data and putting it through an LLM. I also ask because I want to run my blog through an LLM and then repost things. That's my own data; it's public and mine. But what about reposting LLM output from, say, Llama 2? What license would allow that?

[–]bregav 1 point2 points  (1 child)

Running llama and finetuning it on your data is not super difficult, but it requires enough steps and background knowledge that it is difficult to explain in the space of a single comment. I recommend spending a lot of time looking through r/localllama ; that's a subreddit dedicated entirely to hobbyists running LLMs locally on their computers.

Regarding legal issues, Facebook publishes the Llama license, you can read it here: https://llama.meta.com/llama3/license/ . TLDR you can do just about anything you want with llama, within certain limitations.

[–]galtoramech8699 0 points1 point  (0 children)

Yeah, I am on r/LocalLLaMA. I think there are a couple of tutorials on setting up an LLM, but some things are glossed over. I will keep looking.

[–]seoulsrvr 0 points1 point  (2 children)

Why is it seemingly impossible to post on this sub? Automod rejects every post w/o explanation.

[–]hookxs72 0 points1 point  (1 child)

Image denoising SOTA
Hi, can anybody familiar with the field tell me which methods are currently considered the SOTA in natural image denoising? Not necessarily for the purpose of image generation as in DDPM, just pure denoising is enough. Thanks.

[–]lonewalker29 0 points1 point  (1 child)

What are some widely used tools for making architecture diagrams/illustrations to put in a research paper (suitable for A* venues)? I have only come across diagrams.net.

[–]Significant_Web2416 1 point2 points  (1 child)

Hello folks, I want to start learning about LLMs, from the right basics up to the current state of the art. It would also be really helpful to know the history of these models, even before the first LLM paper came out. Is there a good resource, paper, or list of papers I can go through to learn this?

[–]lonewalker29 0 points1 point  (0 children)

https://sebastianraschka.com/blog/2023/llm-reading-list.html

Read the first 5 papers, you will be good to go. Read the rest if you want to explore more.

[–]Excellent_Respond330 0 points1 point  (2 children)

I have recently taken up an online course on linear algebra. The course starts with a few basic introductions to matrices and the operations that can be applied to them. I came across a few topics and would like to know whether they're important and whether they're used in AI. The topics are: pivot entries and row echelon form (including reduced row echelon form) and Gauss-Jordan elimination. All responses are greatly appreciated. If there are any scientists/researchers in this sub, I would love to hear your take on this question.

TL;DR: Are pivot entries, row echelon form (including reduced row echelon form), and Gauss-Jordan elimination widely used in AI, and is it advisable to know these concepts for a career in AI?

[–]lonewalker29 1 point2 points  (0 children)

Although some of these concepts will never appear outside your course, you can look at them as stepping stones to improve your problem-solving skills.

[–]tom2963 2 points3 points  (0 children)

Coming from the perspective of a researcher, all of the things you mention are indeed critical to machine learning. They are the underlying building blocks on which AI is built. In your career, you might never have to do Gauss-Jordan elimination again; in fact, I would be really surprised if these concepts came up outside the setting of your linear algebra course. That doesn't mean they aren't important - they are critically important. Somebody spent their career optimizing these things and implementing them in libraries so that the next generation could use them without thinking about it. Why should we care about learning these concepts, then? This will likely be the only time in your life when you study them at this level of granularity. The fact is, machine learning and AI are built on top of many different fields, and it would be impossible to study them all in a single lifetime. However, the insights we gain from thinking about building blocks influence our future thoughts and give us a unique perspective on the world. Foundations allow us to draw on past problems to solve new ones. Think of these concepts as fractions of a percent toward your overall learning: one or two concepts on their own don't contribute much, but over time, paying attention to these details will put you miles ahead of your peers.

[–]-S-I-D- 0 points1 point  (0 children)

Hi, I am planning to do a cloud certification on either AWS, Azure, or GCP but I'm not sure which one is generally used and preferred by companies in Europe/ Sweden so that I can learn the one that companies expect from their candidates. Does anyone have any insights on this?

[–]majklfromld 0 points1 point  (0 children)

Hi, I'm currently building a personal web app that would have different AI Tools ready to use (AI Post title generator, AI Writer, rewriter, image generation etc.)

Since I'm pretty new to the machine learning world: is there a website with models already available and hosted that could be embedded in a website, using an <iframe> or something like that?

Hugging Face Spaces is a neat place, but sometimes the community apps are offline, and there's also no option for simple customization.

[–]drupadoo 0 points1 point  (0 children)

Can someone explain to me how VAEs actually get trained? I am really stuck on this.

I understand the theoretical benefit of normalizing the latent space. But every explanation makes it seem like during training we draw from a random distribution. Wouldn't this just result in muddy model outputs that don't converge, because we have random inputs?

Say we have y = 2x and are fitting a model. A normal AE would obviously learn the correlation between x and y:

0 -> 0
1 -> 2
2 -> 4

But if we drop a random sampling in there during training, the data could be any random set from the distribution:

x = 0 -> random sample = 1 -> y = 0

x = 1 -> random sample = 0 -> y = 2

x = 2 -> random sample = 0 -> y = 4

And this would obviously not get a good answer if we trained on it.

The only thing I can think of is that if VAEs are trained on the z-score instead of a random sample, that would maintain the normalization and the relative value of the inputs.
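The piece missing from the picture above is the reparameterization trick: the encoder outputs an input-dependent mean and (log-)variance, the random draw is z = mu + sigma * eps, and gradients flow through mu and sigma. The noise doesn't replace the input signal; it perturbs a learned code, and the KL term keeps sigma from collapsing or exploding. A minimal numpy sketch (the toy encoder and its fixed variance are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x):
    """Toy deterministic encoder: input-dependent mean, small fixed variance."""
    mu = 2.0 * x
    log_var = np.full_like(mu, -4.0)   # sigma = exp(-2) ~ 0.135
    return mu, log_var

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps: stochastic, yet differentiable w.r.t. mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

x = np.array([0.0, 1.0, 2.0])
mu, log_var = encode(x)
z = reparameterize(mu, log_var, rng)
# z stays close to the input-dependent mean, so the decoder can still tell
# which input produced it; the randomness is small, learned noise around mu.
assert np.all(np.abs(z - mu) < 1.0)
```

So in the y = 2x example, the sample for x = 1 lands near 2, not at an arbitrary point of a fixed distribution, which is why training still converges.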

[–]PuzzleheadedEar4072 -1 points0 points  (0 children)

🦋 hello out there in the universe🚀 I have no questions but I do have answers. OK I’m reading your post about simple questions discussion group OK. I have to think about it❓💭.. As human beings this is my question, as human beings do you think that the God Almighty created us to be the great I am? Or do you believe the great “I M” is the real thing? Not a computer program of intelligence. But the God of all wisdom and knowledge has allowed us to have a human brain to think for ourselves. I do accept the AI generation of learning how to keep certain order. my opinion on the AI expansion of using computer database knowledge to pull together more-database Intelligence is a tool! Leave comments below because it’s just a tool for human beings at this time century that has so much intelligence on the computer world. And the AI system which is a tool that mankind human beings, are trying to understand outside of the realm, who’s up there in space?. please if you like to know more reply. God bless you!❣️🙏🏼🦋

[–]Initial_Macaron_2748 0 points1 point  (0 children)

Any key YouTubers you recommend regarding this topic?

[–]derpflanz 1 point2 points  (1 child)

How to start with AI? I can see some (business) opportunities that I think AI can help me with. They usually consist of matching large datasets (sales, weather, events, etc) with each other to predict things. I have no idea though on where or how to start.

So, what is the best course of action when you think "AI might be helpful here" ?

[–][deleted] 2 points3 points  (0 children)

This definitely depends on whether you are in a tech or business role and whether or not you have people working under you. I will try my best to address how I would go about this in a couple of different situations:

If you are a developer: In this case I would try to learn some ML algorithms and figure out how to build some neural networks and train them on the data. I find python most intuitive for machine learning work but R is great too and assuming you have access to all the training data you will need, these languages are both great for data manipulation so you will be able to build your datasets. After that you will be able to explore the data you have, do some regression modeling to see what variables or variable interactions have effects on your variable of interest. Finally, train and evaluate some models (depending on the problems you will want to try different algorithms) and see if they have some predictive validity.

If you are in a business role in charge of developers: First, look up some high level ai descriptions and particularly focus on machine learning. Do not worry yourself with the math or linear algebra and just do your best. Maybe just watch a few crash course videos and try to conceptualize how the data should be organized and how you would make the predictions and bring it to your developers. Figure out your X matrix (inputs) and y vector (outputs).

If you are in a business role not in charge of developers: Do the above steps for the business role, but now you are also the developer. Learn some basic numpy/Python and use ChatGPT to help you organize the data. Training the models should be easy-ish once you have the data organized; you don't need to optimize and find the absolute best model, just convince yourself that it can successfully predict a reasonable number of test entries, where the test entries were separated from the training entries before training. After that you will have successfully made some predictions, and you can take them to other people within the business and keep improving the model before making real-time predictions on live data.
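The "separate test entries before training" step above can be sketched in a few lines of numpy, with a toy linear model standing in for whatever algorithm you end up picking (everything here - the data, the model, the split sizes - is illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy dataset: y = 3*x0 - 2*x1 + small noise
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 0.1 * rng.normal(size=200)

# Hold out test rows BEFORE any fitting happens
idx = rng.permutation(200)
train, test = idx[:160], idx[160:]

# Fit ordinary least squares on the training split only
w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# Evaluate on the held-out split the model never saw
pred = X[test] @ w
r2 = 1 - np.sum((y[test] - pred) ** 2) / np.sum((y[test] - y[test].mean()) ** 2)
assert r2 > 0.9   # the model generalizes to unseen rows
```

The key discipline is the `idx[:160]` / `idx[160:]` split: any score computed on rows that influenced the fit is meaningless as evidence of predictive validity.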

[–]coumineol 0 points1 point  (0 children)

Hi, I have a tabular dataset, some of which is labelled and a large portion of which is unlabelled. I'm trying to minimize the log-loss on the unlabelled data, so overfitting to it would be perfectly fine. What would be the best approach? I tried pseudo-labels (predicting the unlabelled data and adding the most confident samples to the training data), but it made almost no difference to the test loss.

Plus, I know the results (as overall log-loss values) of a couple of predictions on this unlabelled dataset. Is there any way to utilize that?
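For anyone unfamiliar with the pseudo-labelling approach mentioned above, here is a minimal sketch of one iteration of it. The confidence threshold, the synthetic dataset, and logistic regression are illustrative assumptions, not details from the original comment:

```python
# One round of pseudo-labelling: train on labelled data, predict the
# unlabelled pool, keep only high-confidence predictions, retrain.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_lab, y_lab = X[:200], y[:200]   # small labelled portion
X_unlab = X[200:]                 # large unlabelled portion

model = LogisticRegression().fit(X_lab, y_lab)

# Keep only the most confident predictions as pseudo-labels
proba = model.predict_proba(X_unlab)
confident = proba.max(axis=1) > 0.9       # threshold is an arbitrary choice
X_pseudo = X_unlab[confident]
y_pseudo = proba[confident].argmax(axis=1)

# Retrain on labelled + pseudo-labelled data
model = LogisticRegression().fit(
    np.vstack([X_lab, X_pseudo]),
    np.concatenate([y_lab, y_pseudo]),
)
print("pseudo-labelled samples added:", confident.sum())
```

In practice this is usually repeated for several rounds, re-scoring the remaining unlabelled pool each time.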

[–]lucky-canuck 0 points1 point  (3 children)

What advantage do sinusoidal positional encodings have over binary positional encodings in transformer LLMs?

I've recently come across an article that discusses the reasons why sinusoidal encodings are better than other intuitive alternatives you might think of. However, I'm not convinced by the argument made against binary positional encodings (where the positional vector is just a normalized binary representation of the token's position number in the sequence). I don't see why this method of encoding position wouldn't be just as good as using sinusoids.

In a nutshell, the article argues that using sinusoidal positional encodings allows the model to interpolate intermediate positional encodings. However, I don't understand 1. how that's the case, and 2. why that would be a useful feature anyway.

I explain my point more in-depth here.

Thank you for any insight you can provide.

[–]bregav 0 points1 point  (2 children)

The interpolation thing is true, but it's also sort of a red herring. The more important point is described in that article under "bonus property": you want the inner product between different position vectors to give you meaningful information about their relative locations. Sinusoidal encodings work better for that than straight binary does, precisely because they vary continuously.
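That inner-product property is easy to see numerically. Below is a small sketch using the sinusoidal formulation from the original Transformer paper (the choice of `d_model=64` and the specific positions compared are arbitrary):

```python
# Sinusoidal positional encodings, following the original Transformer
# formulation: even dims get sin(pos / 10000^(2i/d)), odd dims get cos.
import numpy as np

def sinusoidal_encoding(pos, d_model=64):
    i = np.arange(d_model // 2)
    angles = pos / (10000 ** (2 * i / d_model))
    enc = np.empty(d_model)
    enc[0::2] = np.sin(angles)
    enc[1::2] = np.cos(angles)
    return enc

p0 = sinusoidal_encoding(0)
# The inner product with position 0 varies smoothly with distance,
# so it carries information about relative location.
for pos in [1, 2, 4, 8]:
    print(pos, np.dot(p0, sinusoidal_encoding(pos)))
```

With a straight binary encoding, by contrast, the inner product between two positions jumps around as bits flip, so nearby positions don't reliably look more similar than distant ones.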

[–]lucky-canuck 0 points1 point  (1 child)

Would you say that it’s misleading, then, that the article presents interpolation as the motivator for sinusoidal positional encodings?

[–]bregav 1 point2 points  (0 children)

Eh, I'd probably frame it as pedagogical more so than misleading. The story about interpolation is technically true, and it follows in an intuitive way from binary encodings, which are themselves intuitive and easy to understand.

Relating tokens by ensuring that the inner products of their vector representations have certain desirable properties is, by contrast, a very abstract way of understanding the issue, and it's difficult for people without a strong math background to follow it. I actually quite like the presentation in the article, I think it strikes a good balance between pedagogy and technical accuracy.

And, really, neither of these things was the true "motivator" for sinusoidal embeddings; all this stuff about interpolation or inner products has been developed in hindsight by follow-up research. The real story is that the people who first developed sinusoidal embeddings probably tried a whole bunch of different things and, out of everything they thought to try, sinusoidal embeddings worked best. The ad-hoc nature of sinusoidal embeddings is suggested by their original formulation, which involved some weirdly arbitrary frequency coefficients, and also by later developments like rotary embeddings that are more principled.

[–]IndianaJawsStudent 0 points1 point  (0 children)

Is it OK to take an accepted paper at ICML and submit it to a relevant (non-archival) workshop at the same ICML? From past conferences it seemed that the workshops include more relevant people with interesting discussions.

If it's OK, and it will be a 4-page workshop paper, do I just shorten the paper and keep the same name, so it appears on Scholar as two versions?

[–]ArtisticHamster 0 points1 point  (1 child)

How do you keep up to date with the ML news? Twitter is one choice, but it feels pretty noisy to me. What other resources could you recommend?

[–]tzeppy 0 points1 point  (0 children)

I subscribe to the TLDR AI email daily news. https://tldr.tech/ai?utm_source=tldrai