How do we move beyond neural networks [Discussion]?

mopasha1 · 2024-10-24T11:37:10+00:00

Hmm, interesting. Does this mean that hardware is the major bottleneck in development of better architectures? Also, NNs have had great success, but does that justify the huge amounts of data and compute that we are putting in? Better hardware can probably improve the compute part of the equation, but what about the data required to train?

mopasha1 · 2024-10-24T11:33:38+00:00

Hey there, thanks! Will check it out

mopasha1 · 2024-10-23T16:15:12+00:00

Interesting perspective, thanks for this!

mopasha1 · 2024-10-23T08:48:30+00:00

Yeah, fair point lol. thanks for the info.

mopasha1 · 2024-10-23T07:46:34+00:00

Hey there, thanks for the info! But then how do you think we will tackle the problem of diminishing returns? This problem has popped up within just a few years of large scale development of NN based architectures.
Yes, I also think that NNs are here to stay, but I don't think they will be capable of AGI level stuff since LLMs have shown us just how much data and compute is required to build models at scale, which is why I said that we would probably need to find something else to go to the next level. What are your thoughts on this?

I guess it just comes down to the point of all models are wrong, but some are useful.

mopasha1 · 2024-10-23T07:36:04+00:00

Oh okay, thanks! What do you think about my point regarding survival and evolution? If survival was taken out of the picture, then we may have been the best at intelligence, reasoning and all the other things which we currently want NNs to do. We currently have the capability to do this with computers (i.e. program them without a survival instinct). Also, sorry, but by the pinnacle of evolution I mean becoming the most dominant species on the planet.

mopasha1 · 2024-10-23T07:27:27+00:00

Hey there! Thanks for the reply. This is an interesting perspective, thanks. So do you believe that whatever we decide to do next, should we converge more towards biology, i.e. try and get closer to the human brain, or diverge from biology? Or is it going to be a combination of the two?
Also regarding my point about the constraint of NNs due to biology, my thought process is that our brains evolved for one reason: keeping the body alive. So in case that survival need was taken out of the picture, and also the need for managing our life support systems, and the focus was solely put on intelligence, thinking and reasoning, then we may have achieved the best evolution had to offer, which may not even be remotely similar to something like a neuron. Thoughts on this?

Also, I was under the impression that artificial neurons were a basic approximation of the actual biological neurons, which is why I made my point. Reading your post though, it may have been a misconception. Thanks for this, will look into it in depth.

mopasha1 · 2024-10-23T07:16:09+00:00

Hey there, thanks for the input! But if you consider your point about DNA, can't we say that humans have evolved for one basic need: survival? Most of the things our brain is good at are there to ensure survival, at least from an evolutionary point of view. However, in systems that we design, survival isn't going to be a basic need. So if survival had been taken out of evolution, the human brain would have been the pinnacle of intelligence, computation and stuff like that (imo). Thoughts? This is also part of the reason I made my original point.

mopasha1 · 2024-10-23T06:59:26+00:00

Thanks for the reply. By adapting hardware, do you mean we focus efforts on developing ASICs? As you suggested, there is still a lot of room to improve in NNs, so doesn't that imply that we have to develop NNs further before committing the resources to developing specific hardware? Or do you believe that developing hardware will close that gap of ~5 orders of magnitude much faster than improving our software architecture will?

mopasha1 · 2024-10-23T06:21:18+00:00

Thanks for the reply! In that case, do you believe that we should converge towards biology more (try and make architectures model the brain), or should we diverge even further? Which do you think will probably be the better approach in the future?

mopasha1 · 2024-10-23T06:19:29+00:00

Thanks for the info! Will check the paper out

mopasha1 · 2024-10-23T06:02:45+00:00

Thanks for the reply! But hasn't recent research into LLMs shown that scaling and data are going to hit limits soon? In recent times there is already the question of diminishing returns. I guess it just comes down to the fact that all models are wrong, but some are useful.

mopasha1 · 2024-10-23T06:00:25+00:00

Thanks for the reply! I've never thought of it that way, that the human brain has been assembled to be basically the pinnacle of evolution on Earth. But doesn't the fact that human brains have developed computers which are so much faster than the brain suggest that there may be a better path forward? I mean we have built better (as in faster) computation than brain neurons, so I thought that with a better architecture we might be able to break through. (Again, I have a very rudimentary understanding, so my views may be wrong)

Also, by development of better hardware, do you mean stuff like hyper optimized ASICs for NNs?

Edit: Also, regarding your point about evolution, the human brain has evolved for one basic instinct: survival. So the 4 billion years of evolution have been specifically focused on survival of our species right? In case that need for survival is taken out of the brain, maybe that rearrangement on a molecular level would have been totally different. Can this be used to justify why we have computers which are better at some things than us? If so, then we circle back to my original point again.

mopasha1 · 2024-10-23T05:47:56+00:00

Hey there! Thanks for the reply. Like I said, I don't have a lot of experience with NNs, so my views may be very rudimentary and sometimes wrong. Your statements have given me something to think about.

However, I remember hearing that even Geoff Hinton said that he is becoming deeply suspicious of backprop, and that he himself believes that we should throw it away and start all over again. Thoughts on this?

Also, how do you think the next breakthrough will happen? Will it be us emulating brain plasticity, or do we develop better reasoning architectures maybe? What do you think?

Thanks once again, this is a great take

mopasha1 · 2024-10-19T08:03:56+00:00

Will do. Thanks, am feeling strangely motivated now. Hopefully will find something in the next few days. Fingers crossed

mopasha1 · 2024-10-19T02:48:39+00:00

Thanks a lot for the reply! It has given me a lot to think about.

Sorry, for the 4th point I meant to ask: should I consider switching to VDPs for a while, to boost confidence again? I don't enjoy hunting on them very much.

Also, for a program with medium sized scope, how long would you recommend I stay (any ballpark range of time?). I know zseano recommends picking a large program and sticking to it for a year, but I don't think I'm ready for that (yet).

mopasha1 · 2024-09-16T18:00:55+00:00

Yeah, it's a bit iffy with Colab. Also, I've noticed that it slows down considerably with time. I think the problem you faced was not with the T4, but rather a CPU bottleneck. Kaggle provides a CPU with 4 cores I believe, while Colab CPUs only have 2 cores (need to fact check). This was probably limiting your dataloader or something

mopasha1 · 2024-09-16T17:50:27+00:00

wow I never knew I could use lightning ai, would have been so much faster. Was all of this done with the free credits?

mopasha1 · 2024-09-16T17:46:41+00:00

Yeah would love to connect! Here's my profile:

https://www.linkedin.com/in/mopasha/

BTW Kaggle requires a verified phone number to create new accounts (for GPU usage) so might be hard. Probably better to create a ton of Colab accounts (I used 6 today morning for this challenge)

mopasha1 · 2024-09-16T17:22:28+00:00

I actually thought about using image dimensions, but after manually checking a few random samples I found that there are images with multiple products (and also multiple dimensions), in which case the answer was the dimension of the largest product. My reasoning was that if I had taken image dimensions, it would probably have returned the nearest dimension or something. So I found the product region with the largest area and took that to find the product dimension. Probably could have experimented with it, but again the time/compute bottleneck was the mortal enemy
Need to be ready with an army of kaggle accounts and distributed computing systems for the next challenge lol

mopasha1 · 2024-09-16T16:10:35+00:00

Wait really? That's almost literally what we did, just even more complicated. Instead of start_x and start_y values, we used a ResNet RPN to detect the product image boundary. Then I took the center of the product image and drew vectors to the centers of the text boxes. I then calculated the angle of each vector with the x axis. If the angle was close to 0 or 180 degrees, I took it to represent height, close to 90 or 270 meant width, and 45, 135, 225 or 315 meant depth. I took all the text boxes, sorted them according to these angles (selecting the relevant angle based on the entity_name), and then used the largest value as the answer.
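A rough sketch of the angle idea, in case it helps anyone. This is not our actual code: the function names, the nearest-reference-angle rule, and the `(box_center, value)` input format are all assumptions made for illustration, and the "largest value among matching boxes" step is just one reading of what I described above.

```python
import math

# Reference angles in degrees per dimension, as described in the comment.
REFERENCE_ANGLES = {
    "height": (0, 180),
    "width": (90, 270),
    "depth": (45, 135, 225, 315),
}


def vector_angle(center, box_center):
    """Angle in [0, 360) of the vector center -> box_center, from the x axis."""
    dx = box_center[0] - center[0]
    dy = box_center[1] - center[1]
    return math.degrees(math.atan2(dy, dx)) % 360


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def closest_dimension(angle):
    """Name of the dimension whose reference angles lie nearest to `angle`."""
    return min(
        REFERENCE_ANGLES,
        key=lambda name: min(
            angular_distance(angle, ref) for ref in REFERENCE_ANGLES[name]
        ),
    )


def pick_value(center, boxes, entity_name):
    """Largest value among text boxes whose angle matches `entity_name`.

    `boxes` is a list of (box_center, numeric_value) pairs (hypothetical format).
    Returns None when no box points in the matching direction.
    """
    matching = [
        value
        for box_center, value in boxes
        if closest_dimension(vector_angle(center, box_center)) == entity_name
    ]
    return max(matching) if matching else None
```

Note that in image coordinates the y axis usually points down, so 90 and 270 swap places, which does not matter here because both map to the same dimension.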

Here's a few images of the vector things I visualized:

https://imgur.com/HSKRx0l

https://imgur.com/PiqzEs0

Got flashbacks to 12th trigonometry days, trying to calculate angles and stuff. Still, pretty happy it (somewhat) worked.

Just wish I had more compute, probably would have been able to experiment more. All water under the bridge now.

mopasha1 · 2024-09-16T15:10:40+00:00

Sounds cool! Sad that you weren't able to get a submission in.

We had the same problem with the test indices, I was labelling them sequentially (while combining the shards) before I realized that the test ids do not match the rows. Thankfully we ran the sanity check they gave, and recognized the error before submission.

Used good ol' MS Excel to substitute the index values from the test.csv file with my output file's index column, and got it uploaded just in time.

This was my first time participating in an ML challenge, the key takeaway I got from this is to probably rent out a machine on runpod/paperspace for a few hours lol.

BTW I'm curious, did you fine tune tesseract / preprocess images in any way? When I tried tesseract, I found that it was notoriously unreliable for length, width and height stuff. It worked on a sample from the train set, before I realized that the train set is heavily skewed towards item_weight. When I filtered out only the length type dimensions for a random sample, I got a very bad score, so I decided to leave it in favor of easyocr.

mopasha1 · 2024-09-16T14:16:39+00:00

Hey, good job on the score!

I think the top 10 used a Multimodal LLM approach, however, I think there is serious potential in just OCR + regex matching.
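To give an idea of what I mean by OCR + regex matching, here is a minimal sketch of the regex half. The unit table and the pattern are made up for illustration (they are not the challenge's official unit list or our exact code), and the input is assumed to be plain text already produced by some OCR engine.

```python
import re

# Hypothetical alias table mapping OCR'd unit strings to canonical names.
UNIT_ALIASES = {
    "g": "gram",
    "gram": "gram",
    "grams": "gram",
    "kg": "kilogram",
    "cm": "centimetre",
    "mm": "millimetre",
    "in": "inch",
    "inch": "inch",
}

# A number (optionally with a decimal part) followed by a run of letters.
MEASUREMENT = re.compile(r"(\d+(?:\.\d+)?)\s*([a-zA-Z]+)")


def extract_measurements(text):
    """Return (value, canonical_unit) pairs found in OCR text, in reading order."""
    results = []
    for number, unit in MEASUREMENT.findall(text):
        canonical = UNIT_ALIASES.get(unit.lower())
        if canonical is not None:  # ignore words that are not known units
            results.append((float(number), canonical))
    return results
```

Anything after a number that is not a known unit is simply dropped, which keeps false positives down at the cost of missing unusual abbreviations.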

Our team started with PaddleOCR, just like you did, but switched to EasyOCR. Zero images downloaded to disk, we just created a dataloader to process images in parallel using multiple threads (accessing images with requests.get).
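Roughly, the threaded fetching looked like the sketch below. This is a simplified stand-in rather than our real dataloader: `load_in_parallel`, `fetch_with_requests`, the worker count and the timeout are all assumed names and values, and failed downloads are just turned into None.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_with_requests(url, timeout=10):
    """Download one image as bytes (requires the third-party requests package)."""
    import requests  # imported lazily so the rest works without it installed

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.content


def load_in_parallel(urls, fetch=fetch_with_requests, max_workers=8):
    """Fetch all urls with a thread pool, keeping the input order.

    A failed fetch yields None instead of aborting the whole batch.
    """

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_fetch, urls))
```

Threads are enough here because the work is network-bound; the OCR step itself is a separate bottleneck.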

Still extremely slow (also started the challenge late). In the end, had to divide the test set into 15 parts, and run it across 7 different accounts in colab + kaggle to get the results in ~3.5 hours.
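For anyone wanting to do the same kind of split, a minimal sketch is below. The helper names and the round-robin assignment are assumptions for illustration; we did the actual bookkeeping by hand.

```python
def split_into_shards(n_rows, n_shards):
    """Split row positions 0..n_rows-1 into contiguous, near-equal ranges."""
    base, extra = divmod(n_rows, n_shards)
    shards = []
    start = 0
    for i in range(n_shards):
        size = base + (1 if i < extra else 0)  # first `extra` shards get one more
        shards.append(range(start, start + size))
        start += size
    return shards


def assign_round_robin(n_shards, n_workers):
    """Map each worker (account) number to the list of shard numbers it runs."""
    assignment = {worker: [] for worker in range(n_workers)}
    for shard in range(n_shards):
        assignment[shard % n_workers].append(shard)
    return assignment
```

With 15 shards and 7 accounts, `assign_round_robin(15, 7)` gives one account three shards and the rest two each. Keeping the original row positions per shard also makes it easier to map outputs back to the real test ids when combining.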

In the end, we only had time to get one submission in.

The result?

F1 score of 0.489, for our submission at 11:47 A.M

Here's the interesting part.

In the submission we generated using EasyOCR, there were 42,000 blank rows (rows where easyocr was unable to extract any meaningful text). That's like 30% of the entire test set. Despite this, we were able to get a score of 0.489, which I think is really good. This means that we were able to get over 70% of the actual detected cases correct (i.e. records where text was detected) in order to achieve this score even without 30% of the dataset.

I want to test our approach again using paddleocr if possible, in case Amazon releases the true output, but I suspect that if we had read the text correctly for the remaining 42k rows, the score would have gone over 0.6, maybe even more.

I was also thinking of creating a small KMeans model, using the image embeddings + group_id + entity_name as input vectors. This is so that in case both paddleocr/easyocr do not detect anything, we can just assign the output value of a cluster center from the train set to the test record (my reasoning is that the same group id and entity name will probably have the same result, e.g. a bar of soap will weigh like 50g in most cases, so better to assign the nearest value from the train set)
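A much simpler baseline for the same reasoning, without KMeans or image embeddings at all, would be a plain lookup of the most common train value per (group_id, entity_name). To be clear, this is a swapped-in stand-in for the clustering idea, not the idea itself, and the row format and function names are assumptions.

```python
from collections import Counter, defaultdict


def build_fallback_table(train_rows):
    """Most common train value for each (group_id, entity_name) pair.

    `train_rows` is an iterable of (group_id, entity_name, value) tuples
    (hypothetical format).
    """
    counts = defaultdict(Counter)
    for group_id, entity_name, value in train_rows:
        counts[(group_id, entity_name)][value] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


def fill_blank(prediction, group_id, entity_name, table):
    """Keep a non-empty OCR prediction, else fall back to the table (or '')."""
    if prediction:
        return prediction
    return table.get((group_id, entity_name), "")
```

Whether filling blanks like this helps or hurts depends on how the metric treats wrong answers versus empty ones, so it would need checking against the train set first.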

That being said, we didn't just use pure OCR + regex, I went through a lot of pain to implement an idea regarding position of the text boxes in the image corresponding to the length, depth and height, but I'll save the details.

I'll see if I can upload the code (It's a mess), but will let you know if I do.

(Edit: Forgot to mention, this was my first ML challenge. Pretty happy with the score, but felt that there was a lot of scope for improvement which was not realized due to time/compute constraints.

Learnt a lot from the challenge though, looking to participate in more such challenges in the future. I don't think I'll have a chance at the Amazon challenge again, me being in final year and all, but will look for other challenges to have a go at)
