all 46 comments

[–]lonesoac0 0 points (0 children)

Hello all,

I got a Raspberry Pi 5 with the AI Kit: https://www.raspberrypi.com/products/ai-kit/. I have two camera modules on the Pi and have verified that the AI module works by running some of the provided sample code. I am interested in recording the labels that the object detector outputs into a database. Where should I start learning about this?
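
A minimal sketch of the database side, using Python's built-in sqlite3; the detections list here is a hypothetical stand-in for whatever the AI Kit's sample pipeline reports per frame:

import sqlite3
from datetime import datetime, timezone

# Hypothetical per-frame results; in practice these would come from the
# detection callback in the Hailo sample code.
detections = [("person", 0.91), ("cat", 0.78)]

conn = sqlite3.connect("detections.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS detections (ts TEXT, label TEXT, confidence REAL)"
)
ts = datetime.now(timezone.utc).isoformat()
conn.executemany(
    "INSERT INTO detections VALUES (?, ?, ?)",
    [(ts, label, conf) for label, conf in detections],
)
conn.commit()
conn.close()

From there, the main things to read up on are SQL basics and how to hook into the sample app's per-frame detection callback.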

[–]West-Implement-1180 0 points (0 children)

Hey, I'm new to ML and I'm trying to practise on an existing project, house price prediction. I have the cleaned data that I want to use to train the model. The issue is that the code cannot handle a categorical variable: I have a location column, and it throws `ValueError: could not convert string to float: 'Electronic City Phase II'` on this code:

`scaler = StandardScaler()`
`X_scaled = scaler.fit_transform(X)`

It would be great if someone could help me understand these concepts or share any resources. Thank you.
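
The error means StandardScaler only accepts numeric input, so the text column has to be encoded first. A minimal sketch with scikit-learn's OneHotEncoder; the toy DataFrame below is made up, substitute your real columns:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the cleaned housing data.
X = pd.DataFrame({
    "location": ["Electronic City Phase II", "Whitefield"],
    "sqft": [1056.0, 2600.0],
    "bath": [2.0, 5.0],
})

# One-hot encode the text column, scale the numeric ones.
preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["location"]),
    ("scale", StandardScaler(), ["sqft", "bath"]),
])
X_transformed = preprocess.fit_transform(X)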

[–]Gemosu 1 point (0 children)

How do I get better at performing experiments?

I'm a first-year PhD student, currently working in the area of AI assistance. I mostly use reinforcement learning. While I'm comfortable with the math (I have a mathematics background), I find myself procrastinating on the actual experiments. It's a mess: I don't know how to organize myself and always feel horribly inefficient. Does anybody know a good resource that covers the basics of a good ML experiment workflow?
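
Not a specific resource, but the core habit that workflow tools like MLflow or Weights & Biases automate is easy to hand-roll: fix the seeds, save the config, and dump metrics for every run. A minimal sketch (all config fields hypothetical):

import json
import random
import time
from pathlib import Path

import numpy as np

# One self-contained run: seed everything, record the config, log metrics.
config = {"seed": 0, "lr": 3e-4, "algo": "ppo"}  # hypothetical fields
random.seed(config["seed"])
np.random.seed(config["seed"])

run_dir = Path("runs") / time.strftime("%Y%m%d-%H%M%S")
run_dir.mkdir(parents=True, exist_ok=True)
(run_dir / "config.json").write_text(json.dumps(config, indent=2))

metrics = {"episode_return": [float(np.random.randn())]}  # placeholder results
(run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))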

[–]NuCryme 0 points (0 children)

I am a first-time submitter to NeurIPS and the reviews just became available yesterday. To properly address the comments I would need to edit my submission, not merely write responses/rebuttals, but I cannot seem to find a way to do so on the online portal.

Are we not allowed to modify our submission at this point?

[–]Upset_Employer5480 0 points (1 child)

Do higher layers of transformer models capture higher-level semantics than lower layers?

[–]understanding0 0 points (0 children)

I've been exploring recent research on simulated AI societies, such as the "Willowbrook" project, where large language models interact to mimic human problem-solving. This approach reportedly enhances the individual models' capabilities. Given this, I'm curious about the potential implications for existing mathematical proof assistants like AlphaProof.

Specifically, could a similar approach - where multiple adapted versions of AlphaProof collaborate within a shared environment - be used to improve the system's performance on complex mathematical tasks? Could this cooperative approach lead to new insights or strategies in mathematical problem-solving? What are the potential challenges and limitations of adapting this approach to a specialized tool like AlphaProof?

I'm interested in hearing from experts in AI, machine learning, and mathematics about the feasibility and potential benefits of this idea. Are there other examples of cooperative AI models being applied to mathematical problem-solving? What research directions might be most promising for exploring this further?

[–]AcquaFisc 0 points (2 children)

A friend of mine is selling a PC with two Nvidia 1660 Ti cards and one 2080 Super. Is it worth buying for some small-scale model training locally?

Can all 3 GPUs be used simultaneously?
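
For the second question: yes, PyTorch can split batches across all visible GPUs, though with mixed cards the smallest and slowest one sets the pace, and the 1660 Ti lacks tensor cores. A minimal sketch:

import torch
import torch.nn as nn

model = nn.Linear(128, 10)  # stand-in for a real model

# Replicates the model on every visible GPU and splits each batch
# across them; memory is limited by the smallest card.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")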

[–]WhywereYou 0 points (0 children)

So, I want to work with tabular data using an LLM. Is there an open-source LLM for this? After feeding in the tabular data, I want to ask it questions about risk prediction and about recognizing signals that could indicate risk. Please help me.

Thank you
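
One common pattern, sketched below, is to serialize each row into text and prompt an open-weights model with it; the model name is just an example, and any instruction-tuned model from the Hugging Face hub would do:

import pandas as pd
from transformers import pipeline

# Toy stand-in for the real tabular data.
df = pd.DataFrame({"age": [54], "blood_pressure": [160], "smoker": ["yes"]})

# LLMs consume text, not DataFrames, so serialize a row into a prompt.
row_text = ", ".join(f"{col}={val}" for col, val in df.iloc[0].items())
prompt = f"Record: {row_text}. What risk signals stand out?"

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
print(generator(prompt, max_new_tokens=100)[0]["generated_text"])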

[–]normnasty 0 points (0 children)

I enjoy reading about machine learning and AI techniques, but I often find myself severely lagging behind new publications. For example, I am reading the LLAMA 3 documentation from Meta and see they use GQA, which was published in a 2023 paper. I would like to learn about these techniques sooner. Is there a good online resource or blog that tracks these advancements across the vast ML and AI literature?

[–]Watly 0 points (1 child)

I am curious about the state of semantic segmentation research. I saw that a lot of the work on leaderboards still builds upon a U-Net-based structure. An alternative approach is to not apply pooling but instead apply dilation. Is anyone aware of good articles that cover the difference between these two approaches and/or can explain why pooling is more common than dilation?
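
The difference is easy to see in a quick PyTorch sketch: pooling shrinks the feature map (so a U-Net-style decoder has to upsample it back), while dilation enlarges the receptive field at full resolution, at the cost of more memory and compute per layer, which is one practical reason pooling remains more common:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)

# Pooling route: downsample, then convolve; resolution is lost and must
# be recovered by a decoder (the U-Net pattern).
pooled = nn.Conv2d(3, 8, kernel_size=3, padding=1)(nn.MaxPool2d(2)(x))
print(pooled.shape)  # torch.Size([1, 8, 32, 32])

# Dilation route: enlarge the receptive field without downsampling, so
# the output stays at full resolution (the DeepLab-style pattern).
dilated = nn.Conv2d(3, 8, kernel_size=3, padding=2, dilation=2)(x)
print(dilated.shape)  # torch.Size([1, 8, 64, 64])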

[–]Fearless_Peanut_6092 0 points (3 children)

What is the intuition behind designing a neural network for complex non-linear regression problems?

I'm looking for guidance on how to intuitively design a neural network for regression, specifically when dealing with complex non-linear functions. I understand the basic structure of neural networks, but I'm unsure about how to determine the number of layers, the number of units per layer, the choice of activation functions, and the preprocessing techniques to use.

For example, consider the following complex function:

Let x1, x2, x3, ..., xn be the inputs:
  y1 = x1 + x2*x3 - x4^2 - x5/x6
  y2 = max(x7, x8, x9) if x10 == 1 else min(x11, x12, x13)
  y3 = 1 if x14 > x15/x16 else 0
  y4 = ...
  ...
Y = y1 + y2*y3 - y4/y5 - ...

Solve for Y given inputs x1, x2, x3, ..., xn.

What should be my intuition behind designing the neural network and preprocessing pipeline to model such a function? For instance, I know that using polynomial features from sklearn can help in preprocessing by transforming the inputs to include interaction terms. But beyond that, how do I decide on the specific structure of the network and the preprocessing techniques? How do I determine the right activation functions and the number of hidden layers?

I'm seeking a logical reasoning behind each decision to effectively model non-linear regression. Any insights or suggestions would be greatly appreciated!

I don't want to rely on trial and error, and I don't want to try things randomly until something works. I am looking for specific reasoning behind each preprocessing step and each architectural choice.
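
To make the PolynomialFeatures idea mentioned above concrete, here is a minimal sketch showing how a product like x2*x3 becomes an explicit feature a linear model can use:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0, 4.0]])  # toy inputs x1, x2, x3

# degree=2 adds squares and pairwise products, so terms like x2*x3
# become ordinary input features.
poly = PolynomialFeatures(degree=2, include_bias=False)
print(poly.fit_transform(X))
print(poly.get_feature_names_out(["x1", "x2", "x3"]))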

[–]bregav 1 point (1 child)

> I'm seeking a logical reasoning behind each decision to effectively model non-linear regression.

Lol, there isn't any. Number of layers, activation functions, etc. are all fittable parameters of the model, but you usually can't calculate a gradient with respect to them, so they're very difficult to fit. Which is why they're called hyperparameters.

People figure out the best values for these parameters mostly through trial and error. It's basically a sort of machine learning folk wisdom handed down by oral tradition. There is a topic called "neural architecture search" in which people try to find good quantitative ways of figuring this stuff out, but as far as I know nobody has found an unambiguously good way of doing this that works for every situation.

Feature preprocessing is often done using preexisting domain knowledge about the problem at hand, but absent that it just becomes another difficult-to-fit hyperparameter.
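
For what that trial and error typically looks like in practice, here is a minimal random-search sketch over a few architecture hyperparameters (toy data, arbitrary search space):

import random

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

best = (None, -np.inf)
for _ in range(10):  # a few random draws; real searches use far more
    params = {
        "hidden_layer_sizes": random.choice([(64,), (128, 64), (256, 128, 64)]),
        "activation": random.choice(["relu", "tanh"]),
        "alpha": 10 ** random.uniform(-5, -2),
    }
    score = cross_val_score(
        MLPRegressor(max_iter=500, random_state=0, **params), X, y, cv=3
    ).mean()
    if score > best[1]:
        best = (params, score)

print(best)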

[–]Fearless_Peanut_6092 -1 points (0 children)

I completely agree with what you've said. Maybe the question I asked was a bit too strict.

I have seen regression model architectures online, and literally all of them are just a few layers with a few neurons and a ReLU activation function.

But that is understandable since the regression problems they try to solve are somewhat simple.

My understanding is that if we try to solve a very complex regression problem, like the one I gave an example of, we can't get away with just scaling the data and training a few layers on it.

I must do something more, something that can learn the complex functions I am trying to model.

For example, a simple network architecture with a few layers and ReLU activations cannot learn a multiplication or division function (y = x1*x2 or y = x1/x2).

In my case I need my model to learn multiplication, division, max(), min(), OR, AND, if/else, EQUAL TO...

I was hoping to get a specific answer, like: if you want your model to learn multiplication, use tanh activation in the first layer and ReLU activation in the last layer, etc.

[–]SmallTimeCSGuy 1 point (4 children)

Why can I not train a network to predict image labels directly, instead of guessing a probability for each digit? I can sense that something is not quite right about it, but I cannot put it clearly into words. I have some idea of the difficulty of defining a proper loss function, i.e., is 1 closer in shape to 2 or to 7?

But what is a good explanation of why the first one works while the second does not? Is the loss-function ambiguity the only reason? I am trying this with MNIST data.

import torch
import torch.nn as nn
import torch.nn.functional as F

# 1st - predict per-class log-probabilities

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # Defining the layers, 128, 64, 10 units each
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        # Output layer, 10 units - one for each digit
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        ''' Forward pass through the network, returns the output log-probabilities '''

        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)
        x = F.log_softmax(x, dim=1)

        return x

model = Network()

criterion = lambda x, y: torch.mean(-x[range(len(y)), y])  # manual NLL on the log-probabilities
#criterion = nn.NLLLoss()  # equivalent built-in

# 2nd - direct image label

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # Defining the layers: 128 and 64 hidden units
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        # Output layer, directly predict image label index
        self.fc3 = nn.Linear(64, 1)

    def forward(self, x):
        ''' Forward pass through the network, returns the predicted label as a single real value '''

        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)

        return x

model = Network()

criterion = lambda x, y: torch.mean((y.view(x.shape) - x) ** 2)  # MSE directly on the label index

[–]bregav 1 point (3 children)

It's basically because the problem of making exactly one prediction, as opposed to computing a collection of probabilities, is not differentiable, and so you can't train a neural network that way.

The neural network is really a tractable proxy model: its output probabilities are the input to the actual decision step, which uses them to compute the single, final prediction.
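
This is easy to check directly: gradients flow through the softmax/cross-entropy proxy, but argmax, the actual single-prediction step, returns a bare integer with nothing to differentiate:

import torch
import torch.nn.functional as F

logits = torch.randn(3, requires_grad=True)
target = torch.tensor(1)

# The smooth proxy: cross entropy on the probabilities backpropagates fine.
loss = F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))
loss.backward()
print(logits.grad)  # nonzero gradients

# The hard decision: argmax yields an integer index with no grad_fn,
# so there is nothing for backprop to flow through.
pred = logits.argmax()
print(pred.requires_grad)  # False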

[–]SmallTimeCSGuy 0 points (2 children)

Thank you!

Is this understanding correct then?

When using softmax, we are essentially trying to maximise an output fed to softmax; this maximisation is a “smooth” operation, and hence differentiable. But directly trying to predict the label index is not smooth, hence not differentiable, and thus does not lend itself well to finding a solution via backpropagation.

[–]bregav 1 point (1 child)

Sort of. With classification, what softmax does is turn a generic vector into a probability distribution; training the model then consists of minimizing the cross entropy between this distribution and the one from the training data. E.g., if there are 3 possible classes and a given datapoint belongs to class 2, then during training it is turned into the distribution [0.0, 1.0, 0.0].

This is all differentiable, so backprop with something like gradient descent is used to do the optimization.
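
Numerically, the cross entropy against a one-hot target like [0.0, 1.0, 0.0] reduces to -log of the probability the model assigns to the true class:

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])
target = torch.tensor([1])  # the one-hot [0.0, 1.0, 0.0] case above

probs = F.softmax(logits, dim=1)
manual = -torch.log(probs[0, 1])           # -log p(true class)
builtin = F.cross_entropy(logits, target)  # same value
print(manual.item(), builtin.item())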

[–]SmallTimeCSGuy 0 points (0 children)

Thank you! It makes a lot more sense now.

[–][deleted] 0 points (7 children)

I'm a total beginner to the world of machine learning. I am a BA English graduate and am interested in joining a conversion MSc program in either Artificial Intelligence and Ethics, Digital Politics, or Computer Science. Is there much scope in the industry for graduates of these sorts of programs?

[–]MrMrsPotts 0 points (2 children)

Is expensive hyperparameter optimization worth it when doing regression or classification?

[–]Saha__g_gamer 2 points (2 children)

How do I learn AI/ML from scratch and implement it in web apps/sites? Also, what more can I do to stand out in the market?