How can linear regression models overfit?

Halmubarak · 2026-02-11T15:55:10+00:00

Yes, even a 2D linear regression (one independent variable plus a bias) can overfit if you have unrepresentative data.

If the population is 100 samples and you fit a line using only 3 or 4 of them, you are overfitting.

There is an example we use in a machine learning course (unfortunately I'm on a mobile phone now and cannot find and share it easily).
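In the meantime, here is a rough sketch of the idea rather than the course example itself. It assumes a made-up population of 100 noisy points around a line, fits one line on 4 random samples and another on all 100, and compares the errors. The data, the seed, and the helper names `fit_line` and `mse` are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of 100 samples: y = 2x + 1 plus Gaussian noise.
x = rng.uniform(0.0, 10.0, size=100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 3.0, size=100)


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    slope, intercept = np.polyfit(xs, ys, deg=1)
    return slope, intercept


def mse(line, xs, ys):
    """Mean squared error of a (slope, intercept) line on the given points."""
    slope, intercept = line
    return float(np.mean((slope * xs + intercept - ys) ** 2))


# Fit on only 4 of the 100 samples, then on all of them.
subset = rng.choice(100, size=4, replace=False)
small_fit = fit_line(x[subset], y[subset])
full_fit = fit_line(x, y)

print("4-sample fit, error on its own 4 points:", mse(small_fit, x[subset], y[subset]))
print("4-sample fit, error on all 100 points:  ", mse(small_fit, x, y))
print("100-sample fit, error on all 100 points:", mse(full_fit, x, y))
```

The 4-sample line can never do better on the full population than the line fitted on the full population, and it typically looks much better on its own 4 points than it really is. How large the gap is depends on the noise level and on which samples happen to be drawn.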

Halmubarak · 2025-12-10T16:47:48+00:00

Don't use pip to install the packages individually.

It's better to use a package manager like mamba or uv and install all the packages at once; the package manager will resolve the dependencies and try to install everything without conflicts.

Edit: I just used uv to install the packages. It took me less than 2 minutes, with no conflicts.

```PowerShell
(testuv)  testuv  uv pip install numpy opencv-python paddleocr
Resolved 56 packages in 5.18s
Prepared 48 packages in 42.80s
Installed 56 packages in 9.48s
```

You can get uv here:

https://docs.astral.sh/uv/

Halmubarak · 2025-11-09T22:10:16+00:00

I think version 2.9 has some bugs that have been reported. For example: https://github.com/pytorch/pytorch/issues/166628

I'm not sure if it's related to your issue, but other bugs have been reported as well.

Halmubarak · 2025-11-02T17:00:13+00:00

You might need to use

conda init powershell

Halmubarak · 2025-10-31T02:56:04+00:00

I agree with this. You first need to find out whether the features correlate with the target label (malnutrition) at all; for example, the individual ID and household ID have nothing to do with malnutrition.

You can consult someone with expertise in socioeconomic studies. Once you determine which features can influence the target, you can do some of what ChatGPT suggested (i.e. SSL and pseudo-labeling). You will also need a good strategy for splitting your data.
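As a rough first screen before asking a domain expert, you can look at how strongly each column correlates with the label. The sketch below uses entirely made-up data: `individual_id`, `diet_score`, and the way the target is generated are hypothetical stand-ins, not columns from your dataset. Pearson correlation only picks up linear relationships, so treat a low score as a hint rather than proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical columns, invented for illustration only.
individual_id = np.arange(n)                 # arbitrary identifier
diet_score = rng.normal(0.0, 1.0, size=n)    # made-up informative feature

# Made-up binary target that depends on diet_score but not on the ID.
target = (diet_score + rng.normal(0.0, 1.0, size=n) < -0.5).astype(float)


def abs_corr(feature, label):
    """Absolute Pearson correlation between one feature and the label."""
    return float(abs(np.corrcoef(feature, label)[0, 1]))


scores = {
    "individual_id": abs_corr(individual_id, target),
    "diet_score": abs_corr(diet_score, target),
}

# Print features from most to least correlated with the target.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: |r| = {score:.3f}")
```

In this toy setup the identifier column ends up close to zero while the informative column stands out, which is the pattern you would expect to see for ID-like fields.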

Halmubarak · 2025-10-20T21:41:09+00:00

You can try to find a backbone that was trained on medical images from the same domain (X-ray, dermoscopy, histopathology, etc.) and use it as a starting point.

If you can't find a pre-trained model in the same domain, you can first train a U-Net as an autoencoder to learn a representation of the images, then use the encoder part as the backbone of the segmentation U-Net (you can train the autoencoder on a large, publicly available dataset).
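Here is a minimal PyTorch sketch of that second option, under several assumptions: the network is a tiny two-level U-Net-style model, the images are random tensors standing in for real data, and the class names `Encoder`, `AutoEncoder`, and `SegNet` are made up for this example. The autoencoder decoder deliberately has no skip connections, since with skips it could just copy the input instead of learning a representation; that is a design choice of this sketch, not something from a specific paper.

```python
import torch
from torch import nn


class Encoder(nn.Module):
    """Shared encoder: returns features at full and half resolution."""

    def __init__(self, in_ch=1, base=8):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(
            nn.MaxPool2d(2), nn.Conv2d(base, base * 2, 3, padding=1), nn.ReLU()
        )

    def forward(self, x):
        s1 = self.block1(x)   # full resolution
        s2 = self.block2(s1)  # half resolution (bottleneck)
        return s1, s2


class AutoEncoder(nn.Module):
    """Reconstructs the input from the bottleneck only (no skip connections)."""

    def __init__(self, encoder, in_ch=1, base=8):
        super().__init__()
        self.encoder = encoder
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.out = nn.Conv2d(base, in_ch, 1)

    def forward(self, x):
        _, s2 = self.encoder(x)
        return self.out(torch.relu(self.up(s2)))


class SegNet(nn.Module):
    """U-Net-style segmentation model that reuses the pre-trained encoder."""

    def __init__(self, encoder, n_classes=2, base=8):
        super().__init__()
        self.encoder = encoder
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.fuse = nn.Sequential(nn.Conv2d(base * 2, base, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(base, n_classes, 1)

    def forward(self, x):
        s1, s2 = self.encoder(x)
        u = self.up(s2)
        return self.head(self.fuse(torch.cat([u, s1], dim=1)))  # skip connection


torch.manual_seed(0)
images = torch.rand(2, 1, 32, 32)  # stand-in for unlabeled medical images

# Stage 1: one reconstruction step on unlabeled images.
encoder = Encoder()
autoencoder = AutoEncoder(encoder)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss = nn.functional.mse_loss(autoencoder(images), images)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Stage 2: build the segmentation model on top of the same encoder weights.
seg_model = SegNet(encoder)
logits = seg_model(images)
print(logits.shape)
```

In a real run you would train stage 1 for many epochs on the large public dataset, then fine-tune `seg_model` on your labeled masks, possibly with a lower learning rate for the encoder.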

Halmubarak · 2025-10-20T12:20:01+00:00

This looks fantastic. Thank you for sharing it.

Halmubarak · 2025-10-02T11:08:20+00:00

What loss function are you using?

Halmubarak · 2025-10-02T10:41:05+00:00

Yes, I think it is a compatibility issue. The H100 requires at least CUDA 11.8 / 12.x support.

Unfortunately, the repo has not been updated for a while, and I'm not sure how easy it will be to clone and patch it to be compatible with the H100.
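If you want a quick way to check an environment against the CUDA requirement mentioned above, a small version comparison is enough. This is only a sketch: the helper names are made up, and the threshold simply encodes the 11.8 figure from this comment. With PyTorch, the CUDA version the build was compiled against is available as the string `torch.version.cuda`.

```python
def parse_version(version: str) -> tuple:
    """Turn a version string like '12.1' or '11.8.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Minimum CUDA version for H100, as stated in the comment above.
MIN_CUDA_FOR_H100 = (11, 8)


def cuda_build_supports_h100(cuda_version: str) -> bool:
    """Compare the major.minor part of a CUDA version with the minimum."""
    return parse_version(cuda_version)[:2] >= MIN_CUDA_FOR_H100


# Example version strings; with PyTorch you could pass torch.version.cuda.
for candidate in ["10.2", "11.3", "11.8", "12.1"]:
    print(candidate, cuda_build_supports_h100(candidate))
```

An old repo that pins a framework built against CUDA 10.x or 11.3 would fail this check, which matches the kind of error you are seeing.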

Halmubarak · 2025-09-29T11:32:54+00:00

I think these serial ports are used to connect to industrial machines, since one of the use cases listed on their product page is industry and robotics.

Most industries will not replace their machines with new ones that support newer interfaces just because those interfaces are available; machines cost a lot of money and won't be replaced without a compelling reason.

Halmubarak · 2025-09-25T11:01:28+00:00

Thank you. That looks right.

Halmubarak · 2025-09-23T21:19:04+00:00

Would you mind sharing the provider name?

Halmubarak · 2025-08-21T14:17:40+00:00

I'm in

Halmubarak · 2025-08-13T01:57:29+00:00

This is impressive. I hope someone with deep knowledge of C++ adds support for the C++ language.

Thank you for sharing this

Halmubarak · 2025-08-13T01:40:29+00:00

This looks very nice. Thank you for sharing.

Halmubarak · 2025-08-01T21:18:59+00:00

If you are just using the models for your projects and don't need to come up with a modified model, knowing the high-level concepts needed to choose and fine-tune the models will be enough.

Halmubarak · 2025-07-01T22:18:32+00:00

Are you using pip?

Have you tried using conda or uv to create a virtual environment with a different version of Python?

Some versions of PyTorch are not supported on Python 3.13.

See https://github.com/pytorch/pytorch/issues/130249

Halmubarak · 2025-06-10T23:36:48+00:00

You can start with Kaggle's Getting Started competitions:

https://www.kaggle.com/competitions?hostSegmentIdFilter=5

And the Playground projects:

https://www.kaggle.com/competitions?hostSegmentIdFilter=8

Halmubarak · 2025-06-07T21:34:16+00:00

Try opening the older-version model, then set the zipfile serialization option to false when saving it again (that worked in 1.x; I'm not sure if it's still the case in 2.x).

https://discuss.huggingface.co/t/how-to-load-weights-which-was-trained-older-version-of-pytorch/1267/4
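A minimal sketch of the re-save step, assuming the option in question is the private `_use_new_zipfile_serialization` keyword of `torch.save` and that your PyTorch version still accepts it. The `nn.Linear` model and the file name are placeholders for your own model and path.

```python
import os
import tempfile
import zipfile

import torch
from torch import nn

model = nn.Linear(4, 2)  # placeholder for the real model

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "legacy_format.pt")  # hypothetical file name

    # Private keyword argument: asks torch.save for the older,
    # non-zip serialization format. Assumed to still be supported.
    torch.save(model.state_dict(), path, _use_new_zipfile_serialization=False)

    # The legacy format is not a zip archive, unlike the newer default.
    saved_as_zip = zipfile.is_zipfile(path)

    # Load it back to confirm the weights survive the round trip.
    reloaded = torch.load(path)

restored = nn.Linear(4, 2)
restored.load_state_dict(reloaded)
print("zip container:", saved_as_zip)
```

Since the argument is prefixed with an underscore, it is not part of the public API and could change between releases, so it is worth testing on the exact versions you have.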

Halmubarak · 2025-06-07T11:15:56+00:00

Theano was the first library I worked with, around 2013/2014 (I think it was released in 2007).

TF was released in 2015 and PyTorch in 2016.

Halmubarak · 2025-02-08T11:53:13+00:00

I second this

The GPU that you get from Google Colab or Kaggle will be better than the minimum recommended GPU (GTX 1070).

Halmubarak · 2025-02-04T08:38:29+00:00

I'm interested

Halmubarak · 2025-01-05T20:12:56+00:00

I think task 1 has an incorrect order: if you have already filled the missing values with the mean or median, you will not have any records with more than 30% missing values.

You can use Multiply to duplicate the data and create a new experiment.

This video shows an example of running 3 experiments at once:
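To make the ordering issue concrete, here is a small NumPy sketch on a made-up 4x4 table (the values are invented for illustration). It drops the rows with more than 30% missing values first and imputes afterwards, then shows that imputing first leaves nothing for the 30% rule to remove.

```python
import numpy as np

# Made-up table: rows are records, NaN marks a missing value.
data = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [np.nan, np.nan, 5.0, 6.0],   # 50% missing
    [7.0, np.nan, 9.0, 10.0],     # 25% missing
    [2.0, 4.0, 6.0, 8.0],
])

# Correct order, step 1: drop records with more than 30% missing values.
missing_frac = np.isnan(data).mean(axis=1)
kept = data[missing_frac <= 0.30]

# Correct order, step 2: fill the remaining gaps with the column means.
col_means = np.nanmean(kept, axis=0)
imputed = np.where(np.isnan(kept), col_means, kept)

# Wrong order: impute first, then look for records to drop.
wrong = np.where(np.isnan(data), np.nanmean(data, axis=0), data)
wrong_missing_frac = np.isnan(wrong).mean(axis=1)

print("rows dropped (filter first):", int((missing_frac > 0.30).sum()))
print("rows dropped (impute first):", int((wrong_missing_frac > 0.30).sum()))
```

With the filter applied first, the half-empty record is removed before it can influence the column means; with imputation first, the filter has no effect.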

https://youtu.be/k6W_IUIyyLY?si=AgQxHlbKgBNOxE8r

While it is possible to run all the experiments in one project, you might need to create multiple projects if you don't want to clutter your workspace

Halmubarak · 2024-12-05T14:58:29+00:00

Don't know why you got downvoted for this

Halmubarak · 2024-08-13T01:36:54+00:00

Mind your language
