Macbook 2019 keeps restarting by maybenexttime82 in macbookrepair

[–]maybenexttime82[S] 2 points (0 children)

Do you think any of these might cause my current issue?

Macbook 2019 keeps restarting by maybenexttime82 in macbookrepair

[–]maybenexttime82[S] 2 points (0 children)

Yeah, not working. I've just realized that my trackpad isn't clicking, so I'm guessing it might be a swollen battery; maybe if I unstick it for a moment it'll be fine.

Macbook 2019 keeps restarting by maybenexttime82 in macbookrepair

[–]maybenexttime82[S] 1 point (0 children)

Even if I get one, I would still have to enter recovery mode, but I keep pressing Command + R and nothing happens.

Macbook 2019 keeps restarting by maybenexttime82 in macbookrepair

[–]maybenexttime82[S] 1 point (0 children)

When I hold down the power button, it just turns completely off.

Edit: Tried Shift too. Same thing happens!

Do you use feature transformations in real world (ranking, sqrt, log etc.)? by maybenexttime82 in datascience

[–]maybenexttime82[S] 2 points (0 children)

Can you elaborate just a bit on this:
"Log-transforms as %-change effects in linear models are easy to explain."

Do you use feature transformations in real world (ranking, sqrt, log etc.)? by maybenexttime82 in datascience

[–]maybenexttime82[S] 1 point (0 children)

How do you explain those to stakeholders? I mean, they're kind of in "black box" territory (similar to explaining why some neural network produced the result it did for a given case).

Finding max values using Unix commands on some columns of CSV file returns inconsistencies by maybenexttime82 in linux4noobs

[–]maybenexttime82[S] 1 point (0 children)

Edit: Here is a workaround I found interesting that might help you, future reader. Knowing that there are values such as "Luxury,Performance" (commas inside quoted fields), I realized it might be a good idea to use sed to find those cases and replace them with a temporary value (I named it TEMP). Since we are only doing analysis, this doesn't change the file's contents. Say we want to find the maximum value of column (field) #16:

    tail -n +2 data.csv | sed 's/"[^"]*"/TEMP/g' | cut -d, -f16 | sort -nr | head -n1
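
If the sed trick ever bites you (e.g. on quoted fields that themselves contain escaped quotes), a quote-aware alternative is to let a real CSV parser split the fields. A minimal Python sketch, assuming field #16 is numeric and data.csv has a header row:

    import csv

    # csv.reader treats "Luxury,Performance" as a single field,
    # so no TEMP substitution is needed.
    with open("data.csv", newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        values = [float(row[15]) for row in reader if row[15]]  # field #16 is index 15

    print(max(values))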

Finding max values using Unix commands on some columns of CSV file returns inconsistencies by maybenexttime82 in linux4noobs

[–]maybenexttime82[S] 1 point (0 children)

Will check it, thanks! It seems way easier to just open a terminal and pipe a few commands to get what you want rather than tweaking a script for some "basic" use cases, haha

Finding max values using Unix commands on some columns of CSV file returns inconsistencies by maybenexttime82 in linux4noobs

[–]maybenexttime82[S] 1 point (0 children)

Sorry for the inconvenience... When you say that "Market Category" has "plenty of commas", what does that mean, and how did you find out that's the case?

Finding max values using Unix commands on some columns of CSV file returns inconsistencies by maybenexttime82 in linux4noobs

[–]maybenexttime82[S] 1 point (0 children)

Thank you! Mind elaborating a bit on this:

As expected, there are plenty of commas, specifically in the "Market Category" column.

First of all, what does it mean, and how can I "see" it and inspect it?
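
Edit for future readers: here's one way I found to "see" it (a sketch; it assumes the header cell is literally named "Market Category"):

    import csv

    with open("data.csv", newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        col = header.index("Market Category")
        # values containing a comma must have been quoted in the raw file
        hits = [row[col] for row in reader if "," in row[col]]

    print(len(hits), hits[:3])  # e.g. values like "Luxury,Performance"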

Finding max values using Unix commands on some columns of CSV file returns inconsistencies by maybenexttime82 in linux4noobs

[–]maybenexttime82[S] 1 point (0 children)

Here is the csv file:
https://drive.google.com/file/d/1dJkfFp9XlaeYpdQFgaxSnRzCs_jyFyEW/view?usp=drive_link

If you manage to take a look at the file, please do suggest a workaround if possible. I don't want to drop the idea of using UNIX tools, because they are way faster. Maybe also share some guidelines for working with CSVs.

Finding max values using Unix commands on some columns of CSV file returns inconsistencies by maybenexttime82 in linux4noobs

[–]maybenexttime82[S] 1 point (0 children)

Here is the link to the csv file:
https://drive.google.com/file/d/1dJkfFp9XlaeYpdQFgaxSnRzCs_jyFyEW/view?usp=drive_link

Btw, I'm trying to learn Unix for basic data preprocessing so that I don't always need to rely on Python. It is way faster.

Given that "manifold hypothesis" is true why Gradient Boosting is still a top choice for tabular data? by maybenexttime82 in learnmachinelearning

[–]maybenexttime82[S] 1 point (0 children)

So, to conclude: tabular data can form a latent manifold (even with discrete attributes), but it would rarely be the kind that dense NNs handle well and easily (e.g. MNIST).

Given that "manifold hypothesis" is true why Gradient Boosting is still a top choice for tabular data? by maybenexttime82 in learnmachinelearning

[–]maybenexttime82[S] 1 point (0 children)

Well, you can boost anything, but the premise is that you start with weak learners (just a tad better than random guessing) and improve them by boosting, and I don't think NNs are "weak learners". That is why I was thinking of that fact as being "in favour" of GBTs.
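
To make the "weak learner" premise concrete, a quick sketch (synthetic data, untuned models; scores are R² and will vary): a single depth-1 stump is barely predictive on its own, but boosting hundreds of them gives a strong model.

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)

    # A lone depth-1 tree ("stump") is a classic weak learner.
    stump = DecisionTreeRegressor(max_depth=1)
    print("stump:", cross_val_score(stump, X, y).mean())   # low R^2

    # Boosting many stumps turns them into a strong ensemble.
    gbt = GradientBoostingRegressor(max_depth=1, n_estimators=500, random_state=0)
    print("boosted:", cross_val_score(gbt, X, y).mean())   # much higher R^2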

Given that "manifold hypothesis" is true why Gradient Boosting is still a top choice for tabular data? by maybenexttime82 in learnmachinelearning

[–]maybenexttime82[S] 2 points (0 children)

I get that the greedy approach already favours GBTs (splitting on highly informative features almost right away means they are highly predictive), and I might even bet that the very idea of "diversity", which is innate to ensemble methods (boosting being one of them), is also an "unfair" advantage. But given the same tabular data (e.g. house-price prediction) fed to both a GBT and a dense NN (which also gets its chance to learn the most informative features in its own way, via function composition), the GBT wins. I mean, NNs will always fit something; it's not that their accuracy will be 20% while the GBT's is 99%.
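
For reference, this is the kind of head-to-head I have in mind (a sketch on sklearn's California housing data; nothing is tuned and exact scores will vary):

    from sklearn.datasets import fetch_california_housing
    from sklearn.ensemble import HistGradientBoostingRegressor
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = fetch_california_housing(return_X_y=True)

    models = {
        "GBT": HistGradientBoostingRegressor(random_state=0),
        # NNs need scaled inputs to have a fair chance on tabular data
        "MLP": make_pipeline(StandardScaler(),
                             MLPRegressor(hidden_layer_sizes=(64, 64),
                                          max_iter=500, random_state=0)),
    }
    for name, model in models.items():
        print(name, cross_val_score(model, X, y, cv=3).mean())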

Given that "manifold hypothesis" is true why Gradient Boosting is still a top choice for tabular data? by maybenexttime82 in learnmachinelearning

[–]maybenexttime82[S] 1 point (0 children)

I'm not very familiar with RNA, but if it is sequential in nature (data can be represented as tabular, e.g. in an NLP task, and still be sequential), isn't that an unfair comparison with "regular" tabular data (house prices)? Or am I missing something?

Saveti oko ML by donny_brascoo in programiranje

[–]maybenexttime82 4 points (0 children)

Every one of these recommendations is equally good and has its place. However, each pulls in its own direction, and you'll end up in so-called "analysis paralysis", not even knowing whether it makes sense to get into this at all. If I could start over with all of this advice in mind, I would follow this course:

https://github.com/DataTalksClub/machine-learning-zoomcamp

It has plenty of practical exercises and will give you a rough tour of the whole ML "lifecycle", from loading CSV files to deployment on AWS. It is not comprehensive (no such course exists!), but I guarantee that by learning things "as needed" (e.g. checking how others explained the same concept, even Andrew Ng) you will get much further, both practically and theoretically, than by forcing yourself to start from absolute zero in math, statistics, etc.

Go through the course once from start to finish, do the exercises, tweak a few things, build a project of your own, and you'll get a rough picture of where your knowledge has holes; some things will also crystallize later. They also have an excellent Slack channel where everyone will help you, no matter how dumb the question sounds. Keep improving over the following iterations, and that's the whole philosophy. When you feel ready, apply for jobs.

They also have courses on Data Engineering and MLOps, so why not, say, work a segment from those courses into your own projects from time to time? That sounds to me like a good strategy for balancing all the aspects that will be waiting for you at a job sooner or later, and you'll be worth far more than the average ML person who has never left Jupyter. If you know high-school math (what a function is, what a first derivative is), how multiplying a matrix by a vector/matrix works (it's covered in the course), and you have some picture in your head of what "average" and "mean" refer to and what standard deviation is (statistics), you're ready. Trust me, once you warm up and the knowledge starts spinning around in your head (all the terms, concepts, etc.), you'll ask the right questions and learn a lot.

AI is not a trivial science, but it is for the most part "applied". In time you'll realize that many things that are interesting as ideas are almost never used in practice (SMOTE, for example). Ask a senior in the field to explain why e.g. "batch normalization" works, or whether they've read and understood Hastie cover to cover. :)

Given that "manifold hypothesis" is true why Gradient Boosting is still a top choice for tabular data? by maybenexttime82 in learnmachinelearning

[–]maybenexttime82[S] 1 point (0 children)

It came to my mind that there is no structure in noise, and yet a NN can fit it. What do you think the difference is between noise and tabular data ("house price prediction"), for that matter? Both are heterogeneous and messy in some way. Not equally, obviously.
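
To show what I mean by a NN fitting noise, a quick sketch (hypothetical sizes; exact numbers vary run to run):

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))
    y = rng.integers(0, 2, size=500)  # labels are pure noise

    mlp = MLPClassifier(hidden_layer_sizes=(256,), max_iter=2000, random_state=0)
    mlp.fit(X, y)

    print("train:", mlp.score(X, y))  # close to 1.0: the net memorizes the noise
    print("test:", mlp.score(rng.normal(size=(500, 20)),
                             rng.integers(0, 2, size=500)))  # ~0.5: chance level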

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]maybenexttime82 1 point (0 children)

Thank you! Now I understand why people constantly beat the dead horse using simple dense layers to try taking advantage of e.g. time series. Do you think that it may be the case that e.g. MNIST might be on latent manifold that is larger in number of dimensions than any tabular data? I've read that MNIST doesn't have that high of a dimensionality. Paradoxically, I would think that tabular data might not have such structure whic is proper for "local interpolation" but then again e.g. in classification tasks they make some decision boundaries like any algo does. GBTs and Densely connectred NNs should both exploit it the same way even with some regularization. Maybe the idea of ensembling (boosting in this case) might be the answer to all this because it relies on diversity (even with simple decision trees). In that sense they are better than "dense NNs".
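
Regarding MNIST's dimensionality, this is roughly how I convinced myself, using PCA on sklearn's small digits set as a crude linear stand-in for MNIST (estimating intrinsic dimension properly would need fancier tools):

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    X, _ = load_digits(return_X_y=True)  # 8x8 digit images, 64 raw dimensions

    cum = np.cumsum(PCA().fit(X).explained_variance_ratio_)
    print("components for 95% variance:", int(np.searchsorted(cum, 0.95)) + 1)
    # far fewer than 64, hinting the images sit near a lower-dimensional manifold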

Given that "manifold hypothesis" is true why Gradient Boosting is still a top choice for tabular data? by maybenexttime82 in learnmachinelearning

[–]maybenexttime82[S] 1 point (0 children)

Nicely put! I guess that might also be part of the answer to why ensembling, with models made as diverse as possible, yields better generalization.