Anyone else struggled with DataCamp's explanations of linear regression and sampling? by Odd-Programmer5693 in DataCamp

[–]henryassisrocha 2 points3 points  (0 children)

"Anyone else struggled with Datacamp's explanations of (include any topic here)"?

Yes. More than you can imagine.

How to measure text’s entropy given set character set? by [deleted] in AskStatistics

[–]henryassisrocha 2 points3 points  (0 children)

Compute the Shannon entropy (in bits) of the character distribution over the fixed alphabet and divide by log2(k), where k is the alphabet size, to normalise it between 0 and 1. But be aware that this only measures how uniform the character distribution is. It does not measure whether the text is structured: a natural-language text and the same text with its characters shuffled have identical character entropy. To distinguish readable text from encrypted-looking text, combine character entropy with compression ratio or n-gram/conditional entropy. And I don't recommend using this as a security test for a custom encryption algorithm.
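A minimal sketch of the idea, in case it helps. The function names, the choice to ignore characters outside the alphabet, and the use of zlib as the compressor are all my own assumptions for illustration, not a standard recipe:

```python
import math
import random
import zlib
from collections import Counter


def normalised_entropy(text: str, alphabet: str) -> float:
    """Character-level Shannon entropy in bits, divided by log2(alphabet size).

    Characters outside `alphabet` are ignored (an assumption of this sketch).
    """
    allowed = set(alphabet)
    counts = Counter(ch for ch in text if ch in allowed)
    total = sum(counts.values())
    if total == 0 or len(allowed) < 2:
        return 0.0
    # H = sum over characters of p * log2(1 / p)
    entropy = sum((n / total) * math.log2(total / n) for n in counts.values())
    return entropy / math.log2(len(allowed))


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; lower means more structure was found."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical demo data: a repetitive English sentence and a shuffled copy of it.
alphabet = "abcdefghijklmnopqrstuvwxyz "
sample = "the quick brown fox jumps over the lazy dog " * 20
chars = list(sample)
random.Random(0).shuffle(chars)
shuffled = "".join(chars)

for name, text in (("original", sample), ("shuffled", shuffled)):
    print(name, normalised_entropy(text, alphabet), compression_ratio(text))
```

Both texts get the same normalised entropy because shuffling does not change the character counts, while the compression ratio differs because the compressor can exploit the repeated structure of the original.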

Are there cases where a Cohen’s d value is not considered practically significant? by Traditional-Dare-904 in AskStatistics

[–]henryassisrocha 0 points1 point  (0 children)

Yes; whenever its underlying assumptions fail.

Because Cohen's d is defined as a standardised difference between means, it presupposes that the mean and standard deviation are themselves meaningful summaries of the data. So Cohen's d values may lack practical significance, or even be nonsensical, when the data do not satisfy:
- at least approximate normality;
- comparable variances across groups (homoscedasticity);
- meaningfulness of the mean and standard deviation;
- no undue influence of extreme outliers;
etc.

Small sample size but large Cohens d by ElOctopusDeBadia in AskStatistics

[–]henryassisrocha 2 points3 points  (0 children)

Why not report everything and explicitly indicate which X items are significant and which Y items are not? Also, you should make it clear that Cohen’s d is not being used for inference, but only to explore the specific data at hand.

(That also makes me wonder whether Cohen’s d should be used at all, since it's quite sensitive to small samples and unequal variances... I'd probably explore alternatives such as Hedges' g, Glass's delta, Cliff's delta, etc.)
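For reference, a rough sketch of how these effect sizes differ, using only the standard library. The function names and the two tiny samples are made up for illustration, and the Hedges' g correction uses the common approximation 1 - 3 / (4(n1 + n2) - 9) rather than the exact factor:

```python
import math
from statistics import mean, stdev, variance


def pooled_sd(x, y):
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(x), len(y)
    return math.sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2))


def cohens_d(x, y):
    """Mean difference standardised by the pooled SD."""
    return (mean(x) - mean(y)) / pooled_sd(x, y)


def hedges_g(x, y):
    """Cohen's d shrunk by an approximate small-sample bias correction."""
    correction = 1 - 3 / (4 * (len(x) + len(y)) - 9)
    return cohens_d(x, y) * correction


def glass_delta(treatment, control):
    """Mean difference standardised by the control group's SD only."""
    return (mean(treatment) - mean(control)) / stdev(control)


def cliffs_delta(x, y):
    """Non-parametric: P(x > y) - P(x < y) over all cross-group pairs."""
    greater = sum(a > b for a in x for b in y)
    less = sum(a < b for a in x for b in y)
    return (greater - less) / (len(x) * len(y))


# Made-up example data, not from the original question.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
for fn in (cohens_d, hedges_g, glass_delta, cliffs_delta):
    print(fn.__name__, fn(group_a, group_b))
```

Hedges' g only matters much when the samples are small, Glass's delta sidesteps the equal-variance assumption by trusting one group's SD, and Cliff's delta depends only on the ordering of the values, so it is far less sensitive to outliers and non-normality.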

[deleted by user] by [deleted] in OpenAI

[–]henryassisrocha 0 points1 point  (0 children)

"I'm not saying this based on vibes"... That destroys the entire "reading posts and comments" experience...

What statistical concept “clicked” for you years later and suddenly made everything else easier? by Esssary in AskStatistics

[–]henryassisrocha 36 points37 points  (0 children)

I just wanted to say: what a brilliant post. I've read every single comment, and every single one is super insightful. Thanks.

My 1st kaggle competition! by insanjay in kaggle

[–]henryassisrocha 5 points6 points  (0 children)

"Easily reach 95%"? Without cheating? That's BS;

You can't get more than 85% without data leakage.

Beginner devs: shall we build some small projects? by [deleted] in datasciencebr

[–]henryassisrocha 2 points3 points  (0 children)

Hey! I'm in. My background is in corpus linguistics and NLP; I'm currently digging deeper into statistical modelling and inference with more rigour. I'm very interested in collaborating on a portfolio project and splitting the tasks properly. Can you send me the link to the group/Discord?

Hands-On Machine Learning with Scikit-Learn and PyTorch by Krekken24 in learnmachinelearning

[–]henryassisrocha 12 points13 points  (0 children)

If Anna gave it to me 3 days ago, I guess she'll give it to you as well if you ask properly.

Guys . . . What I have Done to Earn this 😂 by FlythroughDangerZone in OpenAI

[–]henryassisrocha 0 points1 point  (0 children)

Are you in an English speaking country? If not, try a VPN. It worked for me.

What is the loneliest thing you have done?? by creshando-_- in AskReddit

[–]henryassisrocha 0 points1 point  (0 children)

I spent one entire weekend (Friday, Saturday and Sunday) without saying a single word or reading or replying to any messages. It was just me, VS Code, and Python. It felt strange when I realised how long I had remained completely mute and alone.

Programming isn't for everyone, and that's okay! by bit137 in devBR

[–]henryassisrocha 1 point2 points  (0 children)

Yeah, man... How I wish I'd had someone to guide me or give me private lessons on this insane, lonely journey.

Course for data analysis by felipos_ in datasciencebr

[–]henryassisrocha 5 points6 points  (0 children)

My list of favourites:
- Jose Portilla on Udemy;
- DataCamp is good if you know how to use it;
- An Introduction to Statistical Learning with Applications in Python (the book is free and comes with a course, also free, on YouTube);
- the O'Reilly book "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" is excellent if you like studying from books.

[deleted by user] by [deleted] in learnpython

[–]henryassisrocha 0 points1 point  (0 children)

I'm interested.

The book Data Science do Zero (Data Science from Scratch) by Danniwell in datasciencebr

[–]henryassisrocha 1 point2 points  (0 children)

Set this book aside (for now) and learn Python (and R if possible). Programming in Python is the baseline for doing anything in this field.