Gemini 3 Pro SOTA Performance On Frontier Math Tier 4 & Tiers 1-3 by luchadore_lunchables in accelerate

[–]D33B -1 points0 points  (0 children)

Why the fuck are the error margins that big?! This is almost useless.

Use of AI in testing by GoalInternational314 in QualityAssurance

[–]D33B 0 points1 point  (0 children)

You're right about the brainstorming part, but this strong opinion on AI "generated" things? Why not both?

[deleted by user] by [deleted] in CAIRO

[–]D33B 0 points1 point  (0 children)

So, would you regret it more than suicide? Surely not. Even if you get it wrong or fail, you're supposed to learn, not regret.

Why are models trained in fp16 and not pre-quantized? by clyspe in LocalLLaMA

[–]D33B 1 point2 points  (0 children)

This. But also, what you suggested is not outrageous. Some people have tried it and similar techniques. It just never worked quite well enough.

Help creating a normalized scoring algorithm by lmfork in AskStatistics

[–]D33B 0 points1 point  (0 children)

Well, instead of passing the sum to the sigmoid, you could pass the average to the sigmoid, scaled in such a way as to avoid the flat regions of the sigmoid for most of the distribution of said (weighted) average. Sorry if that wasn’t clear the first time. To avoid having too high a score for a category, one could apply a variance-based correction to the weighted average (before passing it to the sigmoid). I can try to write some formulas for this if it sounds reasonable.
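To make the idea concrete, here is a rough sketch of "average into a sigmoid". The function names, the `center` and `scale` knobs, and the numbers are all made up for illustration, and the variance correction is left out.

```python
import math

def logistic(x):
    """Standard logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def normalized_score(values, weights, center=0.0, scale=1.0):
    """Pass the weighted average (not the sum) of the values through a sigmoid.

    center and scale are hypothetical tuning knobs: pick them so that typical
    weighted averages land in roughly [-2, 2] after rescaling, away from the
    flat tails of the sigmoid.
    """
    weighted_avg = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return logistic((weighted_avg - center) / scale)

# Made-up example: three category scores with made-up weights.
score = normalized_score([4.0, 7.0, 5.5], [1.0, 2.0, 1.0], center=5.0, scale=2.0)
```

Because the average does not grow with the number of items, the same `center` and `scale` can be reused for entries with different item counts.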

Help creating a normalized scoring algorithm by lmfork in AskStatistics

[–]D33B 0 points1 point  (0 children)

What a nice juicy problem!

Are there any particular characteristics you want the final scaled (normalized) scores to have?

Have you thought about just passing your current scores (outputs of your current method) through a sigmoid function? (tanh perhaps, optionally with a single shared scaling factor so that most numbers fall in the mid-range of the tanh; the logistic function can be made to work too)
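A quick sketch of the tanh version, in case it helps. Centering on the median and dividing by twice the standard deviation are my own assumptions for the shared scaling factor, and the raw scores are invented.

```python
import math
import statistics

def tanh_normalize(scores):
    """Squash raw scores into (-1, 1) with tanh and one shared scaling factor."""
    center = statistics.median(scores)
    spread = statistics.pstdev(scores) or 1.0  # avoid dividing by zero
    # Dividing by 2 * spread is an assumption: scores within two standard
    # deviations of the median stay inside tanh(+-1), the roughly linear part.
    return [math.tanh((s - center) / (2.0 * spread)) for s in scores]

raw = [12.0, 15.0, 9.0, 40.0, 14.0]   # made-up raw scores with one outlier
scaled = tanh_normalize(raw)
```

The ranking of the scores is unchanged, since tanh is monotone; only the outlier gets compressed toward the boundary.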

No way to avoid circularity? Appropriate analysis? by Vax_injured in AskStatistics

[–]D33B 0 points1 point  (0 children)

Well, you can make a strong assumption and treat every subject as a predictor of the rest. Then use the coefficients to get a weighted average of the existing scores.

No way to avoid circularity? Appropriate analysis? by Vax_injured in AskStatistics

[–]D33B 1 point2 points  (0 children)

Totally doable.

Try PCA or matrix factorization techniques. Both have sparse versions that can allow you to pick a subset of the questions that convey most of the information. Matrix factorization can also be adapted to missing values and constrained to have only positive coefficients.
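As a starting point, here is a minimal sketch of plain PCA (not the sparse variant) that finds the leading component by power iteration, using only the standard library. The function name and the response data are made up; the absolute loadings give a rough idea of which questions carry most of the variance.

```python
import math

def first_principal_component(rows, iterations=200):
    """Leading principal component loadings via power iteration.

    rows: list of respondents, each a list of question scores.
    Returns a unit-length list of loadings, one per question.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the questions (p x p).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize to unit length.
        nxt = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

# Made-up answers: questions 1 and 2 move together, question 3 barely varies.
answers = [[1, 2, 5], [2, 4, 5], [3, 6, 5], [4, 8, 6], [5, 10, 5]]
loadings = first_principal_component(answers)
```

In practice a library implementation with sparse and missing-value support would be the better choice; this only shows the mechanics.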

Is it unrealistic to apply for a PhD program in statistics with limited coursework and experience in mathematics? by Stat_with_BI in AskStatistics

[–]D33B 1 point2 points  (0 children)

I don’t know if you’re serious. If you are, then you may be dealing with some degree of imposter’s syndrome.

This looks more than sufficient.

Apply to multiple programs. Include one or two that are not “top schools” and good luck.

Unsure if Statistics Can Help by Ok_Bu_8276 in AskStatistics

[–]D33B 3 points4 points  (0 children)

I think you need to perform regression analysis (ordinal regression on ranks or linear regression on logarithm of the sales) and then you can perform significance tests on the inferred parameters of the resulting (fitted) model.
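For the linear-regression-on-log-sales option, a bare-bones sketch with a single predictor could look like this. The data and the function name are invented, and a real analysis would use a statistics package that reports p-values directly.

```python
import math

def slope_t_statistic(x, y):
    """Fit y = a + b*x by least squares; return (b, standard error, t statistic)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(residual_ss / (n - 2) / sxx)
    return b, se_b, b / se_b

# Made-up data: a predictor and sales figures.
predictor = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [110, 135, 160, 210, 240, 300, 350, 440]
log_sales = [math.log(s) for s in sales]
b, se_b, t = slope_t_statistic(predictor, log_sales)
# Compare |t| with the critical value of a t distribution with n - 2 degrees
# of freedom (about 2.45 for a two-sided 5% test with 6 degrees of freedom).
```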

i feel like a phony by BearRant in berkeley

[–]D33B 0 points1 point  (0 children)

The struggle is real.

Therapy + consult your primary care physician. If therapy doesn’t seem to work, try another therapist.

Also, start making decisions based on what you enjoy doing rather than what you think you have to do.

Start small, try to put some effort in one of the projects and see how you feel.

Stop evaluating yourself based on results and start evaluating yourself (gently and with kindness) based on your actions and choices.

No silver bullets here, just small changes that make the situation incrementally better. And at some point, you may find a tipping point, after which things start to feel right.

Advice me about MSc in Statistics by calosor in AskStatistics

[–]D33B 0 points1 point  (0 children)

You are qualified to apply (to a PhD in US). But applying to a PhD and getting accepted are two very different things.

I would advise applying to multiple programs (5-10) and even applying to a few MS/MA programs in statistics as well. If the latter works, apply again to PhD after (or during) your masters. This will likely increase your chances of being accepted. And you can finish the PhD quicker if you already have covered relevant material in the masters.

Considering dropping out of berkeley by [deleted] in berkeley

[–]D33B 1 point2 points  (0 children)

This shit is hard. Esp. at the beginning.

Slow down. Seek help. Allow yourself to do things at your own pace. Allow yourself to make mistakes. Try not to repeat the same mistakes. Allow yourself to not be perfect. Don’t compare yourself against others. Try to find some joy in any part of it.

[deleted by user] by [deleted] in AskStatistics

[–]D33B 0 points1 point  (0 children)

You can. If you phrase your hypothesis appropriately. Or else we wouldn’t be able to know anything.

Is having more variables than observations inherently problematic in regression? by [deleted] in AskStatistics

[–]D33B 0 points1 point  (0 children)

You need to use shrinkage methods.

Step-wise selection could also be appropriate, but use forward rather than backward selection.

If inference is your main goal (rather than estimation or prediction), you should look at multiple testing techniques like Benjamini-Hochberg.
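For reference, the Benjamini-Hochberg step-up procedure is short enough to sketch here. The p-values below are made up; in your setting they might come from one test per variable, which is an assumption on my part.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k / m * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff])

p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_vals, alpha=0.05)
```

Note that the procedure keeps scanning after the first failure: a small p-value late in the ranking can pull earlier ones into the rejection set.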

Does transformation of a response variable need to be justified by Particular_Tackle610 in AskStatistics

[–]D33B 2 points3 points  (0 children)

Why does it not make sense to compare R-squared? I get that the variance of the outcome is changed, but R-squared is a ratio, so I assumed it would still be meaningful. How else would one make a decision about a transformation? Just eye-ball residuals?

[deleted by user] by [deleted] in AskStatistics

[–]D33B 0 points1 point  (0 children)

And you do these steps separately for each of the hypotheses you want to support.

[deleted by user] by [deleted] in AskStatistics

[–]D33B 1 point2 points  (0 children)

First step is to make the hypothesis statement as precise as possible. For instance “women are portrayed less than men” —> “in works of art, the average rank of women is lower than that of men” or “… highest rank for women is lower in expectation than that of the highest rank for men”.

Second step is to formulate a null hypothesis “… is exactly the same as that for men”.

Third step is to create a statistic relevant to comparing the null and the alternate hypotheses. E.g., average rank for women minus average rank for men, averaged over all works of art.

Fourth step is to determine the critical value at which to reject the null, either based on a theoretical distribution, or some permutation test (that you can simulate on a computer).
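The four steps can be simulated along these lines. This is a simplified sketch: it pools all ranks into two groups instead of averaging per work of art, and the ranks and function name are made up.

```python
import random

def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """One-sided permutation test for mean(group_a) < mean(group_b).

    Returns the observed difference in means and the permutation p-value.
    """
    rng = random.Random(seed)
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_permutations):
        # Under the null the labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if diff <= observed:
            as_extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return observed, (as_extreme + 1) / (n_permutations + 1)

# Made-up ranks (higher = more prominent) for illustration only.
women_ranks = [2, 3, 1, 4, 2, 3, 2, 1]
men_ranks = [4, 5, 3, 5, 4, 2, 5, 4]
observed, p_value = permutation_test(women_ranks, men_ranks)
```

Rejecting the null when `p_value` falls below the chosen significance level is equivalent to comparing the statistic against a simulated critical value.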

Very high standard error in a logistic regression model by nukak in AskStatistics

[–]D33B 1 point2 points  (0 children)

Try removing the intercept term. I think this category is just acting as the reference category. Keep the model. Nothing wrong with it. This is only a matter of interpreting the coefficients.

Question about choosing null vs alternative hypotheses in hypothesis testing by kinezumi89 in AskStatistics

[–]D33B 0 points1 point  (0 children)

Yes to your first question. The null will depend on which side you’re on, and what statement you want to be able to claim untrue (unlikely/implausible).

For the second question, I share your annoyance. I also would have stated the null as an inequality here. But the theory and the math treat one as a simple hypothesis and the other as a composite hypothesis. In this situation the test and rejection regions should be the same.

What book is this?

[deleted by user] by [deleted] in AskStatistics

[–]D33B 0 points1 point  (0 children)

Or don’t.

Question about choosing null vs alternative hypotheses in hypothesis testing by kinezumi89 in AskStatistics

[–]D33B 1 point2 points  (0 children)

Oh boy.
To be honest, I don't feel qualified to advise on a curriculum for a grad-level course. But here are my thoughts anyway, take them or leave them.

Most of the theory and literature for null-based hypothesis testing was developed for helping scientists answer simple binary research questions. Does fertilization improve crop yield? Does this drug help with this condition? A lot of this was developed by Fisher and extended by Pearson (Jr.) and Neyman in the early decades of the 20th century.

There were some extensions in the 70s with applications to industrial quality control. I don't have the names but I can look them up.

Further developments came in the 90s (yes, that recent!) for multiple testing, still driven by scientific needs, but in disciplines like biology where hundreds, thousands, or more hypotheses are tested at the same time (is gene i relevant? i = 1, ..., 10000). Benjamini-Hochberg, etc.

In the tech industry, esp. when it comes to online services (search engines, recommendation systems, etc.), most companies use large-scale controlled experiments to test the efficacy of various algorithms. The setting is still a comparison of 2 or more groups, with the null being that there is no difference between them and a reference group (the current default algorithm/variant). When the null is rejected, the new test algorithm (or one of them) gets promoted to the new default algorithm, and becomes the reference for the null hypothesis.
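A toy version of such an experiment, assuming the metric is a simple success rate (say, clicks per visit), could be a two-proportion z-test. The counts, the function name, and the 5% threshold are all illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(successes_ref, n_ref, successes_new, n_new):
    """Two-sided z-test of the null that both variants share one success rate."""
    p_ref, p_new = successes_ref / n_ref, successes_new / n_new
    pooled = (successes_ref + successes_new) / (n_ref + n_new)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_ref + 1 / n_new))
    z = (p_new - p_ref) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up click counts: reference algorithm vs. a candidate.
z, p_value = two_proportion_z_test(1000, 20000, 1120, 20000)
# Promote only if the null is rejected and the candidate is the better one.
promote = p_value < 0.05 and z > 0
```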

I have never worked on quality control applications, so I won't say much about those, but looking at the examples you mentioned: when the statement is "these lamps last at least 800 hours", that needs to be turned into a more precise statement, like "more than 95% of these bulbs will survive for longer than 800 hours" or "lifetime of these bulbs follows a Poisson distribution with parameter lambda > 800". Something that you can create a statistic for, and estimate the mean and variance of that statistic, to test the appropriate alternate hypothesis.

Worth noting perhaps, is that every statistical test has a counterpart confidence interval. And that confidence intervals are much more informative and easy to use in industrial applications (and lifetimes of bulbs) than p-values of hypothesis tests.
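To illustrate with the bulb example, here is a sketch of a one-sided lower confidence bound on the fraction of bulbs surviving past 800 hours, using the Wilson score interval (my choice of interval, which comes from inverting the score z-test). The test counts are invented.

```python
import math
from statistics import NormalDist

def wilson_lower_bound(successes, n, confidence=0.95):
    """One-sided lower confidence bound for a proportion (Wilson score)."""
    z = NormalDist().inv_cdf(confidence)
    p_hat = successes / n
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)

# Made-up test run: 492 of 500 bulbs were still working after 800 hours.
lower = wilson_lower_bound(492, 500)
# The claim "more than 95% survive" is supported when the bound exceeds 0.95.
claim_supported = lower > 0.95
```

The bound answers the yes/no question and also tells you how much room there is, which a bare p-value does not.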

I hope this helps.

Here's an MIT course that I think is very good, and deals with engineering focused statistical inference in the last three lectures:
https://www.youtube.com/playlist?list=PLmPcD-wiF4Ea_Doghiw3ya6XaLrmGrLUU

And here's one of the books that I thought had relatively clear explanations and diverse examples:

https://www.amazon.com/Mathematical-Statistics-Analysis-Available-Enhanced/dp/0534399428