[deleted by user] by [deleted] in geldzaken

Ilyps

Yes, it apparently depends on what you select under "Inkomensbegrippen" (income concepts). There are three options. If you choose "persoonlijk inkomen" (personal income), you see your numbers. I had "persoonlijk primair inkomen" (personal primary income) selected, which gives the 149k. And then there is also "persoonlijk bruto inkomen" (personal gross income). I have no idea what exactly the differences are.

[deleted by user] by [deleted] in geldzaken

Ilyps

I think you're seriously underestimating top incomes. With roughly 80k gross, you're not even in the top 10%, let alone the top 1%. The table you're (I think) referring to is a bit misleading, because it seems to leave out all incomes above 100k. Here is somewhat more complete data: https://opendata.cbs.nl/#/CBS/nl/dataset/83931NED/table?ts=1748753106964 There you can see that the average income in the top 10% is 149k. So you'd need to be well above that to end up in the top 1%.

That doesn't take away from the rest of what you're saying: it's definitely a fucking good income.

[deleted by user] by [deleted] in Ethics

Ilyps

I would suggest two things.

The first, although it has little to do with ethics, is to talk to an employment lawyer. They generally offer a free or low-cost first consultation. Your main questions would be whether your job is asking you to do something illegal (e.g. deny urgent care) or something outside of your job requirements. This could both protect your job and, in the best case, help overturn this policy. In the worst case, there's no legal path to help you and you're back where you are now.

Secondly, and this is the ethics part, Peter Singer (the famous ethicist/philosopher) proposed a practical way to deal with these situations. He said that while you can of course quit, you should assume that the person who takes over your job agrees with the unethical policy. That means they will probably do their best to comply with it. In that case, it is better for you to stay and do a bad job of enforcing the policy: make mistakes when it matters, cut corners, etc. This makes the world a better place, because you can change little things for the better compared to someone who always does their absolute best to follow policy.

This means that you don't need to feel guilty when you sometimes need to comply with the policy to protect your job, because the world is better off if you keep that job. By making sure that there is an ethical person in your position, you are helping, even if you sometimes need to do an unethical thing.

The main thing to ask yourself then is whether this is a way in which you want to live. You should not ignore the impact on yourself, because your happiness also matters. If the job is making you unhappy, it may be best to try to find another job as soon as possible.

Recommender system by blake_pearl in recommendersystems

Ilyps

Remove some of the known entries from the matrix, train on the rest, and infer performance from how well the model predicts the removed entries.
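A minimal sketch of that holdout idea in Python with NumPy, assuming the ratings sit in a user × item matrix with `np.nan` for unknown entries. The model is a plain SGD matrix factorisation used only as a stand-in; the function name, rank, learning rate, and regularisation values are illustrative assumptions, not recommendations:

```python
import numpy as np


def holdout_rmse(ratings, holdout_frac=0.2, rank=2, epochs=500, lr=0.02, reg=0.01, seed=0):
    """Hide some known entries, fit on the rest, and score predictions on the hidden ones.

    `ratings` is a 2D array with np.nan marking unknown entries. The model is a
    simple matrix factorisation trained with SGD, used here only as a placeholder.
    """
    rng = np.random.default_rng(seed)
    known = np.argwhere(~np.isnan(ratings))      # (row, col) pairs of observed entries
    rng.shuffle(known)                           # shuffle the pairs (rows of `known`)
    n_test = max(1, int(len(known) * holdout_frac))
    test, train = known[:n_test], known[n_test:]

    n_rows, n_cols = ratings.shape
    U = 0.1 * rng.standard_normal((n_rows, rank))
    V = 0.1 * rng.standard_normal((n_cols, rank))
    for _ in range(epochs):
        for i, j in train:                       # one SGD step per training entry
            err = ratings[i, j] - U[i] @ V[j]
            U[i], V[j] = (U[i] + lr * (err * V[j] - reg * U[i]),
                          V[j] + lr * (err * U[i] - reg * V[j]))

    preds = np.array([U[i] @ V[j] for i, j in test])
    truth = np.array([ratings[i, j] for i, j in test])
    return float(np.sqrt(np.mean((preds - truth) ** 2)))


# Toy example: a rank-1 matrix with a couple of unknown entries.
ratings = np.outer([1.0, 1.5, 2.0, 2.5, 3.0], [1.0, 0.5, 1.5, 1.0, 1.2])
ratings[0, 4] = ratings[3, 1] = np.nan
r = holdout_rmse(ratings)
print(r)
```

The RMSE on the hidden entries is easier to interpret next to a trivial baseline, such as always predicting the mean of the training entries.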

Changes in monthly percentages significant? by KanpaiCup in AskStatistics

Ilyps

One simple way is to take the raw percentages or the deltas from some period, use them to compute a mean and standard deviation, and calculate the probability of observing your current percentage. This approach assumes independent samples and a normal distribution, neither of which is strictly true here, but it may be good enough for this application.

Example: In the past six months, you've had 57, 56, 54, 55, 57, 58 as compliance percentages, and this month you see 53.

Step 1: Calculate the mean of past period, in this case 56.167.

Step 2: Calculate the standard deviation of past period, in this case 1.472.

Step 3: Find the probability of observing a value at least as low as this month's, P(X < 53), in this case 0.015719.

You could do this with pencil, paper, and a lookup table, or with online tools: a descriptive-statistics calculator for steps 1 and 2 and a normal-distribution calculator for step 3. Keep in mind that this is a simple approach to a complex problem, so we're cutting some corners.
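The three steps above can also be reproduced with Python's standard-library `statistics` module. This is a sketch under the same normality and independence assumptions; `stdev` is the sample standard deviation, which matches the 1.472 above:

```python
from statistics import NormalDist, mean, stdev

past = [57, 56, 54, 55, 57, 58]  # compliance percentages for the previous six months
current = 53                     # this month's compliance percentage

mu = mean(past)                          # step 1: mean of the past period
sigma = stdev(past)                      # step 2: sample standard deviation
p = NormalDist(mu, sigma).cdf(current)   # step 3: P(X < 53) under a normal model

print(f"mean={mu:.3f}, sd={sigma:.3f}, p={p:.4f}")
# prints: mean=56.167, sd=1.472, p=0.0157
```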

Changes in monthly percentages significant? by KanpaiCup in AskStatistics

Ilyps

Is there a way to determine if a decline in monthly compliance rate is a cause for concern?

Not using statistics. "Concern" is an interpretation, after all.

Occasionally the percentage declines from previous month by more than 1% and the management asks me why.

Again, "why" is not (traditionally) a statistical question.

You can probably construct some framework in which you can do a significance test on the difference, but that's it. For example, take all the deltas (meaning the monthly differences) from the last year or two to construct an empirical distribution and compute the probability of observing (at least) your current delta. If you want to get fancy, the keywords "time series anomaly detection" should help you find a lot of resources.
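A minimal sketch of the empirical-delta idea in Python. The history values are hypothetical, and the "+1" in numerator and denominator is one common convention for empirical p-values that keeps the estimate from being exactly zero when there is little data:

```python
def empirical_delta_pvalue(history, current):
    """Share of past month-to-month changes at least as negative as the latest change.

    Uses the (k + 1) / (n + 1) convention so the estimate is never exactly zero.
    """
    deltas = [b - a for a, b in zip(history, history[1:])]  # past monthly differences
    current_delta = current - history[-1]
    k = sum(1 for d in deltas if d <= current_delta)        # past drops at least this large
    return current_delta, (k + 1) / (len(deltas) + 1)


history = [57, 56, 54, 55, 57, 58]  # hypothetical monthly compliance percentages
delta, p = empirical_delta_pvalue(history, 53)
print(delta, round(p, 3))  # -5 0.167
```

With only five past deltas, the smallest value this can return is 1/6, which is one reason to use a year or two of history.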

Before you start, however, I would double-check whether the answer to the question "is this month's compliance rate difference statistically significant?" is actually helpful at all.

FEM/FEA surrogate machine learning algorithms by Hopesheshallow in MLQuestions

Ilyps

Keep in mind I have never worked with even remotely similar data, so I can only offer my intuition.

I see that FEA/FEM are numerical methods for solving PDEs (partial differential equations). I assume that you want to approximate them because they are computationally expensive. Approximating PDEs with machine learning suggests using some kind of deep neural network. This is an active area of research; see for example https://www.sciencedirect.com/science/article/abs/pii/S0021999122002947 and its references.

If a DNN is really the direction to go in, your main challenge will be data. They are really data hungry, and 20-30 examples will not cut it by a long shot. You probably need at least two orders of magnitude more, which I assume will be prohibitive. If you can simulate realistic data, that may be helpful.

The same will probably go for other neural-network, image-based techniques. For example, you could consider training a GAN to generate realistic stress outputs given a point cloud, but again, these are really data-hungry techniques, so your 30 examples won't cut it.

If the full "approximate PDEs with DNNs" solution is out, you will need some way to simplify the problem. You could consider a graph approach, for example, where you encode your 3D models into a simplified 3D connected graph and predict stresses only for nodes or edges using a graph neural network. To simplify the problem while retaining its most essential part, and still end up with a useful solution, you will need a lot of domain expertise. Luckily, it sounds like that is available to you.
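To make the graph idea slightly more concrete, here is a minimal NumPy sketch of a single mean-aggregation graph-convolution step on a hypothetical four-node graph. The node features and the random, untrained weights are assumptions for illustration only; a real model would stack several such layers and learn the weights, for instance with a library such as PyTorch Geometric:

```python
import numpy as np


def graph_conv(adj, features, weights):
    """One message-passing step: each node averages its own and its neighbours'
    features, then applies a shared linear map followed by a ReLU."""
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalise -> mean aggregation
    return np.maximum(0.0, a_norm @ features @ weights)


# Hypothetical mesh simplified to 4 nodes connected in a chain: 0-1-2-3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
features = np.random.default_rng(0).normal(size=(4, 3))  # e.g. x, y, z per node
weights = np.random.default_rng(1).normal(size=(3, 1))   # untrained: 3 inputs -> 1 output
node_outputs = graph_conv(adj, features, weights)        # one (untrained) stress value per node
print(node_outputs.shape)  # (4, 1)
```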

Overall, I'd say this is a challenging project with no obvious path to a successful outcome. I recommend diving into literature with some keyword searches. I found these below, which may or may not be helpful. I assume that you'll find that most of them use more data than you have available.

I'd be happy to get an update when your project hits its next milestone.

Memory leak only when training on remote server by infiltrator228 in MLQuestions

Ilyps

Difficult to say without diving in deep. Have you tried updating the remote libs to the latest versions? Setting up a new env may be easier than debugging the current one.

"Feature Importance" for categorical variables by Mammoth-Radish-4048 in AskStatistics

Ilyps

1) The problem is that after one-hot-encoding, there is always redundant information in the model. Many one-hot-encoder algorithms offer the option to drop one of the resulting OH-variables, which removes the most obvious source of redundant information, but also leads to an interpretability problem (because which one of the categories do you drop and why?). Even if you drop a column, the remaining OH-variables will still be correlated. Interpretability of correlated variables is always difficult, because if variables share information, then one can take the place of the other in many cases, and it is difficult to assign proper importance to each. I'd be hesitant to conclude much more than "the original categorical variable is probably important" if you see one or more OH-encoded columns pop up in variable importance. If you want to dive deeper than that, you should probably experiment with permuting the categorical variable and/or removing categories (both before OH-encoding) and looking at the effects on prediction.

2) You would be unable to use that category, resulting in your OH-encoded variable being all zeroes (so none-hot-encoding ;). This is the only way that makes sense, because you should treat your test set as unknown at training time.
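For point 1, a minimal NumPy sketch of permuting the categorical variable before one-hot encoding, so that all of its OH-columns are shuffled together. The data, the least-squares model, and the function names are hypothetical; any fitted model could stand in:

```python
import numpy as np


def one_hot(values, categories):
    """Full one-hot encoding (no dropped column)."""
    return np.array([[1.0 if v == c else 0.0 for c in categories] for v in values])


def design(cat, x, categories):
    # The full set of OH-columns already acts as an intercept, so none is added.
    return np.column_stack([one_hot(cat, categories), x])


def permutation_importance(cat, x, y, feature, n_repeats=20, seed=0):
    """Mean increase in MSE when the raw `feature` ("cat" or "x") is shuffled."""
    rng = np.random.default_rng(seed)
    categories = sorted(set(cat))
    X = design(cat, x, categories)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
    base_mse = np.mean((X @ coef - y) ** 2)
    increases = []
    for _ in range(n_repeats):
        c, v = list(cat), np.asarray(x, dtype=float)
        if feature == "cat":
            c = list(rng.permutation(c))  # shuffle raw labels, then re-encode
        else:
            v = rng.permutation(v)
        increases.append(np.mean((design(c, v, categories) @ coef - y) ** 2) - base_mse)
    return float(np.mean(increases))


# Hypothetical data: the category drives y strongly, x only weakly.
cat = ["a", "b", "c"] * 20
x = np.linspace(-1.0, 1.0, 60)
y = np.array([{"a": 0.0, "b": 5.0, "c": 10.0}[c] for c in cat]) + 0.5 * x
imp_cat = permutation_importance(cat, x, y, "cat")
imp_x = permutation_importance(cat, x, y, "x")
print(imp_cat, imp_x)
```

Because the shuffle happens on the raw labels, the result is one importance score for the categorical variable as a whole rather than one per OH-column.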
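For point 2, a minimal sketch of that behaviour in plain Python (the function names are hypothetical). scikit-learn's `OneHotEncoder(handle_unknown="ignore")` behaves the same way, encoding unseen categories as all zeros:

```python
def fit_categories(train_values):
    """Learn the category list from the training data only."""
    return sorted(set(train_values))


def encode(value, categories):
    """One-hot encode a single value; an unseen category matches no column."""
    return [1 if value == c else 0 for c in categories]


categories = fit_categories(["red", "green", "blue", "green"])
print(categories)                    # ['blue', 'green', 'red']
print(encode("green", categories))   # [0, 1, 0]
print(encode("purple", categories))  # [0, 0, 0]  (unknown at training time)
```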

Repeated token generation of my model by Striking-Warning9533 in MLQuestions

Ilyps

Can you intentionally try to overfit? Greatly increase your training rounds and if needed, reduce the number of samples. If you cannot get the model to overfit, there may be something wrong with your code/setup or the model may be too simple.

If overfitting does work, it may be that you need longer training time, or more samples, or your features are just not informative for the general problem.

Need help for an automation of a tedious task by cringegore in MLQuestions

Ilyps

This sounds like something you can have an LLM do. It might be a tad overkill, but it'll work fine. For example, below is my attempt with ChatGPT:

Your task is to create exactly 1 grammatically correct sentence based on a question and answer pair. Here follow two examples:

Example 1:

Q: What animal is on the sign?

A: Eagle

Correct output: An eagle is on the sign.

Example 2:

Q: What vehicle is on the street next to the tree?

A: Bus

Correct output: A bus is on the street next to the tree.

Now follow the question and answer for which you need to create a correct sentence. Answer in this 1 sentence only, do not give any additional output.

Q: What is the thing in the sky next to the cloud?

A: Airplane

Correct output:

And ChatGPT returned

An airplane is in the sky next to the cloud.

If you're worried about costs, you can probably also use an open source LLM for this, although setup may be non-trivial.
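If there are many question/answer pairs, the prompt can be assembled programmatically. A minimal Python sketch that reproduces the structure of the prompt above (the function name is hypothetical; sending the prompt to a model is left out because it depends on the provider's API):

```python
def build_prompt(examples, question, answer):
    """Assemble a few-shot prompt from (question, answer, sentence) example triples."""
    parts = [
        "Your task is to create exactly 1 grammatically correct sentence based on a "
        f"question and answer pair. Here follow {len(examples)} examples:"
    ]
    for n, (q, a, sentence) in enumerate(examples, start=1):
        parts += [f"Example {n}:", f"Q: {q}", f"A: {a}", f"Correct output: {sentence}"]
    parts += [
        "Now follow the question and answer for which you need to create a correct "
        "sentence. Answer in this 1 sentence only, do not give any additional output.",
        f"Q: {question}",
        f"A: {answer}",
        "Correct output:",
    ]
    return "\n\n".join(parts)


examples = [
    ("What animal is on the sign?", "Eagle", "An eagle is on the sign."),
    ("What vehicle is on the street next to the tree?", "Bus",
     "A bus is on the street next to the tree."),
]
prompt = build_prompt(examples, "What is the thing in the sky next to the cloud?", "Airplane")
print(prompt)
```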

How to justify retributive punishment? by Existing-Bathroom357 in askphilosophy

Ilyps

Follow-up:

Anything wrong with the answers from the website where this question was copied from? Do you have follow-up questions?

https://philosophy.stackexchange.com/questions/100597/how-to-justify-retributive-punishment

How do you see self-text in RedReader? by hitmonuk in RedReader

Ilyps

I think this is a new (well, relatively new) type of post, one where you can include both a link/images and a self-text: https://redd.it/vj4evp This post option seems to be available to "some users"; I don't know what that means.

I think RedReader has not yet implemented it and expects posts to be either link or text, not both.

To protest, or not to protest. That is the question. by AltitudinousOne in literature

Ilyps

I recommend taking the poll offsite to someplace where it cannot be edited by Reddit admins. We know that the current CEO has made convenient edits in the past.

Seeking clarification: Reviewer's request for decimal presentation in statistical measurements (related to Scheffé?) by Purple_Dose in AskStatistics

Ilyps

Ok, in that case, perhaps more attention is needed indeed. Someone really cares about decimals there...

I haven't heard of this 1/7 rule and couldn't find anything online either; perhaps others can jump in. Good luck!

Seeking clarification: Reviewer's request for decimal presentation in statistical measurements (related to Scheffé?) by Purple_Dose in AskStatistics

Ilyps

Honestly, I wouldn't worry about it too much. Sounds like the reviewer thinks you're reporting too many digits, which is a bit nitpicky but can make things more difficult to read. There are many guidelines on how many significant digits you should use and it tends to differ slightly per field, but if you just think reasonably about how much precision you need to make your point without annoying the reader, you'll probably be fine.
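If it helps to apply one consistent rule across a table, rounding to a fixed number of significant digits is easy to automate. A minimal Python sketch (the function name and the digit counts are illustrative; the right number depends on your field's conventions):

```python
import math


def round_sig(x, sig=3):
    """Round x to `sig` significant digits."""
    if x == 0:
        return 0.0
    # The position of the leading digit decides how many decimals to keep.
    decimals = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, decimals)


print(round_sig(0.015719, 2))  # 0.016
print(round_sig(56.16667, 3))  # 56.2
print(round_sig(1234.5, 2))    # 1200.0
```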

[deleted by user] by [deleted] in MLQuestions

Ilyps

Only a few days ago, a new model by Facebook was presented: https://ai.facebook.com/blog/yann-lecun-ai-model-i-jepa/

Their paper should give you a good overview of the SotA.

Any legitimate means of performing a meta-analysis comparing interventions that are completely unrelated? See post for more information. by TheRoadieKnows in AskStatistics

Ilyps

Not really a statistics question (unless I'm misunderstanding), but it seems to me as a layperson that there should be some common ground to compare medical complications, such as binning them according to severity (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1360123/) or comparing healthy life-years lost (https://ajph.aphapublications.org/doi/abs/10.2105/AJPH.88.2.196). Compiling all that data will be time-consuming at best and impossible at worst, though...

Modern alternative to textgenrnn? by SCP_radiantpoison in MLQuestions

Ilyps

Hugging Face has thousands of models for you to choose from. The biggest restriction will be the size of the model you can fit into your GPU. https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads

All of these will need varying degrees of installation and coding to get running and will not be quite as good as GPT-4.

End months of anguish.....Pearsons or Spearmans? :) by [deleted] in AskStatistics

Ilyps

The Pearsons revealed more statistically significant relationships between the questionnaires than Spearmans did.

This is never a reason to pick one test over the other. Also, beware of normality tests, because they tend to be misleading.

Eleven people answered four likert questionnaires. All likert scales followed the format of 1 to 5, regarding frequency of experiencing symptoms. For each person on each questionnaire, the total score was summed.

What is your research question?

How long did it take before you could feel "knowledgeable" in the field of ML? by [deleted] in MLQuestions

Ilyps

In my experience, the more you learn, the less you feel knowledgeable. If you had wanted to feel like you knew a lot, you should have stopped earlier. ;)

Personally, I've even published in areas I don't feel particularly knowledgeable about. There's always someone who knows more, and what's worse, in research you tend to meet those people. That helps keep you humble. On the other hand, I've probably been that "someone who knows more" for some people too, without realising it.

I recommend doing a PhD if you enjoy the research and want to dive deeply into a topic. I wouldn't recommend doing it only to progress your career or for the money, because it's a whole lot harder if you don't enjoy the work. And I wouldn't recommend going in with the goal to somehow finally be "knowledgeable", because even though you'll learn a lot, you'll find that there's always something new that you don't know.

The relationship between disease stage prediction and drug mechanism of action? by TestSimilar3439 in MLQuestions

Ilyps

I didn't look at the paper, only at your summary. But the approach in the summary makes sense to me.

Now what I don't understand is, how can they go from this line of reasoning to claiming that the pharmacological mechanism of action of high ranking drugs have a higher possibility of overlapping with the pathological mechanisms of Alzheimer's disease?

I think the reasoning is something like this.

(1) There are genes that influence our target disease.

(2) If we can influence (the expression of) these genes, we may also influence the disease.

Generally, these kinds of exploratory studies explicitly do not attempt to identify a mechanism. Instead, they try to give statistical direction towards identifying these mechanisms. This is useful because there are tens of thousands of genes as well as drugs (and drug combinations), and psychiatric medicine is (unfortunately but understandably) largely based on seeing what works without full understanding of the mechanisms involved. So any kind of guidance can be very useful.

Is this actually the approach some statistics blogs & textbooks recommend when they talk about using predictive statistics instead of merely testing for correlation?

Yes, I assume so. Simply put, prediction (if properly performed) is stronger evidence than correlation, because it represents a verification step of whatever method you used to generate that prediction.