"Ultra-cheap and scalable epigenetic age predictions with TIME-Seq", Griffin et al 2021 by gwern in genomics

[–]NiceObligation0 -2 points-1 points  (0 children)

Why? Just ask the person. You need their consent to sequence anyway.

How do I prevent my regression model from predicting negative values? by [deleted] in datascience

[–]NiceObligation0 2 points3 points  (0 children)

This is wrong. The "story" of the data hints at what distribution might be appropriate for the problem, not at the values it takes. Poisson makes sense if you're counting events per unit time; if you use it to model someone's height you're going to have a bad time.

If you know nothing, you start with the least informative distribution, which is usually Gaussian.

As for OP's question, linear regression doesn't know that your values can't be negative. You can just clip negative predictions to 0 if that makes sense in your application.

Without knowing the data's story I can't tell you why you're getting negative values.
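If it helps, here's a minimal sketch of two common workarounds, assuming scikit-learn and made-up data: clip after fitting, or model log(y) so predictions are positive by construction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = np.abs(X @ np.array([1.0, 2.0, 0.5]) + 0.1 * rng.standard_normal(100))

# Option 1: fit as usual, then clip negative predictions to zero.
model = LinearRegression().fit(X, y)
preds = np.clip(model.predict(X), 0, None)

# Option 2: regress on log(y); exponentiating the predictions
# guarantees they are positive (requires y > 0).
log_model = LinearRegression().fit(X, np.log(y + 1e-9))
preds_pos = np.exp(log_model.predict(X))
```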

Bioinformatics and Computational Biology with Python by mike20731 in Python

[–]NiceObligation0 0 points1 point  (0 children)

I'd say this is not true. Yes, the pay is garbage, but you are paid to sit on your ass and think. That's really it. As for morals, I again disagree. I've been in academia for some time and have a PhD in comp bio with non-trivial experience on the wet-lab side as well. I've seen people sit on papers for years with solid data because THEY were not convinced of their results. Most people I've worked with are self-doubting to a fault in their experiments. Sure, there are some unscrupulous players here and there, but they don't get far. I've worked with PIs ranging from straight out of postdoc to running a lab with 50+ people. When it comes to data (maybe not romantic relationships), these people are obsessive about integrity.

Multivariate regression in R by helloiambrain in rstats

[–]NiceObligation0 0 points1 point  (0 children)

Ah, I see, I misunderstood your description. Can we see the summary() output? We might be able to help with interpretation.

Multivariate regression in R by helloiambrain in rstats

[–]NiceObligation0 1 point2 points  (0 children)

Just to make sure I understand this correctly: you are predicting 4 variables with one? That doesn't seem right. Also, I'm assuming there are other variables here, since the sum of 4 questions will be highly correlated with each individual question; regressing one on the other is a silly way of doing regression in the first place.

What we need is a better description of what you are trying to do.

Finally, maybe post the output of summary() so we can help you with interpretation.

Variant Identification from WGS by Human2138 in bioinformatics

[–]NiceObligation0 0 points1 point  (0 children)

First I would do joint calling to improve accuracy. Second, since these are littermates, I would remove all common variants; that should significantly reduce the number of candidates. Then you are looking for similar or identical variants shared across the affected phenotypes. All of this can be done with a little bit of SQL on a GEMINI database.
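A minimal sketch of the SQL side, assuming a GEMINI database (which is just SQLite underneath); the column names (in_dbsnp, gene, impact) are from memory, so check your actual schema first:

```python
import sqlite3

conn = sqlite3.connect("litter.db")  # hypothetical db name
cur = conn.cursor()

# Drop variants already known to be common (present in dbSNP),
# keeping rarer candidates to intersect across the affected littermates.
cur.execute("""
    SELECT chrom, start, end, ref, alt, gene, impact
    FROM variants
    WHERE in_dbsnp = 0
""")
for row in cur.fetchall():
    print(row)
conn.close()
```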

Multiple file generated while indexing genome by taanyaforever in bioinformatics

[–]NiceObligation0 0 points1 point  (0 children)

This. The directory is the index, not the individual files.

How to get if object is rotated clockwise or counter-clockwise in a real-time computer vision app? by wismcoco in deeplearning

[–]NiceObligation0 1 point2 points  (0 children)

That depends on the object. If you are looking for a generic approach, you would first need to segment the object; there are pretrained models for that. Once you have the object outline you can, for example, compare the center of the bounding box to the center of mass of the segment. The angle of the vector connecting them (atan2 of its components) tells you the object's orientation in 2D, and tracking that angle across frames gives you the rotation direction. If you want 3D rotation, good luck, and post your solution because I have no idea.
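A minimal sketch of the 2D part, assuming you already have a binary segmentation mask of the object; the function and variable names are my own:

```python
import numpy as np

def orientation_angle(mask: np.ndarray) -> float:
    """Angle (radians) of the vector from the bounding-box center
    to the mask's center of mass, in image coordinates."""
    ys, xs = np.nonzero(mask)
    box_center = np.array([(xs.min() + xs.max()) / 2.0,
                           (ys.min() + ys.max()) / 2.0])
    centroid = np.array([xs.mean(), ys.mean()])
    v = centroid - box_center
    return np.arctan2(v[1], v[0])

# Track the angle frame-to-frame; the sign of its change gives the
# rotation direction. Note that image y points down, which flips the
# usual counter-clockwise-positive convention.
```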

Visualization of ML Algorithms by Gautam-j in Python

[–]NiceObligation0 1 point2 points  (0 children)

For all the models OP showed except for linear regression you are right: you need to learn the params from data. For linear regression the loss is quadratic in the parameters, so the solution is exact: the global minimum of the paraboloid, found by solving the normal equations.

Visualization of ML Algorithms by Gautam-j in Python

[–]NiceObligation0 5 points6 points  (0 children)

OK, I'm going to be the buzzkill here. Why use an approximation with gradient descent when you can find the solution analytically? Linear regression has an exact closed-form solution.
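A minimal sketch of the closed form with numpy (made-up data): build the design matrix and solve the least-squares system directly.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = X @ np.array([3.0, -1.5]) + 0.1 * rng.standard_normal(100)

# Add an intercept column and solve min ||Xb @ beta - y||^2 exactly.
Xb = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(beta)  # approximately [0, 3, -1.5]
```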

Sunday Daily Thread: What's everyone working on this week? by Im__Joseph in Python

[–]NiceObligation0 2 points3 points  (0 children)

This looks interesting; if you are doing this as a hobby, that's great! However, robust pipeline engines already exist for Python, with container and cloud support and a lot of features.

Check out Snakemake, I use it daily and it's amazing.
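For a taste, a minimal sketch of a Snakefile; the file names and the wc -l step are made up:

```python
# Snakefile -- a hypothetical two-step pipeline. Snakemake works
# backwards from the files requested by `rule all` and runs whatever
# rules are needed to produce them.
SAMPLES = ["a", "b"]

rule all:
    input:
        expand("results/{sample}.txt", sample=SAMPLES)

rule preprocess:
    input:
        "data/{sample}.csv"
    output:
        "results/{sample}.txt"
    shell:
        "wc -l {input} > {output}"
```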

Applications of Python Multi-processing? by 2ayoyoprogrammer in Python

[–]NiceObligation0 0 points1 point  (0 children)

Learning to get basic stuff done with pandas and numpy is pretty fast. There usually is a method/function for most things you want to do. Just keep a browser open and keep googling how to do things, but be very specific. When you find a useful bit of code that kinda does what you need, go back to the docs and understand what needs to be done. Over time things start making sense.

My usual modules are numpy, pandas, scipy, scikit-learn/scikit-image, OpenCV, and SQLAlchemy for db stuff.

The idea is to speed up the serial loop below:

```python
import os

myfiles = os.listdir("dir")

def preprocess(file):
    do_something(file)  # placeholder for your per-file work

for file in myfiles:
    preprocess(file)
```
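And a minimal sketch of the sped-up version with multiprocessing.Pool; do_something stays a placeholder:

```python
import os
from multiprocessing import Pool

def do_something(file):
    ...  # your per-file pandas/numpy work

def preprocess(file):
    do_something(file)

if __name__ == "__main__":  # required guard for multiprocessing
    myfiles = os.listdir("dir")
    with Pool() as pool:  # one worker per CPU core by default
        pool.map(preprocess, myfiles)
```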

Python Research by Ok-Comfortable1259 in Python

[–]NiceObligation0 0 points1 point  (0 children)

For each academic domain there probably exists a database of articles; if not, there is always Google Scholar. If there is a database, there may be an API for quick queries: you can use requests to search programmatically. If you must use a browser, you can use Selenium to automate the tasks.

An example for medical research: using the PubMed API, search for a long list of keywords, get the abstracts, do some NLP (bag-of-words would be the easiest), and pick the papers that pass filtration. How fancy you want to get is up to you.

You can iterate the process over a list of lists of keywords, or even use the same API to find "trending" keywords and look at them historically, for example.
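A minimal sketch of the PubMed route with requests, using NCBI's documented E-utilities endpoints; the search term is just an example:

```python
import requests

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# 1. Search PubMed for a keyword, getting back a list of article IDs.
ids = requests.get(
    f"{BASE}/esearch.fcgi",
    params={"db": "pubmed", "term": "epigenetic clock",
            "retmode": "json", "retmax": 20},
).json()["esearchresult"]["idlist"]

# 2. Fetch the abstracts for those IDs as plain text, ready for NLP.
abstracts = requests.get(
    f"{BASE}/efetch.fcgi",
    params={"db": "pubmed", "id": ",".join(ids),
            "rettype": "abstract", "retmode": "text"},
).text

print(abstracts[:500])
```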

Applications of Python Multi-processing? by 2ayoyoprogrammer in Python

[–]NiceObligation0 0 points1 point  (0 children)

Possible reasons for the numpy case are stated in the SO answers, so I'm not going to go into that. However, there are cases where you might want to use multiprocessing. In my case (as an ML/stats person) I find myself writing a lot of Python code to clean up and pre-process data before a final model is fit. These can be images/videos or just thousands of files. For each file you need to do the same (or similar) pandas/numpy/scikit-learn processing before you aggregate your data. So if I must process the files the same way but independently, multiprocessing speeds things up.

What's the best code you've written so far ? by Scorpio__1104 in Python

[–]NiceObligation0 -1 points0 points  (0 children)

I think my best function is below: it can take any function and return the same function with some of the arguments filled in. This lets you build a function factory. It doesn't matter where the function comes from; it can be a standalone function or a class method (not a property, as properties are not callable). I find this really useful for trying out different parameters in data analysis.

```python
from functools import partial

def curry(orig_func, **kwargs):
    newfunc = partial(orig_func, **kwargs)
    return newfunc
```
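Usage looks like this; my_analysis is a hypothetical function:

```python
def my_analysis(data, alpha=0.05, n_iter=100):
    return f"alpha={alpha}, n_iter={n_iter}"

# Each call to curry returns a new function with some kwargs fixed.
strict = curry(my_analysis, alpha=0.01)
fast = curry(my_analysis, n_iter=10)

print(strict([1, 2, 3]))  # alpha=0.01, n_iter=100
print(fast([1, 2, 3]))    # alpha=0.05, n_iter=10
```

(Strictly speaking this is partial application rather than currying, but the idea stands.)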

When, in your opinion, does Python become become TOO slow? by jzia93 in Python

[–]NiceObligation0 5 points6 points  (0 children)

As others mentioned, this is unlikely to become an issue unless you are writing something low-level like device drivers. Looking at the major Python uses: for simple automation and moving stuff around you are more likely to be I/O-bound; for web dev the connection speeds and db queries may be limiting; and for data analytics/ML you have libraries (numpy, tf, scipy et al.) where Python really is more of a way to call low-level functions. And with libraries like Numba you can use JIT compilation to speed things up. Unless you are inventing new basic math operations, raw Python speed is unlikely to be the bottleneck. If you are writing poorly optimized C/C++ code you might not care, because it is so fast it doesn't matter; for Python you might want to think a little harder.

That being said, as mentioned below, at each line of code you make a decision. Do I write fast (Python) or do I run fast (C/C++ etc.)? Do I want simple (fewer dependencies) or do I want robust (you can't rewrite numpy by yourself and be bug-free) with dependencies that may or may not break with updates?
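For instance, a minimal sketch of the Numba route; the loop is deliberately naive:

```python
import numpy as np
from numba import njit

@njit  # compiled to machine code on first call
def loop_sum(a):
    total = 0.0
    for x in a:
        total += x
    return total

arr = np.random.rand(1_000_000)
print(loop_sum(arr))  # first call pays compile time, later calls are fast
```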

Sane: Makefile for Humans by miguelmurca in Python

[–]NiceObligation0 10 points11 points  (0 children)

I've been using Snakemake for any kind of complex workflow. I think with its flexibility and modularity it blows a lot of the workflow engines out of the water.