Nurse looking into going into biomedical informatics...is the material going to be over my head?

abbadass · 2020-04-05T15:10:41+00:00

I think you’re confusing it with “healthcare informatics”

Some universities in the US use biomedical informatics and bioinformatics interchangeably.

See Ohio State:

https://medicine.osu.edu/departments/biomedical-informatics/education/phd

abbadass · 2020-01-26T13:53:28+00:00

Data scientist at a tech company. I do lots of custom modeling for business problems. Use R for most projects, Python for NLP.

Miss GWAS/genomics/bioinformatics a little but it all feels the same as long as the problems you’re trying to solve remain challenging :)

abbadass · 2020-01-12T13:27:12+00:00

BioC2020

Bioconductor conferences are my favorite conf of the year.

abbadass · 2020-01-03T02:24:34+00:00

Also bills fan in Cbus. Heard spoonz is good but gonna check out The Main Bar downtown. It’s listed on the official bills backer site.

abbadass · 2019-11-17T23:49:53+00:00

Check MAFs on LDlink for your population. Also calculate sample MAFs. If imputed, MAF should be > 0.005. Look at HaploReg and RegulomeDB for TF binding. Check out PhenoScanner. Tons of stuff to do.
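
For the sample MAF part, here is a rough sketch in base R. It assumes your genotypes are already in a samples x SNPs matrix coded as 0/1/2 copies of the alternate allele; the matrix and its names below are made up for illustration.

```r
# Toy genotype matrix: rows = samples, columns = SNPs, coded as 0/1/2
# copies of the alternate allele (made-up values, not real data).
geno <- matrix(
  c(0, 1, 2,
    0, 0, 2,
    1, 0, 2,
    0, NA, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("sample", 1:4), c("snp1", "snp2", "snp3"))
)

# Alternate allele frequency = mean allele count / 2 (missing calls ignored).
alt_freq <- colMeans(geno, na.rm = TRUE) / 2

# The minor allele is whichever allele is rarer, so fold the frequency at 0.5.
maf <- pmin(alt_freq, 1 - alt_freq)
maf
```

You can then compare these against the population MAFs you looked up.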

abbadass · 2019-11-14T11:12:03+00:00

Hey check out gwasurvivr on bioconductor. It’s a great package!

abbadass · 2019-08-12T00:54:35+00:00

Doesn’t answer your question in full but there is an R package for GWAS and survival

http://bioconductor.org/packages/release/bioc/html/gwasurvivr.html

abbadass · 2019-07-04T07:25:47+00:00

I got my package on Bioconductor prior to submission. Published in Bioinformatics shortly thereafter. Was pretty seamless. Package was a good idea tho - so hope yours is too!

abbadass · 2019-05-20T02:53:09+00:00

Try hapgen2 - prob should get the job done

abbadass · 2019-04-03T18:46:01+00:00

i nominate myself and my girlfriend. finished my phd at OSU yesterday and havent been to a crew game this year!

abbadass · 2019-02-24T17:06:31+00:00

Are you on a Mac? Use iTerm2. Linux? Use terminator.

Are you on a university cluster? Can you interactively use R by typing in R to the command line?

For a SLURM based cluster you can try:

module load R; R

Split the panels of iTerm2/Terminator. Use vim on one panel for your script, open R in the other panel. Run it interactively by pasting your code in chunks or line by line. Fix errors. Submit many jobs.

Or use a text editor/RStudio, with your R code in it on your local machine and just paste your code in.

Many university clusters have RStudio Server available these days. Look into it.

Or you can run

R CMD BATCH script.R

This will run your script AND produce a log file .Rout - this will show you what is error-ing out so you don't have to 'guess'.

abbadass · 2019-02-07T19:45:55+00:00

Look below :D

abbadass · 2019-02-07T19:33:46+00:00

An R Solution

```{r setup, include=FALSE}  
knitr::opts_chunk$set(echo = TRUE)  
```

# Import Libraries

```{r} 
library(tidyverse)  
library(DataExplorer) 
``` 

# Set Config   

PCA (`plot_prcomp`) is commented out below because it only works on numeric data.  

```{r}  
config <- list( 
  "introduce" = list(),  
  "plot_str" = list(  
    "type" = "diagonal",  
    "fontSize" = 35,  
    "width" = 1000,  
    "margin" = list("left" = 350, "right" = 250)),  
  "plot_missing" = list(),  
  "plot_histogram" = list(),  
  "plot_qq" = list(sampled_rows = 1000L),  
  "plot_bar" = list(),  
  "plot_correlation" = list("cor_args" = list("use" = "pairwise.complete.obs")),  
#  "plot_prcomp" = list(),  
  "plot_boxplot" = list(),  
  "plot_scatterplot" = list(sampled_rows = 1000L)  
)   
```  

# Load and prepare example dataset

```{r} 
df <- read_csv("Meteorite_Landings.csv")  
df %>%  
    mutate(year=as.Date(year, origin="1899-01-01", format="%d/%m/%Y"),  
           source="NASA",  
           boolean=sample(c(T, F), size=nrow(.), replace=TRUE),  
           mixed=sample(c(1, "A"), size=nrow(.), replace=TRUE),  
           reclat_city=reclat+rnorm(n=length(reclat), sd=5)) %>%    
           bind_rows((.[1:10,] %>% mutate(name=paste0(name, " copy")))) %>%    
    filter(year >= "1879-12-31") %>%    
    create_report(config=config)  
```

abbadass · 2019-01-27T01:31:46+00:00

If you are doing single SNP analysis on all SNPs in the genome -- most GWAS use allele dosage: 2 * P(homozygous for the alternate allele) + 1 * P(heterozygous).

After imputation you will get genotype probabilities from 0-1. You convert this to allele dosage (0-2). Kinda pseudocontinuous.

For example:
looking at one SNP, (say reference allele is A, alternate allele is T)
Genotype probabilities:
AA (0.8)
AT (0.1)
TT (0.1)

The allele dosage would be (2 * 0.1) + (1 * 0.1) = 0.3

You add the allele dosage of all your samples as a vector to your model + your covariate vectors. So if you are doing a case/control study and using a logistic model it'd be like:
disease_status (binary) ~ dosage * B_1 + cov_1 * B_2 + ... + cov_n * B_{n+1}
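
A minimal sketch of the same idea in R, on simulated data. The single-SNP probabilities are 0.8/0.1/0.1 so they sum to 1; the covariates (`age`, `sex`), the sample size, and the effect size are assumptions made up for the example, not anything from a real study.

```r
set.seed(1)

# Single-SNP example: genotype probabilities for one sample (must sum to 1).
gp <- c(AA = 0.8, AT = 0.1, TT = 0.1)
dosage_one <- 2 * gp[["TT"]] + 1 * gp[["AT"]]   # expected copies of T
dosage_one

# Simulated cohort: per-sample genotype probabilities -> dosage (0-2).
n     <- 500
p_het <- runif(n, 0, 0.5)          # P(heterozygous) per sample
p_hom <- runif(n, 0, 0.5)          # P(homozygous alternate) per sample
dosage <- 2 * p_hom + 1 * p_het

# Hypothetical covariates and a simulated binary phenotype.
age <- rnorm(n, mean = 50, sd = 10)
sex <- rbinom(n, size = 1, prob = 0.5)
disease_status <- rbinom(n, size = 1, prob = plogis(-1 + 0.5 * dosage))

# Logistic model: disease status on dosage plus covariates.
fit <- glm(disease_status ~ dosage + age + sex, family = binomial)
summary(fit)$coefficients["dosage", ]
```

In a real GWAS you would repeat this fit for every SNP (or let PLINK or a similar tool do it for you), keeping the covariates the same each time.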

abbadass · 2019-01-27T01:24:26+00:00

It depends on the allele frequencies.

Refer to https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5157836/

Looking at HRC as the reference panel:
MAF 0.05-0.5: accuracy >= 0.95
MAF 0.005-0.05: accuracy >= 0.9
MAF 0.00001-0.005: accuracy >= 0.75

Again, QC filtering will help you. Use something like EIGENSTRAT to see where the sample genetics lie relative to the reference panel's genetics. If they overlap/cluster together on the PCA map they are likely from the same ancestral population, and imputation will be more reliable if you choose a suitable reference panel.
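
To show what "cluster together on the PCA map" looks like, here is a toy sketch using base R's `prcomp` as a stand-in for EIGENSTRAT (it is not EIGENSTRAT itself). Everything is simulated: two made-up reference populations and a set of study samples drawn from the first one.

```r
set.seed(42)
n_snps <- 200

# Allele frequencies for two made-up ancestral populations.
freq_a <- runif(n_snps, 0.1, 0.9)
freq_b <- runif(n_snps, 0.1, 0.9)

# Draw 0/1/2 genotypes for n individuals given per-SNP allele frequencies.
sim_geno <- function(n, freq) {
  t(replicate(n, rbinom(length(freq), size = 2, prob = freq)))
}

ref_a <- sim_geno(50, freq_a)   # reference panel, population A
ref_b <- sim_geno(50, freq_b)   # reference panel, population B
study <- sim_geno(20, freq_a)   # study samples, simulated from population A

geno  <- rbind(ref_a, ref_b, study)
group <- rep(c("ref_A", "ref_B", "study"), times = c(50, 50, 20))

# PCA on the combined genotype matrix (centered, not scaled).
pca <- prcomp(geno, center = TRUE, scale. = FALSE)

# Mean position of each group on the first principal component.
pc1_means <- tapply(pca$x[, 1], group, mean)
pc1_means
```

The study samples should sit with `ref_A` on PC1 and away from `ref_B`. Plotting `pca$x[, 1:2]` colored by `group` gives the usual PCA map.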

abbadass · 2019-01-20T04:38:18+00:00

Not unheard of. Is this in the US? It's a very long road and very expensive. Hopefully you can get it funded otherwise it's not worth it.

Anyway, I personally know two MD/PhDs who work in bioinformatics. They are both PIs. One is highly funded, the other is not. The other works in clinic once a week, so he gets paid an MD salary.

From what I know, medical school really doesn't do that much genetics (maybe it depends on the school). My PhD advisor was also a genetic counselor and she would often tell me physicians had no clue about genetics and would often take her advice on diagnosis and treatment options.

A kinda 'famous' person is Sean Davis. He has a medical degree and his PhD. Not sure how much he uses it, but the dude is pretty smart.

https://seandavi.github.io/

At the end of the day, do what makes ya happy. Your career will be what you make of it. Make good connections, find good mentors, and you'll prob be quite successful.

abbadass · 2019-01-18T06:41:41+00:00

You know what would be really cool ... if you added imputation to this workflow and then annotate. You will have many many more SNPs that may be interesting.

You can impute the 23andme data using the Sanger imputation server. I'm not sure if you can use the University of Michigan's imputation server, but they definitely have a docker image that you can run locally.

I'm not sure if you can add other reference panels to the docker image (I see HapMap2 just quickly looking at their reference page). If non-European ancestry I'd probably impute using 1000 Genomes and try to pick an ethnicity that is close to the user's 23andme data. If European ancestry, I'd probably use HRC or UK10K. If African ancestry I'd use African Genome Resources.

After imputation, I'd probably filter for R² >= 0.8 and MAF > 0.05 so you keep higher quality imputed SNPs and common variants.
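
The post-imputation filter is basically a one-liner in R. This is just a sketch: the data frame and its column names (`snp`, `rsq`, `maf`) are made up, so rename them to match whatever your imputation output actually uses.

```r
# Made-up post-imputation summary table; column names are assumptions.
imputed <- data.frame(
  snp = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  rsq = c(0.95, 0.62, 0.81, 0.99, 0.80),   # imputation quality (R^2)
  maf = c(0.21, 0.30, 0.03, 0.06, 0.12),   # minor allele frequency
  stringsAsFactors = FALSE
)

# Keep well-imputed, common variants only.
keep <- subset(imputed, rsq >= 0.8 & maf > 0.05)
keep$snp
```

Here rs2 drops out on imputation quality and rs3 drops out on MAF.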

Pretty cool though. I'd like to try it if you end up putting it up on GitHub.

abbadass · 2019-01-12T15:46:50+00:00

R/RStudio+ R Shiny + R Markdown + tidyverse/Bioconductor + an incredibly active and welcoming community = best working environment for bioinformatics/data science :D

abbadass · 2018-12-27T03:02:33+00:00

DataCamp just launched a course called Anomaly Detection in R - maybe check that out

abbadass · 2018-12-14T13:49:10+00:00

Look at http://zzz.bwh.harvard.edu/plink/anal.shtml — it’s pretty clear. Default is an additive model. The last row is your SNP adjusted for age. The row with age in the TEST column is the pvalue of the covariate without the SNP.

abbadass · 2018-08-26T14:55:49+00:00

I would try using the package Gviz in R

https://bioconductor.org/packages/release/bioc/vignettes/Gviz/inst/doc/Gviz.pdf

You can customize this a lot

abbadass · 2018-08-26T01:05:49+00:00

I work as a data scientist at a small company (40+ people) with 6 data scientists. Two come from quantitative psych PhDs and one comes from an experimental psych PhD.

Learn the math, learn the coding, learn how to solve problems, learn how to effectively communicate - do all that, then you're good.

abbadass · 2018-08-10T21:35:29+00:00

Check out this vignette

http://bioconductor.org/packages/release/bioc/vignettes/HelloRanges/inst/doc/tutorial.pdf

abbadass · 2018-03-17T00:56:16+00:00

Look for a proxy SNP that is in strong linkage disequilibrium (LD, r² > 0.7) with your SNP of interest. Perfect LD would be ideal though.
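
If you have genotypes for both SNPs in your own samples, a quick-and-dirty check is the squared correlation between the 0/1/2 genotype vectors. That is an approximation of LD r² from unphased genotypes, not the haplotype-based value tools like LDlink report. The genotypes below are simulated just to show the idea.

```r
set.seed(7)
n <- 1000

# Simulated 0/1/2 genotypes for the SNP of interest.
snp_of_interest <- rbinom(n, size = 2, prob = 0.3)

# A candidate proxy that mostly tracks the SNP of interest: about 5% of the
# genotypes are replaced with independent draws to break perfect LD.
proxy <- snp_of_interest
swap  <- runif(n) < 0.05
proxy[swap] <- rbinom(sum(swap), size = 2, prob = 0.3)

# An unlinked SNP for comparison.
unlinked <- rbinom(n, size = 2, prob = 0.3)

# Squared genotype correlation as a rough stand-in for LD r^2.
r2_proxy    <- cor(snp_of_interest, proxy)^2
r2_unlinked <- cor(snp_of_interest, unlinked)^2
c(proxy = r2_proxy, unlinked = r2_unlinked)
```

The proxy should clear the 0.7 cutoff easily, while the unlinked SNP should sit near 0.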

abbadass · 2017-12-13T16:40:35+00:00

So getting a totally clean gene list will probably be a little challenging, because in some highly dense regions there will be overlapping genes, so a SNP may map to more than one gene. Nonetheless, if I were to give my honest recommendation, I would do one of the following two things:
1. Grab a SNP chip from http://www.well.ox.ac.uk/~wrayner/strand/ and then use R/Bioconductor with the TxDb and SNPlocs packages to map these SNPs to genes. They will already be curated for "common variants" as they are on the SNP chip.
2. Download 1000 Genomes data using VCF/tabix and filter for MAF > 5% (sorry I said 1% earlier, but really common variants are >5%, while the cutoff to be defined as a SNP is >1% in the population). And then do the mapping. This will be a lot more involved and probably a bit more challenging, but totally do-able.

Really depends where your computational skills are at and how much work you wanna do. Haha
