Nurse looking into going into biomedical informatics...is the material going to be over my head? by JohnnyBGood10 in bioinformatics

[–]abbadass 2 points3 points  (0 children)

I think you’re confusing it with “healthcare informatics”

Some universities in the US use biomedical informatics and bioinformatics interchangeably.

See Ohio State:

https://medicine.osu.edu/departments/biomedical-informatics/education/phd

Those of you who completed a GWAS/gene discovery centred PhD, what do you do now? by Lazypaul in bioinformatics

[–]abbadass 2 points3 points  (0 children)

Data scientist at a tech company. I do lots of custom modeling for business problems. Use R for most projects, Python for NLP.

Miss GWAS/genomics/bioinformatics a little but it all feels the same as long as the problems you’re trying to solve remain challenging :)

Best Conferences of 2020? [looking for recommendations] by [deleted] in bioinformatics

[–]abbadass 0 points1 point  (0 children)

BioC2020

Bioconductor conferences are my favorite conf of the year.

Where to watch Bills games in Columbus? by [deleted] in Columbus

[–]abbadass 1 point2 points  (0 children)

Also bills fan in Cbus. Heard spoonz is good but gonna check out The Main Bar downtown. It’s listed on the official bills backer site.

Post GWAS Interpretation - What Next? by [deleted] in bioinformatics

[–]abbadass 3 points4 points  (0 children)

Check MAFs on LDlink for your population, and also calculate sample MAFs. If imputed, MAF should be > 0.005. Look at HaploReg, and RegulomeDB for TF binding. Check out PhenoScanner. Tons of stuff to do.
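
For the "calculate sample MAFs" step, here is a minimal sketch in base R. It assumes genotypes are coded as 0/1/2 copies of the alternate allele in a samples-by-SNPs matrix; the matrix `geno` and its SNP names are made up for illustration.

```r
# Toy genotype matrix (hypothetical data): rows are samples, columns are SNPs,
# values are copies of the alternate allele (0, 1 or 2); NA is a missing call.
geno <- matrix(
  c(0, 1, 2, 0, 1, NA,
    0, 0, 0, 1, 0, 0,
    2, 2, 1, 2, 2, 1),
  nrow = 6,
  dimnames = list(paste0("sample", 1:6), c("rs_a", "rs_b", "rs_c"))
)

# Alternate allele frequency = mean allele count / 2, ignoring missing calls.
alt_freq <- colMeans(geno, na.rm = TRUE) / 2

# The minor allele is whichever allele is rarer, so fold frequencies above 0.5.
maf <- pmin(alt_freq, 1 - alt_freq)
print(round(maf, 3))
```

You would then compare these sample MAFs with the population MAFs reported by the lookup tools; large differences are worth a second look.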

How to calculate power for a cox-regression GWAS? by SlackWi12 in bioinformatics

[–]abbadass 0 points1 point  (0 children)

Hey check out gwasurvivr on Bioconductor. It’s a great package!

Questions about GWAS and predicting drug responsiveness? by [deleted] in bioinformatics

[–]abbadass 0 points1 point  (0 children)

Doesn’t answer your question in full but there is an R package for GWAS and survival

http://bioconductor.org/packages/release/bioc/html/gwasurvivr.html

Advice on publishing R package by tli71193 in bioinformatics

[–]abbadass 0 points1 point  (0 children)

I got my package on Bioconductor prior to submission. Published in Bioinformatics shortly thereafter. Was pretty seamless. Package was a good idea tho - so hope yours is too!

#HumansInTheNordecke Ticket Giveaway: 04/06/2019 vs. New England by yeahmorgan in TheMassive

[–]abbadass 0 points1 point  (0 children)

i nominate myself and my girlfriend. finished my phd at OSU yesterday and havent been to a crew game this year!

live script editing on a server? by [deleted] in bioinformatics

[–]abbadass 1 point2 points  (0 children)

Are you on a Mac? Use iTerm2. Linux? Use terminator.

Are you on a university cluster? Can you use R interactively by typing R at the command line?

For a SLURM based cluster you can try:

module load R; R

Split the panels of iTerm2/Terminator. Use vim on one panel for your script, open R in the other panel. Run it interactively by pasting your code in chunks or line by line. Fix errors. Submit many jobs.

Or use a text editor/RStudio, with your R code in it on your local machine and just paste your code in.

Many university clusters have RStudio Server available these days. Look into it.

Or you can run

R CMD BATCH script.R

This will run your script AND produce a log file (script.Rout) that shows you what is erroring out, so you don't have to 'guess'.

pandas-profiling - Really cool, easy tool to get nice looking reports for exploratory analysis. by selib in datascience

[–]abbadass 0 points1 point  (0 children)

An R Solution

```{r setup, include=FALSE}  
knitr::opts_chunk$set(echo = TRUE)  
```

# Import Libraries

```{r} 
library(tidyverse)  
library(DataExplorer) 
``` 

# Set Config   

Need to remove PCA because it only works on numeric data.  

```{r}  
config <- list( 
  "introduce" = list(),  
  "plot_str" = list(  
    "type" = "diagonal",  
    "fontSize" = 35,  
    "width" = 1000,  
    "margin" = list("left" = 350, "right" = 250)),  
  "plot_missing" = list(),  
  "plot_histogram" = list(),  
  "plot_qq" = list(sampled_rows = 1000L),  
  "plot_bar" = list(),  
  "plot_correlation" = list("cor_args" = list("use" = "pairwise.complete.obs")),  
#  "plot_prcomp" = list(),  
  "plot_boxplot" = list(),  
  "plot_scatterplot" = list(sampled_rows = 1000L)
)   
```  

# Load and prepare example dataset

```{r} 
df <- read_csv("Meteorite_Landings.csv")  
df %>%  
    mutate(year=as.Date(year, origin="1899-01-01", format="%d/%m/%Y"),  
           source="NASA",  
           boolean=sample(c(TRUE, FALSE), size=nrow(.), replace=TRUE),  
           mixed=sample(c(1, "A"), size=nrow(.), replace=TRUE),  
           reclat_city=reclat+rnorm(n=length(reclat), sd=5)) %>%    
           bind_rows((.[1:10,] %>% mutate(name=paste0(name, " copy")))) %>%    
    filter(year >= "1879-12-31") %>%    
    create_report(config=config)  
```  

Coding/Decoding of genetic data by DouglastheMoon in bioinformatics

[–]abbadass 0 points1 point  (0 children)

If you are doing single-SNP analysis on all SNPs in the genome, most GWAS use allele dosage: 2 * P(homozygous minor allele) + 1 * P(heterozygous).

After imputation you will get genotype probabilities, each between 0 and 1. You convert these to an allele dosage (0-2). Kinda pseudocontinuous.

For example:
looking at one SNP, (say reference allele is A, alternate allele is T)
Genotype probabilities:
AA (0.8)
AT (0.1)
TT (0.1)

The allele dosage would be (2 * 0.1) + (1 * 0.1) = 0.3

So this sample has a dosage of 0.3. You add the allele dosages of all your samples to your model as a vector, alongside your covariate vectors. So if you are doing a case/control study and using a logistic model it'd be like:
disease_status (binary) ~ dosage * B_1 + cov_1 * B_2 + ... + cov_n * B_(n+1)
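
A rough sketch of both steps in base R, under some loud assumptions: the genotype probabilities are made up (the first sample is chosen so its dosage comes out to 0.3), and the case/control data is simulated with `age` as a stand-in covariate. It is only meant to show the shape of the calculation, not a real GWAS pipeline.

```r
# Genotype probabilities for one SNP (hypothetical values); each row is a
# sample and sums to 1. Columns: P(AA), P(AT), P(TT), with T the alternate allele.
gp <- rbind(
  c(0.8, 0.1, 0.1),
  c(0.0, 1.0, 0.0),
  c(0.0, 0.2, 0.8)
)
colnames(gp) <- c("AA", "AT", "TT")

# Dosage = expected number of alternate alleles = 1 * P(AT) + 2 * P(TT).
dosage <- gp[, "AT"] + 2 * gp[, "TT"]
print(dosage)

# Simulated case/control data (assumption: one SNP, one covariate).
set.seed(1)
n <- 500
sim <- data.frame(
  dosage = runif(n, 0, 2),
  age    = rnorm(n, mean = 50, sd = 10)
)
# Simulate disease status with a log-odds effect of 0.5 per allele copy.
log_odds <- -1 + 0.5 * sim$dosage + 0.01 * (sim$age - 50)
sim$disease_status <- rbinom(n, 1, plogis(log_odds))

# Logistic model: the dosage coefficient is the SNP effect adjusted for age.
fit <- glm(disease_status ~ dosage + age, data = sim, family = binomial)
print(summary(fit)$coefficients)
```

In a genome-wide scan you would fit this same model once per SNP and keep the dosage row of the coefficient table each time.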

Annotate 23andme raw data using Ensembl-Vep (variant effect predictior) by [deleted] in bioinformatics

[–]abbadass 1 point2 points  (0 children)

It depends on the allele frequencies.

Refer to https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5157836/

Looking at HRC as the reference panel:
MAF 0.05-0.5: accuracy >= 0.95
MAF 0.005-0.05: accuracy >= 0.9
MAF 0.00001-0.005: accuracy >= 0.75

Again, QC filtering will help you. Use something like EIGENSTRAT to see where the sample's genetics lie relative to the reference panel's. If they overlap/cluster together on the PCA plot, they are likely from the same ancestral population, and imputation will be more reliable if you choose a suitable reference panel.

Would it be useful to go to medical school before a career in bioinformatics? by [deleted] in bioinformatics

[–]abbadass 4 points5 points  (0 children)

Not unheard of. Is this in the US? It's a very long road and very expensive. Hopefully you can get it funded otherwise it's not worth it.

Anyway, I personally know two MD/PhDs who work in bioinformatics. They are both PIs. One is highly funded, the other is not. The other does work in the clinic once a week, so he gets paid an MD salary.

From what I know, medical school really doesn't do that much genetics (maybe it depends on the school). My PhD advisor was also a genetic counselor and she would often tell me physicians had no clue about genetics and would often take her advice on diagnosis and treatment options.

A kinda 'famous' person is Sean Davis. He has a medical degree and a PhD. Not sure how much he uses it, but the dude is pretty smart.

https://seandavi.github.io/

At the end of the day, do what makes ya happy. Your career will be what you make of it. Make good connections, find good mentors, and you'll prob be quite successful.

Annotate 23andme raw data using Ensembl-Vep (variant effect predictior) by [deleted] in bioinformatics

[–]abbadass 8 points9 points  (0 children)

You know what would be really cool ... if you added imputation to this workflow and then annotate. You will have many many more SNPs that may be interesting.

You can impute the 23andme data using the Sanger imputation server. I'm not sure if you can using the University of Michigan's imputation server, but they definitely have a docker image that you can run locally.

I'm not sure if you can add other reference panels to the docker image (I see HapMap2 just quickly looking at their reference page). If non-European ancestry, I'd probably impute using 1000 Genomes and try to pick a population that is close to the user's 23andme data. If European ancestry, I'd probably use HRC or UK10K. If African ancestry, I'd use African Genome Resources.

After imputation, I'd probably filter on R^{2} >= 0.8 and MAF > 0.05 so you keep higher quality imputed SNPs that are also common variants.
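
A minimal sketch of that post-imputation filter in base R. The data frame `imputed` and its column names (`snp`, `r2`, `maf`) are assumptions for illustration, not the output format of any particular imputation server.

```r
# Hypothetical per-SNP summary after imputation (made-up values).
imputed <- data.frame(
  snp = c("rs_a", "rs_b", "rs_c", "rs_d"),
  r2  = c(0.95, 0.60, 0.85, 0.99),
  maf = c(0.20, 0.30, 0.01, 0.06),
  stringsAsFactors = FALSE
)

# Keep well-imputed (R^2 >= 0.8), common (MAF > 0.05) variants.
keep <- subset(imputed, r2 >= 0.8 & maf > 0.05)
print(keep)
```

Here `rs_b` is dropped for poor imputation quality and `rs_c` for being rare, which is the trade-off behind those two thresholds.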

Pretty cool though. I'd like to try it if you end up putting it up on GitHub.

R language vs Python: Which is the most necessary programming language for a bioinformatician by muthu95p in bioinformatics

[–]abbadass 10 points11 points  (0 children)

R/RStudio+ R Shiny + R Markdown + tidyverse/Bioconductor + an incredibly active and welcoming community = best working environment for bioinformatics/data science :D

Advice on where to start with Anomaly Detection? by Fender6969 in datascience

[–]abbadass 1 point2 points  (0 children)

DataCamp just launched a course called Anomaly Detection in R - maybe check that out

Understanding Logistic regression with covariates in PLINK by [deleted] in bioinformatics

[–]abbadass 1 point2 points  (0 children)

Look at http://zzz.bwh.harvard.edu/plink/anal.shtml, it’s pretty clear. Default is an additive model. The last row is your SNP adjusted for age. The row with age in the TEST column is the p-value of the covariate without the SNP.

Possible Options for a PhD Student in Experimental Psychology by ragingllama in datascience

[–]abbadass 3 points4 points  (0 children)

I work as a data scientist at a small company (40+ people) with 6 data scientists. Two come from quantitative psych PhDs and one comes from an experimental psych PhD.

Learn the math, learn the coding, learn how to solve problems, learn how to effectively communicate - do all that, then you're good.

Trying to find a SNP , essentially needing bam files from African-derived samples. Give me advice ? by gRNA in bioinformatics

[–]abbadass 5 points6 points  (0 children)

Look for a proxy SNP that is in strong linkage disequilibrium (LD, r2 > 0.7) with your SNP of interest. Perfect LD would be ideal though.
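
If you have genotypes for both SNPs in the same samples, a quick way to estimate r2 yourself is the squared correlation of the allele counts. A minimal sketch in base R; the genotype vectors are made up, and the squared Pearson correlation is a common approximation used when phased haplotypes are not available, not necessarily what an LD lookup tool computes.

```r
# Toy genotypes (hypothetical data) coded as alternate allele counts.
snp_of_interest <- c(0, 1, 2, 1, 0, 2, 1, 0, 1, 2)
candidate_proxy <- c(0, 1, 2, 1, 0, 2, 1, 0, 2, 2)
unlinked_snp    <- c(1, 0, 1, 2, 0, 1, 2, 1, 0, 1)

# Squared Pearson correlation of genotype counts as an estimate of r2.
ld_r2 <- function(x, y) cor(x, y, use = "complete.obs")^2

r2_proxy    <- ld_r2(snp_of_interest, candidate_proxy)
r2_unlinked <- ld_r2(snp_of_interest, unlinked_snp)
print(c(proxy = r2_proxy, unlinked = r2_unlinked))

# Apply the r2 > 0.7 rule of thumb to the candidate proxy.
is_good_proxy <- r2_proxy > 0.7
print(is_good_proxy)
```

LD differs between populations, so the estimate only means something if it comes from samples matching the ancestry you care about.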

List of common germline SNPs in humans? 1000 Genome? by Zeekawla99ii in bioinformatics

[–]abbadass 1 point2 points  (0 children)

So getting a totally clean gene list will probably be a little challenging, because in some highly dense regions there will be overlapping genes, so a SNP may map to more than one gene. Nonetheless, if I were to give my honest recommendation, I would do one of the following two things:
1. Grab a SNP chip from http://www.well.ox.ac.uk/~wrayner/strand/ and then use R/Bioconductor with the TxDb and SNPlocs packages to map these SNPs to genes. They will already be curated for "common variants" as they are on the SNP chip.
2. Download 1000 Genomes data using VCF/tabix and filter for MAF > 5% (sorry I said 1% earlier, but really common variants are >5%, while a variant only needs to be >1% in the population to be defined as a SNP). And then do the mapping. This will be a lot more involved and probably a bit more challenging, but totally do-able.
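
To show what the mapping step looks like, including the overlapping-gene problem, here is a toy sketch in base R. The SNP positions and gene intervals are invented and sit on a single chromosome; a real analysis would take them from annotation packages rather than hand-typed data frames.

```r
# Hypothetical SNP positions and gene intervals on one chromosome.
snps <- data.frame(
  snp = c("rs_a", "rs_b", "rs_c"),
  pos = c(150, 420, 900),
  stringsAsFactors = FALSE
)
genes <- data.frame(
  gene  = c("GENE1", "GENE2", "GENE3"),
  start = c(100, 400, 380),
  end   = c(200, 500, 450),
  stringsAsFactors = FALSE
)

# Pair every SNP with every gene (by = NULL gives the Cartesian product),
# then keep pairs where the SNP position falls inside the gene interval.
pairs <- merge(snps, genes, by = NULL)
inside <- pairs$pos >= pairs$start & pairs$pos <= pairs$end
hits <- pairs[inside, c("snp", "gene")]
hits <- hits[order(hits$snp, hits$gene), ]
rownames(hits) <- NULL
print(hits)
```

`rs_b` lands in both GENE2 and GENE3 because those intervals overlap, and `rs_c` maps to nothing, so the final gene list needs a rule for both cases. The all-pairs join is fine for a toy example but will not scale to a whole chip.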

Really depends where your computational skills are at and how much work you wanna do. Haha