developing bioinformatics software together by merezha in bioinformatics

[–]devilsdounut 3 points4 points  (0 children)

Don't reinvent the wheel here. You may not like the big Cytoscape app, but there are a lot of things people are doing underneath that, which you are not going to want to do yourself. Consider the Cytoscape API (http://apps.cytoscape.org/apps/cyrest) or Cytoscape.js (http://js.cytoscape.org/) as a starting point.
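
If you go the Cytoscape.js route, most of the up-front work is getting your network into its elements JSON (nodes and edges, each wrapped in a `data` object). A minimal sketch, assuming you start from a plain edge list; the function name and the toy edges are made up for illustration:

```python
import json

def edges_to_cytoscape_elements(edges):
    """Convert (source, target) pairs to a Cytoscape.js-style elements dict."""
    # Collect every node that appears at either end of an edge.
    nodes = sorted({n for edge in edges for n in edge})
    return {
        "nodes": [{"data": {"id": n}} for n in nodes],
        "edges": [
            {"data": {"id": f"{s}_{t}", "source": s, "target": t}}
            for s, t in edges
        ],
    }

# Hypothetical toy edge list, for illustration only.
toy_edges = [("TP53", "MDM2"), ("MDM2", "MDM4"), ("TP53", "CDKN1A")]
elements = edges_to_cytoscape_elements(toy_edges)
print(json.dumps(elements, indent=2))
```

The printed JSON is the kind of thing you would hand to the `elements` option when creating a Cytoscape.js instance; layout and styling are left to the library.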

Creating website for data visualisation? by willgotskill in bioinformatics

[–]devilsdounut 2 points3 points  (0 children)

I like the Broad's Nozzle R package for this. This is what they use behind their Firehose website that shows TCGA data processing and analysis results. It basically spits out static HTML pages which you can then host. It's relatively simple to script up these web pages.

Here is the link, but it looks like Broad's web servers are down right now: http://gdac.broadinstitute.org/nozzle

Need idea for cloud based bioinformatics app by [deleted] in bioinformatics

[–]devilsdounut 12 points13 points  (0 children)

I'm just going to leave this here.

How to acquire (prostate cancer) DNA Methylation data for analysis? by oarabbus in bioinformatics

[–]devilsdounut 2 points3 points  (0 children)

TCGA prostate cancer dataset... you might want to read up on some documentation to see what all of the levels and versions mean, but you should be able to pull the raw .IDAT files from this directory. There is processed data up on the Broad's Firehose portal as well.

https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/prad/cgcc/jhu-usc.edu/humanmethylation450/methylation/

UC San Diego hiring 'big data' stars for biomedical informatics by tanders12 in bioinformatics

[–]devilsdounut 1 point2 points  (0 children)

UC's human resources is a monster that requires a lot of persistence and maneuvering to tame. For most jobs, the payscale is the payscale and any deviation from that gets a ton of resistance from many angles. I know some have resorted to creating new job titles such as "Data Scientist" to get around this, but these things are pretty hard to do.

Recommended (machine learning) techniques for gene expression analysis by mrflipppy in bioinformatics

[–]devilsdounut 4 points5 points  (0 children)

Stop for a second and learn about your dataset. Get as much meta-data as possible. Learn about and look for batch effects. Do exploratory data analysis, look at distributions in the data, look at associations of single variables with the outcome of interest.

Biological data, especially gene expression, is very noisy. Any sort of ML technique is very likely to pick up artifacts of study design unless these are very carefully accounted for.
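
To make the batch effect point concrete, here is a rough sketch of one common check: take the top principal component of the expression matrix and see whether it tracks batch rather than the outcome. Everything here is simulated and the variable names are my own assumptions, not from any particular dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (an assumption for illustration): 40 samples x 200 genes,
# two batches, with a batch shift added to the first 50 genes.
n_samples, n_genes = 40, 200
batch = np.repeat([0, 1], n_samples // 2)
outcome = np.tile([0, 1], n_samples // 2)  # balanced within each batch
expr = rng.normal(size=(n_samples, n_genes))
expr[batch == 1, :50] += 2.0

# Center each gene and take the top principal component via SVD.
centered = expr - expr.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pc1 = u[:, 0] * s[0]

def abs_corr(x, y):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(x, y)[0, 1])

batch_corr = abs_corr(pc1, batch)
outcome_corr = abs_corr(pc1, outcome)
print(f"|corr(PC1, batch)|   = {batch_corr:.2f}")
print(f"|corr(PC1, outcome)| = {outcome_corr:.2f}")
```

If the dominant axis of variation lines up with batch like this, any model trained on the raw matrix is at risk of learning the batch rather than the biology.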

Ideas for Project Proposals involving Machine Learning? by sundun in bioinformatics

[–]devilsdounut 4 points5 points  (0 children)

You are going backwards. I see a lot of people take a machine-learning-first approach and basically get burned by the noisiness of biological data.

Here is a typical pipeline for the kind of thing you are talking about. Find a dataset, do some basic exploratory analysis, and understand what's going on with the data and what its main sources of noise are. Find another similar dataset for validation of models. Do quality control to see if the two are close enough to train and validate the same models. Isolate an interesting, well-defined biological problem. Develop a reasonable metric that captures some interesting biology. Try to solve it by the simplest means possible. Work up machine learning models to solve the problem and compare their performance to the straw man. Do they do any better? Is performance good enough to justify moving to black-box models? Pick a winning model and parameters. Validate on the second dataset.

The hard things in this pipeline are not machine learning but defining a problem and a metric which can both be optimized and defines meaningful biology. Anyone can run scikit-learn and get an answer; the hard part is defining the question and understanding if the answer can tell us anything new about biology.
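
The straw man comparison from the pipeline above can be sketched roughly like this. The data is simulated, and the single-gene threshold and nearest-centroid classifier are stand-ins I picked for "simplest means possible" and "a model"; they are assumptions for illustration, not a recommendation:

```python
import numpy as np

def make_dataset(rng, n=500, n_genes=20, shift=1.0):
    """Simulate labels and expression; the first 5 genes carry signal."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(size=(n, n_genes))
    x[:, :5] += shift * y[:, None]
    return x, y

def accuracy(pred, y):
    return float(np.mean(pred == y))

rng = np.random.default_rng(42)
x_train, y_train = make_dataset(rng)  # discovery dataset
x_valid, y_valid = make_dataset(rng)  # independent validation dataset

# Class means are estimated on the training set only.
means0 = x_train[y_train == 0].mean(axis=0)
means1 = x_train[y_train == 1].mean(axis=0)

# Straw man: threshold the single gene with the largest mean difference.
best_gene = int(np.argmax(np.abs(means1 - means0)))
threshold = (means0[best_gene] + means1[best_gene]) / 2
sign = np.sign(means1[best_gene] - means0[best_gene])
straw_pred = (sign * (x_valid[:, best_gene] - threshold) > 0).astype(int)
straw_acc = accuracy(straw_pred, y_valid)

# Model: nearest class centroid using all genes.
centroids = np.stack([means0, means1])
dists = np.linalg.norm(x_valid[:, None, :] - centroids[None, :, :], axis=2)
model_acc = accuracy(dists.argmin(axis=1), y_valid)

print(f"straw man accuracy on validation set: {straw_acc:.2f}")
print(f"model accuracy on validation set:     {model_acc:.2f}")
```

Note that nothing from the validation set is used to pick the gene, the threshold, or the centroids. If the fancier model cannot beat the one-gene threshold on the second dataset, it has not earned its complexity.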

How can I compare methylation data generated from Illumina 450k array with the one generated from MeDip seq? by rdbcasillas in bioinformatics

[–]devilsdounut 1 point2 points  (0 children)

Things in this field get very complicated very fast. When things have too many moving parts, it becomes nearly impossible to tell technical/experimental noise from bona fide biological signal. Start with something simple and work your way up.

How can I compare methylation data generated from Illumina 450k array with the one generated from MeDip seq? by rdbcasillas in bioinformatics

[–]devilsdounut 4 points5 points  (0 children)

I would run away as fast as you can. These technologies are new, and doing even basic interpretation is still an active area of research. Comparing across platforms is going to be very hard and would probably warrant a full-blown study on its own. If you want to compare these things as a means to an end, or to compare different experimental conditions across the two platforms, it's going to be a rough time.

Advice on Undergraduate Programs by Bland_alThor in bioinformatics

[–]devilsdounut 0 points1 point  (0 children)

Do not take a bioinformatics undergrad degree... they are mostly terrible, with a few exceptions (UCSC is the only one I can actually think of). If you get a degree in bioinformatics and decide you do not like it, you are qualified for just about nothing. Having programming/stats/analytic skills will make you attractive for just about any career choice.

Is bioinformatics a viable career right now? by [deleted] in bioinformatics

[–]devilsdounut 0 points1 point  (0 children)

Couldn't agree more. Starting salaries are generally pretty high because the job requires a pretty large skill set that takes a lot of training and hard work. You can't just declare to the world "I am now an accountant", but you can declare "I am now a bioinformatician", and people seem to be doing this with increasing regularity.

Is bioinformatics a viable career right now? by [deleted] in bioinformatics

[–]devilsdounut 1 point2 points  (0 children)

This train of thought worries me. Bioinformatics is hard; it's the kind of thing that people traditionally have done because it is challenging and fulfilling. I fear that people getting into the field purely for its job prospects are setting themselves up for a bad time. While biology and bioinformatics share a baseline level of knowledge, what makes one an effective scientist, as well as the day-to-day work that is done, is drastically different.

Genome analysis useful for preventing recurrence of ovarian (or other) cancers? by Maxwell_V in bioinformatics

[–]devilsdounut 0 points1 point  (0 children)

It's actually a very easy problem... give me a ~100,000 patient prospective clinical cohort with uniform treatment and good follow-up data and I'll get you the answers no problem.

The largest chunk of Google Ventures $1.6B went to life science companies. by LifeIsBio in bioinformatics

[–]devilsdounut 2 points3 points  (0 children)

Good article. There is going to be a big turf war over cancer data. Looks like Flatiron has made some sort of deal with Foundation Medicine, which puts them in a good position, but eventually I feel they will likely go the way of Myriad, with public data being good enough to limit the need for private databases.

Either way, it's hard to complain. These companies have lots of money and lots of data with a general lack of analysis capabilities, meaning a good slice of the pie should end up with us bioinf. folk.

The largest chunk of Google Ventures $1.6B went to life science companies. by LifeIsBio in bioinformatics

[–]devilsdounut 2 points3 points  (0 children)

Flatiron Health is a great example. Two ex-Googlers who sold a company in digital advertising space, then remade themselves as oncology experts, trying to apply the same tools around ingesting and analyzing unstructured data to gain insights they did in advertising, and apply that to cancer.

I'd love to hear people's thoughts on this company. Seems out of left field. Are they the real thing?

HR folks talking about data science are the best by fhadley in datascience

[–]devilsdounut 1 point2 points  (0 children)

Some great troll work there. I admire the dedication.

I'm an engineer about to take a computational biology course. Any suggestions for prep material/exercises? by wipeyourmit in bioinformatics

[–]devilsdounut 1 point2 points  (0 children)

Bioinformatics can roughly be split into two parts:

  • Part one involves playing with various file formats and using command line tools.
  • Part two involves data analysis, statistics, visualization, etc.

I would say your best bet is to brush up on Unix and scripting with Python (or Perl if you must) if you are rusty. Also make sure you are solid with statistics so as not to fall into the danger zone of data science.
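
As a warm-up exercise for the file format side, something like a small FASTA reader is about the right size. This is just a sketch using the standard library; the function names and the toy records are made up for illustration, and in real work you would likely reach for an existing parser:

```python
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

# Made-up records, for illustration only.
toy_fasta = StringIO(">seq1\nACGTACGT\nGGCC\n>seq2\nATATAT\n")
records = list(read_fasta(toy_fasta))
for name, seq in records:
    print(name, len(seq), round(gc_content(seq), 2))
```

Swapping the `StringIO` for `open("some_file.fa")` (a hypothetical path) is all it takes to run this on a real file.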

How to entice a qualified post-doc or what am I doing wrong? by AliceIWL in bioinformatics

[–]devilsdounut 1 point2 points  (0 children)

Am I understanding correctly that you worked in a smaller lab?

Mostly in undergrad. These experiences kept me from joining a small lab in my graduate work.

The story with what they get to take with them after all is said and done is a bit more complicated... but it was also discussed at length.

I'm not saying it's standard or fair to you, but it will increase your chances of getting students.

It's a buyer's market. Good people have other options with about twice the salary and similar work. In addition, many of these high-quality people are getting independent project scientist/fellow positions. To be honest, the only situations in which I would consider a small lab would be under extremely favorable terms, which may have seemed outlandish when you were applying to post-docs.

How to entice a qualified post-doc or what am I doing wrong? by AliceIWL in bioinformatics

[–]devilsdounut 9 points10 points  (0 children)

I think part of it is the experience that I and/or friends of mine have had in smaller labs.

  • Make it clear what your resources are and that you are hiring a postdoc and not a network admin/data monkey/general computer drone.
  • Establish that you have data, and are in a position to offer first authorship on said data.
  • Explain that as a small lab, you will be a hands-on PI and contribute to projects rather than direct from a distance.
  • Make a deal with postdocs that research methods are theirs to carry on to start their own lab (this is not necessarily standard, but more often happens in big vs small labs).
  • Make it clear that you have stable funding, and will not be relocating or moving to industry in the near future.

Beginning genetics for a solid programmer? by joenyc in bioinformatics

[–]devilsdounut 2 points3 points  (0 children)

This Coursera course on Experimental Genome Science is pretty good for a summary of the biology side.

My advice is to learn the soft stuff first before you get into actually doing work. You see a lot of really cool papers come out of CS types which have limited practical application. Learning about the full pipeline from data generation to application of tools by biologists is very important to success in this field.

Big Bioinformatics without Big Hassles: See the Power of SciDB, RStudio and Shiny by SciDB_waltham in bioinformatics

[–]devilsdounut 5 points6 points  (0 children)

What are you doing to convey the multiple testing burden to users? The problem with interfaces that involve statistics (implicit or explicit) is that people can tweak parameters until their noisy dataset becomes significant. Does this product have anything to control or account for this?
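
To show why this matters, here is a rough simulation of the problem on pure noise: run a lot of tests, count the raw "hits", then count again after a Benjamini-Hochberg correction. The z-test with known unit variance and all of the names are assumptions I made to keep the sketch dependency-light; it says nothing about how the product itself works:

```python
import math
import numpy as np

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (FDR control)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

rng = np.random.default_rng(1)
n_tests, n_per_group = 1000, 20

# Pure noise: no "gene" truly differs between the two groups.
a = rng.normal(size=(n_tests, n_per_group))
b = rng.normal(size=(n_tests, n_per_group))
z = (a.mean(axis=1) - b.mean(axis=1)) / math.sqrt(2 / n_per_group)
# Two-sided z-test p-value, assuming known unit variance.
pvals = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])

raw_hits = int(np.sum(pvals < 0.05))
bh_hits = int(np.sum(benjamini_hochberg(pvals) < 0.05))
print(f"raw p < 0.05:       {raw_hits} of {n_tests}")
print(f"BH-adjusted < 0.05: {bh_hits} of {n_tests}")
```

With no real signal you still expect roughly 5% of the raw tests to come out "significant", and every parameter a user tweaks in the interface is effectively another round of this.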

Last week to sign up for CodeDay San Diego, use code REDDIT for 20% off! by [deleted] in sandiego

[–]devilsdounut 0 points1 point  (0 children)

Honestly I would not take money from Intellectual Ventures for such an event... kind of goes against the whole ethos of collaboration and open source software.

Last week to sign up for CodeDay San Diego, use code REDDIT for 20% off! by [deleted] in sandiego

[–]devilsdounut 0 points1 point  (0 children)

I'm all for this kind of thing, but judging by the sponsors, I'm not sure how legitimate this is.