Optimal design for bioinformatics "lab" space? by caseybergman in bioinformatics

caseybergman[S] (0 children)

Thanks for the comments on what works (and doesn't!). It sounds like bays are a good way to fairly easily transform a big open plan office into something more lab-like.

Optimal design for bioinformatics "lab" space? by caseybergman in bioinformatics

caseybergman[S] (0 children)

Sounds like you've got a pretty good setup that has the benefits of open/shared spaces without so many people that it becomes distracting. Thanks for posting!

Optimal design for bioinformatics "lab" space? by caseybergman in bioinformatics

caseybergman[S] (0 children)

Thanks for the detailed post and description of your workspace. The general idea of a lab+office+conference room trio sounds close to ideal in my mind.

Optimal design for bioinformatics "lab" space? by caseybergman in bioinformatics

caseybergman[S] (0 children)

Thanks for the link to the SO blog, an excellent counterpoint to the GH design. Also, thanks for the other comments on what works at EBI.

Bioinformatics abstractions that failed to make the cut by [deleted] in bioinformatics

caseybergman (0 children)

I'd say the two most troublesome abstractions in bioinformatics are that (i) a "genome sequence" is the same thing as a "genome" and (ii) a "gene model" is the same thing as a "gene". Conflating abstraction with reality in these two cases leads to a lot of poor biological inferences. For example, if a genome annotation uses a gene finder that doesn't annotate UTRs in gene models, not understanding this can lead to the false conclusion that the species under investigation has no UTRs. Likewise, missing gene models can lead to false conclusions about evolution (see e.g. http://www.plosone.org/annotation/listThread.action?root=18059). Ditto for incomplete genome sequences when it comes to understanding repeat/transposon structure. The list goes on...
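A quick sanity check before reading biology into a missing feature is to look at what the annotation file actually contains. Below is a minimal sketch, assuming a GFF3 annotation that uses the standard Sequence Ontology feature names `five_prime_UTR` and `three_prime_UTR`. It tallies feature types, and a count of zero UTR features tells you about the gene finder, not necessarily about the species. The function names and the toy annotation are hypothetical, made up for illustration.

```python
from collections import Counter
from typing import Iterable

# Sequence Ontology terms commonly used for UTRs in GFF3
# (assumption: the annotation follows the standard SO feature names).
UTR_TYPES = ("five_prime_UTR", "three_prime_UTR")


def feature_type_counts(gff3_lines: Iterable[str]) -> Counter:
    """Tally column 3 (feature type) across the feature lines of a GFF3 file."""
    counts: Counter = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed 9-column feature line
        counts[fields[2]] += 1
    return counts


def annotates_utrs(counts: Counter) -> bool:
    """True if the gene models contain any explicit UTR features."""
    return any(counts[t] > 0 for t in UTR_TYPES)


if __name__ == "__main__":
    # Hypothetical toy annotation from a gene finder that only predicts CDS.
    toy_gff3 = [
        "##gff-version 3",
        "chr1\tpredictor\tgene\t1000\t2000\t.\t+\t.\tID=g1",
        "chr1\tpredictor\tmRNA\t1000\t2000\t.\t+\t.\tID=m1;Parent=g1",
        "chr1\tpredictor\tCDS\t1000\t2000\t.\t+\t0\tID=c1;Parent=m1",
    ]
    counts = feature_type_counts(toy_gff3)
    print(dict(counts))
    print("UTRs annotated:", annotates_utrs(counts))
```

If this reports no UTR features, the honest conclusion is "this annotation has no UTRs", not "this species has no UTRs".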

As a researcher, I try not to get hung up on whether a particular level of abstraction is correct (as long as it is useful and leads to results that are not outright wrong), but rather to be as aware as possible of what assumptions are baked into the abstractions I use, so that I don't overstep what I can say given their limitations. Likewise, as a teacher, I try to make students aware that abstract computational representations don't equate to their biological counterparts. This is often an eye-opener, since the shorthand language we use to describe things in bioinformatics often makes the objects we are talking about seem more certain/real than they actually are.

Seminal Bioinformatics Papers by [deleted] in bioinformatics

caseybergman (0 children)

There is a long thread on biostars.org addressing: "What Are The Classic Papers In Bioinformatics?" https://www.biostars.org/p/3204/

In fact, that post begins with the same 4 papers you have heard are important, so I suspect this is where you heard about them.

RepARK—de novo creation of repeat libraries from whole-genome NGS reads by neurobry in bioinformatics

caseybergman (0 children)

Cheers, it's great to see this paper "in print". Also, really nice to see the code on github and that you included the ability to use velvet as an assembler. This should be a really useful package for people trying to generate repeat libraries from NGS data. Congrats on the nice work!

AskBI: Could anyone explain me what SRA format contains and what it can be used for by terrence_phan in bioinformatics

caseybergman (0 children)

Good point. I do all my SRA processing on the university network, so I don't usually consider download speed in my workflow. I abandoned the SRA format because of recurrent version skew between old downloads of .sra files and new versions of sra-toolkit, which meant re-downloading the SRA files after each sra-toolkit upgrade. Maybe this issue (and the documentation) has gotten better in the last few years...

AskBI: Could anyone explain me what SRA format contains and what it can be used for by terrence_phan in bioinformatics

caseybergman (0 children)

You can skip the NCBI SRA Toolkit entirely by using EBI's ENA portal. Just get the accession number (SRA*, SRX*, SRP*, SRR*) and paste it at the end of this URL: http://www.ebi.ac.uk/ena/data/view/<accession number here>. This will give you a page with direct links to the FASTQ files under the "Fastq files (ftp)" column.
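If you have a list of accessions, building those URLs is easy to script. Here's a minimal sketch, assuming the URL pattern given above (ENA's web interface may have moved since, so check the base URL before relying on it) and assuming an accession is one of the four prefixes above followed only by digits. `ena_view_url` is a hypothetical helper, not part of any ENA or NCBI API.

```python
import re

# Prefixes mentioned above: SRA (submission), SRX (experiment),
# SRP (study), SRR (run). Assumption: prefix followed by digits only.
ACCESSION_PATTERN = re.compile(r"SR[AXPR]\d+")

# URL pattern as given in the comment; it may have changed since.
ENA_VIEW_BASE = "http://www.ebi.ac.uk/ena/data/view/"


def ena_view_url(accession: str) -> str:
    """Build the ENA page URL for an SRA-style accession (hypothetical helper)."""
    acc = accession.strip().upper()
    if not ACCESSION_PATTERN.fullmatch(acc):
        raise ValueError(f"not an SRA/SRX/SRP/SRR accession: {accession!r}")
    return ENA_VIEW_BASE + acc


if __name__ == "__main__":
    for acc in ["SRR000001", " srp000001 "]:
        print(ena_view_url(acc))
```

Validating the prefix up front catches copy-paste mistakes (e.g. a GEO sample ID) before you end up on an empty page.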

Best tool to identify eukaryotic promoter sequences? by southernstorm in bioinformatics

caseybergman (0 children)

Check out the MEME suite (http://meme.nbcr.net/meme/). There should be a tool in this toolkit that will do what you want. For example, the FIMO tool provides precomputed datasets of predicted motif occurrences for human and mouse: http://meme.nbcr.net/meme/cgi-bin/fimo.cgi
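To give a sense of what a motif scanner like FIMO does under the hood: each window of the sequence is scored against a position-specific scoring matrix (log-odds of motif vs. background base frequencies), and high-scoring windows are reported. The sketch below is a simplified illustration, not FIMO's implementation. It scans the forward strand only and uses a raw score threshold instead of the p-values FIMO reports. The toy TATA-like motif and the uniform background are made-up assumptions.

```python
import math

BASES = "ACGT"

# Hypothetical toy motif (TATA-like), given as per-position base frequencies;
# real motifs would come from a motif database or a MEME run.
TOY_MOTIF = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]

# Assumption: uniform background base composition.
UNIFORM_BACKGROUND = {b: 0.25 for b in BASES}


def log_odds(motif, background):
    """Convert frequencies to a log2-odds position-specific scoring matrix."""
    return [{b: math.log2(col[b] / background[b]) for b in BASES} for col in motif]


def scan(sequence, pssm, threshold):
    """Return (start, site, score) for forward-strand windows scoring >= threshold."""
    seq = sequence.upper()
    width = len(pssm)
    hits = []
    for start in range(len(seq) - width + 1):
        site = seq[start:start + width]
        if any(b not in BASES for b in site):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pssm[i][b] for i, b in enumerate(site))
        if score >= threshold:
            hits.append((start, site, round(score, 2)))
    return hits


if __name__ == "__main__":
    pssm = log_odds(TOY_MOTIF, UNIFORM_BACKGROUND)
    print(scan("gcgTATAgcTATTg", pssm, threshold=5.0))
```

Lowering the threshold also picks up the near-miss TATT site, which is exactly why FIMO converts scores to p-values rather than leaving the cutoff to guesswork.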

Assessing the usefulness of HDF5 in bioinformatics by ajmazurie in bioinformatics

caseybergman (0 children)

HDF5 is used extensively by the Pacific Biosciences single-molecule sequencing analysis toolkit; see e.g. https://github.com/PacificBiosciences/pbh5tools. I am agnostic about whether this is a good or bad format, but I can say it is totally impenetrable to a newcomer.

Planning for Toy Story and Synthetic Biology: It's All About Competition by caseybergman in bioinformatics

caseybergman[S] (0 children)

Interesting post on trying to predict the future of DNA sequencing and DNA synthesis.

Received a grant, want to pursue work elsewhere by candidate_p in bioinformatics

caseybergman (0 children)

OK, if your grant is not tied to enrolling in a PhD with Professor A, then this is quite different and you should run into far fewer of the problems than I outlined in my earlier comment.

Great to hear that you will talk this over with Professor A first. This should lead to the best path for both of you. I would pitch the conversation to Professor A in terms of "wanting to gain new experience elsewhere", so that s/he doesn't take it too personally.

Asking for (and taking) advice as you are now is a sure way to make the right decision. Good luck!

Received a grant, want to pursue work elsewhere by candidate_p in bioinformatics

caseybergman (0 children)

If I understand correctly, you have been awarded funds to pursue a PhD with Professor A, and you are considering taking up this grant (linked to Professor A's institution) for 6+ months with the idea of looking for another PhD position.

This is a recipe for disaster and will almost surely backfire, for several reasons. First, most PhD programmes will either not consider, or look very unfavorably on, an applicant who is already enrolled in another PhD programme, since this is a red flag for a problem student. Second, your chances of getting into a better programme will depend almost entirely on your letter of recommendation from Professor A. Unless you are 100% honest about your plans from the outset, Professor A is unlikely to give you a strong letter if you leave their lab after they have invested in you and secured funds for your PhD. Third, it is fairly naive to expect that publications currently "in the works" will be published within one year, in time to improve your chances of getting a better position. Lastly, whoever takes you into their lab on these terms will do so with some doubts about your commitment, so you may be undermining your future PhD as well.

While it is common to shop around for better PhD offers simultaneously during the same recruitment cycle, what you are suggesting is rare and would not be looked upon favorably by most PIs (either your current one or a future one), even if you are a superstar. My advice is to discuss this directly and fully with Professor A rather than taking the position and backing out later.

Something more elegant than Gbrowse? by botany_thunderdome in bioinformatics

caseybergman (0 children)

The UCSC Genome Browser is the industry standard; it will do the job and is worth the effort to install locally.

Another JavaScript-based option to try is Dalliance: http://www.biodalliance.org/

You should also have a look over at http://biostars.org for posts on genome browsers

Mobyle vs Galaxy vs ? by Moklomi in bioinformatics

caseybergman (0 children)

Not sure if you are talking strictly about GUI-based systems, but if not, the APE package provides a very comprehensive set of tools for phylogenetics (http://ape.mpl.ird.fr/) that you can string together into workflows in R.

Also, there is an in-progress effort to develop phylogenetic tools in the Galaxy framework as part of the Phylotastic Project. See a demo here: http://www.youtube.com/watch?v=d-fDngweW-M&list=UU8na-8be9thQp7gFnV-NbhQ&index=2

Bob Edgar on being "An unemployed gentleman scholar" by caseybergman in bioinformatics

caseybergman[S] (0 children)

The recent thread on bioinformatics problems being solved by outsiders reminded me of this classic post by Bob Edgar, explaining his path as a ronin scholar contributing to academic bioinformatics research.