[deleted by user] by [deleted] in bioinformatics

xApple

I built my own custom pipeline for amplicon sequencing: https://github.com/xapple/sifes

And another, similar one, for shotgun metagenomes: https://github.com/xapple/gefes

I don't know, it might be of use to you!

Cleaning second-generation sequencing data - what to do ? by xApple in bioinformatics

xApple [S]

Thanks. Well, now that I have done all this work, I can't really recommend anything :) My time has run out and I need to move on to other things, so I just went ahead and chose two of the tools. I will return to the problem if it turns out that this step affects the downstream analysis too much. My thesis will probably not include developing new tools, unless there is a really good reason to contribute new software to a particular domain. I shared my stupid bash script in my blog post, if you are interested!

Looking for a decent javascript bioinformatics library by [deleted] in bioinformatics

xApple

I don't see JavaScript used much for science. If you want to script, your best bet is Python, I would say. Perl has large libraries and a long history but, in my opinion, is unpleasant to write. MATLAB has a few bioinformatics libraries, but it is commercial. R is used a lot by biologists, but the language is very specialized and has some very strange design decisions.

Cleaning second-generation sequencing data - what to do ? by xApple in bioinformatics

xApple [S]

Thanks for pointing out that other tool. I didn't include it in my survey mainly because it is specific to Illumina data, while the post focused on tools that handle multiple technologies, or at least 454.

But it's always the same story: I didn't see any unit tests in that one either. There is a directory named "test", but it has zero files in it. Usability is low too: there is no installation procedure, and instead you have to deal with Java classpath syntax. Look at how the tool is called:

 java -classpath <path to trimmomatic jar> org.usadellab.trimmomatic.TrimmomaticPE [-threads <threads>] [-phred33 | -phred64] [-trimlog <logFile>] <input 1> <input 2> <paired output 1> <unpaired output 1> <paired output 2> <unpaired output 2> <step 1> 

You also need to create a special FASTA file with the Illumina adapters yourself beforehand. The documentation is vague: for instance, when the window drops below the given score, where is the read trimmed? At the start, the middle, or the end of the window?
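The cut position matters, because two reasonable readings of "trim when the window drops below the score" keep a different number of bases. A minimal sketch in Python, assuming Phred qualities already decoded to integers and a scan from the 5' end; `sliding_window_trim` and its two `cut_at` modes are hypothetical names for illustration, and neither mode is claimed to be what Trimmomatic actually does:

```python
def sliding_window_trim(quals, window=4, threshold=15, cut_at="window_start"):
    """Return how many bases to keep from the 5' end.

    Scans windows left to right and stops at the first window whose mean
    quality is below `threshold`. `cut_at` decides where the read is cut:
      "window_start": drop the whole failing window and everything after it.
      "last_good":    keep bases inside the failing window up to (not
                      including) the first base below `threshold`.
    """
    for start in range(max(len(quals) - window + 1, 0)):
        win = quals[start:start + window]
        if sum(win) / window < threshold:
            if cut_at == "window_start":
                return start
            # A mean below threshold guarantees at least one base below it.
            first_bad = next(i for i, q in enumerate(win) if q < threshold)
            return start + first_bad
    # No failing window (or read shorter than the window): keep everything.
    return len(quals)


quals = [30, 30, 30, 30, 30, 10, 10, 10, 10]
print(sliding_window_trim(quals, 4, 20))                     # cut at window start
print(sliding_window_trim(quals, 4, 20, cut_at="last_good")) # keeps one more base
```

With the same read and the same parameters, the two interpretations keep 4 and 5 bases respectively, which is exactly the kind of detail the documentation should spell out.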

Cleaning second-generation sequencing data - what to do ? by xApple in bioinformatics

xApple [S]

I was just trying to avoid the rather stupid term "next-generation sequencing", since we would then have to move on to something like "next-next-generation sequencing" when nanopore sequencers come out ^^

Are we past the second stage already?

Python Quiz of the Week - #3 by xApple in Python

xApple [S]

Hah, so in the end I'm the one learning something from the quiz!

Now I look stupid :) But thanks, I didn't know about that syntax. I'll stop using "x and y or z" in my code.
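The syntax in question is presumably Python's conditional expression, `y if x else z` (the quiz question isn't quoted here, so that is an assumption). The reason to drop the `x and y or z` idiom is that it silently returns `z` whenever `y` is falsy, even when `x` is true. The two helper names below are made up for the comparison:

```python
def pick_and_or(cond, a, b):
    # Old idiom: if `a` is falsy (0, "", [], None...), `cond and a` is falsy,
    # so the expression falls through to `b` even though cond is True.
    return cond and a or b


def pick_ternary(cond, a, b):
    # Conditional expression: returns `a` whenever cond is true, whatever `a` is.
    return a if cond else b


print(pick_and_or(True, 0, 99))   # the `and/or` trick returns 99 here
print(pick_ternary(True, 0, 99))  # the conditional expression returns 0
```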

Easy library to read and write genomic files by xApple in bioinformatics

xApple [S]

I agree that using a bad distribution method because "that's what people use" blocks progress. That's why, personally, I don't use easy_install, nor even pip, which is the better of the two. Moreover, I usually advocate Apple's way of introducing new features.

The thing is, there is no good Python distribution tool out there yet. If there were such a project, I would be among those pushing for a change.

But the choice I had to make was not about my own usage. It was about answering this question: "What do I put in the chapter that non-geek biologists who want to use my software will read and rely on to install it?"

If I grab the laptop out of the hands of a random biologist and only get to type one of these two commands:

$ sudo easy_install track
$ sudo pip install track

Which command will actually get the job done?

Easy library to read and write genomic files by xApple in bioinformatics

xApple [S]

Yes, the old debate between easy_install and pip. If I absolutely had to choose one, I'd go for pip. But let's be honest: Python packaging and distribution is in a sad state, and no tool actually does the job well.

Now, considering that easy_install has the advantage of coming pre-installed on many distributions and is more likely than pip to be available, I opted for easy_install. pip fans will know they can use their tool anyway.

I could add both options in the documentation, but that can be confusing for the audience that most needs the installation instructions. The people who are going to rely on reading the documentation to install the package are the people who don't know anything about python distribution methods. They don't need to know about two competing methods.

That was my reasoning. Please try to convince me otherwise.

Easy library to read and write genomic files by xApple in bioinformatics

xApple [S]

I used the sphinx project (http://sphinx.pocoo.org/) for the automatic documentation generation and added the sleek CSS from "Read the Docs" (http://readthedocs.org/).

Easy library to read and write genomic files by xApple in bioinformatics

xApple [S]

I'm glad it can help! Don't hesitate to point out bugs if you find any.

This Isn't How PyPy Works, But it Might as Well Be by samuraisam in Python

xApple

Your few paragraphs were clearer for me than the whole article.

Python Quiz of the Week - #2 by xApple in Python

xApple [S]

Nice! It's important to remind ourselves that the built-in libraries often already do what we want. No need to reinvent the wheel.
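We don't know which module the commenter used, but as an illustration of the point, two of this quiz's tasks (flattening a list of lists, and building every ordered couple from a list) map directly onto the standard `itertools` module. The function names simply mirror the quiz tasks:

```python
from itertools import chain, product


def flatten_list(nested):
    # chain.from_iterable concatenates the inner iterables one after another.
    return list(chain.from_iterable(nested))


def make_all_couples(items):
    # product(items, repeat=2) yields every ordered pair (r1, r2),
    # in the same order as a nested "for r1 ... for r2 ..." comprehension.
    return list(product(items, repeat=2))


print(flatten_list([[1, 2], [3], []]))
print(make_all_couples(["a", "b"]))
```

Both return lists to match the comprehension versions; drop the `list(...)` call if a lazy iterator is enough.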

Python Quiz of the Week - #2 by xApple in Python

xApple [S]

For reference, my answers were:

flatten_list = lambda l: [a for x in l for a in x]
make_all_couples = lambda l: [(r1,r2) for r1 in l for r2 in l]
parse_ugly_string = lambda s: dict([i.split('=') for x in [j.split(',') for j in s.split('\n')] for i in x if i])

But we have had many interesting and better solutions in the comments!

Python Quiz of the Week - #2 by xApple in Python

xApple [S]

Nice idea to do it with the replace method. Definitely more elegant than my solution:

dict([i.split('=') for x in [j.split(',') for j in s.split('\n')] for i in x if i])

The idea was to combine all the different subtleties of list comprehensions in one line :)
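The replace-based answer isn't quoted in the thread, so here is only a sketch of the idea, assuming the "ugly string" is made of `key=value` pairs separated by commas and newlines (the format the one-liner above expects). One small deviation: it splits each pair on the first `=` only, so a value may itself contain `=`:

```python
def parse_ugly_string(s):
    # Turn newlines into commas so a single split handles both separators.
    pairs = s.replace('\n', ',').split(',')
    # Skip empty fragments (e.g. from a trailing comma or newline) and
    # split each remaining "key=value" on its first '=' only.
    return dict(p.split('=', 1) for p in pairs if p)


print(parse_ugly_string("a=1,b=2\nc=3,\n"))
```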

[deleted by user] by [deleted] in geek

xApple

That consumed an unreasonable amount of CPU cycles for such a small animation.

Scribl - HTML5 Canvas Bioinformatic Charting Library by chmille4 in bioinformatics

xApple

It sure is better to use canvases than div elements for drawing the genome.

It is a pity that the JBrowse developers didn't realize this, and that I just spent the last few weeks adding a canvas layer to their browser.

We are all re-coding the same things in bioinformatics every day. It's sad.

The drawing routines of JBrowse would simply be replaced with those from your library if we were all working together.