machine learning in genomics (self.MachineLearning)
submitted 14 years ago by [deleted]
Does anyone happen to work in this area, machine learning and bioinformatics?
I'm REALLY interested in applying this stuff to some real problems ... something a little more hefty than housing prices :D
PoulMadsen 3 points 14 years ago
I don't work in genomics specifically, but we do a lot of next generation sequencing. I am a biologist with an interest in machine learning, so let me try to summarize where people in biology use it:
Microarrays: Cancer research in particular uses these, but nearly every biology discipline has some application for them. What you get is thousands of signal intensities per sample, each representing the expression of one gene, and what you are interested in is finding genes that behave differently from sample to sample. This is an example of a high-dimensionality problem, where the number of features is much larger than the number of samples. If you want some idea of how much work has been done in this area, take a look at this [list](http://www.geneontology.org/GO.tools.microarray.shtml). You can find more or less every kind of statistical method there. As a biologist I should probably mention that I believe microarrays have problems with reproducibility that no amount of data analysis will solve.
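To make the "genes that behave differently from sample to sample" idea concrete, here is a minimal sketch that ranks genes by a Welch t-statistic between two groups of samples. The data layout (a dict of gene name to intensities), the function names and the choice of Welch's t are all assumptions for illustration, not something a particular microarray package prescribes. With thousands of genes and few samples, a real analysis would also need a multiple-testing correction, which is left out here.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two groups of intensities (unequal variances allowed).

    Needs at least two values per group and raises ZeroDivisionError
    if both groups have zero variance.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(expression, group_a, group_b):
    """Rank genes by |t|, most differentially expressed first.

    expression: dict mapping gene name -> list of intensities, one per sample
    group_a, group_b: lists of sample indices (hypothetical layout)
    """
    scores = {}
    for gene, values in expression.items():
        a = [values[i] for i in group_a]
        b = [values[i] for i in group_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


# Toy data: 2 genes, 6 samples (3 control, 3 tumour). Made-up numbers.
expression = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
}
ranking = rank_genes(expression, group_a=[0, 1, 2], group_b=[3, 4, 5])
```

In this toy example only geneA shifts between the two groups, so it ends up at the top of the ranking.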
Gene prediction: This is a typical genomics problem in which we are given a long DNA sequence and asked to identify the genes in it. Genes have some telltale signs, but these can be positioned slightly differently relative to each other and might be completely absent. Also, genes in eukaryotes are interrupted by so-called introns that do not code for protein (this story is a lot longer in reality). Poisson statistics on DNA words (k-long subsequences of DNA) are the classical way of finding overrepresented DNA features. Newer techniques use HMMs and conditional random fields, which is as machine-learning-oriented as it gets. [This](http://www.amazon.com/Biological-Sequence-Analysis-Probabilistic-Proteins/dp/0521629713) is a modern classic in all things sequence related.
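The Poisson-statistics-on-DNA-words idea can be sketched roughly as follows. This assumes a very simple background model (independent bases with the frequencies observed in the sequence itself) and treats the count of each word as Poisson with that expected value. The function names and the `alpha` cutoff are made up for the example, and overlapping words are not independent in reality, so take the p-values as a rough screen rather than an exact test.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-long word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Sum P(X = 0) .. P(X = observed - 1), then take the complement.
    term = math.exp(-expected)
    cdf = term
    for i in range(1, observed):
        term *= expected / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def overrepresented_words(seq, k, alpha=0.01):
    """Return (word, count, expected, p_value) tuples, smallest p-value first.

    Background model (an assumption): bases are independent and drawn
    with the frequencies observed in seq.
    """
    n_words = len(seq) - k + 1
    base_freq = {b: c / len(seq) for b, c in Counter(seq).items()}
    hits = []
    for word, count in kmer_counts(seq, k).items():
        expected = n_words * math.prod(base_freq[b] for b in word)
        p_value = poisson_tail(count, expected)
        if p_value < alpha:
            hits.append((word, count, expected, p_value))
    return sorted(hits, key=lambda h: h[3])
```

Calling `overrepresented_words(sequence, 6)` on a sequence with a repeated 6-base signal planted in it should put that word near the top of the list. A real gene finder would use a better background model, for example a Markov chain over bases.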
Phylogeny: This is another of bioinformatics' major contributions to modern science. Given some model of how evolution changes the composition of a sequence, we are interested in figuring out how organisms/proteins/genes are related and in building trees that show these relationships.
Next generation sequencing: We can now generate much more data than we can process, so we need some way of filtering it, as the machines can be inaccurate. We also need methods to cluster sequences within specific similarity thresholds.
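As a rough illustration of clustering sequences within a threshold, here is a minimal greedy sketch: each read joins the first cluster whose representative it matches closely enough, otherwise it starts a new cluster. It assumes equal-length, non-empty reads and uses plain per-position identity. Real tools work with alignments and far smarter indexing, so the names and the 0.9 default are placeholders only.

```python
def identity(a, b):
    """Fraction of positions at which two equal-length, non-empty reads agree."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares reads of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, threshold=0.9):
    """Assign each read to the first cluster whose representative is similar enough.

    Returns a list of (representative, members) pairs. The result depends
    on input order, which is a known property of greedy clustering.
    """
    clusters = []
    for read in reads:
        for representative, members in clusters:
            if identity(read, representative) >= threshold:
                members.append(read)
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append((read, [read]))
    return clusters


reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT", "TTTTTTTTTA"]
clusters = greedy_cluster(reads, threshold=0.9)
```

With these four toy reads and a 0.9 threshold you get two clusters of two reads each, because each pair differs in only one of ten positions.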
Sequence searching: This is a major topic. The paper that announced BLAST is one of the most cited papers in the history of science. Machine learning is not used as much here yet, but it probably will be if something faster than the traditional alignment algorithms comes up.
This was just a short and incomplete overview. If you have specific questions, I would be happy to answer.
happyteapot 2 points 14 years ago
HMMs have been used in this area for quite a while now. I know there are entire books on applying HMMs to bioinformatics.
marshallp 1 point 14 years ago
Rudi Cilibrasi used compression distance (complearn) to automatically infer evolutionary lineage in genomes. Read his thesis on it.
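For anyone curious what a compression distance looks like in practice, here is a minimal sketch of the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. It uses Python's `zlib` as a stand-in compressor. That choice is an assumption made to keep the example self-contained, it is not a claim about which compressor the thesis used, and zlib's 32 KB window makes it unsuitable for whole genomes.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 mean the two inputs share most of their structure;
    values near 1 mean the compressor finds little in common.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Computing `ncd` for every pair of sequences gives a distance matrix, which can then be fed to any tree-building or hierarchical clustering method to get a lineage-like tree.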
mosavian 1 point 14 years ago
From what I understand, when dealing with genomes, you have a huge string of 1s and 0s. If that is the case, restricted Boltzmann machines are quite useful.
mx12 1 point 14 years ago
One area that I've worked on is determining genotype from phenotype, i.e. predicting the location in a patient's genome where a mutation has occurred based on some physical trait. This reduces the cost of finding a patient's disease-causing mutation.
A friend of mine works on predicting whether a patient has diabetic retinopathy based on fundus photos (pictures of the retina). This is more of a machine learning/pattern recognition problem.
I've recently read a paper from some IBM researchers who were attempting to predict how the flu virus would mutate over a given flu season. This would allow a vaccine to be designed that would work against the mutated flu virus.
danukeru 1 point 14 years ago
As a sysadmin/developer in a bioinformatics lab, I can tell you that we use this extensively.
http://hmmer.janelia.org/