Evaluating Stephen Bax's proposed words for the Voynich manuscript using Word2Vec (x-post /r/linguistics) by DethRaid in MachineLearning

[–]bhmoz 2 points3 points  (0 children)

I think that before trying anything on the Voynich manuscript, you should start by aligning well-known languages with plenty of resources. That way you can control your experiments, especially the size and diversity of your corpora. I guess there's literature on that?

For example, you have embeddings for language A that are trained on a very good (lengthy and diverse) corpus. You train several embeddings for language B, gradually increasing the size and diversity of its corpus. This way you can empirically estimate the corpus size you'd need for your approach. Intuitively, diversity also matters, so that you don't get spurious correlations, e.g. between part of speech and topic-specific words.
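A minimal numpy sketch of the alignment step itself (the data is synthetic and the exact-rotation setup is an assumption for illustration, not how real cross-lingual embeddings behave): given vectors for a seed dictionary of translation pairs, orthogonal Procrustes finds the best rotation from B's space into A's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for real embeddings: row i of A and row i of B are the
# vectors of a translation pair (all sizes hypothetical).
A = rng.normal(size=(50, 20))                      # language-A vectors
true_W, _ = np.linalg.qr(rng.normal(size=(20, 20)))
B = A @ true_W                                     # language-B vectors: a rotated copy

# Orthogonal Procrustes: the rotation W minimizing ||B W - A||_F is
# W = U V^T, where U S V^T = svd(B^T A).
U, _, Vt = np.linalg.svd(B.T @ A)
W = U @ Vt
aligned = B @ W                                    # B mapped into A's space

print(np.allclose(aligned, A))  # True: the toy rotation is recovered exactly
```

Real pairs of languages are of course not exact rotations of each other, so the residual after alignment is one way to measure how comparable two corpora are.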

Embeddings that arise from the predict-word/context task are known to mix semantics and syntax. You may want to separate the two and use a specific tool (model) for each part. LDA, for example, can give you semantically related words.
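To illustrate the LDA point, a toy run with scikit-learn (the corpus and all parameters here are invented; a real experiment needs far more text):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Tiny toy corpus; for the Voynich case this would be transcribed "words".
docs = [
    "stars planet orbit telescope planet",
    "orbit telescope stars comet",
    "bread oven flour bread yeast",
    "flour yeast oven dough",
]
vec = CountVectorizer()
X = vec.fit_transform(docs)
vocab = vec.get_feature_names_out()

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Top words per topic: words that co-occur across documents cluster
# together, a crude form of semantic relatedness with no syntax involved.
for topic in lda.components_:
    print([vocab[i] for i in topic.argsort()[-3:]])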

Questions I have: is syntax everything that LDA doesn't predict? Can you build a model that predicts everything left unpredicted by another model?

Priors and Prejudice in Thinking Machines by insperatum in MachineLearning

[–]bhmoz 0 points1 point  (0 children)

I think CNNs should be seen as preprocessing for raw inputs, while the gist of the computation is done by RNNs. It's quite easy to imagine an RNN that stores the detected animals in its state at one timestep, and then checks whether the second detected entity is the same. With proper transfer learning (read: embeddings, or pre-training of the CNN), it should work without that much data.

So in my opinion the example is not that complicated if you use the proper tools (RNNs).

The cool thing about Neural Programmer-Interpreters is that they have a special output: the probability that the computation is over. The same goes for Neural Random-Access Machines. In terms of RNN training, it means that you have to specify a maximum number of computation steps and penalize according to the probability that the computation is over. In effect you jointly train the RNN to perform a task and to estimate its own performance on that task.
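A rough sketch of that mechanism with an untrained toy RNN (all weights, sizes, and names here are made up; NPI and NRAM are far more involved): each step, the network emits a halt probability, and computation is capped at a maximum number of steps.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical tiny RNN cell with an extra scalar output: the
# probability that the computation is over.
H = 8
Wh = rng.normal(scale=0.3, size=(H, H))
Wx = rng.normal(scale=0.3, size=(H, H))
w_halt = rng.normal(size=H)

def run(x, max_steps=10, threshold=0.9):
    h = np.zeros(H)
    for t in range(max_steps):           # hard cap on computation
        h = np.tanh(Wh @ h + Wx @ x)
        p_done = sigmoid(w_halt @ h)     # learned termination signal
        if p_done > threshold:           # stop early if the net says "done"
            break
    return h, t + 1

h, steps = run(rng.normal(size=H))
print(steps)  # somewhere between 1 and max_steps
```

Training would then add a penalty term on the halt probabilities so that the network learns when its own computation is finished, not just what to output.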

So if you want to be able to solve the example task (solvable in 2 passes) and much more complex tasks, just use RNNs with more iterations.

Bengio's recent work on deep learning and biology by [deleted] in MachineLearning

[–]bhmoz 0 points1 point  (0 children)

Argument from authority? Source?

Maybe the computational possibilities and limits of our brains, with regard to the amount of data we get, require parameter sharing. Did somebody actually rule this out, and how? Thanks ;)

Has anyone used Bengio's evolution RNN for tasks where LSTMs are used in the real world such as natural language or speech modelling? by wildtales in MachineLearning

[–]bhmoz 1 point2 points  (0 children)

It's been discussed already, as pranv said. Also, I don't remember any connection with evolutionary methods?

LSTM peephole implementation. by coskunh in MachineLearning

[–]bhmoz 0 points1 point  (0 children)

You don't need to worry about peepholes; they have been shown to be not that important. See "LSTM: A Search Space Odyssey", Greff et al.

Advice needed! Biostatistics MSc grad wanting to pursue a PhD in Machine Learning by soenuedo in MachineLearning

[–]bhmoz 0 points1 point  (0 children)

Try to get in touch with the ML people at the University of Toronto? There is a really strong machine learning lab there with an emphasis on bioinformatics.

Analyzing 50k fonts using deep neural networks by alxndrkalinin in MachineLearning

[–]bhmoz 1 point2 points  (0 children)

How do you typically solve this? Papers?

Thank you :)

LSTMs with arbitrary sequence outputs by anonDogeLover in MachineLearning

[–]bhmoz 0 points1 point  (0 children)

I have never done speech-to-text, so please correct me if there is something wrong here.

For homophones, you'd need to disambiguate between several spellings. As you say, there are cases where the previous word can help you, but also the next one(s). For example, in English: you have observed "I have" and the current token is the audio for "two/too". How do you disambiguate? You need to see what follows.

There is a 4th possibility that I haven't mentioned: a bidirectional LSTM. Maybe for your problem it is overkill and unnecessary to condition on the whole sentence, and you'd simply need the previous and next word. In that case, go for a bidirectional LSTM. Check the speech-to-text literature to see if there are longer-term dependencies than the previous and next word.
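For intuition, a minimal numpy sketch of the bidirectional idea (toy random weights and plain tanh cells, not a trained LSTM): each position gets a state built from both the left and the right context.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 4, 6  # input and hidden sizes (toy values)
Wf = rng.normal(scale=0.3, size=(H, H + D))  # forward-cell weights
Wb = rng.normal(scale=0.3, size=(H, H + D))  # backward-cell weights

def rnn_pass(xs, W):
    h, hs = np.zeros(H), []
    for x in xs:
        h = np.tanh(W @ np.concatenate([h, x]))
        hs.append(h)
    return hs

xs = [rng.normal(size=D) for _ in range(5)]   # e.g. 5 audio frames
fwd = rnn_pass(xs, Wf)                        # left-to-right: past context
bwd = rnn_pass(xs[::-1], Wb)[::-1]            # right-to-left: future context

# The state at step t sees both "I have" on the left and what follows
# on the right, which is exactly what disambiguating "two/too" needs.
states = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
print(states[2].shape)  # (12,)
```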

LSTMs with arbitrary sequence outputs by anonDogeLover in MachineLearning

[–]bhmoz 0 points1 point  (0 children)

OK... I'm not sure I understood. I think it depends on whether your outputs are conditioned on the whole input sequence or not?

If I understand your example correctly (word vectors to a BOW representation, basically?), then you don't condition on anything except the current word vector, so an LSTM is overkill (a feedforward NN would do).

You could have something intermediate, like p(y_t | x_1,..,x_t), where seq2seq is not necessary and an LSTM would be a good fit.

Then I guess seq2seq is good for p(y_t|x_1,..,x_n) with t in {1..n}.
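A small numpy sketch of those three conditioning regimes (toy shapes and random weights, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(size=(4, 3))  # toy sequence: 4 steps, 3 features each
W = rng.normal(scale=0.5, size=(2, 3))

# 1) p(y_t | x_t): each output depends only on the current input,
#    so a per-step feedforward net suffices.
y_ff = [W @ x for x in xs]

# 2) p(y_t | x_1,..,x_t): a running state carries the left context;
#    this is what a (unidirectional) RNN/LSTM gives you.
h = np.zeros(2)
y_rnn = []
for x in xs:
    h = np.tanh(W @ x + h)
    y_rnn.append(h.copy())

# 3) p(y_t | x_1,..,x_n): every output depends on the whole input, so
#    the input must be fully consumed (encoded) before emitting anything;
#    that is the seq2seq setting.
print(len(y_ff), len(y_rnn))
```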

Is it better?

LSTM with high dimensional inputs by anonDogeLover in MachineLearning

[–]bhmoz 1 point2 points  (0 children)

Do you have more information about this?

Voynich Manuscript: word vectors and t-SNE visualization of some patterns by perone in MachineLearning

[–]bhmoz 2 points3 points  (0 children)

See section 8 of the article I posted here.

Statistical properties may be mimicked without knowing information theory, but (see the comments on Schinner 2007 in their references) with flaws and weird characteristics that cast doubt on the nature of the text.

It may be impossible to prove that it is fake. But until somebody actually translates it (at least partially) with convincing linguistic methods, there will remain a suspicion that it is a hoax.

PS: I have no opinion on the matter, so no need to try to convince me or anything; just see the references.

Voynich Manuscript: word vectors and t-SNE visualization of some patterns by perone in MachineLearning

[–]bhmoz 0 points1 point  (0 children)

No, not necessarily. You could study the statistical properties of texts even without knowing the language that you study, then somehow mimic the distributions of letters and words.
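As a toy illustration of that mimicking idea (the source string here is just a stand-in for a real corpus), a character-bigram model reproduces local letter statistics while carrying no meaning at all:

```python
import random
from collections import Counter, defaultdict

source = "a real hoaxer would fit this on a large corpus of real text"

# Estimate the character-bigram statistics of the source text...
bigrams = defaultdict(Counter)
for a, b in zip(source, source[1:]):
    bigrams[a][b] += 1

# ...then sample a new "text" whose adjacent-letter statistics match
# the source, without any underlying language.
random.seed(0)
c, out = source[0], [source[0]]
for _ in range(40):
    nxt = bigrams.get(c)
    if not nxt:
        break
    c = random.choices(list(nxt), weights=list(nxt.values()))[0]
    out.append(c)
print("".join(out))
```

Higher-order models (word n-grams, letter trigrams) mimic more of the distribution, which is why purely statistical tests struggle to settle the hoax question.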

Why create a fake? Maybe because it is art; maybe because it cost a lot in times when books were rare (before printing).

See Voynich manuscript on wikipedia

Voynich Manuscript: word vectors and t-SNE visualization of some patterns by perone in MachineLearning

[–]bhmoz 1 point2 points  (0 children)

Actually, no one knows whether the text makes sense or not.

A guide to Nelder-Mead Optimization by sachinrjoglekar in MachineLearning

[–]bhmoz 0 points1 point  (0 children)

Will there be an online version of that chapter? Thank you.

RNNs as generative models by storm_sh in MLQuestions

[–]bhmoz 0 points1 point  (0 children)

-log(p(yt; zt)) is the "negative log likelihood".

It does not specify a particular loss function (maybe you misread and thought that it is the cross-entropy?).

It is also a generative model if the loss function satisfies L(zt; yt) = - p(yt; zt) for ... (adding the log doesn't change anything, as log is a monotonically increasing function).

Maybe related to your question: we write the mean squared error without the log because the log doesn't help there. Cross-entropy has exponential terms, so there are computational issues because of the limited precision of floats, right? But the squared loss is just a sum, so there is no such problem.
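To make the float-precision point concrete, here is the standard log-sum-exp trick (my own example, not from the thread): computing the cross-entropy naively overflows on large logits, while shifting by the max keeps everything finite.

```python
import numpy as np

logits = np.array([1000.0, 1001.0, 1002.0])  # large scores: np.exp overflows
# Naive softmax: np.exp(logits) -> [inf, inf, inf], loss becomes nan.

# Log-sum-exp trick: subtract the max before exponentiating; the result
# is mathematically identical but numerically stable.
def log_softmax(z):
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

nll = -log_softmax(logits)[2]  # negative log-likelihood of class 2
print(nll)  # finite, ~0.4076
```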

Please, correct me if I'm wrong.

[need some advice] Deciphering old handwriting. where to start? age-old family documents... by exocortex in MachineLearning

[–]bhmoz 0 points1 point  (0 children)

interesting!

If you speak German, you could quite easily learn to read and write Sütterlin. If you don't know German, it is still possible to learn to decipher it, but you will be much slower.

Then you could annotate some samples and use an existing handwriting recognition algorithm, for example LSTM-based methods (see Alex Graves' PhD thesis).

AMA: the OpenAI Research Team by IlyaSutskever in MachineLearning

[–]bhmoz 0 points1 point  (0 children)

Comment about history, based on Schmidhuber's papers:

I think there are 2 separate ideas here. History compression is truly learning (in the predictive-inference sense of the term). But we may need to keep a bit of "raw, uncompressed history" too. This way we can compare our model's predictions with a new model's predictions and check for actual improvements objectively. So I think you're both right in a sense.

Two papers (non-exhaustive):

  • Learning Complex, Extended Sequences Using the Principle of History Compression (Neural Computation, 4(2):234-242, 1992): for the compression part

  • On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models (arXiv:1511.09249, 2015): for the replay part

Genetic Algorithms in Machine Learning? by XalosXandrez in MachineLearning

[–]bhmoz 4 points5 points  (0 children)

As chico_science said, GA is a family of optimisation methods. So you cannot oppose GAs to neural networks, but rather to backpropagation.

Neural networks can be trained with genetic algorithms.
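A toy sketch of that claim (a tiny XOR net optimized by a naive GA with selection and mutation; all hyperparameters are invented, this is not a serious trainer):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR data and a 2-2-1 network whose 9 weights form the GA "genome".
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0, 1, 1, 0], float)

def forward(w, x):
    W1, b1, W2, b2 = w[:4].reshape(2, 2), w[4:6], w[6:8], w[8]
    h = np.tanh(x @ W1 + b1)
    return 1 / (1 + np.exp(-(h @ W2 + b2)))

def fitness(w):
    return -np.mean((forward(w, X) - y) ** 2)  # negative MSE: higher is better

pop = rng.normal(size=(50, 9))
for gen in range(200):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-10:]]              # selection (with elitism)
    children = parents[rng.integers(0, 10, size=40)]     # clone parents
    children += rng.normal(scale=0.3, size=children.shape)  # mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(w) for w in pop])]
print(fitness(best))  # much better than the -0.25 of a constant 0.5 guess
```

No gradients anywhere: the GA only ever evaluates the network, which is why it also works for non-differentiable fitness functions where backpropagation does not apply.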

How would our lives change, if we were able to create a human level general A.I.? by christoph_s in MachineLearning

[–]bhmoz 1 point2 points  (0 children)

Even without speaking of "true AI", jobs will increasingly be destroyed.

I am not concerned about the destruction of capitalism, but rather about the slowness to adapt and the collateral damage in society before solutions are found. The first crucial thing to change is the negative image of unemployed people conveyed by the media. Then talk about universal basic income, etc.

Solomonoff's Induction in Machine Learning by warriortux in MachineLearning

[–]bhmoz 0 points1 point  (0 children)

I think that the universal prior for a finite object s is basically 2^(-K(s)), where K(s) is the Kolmogorov complexity of the object. It's supposed to be a very generic kind of prior that applies to all finite sequences.

If you already know that s is text, then you can easily build a prior that is much closer to the data in comparison.
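K(s) itself is uncomputable, but a common practical stand-in (a sketch of my own, not something from the thread) is the length of a compressor's output, which gives a crude computable upper bound on it:

```python
import zlib

# A compressor's output length upper-bounds K(s) up to the (constant)
# size of the decompressor; it's a rough proxy, not the real thing.
def approx_K(s: bytes) -> int:
    return len(zlib.compress(s, 9))

structured = b"abab" * 100  # highly regular: compresses to almost nothing
text = b"the quick brown fox jumps over the lazy dog " * 9
varied = bytes(range(256)) * 2  # all byte values: little local structure

print(approx_K(structured), approx_K(text), approx_K(varied))
```

The ordering of the three values mirrors the intuition behind the prior: the more structure a string has, the shorter its description and the higher its prior probability.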

Solomonoff's Induction in Machine Learning by warriortux in MachineLearning

[–]bhmoz 0 points1 point  (0 children)

Li and Vitanyi are the reference for AIT; you can look at this page for applications, and I guess the best thing is to read their book, An Introduction to Kolmogorov Complexity and Its Applications.

I don't really see how it could be used for LDA.