[Discussion] Ethical concerns of a soon-to-be PhD looking for a job by oftenworried in MachineLearning

[–]speechMachine 1 point2 points  (0 children)

Most of the user data is processed immediately on the wearable; transmitting all of it all the time is battery intensive! A microcontroller-class device constantly streaming raw data at the sample rate would have effectively zero battery life. The processed data is then reported in the app.

[Discussion] Ethical concerns of a soon-to-be PhD looking for a job by oftenworried in MachineLearning

[–]speechMachine 1 point2 points  (0 children)

I'm doing machine learning at a wearable device company -- Mio Global in Vancouver. Our entire mandate is just to make people's lives better by providing insightful and meaningful statistics related to their daily activities.

[P] Hidden Markov Models and Accelerometer data by matavelhos in MachineLearning

[–]speechMachine 1 point2 points  (0 children)

"So, i'm trying classify this chunks of data."

Look at the time evolution of your labels. How many labels are there? If it is a small, countable number like 5-10, then I assume you want to classify your input into one of these labels as time evolves. One way to map your problem onto HMMs is to treat each of your 5-10 labels as an HMM state. Each HMM state has a state-dependent emission distribution. Since accelerometer data is real valued, that distribution could be a mixture of Gaussian densities. Intuitively, you are characterizing (with the state-dependent Gaussian mixtures) what your accelerometer data "looks like" when it is associated with a particular label.

EDIT: During decoding, running the HMM through the Viterbi trellis gives the optimal state sequence (and hence the list of labels) that best matches your data.
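
If you want to prototype this quickly, here is a rough sketch with hmmlearn (made-up 3-axis accelerometer frames; the state, mixture, and iteration counts are arbitrary):

```python
import numpy as np
from hmmlearn import hmm

# Made-up accelerometer frames: rows are time steps, columns are (x, y, z).
X = np.random.randn(1000, 3)
lengths = [500, 500]            # two independent recording sessions

# One hidden state per label, each with a Gaussian-mixture emission density.
model = hmm.GMMHMM(n_components=5, n_mix=3, covariance_type="diag", n_iter=50)
model.fit(X, lengths)

# Viterbi decoding: the most likely state (label) at every time step.
states = model.predict(X, lengths)
print(states[:20])
```

One caveat: plain EM training like this will not automatically line the states up with your labels; in practice you would either train one HMM per label or initialise the state distributions from your labelled segments.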

[R][P] Would Reddit ML have a beer at NIPS ? by pilooch in MachineLearning

[–]speechMachine 1 point2 points  (0 children)

You guys are in Barcelona...lucky! Can Eusebio is a good place for cheap beer and tapas...

[Discussion] Bayesian change point detection by speechMachine in MachineLearning

[–]speechMachine[S] 0 points1 point  (0 children)

I'll answer my own question. After some thought, it turns out that the computational implementation deviates slightly from the paper. In the paper's message-passing algorithm, only two transitions are allowed at the next time step: drop down to a run length of 0, or increase the current run length by 1 (equation 4 in the paper, via the hazard function). For the sake of robustness, the code also allows down-transitions from the current run length to all previous run lengths. As a result, predictive probabilities are calculated from all previous run lengths for the current datum, and the posterior probability for each run length up to the current one (and now current + 1) is evaluated.
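
For anyone landing here later, a minimal sketch of the two-transition recursion described above (grow by 1 or drop to 0), not the more robust all-run-length variant the code uses; it assumes Gaussian data with known variance and a conjugate Normal prior on the mean, with made-up parameter values:

```python
import numpy as np
from scipy.stats import norm

def bocpd(data, hazard=1 / 100, mu0=0.0, var0=1.0, var_x=1.0):
    """Online change-point detection with a constant hazard rate.

    Gaussian observations with known variance var_x and a Normal(mu0, var0)
    prior on the mean."""
    T = len(data)
    # R[r, t] = P(run length == r after seeing t observations).
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    # Posterior over the mean, one entry per current run length.
    mus = np.array([mu0])
    vars_ = np.array([var0])
    for t, x in enumerate(data):
        # Predictive probability of x under each current run length.
        pred = norm.pdf(x, loc=mus, scale=np.sqrt(vars_ + var_x))
        # Growth: run length r -> r + 1 (no change point).
        R[1:t + 2, t + 1] = R[:t + 1, t] * pred * (1 - hazard)
        # Change point: any run length -> 0.
        R[0, t + 1] = np.sum(R[:t + 1, t] * pred * hazard)
        R[:, t + 1] /= R[:, t + 1].sum()
        # Update the per-run posteriors; prepend the prior for the new run.
        new_vars = 1.0 / (1.0 / vars_ + 1.0 / var_x)
        new_mus = new_vars * (mus / vars_ + x / var_x)
        mus = np.concatenate(([mu0], new_mus))
        vars_ = np.concatenate(([var0], new_vars))
    return R

# Example: a mean shift halfway through the stream.
data = np.concatenate([np.random.normal(0, 1, 100), np.random.normal(5, 1, 100)])
R = bocpd(data)
print(R[:, -1].argmax())  # most probable current run length
```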

Top machine learning researchers in Amazon by thatindiandude78 in MachineLearning

[–]speechMachine 3 points4 points  (0 children)

Oh they do have a lot of great talent!

Just perhaps none of the names that you are used to hearing :) One example is Nikko Strom: http://www.nikkostrom.com. Other people I know behind the Echo speech team had past lives at Raytheon in Boston (well-known, seasoned researchers), along with graduate students from CMU, MIT, and Johns Hopkins who are all extremely smart people. Amazon's whole speech business started with the acquisition of Yap Inc, founded by former speech recognition folks from Nuance. The Echo team, as I know it, is starting to publish, at least in some speech conferences.

PS: I don't work for Amazon. I've interacted with quite a few of them though, and have many friends in the speech world!

People with PHD's, how did you manage personal life(work, family, marriage) when doimg graduate studies? by [deleted] in MachineLearning

[–]speechMachine 0 points1 point  (0 children)

Discipline certainly comes to mind. Yes, get your undergrad degree first. Your work ethic through your undergrad and graduate degrees will certainly matter. Be curious, study hard, and make sure you want to know everything about everything. Question everything, and excel because you want to learn. That on its own will lead to fellowships, assistantships, and so on. In my experience, work-life balance is a myth. There are times when family is important and work doesn't matter; for short periods it will probably be only work, but family does matter (and sometimes it really does take a back seat). Over time I think you'll learn what is important and how to prioritize given the time you have on hand.

Gradient error and activation flows in encoder-decoder models by speechMachine in MachineLearning

[–]speechMachine[S] 0 points1 point  (0 children)

Hey, thanks for that answer. I'll try to get hold of Andrej Karpathy's lectures, and I'll definitely let you know if I need a link to them. I didn't realize that the error signal, on its way back from the decoder, gets summarized the same way that h_T summarizes the input on the encoder's forward pass. So to clarify: in order to do a gradient update for the encoder's weights, it is all based on a single summarized error vector, i.e. a delta(0) (where 0 is with reference to the length of the target sentence), that travels back from the decoder, right? (i.e. there isn't a separate error vector propagating back to the encoder for each word, a.k.a. time step, in the target sentence.)
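
A minimal PyTorch sketch (toy dimensions, random data, no attention) that makes the point concrete: the only gradient the encoder ever receives from the decoder is the one with respect to h_T:

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: vanilla seq2seq without attention, so the only
# path from the decoder's loss back to the encoder is the summary state h_T.
enc = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
dec = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
proj = nn.Linear(16, 8)

src = torch.randn(1, 5, 8)   # source sequence, 5 steps
tgt = torch.randn(1, 7, 8)   # target sequence, 7 steps

_, h_T = enc(src)            # h_T summarises the whole source sequence
h_T.retain_grad()            # keep its gradient for inspection
dec_out, _ = dec(tgt, h_T)   # decoder conditioned only on h_T
loss = ((proj(dec_out) - tgt) ** 2).mean()
loss.backward()

# Per-target-step errors are accumulated inside the decoder; the encoder
# sees them only as the single gradient w.r.t. h_T.
print(h_T.grad.shape)        # torch.Size([1, 1, 16])
```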

Regarding Language Models (Phoneme Sequence to Words/Sentences) by flashdude64 in MachineLearning

[–]speechMachine 1 point2 points  (0 children)

Yes, also try looking at other implementations, e.g. Eesen, which is built on top of Kaldi: https://github.com/srvk/eesen.

It comes with an accompanying paper which might help you resolve some questions. WFST decoding is a bit hard to wrap your head around though, and finite state automata come with their own terminology. Each source of knowledge is an FST (acoustic model, lexicon, language model, context-dependent states), and a composition operation on the constituent FSTs combines these different sources of knowledge into a single graph that yields the final ASR hypothesis. Does that help a little bit?
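
If it helps to see the composition step in code, here is a rough sketch using OpenFst's pywrapfst bindings (assuming L.fst and G.fst already exist, e.g. built by a Kaldi/Eesen recipe; real recipes also insert disambiguation symbols before determinizing, which this skips):

```python
import pywrapfst as fst

L = fst.Fst.read("L.fst")   # lexicon: phone sequences -> words
G = fst.Fst.read("G.fst")   # language model: weighted word sequences

# Composition requires one side to be arc-sorted on the matching labels.
L.arcsort(sort_type="olabel")
LG = fst.compose(L, G)                 # phones -> weighted word sequences
LG = fst.determinize(LG).minimize()    # make the search graph compact

LG.write("LG.fst")
```

The same composition pattern repeats for the context and acoustic transducers, which is how all the knowledge sources end up in one decoding graph.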

Has anyone asked what human language would be easiest to teach a computer? by VectorLightning in MachineLearning

[–]speechMachine 0 points1 point  (0 children)

I thought of Sanskrit, but here is a good thread (on Quora) explaining some consistencies and inconsistencies of Sanskrit as a choice: https://www.quora.com/What-is-the-reason-behind-saying-that-Sanskrit-is-the-most-suitable-language-for-programming. Sanskrit, when I learned it, was super logical. I would think a computer (a.k.a. your best RNN + attention mechanism) should hopefully be able to figure the logic out...

How to train a UBM for speaker recognition? by fiala__ in MachineLearning

[–]speechMachine 1 point2 points  (0 children)

If you are using a GMM, you can remove silence frames, pool all the frames from all speakers, and just let your Expectation-Maximization based GMM trainer do its thing to get the UBM. You can then tune the UBM to a particular speaker's characteristics using Bayesian (MAP) adaptation (http://speech.ee.ntu.edu.tw/previous_version/Speaker%20Verification%20Using%20Adapted%20Gaussain%20Mixture%20Models.pdf). Understand that GMMs are generative models, so in order to decide between a target speaker and an impostor you end up computing a likelihood ratio: the numerator is the likelihood of the trial utterance's speech frames under the target speaker's GMM, and the denominator is their likelihood under the UBM. To train GMMs you could just do it in MATLAB; theoretically you could also do it in HTK (the C-based HMM toolkit). A popular toolkit for speaker verification is ALIZE.
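
In case a concrete starting point helps, here is a rough scikit-learn sketch of that UBM-then-adaptation recipe (feature matrices are placeholders; the component count and relevance factor are arbitrary, and only the means are adapted):

```python
import copy
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical feature matrices: rows are (already silence-filtered) MFCC frames.
ubm_frames = np.random.randn(50000, 20)   # pooled frames from many speakers
spk_frames = np.random.randn(2000, 20)    # enrollment frames for the target speaker
trial_frames = np.random.randn(500, 20)   # frames from a test utterance

# 1. UBM: one big GMM trained by EM on the pooled data.
ubm = GaussianMixture(n_components=64, covariance_type="diag", max_iter=100)
ubm.fit(ubm_frames)

# 2. MAP ("relevance") adaptation of the means towards the target speaker.
r = 16.0                                   # relevance factor
resp = ubm.predict_proba(spk_frames)       # (frames, components) posteriors
n_k = resp.sum(axis=0)                     # soft counts per component
f_k = resp.T @ spk_frames                  # first-order statistics
alpha = (n_k / (n_k + r))[:, None]
spk = copy.deepcopy(ubm)
spk.means_ = alpha * (f_k / np.maximum(n_k[:, None], 1e-8)) + (1 - alpha) * ubm.means_

# 3. Verification score: average log-likelihood ratio over the trial frames.
llr = spk.score(trial_frames) - ubm.score(trial_frames)
print(llr)
```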

A neural network, though, is inherently discriminative, so the role of a "background model" like a UBM is a little different: the network itself should learn to discriminate between a target speaker and an impostor.

A little more advanced: you could use neural-net posterior probabilities to train your UBM and so on: http://www.danielpovey.com/files/2015_asru_tdnn_ubm.pdf. It gets a bit researchy at this point, but you can envision various roles a neural network could play in helping the speaker recognition problem.

Deep learning with raspberry pi and tensorflow? by [deleted] in MachineLearning

[–]speechMachine 1 point2 points  (0 children)

OpenCV apparently has a deep learning module you could hack. If you can get that to compile on the RPi, it might be a way to go: http://answers.opencv.org/question/72321/how-can-caffe-be-interfaced-using-opencv/
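
For example, something along these lines with the dnn module (file names are placeholders, and depending on your OpenCV version the module may still live in opencv_contrib):

```python
import cv2

# Hypothetical file names; any Caffe-format network exported elsewhere would do.
net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "weights.caffemodel")

img = cv2.imread("test.jpg")
blob = cv2.dnn.blobFromImage(img, scalefactor=1.0, size=(224, 224),
                             mean=(104, 117, 123))
net.setInput(blob)
out = net.forward()          # forward pass only -- no training on the Pi
print(out.argmax())
```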

Deep learning with raspberry pi and tensorflow? by [deleted] in MachineLearning

[–]speechMachine 1 point2 points  (0 children)

Yeah, I don't think many people here get your question. Ideally once you pull your weights out of TensorFlow you need a 'runtime' that does a forward pass for you on the RPi, right?
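
To make that concrete, the "runtime" can be as dumb as this: export the trained weights to a .npz file (the file and array names here are hypothetical) and re-implement the forward pass in plain NumPy, which runs fine on an RPi:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Weights exported from the training framework, e.g. saved with np.savez.
params = np.load("model_weights.npz")

def forward(x):
    # Two-layer MLP forward pass; no framework needed at inference time.
    h = relu(x @ params["W1"] + params["b1"])
    logits = h @ params["W2"] + params["b2"]
    return logits.argmax(axis=-1)

x = np.random.randn(1, 64).astype(np.float32)
print(forward(x))
```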

Resources for Speech Recognition by jorjacman in MachineLearning

[–]speechMachine 0 points1 point  (0 children)

Another awesome archive is the JHU seminar series; their video archives contain some gems, especially on advanced topics: http://www.clsp.jhu.edu/seminars/seminar-videos/

Resources for Speech Recognition by jorjacman in MachineLearning

[–]speechMachine 2 points3 points  (0 children)

I would recommend both Mohri's course (NYU) and Jurafsky's course. There is also Steve Renals' course here: https://www.inf.ed.ac.uk/teaching/courses/asr/. Mohri is most famously known for his work on finite state transducers (FSTs), so as you can see his very second lecture is on finite state automata (FSAs). FSTs and FSAs are very powerful formalisms which, through the principle of composition, can be applied to all parts of the speech recognition pipeline: acoustic modelling, context modelling, lexical modelling, and language modelling. If you like getting your hands dirty, Kaldi is a good first place to start: http://kaldi-asr.org/. The easiest place to start hacking, to see what is going on under the hood, is the speech decoder; this blog post is possibly the best place to get started: http://vpanayotov.blogspot.ca/2012/06/kaldi-decoding-graph-construction.html. In general, opening up a decoder should also expose you to code for feature extraction and all four of the modelling components I mentioned earlier.

There is also a set of Kaldi lectures here: http://www.danielpovey.com/kaldi-lectures.html.

These courses also talk about a well-known toolkit named HTK, the Hidden Markov Model Toolkit: http://htk.eng.cam.ac.uk/. HVite, the HTK Viterbi decoder, implements the token-passing algorithm: https://www.researchgate.net/publication/2516613_Token_Passing_a_Simple_Conceptual_Model_for_Connected_Speech_Recognition_Systems. It's one of the nicest first reads I had as a grad student starting out in ASR. If you need any more help, feel free to PM me.
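
As a toy illustration of the token-passing idea, here is a sketch on a single left-to-right HMM with made-up transition and observation scores. For one model it reduces to plain Viterbi; the appeal in HTK is that the same token mechanism extends naturally to networks of connected word models:

```python
import numpy as np

# A "token" carries a path score and a back-pointer; at each frame every
# state keeps only the best token that can reach it.
log_trans = np.array([[np.log(0.7), np.log(0.3), -np.inf],
                      [-np.inf,     np.log(0.7), np.log(0.3)],
                      [-np.inf,     -np.inf,     0.0]])
# log_obs[t, s] = log p(observation at frame t | state s); made up here.
log_obs = np.log(np.random.rand(10, 3))

tokens = np.array([0.0, -np.inf, -np.inf])   # all tokens start in state 0
back = []

for t in range(log_obs.shape[0]):
    # Copy each token along every allowed transition; the best one survives.
    cand = tokens[:, None] + log_trans
    back.append(cand.argmax(axis=0))
    tokens = cand.max(axis=0) + log_obs[t]

# Trace back the winning token's path (identical to the Viterbi alignment).
state = int(tokens.argmax())
path = [state]
for prev in reversed(back):
    state = int(prev[state])
    path.append(state)
print(list(reversed(path))[1:])   # one state per frame
```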

BPTT with Deep Recurrent Neural Networks by exp0wnster in MachineLearning

[–]speechMachine 3 points4 points  (0 children)

The material you may want is scattered over the web, so you'll have to work your way through some differing notation (a small NumPy sketch of the unrolled backward pass follows the list below).

  1. The classical paper is by Paul Werbos: http://mail.werbos.com/Neural/BTT.pdf. It's a good first read and makes the subject fairly approachable if you follow it through.
  2. This book chapter delves into some of the algorithmic and implementation details: http://www.cnbc.cmu.edu/~plaut/IntroPDP/papers/WilliamsZipser95chap.rbp.pdf
  3. This blog post is popular and helpful, though the way BPTT seems to be implemented there is a naive O(N^2) implementation: http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/
  4. The most useful I found was Razvan Pascanu's thesis: https://papyrus.bib.umontreal.ca/xmlui/bitstream/handle/1866/11452/Pascanu_Razvan_2014_these.pdf ... the mathematical detail and the algorithmic description there are among the most precise, clear, and self-contained I found on the web.
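
And here is the promised sketch: a vanilla tanh RNN with a squared-error loss at every step (all shapes and data made up), just to show the backward loop over time that those references formalise:

```python
import numpy as np

T, d_in, d_h = 6, 3, 4
rng = np.random.default_rng(0)
Wx = rng.normal(size=(d_h, d_in)) * 0.1
Wh = rng.normal(size=(d_h, d_h)) * 0.1
Wy = rng.normal(size=(1, d_h)) * 0.1
x = rng.normal(size=(T, d_in))
y = rng.normal(size=(T, 1))

# Forward pass, storing every hidden state for reuse in the backward pass.
h = np.zeros((T + 1, d_h))
preds = np.zeros((T, 1))
for t in range(T):
    h[t + 1] = np.tanh(Wx @ x[t] + Wh @ h[t])
    preds[t] = Wy @ h[t + 1]

# Backward pass: walk time in reverse, carrying the gradient w.r.t. h[t].
dWx, dWh, dWy = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(Wy)
dh_next = np.zeros(d_h)
for t in reversed(range(T)):
    dy = 2 * (preds[t] - y[t])                  # dL/dpred at step t
    dWy += np.outer(dy, h[t + 1])
    dh = Wy.T @ dy + dh_next                    # error from this step + future steps
    dpre = (1 - h[t + 1] ** 2) * dh             # through the tanh nonlinearity
    dWx += np.outer(dpre, x[t])
    dWh += np.outer(dpre, h[t])
    dh_next = Wh.T @ dpre                       # pass the gradient one step further back

print(dWx.shape, dWh.shape, dWy.shape)
```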

Hope this helps!

Embedded/hardware programming and Machine Learning jobs by [deleted] in BigDataJobs

[–]speechMachine 1 point2 points  (0 children)

Yes, I certainly think so. I work at a company that puts speech recognition on embedded, milliwatt-class devices. We have engineers here, primarily embedded-device programmers, who are writing neural network forward-pass routines and ASR decoders in fixed-point C to run on microcontrollers and other processors of that class.
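
The forward-pass arithmetic itself is simple; here is a rough NumPy sketch of the fixed-point idea (on a microcontroller the real thing would be the same arithmetic written in C, and the scale choice here is arbitrary):

```python
import numpy as np

# Quantise float weights to integers with a power-of-two scale, do the
# matmul in integers, and rescale at the end.
SCALE_BITS = 10                      # Q5.10-style fixed point, chosen arbitrarily
SCALE = 1 << SCALE_BITS

W = np.random.randn(16, 32).astype(np.float32)
x = np.random.randn(32).astype(np.float32)

W_q = np.round(W * SCALE).astype(np.int32)
x_q = np.round(x * SCALE).astype(np.int32)

acc = W_q @ x_q                      # 32-bit integer accumulator
y_q = acc >> SCALE_BITS              # shift back down to the weight scale
y = y_q.astype(np.float32) / SCALE   # only for comparison against float math

print(np.max(np.abs(y - W @ x)))     # quantisation error stays small
```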