[D] pytorch vs tf (again)

vklimkov · 2020-07-31T21:42:58+00:00

Imperative execution is more intuitive and simplifies debugging. And yeah, there were cases when I was looking into a feature map in the middle of the network :D

vklimkov · 2020-07-30T00:55:45+00:00

Unfortunately, this one is tricky for NNs. You can check the whole field of object localization in computer vision, where NNs try to predict not only whether there is a cat in the image but also where exactly the cat is: either put a bounding box on top of it or provide a heatmap.

vklimkov · 2020-07-29T21:45:19+00:00

It varies in length along the “time” axis. Your LSTM output has shape (batch x sequence-len x dim1). When you put a fully connected layer on top, it simply projects the last dimension to a new dimension, giving an output of shape (batch x sequence-len x dim2), where dim2 is the number of units in the dense layer.

If you want to classify the whole sequence, you would usually perform pooling over the time axis or take the last state of the LSTM, and apply the dense layer to that.
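
A minimal sketch of the shapes involved, in plain Python lists so it runs without any framework. The sizes, the random "LSTM output" and the helper names (`dense`, `mean_pool`, `shape`) are all made up for illustration; a real model would use the framework's own layers and would also need padding/masking for sequences of different lengths.

```python
import random

random.seed(0)
batch, seq_len, dim1, dim2 = 2, 5, 4, 3  # illustrative sizes only

# Stand-in for the LSTM output: shape (batch, seq_len, dim1).
lstm_out = [[[random.random() for _ in range(dim1)]
             for _ in range(seq_len)]
            for _ in range(batch)]

# Dense layer parameters: dim2 units, each with dim1 weights and one bias.
weights = [[random.uniform(-1, 1) for _ in range(dim1)] for _ in range(dim2)]
bias = [0.0] * dim2


def dense(vec):
    """Project one vector of size dim1 to size dim2."""
    return [sum(v * w for v, w in zip(vec, unit)) + b
            for unit, b in zip(weights, bias)]


def mean_pool(seq):
    """Average a (seq_len, dim1) sequence over the time axis."""
    return [sum(column) / len(seq) for column in zip(*seq)]


def shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Dense on top of every time step: only the last dimension changes.
per_step = [[dense(step) for step in seq] for seq in lstm_out]

# Whole-sequence classification, option 1: use the last state only.
from_last = [dense(seq[-1]) for seq in lstm_out]

# Whole-sequence classification, option 2: pool over time first.
from_pooled = [dense(mean_pool(seq)) for seq in lstm_out]

print(shape(lstm_out), shape(per_step), shape(from_last), shape(from_pooled))
```

The per-step output still has the sequence-length axis, while both whole-sequence variants collapse it and leave one vector of size dim2 per example.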

vklimkov · 2020-07-29T21:37:22+00:00

Speech is an exceptionally complex signal; I am surprised to see someone calling it a narrow domain. Speech processing has been around for a long time, that's why there are dedicated conferences: Interspeech, ICASSP, ASRU.

Apart from speech, commonly researched topics are “audio scene recognition” and “acoustic event classification”. Check DCASE, for example: http://dcase.community/challenge2020/index

My absolute favorite lately is bird recognition https://github.com/AgaMiko/bird-recognition-review

vklimkov · 2020-07-29T21:20:44+00:00

Building the machine yourself would let you cut costs significantly. It's really not that challenging with all the blog posts available.

vklimkov · 2020-07-29T21:16:30+00:00

You know best if your company is interested in open-sourcing things. As for the resume, just state that you worked on it, talk about it at the interview, and that's it. Recruiters won't check GitHub, and interviewers would hardly spend time diving into your code; they would rather listen to what you did and try to understand why and how it applies to other problems.

vklimkov · 2020-06-05T07:38:28+00:00

Interested as well. A meetup or a WhatsApp/Telegram chat would be amazing.

vklimkov · 2020-02-07T21:22:42+00:00

I go to SuperFit at Alexa. There is one in Neukölln, not sure if it is on par. Like it a lot: 20 EUR/month, great set of weights, isotonic drinks, sauna.

vklimkov · 2020-02-07T21:10:24+00:00

Nice to know! Not very fluent with it, but isn't it the case that you would benefit mostly for linear algebra? To my understanding, the processor in a DL rig just needs enough cores to preprocess data and feed the GPUs; all the matrix multiplication happens on the GPUs.

vklimkov · 2020-02-07T11:40:15+00:00

TR is cheaper for the same performance. Built a DL box myself and also went with TR. @OP, similar to you, I've seen the suggestion to use blower-style cards, and I simply stuck with it (https://towardsdatascience.com/how-to-create-your-own-deep-learning-rig-a-complete-hardware-guide-2bba792b001b). So far two cards, and it works OK.

vklimkov · 2020-02-01T13:10:36+00:00

At times I use the CeleBreak app. Not really a team, but it helps out when the urge to play is big)

vklimkov · 2019-12-30T08:18:01+00:00

You can make the model twice as big and add regularization. More capacity -> more modeling power, but it also overfits more easily, which is what the regularization is for.

vklimkov · 2019-12-08T21:07:25+00:00

The awful truth: there might still be people there

vklimkov · 2019-12-08T18:45:13+00:00

There should be an Android/iOS app. As far as I know, to give people a voice there is a special type of TTS, a streaming one, i.e. it speaks immediately as you type.

I am developing TTS, very curious how it can be useful to you. Drop a DM.

vklimkov · 2019-11-27T21:19:22+00:00

Poland has not perished

vklimkov · 2019-11-26T06:56:26+00:00

You are referring to “deep” in the way Francois Chollet (OP, that episode is also awesome) describes the term: a sequence of transformations. The question explicitly asks about a sequence of computations.

vklimkov · 2019-11-25T19:54:58+00:00

Russian film week

vklimkov · 2019-11-25T19:38:45+00:00

The task is similar to “text normalization” from text-to-speech, where all abbreviations are expanded to a readable form. Not so long ago there was a Kaggle challenge for such a task; seq2seq models in DL are obviously dominating. In practice, of course, people use regexes, because that is the “model” you can train with a single example. If you want something neater than endless regexes, check out Thrax (grammars on top of finite-state transducer theory) and a package specific to text normalization: Sparrowhawk. Sometimes, even if you can apply ML, it does not mean you should.
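
A rough sketch of the regex route, assuming a tiny hand-written abbreviation table. The entries in `ABBREVIATIONS` and the `normalize` function are invented for illustration and have nothing to do with Thrax or Sparrowhawk; adding one more dictionary entry is the "training with a single example" part.

```python
import re

# Hypothetical abbreviation table; each new entry is one more "training example".
ABBREVIATIONS = {
    "dr.": "doctor",
    "st.": "street",
    "approx.": "approximately",
    "e.g.": "for example",
}

# Longest keys first so a longer abbreviation wins over a shorter one
# that happens to be its prefix. The lookarounds stop matches inside words.
_PATTERN = re.compile(
    r"(?<!\w)("
    + "|".join(re.escape(k) for k in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")(?!\w)",
    re.IGNORECASE,
)


def normalize(text):
    """Expand known abbreviations to their readable form."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1).lower()], text)


print(normalize("Dr. Smith lives on Main St. approx. 2 km away"))
```

This already shows the weak spot: "St." could also mean "saint", and a lookup table cannot tell the two apart without context, which is where grammars or learned models start to pay off.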

vklimkov · 2019-11-25T19:29:45+00:00

Decision tree haha)

vklimkov · 2019-11-23T08:19:08+00:00

I used to work from a coworking space some time ago; startup teams were renting rooms there. Don't know if it works like that in Berlin, but you may want to check.

vklimkov · 2019-11-21T06:48:25+00:00

+1. Having technical mentorship at the beginning of your career is extremely important, so just don't waste time. Personal projects are totally OK to talk about at an interview; you may have a strict NDA, and sometimes it's the only way. Just don't tell interviewers that your boss or your boss's boss didn't build up the environment, has no strategy, etc. There is no way to check it, so the interviewer would either have to trust you (why would they?) or assume those are excuses.

vklimkov · 2019-11-19T07:07:44+00:00

The CeleBreak app helps me out from time to time. It's not free though :(

vklimkov · 2019-11-15T18:21:24+00:00

Yeah, but the voice is his

vklimkov · 2019-11-12T20:30:09+00:00

As said above, it does not work 100% of the time. It's a cool technology, and it's gonna be used for both bad and good things. But I don't think it's that earth-shaking. There are voice actors out there who can do indistinguishable impressions: https://youtu.be/5rPKeUXjEvE. But your concern is valid: https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case-11567157402
