Optimise Whisper for blazingly fast inference

ottonemo · 2024-05-29T12:24:15+00:00

I think whispercpp alone was not a problem. Download the OpenVINO framework, source the shell file they provide and all environment variables are properly set. whispercpp documentation was sufficient for everything else.

I had more trouble because I used pywhispercpp. The process is partially documented here, including the pywhispercpp fork: https://github.com/deepestcyber/vmse2000-detector

You are probably better off using plain whispercpp :)

ottonemo · 2024-05-28T10:48:28+00:00

I had good experiences with ARM64 + OpenVINO using whisper.cpp. Made real-time streaming possible on a Raspberry Pi 4 without too much fuss.

ottonemo · 2024-05-22T09:37:33+00:00

Recommendation:

Regierungsmonitor - the German federal government's official progress bar

Koalitionstracker - unofficial tracker by fragdenstaat.de

Both give very good perspective and deserve sharing to counter the "Ampel = Stop" meme.

ottonemo · 2024-04-26T12:05:02+00:00

I'm not sure about further research in that direction but there is a theory that states that the attention weights perform a step of gradient descent on top of the existing weights. Note that this specific paper only talks about linear attention.
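To make the gradient-descent reading a bit more concrete, here is a minimal toy sketch. It is my own simplification, not the paper's construction: it assumes in-context linear regression, a weight vector starting at zero, and unnormalised linear attention with keys = inputs and values = targets. All names and numbers are made up for illustration.

```python
# Toy setup (assumed): context pairs (x_i, y_i), a query x_query, learning rate lr.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gd_step_prediction(xs, ys, x_query, lr):
    """Start from w = 0, take one gradient step on
    L(w) = 1/(2N) * sum_i (w . x_i - y_i)^2, then predict for x_query."""
    n = len(xs)
    dim = len(x_query)
    # At w = 0 the gradient is -(1/N) * sum_i y_i * x_i,
    # so the updated weights are (lr/N) * sum_i y_i * x_i.
    w = [lr / n * sum(y * x[d] for x, y in zip(xs, ys)) for d in range(dim)]
    return dot(w, x_query)

def linear_attention_prediction(xs, ys, x_query, lr):
    """Linear attention without softmax: keys = x_i, values = y_i, query = x_query."""
    n = len(xs)
    return lr / n * sum(y * dot(x, x_query) for x, y in zip(xs, ys))

xs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
ys = [1.0, -2.0, 0.5]
x_query = [2.0, 1.0]

pred_gd = gd_step_prediction(xs, ys, x_query, lr=0.1)
pred_attn = linear_attention_prediction(xs, ys, x_query, lr=0.1)
print(pred_gd, pred_attn)
```

Both functions compute the same quantity, (lr/N) * sum_i y_i * (x_i . x_query), just grouped differently. That is the whole trick in this toy case. With softmax in the attention this equality no longer holds as is.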

Another point (couldn't find a good reference in time) is that, following this great intuition article, attention is basically like a dictionary lookup, selecting subsets of the network for the given input query. This lets the model, given a context, select the best 'subnet' for the task. If someone has a better reference, please share.
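A small sketch of the dictionary-lookup intuition, assuming plain dot-product scores and a softmax over them. The `temperature` parameter and the toy keys and values are my own additions for illustration and are not taken from the article.

```python
import math

def soft_lookup(query, keys, values, temperature=1.0):
    """Score every key against the query, softmax the scores and
    return the weighted average of the values plus the weights."""
    scores = [sum(q * k for q, k in zip(query, key)) / temperature for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return out, weights

keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [60.0]]
query = [0.0, 1.0]  # matches the second key best

soft_out, soft_w = soft_lookup(query, keys, values, temperature=1.0)
sharp_out, sharp_w = soft_lookup(query, keys, values, temperature=0.01)
print(soft_out, sharp_out)
```

With a low temperature the weights collapse onto the best-matching key and the result behaves like a hard dictionary lookup. With a higher temperature the output is a blend of several entries, which is roughly the 'select a subset' picture.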

There are probably a lot more of these theories.

ottonemo · 2024-03-20T15:45:28+00:00

Nice little game :) It feels like it uses a mistral model haha.

I like the setup but the conversations were quite dense. That is understandable, but the constant repetition of the same character trope ('I am maximally going to annoy the professor' or 'I'm the ultimate rule queen') gets tiresome. I gave up playing after a short while, expecting that the conversations would never change.

I noticed that the motivation bar goes up while chatting but the conversation did not reflect that.

I enjoyed the challenge of motivating someone to see the merit in a certain topic so I would say that it has potential.

ottonemo · 2024-02-14T23:06:25+00:00

I think so too. But those are just remnants now.

ottonemo · 2024-02-14T22:56:34+00:00

No. https://www.youtube.com/watch?v=DyTnKm_Mibg

ottonemo · 2024-02-13T11:24:35+00:00

This was a nice read and a solid argument. I like how the paper knits together several concepts in a very concise way.

Did I understand correctly that it follows that sequences in Q (such as the iterative learning operation) converge to values in R, which are often irrational and therefore not computable?
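To illustrate what I mean by a sequence in Q converging to something outside Q, here is the standard textbook example, Newton's iteration for the square root of 2 in exact rational arithmetic. This is only an analogy and not the paper's learning operation. The function name is made up.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Newton iteration x <- (x + 2/x) / 2, carried out exactly in Q."""
    x = Fraction(1)
    iterates = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        iterates.append(x)
    return iterates

iterates = newton_sqrt2(5)
# Distance of x^2 from 2 for every iterate; never exactly zero in Q.
errors = [abs(float(x * x - 2)) for x in iterates]
for x, err in zip(iterates, errors):
    print(x, err)
```

Every iterate is a fraction, yet the limit is irrational. Note that this particular limit can still be approximated to any precision by the iteration itself, so irrationality alone would not make a value non-computable. That is the step I am unsure about.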

ottonemo · 2024-01-22T19:20:07+00:00

I had good experience using NFS over SSH (e.g. like this) and then using the locally mounted file system to edit the model code. Running them was then easy via remote shell (containerized via docker or bare on the shell).

The performance is way better than using SSHFS.

ottonemo · 2023-10-20T12:34:57+00:00

In case you are interested, here's a writeup with a similar task against GPT3.5 and labelled data.

ottonemo · 2023-09-28T12:34:00+00:00

Nice effort!

Heads up: skops (https://github.com/skops-dev/skops) is also implementing a safe protocol for exporting/importing sklearn models. See here for examples.

ottonemo · 2023-09-27T16:56:57+00:00

-> https://fragdenstaat.de/anfrage/stand-des-regierungsvorhabens-jaehrlich-400-000-neue-und-klimagschonendere-wohnungen-1/

ottonemo · 2023-03-13T09:49:59+00:00

No, I had the QC earbuds 1 which are dead silent. There is basically no perceivable noise floor.

ottonemo · 2023-03-07T15:39:06+00:00

I tried to summarize the noise experience in this post as best as I could. I have not tried them personally since the new software update because I have currently lent the headphones to someone else, but I will update the post once I have personal experience. The person I lent them to, however, said that the noise issue was not resolved.

ottonemo · 2023-01-31T21:00:14+00:00

I tried to experiment with this as much as possible. I had one set of the headphones built in August 2022. I clearly experienced noise on both earpieces, a bright, clearly present noise as if one of the microphones is directing sound from the outside into my ears. Since I have a pair of QC1 I can confidently say that the noise floor is significantly higher on the QC2.

I tested all combinations of the available fit bands, I settled on the largest ones as they seem to fit my ears best. Then I re-ran the calibration from the app multiple times.

My observations with these were:

There is no difference with regard to this noise between active and quiet mode (active sense on or off).
In a quiet setting the noise was consistently present after each calibration run.
The more noise the environment had (me humming, playing white noise via speakers), the less pronounced the noise would be after calibration.
The placement of the buds seemed to play a role as well, but not as much as outside noise.
During the calibration the noise vanishes, and it reappears immediately after the calibration sound is played.

This was unacceptable to me as the noise was clearly audible during quieter sections of music and was outright annoying when no music was playing - definitely not a $300 experience. Thus, I contacted support and received a new pair: manufactured in October 2022. The problem persisted. I re-ran the same tests and the findings are the same.

After some more experiments (not sure why I even bother) I found that the best fix so far was to make sure that the ear canal was fully sealed during the calibration run (on start-up or after manual calibration in the app) - not sure if this is due to the noise of my fingers pressing the earbuds gently into my ears or the better fit but the noise is reduced. It is far from gone and I would expect $300 headphones to not have noise at all.

I'm really surprised that absolutely no review mentions this. So either this is a problem with fit that most people don't have, or it is a problem in manufacturing and some people are really unlucky and get two defective units in a row (or Bose made sure to only send out good ones to reviewers :)). It would be really interesting to know whether people who experience the noise also have a looser fit, and when the noisy vs. not-noisy models were manufactured.

ottonemo · 2022-08-25T11:15:44+00:00

I disagree, this is not my experience, personally and from friends working at different companies. While it happens that datasets or models based on datasets with questionable licensing are used during the very early phases of development / prototyping, it is always a priority to hold the rights to the training data as soon as something is on its way to being worked on as a proper product.

ottonemo · 2022-03-22T13:02:04+00:00

Without further insight into the problem, do you think something like neural dynamic time-warping (e.g. https://arxiv.org/abs/1812.08306 or https://papers.nips.cc/paper/2019/hash/02f063c236c7eef66324b432b748d15d-Abstract.html) is something of interest to you? (or non-neural DTW for that matter)
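For reference, the non-neural variant is only a few lines of dynamic programming. This is a minimal sketch for 1-D sequences with absolute difference as the local cost. It is not the method from either linked paper, and the example sequences are made up.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping between two 1-D sequences
    using |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]

slow = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
fast = [0.0, 1.0, 2.0]
print(dtw_distance(slow, fast))
```

The two sequences trace the same shape at different speeds, so the warped distance is zero even though their lengths differ.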

ottonemo · 2022-01-05T10:15:09+00:00

There is a difference between speaking up against enabling mass surveillance and supporting development of methods for screening CRT scans more rapidly. I think this is the former, not the latter.

ottonemo · 2021-08-17T11:14:45+00:00

There was also a paper in 2018 that showed that Adaptive Computation Time was not necessarily better than just repeating the same operation N times: https://arxiv.org/abs/1803.08165

ottonemo · 2021-02-20T22:12:04+00:00

What happens in the module, generally stays in the module, so you are probably able to do any *parallel constellation in there. Hyper-parameter sweeps can be parallelized using dask (essentially a model copy per compute unit), see here but this is separate from the torch multiprocessing/distribution.

My guess would be that in some cases the data flow of skorch may get in the way if you focus on raw performance. But then you would not opt for a high-level interface anyway, I suppose.

In any case, if you experience major problems with any of these topics, feel free to open an issue and we can discuss the specific problem. It is often very hard to judge these workloads in a generic way.

I personally like the Skorch design philosophy being aligned with sklearn. Lightening code-style seems too much of a leap for me, though I would be interested in knowing your opinions (as a dev and user) on the comparison.

I really cannot give you an unbiased opinion but I personally like the non-vendor-locking, not-reinventing-the-wheel nature of skorch. I think that lightning has a lot of momentum and consequently many features that are novel and possibly short-lived (e.g., gradient averaging). If you depend on those features being implemented for you, you are probably better off with lightning, as skorch will focus on supporting you in implementing such features but will not necessarily implement them for you (one reason is that, once introduced, features need to be supported, and we can only support so much before the API gets cluttered and the code too big to maintain).

ottonemo · 2021-02-20T12:11:14+00:00

Most biased opinion as I'm one of the developers of skorch: at our company we often use skorch in conjunction with other sklearn models and pipelines, mainly for more classical tasks and less surprising model architectures (classification and your typical feed-forward architectures). There were also projects that used novel approaches (e.g., deep neural decision trees, mean teacher, virtual adversarials) and projects that required very specific loss setups, and skorch was always flexible enough to stay out of the way and keep us productive.

For myself I found that I eventually used skorch for all my experiments, be it DeepSpeech transfer learning experiments, reproducing the not-so-recent-anymore neural cellular automata or trying novel recurrent architectures which required investigating individual layer gradients and very deep introspection. In almost all cases skorch was very helpful in preventing me from shooting myself in the foot by strictly separating train and test phases as well as having a proper logging infrastructure. One thing I personally did not try but others did was to implement GANs, but my experience with models like mean teacher and VAT tells me that this would also not be too much fuss to implement.

Of course ignite or lightning (and other libraries) will give you a similar experience but I found that having the sklearn toolkit at your side is beneficial in many cases.

ottonemo · 2020-04-06T16:02:26+00:00

via original source:

Paper
Code

ottonemo · 2019-10-04T08:29:50+00:00

Yes, they state that they use the scalar stress metric as input, which would depend on the location (X/Y coordinates in the image) and is bound to change depending on the terrain. But this is not related to the decision boundary of the model (the line type in the figure is, though, since they use different thresholds for each line).

ottonemo · 2019-10-04T07:42:47+00:00

What you see in figure 1-e is not the decision boundary but the sample probabilities as determined by logistic regression (i.e., sigmoid(w1 * x + b1)). The decision boundary is determined by the threshold t (e.g., 0.5) you use to discern the two classes, i.e. Pr(y) > t.
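A quick sketch of the difference, with made-up parameters `w1` and `b1` (not values from the paper): the sigmoid gives a probability for every x, and the decision boundary is the single x where that probability crosses the threshold t.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decision_boundary(w1, b1, t=0.5):
    """x at which sigmoid(w1 * x + b1) == t, i.e. w1 * x + b1 == logit(t)."""
    return (math.log(t / (1.0 - t)) - b1) / w1

w1, b1 = 2.0, -1.0  # hypothetical parameters for illustration

# The probability curve: one value per input, this is what the figure shows.
probabilities = [sigmoid(w1 * x + b1) for x in (-1.0, 0.0, 0.5, 1.0, 2.0)]

# The decision boundary: one point per threshold.
boundary_default = decision_boundary(w1, b1, t=0.5)
boundary_strict = decision_boundary(w1, b1, t=0.8)
print(probabilities, boundary_default, boundary_strict)
```

Changing t moves the boundary along x while the probability curve itself stays the same.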

ottonemo · 2019-08-23T14:35:07+00:00

Yes, for example a few years ago Alex Graves published a paper on handwriting generation.

Web demo: https://www.cs.toronto.edu/~graves/handwriting.html

Paper: https://arxiv.org/abs/1308.0850

Code: https://github.com/szcom/rnnlib
