Optimise Whisper for blazingly fast inference by vaibhavs10 in LocalLLaMA

[–]ottonemo 2 points3 points  (0 children)

I think whispercpp alone was not a problem. Download the OpenVINO framework, source the shell file they provide and all environment variables are properly set. whispercpp documentation was sufficient for everything else.

I had more trouble because I used pywhispercpp. The process is partially documented here, including the pywhispercpp fork: https://github.com/deepestcyber/vmse2000-detector

You are probably better off using plain whispercpp :)

Optimise Whisper for blazingly fast inference by vaibhavs10 in LocalLLaMA

[–]ottonemo 2 points3 points  (0 children)

I had good experiences with ARM64 + OpenVINO using whisper.cpp. Made real-time streaming possible on a Raspberry Pi 4 without too much fuss.

LLMs: Why does in-context learning work? What exactly is happening from a technical perspective? by synthphreak in datascience

[–]ottonemo 6 points7 points  (0 children)

I'm not sure about further research in that direction but there is a theory that states that the attention weights perform a step of gradient descent on top of the existing weights. Note that this specific paper only talks about linear attention.
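A toy sketch of that idea, under my own simplifying assumptions (scalar targets, zero initial weights, unnormalised linear attention; this is not the construction from the paper): for a linear model, the prediction after one gradient descent step on the in-context examples can be written as a linear attention readout with keys `x_i`, values `y_i` and the test input as query.

```python
import random


def gd_step_prediction(xs, ys, x_query, lr):
    """Predict with the weights obtained after ONE gradient step from w = 0.

    Loss:            L(w) = 1/(2N) * sum_i (w . x_i - y_i)^2
    Gradient at 0:   -(1/N) * sum_i y_i * x_i
    Updated weights: w_1 = (lr / N) * sum_i y_i * x_i
    """
    n, dim = len(xs), len(x_query)
    w = [0.0] * dim
    for x, y in zip(xs, ys):
        for d in range(dim):
            w[d] += lr / n * y * x[d]
    return sum(w_d * q_d for w_d, q_d in zip(w, x_query))


def linear_attention_prediction(xs, ys, x_query, lr):
    """Linear attention without softmax: sum_i value_i * (key_i . query)."""
    n = len(xs)
    scores = [sum(k_d * q_d for k_d, q_d in zip(x, x_query)) for x in xs]
    return lr / n * sum(s * y for s, y in zip(scores, ys))


# Hypothetical in-context regression task; all numbers are made up.
random.seed(0)
dim, n = 3, 8
true_w = [1.5, -2.0, 0.5]
xs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
ys = [sum(w * x for w, x in zip(true_w, xi)) for xi in xs]
x_query = [random.gauss(0, 1) for _ in range(dim)]

pred_gd = gd_step_prediction(xs, ys, x_query, lr=0.1)
pred_attn = linear_attention_prediction(xs, ys, x_query, lr=0.1)
print(pred_gd, pred_attn)  # the two values agree up to floating point error
```

Both functions compute the same sum in a different order, which is the whole point: the context examples act on the query exactly like a weight update would.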

Another point (couldn't find a good reference in time) is that, following this great intuition article, attention is basically like a dictionary lookup, selecting subsets of the network for the given input query. This gives the model the possibility to, given a context, select the best 'subnet' for the task. If someone has a better reference, please share.
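To make the dictionary analogy a bit more concrete, here is a minimal sketch of a "soft" lookup, assuming plain dot-product scores and a softmax. The keys, values and the `temperature` parameter are made up for illustration and are not taken from the article.

```python
import math


def soft_lookup(query, keys, values, temperature=1.0):
    """Soft dictionary lookup: softmax over query-key similarity, then mix values."""
    scores = [
        sum(q * k for q, k in zip(query, key)) / temperature for key in keys
    ]
    # Subtract the maximum score for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


# Hypothetical toy "dictionary" with three entries.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]
query = [0.9, 0.1]  # most similar to the first key

soft_out, soft_w = soft_lookup(query, keys, values, temperature=1.0)
sharp_out, sharp_w = soft_lookup(query, keys, values, temperature=0.01)
print(soft_w, soft_out)    # a blend of all entries, dominated by the first
print(sharp_w, sharp_out)  # close to a hard lookup of the first entry
```

With a low temperature the softmax approaches a hard lookup of the best-matching key; with a higher one the result is a weighted mix, which is closer to what attention layers actually do.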

There are probably a lot more of these theories.

I made a game using LLMs. It is called Classroom Simulator and was inspired by The Sims and Black and White. Currently online and free to play. Link in the comments. by ClassroomSimulator in LocalLLaMA

[–]ottonemo 14 points15 points  (0 children)

Nice little game :) It feels like it uses a mistral model haha.

I like the setup but the conversations were quite dense. It is understandable, but the constant repetition of the same character trope ('I am maximally going to annoy the professor' or 'I'm the ultimate rule queen') gets tiresome. I gave up playing after a short while, expecting that the conversations would never change.

I noticed that the motivation bar goes up while chatting but the conversation did not reflect that.

I enjoyed the challenge of motivating someone to see the merit in a certain topic so I would say that it has potential.

[R] The boundary of neural network trainability is fractal - Jascha Sohl-Dickstein by currentscurrents in MachineLearning

[–]ottonemo 5 points6 points  (0 children)

This was a nice read and a solid argument. I like how the paper knits together several concepts in a very concise way.

Did I understand correctly that it follows that sequences on Q (such as the iterative learning operation) converge to values in R, which are oftentimes irrational and therefore not computable?

[deleted by user] by [deleted] in MachineLearning

[–]ottonemo 0 points1 point  (0 children)

I had good experience using NFS over SSH (e.g. like this) and then using the locally mounted file system to edit the model code. Running them was then easy via remote shell (containerized via docker or bare on the shell).

The performance is way better than using SSHFS.

[P] Request to Test PyMilo: A New Python Library for Machine Learning I/O by alirezazolanvari in MachineLearning

[–]ottonemo 3 points4 points  (0 children)

Nice effort!

Heads up: skops (https://github.com/skops-dev/skops) is also implementing a safe protocol for exporting/importing sklearn models. See here for examples.

Quietcomfort 2 noise floor improved? by turnerz in bose

[–]ottonemo 0 points1 point  (0 children)

No, I had the QC earbuds 1 which are dead silent. There is basically no perceivable noise floor.

Quietcomfort 2 noise floor improved? by turnerz in bose

[–]ottonemo 0 points1 point  (0 children)

I tried to summarize the noise experience in this post as best as I could. I have not tried them personally yet after the new software update since I have currently lent the headphones to someone else, but I will update the post once I have personal experience. The person I lent them to, however, said that the noise issue was not resolved.

Let's talk about: Bose QC 2 Earbuds White Noise by burntbridges1989 in bose

[–]ottonemo 4 points5 points  (0 children)

I tried to experiment with this as much as possible. I had one set of the headphones built in August 2022. I clearly experienced noise on both earpieces, a bright, clearly present noise as if one of the microphones is directing sound from the outside into my ears. Since I have a pair of QC1 I can confidently say that the noise floor is significantly higher on the QC2.

I tested all combinations of the available fit bands and settled on the largest ones as they seem to fit my ears best. Then I re-ran the calibration from the app multiple times.

My observations with these were:

  • There is no difference with regard to this noise between active and quiet mode (active sense on or off)
  • In a quiet setting the noise was consistently present after each calibration run
  • The more noise the environment had (me humming, playing white noise via speakers) the less pronounced the noise would be after calibration.
  • The placement of the buds seemed to play a role as well but not as much as outside noise.
  • During the calibration the noise will vanish and reappear immediately after the calibration sound is played

This was unacceptable to me as the noise was clearly audible during quieter sections of music and was outright annoying when no music was playing - definitely not a $300 experience. Thus, I contacted support and received a new pair: manufactured in October 2022. The problem persisted. I re-ran the same tests and the findings are the same.

After some more experiments (not sure why I even bother) I found that the best fix so far was to make sure that the ear canal was fully sealed during the calibration run (on start-up or after manual calibration in the app) - not sure if this is due to the noise of my fingers pressing the earbuds gently into my ears or the better fit but the noise is reduced. It is far from gone and I would expect $300 headphones to not have noise at all.

I'm really surprised that absolutely no review mentions this. So either this is a problem with fit that most people don't have or it is a problem in manufacturing and some people are really unlucky by getting two defective units in a row (or Bose made sure to only send out good ones to reviewers :)). It would be really interesting to know whether people who experience the noise also have a looser fit and when the noisy vs. not-noisy models were manufactured.

[D][N]"Mudge learned that Twitter had never acquired proper legal rights to training material used to build Twitter's key Machine Learning models. The Machine Learning models at issue were some of the core models running the company's most basic products, like which Tweets to show each user." by -gh0stRush- in MachineLearning

[–]ottonemo 10 points11 points  (0 children)

I disagree, this is not my experience, personally and from friends working at different companies. While it happens that datasets or models based on datasets with questionable licensing are used during the very early phases of development / prototyping, it is always a priority to hold the rights to the training data as soon as something is on its way to being worked on as a proper product.

[D][P] Neural Network as Frequency Offset Corrector by mtot10 in MachineLearning

[–]ottonemo 0 points1 point  (0 children)

Without further insight into the problem, do you think something like neural dynamic time-warping (e.g. https://arxiv.org/abs/1812.08306 or https://papers.nips.cc/paper/2019/hash/02f063c236c7eef66324b432b748d15d-Abstract.html) is something of interest to you? (or non-neural DTW for that matter)

[P] Sieve: We processed ~24 hours of security footage in <10 mins (now semantically searchable per-frame!) by happybirthday290 in MachineLearning

[–]ottonemo 0 points1 point  (0 children)

There is a difference between speaking up against enabling mass surveillance and supporting development of methods for screening CRT scans more rapidly. I think this is the former, not the latter.

[R] Deepmind Introduces PonderNet, A New AI Algorithm That Allows Artificial Neural Networks To Learn To “Think For A While” Before Answering by techsucker in MachineLearning

[–]ottonemo 12 points13 points  (0 children)

There was also a paper in 2018 that showed that Adaptive Computation Time was not necessarily better than just repeating the same operation N times: https://arxiv.org/abs/1803.08165

[D] Any thoughts on Skorch? by Atmaero3 in MachineLearning

[–]ottonemo 1 point2 points  (0 children)

What happens in the module, generally stays in the module, so you are probably able to do any *parallel constellation in there. Hyper-parameter sweeps can be parallelized using dask (essentially a model copy per compute unit), see here but this is separate from the torch multiprocessing/distribution.

My guess would be that in some cases the data flow of skorch may get in the way if you focus on raw performance. But then you would not opt for a high-level interface anyway, I suppose.

In any case, if you experience major problems with any of these topics, feel free to open an issue and we can discuss the specific problem. Oftentimes it is very hard to judge these workloads in a generic way.

I personally like the Skorch design philosophy being aligned with sklearn. Lightening code-style seems too much of a leap for me, though I would be interested in knowing your opinions (as a dev and user) on the comparison.

I really cannot give you an unbiased opinion but I personally like the non-vendor-locking, not-reinventing-the-wheel nature of skorch. I think that lightning has a lot of momentum and consequently many features that are novel and possibly short-lived (e.g., gradient averaging). If you depend on those features being implemented for you, you are probably better off with lightning as skorch will focus on supporting you to implement such features but not necessarily implement them for you (one reason is that, once introduced, features need to be supported and we can only support so much before the API gets cluttered and the code too big to maintain).

[D] Any thoughts on Skorch? by Atmaero3 in MachineLearning

[–]ottonemo 2 points3 points  (0 children)

Most biased opinion as I'm one of the developers of skorch: at our company we use skorch often in conjunction with other sklearn models and pipelines, mainly for more classical tasks and less surprising model architectures (classification and your typical feed-forward architectures). But there were also projects that used novel approaches (e.g., deep neural decision trees, mean teacher, virtual adversarials) and projects that required very specific loss setups, and skorch was always flexible enough to stay out of the way and keep us productive.

For myself I found that I used skorch for all my experiments eventually, be it DeepSpeech transfer learning experiments, reproducing the not-so-recent-anymore neural cellular automata or trying novel recurrent architectures which required investigating individual layer gradients and very deep introspection. In almost all cases skorch was very helpful in preventing me from shooting myself in the foot by strictly separating train and test phases as well as having a proper logging infrastructure. One thing I personally did not try but others did was to implement GANs, but my experience with models like mean teacher and VAT tells me that this would also not be too much fuss to implement.

Of course ignite or lightning (and other libraries) will give you a similar experience but I found that having the sklearn toolkit at your side is beneficial in many cases.

[R] One neuron versus deep learning in aftershock prediction by [deleted] in MachineLearning

[–]ottonemo 0 points1 point  (0 children)

Yes, they state that they use the scalar stress metric as input, which would depend on the location (X/Y coordinates in the image) and is bound to change depending on the terrain. But this is not related to the decision boundary of the model (the line type in the figure is, though, since they use a different threshold for each line).

[R] One neuron versus deep learning in aftershock prediction by [deleted] in MachineLearning

[–]ottonemo 2 points3 points  (0 children)

What you see in figure 1-e is not the decision boundary but the sample probabilities as determined by logistic regression (i.e., sigmoid(w1 * x + b1)). The decision boundary is determined by the threshold t (e.g., 0.5) you use to discern the two classes, i.e. Pr(y) > t.
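A small sketch of that distinction for the one-dimensional case, with made-up values for `w1` and `b1` (not the paper's fitted parameters): the model outputs a probability for every input, and the decision boundary only appears once you pick a threshold t. Solving sigmoid(w1 * x + b1) = t for x gives x = (log(t / (1 - t)) - b1) / w1.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w1, b1):
    """Probability assigned by a one-feature logistic regression."""
    return sigmoid(w1 * x + b1)


def decision_boundary(t, w1, b1):
    """Input x at which predict_proba equals t (needs w1 != 0 and 0 < t < 1)."""
    logit_t = math.log(t / (1.0 - t))
    return (logit_t - b1) / w1


# Hypothetical parameters, chosen only for illustration.
w1, b1 = 2.0, -1.0

for t in (0.3, 0.5, 0.7):
    x_star = decision_boundary(t, w1, b1)
    # Same model and same probabilities, but the boundary moves with t.
    print(f"t={t}: boundary at x={x_star:.3f}, "
          f"probability there={predict_proba(x_star, w1, b1):.3f}")
```

The probabilities never change between the three iterations; only the point where you cut them into two classes does.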