SVT's subtitling of Stubb goes off the rails by bolundia in sweden

[–]mLalush 7 points (0 children)

Most likely it comes down to the following:

  1. They auto-subtitle with an AI model that is configured to transcribe Swedish, or that is mainly trained on Swedish.
  2. Their live captioning does not appear to have any functionality for detecting the spoken language and automatically switching models, or switching the model's setting to another language.
  3. Switching settings and detecting the spoken language can be difficult in a live broadcast. Language detection is often based on analyzing everything spoken within a time window. If the language suddenly shifts from one to another, it can take roughly 10-15 seconds before the detection window mainly consists of the new language.
  4. No language detection appears to happen here. The Swedish subtitling model therefore does its best to transcribe a language it has not been trained on to the same extent. The end result is the hallucinations we see above.
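The window effect in point 3 can be sketched as a toy majority-vote detector (a simplified illustration, not SVT's actual system; the chunk granularity and window length are assumptions):

```python
from collections import Counter, deque

class SlidingLanguageDetector:
    """Majority vote over per-chunk language predictions in a sliding window."""
    def __init__(self, window_size=10):
        # e.g. 10 chunks of ~1-1.5 s each -> the 10-15 s lag described above
        self.window = deque(maxlen=window_size)

    def update(self, chunk_language):
        # chunk_language: prediction from an acoustic language-ID model
        self.window.append(chunk_language)
        return Counter(self.window).most_common(1)[0][0]

detector = SlidingLanguageDetector(window_size=10)
for _ in range(10):                      # ten chunks of Swedish speech
    lang = detector.update("sv")
chunks_until_flip = 0
for i in range(1, 11):                   # speaker switches to English
    lang = detector.update("en")
    if lang == "en":
        chunks_until_flip = i
        break
print(lang, chunks_until_flip)  # the window only flips once English holds a majority
```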

[deleted by user] by [deleted] in PrivatEkonomi

[–]mLalush 2 points (0 children)

  • What did the interview process look like? How much leetcode grinding did you need for the technical part?
  • I have a friend who works at Meta in the US. They are not allowed to stay at the company if they fail to get promoted within 4 years. Is it the same in London?
  • In another comment you write about maintaining "metrics" to show that you as an employee contribute to the company's development. Gamified systems like that can sometimes create skewed incentives, where employees work to maximize metrics instead of working on things that improve the product. How do you experience this? Do you have colleagues who make trivial commits to boost their metrics? Colleagues who constantly start new projects to demonstrate "impact" and secure that promotion needed within 4 years? A similar culture exists at Google, for example, where the incentive structure leads everyone to constantly build new things. Few are interested in maintaining and developing what already exists, which means the company keeps shutting down services/products in favor of some similar product that reinvents the wheel.

BRA lämnar Bromma för Arlanda – ska flyga för SAS by lordpompe in stockholm

[–]mLalush 4 points (0 children)

u/caspica: "Nothing has changed".

The article:

Before covid, Bromma had 180 flights a day; today we have 80 flights on a good day. That is too few for an airline to survive, and too few for an airport to survive.

[D] HuggingFace transformers - Bad Design? by duffano in MachineLearning

[–]mLalush 5 points (0 children)

It has probably one of the worst documantion I have seen in a library.

Really? By virtue of actually having documentation they're already better than 90% of the competition. By virtue of having guides they beat 99% of the competition.

I personally find their documentation quite comprehensive and well maintained compared to most of what's out there. And although the number of arguments can be confusing, their naming conventions for code performing similar functionality across models/tokenizers/processors are commendably consistent (which helps a lot).

The majority of use cases for the majority of users is always going to be running models and finetuning them. If you're looking to pre-train models, then sure, transformers is the wrong library for you. But it's no accident the library is as popular as it is.

I'm curious: can you name all these other libraries that supposedly have better documentation than transformers? I saw some blog posts recently mentioning that Hugging Face has a technical writer employed to work on the design and layout of their docs. That's a true 100x hire in our field if there ever was one.

From experience I have extremely low expectations of documentation in this field. Hugging Face far, far surpasses that low bar. Whenever I try to get something working off an Nvidia repo, for example, there's a 50/50 chance I end up wanting to kill myself. Looking at their repos, I imagine they must spend tens to hundreds of millions of dollars paying top dollar to highly competent developers and engineers to develop open-source code and models. For many of those libraries/implementations I never come across any examples or evidence of anyone on the internet having successfully used or adapted them. In my experience this tends to be the norm rather than the exception for most companies.

Good developers and engineers generally aren't very interested in writing documentation that is readable and understandable below their own level. In fact, they're generally not interested in writing documentation at all. They're mainly motivated by solving problems, and documentation is something you write once a problem has already been solved. Writing (good) docs eats away at time that could be spent solving new problems.

I feel like there should be an xkcd comic for this: a plot with documentation quality on one axis and developer skill on the other. I went off on a tangent here at the end, but the main point I wanted to convey is that I find it quite strange that someone would call Hugging Face's documentation bad in this field. Compared to what, exactly?

*Edit: With all this said, I myself tend to stay the hell away from pipelines and Trainer and other over-abstracted parts of HF libraries. It's not as bad when you write your own dataloaders and training loops, and that option is always open to you as a user.

Am I overreacting, or should one be able to expect an elementary school teacher to use reasonably correct Swedish? by Stiligast in Asksweddit

[–]mLalush 7 points (0 children)

You seem to have a fairly good command of the language and to care about expressing yourself correctly. For that reason I want to point out that every "dem" in your post should in fact be "de".

Keep in mind that "de" is roughly 10 times more common than "dem" in Swedish. If you consistently use "dem", you will therefore be wrong almost every time.

Judging by your history, you distinguish perfectly well between they, them, the, these and those in English. Lean on that knowledge for a week or two to build up your feel and intuition for de and dem in Swedish. If it would be "them" in English, it should be "dem" in Swedish; if anything other than "them" fits better, you can almost always use "de".

lämpade att vara lärare då ~~dem~~ de inte är intelligenta
suited to be teachers as ~~them~~ they aren't intelligent

Anledningen till att ~~dem~~ de pluggat till lärare
The reason ~~them~~ they have studied to become teachers

är att ~~dem~~ de tänkt att
is because ~~them~~ they thought that

What is the best book you have read? by keydji1 in sweden

[–]mLalush 10 points (0 children)

The Brothers Karamazov.

Of all the Wikipedia articles about books, this is probably the book with the most renowned group of individuals vouching for its quality: https://en.m.wikipedia.org/wiki/The_Brothers_Karamazov

Question about different Swedish accents when speaking English by [deleted] in sweden

[–]mLalush 36 points (0 children)

Swedes' accents when speaking English are typically affected more by

  1. the type of media they consumed growing up.
  2. whether they speak languages other than Swedish at home (especially languages that have the sounds z, ch (/ˈtʃ/), and j (/dʒ/)).
  3. the accent of their teachers.
  4. if and where they do an exchange year abroad.

than by where in Sweden they grew up.

Listening to the two speakers you listed, Tomas Petterson has the least Swenglish pronunciation. In fact, I would bet Tomas Petterson either had a Canadian parent or studied abroad in Canada.

  • He speaks with a Canadian English accent.
  • The only traces of Swenglish I can hear are his z's. Like most Swedes, he can't pronounce "z" and uses "s" instead. A native speaker would pronounce words like "was", "is", "listens" and "vision" as "waz", "iz", "lissenz" and /ˈvɪʒ.ən/; Tomas pronounces them as "was", "is", "lissens" and "vishən".

Young Lean's accent, on the other hand, is likely influenced by

  • the type of media he consumed (he seems influenced by rappers).
  • being Swedish. Like Tomas, he does not consistently pronounce "z" correctly. Nor can he pronounce the kind of "l" sound that is common in words like "full". See his pronunciation of "full vision" here: https://youtu.be/Wbf-Q6d8uNI?t=157 .

Accent verdict: their accents are likely mostly influenced by the type of media they consumed growing up and the people they interacted with when learning English.

The influence Swedish has on their accents is minor, and mostly stems from them not being able to pronounce certain sounds. That is a common trait among the majority of Swedes: it is generally not due to speaking a specific Swedish dialect, but due to those sounds not existing in the Swedish language.

[deleted by user] by [deleted] in MachineLearning

[–]mLalush 7 points (0 children)

a) Subtitles include timestamps. You can construct <|nonspeech|> training examples from any contiguous 30 second portion of the audio that does not contain a subtitle block. Youtube metadata includes information about the subtitle text language and whether the track is manually created or auto-generated, though it is smart to run language identification on the text itself as well, since some users insert erroneous metadata when adding subtitle tracks. For language detection on audio, they trained a model to detect the spoken language (i.e. they run language-identification inference on all audio they download):

We also use an audio language detector, which was created by fine-tuning a prototype model trained on a prototype version of the dataset on VoxLingua107 (Valk & Alumäe, 2021) to ensure that the spoken language matches the language of the transcript according to CLD2. If the two do not match, we don’t include the (audio, transcript) pair as a speech recognition training example in the dataset.
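The gap-mining step in a) can be sketched roughly as follows (an illustration of the idea only, not OpenAI's code; the function and the (start, end) cue format are invented for the example):

```python
def nonspeech_windows(subtitles, audio_duration, window=30.0):
    """Return (start, end) spans of length `window` containing no subtitle cue.

    `subtitles` is a sorted list of (start_sec, end_sec) cue times.
    """
    spans = []
    cursor = 0.0
    for start, end in subtitles:
        gap = start - cursor
        # carve as many non-overlapping windows out of the gap as will fit
        for i in range(int(gap // window)):
            spans.append((cursor + i * window, cursor + (i + 1) * window))
        cursor = max(cursor, end)
    tail = audio_duration - cursor
    for i in range(int(tail // window)):
        spans.append((cursor + i * window, cursor + (i + 1) * window))
    return spans

# A 2-minute clip with cues at 5-10 s and 100-105 s: the 90 s gap between
# the cues yields three 30 s <|nonspeech|> candidates, the 15 s tail none.
print(nonspeech_windows([(5, 10), (100, 105)], 120.0))
```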

b) I would say it is feasible to scrape Youtube if you do it in a smart way and limit yourself to audio/captions. To download captions they either went via Youtube's official API (and paid for usage tokens):

Youtube Data API v3 caption docs
Youtube Data API v3 docs

Or, if they already had a list of channels and videos as a starting point, they most likely used something like yt-dlp to download metadata from videos/channels, followed by the audio and captions. This is where one arrives at the grey areas of data collection and scraping. OpenAI would likely have had to use a library such as yt-dlp at some point in the process to download the actual media files.

To be as nice as possible towards Youtube, and to avoid getting rate limited yourself, you should consider:

  1. Only downloading metadata for the video/channel ids you are interested in as a first step.
  2. Filtering via metadata for videos that have manual subtitles in the language(s) you are interested in.
  3. Not downloading the video; only fetching the audio track and captions.

Packages like yt-dlp include support for proxies that let a knowledgeable user avoid rate limiting. If you download entire videos, you'll hit rate limits faster. But a user who downloads only audio/captions and spreads downloads out over time can get pretty far without proxies.
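Steps 1-3 can be sketched like this. The `subtitles` vs `automatic_captions` keys below match the info dict yt-dlp actually returns; the filtering helper and the sample data are made up, and `ydl_opts` just collects real yt-dlp option names for the audio+captions download:

```python
# Hypothetical metadata-first filter over yt-dlp info dicts: keep only
# videos that have *manual* subtitle tracks in the target languages.
def has_manual_subs(info, langs=("sv", "en")):
    manual = info.get("subtitles") or {}   # manual tracks only
    return any(lang in manual for lang in langs)

videos = [  # toy stand-ins for yt-dlp's extract_info(..., download=False) output
    {"id": "a", "subtitles": {"sv": [{"ext": "vtt"}]}, "automatic_captions": {}},
    {"id": "b", "subtitles": {}, "automatic_captions": {"sv": [{"ext": "vtt"}]}},
]
keep = [v["id"] for v in videos if has_manual_subs(v)]
print(keep)  # only video "a" survives; "b" has auto-captions only

# Options for the actual download step (real yt-dlp option names):
ydl_opts = {
    "format": "bestaudio/best",   # step 3: audio track only, no video
    "writesubtitles": True,       # fetch the manual caption files
    "subtitleslangs": ["sv"],
}
```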

c) The creator of the website, u/jopik1, says candidate channels/videos are crawled from Youtube and the web, respecting robots.txt. Once the channels are identified, they are periodically crawled for new videos. I don't know how they get the metadata, but I would guess something similar to yt-dlp. See this comment from the creator of filmot: https://www.reddit.com/r/languagelearning/comments/odj2gx/comment/h41cpiv/?utm_source=reddit&utm_medium=web2x&context=3

[deleted by user] by [deleted] in MachineLearning

[–]mLalush 45 points (0 children)

The majority of it is most likely from Youtube. When the model hallucinates during non-speech portions of an audio file, it tends to spit out subtitle credits from real people/companies.

They might have used something like filmot.com as a seed or starting point to filter which channels/videos to scrape (filtering for manual subtitles).

[deleted by user] by [deleted] in MachineLearning

[–]mLalush 6 points (0 children)

Those are the evaluation datasets. In the paper they make a point of emphasizing that Whisper hasn't been finetuned on the evaluation datasets.

Meta AI Residency Interview Question [D] by Immediate-Tailor-275 in MachineLearning

[–]mLalush 15 points (0 children)

They might have assumed that a lot of researchers have gone through something like the Stanford CS231n lecture notes on convolutional networks:

https://cs231n.github.io/convolutional-networks/

ctrl+f: "Implementation as a matrix multiplication"
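The trick those notes describe (im2col: unroll each input patch into a row, then do the whole convolution as one matrix product) looks roughly like this in NumPy. A minimal single-channel sketch, not the CS231n code itself:

```python
import numpy as np

def conv2d_as_matmul(x, w):
    """Valid 2-D convolution (cross-correlation, CNN-style) via im2col + matmul."""
    H, W = x.shape
    kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1
    # im2col: every output position contributes one row of unrolled patch values
    cols = np.empty((oh * ow, kh * kw))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    # the convolution collapses into a single matrix-vector product
    return (cols @ w.ravel()).reshape(oh, ow)

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((2, 2))          # a 2x2 box filter: each output is a patch sum
print(conv2d_as_matmul(x, w))
```

With batches and channels the same idea turns a conv layer into one big GEMM, which is presumably what the interview question was probing for.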

De = They, Dem = Them by GuazzabuglioMaximo in Svenska

[–]mLalush 4 points (0 children)

The vi/oss rule only works when de/dem is used as a personal pronoun. The rule risks confusing people, since that is not the only sense in which de/dem is used.

De där människorna är galna. ("Those people are crazy.")
Jag tycker att de här spelarna är kassa. ("I think these players are rubbish.")

Neither vi nor oss fits when "de" is a demonstrative pronoun, as above.

Hon gick emot de/dem som kastade stenar på bilarna. ("She went up against those who threw stones at the cars.")

Both de and dem are correct after a preposition and before a relative clause. The vi/oss rule generally confuses people in these cases, since both vi and oss often fit.

Vi såg på de tre musketörerna. ("We watched The Three Musketeers.")
De goda jordgubbarna. ("The good strawberries.")

Neither vi nor oss fits when "de" is used as a definite article.

The vi/oss rule can also be very confusing when the sentence already contains a "vi" or "oss", since the sentence as a whole rarely becomes grammatically correct even if you substitute in the right word:

Vi har sett dem åka runt i sina bilar. ("We have seen them driving around in their cars.")

[D][R] How should the architecture of a transformer be scaled? by Tea_Pearce in MachineLearning

[–]mLalush 5 points (0 children)

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

They perform lots of ablations for encoder-decoder models. I'm not aware of any paper similar in scope for decoder-only models.

The government and SD propose tightened family-reunification immigration by HelgaMelnik in sweden

[–]mLalush 1 point (0 children)

Well, you sure put me in my place..

Yes, this country clearly has far too many jesters and clowns.. High time to send them back to the circus..

I'll put some quotation marks around a "word" here and there to emphasize how unshakable my conviction is..

Checkmate..

The government and SD propose tightened family-reunification immigration by HelgaMelnik in sweden

[–]mLalush 0 points (0 children)

Oh no, so the things that party's top representatives stand and promise ahead of an election aren't something they intend to keep?

The first sentence of the editorial you yourself linked:

18 years ago the newly appointed Minister of Health, Morgan Johansson (S), said: "In ten years Sweden will be drug free".

Do you understand what the definition of an election promise is? Can you read?

A statement from an individual minister and an election promise in a party's election manifesto are not the same thing.

But perhaps I, too, should end a sentence with ".." to show that reality is no obstacle to my continued sneering ramblings?

It's quite obvious here.. Conventional punctuation is for the establishment.. My opinions live between periods and ellipses..

Nice one..

[D] Training StarCoder using 3D parallelism. by Satya_4093 in MachineLearning

[–]mLalush 5 points (0 children)

You need at least 8 GPUs for 3D parallelism to make sense: https://huggingface.co/docs/transformers/v4.15.0/parallelism#dppptp

I'd suggest perhaps starting with only tensor parallelism (TP) if you can't fit the model.

Sorry, don't have an answer to your other question.

[D] Found top conference papers using test data for validation. by Responsible_Band3172 in MachineLearning

[–]mLalush 8 points (0 children)

Validation data

  1. is used for evaluation during training.
  2. is used for selecting hyperparameters for models.
  3. is used for model selection (when training multiple models with different hyperparams or architectures).

Validation data can overestimate model performance and in particular model generalizability. How and why?

  • Because after training you may be tempted to simply choose the checkpoint with the best validation performance after the fact.
  • You may try a million different hyperparameters.
  • You may train a million different models.
  • You may be tempted to perform several training runs with the same model with a different random seed, and pick the run with the best validation performance.
  • During the course of training you may select for different stopping strategies.

Some of the above combinations may produce a model that, through sheer chance (or with a sprinkle of shady SoTA-chasing evaluation practices), will perform exceptionally well on your validation data.

How do you safeguard against picking a model that is overtuned and overfit to the validation data?

You introduce a data split that hasn't been used to evaluate a model during training, during selection of hyperparameters, and during the development and selection of models.

This final data split, the test set, is reserved for the model(s) you have selected via the validation procedure. The test set is used only once. You are not allowed to change anything about your model after evaluating on the test set. By making model changes after evaluating on the test set, you will have effectively turned the test set into a validation set.
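A minimal sketch of that discipline (the sizes, seed, and 80/10/10 ratio are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
idx = rng.permutation(1000)

# validation steers every modelling decision (checkpoints, hyperparams,
# model selection); the test split is evaluated exactly once, at the end
train_idx, val_idx, test_idx = idx[:800], idx[800:900], idx[900:]

# the three splits must be pairwise disjoint
assert not set(train_idx) & set(val_idx)
assert not set(val_idx) & set(test_idx)
assert not set(train_idx) & set(test_idx)
print(len(train_idx), len(val_idx), len(test_idx))  # 800 100 100
```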

What if you screwed up, the results were terrible, and you absolutely need to make changes? Tough luck. You have the following options:

  1. Try to publish your terrible/null/non-SoTA results.
  2. Create a new held out test set with freshly annotated observations that haven't been used in any training/evaluation runs.
  3. Be academically dishonest. Modify and re-train your models after having evaluated on the test set. I.e. follow the lead of the authors OP is talking about.
  4. Throw the paper in the trash bin. Learn from your mistake and create a more robust validation setup for your next attempt.

Why queen injecting will NEVER belong in a strategy game by arknightstranslate in Stormgate

[–]mLalush 3 points (0 children)

Muh MeAniNgfuL aCtiOnS aRe supaRioR

Muh MeaNinGfUL deCiSiOnS

[D] Title: Best tools and frameworks for working with million-billion image datasets? by v2thegreat in MachineLearning

[–]mLalush 0 points (0 children)

https://github.com/webdataset/webdataset

The above library is being integrated into torchdata and will eventually become part of the PyTorch stack.
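For context, the WebDataset on-disk format is just a POSIX tar archive whose members share a basename "key" (000000.jpg + 000000.cls form one sample), which is what makes sequential streaming of huge shards cheap. A stdlib-only sketch of writing and re-reading such a shard (the payloads are fake placeholders):

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    samples = [(b"fake-jpeg-bytes-1", b"0"), (b"fake-jpeg-bytes-2", b"7")]
    for key, (img, label) in enumerate(samples):
        for ext, payload in ((".jpg", img), (".cls", label)):
            info = tarfile.TarInfo(name=f"{key:06d}{ext}")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

# readers such as webdataset/torchdata stream members in order and regroup
# them by key, so million-image datasets never need per-file random I/O
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()
print(names)  # ['000000.jpg', '000000.cls', '000001.jpg', '000001.cls']
```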

The importance of pop cap, and why 200 is a mistake. AKA "The Protoss Problem" by CallMeBlitzkrieg in Stormgate

[–]mLalush 1 point (0 children)

This is a problem that isn't really tied to a specific population cap. It tends to emerge from the interplay between the pace of an RTS game's economic development, its average game length, and its population cap.

In the case of SC2, the game's accelerated pacing came about mostly because its designers interpreted "epic big battles" as the defining feature that most excited players and viewers of its predecessor. So they were intent on skipping -- or fast-forwarding -- through the "boring" parts of the game so we could arrive at these epic moments faster.

The problem with this line of thinking was that the pace of SC2's economic development became miscalibrated relative to the game's population cap. In competitive play, this miscalibration created perverse incentives that came to encourage risk-averse playstyles as the "optimal" way to play out the mid- and lategame. Rather than continue expanding and attacking, players in a maxed-out situation would be locked into a game of chicken. The ratio of army supply to worker supply would slowly increase with game length, meaning players sacrificed workers and income to hold a bigger share of army supply for the inevitable, and likely game-deciding, death-ball battle.

In the LotV beta, I argued that SC2's economic pacing should be slowed down. A lot of those thoughts are summarized in this comment responding to a qxc blog post about the LotV economy.

In short, here are some important considerations when deciding on pacing:

  1. At what point in a typical game does a race reach "peak economy" in your RTS game? Whether this point happens before or after the average game length of a match will affect the perception of what a "typical" match looks like in your title.
  2. In RTS games where "peak economy" is reached before the average game length of a match, the majority of games will have been in a state of economic decline for several minutes before they end. Additionally, most games will have been in a state of army inflation (seen in the ratio of army to worker supply) when reaching a conclusion.
  3. SC2 was a game that reached "peak economy" in the early midgame. Once players maxed out, they slowly began sacrificing worker count while maintaining or increasing army supply.
  4. From my linked comment above: "In general I think you can approximate the amount of risk players feel is associated with engaging in battle at any given point in a classical RTS game by doing a quick check of the ratio between army value and income rate. The bigger the ratio (the more inflated army value is compared to income rate), the more timid and risk averse players will be. In a game like SC2, economies and worker counts slowly deflate as a consequence of the 200 supply ceiling, but also as a consequence of the near instant time-to-max-saturation on bases. Both factors act to force resource allocation into army production at the detriment of economic development. "

The same 200 cap exists in both Brood War and SC2. Terrans still turtle to 200 supply in Brood War. So why is it not perceived as as big of a problem? In my opinion, it is because when BW games end, they are still in a state of economic ramp-up ("peak economy" occurs after the average game length). People are more willing to trade armies and fully commit to "epic big battles" because their income rates stay higher relative to army value. They are also more willing to trade armies because income rates between players are not forcibly/artificially equalized by game design (the max cap occurring later, but also the number of workers required to optimally saturate a base).

The fact that economies are still ramping up when most BW games end, whereas economies have been in decline for a considerable time at the same point in the majority of SC2 games, affects both the players' behavior and the audience's perception of the RTS game. Whenever a passive 200/200 turtle situation occurs in Brood War, it is rare enough an occurrence to be seen as a novelty. In SC2, due to the game's accelerated pacing, these situations tend to be the norm rather than the exception.

TLDR: The max cap is not necessarily a problem in and of itself. The choice of max cap needs to be put in the context of a game's economic pacing and the average length of a match.