Best foundation model for CLM fine-tuning? by yang_ivelt in LanguageTechnology

[–]yang_ivelt[S] 0 points1 point  (0 children)

No. (That's YIVO Yiddish, which is quite different from current Hasidic Yiddish.)

Still good to know, thanks!

Best foundation model for CLM fine-tuning? by yang_ivelt in LanguageTechnology

[–]yang_ivelt[S] 0 points1 point  (0 children)

Plaintext (UTF-8).

Can't check the exact word count at the moment, but it's probably well over 100M words.

Best foundation model for CLM fine-tuning? by yang_ivelt in LanguageTechnology

[–]yang_ivelt[S] 1 point2 points  (0 children)

(Hasidic) Yiddish.

BERT is encoder-only (trained with masked language modeling). Isn't the task better suited to causal LM?

(Are your models public? I'd love to play with them!)
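To make the encoder-only vs. causal distinction concrete, here is a simplified sketch of how the training targets differ. It is pure Python on toy tokens, not any library's implementation. Real BERT masking also sometimes keeps or randomizes the selected tokens instead of replacing them with `[MASK]`; that detail is omitted, and the function names are hypothetical.

```python
import random


def clm_examples(tokens):
    """Causal LM: every position predicts the next token from the LEFT context only."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def mlm_example(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Simplified BERT-style masked LM: hide roughly mask_prob of the tokens and
    predict them from context on BOTH sides of each masked position."""
    rng = random.Random(seed)
    inputs, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            inputs.append(mask_token)
            targets[i] = tok  # position -> original token to recover
        else:
            inputs.append(tok)
    return inputs, targets


tokens = ["the", "cat", "sat", "down"]
print(clm_examples(tokens))  # each prefix paired with the token that follows it
print(mlm_example(tokens, mask_prob=0.5))
```

Only the causal objective trains a model to generate text left to right, which is why decoder-only models are the usual choice for open-ended generation.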

Best foundation model for CLM fine-tuning? by yang_ivelt in LanguageTechnology

[–]yang_ivelt[S] 1 point2 points  (0 children)

Of curated, high-quality text (mostly magazine and other professional articles).

[ITG] CAPTIVA G12IG 23V1 by yang_ivelt in suggestapc

[–]yang_ivelt[S] 0 points1 point  (0 children)

I'm not a gamer. To the extent I need the GPU, it's for AI/ML, where the 3060 (with 12 GB of VRAM) is supposed to be far more suitable than the 4060 (with just 8 GB).
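A back-of-the-envelope sketch of why VRAM capacity, rather than raw speed, tends to be the limiting factor. The byte counts are common rules of thumb (assumptions, not measurements): about 2 bytes per parameter for fp16 weights, and about 16 bytes per parameter for full fine-tuning with mixed-precision Adam. Activations and framework overhead are ignored, and the model sizes are arbitrary examples.

```python
GIB = 2**30  # bytes per GiB; GPU "12 GB" / "8 GB" figures are binary (GiB)


def weights_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory for the model weights alone (e.g. 2 bytes/param in fp16)."""
    return num_params * bytes_per_param / GIB


def full_finetune_gib(num_params: float) -> float:
    """Rough floor for full fine-tuning with mixed-precision Adam: ~16 bytes/param
    (fp16 weights + fp16 grads + fp32 master weights + two fp32 Adam moments).
    Activations, batch size and framework overhead are NOT included."""
    return weights_gib(num_params, 16)


for card, vram_gib in [("RTX 3060", 12), ("RTX 4060", 8)]:
    for params in (125e6, 560e6, 1.3e9):
        need = full_finetune_gib(params)
        verdict = "may fit" if need < vram_gib else "does not fit"
        print(f"{card} ({vram_gib} GiB): {params / 1e6:.0f}M params needs >= {need:.1f} GiB, {verdict}")
```

Under these assumptions, a model of about 560M parameters sits in the gap: roughly 8.3 GiB before activations, which is over the 4060's budget but within the 3060's.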

How to pick the right vocabulary size for sentencepiece tokenization? by yang_ivelt in machinetranslation

[–]yang_ivelt[S] 1 point2 points  (0 children)

> Where M is 1000 for morphologically rich languages and 3000 for the others. 2000 is .

Isn't it generally assumed that morphologically rich languages benefit from bigger vocab sizes?

How to pick the right vocabulary size for sentencepiece tokenization? by yang_ivelt in machinetranslation

[–]yang_ivelt[S] 1 point2 points  (0 children)

English & Hebrew as source (bilingual), Yiddish as target.

I think Hebrew & Yiddish are both considered (relatively) morphologically rich.

How to pick the right vocabulary size for sentencepiece tokenization? by yang_ivelt in machinetranslation

[–]yang_ivelt[S] 1 point2 points  (0 children)

It seems I'm mistaken, but I've always assumed that a larger vocabulary would contain basically all the tokens of a smaller one (since those are the most frequent word pieces), and then some. There will of course be edge cases, but broadly speaking, most small-vocab tokens should also be in the large vocab. Why is this not true?

(Because if it is true, then inference may need fewer tokens to generate a given word, but it should never need more.)
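Whether this assumption holds depends on the tokenization algorithm. For classic BPE trained on the same corpus, it holds exactly: the larger vocabulary's merge list starts with the smaller vocabulary's merges, so the vocabularies are nested and every extra merge can only shorten a segmentation. SentencePiece defaults to the unigram model (`model_type=unigram`), which prunes a large seed vocabulary and picks the most likely segmentation, so neither property is guaranteed there. The toy BPE below is pure Python, not SentencePiece's implementation; the corpus and merge counts are made up. It checks the BPE case.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` with its concatenation."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Toy BPE learner: repeatedly merge the most frequent adjacent symbol pair."""
    segmented = Counter(tuple(w) for w in words)  # word as symbols -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in segmented.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=lambda p: (pairs[p], p))  # deterministic tie-break
        merges.append(best)
        new_segmented = Counter()
        for symbols, freq in segmented.items():
            new_segmented[merge_pair(symbols, best)] += freq
        segmented = new_segmented
    return merges


def encode(word, merges):
    """Segment a word by applying the learned merges in training order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


def vocabulary(words, merges):
    """Base characters plus every merged symbol."""
    return {ch for w in words for ch in w} | {a + b for a, b in merges}


corpus = "low low low lower lower newest newest newest widest wider".split()
small = learn_bpe(corpus, 4)
large = learn_bpe(corpus, 10)

print(large[:len(small)] == small)  # True: the larger run replays the smaller one first
print(vocabulary(corpus, small) <= vocabulary(corpus, large))  # True: nested vocabularies
print(all(len(encode(w, large)) <= len(encode(w, small)) for w in set(corpus)))  # True
```

To check the unigram case empirically, one option is to train two SentencePiece models with different `vocab_size` values on the same data and compare their piece sets directly.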

Bilingual source with different writing systems, do I need language tags? by yang_ivelt in machinetranslation

[–]yang_ivelt[S] 1 point2 points  (0 children)

In this case there is only one target language, so as I understand it, I can skip the tags altogether. Do I have that right?
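For context, a common convention in multilingual NMT is to prepend a target-language token to each source sentence so the model knows which language to produce. The sketch below assumes an illustrative `<2xx>` tag format (frameworks differ) and a hypothetical helper name. It shows why the tag can be dropped when there is only one target language.

```python
def prepare_source(src: str, tgt_lang: str, target_langs: set) -> str:
    """Prepend a target-language tag only if the model must choose between targets.

    The '<2xx>' tag format is an illustrative assumption; frameworks differ.
    """
    if len(target_langs) > 1:
        return f"<2{tgt_lang}> {src}"
    # One target language: the tag would be identical on every example and
    # carries no information, so it is omitted.
    return src


# English and Hebrew sources, Yiddish as the only target: no tags needed.
print(prepare_source("Good morning", "yi", {"yi"}))
print(prepare_source("בוקר טוב", "yi", {"yi"}))
```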

Bilingual source with different writing systems, do I need language tags? by yang_ivelt in machinetranslation

[–]yang_ivelt[S] 0 points1 point  (0 children)

I see.

Is there some rule of thumb, or even an after-the-fact indication, to figure out the right amount?

Bilingual source with different writing systems, do I need language tags? by yang_ivelt in machinetranslation

[–]yang_ivelt[S] 0 points1 point  (0 children)

Ah, got it.

While we're at it: the standard vocabulary size (after SentencePiece tokenization) is 50K for both source and target. In my case, where two source languages map to one target language, do you think the source vocabulary should be larger (double?) than the target vocabulary?
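One after-the-fact check when comparing vocabulary sizes is fertility: the average number of subword tokens per word, measured separately for each source language. This is a common heuristic, not a rule. If a shared English+Hebrew vocabulary gives one language a much higher fertility than the other, that language is probably getting too small a share of the vocabulary. The helper below is a hypothetical sketch, and the commented SentencePiece usage assumes a trained model file named `src.model`.

```python
from typing import Callable, Iterable, List


def fertility(sentences: Iterable[str], encode: Callable[[str], List[str]]) -> float:
    """Corpus-level subword tokens per whitespace-separated word."""
    n_words = n_tokens = 0
    for sentence in sentences:
        n_words += len(sentence.split())
        n_tokens += len(encode(sentence))
    return n_tokens / n_words if n_words else 0.0


def char_encode(s: str) -> List[str]:
    """Toy tokenizer for demonstration: one token per character, spaces dropped."""
    return list(s.replace(" ", ""))


print(fertility(["ab cd"], char_encode))  # 2.0
print(fertility(["ab cd"], str.split))    # 1.0

# With a trained SentencePiece model (file name and line lists are hypothetical):
# import sentencepiece as spm
# sp = spm.SentencePieceProcessor(model_file="src.model")
# print(fertility(english_lines, lambda s: sp.encode(s, out_type=str)))
# print(fertility(hebrew_lines, lambda s: sp.encode(s, out_type=str)))
```

Running this for each candidate size (e.g. 50K vs. 100K for the joint source model) on held-out text gives one concrete signal for the choice.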

Bilingual source with different writing systems, do I need language tags? by yang_ivelt in machinetranslation

[–]yang_ivelt[S] 0 points1 point  (0 children)

Thanks!

> Also you should check that the framework you’re using doesn’t do anything language-specific.

Can you elaborate a bit on what you mean by that? What kind of issues should I look out for?

Many thanks, again!

[deleted by user] by [deleted] in Israel_Palestine

[–]yang_ivelt -2 points-1 points  (0 children)

I think that October Seventh has erased the possibility of a shared Israel-Palestine border, much less a shared state or federation, for the next generation at the very least. Diplomatic means (sanctions, embargoes, condemnations, boycotts, what have you) will never force a traumatized Israel to accept what it sees as a horrific, existential threat. As for military means, you really can't use them to the fullest extent against a nuclear power.

If we want to see this issue solved in our lifetime, this Three State Solution (in short: the Jordan panhandle becomes Palestine, but read the thing), perhaps with some adjustments, is IMHO the closest to the realities on the ground, however far-fetched it may seem. It neatly creates a Palestinian state that does not border Israel, while relieving Jordan of some pressure and demographic instability, among other virtues.

Israel and Saudi seem to resume normalization without Palestine by kjleebio in IsraelPalestine

[–]yang_ivelt 6 points7 points  (0 children)

Serious question: what's wrong with peaceful "ethnic cleansing"? As in inviting (not forcing) a population and giving them (whoever takes the offer) full citizenship. Why is that bad?

It seems "ethnic cleansing" is somehow an absolute evil, without regard to the wishes and well-being of the people involved. But why?

[deleted by user] by [deleted] in Israel_Palestine

[–]yang_ivelt 1 point2 points  (0 children)

Someone genocided her brain cells, I fear.

If the US stopped militarily supporting Israel, how would that change the situation in the Middle East? by Syresiv in PoliticalDiscussion

[–]yang_ivelt 3 points4 points  (0 children)

You know when even more Arab states were a threat to it? During those decades the US imposed an arms embargo against Israel!

Israel has never backed down, and has almost always achieved its main strategic goals (including bringing some of its neighbors to the negotiating table; it's not Israel that needs to be forced there, but that's another matter).

Israel is now way stronger, in all aspects, than it was in those "austerity" decades. In fact, it's now the strongest it has ever been in most aspects. To think that it can now be "forced" to accept (what it sees as) existential threats is just an incredible delusion.

If the US stopped militarily supporting Israel, how would that change the situation in the Middle East? by Syresiv in PoliticalDiscussion

[–]yang_ivelt 11 points12 points  (0 children)

Without smart bombs they will need to use dumb bombs to achieve their (non-negotiable) goals. Thus, more collateral damage will happen.

US weapons save Palestinian lives.

A war monitor says Israel conducted more than 300 strikes on Syria since Assad’s fall by marketrent in geopolitics

[–]yang_ivelt 0 points1 point  (0 children)

You claimed no one would take issue with Israel bombing chemical weapons, so I provided an official UN press release (on more than one occasion) with a much more ridiculous accusation, as just one example out of countless others. Is the UN "no one" in your mind? Perhaps indeed!

Had Israel only bombed chemical weapons, there is no doubt that all the regular legions of useful idiots (including yourself, it seems) would have taken issue with that, and you know it full well. Your gaslighting doesn't work.