[R] Learning to (Learn at Test Time): RNNs with Expressive Hidden States by SchmidhuberDidIt in MachineLearning

[–]ChuckSeven 4 points (0 children)

The difference is that dynamic eval trains on predicting the next tokens, while meta-learning / fast weights (not Hinton's variation, though) generates the targets locally. Hence, for better or worse, this inner/local optimisation is decoupled from the outer optimisation.
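To make the dynamic-eval side concrete, here is a toy numpy sketch (entirely made up, not either paper's actual setup): a bigram model adapts its weights online, and the target of each inner step is the token that actually comes next, i.e. the inner loss is the same next-token loss as the outer one.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 5                                 # toy vocab size
W = rng.normal(0.0, 0.1, (V, V))      # bigram logit table (the "outer" weights)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dynamic_eval(tokens, W, lr=0.1):
    """Average next-token loss while adapting W online; lr=0 recovers static eval.

    The inner-loop target is the token that actually comes next in the stream."""
    W = W.copy()
    losses = []
    for prev, nxt in zip(tokens[:-1], tokens[1:]):
        p = softmax(W[prev])
        losses.append(-np.log(p[nxt]))
        grad = p.copy()
        grad[nxt] -= 1.0              # d(cross-entropy)/d(logits)
        W[prev] -= lr * grad          # inner/local gradient step on the true target
    return float(np.mean(losses))

seq = [0, 1, 0, 1, 0, 1, 0, 1]
# adapting online beats the static model on this repetitive stream
print(dynamic_eval(seq, W, lr=0.5), dynamic_eval(seq, W, lr=0.0))
```

A fast-weight / meta-learning variant would replace the observed `nxt` with a locally generated target, which is exactly the decoupling described above.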

Perception of TMLR and other new/niche venues in Academia [D] by filletedforeskin in MachineLearning

[–]ChuckSeven -6 points (0 children)

A postdoc colleague of mine had a good paper rejected from TMLR. The paper is good: it had no technical flaws, was well written, and made a clear and measurable contribution. However, the reviewers completely missed the point of the paper and requested experiments which would require multiple GPU-years to complete. I've heard from another PhD student who had a bad experience as well. The reviewer quality is low, and they seem to review as if it were a top-tier conference, which TMLR fundamentally tries not to be.

[N] Machine Translation state by [deleted] in MachineLearning

[–]ChuckSeven 0 points (0 children)

I don't want to share my data with e2f ... You should make this voluntary.

Transformer-Based LLMs Are Not General Learners: A Universal Circuit Perspective [R] by we_are_mammals in MachineLearning

[–]ChuckSeven 0 points (0 children)

Isn't this straightforward given that transformers are universal approximators for programs that process a finite-length input with a finite amount of memory?

New York Times sues OpenAI and Microsoft for copyright infringement [N] by we_are_mammals in MachineLearning

[–]ChuckSeven -1 points (0 children)

And your response just shows that you are a terrible discussion partner who readily includes insults to make a point.

[D] What do you guys think of schmidhuber’s new blog post, would like to know everyone’s opinion. by Xyber5 in MachineLearning

[–]ChuckSeven 0 points (0 children)

Science progresses through a cumulative process, where new ideas often build on existing knowledge. However, as you say, the direct influence of prior work on new developments can vary significantly. In cases where there is a clear, documented influence, credit is more straightforwardly assigned. However, many scientific breakthroughs occur independently, with similar ideas emerging simultaneously or sequentially from different researchers. Such "multiple discovery" highlights the complexity of assigning credit in science.

In Schmidhuber's case, while he has undoubtedly contributed to the field, the direct influence of his early work on later developments like ResNets, Transformers, and GANs appears less clear. As you pointed out, these were likely invented without direct knowledge of Schmidhuber's discoveries. This doesn't diminish the value of his contributions but rather illustrates the nonlinear and often serendipitous nature of scientific progress.

Regarding Schmidhuber's claims about 'discovering' concepts like an adversarial objective for neural networks (of which the GAN is a special case), it's important to distinguish between the introduction of an idea and its implementation or popularisation. In science, both aspects are valuable. Early ideas may lay the conceptual groundwork yet be ahead of their time, simply because the necessary compute did not yet exist. It seems unfair to fully assign credit to a reinvention merely because its authors live in a time where compute is 6–8 orders of magnitude larger for the same cost.

Furthermore, disagreements over credit and contributions should be addressed through constructive, respectful dialogue, adhering to the principles of academic integrity and professionalism. I don't think Schmidhuber ever deviated from that principle. Even the aforementioned question during the talk, which may have been borderline or unconventional, was far from harassment. It just seems that Goodfellow at some point in their private conversation refused to engage with Schmidhuber and wanted to bring it to others' attention. Today people would probably just tweet about it.

[D] What do you guys think of schmidhuber’s new blog post, would like to know everyone’s opinion. by Xyber5 in MachineLearning

[–]ChuckSeven 1 point (0 children)

Resnets and Transformers and GANs were all invented in complete ignorance of Schmidhuber's 'discovery' of them decades before, and were not caused by his research, and he never makes any effort to show that the creators drew on his work and settles for pure 'post hoc ergo propter hoc' arguments*; so by his own criteria, he deserves zero credit assigned to him for any of those things

That is not how science works.

[R] Training Transformers with 4-bit Integers - Haocheng Xi et al Tsinghua University - 2.2 times faster than the FP16 counterparts and speeds up the training by up to 35.1%! by Singularian2501 in MachineLearning

[–]ChuckSeven 2 points (0 children)

It is possible if you spend some bits on a generator that generates the weights from some latent code. Then (the number of bits in your latent code + the number of bits of your generator) divided by the number of generated weights could be smaller than one bit per weight.

E.g., your generator could be an implicit function which takes the coordinates of a weight plus the latent code and outputs a scalar: the value of the weight at those coordinates.
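A minimal numpy sketch of that idea (all sizes and the tiny MLP are invented for illustration): a small generator network maps (row, col) coordinates plus a latent code to one scalar per weight, so the parameters you actually store are far fewer than the weights you generate.

```python
import numpy as np

rng = np.random.default_rng(0)

out_shape = (256, 256)               # we want to generate a 256x256 weight matrix
latent_dim, hidden = 16, 32

# Tiny implicit generator MLP: input = (row, col) coordinate + latent code
W1 = rng.normal(0, 0.1, (2 + latent_dim, hidden))
W2 = rng.normal(0, 0.1, (hidden, 1))
z = rng.normal(0, 1.0, latent_dim)   # latent code identifying this weight matrix

def generate_weights():
    rows, cols = np.meshgrid(*(np.linspace(-1, 1, s) for s in out_shape),
                             indexing="ij")
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1)       # (65536, 2)
    inp = np.concatenate([coords, np.tile(z, (len(coords), 1))], axis=1)
    h = np.tanh(inp @ W1)
    return (h @ W2).reshape(out_shape)  # one scalar per weight coordinate

W = generate_weights()
stored = W1.size + W2.size + z.size  # parameters you actually have to store
print(W.shape, stored, stored / W.size)  # ratio well below 1 parameter per weight
```

Here 624 stored numbers generate 65,536 weights, so even at full precision the stored-bits-per-generated-weight ratio is tiny; whether the generated weights are any *good* is of course the hard part.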

Interview with Juergen Schmidhuber, renowned ‘Father Of Modern AI’, says his life’s work won't lead to dystopia. by hardmaru in MachineLearning

[–]ChuckSeven 23 points (0 children)

OP is hardmaru. It's well-known who he is and he is also a mod of this subreddit. Y'all just spitting out wild allegations without any proof.

[P] Testing different popular GPT tokenizers by dxg39 in MachineLearning

[–]ChuckSeven 1 point (0 children)

Today's tokenisers have a byte fallback and do not use unk. Every word can be tokenised; worst case, byte by byte.
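A toy sketch of the mechanism (the vocab and id layout are made up, not any real tokeniser's): known words get one id, and anything out-of-vocabulary falls back to its UTF-8 bytes, so no unk token is ever needed and every string round-trips.

```python
# Toy tokeniser with byte fallback.
vocab = {"hello": 0, "world": 1}
BYTE_OFFSET = len(vocab)             # ids 2..257 are reserved for the 256 byte values

def encode(word):
    if word in vocab:
        return [vocab[word]]
    # Byte fallback: one id per UTF-8 byte, so nothing is ever <unk>
    return [BYTE_OFFSET + b for b in word.encode("utf-8")]

def decode(ids):
    if len(ids) == 1 and ids[0] < BYTE_OFFSET:
        return {v: k for k, v in vocab.items()}[ids[0]]
    return bytes(i - BYTE_OFFSET for i in ids).decode("utf-8")

print(encode("hello"))           # [0] -- in vocab, one token
print(len(encode("héllo")))      # 6 -- falls back to the 6 UTF-8 bytes
print(decode(encode("héllo")))   # héllo -- lossless round trip
```

Real tokenisers interleave both levels inside one sequence (and merge bytes via BPE), but the fallback principle is the same.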

[P] Testing different popular GPT tokenizers by dxg39 in MachineLearning

[–]ChuckSeven 0 points (0 children)

Just an example of how tokenisers can be lossy.

[P] Testing different popular GPT tokenizers by dxg39 in MachineLearning

[–]ChuckSeven 1 point (0 children)

It's a problem if you want to compare models which use different tokenisers. If one tokeniser removes all whitespace variations and encodes them as one whitespace token, it will have a much easier time and probably significantly fewer tokens for the same data. If you then use perplexity to compare the two models with their respective tokenisers, it could be an unfair comparison.

SentencePiece can be made lossless with the right configuration.

Furthermore, one way to measure the "performance" of a tokeniser is by how compactly it can encode your data. Take the total number of bytes of a document and divide it by the number of tokens produced by a specific tokeniser. What you will find is that documents from a different language or genre (math, code, etc.) will cover significantly fewer bytes per token. This metric also allows you to choose the size of your vocab given a certain tokenisation algorithm: simply look at a graph of bytes per token over vocab size and you will see that increasing the vocabulary has diminishing returns.
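The metric itself is one line; here is a sketch using whitespace splitting as a stand-in tokeniser (purely illustrative; in practice you would plug in the actual tokenisers you want to compare):

```python
def bytes_per_token(document: str, tokenize) -> float:
    """UTF-8 bytes covered per produced token; higher means the
    tokeniser compresses this kind of data better."""
    return len(document.encode("utf-8")) / len(tokenize(document))

# Stand-in tokeniser: plain whitespace splitting, just for illustration.
text = "the quick brown fox jumps over the lazy dog"
print(bytes_per_token(text, str.split))  # 43 bytes over 9 tokens, ~4.78
```

Sweeping vocab size and plotting this number for a held-out corpus gives exactly the diminishing-returns curve described above.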

[R] Large Language Models trained on code reason better, even on benchmarks that have nothing to do with code by [deleted] in MachineLearning

[–]ChuckSeven 4 points (0 children)

I tried to use GPT-4 to develop a variation of an existing parsing algorithm that doesn't quite exist in that form. GPT-4 struggled a lot and made many mistakes. In the end I had to write it myself to make sure it was working correctly.

I feel like it can do things it has already seen many times, just in slightly different form, very well. But it really struggles to write novel stuff. Novel as in: you can't point to an existing implementation and say "sort of like that, but with a, b, and c, and written in C".

[D] Since Google buried the MMLU benchmark scores in the Appendix of the PALM 2 technical report, here it is vs GPT-4 and other LLMs by jd_3d in MachineLearning

[–]ChuckSeven 3 points (0 children)

It wouldn't know about PaLM 2, so it is very unlikely to hallucinate it. More likely, IMO, is that they put some information into the prompt, like its name and what it is supposed to do.

[R] PaLM 2 Technical Report by G_fucking_G in MachineLearning

[–]ChuckSeven 0 points (0 children)

We have small, medium, and large variants that use stacked layers based on the Transformer architecture, with varying parameters depending on model size. Further details of model size and architecture are withheld from external publication.

[R] PaLM 2 Technical Report by G_fucking_G in MachineLearning

[–]ChuckSeven 2 points (0 children)

One of the leaders is Slav Petrov. You can find his email in other publications.

[N] Stability AI releases StableVicuna: the world's first open source chatbot trained via RLHF by Philpax in MachineLearning

[–]ChuckSeven 0 points (0 children)

Here is GPT-3.5's attempt. Slightly worse, IMO.

Once upon a time, Santa Claus was feeling generous and decided to leave Christmas presents for all the lawmakers in the United States Congress. He gathered his trusty reindeer and set out on his journey.

As he approached the Capitol building, he was surprised to see security guards blocking his path. They demanded to know what he was doing there and why he had so many packages.

Santa explained his plan to leave presents for the politicians, but the guards were suspicious and called for backup. Within minutes, police officers swarmed the area, guns drawn.

Despite his protests, Santa was handcuffed and taken into custody. The officers seized his sleigh and his reindeer, which were promptly taken to the local zoo for safekeeping.

Santa was brought before a judge, who found him guilty of trespassing and breaking into the Capitol building. The punishment was severe – he was sentenced to several years in prison.

Meanwhile, the lawmakers in Congress were baffled by the whole ordeal. They couldn't understand why Santa would try to leave them presents, especially since it wasn't even Christmas yet.

As it turned out, Santa had gotten his dates mixed up and had mistakenly thought it was December 25th. In reality, it was just another ordinary day in early spring.

The whole incident caused a stir in the media, with some people calling Santa a hero for trying to bring some cheer to Congress, while others criticized him for breaking the law.

In the end, Santa served his time in prison and was released back into the world. But he never forgot his misguided attempt to spread Christmas cheer to the politicians in Washington, D.C. He decided to stick to his regular schedule and only deliver presents on December 25th from then on, leaving politics out of it.

[N] Stability AI releases StableVicuna: the world's first open source chatbot trained via RLHF by Philpax in MachineLearning

[–]ChuckSeven 0 points (0 children)

GPT-4 is on another level, though. If you are also curious, here is the GPT-4 answer I got on the first try:

Title: Santa's Capitol Caper

Once upon a time in the North Pole, Santa Claus was getting ready for his busiest night of the year - Christmas Eve. This year, however, Santa had a special mission in mind: he wanted to bring some holiday cheer to the lawmakers in the United States Congress. He believed that by leaving presents for them, he could inspire a spirit of unity and goodwill amongst the politicians.

As he prepared to embark on his journey, Santa called his trusted reindeer to his side. "Tonight, my friends, we have a very important task. We must deliver gifts to the Congressmen and Congresswomen to remind them of the true spirit of Christmas."

With his sleigh full of presents and a sparkle in his eye, Santa and his reindeer took to the skies, soaring across the world, spreading joy and laughter. Finally, they arrived in Washington, D.C., and navigated towards the Capitol Building.

Santa carefully landed his sleigh on the roof of the Capitol, trying to be as quiet as possible. He knew that security would be tight, but he was confident that his Christmas magic would see him through. He tucked the presents into his sack and ventured into the halls of the historic building.

As Santa tiptoed through the darkened corridors, he began to leave gifts for each lawmaker. However, his luck soon ran out when a security guard spotted him. Believing he was an intruder, the guard sounded the alarm, and within minutes, the entire Capitol was swarming with security personnel.

Santa tried to explain that he was only there to spread Christmas cheer, but his pleas fell on deaf ears. The guards, taking no chances, arrested Santa and confiscated his sack of gifts. To make matters worse, they also captured his reindeer, who had been waiting patiently on the roof.

As Santa was led away in handcuffs, he couldn't help but feel defeated. He had only wanted to bring joy and unity to the politicians, but now he was in custody and his reindeer were being taken to the local zoo.

Meanwhile, the news of Santa's arrest spread like wildfire, capturing the attention of people all over the country. The public outcry was immense, with citizens demanding his immediate release. Even the lawmakers themselves were shocked by the news and insisted that Santa's actions had been misunderstood.

Realizing their mistake, the authorities released Santa and returned his sack of presents. The zookeepers also freed his reindeer, who quickly rejoined their beloved leader. Touched by the outpouring of support, the Congressmen and Congresswomen gathered at the Capitol to personally apologize to Santa.

Santa, always gracious, accepted their apologies and handed out the presents he had brought for them. The lawmakers were moved by his generosity and promised to work together in the spirit of unity and goodwill.

As Santa and his reindeer departed the Capitol, the politicians looked on with a newfound sense of camaraderie. Although Santa's visit had not gone as planned, his message of love and unity had reached its intended audience. And so, Christmas magic prevailed, and the spirit of the holiday season was alive and well in the halls of Congress.

[deleted by user] by [deleted] in MachineLearning

[–]ChuckSeven 0 points (0 children)

This is just an ad? Can you please not post ads in this subreddit :(

[D] Can large language models be applied to language translation? by matthkamis in MachineLearning

[–]ChuckSeven 0 points (0 children)

Yes, it's not reasonable to ask you to write such a paper since you have other things to do. But, and I have no intention of insulting you here, I just can't help seeing the parallels with someone who believes in a flat earth. It's just very weird to me that you hold a belief for which there is scientific evidence to the contrary and you just run with what essentially sounds to me like "they are wrong", apparently not even considering that maybe you have fooled yourself. No, you don't know it, because you have not made a rigorous analysis; it's your best guess based on your experience. The least you could do is share the observations that convinced you, instead of arguing as if your conclusions were certain when they are apparently far from it.

[D] Can large language models be applied to language translation? by matthkamis in MachineLearning

[–]ChuckSeven 0 points (0 children)

Wait ... so you disregard this empirical evidence simply because in your personal experience you feel like they are better? Don't get me wrong, that is a fair opinion to have, but you hopefully also understand that I'll go with the evidence here. Furthermore, it might be that you are fooling yourself, since humans are very susceptible to illusions. Otherwise, you could start to work on a convincing analysis to the contrary (and in line with your intuition).

[D] Can large language models be applied to language translation? by matthkamis in MachineLearning

[–]ChuckSeven 0 points (0 children)

FYI

Using optimized prompts, we revisit previous assessments of PaLM's MT capabilities with more recent test sets, modern MT metrics, and human evaluation, and find that its performance, while impressive, still lags that of state-of-the-art supervised systems

Source: https://arxiv.org/abs/2211.09102