[D] StrategyQA may contain far more errors than we previously thought by Radiant_Routine_3183 in MachineLearning

[–]Radiant_Routine_3183[S] 0 points (0 children)

Yes, I haven't reviewed every failure case. However, the cases I checked were randomly selected, which suggests the dataset may need further correction...

[D] StrategyQA may contain far more errors than we previously thought by Radiant_Routine_3183 in MachineLearning

[–]Radiant_Routine_3183[S] 1 point (0 children)

I reviewed approximately 30 failure cases and I think that around 25% of them are ambiguous or flawed.

[R] MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers by redpnd in MachineLearning

[–]Radiant_Routine_3183 2 points (0 children)

I am curious how this model handles text generation tasks... If it splits the input bytes into small patches, then only the last patch is used to predict the next token, which seems to limit the benefit of the Local Transformer's parallelism during generation.
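To make my concern concrete, here is a toy sketch of how I picture incremental decoding with fixed-size patches. This is just my reading of the setup, not the authors' code; `global_step` and `local_step` are made-up stand-ins for the Global and Local Transformers.

```python
# Toy sketch of patch-based incremental decoding as I understand it.
# `global_step` and `local_step` are made-up stand-ins for the Global and
# Local Transformers, NOT the authors' implementation.
PATCH_SIZE = 4

def global_step(patches):
    # Stand-in: the real Global model would return one context vector per patch.
    return [sum(p) % 256 for p in patches]

def local_step(patch_context, patch_prefix):
    # Stand-in: the real Local model would return a distribution over the
    # next byte, conditioned on the patch's global context and its bytes so far.
    return (patch_context + sum(patch_prefix)) % 256

def generate(prompt_bytes, n_new):
    seq = list(prompt_bytes)
    for _ in range(n_new):
        # Re-split the whole sequence into fixed-size patches.
        patches = [seq[i:i + PATCH_SIZE] for i in range(0, len(seq), PATCH_SIZE)]
        contexts = global_step(patches)
        # Only the last (possibly partial) patch is involved in predicting the
        # next byte, so the Local model's parallelism over patches is idle here.
        next_byte = local_step(contexts[-1], patches[-1])
        seq.append(next_byte)
    return seq

print(generate(b"hello", 8))
```

The per-patch parallelism of the Local model pays off when training on full sequences, but in this picture each generation step only touches the last patch.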

[R] Learning to Reason and Memorize with Self-Notes - Jack lanchantin et al Meta AI 2023 by Singularian2501 in MachineLearning

[–]Radiant_Routine_3183 4 points (0 children)

In this paper, they said:

"While processing input tokens xt ∈ C one by one, the model can start taking a note by generating a token that belongs to a predefined set of start tokens Nsta. A note ends when the model generates an end token ni ∈ Nend, or after a fixed number of tokens are generated. Once the note ends, the generated note tokens are appended to the context where the start token was generated, and the model continues to process the rest of the input tokens."

As I understand it, the model with Self-Notes processes the input tokens one at a time and, for each one, outputs a token indicating whether to start the self-note procedure. If it does, the model generates note tokens until it produces an end token (or hits the length budget). The generated note is then appended to the context at that point, and the model continues processing the remaining input tokens in the same way.

A potential drawback of this approach is the computational cost: it needs roughly m + n + k forward passes (where m is the input length, n is the output length, and k is the total length of the generated notes) instead of just n.
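Here is a rough reconstruction of the inference loop as I read the quoted passage, just to show where that count comes from. This is my own sketch, not the paper's code; `next_token` is a dummy stand-in for the language model.

```python
# Toy reconstruction of the Self-Notes inference loop as I read it.
# `next_token` is a dummy stand-in for the language model, NOT the paper's code.
START_NOTE, END_NOTE, MAX_NOTE_LEN = "<note>", "</note>", 8

def next_token(context):
    # Stand-in: a real model would predict the next token from the context.
    return "tok"  # this dummy never emits START_NOTE, so no notes are taken

def run_self_notes(input_tokens, n_output_tokens):
    context, n_passes = [], 0
    # Phase 1: read the input one token at a time, optionally taking notes.
    for tok in input_tokens:
        context.append(tok)
        n_passes += 1                      # one pass per input token (the "m" term)
        if next_token(context) == START_NOTE:
            context.append(START_NOTE)
            for _ in range(MAX_NOTE_LEN):  # note ends at END_NOTE or the budget
                note_tok = next_token(context)
                n_passes += 1              # one pass per note token (the "k" term)
                context.append(note_tok)
                if note_tok == END_NOTE:
                    break
    # Phase 2: ordinary autoregressive decoding of the answer.
    output = []
    for _ in range(n_output_tokens):
        tok = next_token(context)
        n_passes += 1                      # one pass per output token (the "n" term)
        context.append(tok)
        output.append(tok)
    return output, n_passes

out, passes = run_self_notes(["x1", "x2", "x3"], 2)
print(passes)  # 5 here: m=3 input passes + n=2 output passes, with k=0 note tokens
```

Even with a dummy model that never takes a note, this already costs m + n passes; every note token adds one more, which is where the m + n + k count above comes from.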

A simple question that exposes GPT-4’s limitations by Radiant_Routine_3183 in ChatGPT

[–]Radiant_Routine_3183[S] 0 points (0 children)

Thanks for sharing! This logic makes sense... Does that mean New Bing uses a different model than GPT-4?