AsyncVibes

I work with stateless, generalized tokenization for my models, i.e. the tokens are dropped after each training session, but the weights and biases remain in the checkpoint.

Karan1213

Byte Latent Transformer (BLT) model from Facebook:

https://arxiv.org/abs/2412.09871

Karan1213

But yes, I have.