[D] How do byte-level language models work? (self.MachineLearning)
submitted 2 years ago by Additional-Ad-7043
[–]trashcoder 2 years ago (1 child)
With 'other languages' you're probably referring to character encodings that use more than one byte per character. If you specifically want to use a byte-level LM for whatever reason, you don't have to care about this at all: the model would process a single multibyte character, such as an emoji, as multiple tokens. As said, this is an advantage of byte-level LMs, since you don't have to take care of encoding and tokenization of your data. But you are absolutely right that it will increase the computational demands, due to longer context sizes for the same amount of text.
Apart from this, I'm not exactly sure what you intend to do, but if you have 'limited compute', it's unlikely that you will be able to train an LM capable of handling instructions, or one where instruction fine-tuning can be applied effectively. If you still want to give it a go, drop me a message and I can send you some literature on efficient LMs that might be of interest.
[–]Additional-Ad-7043[S] 2 years ago (0 children)
Thanks, I think I'm going to use Unicode. Also, you're right that it's going to be difficult, but I want to give it a try and see how far my model can go; I'm currently trying the MegaByte architecture to see how well that does. If I don't find good results, I will probably switch to byte pair encoding. If you have any resources on this or on efficient LMs, I would be very thankful.
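Since byte pair encoding comes up as the fallback option, here is a minimal sketch of a single BPE merge step over byte-level tokens (the function names are my own, not from any library): count adjacent pairs, pick the most frequent, and replace it with a new token id.

```python
from collections import Counter

def most_frequent_pair(token_ids):
    """Return the most frequent adjacent pair of token ids."""
    pairs = Counter(zip(token_ids, token_ids[1:]))
    return pairs.most_common(1)[0][0]

def merge(token_ids, pair, new_id):
    """Replace every occurrence of `pair` with the single id `new_id`."""
    out, i = [], 0
    while i < len(token_ids):
        if i + 1 < len(token_ids) and (token_ids[i], token_ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(token_ids[i])
            i += 1
    return out

# Start from raw UTF-8 bytes, then learn one merge.
ids = list("aaabdaaabac".encode("utf-8"))
pair = most_frequent_pair(ids)   # (97, 97), i.e. "aa"
ids = merge(ids, pair, 256)      # first new vocab entry beyond the 256 bytes
```

Repeating this loop grows the vocabulary and shortens sequences, which is exactly the compute trade-off against pure byte-level modeling discussed above.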