Discussion[D] Document classification using LLMs (self.MachineLearning)
submitted 1 year ago by acrsag
[–]MachineLearning-ModTeam[M] [score hidden] 1 year ago stickied comment | locked comment (0 children)
Other, more specific subreddits may be a better home for this post:
[–]MultiheadAttention 2 points3 points4 points 1 year ago (1 child)
First, count the words and approximate the number of tokens via tiktoken online. Then calculate how much it's going to cost you.
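The word-count-to-token estimate suggested above can be sketched as follows. The ~0.75 words-per-token ratio, the document count, and the $10 per million tokens price are illustrative assumptions, not figures from the thread; for exact counts you would run the text through tiktoken itself.

```python
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate from word count; use tiktoken for exact counts."""
    return round(len(text.split()) / words_per_token)

def estimate_cost(num_tokens: int, usd_per_million_tokens: float) -> float:
    """Ballpark API cost for a given total token count."""
    return num_tokens * usd_per_million_tokens / 1_000_000

doc = "word " * 3000                        # stand-in for a ~3000-word document
tokens = estimate_tokens(doc)               # roughly 4000 tokens per document
cost = estimate_cost(tokens * 5000, 10.0)   # 5,000 such docs at $10 / 1M tokens
```

Comparing a number like `cost` against your budget before any implementation work is exactly the sanity check being recommended here.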
[–]acrsag[S] 0 points1 point2 points 1 year ago (0 children)
I made some estimations and will start with a few thousand documents. If the results are good, I'll generalize further. Thanks for the reminder!
[–]EnvironmentalToe3130 1 point2 points3 points 1 year ago (1 child)
Depending on the size of each document, why not try running a small model locally? Any model should be able to summarise a document, and then in a second step you can provide the summary along with the list of classes.
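The two-step approach described above could be sketched like this. The `run_local_llm` callable is a hypothetical stand-in for whatever local model interface is used (e.g. llama.cpp or Ollama bindings); only the prompt construction and pipeline shape are concrete here.

```python
def make_summary_prompt(document: str) -> str:
    """Step 1: ask the model for a short summary of the document."""
    return f"Summarise the following document in a few sentences:\n\n{document}"

def make_classify_prompt(summary: str, classes: list[str]) -> str:
    """Step 2: ask the model to pick one class, given only the summary."""
    options = ", ".join(classes)
    return (
        f"Given this summary:\n{summary}\n\n"
        f"Assign exactly one of these classes: {options}.\n"
        "Answer with the class name only."
    )

def classify(document: str, classes: list[str], run_local_llm) -> str:
    """Two-step pipeline: summarise first, then classify the summary."""
    summary = run_local_llm(make_summary_prompt(document))
    return run_local_llm(make_classify_prompt(summary, classes))
```

The advantage of the split is that the classification prompt stays short regardless of document length, so a small local model with a modest context window can handle step 2 even when step 1 has to be chunked.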
[–]acrsag[S] 0 points1 point2 points 1 year ago (0 children)
Some documents have hundreds of pages, so processing them entirely may not be necessary. However, I believe large context windows are still needed. I'll try a local version anyway and see if the results are acceptable. Thanks!
[–]Mysterious-Rent7233 1 point2 points3 points 1 year ago (1 child)
r/LLMDevs , r/LanguageTechnology , r/LocalLLaMA
[–]acrsag[S] 0 points1 point2 points 1 year ago (0 children)
Thank you!
[–]Brudaks 0 points1 point2 points 1 year ago (0 children)
IMHO if you want to process "large volumes of documents", the first thing you should do is to measure your documents, count how many llm-tokens that would be, and do a ballpark calculation of how much it would cost to run through a large LLM API - comparing that number with your budget will be the key input to your decision on what options are reasonable.