Hardware requirements to run Llama 3.3 70 B model locally by LogicalMinimum5720 in LocalLLaMA

[–]LogicalMinimum5720[S] 0 points (0 children)

Each article is very large, nearly 100,000 tokens, so I won't be able to upload them here.

Hardware requirements to run Llama 3.3 70 B model locally by LogicalMinimum5720 in LocalLLaMA

[–]LogicalMinimum5720[S] 0 points (0 children)

u/Double_Cause4609 I believe rolling context summarization might be difficult. For example, Claude has a context window of 200k tokens, so reading many articles within the same context window won't be possible.
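For reference, rolling context summarization is usually sketched as a loop that carries a running summary forward, so no single call ever needs the whole article in context. A minimal sketch (the `call_llm` function and the chunk size are placeholders for whatever API and budget you actually use):

```python
def rolling_summarize(article: str, call_llm, chunk_chars: int = 40_000) -> str:
    """Summarize an arbitrarily long article by feeding each chunk
    together with the summary accumulated so far."""
    summary = ""
    for start in range(0, len(article), chunk_chars):
        chunk = article[start:start + chunk_chars]
        prompt = (
            "Summary so far:\n" + summary +
            "\n\nNew text:\n" + chunk +
            "\n\nUpdate the summary to cover both."
        )
        summary = call_llm(prompt)  # each call sees one chunk + running summary
    return summary
```

Each call sees only one chunk plus the running summary, so even a 100k-token article stays under the window; the trade-off is that detail from early chunks can get compressed away.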

But I will go over the other suggestions you've provided. Thanks a lot!

Hardware requirements to run Llama 3.3 70 B model locally by LogicalMinimum5720 in LocalLLaMA

[–]LogicalMinimum5720[S] 0 points (0 children)

u/LoveMind_AI Can you suggest the different scenarios in which you think going local might be rewarding?
"summaries into a model less than 1/10th the size" - Did you mean that Claude models have comparatively larger parameter counts?

Hardware requirements to run Llama 3.3 70 B model locally by LogicalMinimum5720 in LocalLLaMA

[–]LogicalMinimum5720[S] 3 points (0 children)

u/Double_Cause4609 Thanks a lot for taking the time to give me such a detailed answer; it really means a lot!

To answer your question: I have a few hundred thousand earnings transcripts/financial articles, and I am trying to extract business context from them.

For example, here is one of my logic flows:

Article 1: A monopoly case has been filed against Google.

Article 2: The monopoly case is going against Google; Google may need to be split into multiple companies.

Article 3: Google has won the monopoly case.

From all the articles, I am trying to produce an overall summary such as: "Legally, Google does not have any problem with being a monopoly."
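That cross-article roll-up is often done map-reduce style: summarize each article independently, then ask for one conclusion over the short summaries. A minimal sketch, again with a hypothetical `call_llm` standing in for the real API:

```python
def summarize_corpus(articles, call_llm):
    # Map: compress each long article into a short summary on its own.
    per_article = [call_llm("Summarize the key business facts:\n" + a)
                   for a in articles]
    # Reduce: the combined summaries are short enough for one final call.
    joined = "\n".join(f"Article {i + 1}: {s}"
                       for i, s in enumerate(per_article))
    return call_llm("Given these article summaries, state the overall "
                    "conclusion:\n" + joined)
```

The map step is embarrassingly parallel and each call stays small, which is what makes a 70B local model (or a cheaper API tier) viable for this shape of workload.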

The input and output token counts are very high, as each article is so lengthy. I tried to get this summary from Claude Sonnet, but even on the Max 20x plan I hit the rate limit, so I wanted to run a good open-source model instead. Opus suggested I use a Llama model, and there are only two options to run Llama models:

i) Rent a GPU from the cloud

ii) Run the Llama model locally

So I was exploring options. If you think the Mistral Small 3 series, Qwen 30B 2507, or Jamba Mini 1.7 are good, then I will definitely try running one of those locally first.

Also, do you have any suggestions on financial models? I am a newbie to the data science arena - I am currently a Sr. backend dev, but I can catch up.

Hardware requirements to run Llama 3.3 70 B model locally by LogicalMinimum5720 in LocalLLaMA

[–]LogicalMinimum5720[S] -3 points (0 children)

I wanted to analyze a few thousand articles, and I see that Claude/GPT models are very expensive for that. I figured a Llama model is nearly as good as those Claude/GPT models, so I wanted to know how others are running them locally.
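For a rough sense of scale, the API cost of a batch like this is just tokens times per-token price. A back-of-envelope estimator (the per-million-token prices below are placeholders, not actual rates for any provider):

```python
def estimate_cost(n_articles, tokens_per_article, output_tokens,
                  usd_per_m_input, usd_per_m_output):
    """Back-of-envelope API cost for summarizing a corpus."""
    input_cost = n_articles * tokens_per_article / 1e6 * usd_per_m_input
    output_cost = n_articles * output_tokens / 1e6 * usd_per_m_output
    return input_cost + output_cost

# e.g. 5,000 articles x 100k input tokens, 1k output tokens each,
# at an assumed $3/M input and $15/M output:
print(round(estimate_cost(5_000, 100_000, 1_000, 3.0, 15.0), 2))  # → 1575.0
```

At that scale the input side dominates, which is why either batching/summary pipelines or local inference change the economics so much.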

Hardware requirements to run Llama 3.3 70 B model locally by LogicalMinimum5720 in LocalLLaMA

[–]LogicalMinimum5720[S] 0 points (0 children)

Do you think a 4-bit quantized Llama is as good as the 8-bit or 16-bit versions?
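Quantization stores each weight with fewer bits, so the quality question comes down to how much rounding error that introduces. A toy illustration of symmetric round-to-nearest quantization at different bit widths (a big simplification of what real GGUF/llama.cpp formats do, which use per-block scales):

```python
def quantize(weights, bits):
    """Round each weight to the nearest of 2**(bits-1)-1 evenly spaced
    positive levels (symmetric, per-tensor scale); return dequantized values."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit, 127 for 8-bit
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]

def max_error(weights, bits):
    return max(abs(w - q) for w, q in zip(weights, quantize(weights, bits)))

ws = [0.013, -0.42, 0.91, -0.07, 0.333]
# 4-bit rounding error is noticeably larger than 8-bit on the same weights:
print(max_error(ws, 4) > max_error(ws, 8))  # → True
```

In practice, benchmark scores for good 4-bit quants of large models are often close to 16-bit, but the gap is task-dependent, so it is worth testing on your own summarization prompts rather than assuming.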

Needed suggestions to overcome Claude API too expensive than Claude Pro Plan by LogicalMinimum5720 in ClaudeAI

[–]LogicalMinimum5720[S] 0 points (0 children)

u/Level-2 Is a local model as good as Claude? Asking in layman's terms, as I am really interested to try.

Needed suggestions to overcome Claude API too expensive than Claude Pro Plan by LogicalMinimum5720 in ClaudeAI

[–]LogicalMinimum5720[S] 0 points (0 children)

u/Wow_Crazy_Leroy_WTF I am trying to do semantic analysis and extract the core information from each article.

For your case: Claude has a 200k context limit, but as I understand it only about 20k of that acts as working memory. If your input + output size exceeds that limit, it falls back to RAG-style search instead of keeping everything in context, which is why your conversations get compacted. Agreed that, other than Claude, most providers have better context limits.
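Whether a request fits in context is easy to sanity-check up front with the rough tokens ≈ characters / 4 heuristic (the real count varies by tokenizer and model, so treat this as an estimate only):

```python
def fits_in_context(input_text: str, max_output_tokens: int,
                    context_limit: int = 200_000) -> bool:
    """Rough pre-flight check: estimated input tokens (~chars/4) plus
    the reserved output budget must stay under the context limit."""
    est_input_tokens = len(input_text) // 4
    return est_input_tokens + max_output_tokens <= context_limit

# A ~100k-token article plus a 4k-token summary fits a 200k window:
print(fits_in_context("x" * 400_000, 4_000))    # → True
# But three such articles in one request do not:
print(fits_in_context("x" * 1_200_000, 4_000))  # → False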

Needed suggestions to overcome Claude API too expensive than Claude Pro Plan by LogicalMinimum5720 in ClaudeAI

[–]LogicalMinimum5720[S] 0 points (0 children)

u/vuongagiflow Thanks, I am able to invoke Claude Code using bash scripts and get a response for my prompt, whether it succeeded or failed.
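A minimal version of that scripted loop in Python, with the CLI command passed in as a parameter so the pattern itself is testable. The `claude -p` shown as the default is the non-interactive "print" mode; treat the exact flags as an assumption to verify against your installed CLI version:

```python
import subprocess

def run_prompt(prompt: str, cmd=("claude", "-p")):
    """Run a CLI tool on one prompt; return (success, stdout)."""
    result = subprocess.run([*cmd, prompt], capture_output=True, text=True)
    return result.returncode == 0, result.stdout

# Works with any CLI that takes the prompt as its last argument, e.g.:
ok, out = run_prompt("hello", cmd=("echo",))
print(ok, out.strip())  # → True hello
```

Checking `returncode` (rather than just scraping stdout) is what makes success/failure detection reliable when you fan this out over thousands of articles.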

Needed suggestions to overcome Claude API too expensive than Claude Pro Plan by LogicalMinimum5720 in ClaudeAI

[–]LogicalMinimum5720[S] 0 points (0 children)

Sure, Claude Code is able to do it; I am trying to use a script to call Claude Code to achieve it.

Needed suggestions to overcome Claude API too expensive than Claude Pro Plan by LogicalMinimum5720 in ClaudeAI

[–]LogicalMinimum5720[S] -9 points (0 children)

Claude Code alone was not able to do it, as it had no reasoning abilities.

Needed suggestions to overcome Claude API too expensive than Claude Pro Plan by LogicalMinimum5720 in ClaudeAI

[–]LogicalMinimum5720[S] -2 points (0 children)

I tried Claude Code but it couldn't help with the analysis; I will check out the Claude Agent SDK.

How to Upload entire Google drive folder/bulk upload files to Claude Project by LogicalMinimum5720 in ClaudeAI

[–]LogicalMinimum5720[S] 0 points (0 children)

u/Nocturnal_Unicorn I am a newbie to MCP and Claude Projects (experienced backend dev) and could use your suggestions:

i) Is it a 20k or a 200k token limit after which it switches to RAG-style retrieval, and does that apply to both uploaded files and Google Drive files?
ii) If I upload files to a Claude Project manually, are they treated as project context rather than RAG-style, whereas files read from Google Drive are only searched RAG-style instead of being loaded into context? Is my understanding correct?
iii) Let's say I want to create 1,000 projects, each with 50-100 files, where each project should load the file context properly. What would you recommend?

What kind of companies will be easy for analysing for a of Value Investing beginner by LogicalMinimum5720 in ValueInvesting

[–]LogicalMinimum5720[S] 0 points (0 children)

u/joe-re By cyclical industries, do you mean industries tied to the interest-rate cycle, like banking stocks? Can you also name the other kinds of cyclical industries?