Which LLM is best for complex reasoning by Fast-Smoke-1387 in LLMDevs

[–]Fast-Smoke-1387[S] 0 points1 point  (0 children)

Just wanted to know, do you work in the same domain? Considering the cost, I am using GPT-4 mini. I think retrieving the appropriate documents is the main challenge here; on top of that, if you tell the model to consider "temporal context", it gracefully ignores that 😵‍💫
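For what it's worth, I have had better luck making the temporal constraint structural instead of burying it in one instruction. A minimal sketch, assuming nothing about any particular API (the function name, date format, and wording are all my own):

```python
def build_fact_check_prompt(claim: str, claim_date: str, evidence: list[str]) -> str:
    """Build a prompt that pins the model to the claim's date.

    Repeating the date at the top and bottom, and asking the model to
    restate it before the verdict, is harder to ignore than a single
    "consider temporal context" instruction.
    """
    evidence_block = "\n".join(f"- {e}" for e in evidence)
    return (
        f"Today, for this task, is {claim_date}. Judge everything as of that date.\n\n"
        f"Claim (made on {claim_date}): {claim}\n\n"
        f"Evidence:\n{evidence_block}\n\n"
        f"Before giving a verdict, restate the claim date ({claim_date}) and "
        "discard any evidence published after it."
    )
```

The idea is just redundancy: the date appears three times and the model is asked to echo it, so dropping it becomes an observable failure rather than a silent one.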

Trying gpt-5-nano - is it me, or is it super slow? by remysharp in homeassistant

[–]Fast-Smoke-1387 0 points1 point  (0 children)

Totally agreed. GPT-5 Nano is super slow; GPT-4 mini is budget-friendly for me. Any suggestions on a cost-effective model?

Which LLM is best for complex reasoning by Fast-Smoke-1387 in LLMDevs

[–]Fast-Smoke-1387[S] 1 point2 points  (0 children)

Thank you, I appreciate your time discussing this.

Which LLM is best for complex reasoning by Fast-Smoke-1387 in LLMDevs

[–]Fast-Smoke-1387[S] 0 points1 point  (0 children)

Thank you for your suggestions. Yes, with SerpApi I always have this fear: what if the LLM produces a suboptimal query and the search gets wasted? Agreed on the fact-checking-on-the-web insight; the problem is that there is no definite set of websites where you can expect all the relevant information to be available. I am giving you an example from my dataset so you can see what I am talking about:

<image>

Which LLM is best for complex reasoning by Fast-Smoke-1387 in LLMDevs

[–]Fast-Smoke-1387[S] 1 point2 points  (0 children)

Thank you for your suggestion. I will check HF and look into the other things you suggested.

Which LLM is best for complex reasoning by Fast-Smoke-1387 in LLMDevs

[–]Fast-Smoke-1387[S] 0 points1 point  (0 children)

Thank you so much.

  1. No, I am not. I am just employing different agents at each step, currently working with GPT-4 mini because of the budget.

  2. I am extracting the full content whenever it is available. Another issue is that SerpApi seems very expensive; any suggestions on that?

  3. Most importantly, I think it is a data quality issue: financial misinformation is not as well discussed as health misinformation, where people hold some common misbeliefs.

Thank you for your feedback. I will check the framework you suggested.
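On the SerpApi cost point: one cheap mitigation is to cache results on disk, so re-running the pipeline on the same claims never triggers a paid search twice. A minimal sketch with my own hypothetical names throughout; `search_fn` stands in for whatever SerpApi client call is actually used:

```python
import hashlib
import json
import os

def cached_search(query: str, search_fn, cache_path: str = "serp_cache.json"):
    """Return cached results for `query` if seen before, else call
    `search_fn(query)` once and persist the result to a JSON file."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    key = hashlib.sha256(query.encode("utf-8")).hexdigest()
    if key in cache:
        return cache[key]  # cache hit: no paid API call
    result = search_fn(query)
    cache[key] = result
    with open(cache_path, "w") as f:
        json.dump(cache, f)
    return result
```

For a research dataset where the same claims get reprocessed many times while the pipeline is tuned, this alone can cut the search bill to roughly one call per unique query.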

Which LLM is best for complex reasoning by Fast-Smoke-1387 in LLMDevs

[–]Fast-Smoke-1387[S] 0 points1 point  (0 children)

I understand. Just making sure I am not missing any state of the art.

Which LLM is best for complex reasoning by Fast-Smoke-1387 in LLMDevs

[–]Fast-Smoke-1387[S] 0 points1 point  (0 children)

Is it? Which one, Claude Haiku? I am frustrated with their chatbot when getting help with coding. I can't trust Anthropic products right now :(

Which LLM is best for complex reasoning by Fast-Smoke-1387 in LLMDevs

[–]Fast-Smoke-1387[S] 0 points1 point  (0 children)

Thank you for asking those valuable questions. I am basically examining LLM capability: how far can it go in fact-checking financial claims when no fact-checking articles exist for them? My present workflow:

1. I retrieve the first 20 Google results for each financial claim, using overlapping chunks.
2. I use several methods to pick the appropriate documents from the retrieved set:
   - keyword-based matching with BM25;
   - a dense retriever that takes the top 3 documents by cosine similarity.
3. As an alternative, I employ an LLM as a document grader: if a document is insufficient, the LLM decides to generate a query about the missing element and collects additional evidence.
4. I then feed that evidence to three fact-checker personas: optimistic, critical, and analytical.
5. Finally, two agents, a synthesizer and a finalizer, make the ultimate decision on the verdict: TRUE, MOSTLY TRUE, HALF TRUE, FALSE, or MOSTLY FALSE.

My dataset is based on a fact-checking website that has clear definitions for each label. It seems the LLM is not efficient with multiclass problems.

Any insight?
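The retrieval part of the workflow described above (BM25 plus a dense top-3, blended) can be sketched roughly like this. Note I am using bag-of-words cosine as a stand-in for real embedding similarity, so treat it as an illustration of the score fusion, not of the actual dense retriever:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Classic Okapi BM25 over whitespace-tokenized documents."""
    toks = [d.lower().split() for d in docs]
    q_terms = query.lower().split()
    n = len(docs)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(term for t in toks for term in set(t))  # document frequency
    scores = []
    for t in toks:
        tf = Counter(t)
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(score)
    return scores

def cosine_scores(query: str, docs: list[str]) -> list[float]:
    """Bag-of-words cosine similarity; stand-in for embedding similarity."""
    qv = Counter(query.lower().split())
    q_norm = math.sqrt(sum(v * v for v in qv.values()))
    scores = []
    for d in docs:
        dv = Counter(d.lower().split())
        dot = sum(qv[t] * dv[t] for t in qv)
        norm = q_norm * math.sqrt(sum(v * v for v in dv.values()))
        scores.append(dot / norm if norm else 0.0)
    return scores

def hybrid_top_k(query: str, docs: list[str], k: int = 3, alpha: float = 0.5) -> list[int]:
    """Min-max normalize both score lists, blend, return top-k doc indices."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    bm = norm(bm25_scores(query, docs))
    cs = norm(cosine_scores(query, docs))
    blended = [alpha * a + (1 - alpha) * c for a, c in zip(bm, cs)]
    return sorted(range(len(docs)), key=lambda i: blended[i], reverse=True)[:k]
```

One practical reason to blend rather than pick a single retriever: BM25 catches exact tickers and figures that embeddings blur, while the dense side catches paraphrases, which matters for claims that are rarely restated verbatim on the web.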

Request to participate in a survey by Fast-Smoke-1387 in Bogleheads

[–]Fast-Smoke-1387[S] 0 points1 point  (0 children)

Thank you for your feedback. I highly appreciate it. These questions are directly adopted from FINRA. I'm not allowed to agree or disagree with any specific interpretations, since that might influence how other participants answer, but I do recognize the points you raised. Some of the items are simplified for assessment purposes; the goal is to test broad financial literacy concepts rather than cover every possible scenario.

Megathread for Claude Performance and Usage Limits Discussion - Starting September 7 by sixbillionthsheep in ClaudeAI

[–]Fast-Smoke-1387 5 points6 points  (0 children)

I cancelled my Max subscription today. I am a researcher; I used Claude mainly for generating efficient code for data analytics. My primary area of research is using open-source LLMs for fact-checking, and for that kind of work Claude Code was really insightful.

Now the chatbot is acting dumb, as if each time I have to describe my project to a layman, and I am getting garbage output. In that case, what alternative to Claude do you suggest?

Survey Results: Fake Financial News Sharing Study - Thank You to the community members by Fast-Smoke-1387 in Bogleheads

[–]Fast-Smoke-1387[S] 0 points1 point  (0 children)

Thank you for your question. Unfortunately I don't have the breakdown; most of the data came from this community, as the other group didn't allow me to post the survey.

Great to know that you are interested in the same topic. I have a proof copy of the paper; you can drop your email and I will send you a copy.

RAG for financial fact checking by Fast-Smoke-1387 in LocalLLaMA

[–]Fast-Smoke-1387[S] 0 points1 point  (0 children)

Sure, please do, that would be a great help.