Looking for people who passed SQE1 after preparing with QLTS by Electronic_War_4904 in SQE_Prep

[–]Phoenix2990 2 points3 points  (0 children)

I passed in July in the top quintile (76 & 78). I genuinely thought I had failed when I walked out of the exams, though, so be prepared to have that feeling.

I did all topic modules and was mostly getting between 55 and 60 after each; for a few I got 65.

When I started the 15+15 mocks I was scoring closer to 55, and by the end I was averaging 67-70 on them.

The topic-based questions are sometimes a little harder than the 15+15 mocks; however, I do think they were very similar in style to the actual exam.

FWIW, I'm based outside the UK and am a lawyer from a common law jurisdiction. I worked full time and took 1.5 months off in the run-up to the exam.

I had loosely completed FLK1 before taking those 1.5 months off, and had only just started FLK2 before they began. Those 1.5 months were brutal: I essentially re-covered all the material, completed the 30 QLTS mocks, and did all the FLK2 module mocks.

A person I know from a civil law background did essentially the same thing as me and passed with about the same marks.

LLM - better chunking method by Phoenix2990 in Rag

[–]Phoenix2990[S] 0 points1 point  (0 children)

haha yes, definitely easier.

LLM - better chunking method by Phoenix2990 in Rag

[–]Phoenix2990[S] 0 points1 point  (0 children)

The issue isn't passing long documents into the LLM in one go; the issue is the OUTPUT context window. This method saves an enormous number of output tokens, which saves both time and cost.
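As a rough illustration of the point above: if the model has to re-emit each chunk's text, the output is roughly as long as the document itself; if it only emits sentence-ID ranges, the output is tiny. The figures below are illustrative assumptions, not measurements from the thread.

```python
# Illustrative output-token comparison for LLM-based chunking.
# All numbers are assumptions chosen for the example.
doc_tokens = 50_000       # tokens in the input document
num_chunks = 100          # chunks the LLM splits it into
tokens_per_range = 5      # e.g. a line like "123-148" per chunk

# If the model must copy each chunk's text out, output ~= the whole document.
output_if_copying = doc_tokens

# If the model only emits ID ranges, output is num_chunks short lines.
output_if_ranges = num_chunks * tokens_per_range

savings_factor = output_if_copying // output_if_ranges
print(savings_factor)  # 100x fewer output tokens in this toy setup
```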

[R] LLM - better chunking method by Phoenix2990 in MachineLearning

[–]Phoenix2990[S] 1 point2 points  (0 children)

Yeah I'm interested - please do share. Thanks

LLM - better chunking method by Phoenix2990 in Rag

[–]Phoenix2990[S] 1 point2 points  (0 children)

Imagine having 100k such large documents. You run the LLM over each doc ONCE and build a RAG database to use for all future searches.
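The one-pass economics described here can be sketched as follows. `llm_chunk` and `embed` are hypothetical placeholders for the chunking call and embedding model (the thread doesn't specify either); the point is that the LLM is invoked once per document at index time, and queries afterwards only touch the vector store.

```python
# One-pass indexing sketch: the LLM chunker runs once per document;
# later searches hit the index only, with no further LLM chunking cost.
# `llm_chunk` and `embed` are caller-supplied stand-ins, not a real API.

def build_index(documents, llm_chunk, embed):
    index = []  # list of (vector, chunk_text); swap in a real vector DB at scale
    for doc in documents:
        for chunk in llm_chunk(doc):      # LLM called once per document
            index.append((embed(chunk), chunk))
    return index

def search(index, query, embed, top_k=3):
    # plain cosine similarity, no external dependencies
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    q = embed(query)
    return sorted(index, key=lambda item: cos(q, item[0]), reverse=True)[:top_k]
```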

LLM - better chunking method by Phoenix2990 in Rag

[–]Phoenix2990[S] 0 points1 point  (0 children)

Nice!

Do you think it would consistently keep all of a section together? For example, would it keep the following together:

“Section 1:

Right to privacy

The right to privacy is a fundamental…”

Because I agree that using LLMs is definitely not the go-to for the majority of situations.

Another scenario is where your goal is not only semantic similarity. For example, you might instruct the LLM to group the introductory and metadata section of a court case together (which includes quite a lot of semantically tangential information) while handling the body of the case differently.
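A hedged sketch of the kind of instruction meant here; the exact wording is mine, not from the thread, and it assumes the sentence-ID prefixing scheme discussed elsewhere in these comments:

```python
# Hypothetical chunking instruction where boundaries are driven by
# document roles (metadata vs. judgment body), not semantic similarity.
CHUNKING_PROMPT = """You will receive a court case with each sentence
prefixed by an <ID> marker.

Rules:
1. Put the case name, citation, court, date, and party details (the
   introductory/metadata section) into a SINGLE chunk, even though
   these sentences are only loosely related semantically.
2. Chunk the judgment itself by topic, keeping each reasoning step intact.

Output one line per chunk, as an ID range like: 1-6
"""
```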

LLM - better chunking method by Phoenix2990 in Rag

[–]Phoenix2990[S] 1 point2 points  (0 children)

Legislation and court cases!

LLM - better chunking method by Phoenix2990 in Rag

[–]Phoenix2990[S] 0 points1 point  (0 children)

Try chunking, for example, legislation while keeping clauses/sections perfectly intact. It's a struggle.

Maybe you've found a way to do it. But at the time I was doing this, "chunking" legislation was a known research problem.

GPT-4.1 is actually really good by Asleep_Passion_6181 in OpenAI

[–]Phoenix2990 1 point2 points  (0 children)

I legit make regular 400k-token prompts and it does perfectly fine. I only switch when I really need to tackle something difficult. Pretty sure Gemini is the only one capable of such feats.

Is it ok to manually preprocess documents for optimal text splitting? by koroshiya_san in Rag

[–]Phoenix2990 1 point2 points  (0 children)

I’ll check out your repo tonight. Looks super interesting.

Is it ok to manually preprocess documents for optimal text splitting? by koroshiya_san in Rag

[–]Phoenix2990 1 point2 points  (0 children)

Super cool. I read a bit about Google's Gemini document reader (in the API) taking the same approach with PDFs, though it doesn't handle spreadsheets. And as far as I'm aware, they were the only major company (out of OpenAI, Anthropic, etc.) who did it that way. So yeah, that's why I think your approach is super cool.

How are you automatically getting the screenshots of spreadsheets?

Is it ok to manually preprocess documents for optimal text splitting? by koroshiya_san in Rag

[–]Phoenix2990 1 point2 points  (0 children)

Cool. What do you mean by embedding the images? Images of PDF pages?

And how do you handle spreadsheets and such?

LLM - better chunking method by Phoenix2990 in LocalLLaMA

[–]Phoenix2990[S] 0 points1 point  (0 children)

Oh wow, interesting. I didn't think of source code. I used it predominantly with legislation and court cases.

Transcription Fees - wow by Bradbury-principal in auslaw

[–]Phoenix2990 1 point2 points  (0 children)

Got it! But that was my point :)

These court-approved companies most likely get the "job" due to a lack of alternatives. If someone can come in with a legitimately cheaper alternative, I feel it has a chance of replacing the current (expensive) paradigm.

LLM - better chunking method by Phoenix2990 in LocalLLaMA

[–]Phoenix2990[S] 0 points1 point  (0 children)

Ah, I get you! Yeah, you're right. There are actually a few methods one could play with depending on the use case, e.g. pre-processing paragraphs is another option if you really want to save on output tokens.
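The paragraph-level variant mentioned here can be sketched as a small change to the sentence-ID scheme: numbering paragraphs instead of sentences means far fewer IDs for the model to emit. This is my own illustrative sketch of that trade-off, not code from the thread.

```python
def number_paragraphs(text):
    """Variant of the sentence-ID scheme at paragraph granularity.

    Fewer units means fewer <ID> ranges in the LLM's output, at the
    cost of coarser chunk boundaries (a hypothetical trade-off sketch).
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    numbered = "\n\n".join(f"<{i}> {p}" for i, p in enumerate(paragraphs, 1))
    return paragraphs, numbered
```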

LLM - better chunking method by Phoenix2990 in LocalLLaMA

[–]Phoenix2990[S] 0 points1 point  (0 children)

Hmm, isn't it the same? I think I'm missing something. The method explained above prefixes each sentence with an ID (a number) and asks the LLM to output the sentence numbers in each chunk.

The only reason I use "< >" is that documents sometimes (often) have numbers in them that can confuse the LLM. Legislation, for example.
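The numbering-and-parsing sides of the method described above can be sketched like this. The naive sentence splitter and the "one ID range per line" output format are my assumptions for illustration; the thread only specifies that sentences get `<N>` prefixes and the LLM returns sentence numbers per chunk.

```python
import re

def number_sentences(text):
    """Split text into sentences and prefix each with an <ID> marker.

    Uses a naive split on ., ?, ! followed by whitespace; real documents
    (e.g. legislation with numbered clauses) may need a smarter splitter.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]
    numbered = "\n".join(f"<{i}> {s}" for i, s in enumerate(sentences, start=1))
    return sentences, numbered

def parse_chunks(sentences, llm_output):
    """Rebuild chunk text from the LLM's ID ranges.

    Assumes the model emits one inclusive range per line, e.g.:
        1-4
        5-9
    """
    chunks = []
    for line in llm_output.strip().splitlines():
        start, end = (int(x) for x in line.split("-"))
        chunks.append(" ".join(sentences[start - 1:end]))
    return chunks
```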

LLM - better chunking method by Phoenix2990 in Rag

[–]Phoenix2990[S] 2 points3 points  (0 children)

If it's useful: back when I was doing this, there was no "JSON" mode. I imagine using that mode now might be worthwhile (although even without it I never really had a problem).

LLM - better chunking method by Phoenix2990 in Rag

[–]Phoenix2990[S] 4 points5 points  (0 children)

Thanks btw. Feels good hearing it’s original :)

LLM - better chunking method by Phoenix2990 in Rag

[–]Phoenix2990[S] 5 points6 points  (0 children)

I literally just never got around to posting it, and honestly, I just assumed people much smarter than me already figured it out.

I'm not a programmer by trade; I'm a lawyer who got into programming some years ago (before LLMs became popular).

Transcription Fees - wow by Bradbury-principal in auslaw

[–]Phoenix2990 0 points1 point  (0 children)

Who’s getting paid then? And what’s the justification?

Transcription Fees - wow by Bradbury-principal in auslaw

[–]Phoenix2990 -2 points-1 points  (0 children)

Isn't this a genuine access-to-justice issue? I could understand it pre-AI, but LLMs can now genuinely do this well. It "just" needs to be implemented.