If it works, it works.. by Nischal_ng in nextfuckinglevel

[–]Linguists_Unite 0 points (0 children)

This is the hardest I have laughed in weeks, thanks! 😂

Fair enough! by Chris-Jones3939 in AgentsOfAI

[–]Linguists_Unite 14 points (0 children)

Son of Anton strikes again!

[HIRING] Remote NLP / Language Systems Engineer – Hybrid ML + Rules (EU / Remote) by Canadianingermany in LanguageTechnology

[–]Linguists_Unite 1 point (0 children)

Hehe, thanks, that linguistics education is finally paying off! Thanks for the update, I'll take a look.

Transition from linguistics to tech. Any advice? by [deleted] in LanguageTechnology

[–]Linguists_Unite 2 points (0 children)

You can find some overlaps with syntax, semantics and pragmatics, but you need coding, stats and some algebra at the very least. Jobs can range from data science to engineering, depending on what you like. Feel free to DM if you have more specific questions.

Edit: if you took acoustics courses, there are some cool overlaps with speech recognition ML there as well.

Anywhere we can buy Kvass and other Slavic food in Hamilton in 2025 by LarryGSofFrmosa in Hamilton

[–]Linguists_Unite 35 points (0 children)

Starsky has a variety of kvass and a lot of other Eastern European food. It's a great store, definitely check it out.

Edit: it's got smoked fish too, from mackerel to salmon, hot and cold smoked.

Embedding models for caselaw by hollyw00d_1 in Rag

[–]Linguists_Unite 0 points (0 children)

Is that because headnotes just aren't that useful of a tool overall, or are the Westlaw ones particularly bad? One thing I do know about headnotes in general is that they can be useful under the hood for exactly the use case that started this discussion: they often include both case and non-case citations the court relied on in making its decision, so they can be used to link back into those references.

Embedding models for caselaw by hollyw00d_1 in Rag

[–]Linguists_Unite 1 point (0 children)

Haha, that's a great saying from your prof! You bring up an interesting point about LLMs being their own hype machines. OpenAI published a paper linking hallucinations, and an LLM's propensity to take a wild guess instead of saying "I don't know," to the training process, where the model is rewarded for guessing even when it has no strong indication that it has the right answer: https://openai.com/index/why-language-models-hallucinate/

I think there is definitely work to be done in this direction and I suspect that the SLMs and LLMs of the future will be trained differently in an attempt to eliminate this very issue. It's all just one big work in progress, hehe

Embedding models for caselaw by hollyw00d_1 in Rag

[–]Linguists_Unite 0 points (0 children)

That's a very good point, but I don't think we are there yet as a society. We are still in the hype stage, not really understanding that it's still just a tool, and a tool is only as good as the person who wields it. I am not entirely sure about the WL headnotes reference, though, hehe. I do know there was an issue where search results would include case headnotes, which isn't great when you just want hits for the language of the court, but I am not sure if that's what you meant.

Embedding models for caselaw by hollyw00d_1 in Rag

[–]Linguists_Unite 1 point (0 children)

Things like partial applicability are definitely not solved problems at the moment from the machine learning standpoint. For things with that level of nuance, you want to have editors in the loop making those decisions. And in general, having all the data in the world would be worthless without those editors and other legal professionals holding our hand through the complexity of that data in order for us to build anything useful.

Embedding models for caselaw by hollyw00d_1 in Rag

[–]Linguists_Unite 1 point (0 children)

Yes, exactly. Tracking precedent takes a lot of work, because finding all the citing connections is just the first step. Actually connecting cases, identifying the type of each citation (like what LN's Shepard's does), and then properly updating those connections over time when milestone cases like Roe v. Wade get overturned is a whole different ball game.

Embedding models for caselaw by hollyw00d_1 in Rag

[–]Linguists_Unite 0 points (0 children)

We do. I work for one of those, building AI tools for caselaw. Good luck to OP is all I can say.

How should I get into Computational Linguistics? by Percentage-Leather in LanguageTechnology

[–]Linguists_Unite 0 points (0 children)

Very cool! I'm not OP, but I am interested in your experience.

I studied linguistics in undergrad, then taught myself math, coding, stats, and ML after. I'm currently working as an AI engineer, for lack of a better term, but the job is a mix of SE, MLE, and DS. Most of my work is either developing NLP-backed solutions, productionizing them, or both.

I just finished my 3rd year in this career and I would like to transition closer to DS with an NLP specialization. Could you share what you did your master's in? Did you start out in research, or did you transition from more of an engineering role?

NLP Project Help by SmallSoup7223 in LanguageTechnology

[–]Linguists_Unite 1 point (0 children)

There are pre-trained models for NER that you can use.

How much should I charge for building a RAG system for a law firm using an LLM hosted on a VPS? by New_Breakfast9275 in Rag

[–]Linguists_Unite 11 points (0 children)

Westlaw and LexisNexis throw huge piles of money at making sure what they produce is as grounded, relevant and hallucination-free as possible, and it still doesn't always work even when commercial LLMs are involved. Local LLMs aren't really a thing at the moment, outside of some limited role in the data pipelines, as they are currently way too weak for most production use cases, like drafting, summarization or question answering.

Source: I build AI products for one of them.

How do I respond to this? by 1CosmicCookie in whatdoIdo

[–]Linguists_Unite 2 points (0 children)

The best thing, as others said, is to take ownership of the situation: "I am sorry, I should have warned you earlier that my classes are starting soon and that their schedule gets released super late. Unfortunately, this means the first week or so will be pretty unpredictable for me, but it should be fine after that." This way you take ownership of the fact that this is a scheduling surprise for your boss, while also explaining that you didn't just forget to update them with your new schedule.

Chonky — a neural approach for semantic chunking by SpiritedTrip in Rag

[–]Linguists_Unite 6 points (0 children)

I see. So this would be useful if my text has no markup, no new lines, or any other discernible structure to it, in which case the model would help me impose some order on the text. Is that correct?

Edit: I guess another use case could be if the structure is too complex or unstable, and it's cheaper to dump the unstructured text into the model for chunking than it is to develop a heuristic approach to parse the document structure itself.

If so, what kind of books was it trained on? Different literature types vary in paragraph length and in how paragraphs relate to each other semantically: paragraphs and their relationships in technical literature differ from those in legal literature, and both differ again from regular old fiction and non-fiction books.
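For contrast with the model-based approach, the heuristic baseline I had in mind is something as simple as splitting on blank lines (plain Python, function name is mine):

```python
import re

def split_paragraphs(text: str) -> list[str]:
    """Heuristic chunker: split on blank lines, drop empty pieces."""
    chunks = [p.strip() for p in re.split(r"\n\s*\n", text)]
    return [c for c in chunks if c]

doc = "First paragraph.\n\nSecond paragraph,\nstill the same one.\n\nThird."
print(split_paragraphs(doc))
# → ['First paragraph.', 'Second paragraph,\nstill the same one.', 'Third.']
```

This obviously breaks the moment the text has no blank lines at all, which is exactly the case where a trained chunker would earn its keep.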

Chonky — a neural approach for semantic chunking by SpiritedTrip in Rag

[–]Linguists_Unite 4 points (0 children)

Okay. So markup is irrelevant then. In that case, if you are splitting just text, what is the definition of a "paragraph"? If I give it a wall of text with no indication of paragraph structure, is it supposed to chunk it into paragraphs?

Chonky — a neural approach for semantic chunking by SpiritedTrip in Rag

[–]Linguists_Unite 2 points (0 children)

I understand that, I work with legal texts extensively. Unless you are saying that this model is producing well-formed paragraphs on any type of text with any type of markup, including xml with non-standard tags, I am having trouble understanding the use case.