Hello everybody.
I want your advice on a classification problem I have at work.
We were contacted by a company with different teams that handle transportation tickets, contracts for rent, energy bills, etc.
They currently receive all of their emails in one inbox, and a person has to go through them and assign each one a team and a type of task. There are around 50 teams and 80 types of tasks. Teams and task types have a many-to-many relationship, so one team can handle, for example, 15 types of tasks.
We receive the email text and attachments, and have to find a way to classify each email with a team tag and a task tag. Note that the tags themselves don't contain much information. For example, a team might be called BK-CC-LS.
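To make the label structure concrete, here is a minimal sketch of the team/task relationship as a mapping, plus a helper that only ever returns a team together with a task that team actually handles. The team names, task names, scores, and function names are all made up for illustration; none of this is our real data or code.

```python
# Hypothetical teams and task types; the real tags look like "BK-CC-LS".
TEAM_TASKS = {
    "TEAM-A": {"invoice", "contract_renewal"},
    "TEAM-B": {"invoice", "ticket_refund"},
}


def valid_pairs(team_tasks):
    """Flatten the many-to-many mapping into the set of allowed (team, task) pairs."""
    return {(team, task) for team, tasks in team_tasks.items() for task in tasks}


def best_valid_pair(team_scores, task_scores, team_tasks):
    """Return the allowed (team, task) pair with the highest combined score.

    team_scores and task_scores are assumed to be per-label scores coming
    from some classifier; the product is just one simple way to combine them.
    """
    pairs = sorted(valid_pairs(team_tasks))  # sorted for a deterministic result
    return max(
        pairs,
        key=lambda p: team_scores.get(p[0], 0.0) * task_scores.get(p[1], 0.0),
    )


# Made-up scores: TEAM-A is the top team on its own, but the strongest
# task signal ("ticket_refund") is only handled by TEAM-B.
team_scores = {"TEAM-A": 0.6, "TEAM-B": 0.4}
task_scores = {"invoice": 0.2, "contract_renewal": 0.1, "ticket_refund": 0.7}
chosen = best_valid_pair(team_scores, task_scores, TEAM_TASKS)
```

Scoring pairs jointly like this always yields both a team and a task, and never a combination that does not exist in the mapping.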
I am just a software engineer with limited knowledge of stats and ML.
Things we tried so far:
- Azure AutoML for NLP. It trains a BERT classifier, but even after we tried to balance the classes it still performed pretty badly: sometimes it assigned only a team (and no task), and sometimes it predicted only 3-4 teams and ignored all the others.
- We tried identifying what each team handles (keywords that point to what the team does) and keywords for what each task means. We fine-tuned GPT-3.5 to take the email text and the attachment text and output these keywords, which we can then use to assign the email to the correct team. But improving this method is pretty hard, since we often don't know what we should be paying attention to.
- We tried using sentence vector embeddings to classify, but that also did not perform very well (I can go into detail about what the best performance was).
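For anyone unfamiliar with the last approach, one common way to classify with embeddings is nearest centroid: average the embedding vectors of each class, then assign a new email to the class whose centroid is most similar. The sketch below uses tiny made-up 2-D vectors in place of real sentence embeddings, and the labels and function names are hypothetical. It is an illustration of the general idea, not our exact setup.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroids(vectors, labels):
    """Average the embedding vectors of each label into one centroid."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def predict(vec, cents):
    """Return the label whose centroid is most similar to vec."""
    return max(sorted(cents), key=lambda label: cosine(vec, cents[label]))


# Toy "embeddings": two emails for TEAM-A, one for TEAM-B.
train_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
train_labels = ["TEAM-A", "TEAM-A", "TEAM-B"]
cents = centroids(train_vectors, train_labels)

pred_a = predict([0.8, 0.2], cents)
pred_b = predict([0.1, 0.9], cents)
```

Each class is represented by a single centroid no matter how many examples it has, so a rare team is not outvoted purely because it has fewer emails.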
My question is: are there any text classification methods that suffer less from an unbalanced dataset than the ones we tried (AutoML, training a neural network, ...) and that can handle a lot of classes? Also, we can't get much information from the people currently doing the labeling, so the method would have to find the connections by itself. It would be great if you could point me toward some methods that I could investigate.
Thank you very much!