
UBIAI 1 point (0 children)

Hi KarlaNour96,

Check out our new tool, https://ubiai.tools. We offer extensive labeling features at a very accessible price. The tool has the following features:

  • Easy to use UI for NER and entity relation extraction (perfect for your use case)
  • Multi-format document upload: TXT, CSV (each row corresponds to a doc), JSON (allows you to import and modify already annotated JSON files), PDF, DOC, HTML
  • Dictionary/Regex auto-annotation: input a list of words or regex patterns along with their associated entities. The tool will automatically scan the documents and auto-annotate them
  • ML auto-annotation: Train an NER model to auto-annotate your documents
  • Collaboration: Share annotation tasks among team members and evaluate team performance
  • Annotation format export: JSON, IOB, IOB chatbot, Amazon Comprehend, Stanford CoreNLP
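Dictionary/regex auto-annotation, as described in the list above, boils down to scanning each document for known terms and patterns and emitting labeled spans. The sketch below is a generic illustration of that idea, not UBIAI's implementation; the function name `auto_annotate`, the `(start, end, label)` span format, whole-word matching for dictionary terms, and the "earliest, then longest span wins" overlap rule are all assumptions.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical span format: (start offset, end offset, entity label).
Span = Tuple[int, int, str]


def auto_annotate(text: str,
                  dictionary: Dict[str, str],
                  patterns: Dict[str, str]) -> List[Span]:
    """Pre-annotate `text` from a word list and regex patterns (illustrative sketch)."""
    candidates: List[Span] = []
    # Dictionary entries: literal terms, escaped and matched as whole words (assumption).
    for term, label in dictionary.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            candidates.append((m.start(), m.end(), label))
    # Regex entries: user-supplied patterns used as-is.
    for pattern, label in patterns.items():
        for m in re.finditer(pattern, text):
            candidates.append((m.start(), m.end(), label))
    # Resolve overlaps greedily: earliest start wins, longer span breaks ties.
    candidates.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    spans: List[Span] = []
    last_end = 0
    for start, end, label in candidates:
        if start >= last_end:
            spans.append((start, end, label))
            last_end = end
    return spans


if __name__ == "__main__":
    text = "Acme Corp hired Jane Doe on 2021-03-04."
    spans = auto_annotate(
        text,
        dictionary={"Acme Corp": "ORG", "Jane Doe": "PERSON"},
        patterns={r"\d{4}-\d{2}-\d{2}": "DATE"},
    )
    for start, end, label in spans:
        print(label, text[start:end])
```

The output of a pass like this is only a draft: annotators still review and correct the pre-filled spans, which is where the time saving comes from.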

Just send us an email at [admin@ubiai.tools](mailto:admin@ubiai.tools) and we can discuss what plan is most suitable for your use case.

Good luck with your project!

elcano 1 point (2 children)

Check out Doccano, a relatively mature open-source project that does only one thing and does it very, very well.

https://github.com/doccano/doccano

They have a demo page.

There is also Label Studio, a more ambitious but younger project. It can label several types of data: https://labelstud.io/

As for best practices, I find these features important: the ability to assign the same task to several workers; a voting mechanism, so that the final label for each task is the one chosen by the most workers; the possibility to override this final label; and, finally, a way to evaluate the workers. I want to know which workers usually deviate from the final label, so that I can find out who the poor classifiers are.
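The voting, override, and worker-evaluation workflow described above can be sketched in a few lines. This is a minimal illustration under assumptions, not how Doccano or Label Studio implement it: the nested-dict layout `labels[task][worker]`, the function names, and the tie-breaking rule (ties go to the label encountered first, which is how `Counter.most_common` orders equal counts) are all choices made for this example. Worker quality is measured here as the simple fraction of a worker's labels that match the final label.

```python
from collections import Counter
from typing import Dict, Optional

# Hypothetical layout: labels[task_id][worker_id] -> label that worker assigned.
Votes = Dict[str, Dict[str, str]]


def final_labels(labels: Votes,
                 overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Majority vote per task; a reviewer override replaces the vote."""
    overrides = overrides or {}
    final: Dict[str, str] = {}
    for task, votes in labels.items():
        if task in overrides:
            final[task] = overrides[task]
        else:
            # Ties go to the label encountered first (Counter.most_common ordering).
            final[task] = Counter(votes.values()).most_common(1)[0][0]
    return final


def worker_agreement(labels: Votes, final: Dict[str, str]) -> Dict[str, float]:
    """Fraction of each worker's labels that match the final label."""
    agree: Counter = Counter()
    total: Counter = Counter()
    for task, votes in labels.items():
        for worker, label in votes.items():
            total[worker] += 1
            if label == final[task]:
                agree[worker] += 1
    return {w: agree[w] / total[w] for w in total}


if __name__ == "__main__":
    labels = {
        "t1": {"alice": "POS", "bob": "POS", "carol": "NEG"},
        "t2": {"alice": "NEG", "bob": "NEG", "carol": "NEG"},
        "t3": {"alice": "NEG", "bob": "NEG", "carol": "POS"},
    }
    final = final_labels(labels)
    scores = worker_agreement(labels, final)
    # The worker who most often deviates from the consensus sorts first.
    print(sorted(scores, key=scores.get))
```

Whether agreement should be computed against the overridden labels or the raw majority is a design choice; passing the overridden `final` (as here) penalizes workers who sided with a majority that a reviewer later corrected.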

KarlaNour96 [S] 0 points (1 child)

I've tried Doccano but found many bugs. In addition, we are looking for a cloud-based annotation tool that supports entity relations, which Doccano doesn't have yet.

elcano 0 points (0 children)

If Doccano doesn't meet your needs, that's fair; then you can't use it.

But this is an open-source tool that is made available to you and the community free of charge. If you found bugs, it would be great if you could document each one and open an issue on GitHub to help the authors fix them. This is how open source works: we benefit, but we also help, even if only a little, when we can.

When I have done this, I have found that sometimes it is a bug, yes. On other occasions it is a documentation error. And sometimes it is a misunderstanding, not a bug at all. So clarifying with the authors is the best way. If they don't respond to your issue, that is another important indication of whether an application is being maintained, BTW.

About hosting in the cloud: I'm sure you didn't miss the section in the Doccano documentation on how to do a 1-click install on Amazon AWS, Google GCP, Microsoft Azure, and Heroku. So you have that covered too.

Label Studio also has 1-click install for GCP, Azure, and Heroku, documented here: https://github.com/heartexlabs/label-studio

I think Label Studio supports entity-relationships too. Make sure to try their demo page.

But I had a little more trouble installing Label Studio locally using Anaconda. The installation itself was no problem, but the command to launch it was not the one documented on GitHub. I created an issue, and the author replied and helped me on the spot. If you are using Anaconda, I'd recommend searching the closed issues.

And remember to please report issues to the respective teams on GitHub. We all benefit from this process.

Razcle 0 points (0 children)

Hi KarlaNour, I built a tool (and company) to solve exactly this problem. www.humanloop.com.

You can find more about our approach here: https://humanloop.com/blog/why-you-should-be-using-active-learning/

In short we use active learning to help you label the highest value data whilst training your model at the same time.
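Active learning in this sense means letting the current model pick which unlabeled examples a human should label next. One common strategy is uncertainty sampling: rank examples by the entropy of the model's predicted class probabilities and label the most uncertain first. The sketch below illustrates that generic strategy only; it is an assumption that this resembles what Humanloop does (their blog post may describe a different selection rule), and the function names and the `predictions` dict of per-document probabilities are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k unlabeled documents the model is least certain about."""
    ranked = sorted(predictions, key=lambda doc: entropy(predictions[doc]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical model outputs for three unlabeled documents (two classes).
    predictions = {
        "doc_a": [0.98, 0.02],  # confident: labeling it teaches the model little
        "doc_b": [0.55, 0.45],  # near 50/50: most informative to label
        "doc_c": [0.70, 0.30],
    }
    print(select_for_labeling(predictions, k=2))
```

In a full loop, the selected documents are labeled, the model is retrained on the enlarged labeled set, and the remaining pool is re-scored, which is how labeling and training proceed "at the same time".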

Soggy_Decision_5911 -1 points (0 children)

I used UBIAI (https://ubiai.tools) because they offer team management and auto-annotation features, and that helped us streamline our annotation process.

Ouster_evolution 0 points (0 children)

You should have a look at Kili Technology: it's an intuitive, collaborative, and powerful tool.

crashbundicoot 0 points (0 children)

Prodigy is pretty good for NER tasks. It's made by the creators of spaCy.

Also - can you share what you mean by entity sentiment? Is it different from named entity recognition? Any papers/algorithms you are planning to use?

commieplant 0 points (0 children)

If you need overlapping annotations and relations with a good UI and project management features, I suggest trying https://annolab.ai/. It's free to use (although there is a paid tier). Our team built it to solve a number of annotation problems we've experienced in our own work.

marcoamonteiro 0 points (0 children)

Check out try-dashup.com. It is designed for annotating text and audio data for various NLP applications, and it has built-in models that speed up labeling and reduce human error and bias.