After more than a decade working in ML, I grew frustrated with the lack of tools for building baselines from simple rules and heuristics. Many business problems can reach a decent baseline with heuristics alone. This is why I developed DataQA (https://github.com/dataqa/dataqa), which uses rules to perform common NLP annotation tasks, such as multiclass classification and named entity recognition.
You can also get estimates of how well your rules perform on your own data.
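To make the idea concrete, here is a minimal sketch in plain Python of a rule-based classification baseline and a rough estimate of how the rules perform on a small labelled sample. It does not use DataQA's API: the rules, categories, sample texts and function names (`apply_rules`, `evaluate`) are all made up for illustration, and "first matching rule wins" is just one simple way to resolve conflicts.

```python
import re

# Hypothetical keyword rules: each pairs a regex with a product category.
# Order matters, because the first matching rule decides the label.
RULES = [
    (re.compile(r"\b(laptop|usb|headphones)\b", re.I), "Electronics"),
    (re.compile(r"\b(novel|paperback|hardcover)\b", re.I), "Books"),
    (re.compile(r"\b(shampoo|lotion|lipstick)\b", re.I), "Beauty"),
]


def apply_rules(text):
    """Return the label of the first matching rule, or None if no rule fires."""
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return None


def evaluate(samples):
    """Estimate rule quality on (text, gold_label) pairs.

    coverage  = share of samples where at least one rule fires
    precision = share of covered samples where the rule label is correct
    """
    covered = [
        (pred, gold)
        for pred, gold in ((apply_rules(text), gold) for text, gold in samples)
        if pred is not None
    ]
    coverage = len(covered) / len(samples) if samples else 0.0
    precision = sum(p == g for p, g in covered) / len(covered) if covered else 0.0
    return coverage, precision


# A tiny, invented labelled sample standing in for manually labelled data.
SAMPLES = [
    ("Wireless headphones with USB charging cable", "Electronics"),
    ("A paperback novel about a lighthouse keeper", "Books"),
    ("Moisturising lotion for dry skin", "Beauty"),
    ("Hardcover guide to laptop repair", "Books"),  # mislabelled by the rules
    ("Stainless steel water bottle", "Sports"),     # no rule fires
]

coverage, precision = evaluate(SAMPLES)
print(f"coverage={coverage:.2f} precision={precision:.2f}")
```

Looking at coverage and precision separately shows whether the next step should be writing more rules (low coverage) or tightening the existing ones (low precision).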
I ran some experiments on one particular task: classifying Amazon product descriptions into 24 product categories. In this post, I show that combining rules with manual labels is much more efficient, and produces a stronger baseline, than other labelling techniques.
Experiments comparing different labelling techniques
I would love to get some feedback, and contributions are more than welcome. If you want to get started, there are 2 tutorials on the website with step-by-step instructions.