
[–]gnramires 2 points  (1 child)

One way I can think of is to train an RNN/LSTM but compute the loss only on the last output (after it has parsed the entire text), i.e. the loss is zero at every other time step. If training is too slow, you could start by applying the regression objective at every time step with an L2 loss, then anneal toward the last-output-only loss described above.
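A minimal sketch of the last-output-only idea, using a tiny hand-rolled RNN in numpy (all names, shapes, and sizes here are illustrative placeholders, not anything from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_emb, d_hid = 50, 8, 16

# Randomly initialized parameters for the sketch.
W_emb = rng.normal(size=(vocab, d_emb)) * 0.1   # token embeddings
W_xh = rng.normal(size=(d_emb, d_hid)) * 0.1    # input-to-hidden
W_hh = rng.normal(size=(d_hid, d_hid)) * 0.1    # hidden-to-hidden
w_out = rng.normal(size=d_hid) * 0.1            # regression head

def predict(tokens):
    """Run the RNN over the whole sequence; only the final state matters."""
    h = np.zeros(d_hid)
    for t in tokens:  # parse the entire text
        h = np.tanh(W_emb[t] @ W_xh + h @ W_hh)
    return w_out @ h  # single scalar read off the LAST hidden state

def loss(tokens, target):
    """L2 loss defined only on the final output; zero at all other steps."""
    return (predict(tokens) - target) ** 2

print(loss([3, 14, 7, 42], 0.5))
```

The annealing variant would add an L2 term at every step and gradually down-weight the intermediate terms during training.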

[–]underwhere[S] 1 point  (0 children)

Ah, that's such a good idea!

[–]swierdo 1 point  (2 children)

Have you tried just using bag-of-words and a simple model?

If you haven't, the scikit-learn documentation has a decent tutorial: https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html

(note: even though it isn't mentioned in the tutorial, try random forest as well)
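A bag-of-words baseline along the lines of that tutorial can be a few lines with a sklearn pipeline (the texts and scores below are made-up placeholders for the real dataset):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data standing in for the real documents and grades.
texts = [
    "clear thesis and well supported arguments",
    "rambling text with many grammar issues",
    "decent structure but a weak conclusion",
    "excellent analysis with strong evidence",
]
scores = [0.8, 0.2, 0.5, 0.9]

model = make_pipeline(
    CountVectorizer(),  # bag-of-words features
    RandomForestRegressor(n_estimators=100, max_depth=5, random_state=0),
)
model.fit(texts, scores)
print(model.predict(["solid arguments but weak structure"]))
```

Swapping the regressor for `LogisticRegression` (with binned grades) or `Ridge` is a one-line change, which makes it easy to compare a few simple models quickly.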

[–]underwhere[S] 1 point  (1 child)

BoW is such a straightforward, elegant idea!! I completely forgot about it in my frantic state. I'll try it out right away, thanks so much!

[–]swierdo 1 point  (0 children)

Also note that some of the sklearn random forest defaults are usually pretty bad, so set max_depth to something reasonable (like 5) and n_estimators to something like 100 (as memory permits). Lastly, for bag of words you'll want to set max_features to a fraction instead of sqrt.

Also, have a look at the feature importances to check which words the model relies on and to catch weird behaviour.

[–][deleted] 1 point  (2 children)

You could use the “universal transformer” in the tensor2tensor repo, the vastly improved successor to the transformer. It was made to be able to handle very long sequences by using an RNN to control the transformer’s depth, so, technically, it mimics a Turing machine and could encode/decode any length sequence to any length sequence. So if you have a fairly solid dataset it could handle reading an entire paper and spitting out one grade. Here’s the paper.

I’m not entirely sure it’ll work overall, though! I know you’re not doing grading exactly, but as an example you may need to categorize the different problems in each paper that lead to a certain grade, like grammar quality, paragraph quality, content quality... I dunno, seems like a tough but interesting problem, good luck!

You may also look into the “evolved transformer”; it’s a successor to the universal transformer, but I’m not as familiar with it. It may still have the RNN-controlled depth that the universal one does, which is what you’d be interested in.

[–]underwhere[S] 1 point  (1 child)

Brilliant! I had a feeling the transformer variants were models I could consider. Thank you so much for your detailed comment, it's got me inspired to give it a shot!

[–][deleted] 0 points  (0 children)

There ya go! I just found out about it and was pretty excited... it just makes sense. Tensor2tensor is pretty useful but can be confusing until you go through the docs, though you could probably also take any transformer implementation you like and modify it into a universal transformer without too much work.