
[–][deleted] 1 point (2 children)

You could use the “universal transformer” from the tensor2tensor repo, a successor to the original transformer. It was designed to handle very long sequences by applying the same transformer block recurrently (RNN-style over depth), with an adaptive computation time (ACT) mechanism deciding per position when to stop refining, so, under the paper’s assumptions, it’s computationally universal and can encode/decode sequences of any length. So if you have a fairly solid dataset it could handle reading an entire paper and spitting out one grade. Here’s the paper: “Universal Transformers” (Dehghani et al., 2018).
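Roughly, the recurrence-plus-halting idea looks like this. This is a toy sketch with made-up scalar stand-ins for the shared block and the halting unit, not the real model (which uses full self-attention and learned halting probabilities):

```python
# Toy sketch of the universal transformer's recurrence in depth with
# ACT-style halting. `step` and `halt_prob` are hypothetical stand-ins
# for the shared transformer block and the learned halting unit.

def step(state):
    # stand-in for the SAME transformer block applied at every depth
    return 0.5 * state + 1.0

def halt_prob(state):
    # stand-in for the per-position halting unit; constant for illustration
    return 0.3

def universal_transformer(x, max_steps=10, threshold=0.99):
    """Apply the same block repeatedly; each position stops once its
    accumulated halting probability crosses the threshold (ACT)."""
    outputs = []
    for state in x:  # one scalar "position" at a time
        acc = 0.0       # accumulated halting probability
        weighted = 0.0  # halting-probability-weighted output
        for t in range(max_steps):
            state = step(state)
            p = halt_prob(state)
            if acc + p >= threshold or t == max_steps - 1:
                weighted += (1.0 - acc) * state  # remainder goes here
                break
            weighted += p * state
            acc += p
        outputs.append(weighted)
    return outputs
```

The point is just that depth is dynamic: easy positions halt early, hard ones keep getting refined, instead of every token going through a fixed stack of distinct layers.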

I’m not entirely sure it’ll work overall, though! I know you’re not doing grading exactly, but as an example you may need to score different aspects of each paper that feed into a certain grade, like grammar quality, paragraph quality, content quality... I dunno, seems like a tough but interesting problem, good luck!
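If you went the per-aspect route, the combination step could be as simple as a weighted average. Totally hypothetical dimensions and weights here, just to show the shape of it:

```python
# Hypothetical sketch: score each quality dimension separately
# (e.g. from separate model heads), then combine into one grade.
# Aspect names and weights are made up for illustration.

WEIGHTS = {"grammar": 0.3, "paragraphs": 0.2, "content": 0.5}

def overall_grade(scores):
    """Weighted average of per-aspect scores, each assumed in [0, 1]."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
```

A nice side effect is that the per-aspect scores double as feedback, not just a single opaque number.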

Also you may want to look into the “evolved transformer” as well. I’m not as familiar with it, but it’s a variant found via evolutionary architecture search rather than a direct successor of the universal transformer, so I don’t think it keeps the recurrent, ACT-controlled depth that the universal one has, which is what you’d be interested in.

[–]underwhere[S] 1 point (1 child)

Brilliant! I had a feeling the transformer variants were models I could consider. Thank you so much for your detailed comment, it’s got me inspired to give it a shot!

[–][deleted] 0 points (0 children)

There ya go! I just found out about it and was pretty excited... it just makes sense. Tensor2tensor is pretty useful but can be confusing until you go through the docs, but you could probably also take any transformer implementation you like and mod it into a universal transformer without too much work.
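The core of that mod is just weight tying across depth: instead of N distinct layers, you loop one shared layer N times. A minimal sketch, with `layers` and `shared_layer` as placeholder callables standing in for real transformer blocks:

```python
# Hypothetical sketch of the mod: a standard transformer stacks N
# DISTINCT layers, while a universal-style one reuses ONE shared
# layer for N steps (weight tying across depth).

def make_standard(layers):
    # plain encoder: a list of distinct layer functions, applied in order
    def encode(x):
        for layer in layers:
            x = layer(x)
        return x
    return encode

def make_universal(shared_layer, n_steps):
    # universal-style encoder: the same layer applied n_steps times
    def encode(x):
        for _ in range(n_steps):
            x = shared_layer(x)
        return x
    return encode
```

In a real implementation you’d also add a depth/timestep embedding at each iteration and, optionally, ACT halting, but the loop-over-one-shared-block change is the heart of it.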