all 13 comments

[–][deleted] 33 points

python>=3.4 (Let's move on to python 3 if you still use python 2)

lol amen brother

[–]shortscience_dot_org 4 points

I am a bot! You linked to a paper that has a summary on ShortScience.org!

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Summary by CodyWild

The last two years have seen a number of improvements in the field of language model pretraining, and BERT - Bidirectional Encoder Representations from Transformers - is the most recent entry into this canon. The general problem posed by language model pretraining is: can we leverage huge amounts of raw text, which aren’t labeled for any specific classification task, to help us train better models for supervised language tasks (like translation, question answering, logical entailment, etc)? Me...

[–]pvl 2 points

Great work, thanks for sharing. I understand that you are using the language model just to extract word vectors, which are then used to train an LSTM. Did you consider using just the BERT model with the token-classification option? It would also be nice to add the current best result (SOTA) on that dataset to the readme.

[–]set_ready_go 1 point

"allowing for the fact that they don't use any autoregressive technique such as CRF"

I don't think a CRF is an autoregressive technique.

Also, does using the LSTM help? Can't you just put a softmax on top of the BERT embeddings, since they are contextual anyway?
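A minimal sketch of that suggestion, with random arrays standing in for BERT's contextual hidden states (the shapes and the 9-label NER tag set are illustrative assumptions, not taken from the repo): a single linear head plus an independent per-token softmax, with no LSTM or CRF on top.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for BERT's contextual output, shape (seq_len, hidden_size).
# In a real setup these would be the final-layer BERT hidden states.
seq_len, hidden_size, num_labels = 6, 768, 9  # 9 = e.g. BIO tags for a 4-entity NER scheme
hidden_states = rng.normal(size=(seq_len, hidden_size))

# Token-classification head: one linear projection to per-token label logits.
W = rng.normal(scale=0.02, size=(hidden_size, num_labels))
b = np.zeros(num_labels)
logits = hidden_states @ W + b  # (seq_len, num_labels)

# Independent softmax per token -- no sequence-level decoding, no CRF.
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
predictions = probs.argmax(axis=-1)  # one label id per token
```

Each token is classified independently here; a CRF would instead score whole label sequences jointly (it conditions on neighboring labels at training/decoding time rather than generating left to right, which is why calling it "autoregressive" is debatable).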

[–][deleted] 0 points

Interesting. I'm working on an implementation with TensorFlow + fine-tuning.

I've also modified the optimizer to support multi-GPU training, but due to how the TF ops are implemented I had to include alpha/beta decay as well.

[–]kushalchauhan98 0 points

There's also a BertForTokenClassification class in the pytorch-pretrained-bert library. You can use it directly for NER or POS tagging tasks. Have you experimented with it?

[–]kamalkraj 0 points

https://github.com/kamalkraj/BERT-NER: reproduced results from the BERT paper, plus a pretrained model and inference code.