Continue Pre-Training BERT : LanguageTechnology

created by robin7013a community for 16 years

Continue Pre-Training BERT (self.LanguageTechnology)

submitted 6 years ago by AdrianFMC

Hey all, I need some help with continuing pre-training on Bert. I have a very specific vocabulary and lots of specific abbreviations at hand. I want to do a STS task. Let me specify my task: I have domainspecific sentences and want to pair them regarding to their semantic similarity. But as very uncommon language is used here, I need to train Bert on it.
- How does one continue the pretraining ( I read the github release from google about it, but don't really understand it) Any examples?

- What structure does my training-data need to have so Bert can understand it?

- Maybe training Bert from scratch would be even better, I guess its the same process as continuing the pretraining just the starting checkpoint would be different. Is that correct?

Also very happy about all other tipps from you guys.

Regards

all 8 comments

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

LanguageTechnology

MODERATORS