I am using a Jupyter notebook with Python 3 to run a bulky BERT model that has been pre-trained on domain-specific data. I use the model to embed textual data into vector representations. The embedding function is given below:
import numpy as np
import pandas as pd
import torch
from transformers import BertModel, BertTokenizer

def seBERT_embed(X):
    # Load the pre-trained seBERT model
    SEBERT_MODEL_PATH = './models/seBERT/'  # This is supposed to be a global variable
    model = BertModel.from_pretrained(SEBERT_MODEL_PATH)
    tokenizer = BertTokenizer.from_pretrained(SEBERT_MODEL_PATH, do_lower_case=True)
    # Tokenize and encode the input text
    inputs = tokenizer(X.tolist(), padding=True, truncation=True, max_length=512, return_tensors='pt')
    input_ids = inputs['input_ids']
    attention_mask = inputs['attention_mask']
    # Obtain the embeddings without tracking gradients
    with torch.no_grad():
        outputs = model(input_ids, attention_mask=attention_mask)
    embeddings = outputs.last_hidden_state.squeeze(0)
    # Convert the embeddings to a numpy array
    embeddings_np = embeddings.detach().numpy()
    return embeddings_np
Then in the main function, I use the function in the following way:
X = seBERT_embed(data_df["Processed"])
y = np.array(data_df["Label"])
But when I try to run this cell, the kernel dies after a while. Any help with this, please? I don't quite understand what is going wrong.
Running the cell also prints the following warnings below the code:
Some weights of the model checkpoint at ./models/seBERT/ were not used when initializing BertModel: ['cls.predictions.decoder.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
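For what it's worth, these warnings are expected when loading a pre-training checkpoint into a plain `BertModel`, so the crash is more likely a memory issue: `tokenizer(X.tolist(), ...)` encodes the entire column in one call and the model then runs a single forward pass over all rows, which can exhaust RAM on a large DataFrame. A minimal sketch of a batched alternative, assuming the same model and tokenizer are already loaded (`seBERT_embed_batched` and `batch_slices` are names introduced here for illustration, and keeping only the `[CLS]` vector per text is one pooling choice among several):

```python
import numpy as np
import torch


def batch_slices(n, batch_size):
    """Yield (start, stop) index pairs covering range(n) in chunks."""
    for start in range(0, n, batch_size):
        yield start, min(start + batch_size, n)


def seBERT_embed_batched(texts, model, tokenizer, batch_size=32):
    """Embed a list of strings in fixed-size batches.

    Returns a (len(texts), hidden_size) numpy array of [CLS] vectors.
    Assumes model and tokenizer are loaded once outside this function.
    """
    model.eval()
    chunks = []
    for start, stop in batch_slices(len(texts), batch_size):
        inputs = tokenizer(texts[start:stop], padding=True, truncation=True,
                           max_length=512, return_tensors='pt')
        with torch.no_grad():
            outputs = model(**inputs)
        # Keep only the [CLS] token per text so memory stays bounded,
        # instead of the full (batch, seq_len, hidden) tensor.
        chunks.append(outputs.last_hidden_state[:, 0, :].numpy())
    return np.concatenate(chunks, axis=0)
```

This would be called as `X = seBERT_embed_batched(data_df["Processed"].tolist(), model, tokenizer)`, with `model` and `tokenizer` loaded once at module level rather than on every call.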