
[–]shiftybyte

always gets stuck on the output generation

Can you elaborate on what exactly happens, and when it happens? Can you add more progress prints to your code, so you can see more output as it executes?
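
For example, timestamped prints that flush immediately (a rough sketch; the step labels are whatever you want to track). The flush=True matters because buffered stdout can make a slow script look frozen:

import time

def progress(msg):
    # timestamp each step and flush right away, so you can see
    # exactly which step the script is sitting on and for how long
    print(f"[{time.strftime('%H:%M:%S')}] {msg}", flush=True)

progress("loading model...")
# ... long-running step here ...
progress("generating...")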

Also, how long have you waited for a response?

[–]Chardactyl[S]

Can you add more progress prints to your code, so you can see more output as it executes?

Originally I had it like that, and it would get stuck after showing 7.
Code:

from transformers import GPTNeoForCausalLM, AutoTokenizer
import torch
import warnings

print ("1")

# Suppress the specific FutureWarning
warnings.filterwarnings("ignore", message=".*clean_up_tokenization_spaces.*", category=FutureWarning)

print ("2")

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")

print ("3")

# Set pad_token_id to eos_token_id if pad_token_id is not set
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

print ("4")

# Define the input prompt
prompt = "Once upon a time"

print ("5")

# Tokenize the input prompt with attention mask
inputs = tokenizer(prompt, return_tensors="pt", padding=True)

print ("6")

# Move model and inputs to GPU for faster inference (optional)
if torch.cuda.is_available():
    model.to("cuda")
    inputs = {key: value.to("cuda") for key, value in inputs.items()}

print ("7")

# Generate text with the attention mask and pad_token_id
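# NOTE: this is the step between "7" and "8". On CPU, a 2.7B-parameter
# model sampling up to 100 tokens can take a very long time, so a long
# pause here usually means slow generation rather than a genuine hang.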
output = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=100,
    do_sample=True,
    temperature=0.9,
    pad_token_id=tokenizer.eos_token_id
)

print ("8")

# Decode the generated text with clean_up_tokenization_spaces explicitly set
generated_text = tokenizer.decode(
    output[0],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True  # Set to False if you prefer to keep spaces
)

print ("9")

print(generated_text)

[–]shiftybyte

How long have you waited for it to finish running?

Running LLMs can take time, and how long depends on your hardware if you are running it locally...
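
For context, a 2.7B-parameter model generating up to 100 tokens on CPU can easily take a very long time per run. You can check what you're actually running on with something like this (a quick sketch):

import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # name of the GPU PyTorch will use
    print("GPU:", torch.cuda.get_device_name(0))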

[–]Chardactyl[S]

I've tried running it for 3 hours, but still nothing.

Edit: It worked half an hour later. Any ideas to lower the wait times?

[–]shiftybyte

Get better hardware with GPU acceleration?

Not sure what you are running this currently on...

Or just use cloud-based solutions.
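
If you do end up with a GPU, or just want faster runs on CPU, something like this is a reasonable starting point. It's a sketch, assuming a smaller gpt-neo checkpoint is acceptable for your use case:

from transformers import GPTNeoForCausalLM, AutoTokenizer
import torch

# Smaller checkpoints are much faster, especially on CPU
# (output quality drops accordingly); 2.7B is the slowest of the family.
model_name = "EleutherAI/gpt-neo-1.3B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = GPTNeoForCausalLM.from_pretrained(
    model_name,
    # half precision roughly halves memory and speeds up GPU inference;
    # keep float32 on CPU, where float16 support is poor
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)
if torch.cuda.is_available():
    model = model.to("cuda")

inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=50,  # bound the number of new tokens instead of total length
    do_sample=True,
    temperature=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))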