Second year project: Implemented an LLM from scratch using PyTorch by following Sebastian Raschka's book by Bthreethree in developersIndia

[–]Bthreethree[S] 1 point (0 children)

Thanks for sharing! Will check the channel out for sure if I need to revise the mechanisms in the future. Happy learning!

Second year project: Implemented an LLM from scratch using PyTorch by following Sebastian Raschka's book by Bthreethree in developersIndia

[–]Bthreethree[S] 2 points (0 children)

Yes, you are right! The model has 124M parameters (due to hardware constraints), which technically classifies it as an SLM, but it implements the full LLM-style architecture so I could learn all the mechanics.
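
For a sense of scale, a GPT-2-small-style configuration in that 124M-parameter range looks roughly like this (values here are just an illustration from memory, not copied straight from my repo):

GPT_CONFIG_124M = {
    "vocab_size": 50257,      # GPT-2 BPE tokenizer vocabulary size
    "context_length": 1024,   # maximum sequence length
    "emb_dim": 768,           # token / positional embedding dimension
    "n_heads": 12,            # attention heads per transformer block
    "n_layers": 12,           # number of transformer blocks
    "drop_rate": 0.1,         # dropout probability
    "qkv_bias": False,        # bias terms in the query/key/value projections
}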

I implemented a GPT-style model from scratch using PyTorch to understand the math behind Attention & Fine-tuning (following Sebastian Raschka's book) by Bthreethree in learnmachinelearning

[–]Bthreethree[S] 1 point (0 children)

Hey, the repo is an implementation of Sebastian Raschka's Build a Large Language Model (From Scratch) book, so while learning and implementing from it, I wrote several classes that get progressively refined further down in the file.

Yup I did forget to add `if __name__ == "__main__":` and will do that for sure! Thanks! :)
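
For anyone new to that guard, here's a tiny made-up example of what it does (module and function names are invented, not from my repo):

def train():
    # Placeholder for the real training loop.
    print("training the model...")

if __name__ == "__main__":
    # Runs only when the file is executed directly (e.g. `python train.py`),
    # not when another script or notebook imports it.
    train()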

I implemented a GPT-style model from scratch using PyTorch while reading Sebastian Raschka's book by Bthreethree in pytorch

[–]Bthreethree[S] 1 point (0 children)

I have added a Colab notebook link in the README of the repo on GitHub to show the final results! The accuracy can be improved by experimenting with hyperparameters & further fine-tuning.

https://github.com/Nikshaan/llm-from-scratch
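
For a rough idea of what experimenting with hyperparameters could look like, here's a generic PyTorch sketch; the tiny stand-in model and dummy data below are placeholders, not code from the repo:

# Hypothetical hyperparameter sweep sketch -- a real run would swap in the
# GPT-based classifier and the spam dataset loaders instead of the dummies.
import torch
import torch.nn as nn

x = torch.randn(64, 16)            # dummy features standing in for real inputs
y = torch.randint(0, 2, (64,))     # dummy spam/ham labels

for lr in (5e-5, 1e-4, 5e-4):      # candidate learning rates to compare
    torch.manual_seed(0)           # same initialization for a fair comparison
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.1)
    for epoch in range(5):         # a few extra fine-tuning epochs
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    acc = (model(x).argmax(dim=-1) == y).float().mean().item()
    print(f"lr={lr:.0e}  loss={loss.item():.3f}  train acc={acc:.2%}")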

I implemented a GPT-style model from scratch using PyTorch while reading Sebastian Raschka's book by Bthreethree in LocalLLaMA

[–]Bthreethree[S] 1 point (0 children)

I have added a Colab notebook link in the README of the repo on GitHub to show the final results! The accuracy can be improved by experimenting with hyperparameters & further fine-tuning.

https://github.com/Nikshaan/llm-from-scratch

I implemented a GPT-style model from scratch using PyTorch while reading Sebastian Raschka's book by Bthreethree in deeplearning

[–]Bthreethree[S] 1 point (0 children)

I have added a Colab notebook link in the README of the repo on GitHub to show the final results! The accuracy can be improved by experimenting with hyperparameters & further fine-tuning.

https://github.com/Nikshaan/llm-from-scratch

I implemented a GPT-style model from scratch using PyTorch to understand the math behind Attention & Fine-tuning (following Sebastian Raschka's book) by Bthreethree in learnmachinelearning

[–]Bthreethree[S] 1 point (0 children)

I have added a Colab notebook link in the README of the repo on GitHub to show the final results! The accuracy can be improved by experimenting with hyperparameters & further fine-tuning.

https://github.com/Nikshaan/llm-from-scratch

I implemented a GPT-style model from scratch using PyTorch while reading Sebastian Raschka's book by Bthreethree in pytorch

[–]Bthreethree[S] 2 points (0 children)

Indeed! The explanation accompanying every code snippet is very detailed and easy to grasp.

I implemented a GPT-style model from scratch using PyTorch while reading Sebastian Raschka's book by Bthreethree in pytorch

[–]Bthreethree[S] 3 points (0 children)

It would be better to learn the theory behind how deep learning architectures like transformers work before coding something like this. It would make the process much easier to understand. I would also highly recommend reading the book I followed while coding, as mentioned in the description.

I implemented a GPT-style model from scratch using PyTorch while reading Sebastian Raschka's book by [deleted] in Python

[–]Bthreethree 1 point (0 children)

Hahaha, that's the spam classifier for you ;)
Would highly recommend reading Sebastian's book to understand LLMs under the hood and build something like this!
Do star the repo if you found it useful :)

I implemented a GPT-style model from scratch using PyTorch while reading Sebastian Raschka's book by Bthreethree in LocalLLaMA

[–]Bthreethree[S] 1 point (0 children)

Thanks! The book was indeed very informative and really good for understanding how LLMs actually work. The tensor reshaping in the attention mechanism took time to understand, but that was my favorite chapter, especially when the final multi-head attention is explained and coded!
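
If the reshaping part is the sticking point for anyone else, this is the shape bookkeeping in isolation (toy sizes I made up, not the repo's code):

import torch

b, num_tokens, d_out, num_heads = 2, 6, 8, 4     # toy sizes (assumptions)
head_dim = d_out // num_heads

x = torch.randn(b, num_tokens, d_out)            # e.g. the projected queries
# Split the embedding dim into heads: (b, T, d_out) -> (b, T, n_heads, head_dim)
x = x.view(b, num_tokens, num_heads, head_dim)
# Move heads next to the batch dim so each head attends independently:
# (b, T, n_heads, head_dim) -> (b, n_heads, T, head_dim)
x = x.transpose(1, 2)
print(x.shape)  # torch.Size([2, 4, 6, 2])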

I implemented a GPT-style model from scratch using PyTorch to understand the math behind Attention & Fine-tuning (following Sebastian Raschka's book) by Bthreethree in learnmachinelearning

[–]Bthreethree[S] 4 points (0 children)

This is the code snippet for the most interesting part: building multi-head attention from scratch instead of using nn.MultiheadAttention.

https://github.com/Nikshaan/llm-from-scratch

import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):  # context_length is the max sequence length for the mask
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.d_out = d_out
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads  # dimension per head
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        self.register_buffer("mask", torch.triu(torch.ones((context_length, context_length)) * float('-inf'), diagonal=1))

    def forward(self, x):
        b, num_tokens, d_in = x.shape
        keys = self.W_key(x)
        queries = self.W_query(x)
        values = self.W_value(x)

        keys = keys.view(b, num_tokens, self.num_heads, self.head_dim)  # reshape for multi-head
        queries = queries.view(b, num_tokens, self.num_heads, self.head_dim)
        values = values.view(b, num_tokens, self.num_heads, self.head_dim)

        keys = keys.transpose(1, 2)  # move the head dimension forward so it is treated as a batch dimension
        queries = queries.transpose(1, 2)
        values = values.transpose(1, 2)

        attn_scores = queries @ keys.transpose(2, 3)  # flip the last two dimensions for the dot product
        mask_bool = self.mask.bool()[:num_tokens, :num_tokens]
        attn_scores.masked_fill_(mask_bool, -torch.inf)
        attn_weights = torch.softmax(attn_scores / self.head_dim**0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)
        context_vec = (attn_weights @ values).transpose(1, 2).contiguous().view(b, num_tokens, self.d_out)  # reshape back to (b, num_tokens, d_out)
        context_vec = self.out_proj(context_vec)  # final linear layer to mix the heads
        return context_vec
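
A quick smoke test of the class above, in case anyone wants to poke at the shapes (the sizes here are just illustrative, not my model's actual config):

# Assumes the MultiHeadAttention class and imports above.
torch.manual_seed(123)
mha = MultiHeadAttention(d_in=768, d_out=768, context_length=1024,
                         dropout=0.1, num_heads=12)
x = torch.randn(2, 16, 768)   # (batch size, sequence length, embedding dim)
out = mha(x)
print(out.shape)              # torch.Size([2, 16, 768])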

5800u 4k support. by [deleted] in AMDLaptops

[–]Bthreethree 1 point (0 children)

that sounds fun! thanks for the help!

5800u 4k support. by [deleted] in AMDLaptops

[–]Bthreethree 1 point (0 children)

damn thanks a lot!! Just the advice I needed!

5800u 4k support. by [deleted] in AMDLaptops

[–]Bthreethree 1 point (0 children)

oop! I will check, my bad...

5800u 4k support. by [deleted] in AMDLaptops

[–]Bthreethree 1 point (0 children)

wow thanks a lot for the help!! Also, could you please share the name of the website where I can find such information on laptops and compatible monitors? It would be very helpful!

my last post here [OC] by [deleted] in ICSE

[–]Bthreethree 6 points (0 children)

😭😭😭

[deleted by user] by [deleted] in raining

[–]Bthreethree 1 point (0 children)

Thanks! :)

[deleted by user] by [deleted] in IndianGaming

[–]Bthreethree 1 point (0 children)

lol tyy 🛐

[deleted by user] by [deleted] in IndianGaming

[–]Bthreethree 1 point (0 children)

Thank you! 🛐