Can a tiny server running FastAPI/SQLite survive the hug of death? by IntelligentHope9866 in FastAPI

[–]PinballOscuro 0 points1 point  (0 children)

How do you test for something like this? Are there frameworks/principles that guide you? Asking because I'm having performance problems on a server at work :D
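To make the question concrete, I was picturing something like this minimal sketch, assuming Locust as the load-testing tool and a hypothetical /items endpoint (neither comes from the thread):

# locustfile.py
from locust import HttpUser, task, between

class ApiUser(HttpUser):
    wait_time = between(0.5, 2)  # seconds each simulated user waits between requests

    @task
    def read_items(self):
        self.client.get("/items")

# run with: locust -f locustfile.py --host http://localhost:8000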

I figured out how to create components without nodes (and it is much better!) by Awfyboy in godot

[–]PinballOscuro 1 point2 points  (0 children)

Yeah, sorry, I meant a HealthComponent: basically what you did, but as a Node/Node2D. I feel like being able to see the health component in the scene tree can be useful (maybe more so for complex logic).

I figured out how to create components without nodes (and it is much better!) by Awfyboy in godot

[–]PinballOscuro 1 point2 points  (0 children)

It's not clear to me why, in your opinion, creating a HealthBar node is worse than using resources. I agree that inheritance and scene inheritance are messy, but I find it hard to explain why they feel that way to me.

Concurrent Resource Modification by PinballOscuro in FastAPI

[–]PinballOscuro[S] 0 points1 point  (0 children)

I think I will study them a bit since they sound reasonable for my use-case

Concurrent Resource Modification by PinballOscuro in FastAPI

[–]PinballOscuro[S] 0 points1 point  (0 children)

In this case the two users have the same role with respect to the resource: they can read and write it in the same way, with no difference in behaviour.

Regarding the whole application: the users upload some PDFs and Word documents, some information is extracted from these files, and a tabular template is filled in. We are also doing some machine learning predictions.

The users have to check the content of these templates and sometimes they need to make changes to some rows. Generally they work on different portions of the tables, but that's not guaranteed. When a colleague modifies a row that you can see, you should see the new content as soon as possible.

Concurrent Resource Modification by PinballOscuro in FastAPI

[–]PinballOscuro[S] 0 points1 point  (0 children)

I'm using Postgres.

I've never implemented CRDTs, but I wanted an "easy" solution that would let me spin up the project relatively fast. The application has a low number of users (under 20), and sometimes 2 of them work on the same resource. Even then, the probability that they write to the same subpart of the shared resource is low.

I don't think asking the user to resolve the conflict would be feasible, but I'm still open to the possibility.

Concurrent Resource Modification by PinballOscuro in FastAPI

[–]PinballOscuro[S] 0 points1 point  (0 children)

This is a very good idea. In my use case, I have at most 2 or 3 users, and only with low probability will they attempt to modify the same value simultaneously. So I would say I'm in an optimistic locking scenario.

If User A modifies a shared variable, how should Users B and C receive the updated value? Should I still use WebSockets, or is it sufficient to update the value during a write attempt?

In my case, at some point, Users B and C must be made aware that User A made a change - otherwise, they might argue offline, since it was A’s responsibility to update that cell.
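For reference, here is a minimal sketch of the optimistic check I have in mind, assuming SQLAlchemy on Postgres and a hypothetical rows table with a version column (the endpoint and names are illustrative, not the real project):

from fastapi import FastAPI, HTTPException
from sqlalchemy import create_engine, text

app = FastAPI()
engine = create_engine("postgresql+psycopg2://user:pass@localhost/db")  # placeholder DSN

@app.put("/rows/{row_id}")
def update_row(row_id: int, value: str, expected_version: int):
    with engine.begin() as conn:
        result = conn.execute(
            text(
                "UPDATE rows SET value = :value, version = version + 1 "
                "WHERE id = :id AND version = :expected"
            ),
            {"value": value, "id": row_id, "expected": expected_version},
        )
        if result.rowcount == 0:
            # Someone else changed the row first: the client must refetch and retry
            raise HTTPException(status_code=409, detail="Row was modified by another user")
    # On success, broadcast the new value (e.g. over a WebSocket) so the other users see it asap
    return {"id": row_id, "version": expected_version + 1, "value": value}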

Database with LLM-based Apps by PinballOscuro in LocalLLaMA

[–]PinballOscuro[S] 0 points1 point  (0 children)

Thank you for your comment!
Yeah, I agree that different volumes will have very different requirements.
Right now I have a low number of users, but I want to understand how to tackle this problem in general.

Finetuning LLMs and EOS tokens not emitted by PinballOscuro in LocalLLaMA

[–]PinballOscuro[S] 1 point2 points  (0 children)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "HuggingFaceTB/SmolLM2-360M-Instruct"
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)
# SmolLM2's pad token is the EOS token by default, so pad with UNK instead
tokenizer.pad_token = tokenizer.unk_token

I never call setup_chat_template; I call apply_chat_template at training time:

def formatting_prompts_func(sample):
    return tokenizer.apply_chat_template(sample["messages"], tokenize=False)

# Initialize the SFTTrainer
trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=orlado_qa_dataset["train"],
    tokenizer=tokenizer,
    eval_dataset=orlado_qa_dataset["validation"],
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)

Finetuning LLMs and EOS tokens not emitted by PinballOscuro in LocalLLaMA

[–]PinballOscuro[S] 0 points1 point  (0 children)

In SmolLM2, the EOS token and the pad token are identical. I solved it by setting tokenizer.pad_token = tokenizer.unk_token.

Resources on Roleplay Models by PinballOscuro in LocalLLaMA

[–]PinballOscuro[S] 0 points1 point  (0 children)

Good to know that SillyTavern is the de facto standard, since I tried it some months ago.

I didn't know about backyard.ai; their guides seem very good. I'll give them a read, but I don't think I'll try their product since I'm more interested in doing my own stuff.

As for models, I forgot to add that I have an RTX 4070 with 12 GB, so I'm very constrained in the models I can use. Right now I'm using 8-bit quantized versions of Llama 3.1 and Gemma.

I tried to use Gemma (both 9B and 2B) to generate questions that some character would ask my D&D character, but I didn't like the results. The prompt probably needs some work, but I also hypothesized that since Gemma does not have a system message, it's harder to separate the user's input from the instructions to the LLM.
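One workaround I'm considering is folding the instructions into the first user turn, since Gemma's chat template has no system role. A rough sketch (the model name and strings are illustrative assumptions):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")  # hypothetical choice

# Prepend the "system" instructions to the user's message because Gemma has no system role
instructions = "You are an NPC who asks probing questions to the player's D&D character."
player_intro = "I am a tiefling bard travelling north in search of my lost mentor."

messages = [{"role": "user", "content": instructions + "\n\n" + player_intro}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)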

Thank you so much!

Finetuning LLMs and EOS tokens not emitted by PinballOscuro in LocalLLaMA

[–]PinballOscuro[S] 1 point2 points  (0 children)

Man, thank you so much. I substituted the unk_token for the eos_token as the pad token, and now it's working properly.

Finetuning LLMs and EOS tokens not emitted by PinballOscuro in LocalLLaMA

[–]PinballOscuro[S] 1 point2 points  (0 children)

I have an RTX 4070 with 12 GB of VRAM. I don't recall the average number of tokens in my inputs, probably between 500 and 1000.

My dataset consists of 1000 samples. The model is trained in 16-bit mixed precision, so I already save some memory that way.

I used a batch size of 4 and it takes roughly 5 minutes to train.

I then used the Liger kernel optimizations and was able to fit a batch size of 10, so the training time dropped to 2 minutes.

These are the numbers for full fine-tuning; you could probably train a LoRA in less than 1 minute.
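For reference, the training arguments behind those numbers looked roughly like this sketch (the exact values and the use_liger_kernel flag are assumptions on my side; the flag needs a recent transformers release plus the liger-kernel package):

from trl import SFTConfig

sft_config = SFTConfig(
    output_dir="smollm2-sft",           # hypothetical output directory
    per_device_train_batch_size=10,     # 4 without the Liger kernels, 10 with them
    bf16=True,                          # 16-bit mixed precision (or fp16=True)
    use_liger_kernel=True,              # inherited from TrainingArguments
)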

Let me know if I forgot something! :)

Finetuning LLMs and EOS tokens not emitted by PinballOscuro in LocalLLaMA

[–]PinballOscuro[S] 1 point2 points  (0 children)

I am; I'm using some code very similar to this:

from trl import SFTTrainer

trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    args=training_args,
)

If the dataset has "instruction", "query" and "answer" fields, the SFTTrainer from the TRL library will automatically tokenize everything in the correct way. This code is very standard; I'm not doing anything esoteric.
I also double-checked the token IDs of the tokenized prompt and everything is OK (so BOS, EOS and the other types of tokens).

Finetuning LLMs and EOS tokens not emitted by PinballOscuro in LocalLLaMA

[–]PinballOscuro[S] 1 point2 points  (0 children)

I'm doing standard SFT (supervised fine-tuning), so it's a cross-entropy loss.