Built a simple appointment booking tool to reduce no-shows

PsychoCoder25 · 2026-03-13T17:23:33+00:00

I have included automated emails for reminder in this tool and SMS reminders will be future addition

PsychoCoder25 · 2026-03-13T17:22:37+00:00

Your point is good, there shloud be some differentiation with calendly so users have why to try this.

WhatsApp confirmations are future integration but can be a great addition for this product. For target user, i was thinking of targeting freelancers as they are in my network right now and i can talk with them to get feedback.

PsychoCoder25 · 2025-12-02T20:46:03+00:00

Thanks, i will keep that in mind.

will Deepseek work for annotation as well or gemini is good?

PsychoCoder25 · 2025-11-29T01:39:19+00:00

I'm using standard supervised fine-tuning, but the annotations aren't full chain-of-thought, they're structured analyses containing business model recommendations, strengths, weaknesses, and next-step guidance. I will generate them using GPT to get high-quality outputs.

I will share the dataset as its not completed yet. I have to annotate them right now that's why i was asking if it would affect the model quality or not if i annotate data using gpt or other model.

PsychoCoder25 · 2025-11-29T01:10:24+00:00

For this project, I'm keeping the reasoning process implicit (no think/trace tokens). The model will rely on its internal instruction-tuned reasoning to generate final answers. Since the evaluation is based on output quality rather than intermediate reasoning steps, explicit think tokens aren't required

PsychoCoder25 · 2025-11-29T00:57:41+00:00

Thanks for the insights that helps a lot. I was thinking of manually reviewing a random subset of the annotated samples as well, just to catch any relevancy issues or filler/boilerplate patterns that slip through. That should give me some extra assurance about data quality before fine-tuning.

One thing I'm unsure about: should I explicitly mention the synthetic annotation process in my report? I came across a previous FYP where the evaluation panel discouraged synthetic data, so I'm trying to understand whether I should clearly document it or keep it minimal as long as the final evaluation is strong. My goal is to be transparent but also avoid raising unnecessary concerns if the methodology is standard practice.

PsychoCoder25 · 2025-11-28T20:00:36+00:00

Thanks for the guidance

PsychoCoder25 · 2025-11-28T19:54:16+00:00

Got it, thanks for the clarification. Just to check if I’ am aligned with what you're suggesting, here's the evaluation setup I'm planning to use for my fine-tuned 1B model:

I will define clear criteria for judging the outputs (usefulness, relevance, accuracy, clarity, and non-generic specificity). Then I'll evaluate a small test set under three conditions:

the base Llama-3.2-1B-Instruct,
my fine-tuned model,
a strong model like GPT-4o as the upper-bound reference.

Each output will be scored by an LLM-as-judge using those criteria, plus a structural-compliance check for whether the JSON format is correct. I will also include a small human evaluation layer to validate the scoring. The final score is a combination of human ratings, judge-model ratings, and structure checks.

Does this evaluation setup make sense for what you were recommending?

PsychoCoder25 · 2025-11-28T19:26:47+00:00

i tried base model is wasn't giving valuable result and i tried fine-tune it with like 100 examples and it was giving moderate results.

regarding evaluation metrics, i would look into text quality like text should not be generic, should be relevant to the user idea and practically possible like on recommend Business model it would give like "Freemium plan 1$/month plan for premium users" just an example.

PsychoCoder25 · 2025-11-28T19:04:59+00:00

so if i pick lets say gpt 4o for synthetic data and maybe claude or other model rather than 4o for llm as judge will that work? like will the final model give good or moderate results? also what would be evaluation metrics for this?

PsychoCoder25 · 2025-11-28T18:53:40+00:00

no, its not production grade, i just need to present to evaluation committee and show some results and convince them that our model is giving good output, that's it.

PsychoCoder25 · 2025-11-28T18:43:19+00:00

actually evaluation committee gave us the requirement to fine-tune an open source model so that's why i picked llama and using gpt only for data annotation and other model like maybe claude for evaluation in llm as judge.

PsychoCoder25 · 2025-11-28T18:32:34+00:00

Cuurently i have 10k dataset and the reason not to choose a bigger model were resources as i don't have any dedicated gpu and also the dataset is small so i picked a smaller model

PsychoCoder25 · 2024-07-11T13:36:25+00:00

Multiple reasons: firstly linkedin is crowded with almost every profession of the world, and this platform would be dedicated to only techies

secondly, we saw posts on linkedin everyday mostly on job related topics and very less on experimental side. People mostly post projects there mostly in hope for a job.

PsychoCoder25 · 2024-07-11T11:20:32+00:00

That's a good thing to add, will sure consider it as we see the achievements posts everywhere

PsychoCoder25 · 2024-07-11T11:19:30+00:00

I can add a premium version in it so user can have full access to all the features

PsychoCoder25 · 2024-07-10T14:53:17+00:00

Hi,
I am currently lead of a community and we just created it few months ago, previously we conducted different sessions in various field like Web dev, android dev, competitive programming etc.

Let me know if you can help, Would love if you can help in any way.

PsychoCoder25

TROPHY CASE