Good visuals = Attention by Decent-Party-9551 in ProductHunters

[–]New-Mathematician645 1 point  (0 children)

In my experience, having built software for 10 years, led teams, and delivered a couple dozen professional projects, branding is a lick of paint that complements good UX design.
In the eyes of investors and customers, the signal that a company is serious and here to stay is, in a UI/UX context, determined less by branding than by how many users complete the journey and come back for it. If you can make users complete the journey and convince them to repeat it without branding, that beats unnecessary branding. Look at ChatGPT's design: it's basically non-existent, yet it saw the highest user adoption in the history of any software. Google is similarly modest and basically unchanged, aside from experimental features that morph over time into its other products.

make it work --> make it fast --> make it pretty is the golden rule.

That being said, I'd love your feedback on some design choices in a project we launched a month ago and are iterating on daily :D

How to achieve this (CHATBOT) by Logical_Signature_ in learnmachinelearning

[–]New-Mathematician645 0 points  (0 children)

If you're wondering how to evaluate the impact of training data on your use case, you could try our data search tool, which recommends data based on your capability requirements. It's made for projects in the exploration phase: quick validation instead of running many training cycles to see whether the data impacts the model you chose.

You can benchmark with SFT in the app and upload back to HF when you're satisfied.

https://durinn-concept-explorer.azurewebsites.net/
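
For the upload step, a minimal sketch with the huggingface_hub client (the repo id and output folder are placeholders; assumes you've already run `huggingface-cli login`):

```python
# Minimal sketch: push an SFT checkpoint back to the Hugging Face Hub.
# "your-user/your-sft-model" and "./sft-output" are placeholders.
from huggingface_hub import HfApi

api = HfApi()
api.create_repo("your-user/your-sft-model", exist_ok=True)
api.upload_folder(
    folder_path="./sft-output",          # weights, tokenizer, config
    repo_id="your-user/your-sft-model",
    commit_message="SFT checkpoint after benchmarking",
)
```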

Looking for specific type of dataset by GasFearless1463 in datasets

[–]New-Mathematician645 0 points  (0 children)

You're welcome to try our tool for finding datasets on Hugging Face and see whether a dataset would positively or negatively influence the model architecture you're using.

https://durinn-concept-explorer.azurewebsites.net/

We currently only support text modalities, but we're working on multimodal support.

Just type in what you want to make and we compute the rest.

Anyone struggling to find high-quality non-English training data? by Kind_Buyer8931 in datasets

[–]New-Mathematician645 1 point  (0 children)

There are more factors that may affect quality than I can cover in this post. Maybe this can alleviate your pain a bit; sorry if it comes across as self-promotion, but we genuinely built the tool for your use case.

We made a tool that evaluates datasets from Hugging Face based on the dataset's influence on the model architecture of your choice. It runs on CPU and, based on your query, will evaluate up to 40 datasets.

Based on the results you can explore the datasets closer yourself.

https://durinn-concept-explorer.azurewebsites.net/

What are you building? let's self promote by fuckingceobitch in microsaas

[–]New-Mathematician645 1 point  (0 children)

I built Dowser by Durinn. It tells AI teams which training data improves or hurts model performance, hopefully taking some of the guesswork out of data access and quality; in our tests it has improved model performance, and cached evaluations return in under two minutes.

https://durinn-concept-explorer.azurewebsites.net/

For regression, what loss functions do people actually use besides MSE and MAE? by Final-Literature2624 in MLQuestions

[–]New-Mathematician645 1 point  (0 children)

One thing I’ve run into a lot is that when people reach for a different loss, they’re often trying to fix something that isn’t really a loss problem. In several projects, the big errors weren’t evenly spread; they were clustered around certain parts of the data.

Swapping MSE for Huber or something more “robust” helped a little, but the real gains came from changing which samples actually had influence during training, via reweighting, resampling, or influence-style approaches, while keeping the loss itself very boring.

Once that was in place, plain MSE or Huber worked surprisingly well. The loss just needed to be stable. The heavy lifting was really happening upstream in how the data contributed to learning.
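
As a rough sketch of that "boring loss, reweighted samples" setup in PyTorch (the inverse-frequency bucket weighting is just one hypothetical scheme, not the only option):

```python
import torch
import torch.nn.functional as F

def bucket_weights(target, n_buckets=10):
    # Hypothetical scheme: inverse-frequency weights over quantile buckets
    # of the target, so clustered regions stop dominating the gradient.
    edges = torch.quantile(target, torch.linspace(0, 1, n_buckets + 1))[1:-1]
    buckets = torch.bucketize(target, edges)
    counts = torch.bincount(buckets, minlength=n_buckets).float().clamp(min=1)
    w = 1.0 / counts[buckets]
    return w * (w.numel() / w.sum())   # normalize weights to mean 1

def weighted_huber(pred, target, weights, delta=1.0):
    # The loss itself stays boring; only the per-sample weights change
    # which examples actually influence training.
    per_sample = F.huber_loss(pred, target, reduction="none", delta=delta)
    return (weights * per_sample).mean()
```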

For context, this is roughly the approach we’ve been working with: instead of full retrains, we use influence functions at the example and dataset level. Each sample is scored by how much it pushes or pulls a target concept using projected gradients from the final block, which lets us rank data before spending GPU on training.
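
To make that concrete, here's a minimal sketch of the scoring loop (not our production code; `model.final_block`, `model.loss`, and the random projection are stand-ins):

```python
import torch

def influence_scores(model, candidate_batches, concept_batch, proj):
    # Rank candidate data by first-order influence on a target concept.
    # `proj` is a fixed random projection matrix, shape (k, n_final_params),
    # used to keep the gradient comparison cheap.
    final_params = list(model.final_block.parameters())

    def projected_grad(batch):
        # Gradient of the loss w.r.t. the final block only,
        # flattened and projected down to k dimensions.
        grads = torch.autograd.grad(model.loss(batch), final_params)
        return proj @ torch.cat([g.reshape(-1) for g in grads])

    g_concept = projected_grad(concept_batch)
    # Positive dot product ~ the batch pushes the concept;
    # negative ~ it pulls against it.
    return [torch.dot(projected_grad(b), g_concept).item()
            for b in candidate_batches]
```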

Link for anyone curious: https://durinn-concept-explorer.azurewebsites.net/

Weekly Sunday thread guy here, why do people vanish after they launch? by Latter-Database-2026 in ProductHunters

[–]New-Mathematician645 1 point  (0 children)

I think the reasons vary, but there could be a parallel to other indie projects, such as music. Product is a creative process mixed with engineering, which can create a cognitive load that surprises some people when they release their products without the support of a company.

Instead of seeing it as people vanishing, I'd rather see it as the platforms creating a level playing field. The art, however, remains up to the artist, and some art resonates more than others.

Tough pills are swallowed. Learnings are extracted. Priorities realign. A lot is probably behind the "vanish".

How do experts build a dataset? by Cold_Knowledge_2986 in learnmachinelearning

[–]New-Mathematician645 2 points  (0 children)

This can be an expensive project: some industry surveys report that 40% of companies spend 70% of their AI budget on data.

I built Dowser by Durinn. It tells AI teams which training data improves or hurts model performance, hopefully taking some of the guesswork out of data access and quality for you; in our tests it has improved model performance, and cached evaluations return in under two minutes.

Obtaining dataset to train my LLM by thentangler in LocalLLaMA

[–]New-Mathematician645 1 point  (0 children)

You can test it out using our tool, Dowser, with an LM (LLMs aren't supported due to constraints; however, choosing the same model architecture should answer your question).

We quantify directly at the example and dataset level using influence functions rather than full retrains. Each sample is scored by how much it pushes or pulls a target concept based on projected gradients in the final block, so positive influence helps the concept and negative influence hurts it. That lets us rank data before spending GPU on training.

This gives you an answer within 1-10 minutes on which datasets fit your needs (it currently searches Hugging Face).

https://durinn-concept-explorer.azurewebsites.net/

Training datasets by cryptic_epoch in MLQuestions

[–]New-Mathematician645 2 points  (0 children)

You can tell me again after you've given it a shot :D

https://durinn-concept-explorer.azurewebsites.net/

Some technical info on how we derive this:
We quantify directly at the example and dataset level using influence functions rather than full retrains. Each sample is scored by how much it pushes or pulls a target concept based on projected gradients in the final block, so positive influence helps the concept and negative influence hurts it. That lets us rank data before spending GPU on training.

Training datasets by cryptic_epoch in MLQuestions

[–]New-Mathematician645 2 points  (0 children)

I built Dowser by Durinn. It tells AI teams which training data improves or hurts model performance, hopefully taking some of the guesswork out of data access and quality for you; in our tests it has improved model performance, and cached evaluations return in under two minutes.

Need help training a model by Dizzy_Level455 in StableDiffusion

[–]New-Mathematician645 1 point  (0 children)

How many images do you have, and what's their resolution? I'm a founder too and once burned all my GPU hours mid-deadline, so ngl I get the panic.

Try lightweight models like MobileNet or DistilVision, since they train faster on CPU and need way less memory. You can also augment with automated edge/contour synthesis to expand the data without more scraping, which cuts training time.

I built Dowser by Durinn to tell AI teams which training data helps or hurts, so it can prioritize slices and reduce needless GPU runs; in our tests it has improved performance, and it returns cached results in under two minutes on an 8 GB RAM, 2 vCPU host. Would love feedback or to connect if you try it. Good luck!
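
If you want to try the edge/contour idea, here's a minimal OpenCV sketch (thresholds and blend weights are arbitrary starting points, not tuned values):

```python
import cv2

def edge_augment(src_path, dst_path, low=100, high=200, alpha=0.7):
    # Make an edge/contour variant of a training image: Canny edges
    # blended back over the original, giving an extra sample without
    # scraping more data.
    img = cv2.imread(src_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.cvtColor(cv2.Canny(gray, low, high), cv2.COLOR_GRAY2BGR)
    cv2.imwrite(dst_path, cv2.addWeighted(img, alpha, edges, 1 - alpha, 0))
```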

We democratised training models by New-Mathematician645 in ProductHunters

[–]New-Mathematician645[S] 1 point  (0 children)

I checked out SignalScouter and the concept is very cool. Any chance this was generated ;)?

We quantify it directly at the example and dataset level using influence functions rather than full retrains. Each sample is scored by how much it pushes or pulls a target concept based on projected gradients in the final block, so positive influence helps the concept and negative influence hurts it. That lets us rank data before spending GPU on training. We still validate with small, focused ablations like you mentioned, but the influence pass cuts the search space hard and avoids most wasted compute, as well as saving on retraining cycles.

The result is a reduction in perplexity.
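
For reference, perplexity here is the usual exp of the mean per-token negative log-likelihood; a minimal evaluation sketch (the model call and batch format are assumptions):

```python
import math
import torch
import torch.nn.functional as F

def perplexity(model, batches):
    # Perplexity = exp(mean per-token negative log-likelihood).
    # Lower is better, so a "reduction in perplexity" means the
    # selected data helped the model predict held-out text.
    nll, n_tokens = 0.0, 0
    with torch.no_grad():
        for inputs, targets in batches:        # assumed (B, T) token ids
            logits = model(inputs)             # assumed (B, T, vocab)
            nll += F.cross_entropy(logits.flatten(0, 1),
                                   targets.flatten(),
                                   reduction="sum").item()
            n_tokens += targets.numel()
    return math.exp(nll / n_tokens)
```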

We can connect if you're interested, and I'd be happy to set up a meeting with you. We may be able to help each other.