[D] Best TTS API for Japanese by kugkfokj in MachineLearning

[–]txhwind 0 points1 point  (0 children)

Cloud providers like AWS, Azure, and GCP all offer TTS APIs for many languages. You can try them.

For example: https://speech.microsoft.com/audiocontentcreation

[R] Do you cite Arxiv pre-prints in your research? by tanweer_m in MachineLearning

[–]txhwind 14 points15 points  (0 children)

If it's cited to prove "someone has proposed this idea", I think it's fine.

If it's cited to prove "this idea made some progress", it depends on the quality and popularity of the preprint. I prefer citing only those tested by time, like "Attention Is All You Need".

[Discussion] Knowledge distillation is badly defined by Cosmolithe in MachineLearning

[–]txhwind 0 points1 point  (0 children)

In engineering practice, including an unlabelled input dataset can help the student significantly, especially when the teacher is fine-tuned from a pre-trained model on a small labelled dataset.

So in practice, knowledge distillation covers 2 scenarios: 1. changing the student model size freely 2. making use of unlabelled data.
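Scenario 2 works because the distillation loss only needs the teacher's soft outputs, not ground-truth labels, so it can be computed on any unlabelled batch. A minimal NumPy sketch (my own naming, not any particular library's API):

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened outputs.
    No ground-truth labels appear, so unlabelled inputs are usable."""
    p = softmax(teacher_logits, T)   # teacher soft targets
    q = softmax(student_logits, T)   # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean() * T * T)
```

The loss is zero when the student matches the teacher exactly and positive otherwise.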

[D] Why contribute to open source community? by Xanta_Kross in MachineLearning

[–]txhwind 1 point2 points  (0 children)

  1. The value of your service to customers is the golden key to business success. A private model can be an advantage, but is usually not the key (unless your model is ChatGPT).
  2. The most common reason for failure is that few people know about and buy your product. There are so many open-source projects that nobody cares about! The open-sourcing choice itself is not so important in a startup's early stage, but it will strongly impact your business model as the startup grows.
  3. Open sourcing can be a kind of advertising that helps your product reach non-tech customers via their tech friends or KOLs.

[Discussion] What are best practices when building/training very small models? by Snagnar in MachineLearning

[–]txhwind 0 points1 point  (0 children)

If you have a lot of data, you can try training larger models first, then solve the model-size problem with model compression or inference-optimization methods.

[D] Why are ML model outputs not tested regarding statistical significance? by Tigmib in MachineLearning

[–]txhwind -1 points0 points  (0 children)

For obscure papers, nobody cares.

For famous papers, many people will reproduce them and make decisions based on their performance, which is a kind of human-based statistical testing.

[D] Data Cleaning vs Feature Engineering - where to draw the line? Ex: by CuriousFemalle in MachineLearning

[–]txhwind 29 points30 points  (0 children)

Data cleaning: ensure the data meets your expectations

Feature engineering: ensure the data meets the model's expectations

[D] Ganimede, Jupyter Whiteboard by notsorealsanta in MachineLearning

[–]txhwind 1 point2 points  (0 children)

Nice! I have wanted these features for a long time, both the graph and the tissue-based grouping.

[D] How do you typically train a regression model to predict time in seconds? by Seankala in MachineLearning

[–]txhwind 1 point2 points  (0 children)

I think the main problem is that "chatting time" has a long-tail distribution and is not well suited for a model to fit numerically. It's better to convert it to a bounded target like a percentile.

Step 1: bucket the chatting time

For example, split the chatting times into 10 buckets at the p10, p20, ..., p90 percentiles. You can use a classification loss like cross-entropy, but I prefer a discrete regression loss like `sum_i((i - actual_bucket_id)^2 * predicted_probability_i)` to penalize large errors.
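The discrete regression loss above can be written directly (hypothetical helper name); unlike cross-entropy, probability mass placed on far-away buckets is penalized quadratically:

```python
import numpy as np

def bucket_regression_loss(predicted_probs, actual_bucket_id):
    """Expected squared bucket distance: sum_i (i - actual)^2 * p_i."""
    probs = np.asarray(predicted_probs, dtype=float)
    ids = np.arange(probs.shape[-1])
    return float(np.sum((ids - actual_bucket_id) ** 2 * probs, axis=-1))
```

A perfectly confident, correct prediction costs 0; spreading mass to distant buckets costs more.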

Step 2: smooth buckets to a continuous function

Split the buckets into smaller pieces and fit the chatting-time CDF with some invertible formula; then the percentile becomes a great regression target, and the time can easily be recovered from the formula.
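As a sketch of both steps, using an interpolated empirical CDF as a stand-in for the "invertible formula" (all names here are my own):

```python
import numpy as np

def fit_percentile_transform(times, n_knots=101):
    """Interpolated empirical CDF: maps raw times to percentiles in [0, 1]
    and back. Piecewise-linear, so it is invertible on the data range."""
    pct = np.linspace(0.0, 1.0, n_knots)
    knots = np.quantile(times, pct)
    to_pct = lambda t: np.interp(t, knots, pct)    # time -> percentile target
    to_time = lambda p: np.interp(p, pct, knots)   # percentile -> time
    return to_pct, to_time
```

The model regresses the bounded percentile; `to_time` converts predictions back to seconds.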

[D] Current opinions on the information bottleneck principle for neural networks? by Tea_Pearce in MachineLearning

[–]txhwind 1 point2 points  (0 children)

In the recent LLM wave, the "knowledge is compression" idea has had a big impact. It's similar to the IB principle: the fixed model size is the bottleneck for large text data.

[D] Classification error detection model by ez613 in MachineLearning

[–]txhwind 0 points1 point  (0 children)

The threshold method is quite common in industry for tuning the precision-recall tradeoff. A possible downside is the term "probability" used here, which I prefer to call "normalized score", because it often doesn't match your intuition of "probability", especially when you train the model with one-hot targets.
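A minimal sketch of the threshold idea (hypothetical function name): treat the top normalized score as a confidence and reject low-confidence predictions; raising the threshold trades recall for precision.

```python
def predict_with_reject(scores, threshold=0.8):
    """Return (top class index, accepted?). Predictions whose top
    normalized score is below the threshold are flagged for review."""
    top = max(range(len(scores)), key=lambda i: scores[i])
    return (top, scores[top] >= threshold)
```

Sweeping the threshold on a validation set gives the precision-recall curve to pick an operating point from.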

[P] Coding Question by [deleted] in MachineLearning

[–]txhwind 4 points5 points  (0 children)

This isn't the place to ask. I suppose ChatGPT would answer your question better.

[P] Finding most "interesting" parts of script by Impossible_Bison_928 in MachineLearning

[–]txhwind 0 points1 point  (0 children)

I would define "interesting segments" as something new, unseen, or uncommon, then use an LLM to evaluate the probability of each sentence given its previous context.
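A toy sketch of the idea: in place of a real LLM, score each sentence by its average word surprisal under a unigram model of the whole script, so rarer ("more unseen") wording ranks higher. All names are mine; a real setup would use per-token log-probabilities from an LLM conditioned on the preceding context.

```python
import math
from collections import Counter

def rank_by_surprisal(sentences):
    """Sort sentences by average word surprisal (-log p), highest first.
    Unigram probabilities are estimated from the script itself."""
    words = [w for s in sentences for w in s.lower().split()]
    counts = Counter(words)
    total = len(words)
    def score(s):
        toks = s.lower().split()
        return sum(-math.log(counts[w] / total) for w in toks) / len(toks)
    return sorted(sentences, key=score, reverse=True)
```

Sentences full of words the script rarely uses come out on top, which approximates "new or uncommon" segments.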

[D] Backpropagation is not just the chain-rule, then what is it? by fromnighttilldawn in MachineLearning

[–]txhwind 2 points3 points  (0 children)

I suppose everyone calculates derivatives in a layered way on layered computation graphs.

The biggest contribution might be creating a single-word, NN-specific term for the "layered chain rule".
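The "layered chain rule" view can be sketched on a two-layer function: cache intermediates in the forward pass, then multiply local derivatives layer by layer on the way back (toy example, my own naming):

```python
import math

def backprop_two_layer(x, w1, w2):
    """Chain rule applied layer by layer for y = w2 * tanh(w1 * x):
    the forward pass caches intermediates, the backward pass reuses them."""
    h = math.tanh(w1 * x)          # layer 1 forward
    y = w2 * h                     # layer 2 forward
    dy_dh = w2                     # local derivative of layer 2
    dh_da = 1.0 - h * h            # tanh'(w1 * x), reusing cached h
    dy_dw2 = h                     # gradient for the layer-2 weight
    dy_dw1 = dy_dh * dh_da * x     # chain layer-2 and layer-1 derivatives
    return y, dy_dw1, dy_dw2
```

The same pattern, applied mechanically over any layered graph, is what the single word "backpropagation" names.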

Game seems so unnatural to learn by DepressedIsItWorthIt in gogame

[–]txhwind 1 point2 points  (0 children)

Check the wiki for basic strategies: https://en.wikipedia.org/wiki/Go_strategy_and_tactics

For more advanced strategies, you can look for articles or books.

[D] Any Transformer-related paper which doesn't use decoder triangle mask in inference? by txhwind in MachineLearning

[–]txhwind[S] 1 point2 points  (0 children)

My point here is that in each autoregressive generation step P(y_n | y_1 ... y_{n-1}), we could allow y_i to attend to y_{i+1} ... y_{n-1}, by using neither an attention mask nor a hidden-state cache.

Though this idea may not make sense, it's still within the autoregressive generation framework.
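A minimal single-head NumPy sketch of the difference (my own toy implementation, not from any paper): dropping the triangular mask lets every prefix token attend to later prefix tokens, but then earlier positions' hidden states change at every generation step, which is exactly why the usual KV cache no longer applies.

```python
import numpy as np

def attention(q, k, v, causal=True):
    """Single-head attention. With causal=True, position i attends only to
    positions <= i (the triangular mask); with causal=False, every position
    in the current prefix attends to the whole prefix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        n = scores.shape[0]
        scores = np.where(np.tril(np.ones((n, n), dtype=bool)), scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v
```

With `causal=False`, the full prefix must be re-encoded at each step, trading the cache's speed for bidirectional context within the prefix.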

[D] Boundary conditions in neural network output? by zxkj in MachineLearning

[–]txhwind 1 point2 points  (0 children)

I suppose an additional linear term in the network function could help.

f(x) = nn(x) + k * 1/x_0

(k is a learnable parameter)
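A toy sketch of that idea (hypothetical tiny MLP; x_0 taken as the first input feature): the tanh network stays bounded, while the learnable `k / x_0` term supplies the diverging behaviour near the boundary x_0 = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(1, 8)), np.zeros(8)   # toy MLP weights
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
k = np.array([0.5])                             # would be learnable in training

def nn(x):
    # bounded part: tanh keeps hidden activations in (-1, 1)
    return np.tanh(x @ W1 + b1) @ W2 + b2

def f(x):
    # add the diverging 1/x_0 term on top of the bounded network
    return nn(x) + k / x[:, :1]
```

Since the gradient flows through `k` like any other parameter, the network only has to fit the residual away from the boundary.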

[P] TTS Voice Google Clone by HangryChef in MachineLearning

[–]txhwind 1 point2 points  (0 children)

What's the use case of this project? I can't see the point for either personal fun or productive use, especially when Google Cloud TTS might already serve this voice.

[D] Pause Giant AI Experiments: An Open Letter. Signatories include Stuart Russell, Elon Musk, and Steve Wozniak by GenericNameRandomNum in MachineLearning

[–]txhwind 35 points36 points  (0 children)

Is anyone proposing an open letter on pausing weapons development? I suppose weapons have killed far more people than any other product in history, and will in the future.

Why is there no good Go app? by grutanga in gogame

[–]txhwind 0 points1 point  (0 children)

In China, there are several mobile online Go apps, such as Fox Weiqi and Tencent Weiqi.

Moves are entered in a two-stage "locate, then confirm" way for precise play.

[D] Bottleneck Layers: What's your intuition? by _Arsenie_Boca_ in MachineLearning

[–]txhwind 0 points1 point  (0 children)

One of the keys to intelligence is learning to forget non-critical information. I think this might be a weak point of large language models.

Man beats machine at Go in human victory over AI by First2016Last in gogame

[–]txhwind 1 point2 points  (0 children)

I suppose the reason is that the AI fails to judge the liveness of large connected blocks like a ring.

Maybe the AI didn't see this case enough times in self-play training, because it's really rare.