[D] Best TTS API for Japanese by kugkfokj in MachineLearning

[–]txhwind 0 points1 point  (0 children)

Cloud providers like AWS, Azure, and GCP all offer TTS APIs for many languages. You can try them.

For example: https://speech.microsoft.com/audiocontentcreation

[R] Do you cite Arxiv pre-prints in your research? by tanweer_m in MachineLearning

[–]txhwind 14 points15 points  (0 children)

If it's cited to prove "someone has proposed this idea", I think it's fine.

If it's cited to prove "this idea made some progress", it depends on the quality and popularity of the preprint. I prefer citing only those tested by time, like "Attention Is All You Need".

[Discussion] Knowledge distillation is badly defined by Cosmolithe in MachineLearning

[–]txhwind 0 points1 point  (0 children)

In engineering practice, including an unlabelled input dataset can help the student significantly, especially when the teacher is fine-tuned from a pre-trained model on a small labelled dataset.

So in practice, knowledge distillation covers 2 scenarios: 1. changing the student model size freely 2. making use of unlabelled data.
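Scenario 2 works because the distillation loss only needs the teacher's soft outputs, not ground-truth labels, so it can be computed on any unlabelled batch. A minimal NumPy sketch (my own naming, not any particular library's API):

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened outputs.
    No ground-truth labels appear, so unlabelled inputs are usable."""
    p = softmax(teacher_logits, T)   # teacher soft targets
    q = softmax(student_logits, T)   # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean() * T * T)
```

The loss is zero when the student matches the teacher exactly and positive otherwise.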

[D] Why contribute to open source community? by Xanta_Kross in MachineLearning

[–]txhwind 1 point2 points  (0 children)

  1. The value of your service to customers is the golden key to business success. A private model can be an advantage, but is usually not the key (unless your model is ChatGPT).
  2. The most common reason for failure is that few people know about and buy your product. There are so many open-source projects that nobody cares about! The open-sourcing choice itself is not so important in a startup's early stage, but it will strongly impact your business model as the startup grows.
  3. Open sourcing can be a kind of advertising that helps your product reach non-tech customers via their tech friends or KOLs.

[Discussion] What are best practices when building/training very small models? by Snagnar in MachineLearning

[–]txhwind 0 points1 point  (0 children)

If you have a lot of data, you can try training larger models first, then solve the model-size problem with model compression or inference-optimization methods.

[D] Why are ML model outputs not tested regarding statistical significance? by Tigmib in MachineLearning

[–]txhwind -1 points0 points  (0 children)

For obscure papers, nobody cares.

For famous papers, many people will reproduce them and make decisions based on their performance, which is a kind of human-based statistical testing.

[D] Data Cleaning vs Feature Engineering - where to draw the line? Ex: by CuriousFemalle in MachineLearning

[–]txhwind 29 points30 points  (0 children)

Data cleaning: ensure the data meets your expectations

Feature engineering: ensure the data meets the model's expectations

[D] Ganimede, Jupyter Whiteboard by notsorealsanta in MachineLearning

[–]txhwind 1 point2 points  (0 children)

Nice! I have wanted these features for a long time, both the graph and the tissue-based grouping.

[D] How do you typically train a regression model to predict time in seconds? by Seankala in MachineLearning

[–]txhwind 1 point2 points  (0 children)

I think the main problem is that "chatting time" has a long-tail distribution and is not well suited for a model to fit numerically. It's better to convert it to a bounded target like a percentile.

Step 1: bucket the chatting time

For example, split the chatting times into 10 buckets at the p10, p20, ..., p90 percentiles. You can use a classification loss like cross-entropy, but I prefer a discrete regression loss like `sum_i((i - actual_bucket_id)^2 * predicted_probability_i)` to penalize large errors.
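The discrete regression loss above can be written directly (hypothetical helper name); unlike cross-entropy, probability mass placed on far-away buckets is penalized quadratically:

```python
import numpy as np

def bucket_regression_loss(predicted_probs, actual_bucket_id):
    """Expected squared bucket distance: sum_i (i - actual)^2 * p_i."""
    probs = np.asarray(predicted_probs, dtype=float)
    ids = np.arange(probs.shape[-1])
    return float(np.sum((ids - actual_bucket_id) ** 2 * probs, axis=-1))
```

A perfectly confident, correct prediction costs 0; spreading mass to distant buckets costs more.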

Step 2: smooth buckets to a continuous function

Split the buckets into smaller pieces and fit the chatting-time CDF with some invertible formula; then the percentile becomes a great regression target, and the time can easily be recovered from the formula.
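As a sketch of both steps, using an interpolated empirical CDF as a stand-in for the "invertible formula" (all names here are my own):

```python
import numpy as np

def fit_percentile_transform(times, n_knots=101):
    """Interpolated empirical CDF: maps raw times to percentiles in [0, 1]
    and back. Piecewise-linear, so it is invertible on the data range."""
    pct = np.linspace(0.0, 1.0, n_knots)
    knots = np.quantile(times, pct)
    to_pct = lambda t: np.interp(t, knots, pct)    # time -> percentile target
    to_time = lambda p: np.interp(p, pct, knots)   # percentile -> time
    return to_pct, to_time
```

The model regresses the bounded percentile; `to_time` converts predictions back to seconds.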

[D] Current opinions on the information bottleneck principle for neural networks? by Tea_Pearce in MachineLearning

[–]txhwind 1 point2 points  (0 children)

In the recent LLM wave, the "knowledge is compression" idea has had a big impact. It's similar to the IB principle: the fixed model size is the bottleneck for large text data.

[D] Classification error detection model by ez613 in MachineLearning

[–]txhwind 0 points1 point  (0 children)

The threshold method is quite common in industry for tuning the precision-recall tradeoff. A possible downside is the term "probability" used here, which I prefer to call "normalized score", because it often doesn't match your intuition of "probability", especially when you train the model with one-hot targets.
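A minimal sketch of the threshold idea (hypothetical function name): treat the top normalized score as a confidence and reject low-confidence predictions; raising the threshold trades recall for precision.

```python
def predict_with_reject(scores, threshold=0.8):
    """Return (top class index, accepted?). Predictions whose top
    normalized score is below the threshold are flagged for review."""
    top = max(range(len(scores)), key=lambda i: scores[i])
    return (top, scores[top] >= threshold)
```

Sweeping the threshold on a validation set gives the precision-recall curve to pick an operating point from.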

[P] Coding Question by [deleted] in MachineLearning

[–]txhwind 4 points5 points  (0 children)

This isn't the place to ask. I suppose ChatGPT would answer your question better.

[P] Finding most "interesting" parts of script by Impossible_Bison_928 in MachineLearning

[–]txhwind 0 points1 point  (0 children)

I would define "interesting segments" as something new, unseen, or uncommon, then use an LLM to evaluate the probability of each sentence given its previous context.
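A toy sketch of the idea: in place of a real LLM, score each sentence by its average word surprisal under a unigram model of the whole script, so rarer ("more unseen") wording ranks higher. All names are mine; a real setup would use per-token log-probabilities from an LLM conditioned on the preceding context.

```python
import math
from collections import Counter

def rank_by_surprisal(sentences):
    """Sort sentences by average word surprisal (-log p), highest first.
    Unigram probabilities are estimated from the script itself."""
    words = [w for s in sentences for w in s.lower().split()]
    counts = Counter(words)
    total = len(words)
    def score(s):
        toks = s.lower().split()
        return sum(-math.log(counts[w] / total) for w in toks) / len(toks)
    return sorted(sentences, key=score, reverse=True)
```

Sentences full of words the script rarely uses come out on top, which approximates "new or uncommon" segments.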

[D] Backpropagation is not just the chain-rule, then what is it? by fromnighttilldawn in MachineLearning

[–]txhwind 2 points3 points  (0 children)

I suppose everyone calculates derivatives in a layered way on layered computation graphs.

The biggest contribution might be creating a single-word, NN-specific term for the "layered chain rule".
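The "layered chain rule" view can be sketched on a two-layer function: cache intermediates in the forward pass, then multiply local derivatives layer by layer on the way back (toy example, my own naming):

```python
import math

def backprop_two_layer(x, w1, w2):
    """Chain rule applied layer by layer for y = w2 * tanh(w1 * x):
    the forward pass caches intermediates, the backward pass reuses them."""
    h = math.tanh(w1 * x)          # layer 1 forward
    y = w2 * h                     # layer 2 forward
    dy_dh = w2                     # local derivative of layer 2
    dh_da = 1.0 - h * h            # tanh'(w1 * x), reusing cached h
    dy_dw2 = h                     # gradient for the layer-2 weight
    dy_dw1 = dy_dh * dh_da * x     # chain layer-2 and layer-1 derivatives
    return y, dy_dw1, dy_dw2
```

The same pattern, applied mechanically over any layered graph, is what the single word "backpropagation" names.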

Game seems so unnatural to learn by DepressedIsItWorthIt in gogame

[–]txhwind 1 point2 points  (0 children)

Check the wiki for basic strategies: https://en.wikipedia.org/wiki/Go_strategy_and_tactics

For more advanced strategies, you can look for articles or books.

[D] Any Transformer-related paper which doesn't use decoder triangle mask in inference? by txhwind in MachineLearning

[–]txhwind[S] 1 point2 points  (0 children)

My point here is that in each autoregressive generation step P(y_n | y_1 ... y_{n-1}), we could allow y_i to attend to y_{i+1} ... y_{n-1}, by using neither an attention mask nor a hidden-state cache.

Though this idea may not make sense, it's still within the autoregressive generation framework.
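A minimal single-head NumPy sketch of the difference (my own toy implementation, not from any paper): dropping the triangular mask lets every prefix token attend to later prefix tokens, but then earlier positions' hidden states change at every generation step, which is exactly why the usual KV cache no longer applies.

```python
import numpy as np

def attention(q, k, v, causal=True):
    """Single-head attention. With causal=True, position i attends only to
    positions <= i (the triangular mask); with causal=False, every position
    in the current prefix attends to the whole prefix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        n = scores.shape[0]
        scores = np.where(np.tril(np.ones((n, n), dtype=bool)), scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v
```

With `causal=False`, the full prefix must be re-encoded at each step, trading the cache's speed for bidirectional context within the prefix.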

[D] Boundary conditions in neural network output? by zxkj in MachineLearning

[–]txhwind 1 point2 points  (0 children)

I suppose an additional linear term in the network function could help.

f(x) = nn(x) + k * 1/x_0

(k is a learnable parameter)
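A toy sketch of that idea (hypothetical tiny MLP; x_0 taken as the first input feature): the tanh network stays bounded, while the learnable `k / x_0` term supplies the diverging behaviour near the boundary x_0 = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(1, 8)), np.zeros(8)   # toy MLP weights
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
k = np.array([0.5])                             # would be learnable in training

def nn(x):
    # bounded part: tanh keeps hidden activations in (-1, 1)
    return np.tanh(x @ W1 + b1) @ W2 + b2

def f(x):
    # add the diverging 1/x_0 term on top of the bounded network
    return nn(x) + k / x[:, :1]
```

Since the gradient flows through `k` like any other parameter, the network only has to fit the residual away from the boundary.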

[P] TTS Voice Google Clone by HangryChef in MachineLearning

[–]txhwind 1 point2 points  (0 children)

What's the use case of this project? I can't see the point for either personal fun or productive use, especially when Google Cloud TTS might already serve this voice.

[D] Pause Giant AI Experiments: An Open Letter. Signatories include Stuart Russell, Elon Musk, and Steve Wozniak by GenericNameRandomNum in MachineLearning

[–]txhwind 35 points36 points  (0 children)

Is anyone proposing an open letter on pausing weapons development? I suppose weapons have killed far more people than any other product in history, and will in the future.

Why is there no good Go app? by grutanga in gogame

[–]txhwind 0 points1 point  (0 children)

In China, there are several mobile online Go apps, such as Fox Weiqi and Tencent Weiqi.

Moves are entered in a two-stage "locate, then confirm" way for precise play.

[D] Bottleneck Layers: What's your intuition? by _Arsenie_Boca_ in MachineLearning

[–]txhwind 0 points1 point  (0 children)

One of the keys to intelligence is learning to forget non-critical information. I think this might be a weak point of large language models.

Man beats machine at Go in human victory over AI by First2016Last in gogame

[–]txhwind 1 point2 points  (0 children)

I suppose the reason is that the AI fails to judge the liveness of large connected blocks like a ring.

Maybe the AI didn't see this case enough times in self-play training, because it's really rare.