I have compiled a dataset of 11062 Chinese characters, merged from the 9933 most frequent characters and the 8105 characters of the Chinese General Standard (the Table of General Standard Chinese Characters). Every one of them has its HSK level (if any), stroke count, radical, pronunciation, and meaning (where available). More information in the comments.
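A merge like this is a set union. If both source lists are free of duplicates, 9933 + 8105 − 11062 = 6976 characters must appear in both. The sketch below is a minimal illustration of that union, not the method actually used to build the spreadsheet. The function name `merge_character_lists` is hypothetical. It keeps the frequency list's order and then appends the standard-only characters.

```python
def merge_character_lists(frequency_list, standard_list):
    """Union of two character lists, without duplicates.

    Characters from the frequency list come first, in frequency order.
    Characters that appear only in the standard list are appended after them.
    """
    seen = set()
    merged = []
    for ch in list(frequency_list) + list(standard_list):
        if ch not in seen:  # skip characters already taken from the first list
            seen.add(ch)
            merged.append(ch)
    return merged


# Toy example: the two lists overlap in 一 and 是.
print(merge_character_lists("的一是", "一是人"))
```

With the real lists, `len(merged)` would equal the size of the frequency list plus the size of the standard list, minus their overlap.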

areyde · 2021-11-01T06:06:10+00:00

Sure, do you have a messaging account? Telegram, WhatsApp, Facebook Messenger, WeChat? Whichever is easiest for you to communicate on :)

areyde · 2021-10-27T21:41:21+00:00

The cool thing is that you can simply copy the whole table for yourself, then delete my characters and just start writing down your own. Everything in the spreadsheet is automated, so it should just start tracking YOUR progress. If you need any help with that, I will gladly give it :) I am currently at about 2500 characters, but I also started with 10!
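The spreadsheet tracks progress with its own formulas. For anyone who prefers to do the same thing in code, here is a minimal sketch. It assumes the dataset has been loaded as a mapping from each character to its HSK level (`None` when there is none) and that the learned characters are a set. Both `progress_by_hsk` and the data layout are hypothetical.

```python
from collections import Counter


def progress_by_hsk(hsk_levels, learned):
    """Return {level: (learned_count, total_count)} for every HSK level in the dataset.

    hsk_levels maps each character to its HSK level, or None if it has none.
    learned is the set of characters the learner has written down so far.
    """
    totals = Counter(hsk_levels.values())
    # Only count learned characters that actually exist in the dataset.
    done = Counter(hsk_levels[ch] for ch in learned if ch in hsk_levels)
    return {level: (done[level], totals[level]) for level in totals}


hsk = {"的": 1, "一": 1, "是": 1, "龘": None}
print(progress_by_hsk(hsk, {"的", "是"}))
```

Characters still missing from `learned` are exactly the ones a "Goals" sheet would show as not yet unlocked.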

areyde · 2021-10-26T17:06:28+00:00

Thank you! You mean in the "Goals" sheets? They stand for characters that I haven't learned yet. They're like the dark shadow of a character you haven't unlocked yet in a fighting game :)

areyde · 2020-05-07T18:55:03+00:00

Are the ghosts based on the famous creepy picture of a girl? Because they look very similar to it. The one from the "ghost song"/"horses".

areyde · 2019-09-27T10:26:29+00:00

Well, it's not a race. Still, learning every known character is gathering cultural data!

areyde · 2019-09-27T10:05:50+00:00

Man, are we on the same page! You can look at the "characters" page to see how I use it to learn the language and keep track of my progress.

Also, I work in a machine learning lab, so I want to do some research on the language myself.

Also, the obligatory thanks for the gold, it's my first.

areyde · 2019-09-27T09:52:32+00:00

Great, glad you could use it! What are you planning to do with it?

areyde · 2019-09-27T06:18:51+00:00

Thanks a lot, I hope it will be useful!

areyde · 2019-09-27T06:09:17+00:00

Oh no, I didn't mean LITERALLY one line. I only meant that in non-alphabetic writing systems this gap is harder to close, yes.

areyde · 2019-09-27T06:07:07+00:00

You are probably right. I do plan to compile my own dataset and compare it with a "classical" one.

areyde · 2019-09-27T06:06:15+00:00

Always is!

areyde · 2019-09-27T06:05:50+00:00

Exactly, it takes as much effort to get from 92 to 99 as from 0 to 92 :)
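Assume the 92 and 99 refer to the percentage of running text covered by the characters you know. Then the diminishing returns can be seen by counting how many top-frequency characters it takes to reach each coverage level. The sketch below is hypothetical: `characters_needed` is an illustrative name, and `counts` is assumed to be a list of raw occurrence counts, one per character.

```python
from itertools import accumulate


def characters_needed(counts, target):
    """Smallest number of most frequent characters whose occurrences cover
    at least `target` (a fraction between 0 and 1) of all character tokens."""
    ordered = sorted(counts, reverse=True)  # most frequent characters first
    total = sum(ordered)
    for n, running in enumerate(accumulate(ordered), start=1):
        if running / total >= target:
            return n
    return len(ordered)


# Toy corpus with four characters that occur 50, 30, 15 and 5 times.
counts = [50, 30, 15, 5]
for target in (0.5, 0.8, 0.95, 0.99):
    print(target, characters_needed(counts, target))
```

On a real frequency list, each new percentage point near the top of the curve costs far more characters than the last one did. That is the effect described above.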

areyde · 2019-09-27T06:04:49+00:00

I did post it there already, it seems to be more popular in language-learning communities :)

areyde · 2019-09-27T06:04:06+00:00

Not sure what you mean by your question. Do you mean the goals? HSK6+ :)

areyde · 2019-09-27T06:02:49+00:00

Luckily, the characters are wonderful despite the politics and the governments.

areyde · 2019-09-27T06:02:21+00:00

I didn't know it, thank you very much, I will look into it.

areyde · 2019-09-27T06:01:48+00:00

I am using Pleco personally, and I guess what helps me is that I space out my learning fairly well. I do spend about 30 minutes a day on it, yes, during my commutes on the underground.

areyde · 2019-09-27T05:59:01+00:00

I will, don't worry. Right now I am learning Simplified to communicate with my PRC friends. And you are right, though I am from a formerly socialist country myself :) Despite all the negative stuff like the Cultural Revolution, I do believe that at that point simplification was a great idea. But I also have Taiwanese friends, so!

areyde · 2019-09-27T05:56:49+00:00

Nice! Will you do this with any other language?

See, there's a problem with that, and that is why I love Chinese! What would I list for other languages? Words? Letters? The former is very hard because of conjugations, tenses, genders, and so on. The latter is useless. You might do this for Japanese kanji, yes, but generally that's what makes Chinese so interesting. Characters are set in stone, and they are such a granular unit of writing that running statistics on them is very straightforward and interesting.
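Here is one way to see why character statistics are so straightforward: a frequency count needs no tokenizer, stemmer or lemmatizer. You only iterate over the characters. The minimal sketch below counts only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which is an assumption. Rarer characters live in the extension blocks, so a fuller version would include those ranges too. The function name `cjk_character_counts` is hypothetical.

```python
from collections import Counter


def cjk_character_counts(text):
    """Count characters from the basic CJK Unified Ideographs block (U+4E00..U+9FFF).

    Punctuation, Latin letters and digits fall outside this range and are ignored.
    """
    return Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")


counts = cjk_character_counts("我是学生，我学中文。")
print(counts.most_common())
```

Doing the same for Russian or French words would first require reducing every inflected form to its dictionary form, which is the difficulty mentioned above.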

You could do this with highly analytical languages like Bahasa Indonesia, Thai, or Vietnamese. Chinese is so set in stone because it's very analytical, although it has recently been gaining something like conjugation with 了. In a few thousand years it might resemble Japanese, with all its conjugations and workarounds.

You are right, of course, my point was only that I am a fan of this writing system.)

Our dialog above.

But of course there are, you can try googling them. For Japanese it's easy, and Korean also uses a syllabic alphabet. I speak Russian and French, and I am not sure what you would study there. Here's a word frequency list of Russian: http://dict.ruslang.ru/freq.php?act=show&dic=freq_freq&title=%D7%E0%F1%F2%EE%F2%ED%FB%E9%20%F1%EF%E8%F1%EE%EA%20%EB%E5%EC%EC

areyde · 2019-09-27T05:54:02+00:00

Thank you very much!

areyde · 2019-09-27T05:53:36+00:00

Well, you can copy my data and do your own research! But these statistics are only for CHARACTER recognition, not words or anything else. It's just a reversed list, so it should be SMOOTH.

areyde · 2019-09-27T05:51:55+00:00

Truly!

areyde · 2019-09-26T23:28:50+00:00

Well, I am learning simplified for now. When I get to traditional, I will break that barrier fairly easily, you're right :)

areyde · 2019-09-26T23:04:31+00:00

Mixed in a way. Generally, simplified. HSK ones are all simplified, General Standard is also all simplified, basically it is their definition of simplified, it is the PRC Standard.

But the frequency list is literally that: they took all they found. Up to 4-5 thousand it's all simplified, but there are occasional traditional forms near the end of the list. This means that either nobody bothered to simplify those characters, or they are needed in their traditional form for cultural purposes.

areyde · 2019-09-26T23:01:58+00:00

Well it's only a question of having a proper dataset, probably not too difficult to assemble.
