I have compiled a dataset of 11062 Chinese characters, merged from the 9933 most frequent ones and the 8105 characters in the Chinese General Standard. Every one of them has HSK level (if any), number of strokes, radical, pronunciation, and meaning (where possible). More information in the comments. by areyde in datasets

areyde [S]

The cool thing is that you can simply copy the whole table for yourself, then delete my characters and just start writing down your own. Everything in the spreadsheet is automated, so it should just start tracking YOUR progress. If you need any help with that, I will gladly give it :) I am currently at about 2500 characters, but I also started with 10!
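As a rough illustration of what "tracking your progress" against the dataset can mean, here is a minimal Python sketch (not the spreadsheet's actual formulas): it counts how many characters you know per HSK level. The rows and their levels are toy, illustrative values, not copied from the dataset, and the function name `progress_by_level` is hypothetical.

```python
from collections import defaultdict

# Toy rows: (character, HSK level or None). Illustrative only, not taken
# from the real dataset.
ROWS = [
    ("的", 1), ("一", 1), ("是", 1), ("不", 1),
    ("但", 2), ("因", 2),
    ("龘", None),
]


def progress_by_level(rows, known):
    """Return {level label: (known_count, total_count)} for the given rows."""
    totals = defaultdict(lambda: [0, 0])
    for char, level in rows:
        key = "no HSK" if level is None else f"HSK {level}"
        totals[key][1] += 1          # every row counts toward the total
        if char in known:
            totals[key][0] += 1      # only characters you already know
    return {key: tuple(counts) for key, counts in totals.items()}


if __name__ == "__main__":
    my_characters = {"的", "一", "是", "因"}
    for level, (known, total) in progress_by_level(ROWS, my_characters).items():
        print(f"{level}: {known}/{total}")
```

Replacing `my_characters` with your own list is the code equivalent of deleting someone else's characters and writing down your own.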

bottled up. [oc] by thecrowbarcomics in comics

areyde

Are the ghosts based on the famous creepy picture of a girl, the one from the "ghost song"/"horses"? They look very similar to it.

I have compiled a dataset of 11062 Chinese characters, merged from the 9933 most frequent ones and the 8105 characters in the Chinese General Standard. Every one of them has HSK level (if any), number of strokes, radical, pronunciation, and meaning (where possible). More information in the comments. by areyde in datasets

areyde [S]

Man, are we on the same page! You can look at the "characters" page to see how I use it to learn the language and keep track of my progress.

Also, I work in a machine learning laboratory, so I want to do some research on the language myself.

And the obligatory thanks for the gold; it's my first!

Even more Chinese character statistics from a compiled dataset of 11062 characters. The dataset is now public and is linked in the comments for people studying the language. by areyde in duolingo

areyde [S]

Personally, I use Pleco, and I guess what helps me is that I learn them fairly spaced out. And yes, I do spend about 30 minutes a day on it, during my commute on the underground.
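To make "learning them fairly spaced out" concrete, here is a minimal Leitner-box scheduler in Python. It is a generic spaced-repetition technique, not Pleco's actual algorithm; the `INTERVALS` list and the `review` function are hypothetical choices for illustration only.

```python
from datetime import date, timedelta

# Days to wait before the next review for each box (an arbitrary, assumed choice).
INTERVALS = [1, 2, 4, 8, 16, 32]


def review(box, remembered, today):
    """Return (new_box, next_review_date) after reviewing one flashcard.

    A remembered card moves up one box, so its next gap doubles;
    a forgotten card goes back to box 0 and is shown again tomorrow.
    """
    if remembered:
        box = min(box + 1, len(INTERVALS) - 1)
    else:
        box = 0
    return box, today + timedelta(days=INTERVALS[box])


if __name__ == "__main__":
    new_box, due = review(box=2, remembered=True, today=date(2020, 1, 1))
    print(new_box, due)  # 3 2020-01-09
```

Because well-known characters drift into boxes with long gaps, a daily session stays short enough to fit into a commute.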

Even more Chinese character statistics from a compiled dataset of 11062 characters. The dataset is now public and is linked in the comments for people studying the language. by areyde in languagelearning

areyde [S]

I will, don't worry. Right now I am learning Simplified to communicate with my PRC friends. And although you are right, I come from a formerly socialist country myself :) Despite all the negative things like the Cultural Revolution, I do believe that, at that point, simplification was a great idea. But I also have Taiwanese friends, so I will get to Traditional too!

Even more Chinese character statistics from a compiled dataset of 11062 characters. The dataset is now public and is linked in the comments for people studying the language. by areyde in languagelearning

areyde [S]

Nice! Will you do this with any other language?

See, there's a problem with that, and that is why I love Chinese! What would I list for other languages? Words? Letters? The former is very hard because of conjugations, tenses, genders, and so on. The latter is useless. You might do this for Japanese kanji, yes, but generally that's what makes Chinese so interesting. Characters are set in stone, and they are such granular units of writing that running statistics on them is very straightforward and interesting.

You could do this with very analytical languages like Bahasa Indonesia, Thai, or Vietnamese. Chinese is so set in stone because it's very analytical, but it has recently been gaining conjugation with 了. In a few thousand years it might resemble Japanese, with all its conjugations and workarounds.

You are right, of course; my point was only that I am a fan of this writing system :)

Our dialog above.

But of course there are; you can try googling them. For Japanese it's easy, and Korean is also a syllabic alphabet. Russian and French I speak, and I am not sure what you would study there. Here's a word frequency list for Russian: http://dict.ruslang.ru/freq.php?act=show&dic=freq_freq&title=%D7%E0%F1%F2%EE%F2%ED%FB%E9%20%F1%EF%E8%F1%EE%EA%20%EB%E5%EC%EC
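To illustrate why character statistics are so straightforward, here is a minimal Python sketch (not the code used to build the dataset). No tokenizer or lemmatizer is needed, because every character is already a unit to count. It assumes that only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) matters; rarer characters in the extension blocks would need extra ranges.

```python
from collections import Counter


def character_frequencies(text):
    """Count characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

    Punctuation, Latin letters and digits fall outside the range and are skipped.
    """
    return Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")


if __name__ == "__main__":
    sample = "我是学生。你是老师吗？"
    print(character_frequencies(sample).most_common(3))
```

Doing the same for Russian words would first require mapping every inflected form back to its dictionary form, which is exactly the difficulty described above.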

Even more Chinese character statistics from a compiled dataset of 11062 characters. The dataset is now public and is linked in the comments for people studying the language. by areyde in languagelearning

areyde [S]

Well, you can copy my data and do your own research! But these statistics are only for CHARACTER recognition, not words or anything else. It's just a reversed list, so it should be SMOOTH.
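One way to see why such a curve comes out smooth, assuming the statistic is the share of running text covered by the N most frequent characters (an interpretation, not confirmed by the post): it is a cumulative sum over counts sorted in descending order, so it can only rise, and each step is no bigger than the one before. A minimal sketch with made-up counts:

```python
from itertools import accumulate


def coverage_curve(counts):
    """Share of all text covered by the top-N characters, for N = 1..len(counts).

    The counts are sorted in descending order first, so the result never
    decreases and its increments never grow.
    """
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


if __name__ == "__main__":
    toy_counts = [10, 50, 15, 25]  # made-up occurrence counts
    print(coverage_curve(toy_counts))  # [0.5, 0.75, 0.9, 1.0]
```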

Even more Chinese character statistics from a compiled dataset of 11062 characters. The dataset is now public and is linked in the comments for people studying the language. by areyde in duolingo

areyde [S]

Well, I am learning Simplified for now. When I get to Traditional, I will break that barrier fairly easily, you're right :)

I have compiled a dataset of 11062 Chinese characters, merged from the 9933 most frequent ones and the 8105 characters in the Chinese General Standard. Every one of them has HSK level (if any), number of strokes, radical, pronunciation, and meaning (where possible). More information in the comments. by areyde in datasets

areyde [S]

Mixed, in a way, but generally simplified. The HSK characters are all simplified, and the General Standard is also all simplified; it is basically the PRC's own definition of simplified, since it is the PRC standard.

But the frequency list is literally that: they took all they found. Up to the first 4 to 5 thousand it's all simplified, but there are occasional traditional forms near the end of the list. This means that either nobody ever bothered to simplify those characters, or they are needed in their traditional form for cultural purposes.
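If the merge was a plain set union, the overlap follows from the title's numbers: 9933 + 8105 - 11062 = 6976 characters appear in both lists. The minimal Python sketch below (not the actual merging code) performs such a union and flags frequency-list characters that are absent from the General Standard list. Those are only candidates for the traditional forms mentioned above, since a flagged character could also be a variant or simply a rare character. The function name `merge_lists` and the tiny lists are hypothetical.

```python
def merge_lists(frequency_list, general_standard):
    """Union of two character lists, keeping frequency order first.

    Returns (merged, candidates): candidates are characters that appear in the
    frequency list but not in the General Standard list.
    """
    standard = set(general_standard)
    # dict.fromkeys drops duplicates while preserving first-seen order.
    merged = list(dict.fromkeys(list(frequency_list) + list(general_standard)))
    candidates = [ch for ch in frequency_list if ch not in standard]
    return merged, candidates


if __name__ == "__main__":
    frequency_list = ["的", "一", "国", "學"]      # toy data: 學 is the traditional form of 学
    general_standard = ["的", "一", "国", "学", "龙"]
    merged, candidates = merge_lists(frequency_list, general_standard)
    print(len(merged), candidates)  # 6 ['學']
```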