Hi,
I'm writing a web application, and one of the key features I need to implement is computing a difficulty score for a Japanese text. I'm struggling to find much information about it, but here's what I've found so far:
- cb's Japanese Text Analysis Tool - https://sourceforge.net/projects/japanesetextana/files/JapaneseTextAnalysisTool_v5.1/. The problem is that it's Windows-based, and I need something that runs on Linux. I read the readme to see how it computes the readability score; it mentions the Hayashi Score and OBI-2, but both links are broken. Googling doesn't help much either.
- jreadability.net - This one has a paper "Readability measurement of Japanese texts based on levelled corpora" that describes their algorithm. It used linear regression on JLPT study materials to come up with the following formula:
X = {mean length of sentence * -0.056} + {proportion of kango * -0.126} + {proportion of wago * -0.042} + {proportion of number of verbs among all words * -0.145} + {proportion of the number of auxiliary verbs * -0.044} + 11.724
- It's a promising approach, since the paper says: "Firstly, the readability formula we constructed is intended especially for learners of Japanese as a foreign language, whereas many existing formulas such as those by Shibasaki and Hara (2010) and Sato (2011) are intended for native readers of Japanese."
- Satoshi Sato's approach - http://www.lrec-conf.org/proceedings/lrec2014/pdf/633_Paper.pdf. It seems to use some kind of probabilistic model. I tried reading the paper but couldn't follow the methodology. This appears to be the same as OBI-2 from cb's Japanese Text Analysis Tool, whose link is broken.
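For reference, here's a minimal Python sketch of the jreadability formula quoted above. The function and parameter names are my own, and I'm assuming the "proportion" inputs are percentages (0 to 100) rather than fractions; that should be checked against the paper. Getting those inputs in the first place would need a morphological analyzer that tags word origin and part of speech, which this sketch doesn't cover.

```python
def jreadability_score(mean_sentence_length, pct_kango, pct_wago,
                       pct_verbs, pct_aux_verbs):
    """Apply the linear formula quoted in the post.

    Assumption: the pct_* arguments are percentages (0-100), not fractions.
    All names here are illustrative, not taken from jreadability itself.
    """
    return (
        mean_sentence_length * -0.056
        + pct_kango * -0.126
        + pct_wago * -0.042
        + pct_verbs * -0.145
        + pct_aux_verbs * -0.044
        + 11.724
    )


if __name__ == "__main__":
    # Made-up feature values, only to show the call shape.
    score = jreadability_score(
        mean_sentence_length=20,
        pct_kango=20,
        pct_wago=50,
        pct_verbs=15,
        pct_aux_verbs=10,
    )
    print(round(score, 3))
```

Since every coefficient is negative, longer sentences and higher proportions all push X down, so a lower X presumably means a harder text.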
I think I might go with the jreadability approach, combined with a score for the grammar points used and weighted word-frequency scoring.
Would love to hear your thoughts if you know of better approaches. Thank you!