These statistics are generated programmatically with several tools. Most of the tools were made for this purpose; some are generic. For more information on the tools, see the Making stats page. For frequency lists, see the Frequency lists page.

Add requests to this page: Requests

As of 2017-09-28, the analyzer stats (lexeme count, coverage) deduplicate lines longer than five UTF-16 code units. See also: Duplication
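
The deduplication rule above can be sketched roughly as follows. This is only an illustration, not the actual tool: the function names are hypothetical, and it assumes that the first occurrence of a long line is kept, that lines are compared exactly, and that lines of five UTF-16 code units or fewer are never removed.

```python
def utf16_len(line: str) -> int:
    # Length in UTF-16 code units: characters outside the BMP count as two.
    return len(line.encode("utf-16-le")) // 2


def dedupe_long_lines(lines, threshold=5):
    # Assumption: keep the first occurrence of each line longer than
    # `threshold` UTF-16 code units; shorter lines are always kept.
    seen = set()
    kept = []
    for line in lines:
        if utf16_len(line) > threshold:
            if line in seen:
                continue
            seen.add(line)
        kept.append(line)
    return kept
```

With this sketch, a short line such as はい survives every repetition, while a longer repeated line is counted once.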

If you want to edit the stats here, please use the tools presented on the Making stats page for the sake of consistency. If you can't use them, find a friend or acquaintance who can. Above all, do not edit the stats here directly based on manual statistical analysis.

Scroll down to the bottom of the page for explanations of what the columns mean.

Higher % values mean an easier script. Higher Hayashi values also mean an easier script.

Click a column header to sort by it.

See Abbreviations for the original title of ambiguous abbreviations.

See also: https://tlwiki.org/index.php?title=VN/Eroge_Script_sizes

script name | kanji (unique) | kanji (2+) | chars per line | characters | lexemes | BCCWJ 5k | VN 5k | VN 5k sans 50 | VNFreqList 90% target | VNFreqList 95% target | Core6k | Core6k sans 250 | chars per sentence | Hayashi | modified Hayashi | modified Hayashi 2 | JIS MiB | JIS MiB (deduplicated)
aete mushisuru | 1988 | 1725 | 16.3762 | 543761 | 290898 | 82.87% | 90.20% | 90.75% | 4857.00 | 10056.17 | 82.16% | 84.44% | 10.23 | 79.29 | 75.43 | 94.98 | 1.3M | 1.2M
ao no kanata | 2130 | 1847 | 22.9081 | 938614 | 509174 | 86.15% | 90.93% | 91.15% | 4533.39 | 9606.85 | 84.99% | 87.27% | 12.06 | 77.03 | 78.86 | 100.03 | 2.1M | 2.0M
astelight shuushuubako | 2983 | 2598 | 57.2095 | 518364 | 293103 | 70.80% | 77.57% | 78.04% | 13961.50 | 23754.50 | 69.22% | 71.05% | 27.62 | 63.27 | 54.36 | 75.20 | 1.2M | 1.2M
axanael | 1862 | 1594 | 15.4847 | 485875 | 212618 | 83.38% | 88.03% | 88.03% | 6344.42 | 12880.85 | 81.28% | 85.36% | 7.28 | 78.51 | 71.61 | 105.74 | 1.1M | 1.1M
biman1 | 1751 | 1395 | 27.6612 | 197611 | 105079 | 73.67% | 86.70% | 90.01% | 7441.70 | 19174.80 | 73.81% | 81.31% | 8.67 | 74.04 | 66.39 | 96.25 | 440K | 432K
chronobox trials | 1266 | 915 | 23.9832 | 66548 | 36574 | 82.21% | 92.03% | 92.36% | 3887.00 | 8404.33 | 80.74% | 86.13% | 7.82 | 79.95 | 86.98 | 97.01 | 160K | 156K
chronobox | 1986 | 1738 | 25.8646 | 521122 | 287479 | 79.74% | 89.70% | 90.32% | 5225.73 | 11377.64 | 77.72% | 84.10% | 7.67 | 77.31 | 81.83 | 96.55 | 1.3M | 1.2M
cloverpoint | 2259 | 2005 | 23.7604 | 906755 | 484562 | 81.55% | 87.99% | 89.02% | 6111.90 | 12330.99 | 80.24% | 83.15% | 9.57 | 76.06 | 76.10 | 97.29 | 2.1M | 2.0M
daitoshokan fandisk | 2036 | 1761 | 18.9773 | 496826 | 271289 | 82.47% | 89.81% | 90.84% | 5112.39 | 10884.55 | 81.58% | 84.74% | 11.70 | 75.63 | 77.29 | 93.86 | 1.1M | 1.1M
daitoshokan | 2313 | 2058 | 18.6452 | 1161599 | 635527 | 84.20% | 90.10% | 90.74% | 4935.82 | 10726.43 | 82.62% | 85.52% | 12.15 | 75.33 | 77.94 | 92.81 | 2.6M | 2.5M
dies irae | 2799 | 2570 | 32.8806 | 1812969 | 958671 | 80.13% | 86.39% | 86.39% | 7119.25 | 12654.05 | 77.36% | 79.44% | 17.03 | 70.99 | 66.98 | 89.46 | 3.9M | 3.7M
dracuriot | 2219 | 1949 | 27.3986 | 1265085 | 686513 | 85.46% | 91.66% | 92.16% | 4017.68 | 8463.46 | 82.99% | 86.55% | 10.61 | 77.97 | 78.22 | 99.85 | 2.9M | 2.8M
eustia | 2455 | 2216 | 17.8062 | 1052020 | 573641 | 81.82% | 87.99% | 89.66% | 6700.13 | 14335.12 | 80.17% | 84.27% | 11.88 | 77.03 | 72.77 | 95.02 | 2.3M | 2.2M
fate stay night | 2559 | 2330 | 26.1082 | 1861769 | 965644 | 80.88% | 86.31% | 88.16% | 6925.82 | 14001.00 | 78.59% | 83.65% | 14.64 | 73.38 | 63.75 | 90.06 | 4.1M | 3.7M
flowers1+2+3 | 2413 | 2180 | 27.1889 | 893604 | 519852 | 81.58% | 87.56% | 88.08% | 6723.57 | 12557.15 | 80.55% | 83.20% | 14.46 | 72.30 | 64.66 | 87.41 | 1.9M | 1.9M
flowers1 | 1980 | 1692 | 27.0263 | 330543 | 191812 | 83.80% | 89.81% | 90.26% | 5179.60 | 10860.95 | 83.59% | 85.63% | 13.29 | 73.86 | 71.02 | 90.08 | 724K | 712K
flowers2 | 2033 | 1734 | 27.0760 | 269271 | 155449 | 80.07% | 85.99% | 87.22% | 7628.50 | 13546.45 | 78.53% | 82.85% | 15.16 | 70.90 | 61.11 | 86.31 | 584K | 576K
flowers3 | 2085 | 1809 | 27.4825 | 293790 | 173126 | 80.53% | 86.52% | 87.10% | 7410.93 | 13599.40 | 79.07% | 83.15% | 15.32 | 71.91 | 60.74 | 85.44 | 636K | 628K
flyable heart | 1850 | 1632 | 20.9062 | 864065 | 462182 | 89.29% | 94.20% | 94.20% | 2637.03 | 5824.33 | 87.32% | 90.06% | 11.52 | 81.91 | 92.23 | 102.48 | 1.9M | 1.8M
fortunearterial | 2187 | 1946 | 15.6327 | 949197 | 509828 | 83.69% | 90.75% | 91.50% | 4540.99 | 8834.22 | 82.52% | 85.81% | 11.34 | 78.89 | 84.29 | 97.44 | 2.1M | 2.0M
fureraba | 2084 | 1854 | 22.7575 | 1055501 | 566742 | 86.76% | 92.18% | 92.18% | 3716.59 | 8033.06 | 85.29% | 87.16% | 10.29 | 77.43 | 75.35 | 99.08 | 2.4M | 2.3M
futsuu no fantasy | 2019 | 1784 | 16.5476 | 801501 | 426094 | 81.45% | 89.12% | 91.10% | 5579.39 | 10326.20 | 79.04% | 83.29% | 10.44 | 78.45 | 77.16 | 98.46 | 1.8M | 1.7M
gensou no idea | 2482 | 2227 | 29.0149 | 865797 | 487206 | 81.87% | 86.89% | 87.19% | 6920.11 | 12587.48 | 79.78% | 81.91% | 13.64 | 77.20 | 69.51 | 92.38 | 1.9M | 1.9M
hanachirasu | 2151 | 1746 | 27.5822 | 154411 | 88192 | 77.10% | 80.98% | 81.92% | 10658.90 | 19259.95 | 74.50% | 77.11% | 17.13 | 72.73 | 74.77 | 80.07 | 344K | 340K
hanahira | 948 | 655 | 16.6173 | 51485 | 26072 | 85.96% | 91.70% | 92.69% | 3942.58 | 9089.10 | 86.93% | 89.32% | 8.90 | 83.48 | 92.80 | 109.85 | 124K | 120K
hatsukoi yohou | 1835 | 1587 | 24.2353 | 646450 | 359353 | 86.37% | 92.66% | 94.14% | 3263.80 | 7387.97 | 85.57% | 89.48% | 8.69 | 79.23 | 82.31 | 96.70 | 1.5M | 1.5M
henkoi | 1898 | 1603 | 30.5540 | 568091 | 309063 | 83.82% | 90.39% | 92.31% | 4799.56 | 11636.00 | 81.67% | 87.00% | 8.11 | 75.54 | 73.49 | 99.67 | 1.4M | 1.4M
hoshimemo | 2081 | 1865 | 18.2732 | 977540 | 540886 | 83.07% | 87.48% | 89.39% | 6324.05 | 11601.30 | 81.42% | 86.97% | 11.35 | 81.14 | 87.04 | 100.12 | 2.0M | 2.0M
iinchou shounin | 2052 | 1790 | 22.3580 | 631001 | 338262 | 83.41% | 89.51% | 91.40% | 5355.55 | 11925.86 | 82.35% | 87.02% | 7.61 | 79.35 | 77.98 | 99.89 | 1.5M | 1.5M
imakoi | 1447 | 1158 | 27.9471 | 159416 | 84185 | 81.00% | 92.53% | 93.37% | 3783.05 | 7014.05 | 79.32% | 84.64% | 11.63 | 80.79 | 70.53 | 100.81 | 336K | 336K
inganock | 2066 | 1822 | 26.5465 | 438336 | 231298 | 82.27% | 85.62% | 87.12% | 7635.67 | 15164.79 | 78.75% | 84.08% | 10.61 | 75.52 | 63.57 | 86.62 | 948K | 880K
itsusora | 2466 | 2185 | 21.8124 | 646806 | 357207 | 82.66% | 88.10% | 88.48% | 6217.43 | 12140.65 | 80.62% | 83.38% | 12.63 | 82.27 | 89.75 | 94.14 | 1.4M | 1.4M
jingai makyou | 2496 | 2235 | 27.4848 | 906760 | 453890 | 80.23% | 87.40% | 87.98% | 6595.99 | 12952.82 | 79.86% | 82.18% | 14.52 | 73.92 | 73.86 | 88.72 | 2.0M | 1.8M
kagerou | 2939 | 2601 | 43.2278 | 853370 | 490409 | 72.12% | 80.28% | 83.31% | 13426.93 | 20697.60 | 70.13% | 75.13% | 25.96 | 73.15 | 84.25 | 86.36 | 1.9M | 1.9M
kajiri akebono | 2862 | 2573 | 37.1914 | 1128495 | 627039 | 78.30% | 83.33% | 84.70% | 9156.55 | 16919.95 | 74.85% | 78.23% | 19.09 | 81.44 | 85.32 | 86.51 | 2.5M | 2.4M
kamimaho | 2161 | 1947 | 23.0001 | 969307 | 528706 | 83.99% | 90.58% | 91.98% | 4607.40 | 9118.41 | 82.38% | 86.31% | 15.30 | 77.13 | 85.20 | 92.65 | 2.1M | 2.1M
katahane | 1866 | 1662 | 24.3593 | 694494 | 335476 | 86.16% | 89.81% | 91.35% | 5137.53 | 10194.10 | 84.12% | 87.64% | 11.16 | 79.93 | 71.30 | 100.92 | 1.5M | 1.4M
kaziklu | 2071 | 1728 | 38.8561 | 225348 | 123390 | 80.84% | 86.82% | 87.03% | 6825.20 | 11773.65 | 77.60% | 80.91% | 20.81 | 70.55 | 65.37 | 94.41 | 476K | 472K
kimikoe | 2223 | 1983 | 19.4767 | 789527 | 428893 | 83.46% | 89.17% | 89.90% | 5567.55 | 9687.60 | 83.21% | 85.26% | 10.46 | 79.05 | 79.94 | 96.03 | 1.8M | 1.7M
leyline1+2+3 | 2020 | 1797 | 22.8403 | 1210036 | 675300 | 85.92% | 92.91% | 93.50% | 3297.66 | 7004.95 | 83.54% | 88.14% | 10.78 | 81.54 | 88.31 | 98.27 | 2.7M | 2.6M
leyline1 | 1597 | 1326 | 21.5464 | 353024 | 196110 | 86.66% | 93.00% | 94.78% | 3211.65 | 7027.22 | 83.94% | 89.29% | 10.31 | 84.43 | 94.05 | 99.87 | 796K | 776K
leyline2 | 1749 | 1474 | 23.2131 | 453826 | 254509 | 86.25% | 92.89% | 93.17% | 3294.32 | 6919.88 | 83.94% | 88.68% | 11.09 | 80.98 | 87.24 | 98.41 | 1016K | 988K
leyline3 | 1662 | 1444 | 23.6588 | 403186 | 226688 | 84.94% | 92.87% | 93.17% | 3371.52 | 7027.07 | 82.77% | 87.64% | 10.86 | 80.12 | 84.34 | 96.78 | 908K | 892K
magical charming | 2119 | 1804 | 21.1496 | 699095 | 374762 | 85.91% | 91.32% | 92.23% | 4061.62 | 9145.35 | 83.97% | 86.63% | 8.85 | 78.52 | 83.03 | 105.02 | 1.6M | 1.6M
majokoi | 1971 | 1728 | 18.5559 | 629301 | 325629 | 86.89% | 91.59% | 91.59% | 3944.91 | 7982.30 | 84.94% | 87.30% | 9.44 | 82.25 | 95.66 | 107.73 | 1.5M | 1.3M
muramasa | 3071 | 2797 | 17.8912 | 1400558 | 739546 | 77.14% | 80.59% | 81.31% | 11507.12 | 20317.17 | 73.95% | 76.18% | 10.84 | 72.65 | 68.82 | 80.45 | 3.1M | 2.9M
nanarin | 1852 | 1730 | 21.1745 | 750968 | 258415 | 84.19% | 91.03% | 92.85% | 4459.48 | 9063.45 | 82.60% | 86.52% | 8.32 | 77.11 | 89.50 | 103.14 | 1.8M | 1.1M
nanatsuiro | 1544 | 1313 | 20.2550 | 487860 | 271395 | 88.03% | 93.19% | 94.62% | 2801.15 | 7840.70 | 85.57% | 91.77% | 11.23 | 81.84 | 91.07 | 102.35 | 1.1M | 1.1M
noratoto | 2051 | 1792 | 19.5412 | 717642 | 380234 | 83.28% | 90.76% | 91.16% | 4554.42 | 9892.62 | 82.68% | 85.87% | 10.30 | 81.72 | 82.43 | 105.84 | 1.7M | 1.6M
oretsuba afterstory | 2132 | 1828 | 28.7157 | 435867 | 235845 | 82.36% | 88.07% | 88.41% | 6242.30 | 11862.10 | 80.19% | 82.97% | 13.94 | 79.29 | 78.71 | 100.24 | 976K | 960K
oretsuba prelude | 2172 | 1800 | 34.2088 | 283074 | 152902 | 78.85% | 83.99% | 83.99% | 8751.50 | 15743.08 | 76.46% | 79.18% | 16.34 | 77.80 | 64.42 | 97.28 | 620K | 612K
oretsuba | 2771 | 2509 | 33.3520 | 1771225 | 922247 | 80.14% | 85.81% | 86.01% | 7693.91 | 14749.43 | 78.06% | 80.49% | 15.93 | 78.94 | 72.32 | 98.81 | 3.8M | 3.6M
parfait | 2135 | 1888 | 23.0168 | 753595 | 353298 | 83.67% | 89.65% | 90.08% | 5234.33 | 10172.78 | 82.92% | 85.44% | 10.81 | 69.75 | 77.09 | 95.46 | 1.8M | 1.4M
prawfclwyd | 2019 | 1767 | 22.9442 | 512327 | 284997 | 79.87% | 88.67% | 89.92% | 5976.33 | 11865.97 | 79.28% | 83.52% | 11.11 | 71.44 | 74.14 | 94.96 | 1.2M | 1.2M
princessfrontier | 2343 | 2099 | 19.6364 | 907832 | 461428 | 80.45% | 86.42% | 87.40% | 7726.50 | 16844.92 | 78.97% | 82.60% | 10.68 | 80.06 | 72.56 | 98.75 | 1.9M | 1.9M
rabuobu | 2170 | 1915 | 22.2998 | 700439 | 366621 | 81.74% | 87.86% | 88.56% | 6476.37 | 13353.68 | 80.03% | 83.14% | 9.87 | 81.13 | 77.30 | 102.64 | 1.6M | 1.5M
rinshin | 2211 | 1966 | 30.2253 | 784431 | 420946 | 81.31% | 86.71% | 87.76% | 6747.77 | 12228.24 | 79.70% | 83.43% | 12.39 | 76.45 | 78.03 | 97.81 | 1.8M | 1.7M
satsukoi | 1883 | 1588 | 15.2492 | 319473 | 171580 | 81.93% | 89.90% | 91.40% | 5089.90 | 9388.20 | 81.06% | 84.81% | 10.61 | 78.77 | 77.24 | 93.61 | 728K | 696K
senmomo | 2514 | 2237 | 19.7513 | 876710 | 495776 | 77.18% | 83.94% | 86.52% | 10689.70 | 15552.45 | 74.80% | 81.17% | 13.57 | 73.30 | 77.35 | 81.78 | 1.9M | 1.9M
senrenbanka | 2223 | 1920 | 23.3741 | 1149829 | 635914 | 85.24% | 91.77% | 92.33% | 3801.12 | 9173.43 | 84.51% | 87.91% | 10.78 | 79.37 | 84.69 | 101.01 | 2.6M | 2.5M
sensinkan bansenzin | 2645 | 2379 | 36.9887 | 923229 | 519607 | 82.01% | 87.57% | 87.57% | 6488.42 | 12274.00 | 78.94% | 81.10% | 19.45 | 76.24 | 82.61 | 90.81 | 2.0M | 1.9M
sensinkan hatimyouzin | 2859 | 2549 | 38.3709 | 1250645 | 700074 | 80.41% | 86.11% | 86.11% | 7360.42 | 13137.82 | 77.15% | 79.07% | 20.14 | 77.19 | 82.38 | 89.19 | 2.7M | 2.6M
sharnoth fvr | 2125 | 1887 | 27.3698 | 532052 | 267531 | 83.31% | 87.76% | 88.31% | 6480.38 | 12973.35 | 80.47% | 83.74% | 11.94 | 73.20 | 60.90 | 90.37 | 1.2M | 1.1M
shirokuma | 2433 | 2129 | 24.7324 | 1082543 | 574307 | 80.16% | 84.77% | 86.18% | 8916.45 | 17451.49 | 78.12% | 82.19% | 10.86 | 77.96 | 70.51 | 102.15 | 2.5M | 2.4M
shugaten | 1764 | 1522 | 17.4341 | 413763 | 217924 | 82.80% | 88.04% | 89.91% | 6512.04 | 15818.62 | 81.90% | 85.87% | 10.80 | 82.19 | 78.21 | 108.03 | 944K | 884K
silverio vendetta | 2759 | 2488 | 43.5207 | 954884 | 513667 | 75.39% | 81.82% | 82.16% | 9903.99 | 16423.30 | 71.81% | 74.02% | 22.03 | 67.75 | 59.59 | 81.91 | 2.1M | 2.1M
simulacre | 1772 | 1516 | 24.3116 | 302739 | 179551 | 87.03% | 91.18% | 93.35% | 4238.15 | 7225.15 | 86.16% | 88.85% | 11.31 | 80.97 | 89.60 | 95.51 | 672K | 660K
snowwhite | 1547 | 1247 | 33.6587 | 189737 | 107224 | 86.80% | 92.28% | 93.96% | 3575.03 | 9232.55 | 85.69% | 89.37% | 18.43 | 78.11 | 80.48 | 97.50 | 408K | 404K
soramitsu | 2178 | 1982 | 25.9613 | 1101991 | 531524 | 84.90% | 91.38% | 91.91% | 4263.15 | 8345.22 | 83.63% | 85.83% | 10.43 | 77.01 | 81.54 | 97.11 | 2.5M | 2.1M
sourire | 1895 | 1638 | 24.4430 | 690684 | 369168 | 86.06% | 92.61% | 94.50% | 3303.72 | 8125.05 | 84.89% | 88.76% | 9.94 | 78.64 | 89.80 | 100.02 | 1.5M | 1.5M
subahibi | 2239 | 2031 | 22.0398 | 1127394 | 530380 | 86.07% | 90.27% | 90.58% | 4818.20 | 10301.73 | 84.03% | 86.49% | 7.91 | 78.25 | 78.89 | 97.00 | 2.6M | 2.2M
sukinara | 2070 | 1840 | 24.3595 | 1377391 | 700754 | 83.92% | 90.41% | 92.21% | 4730.30 | 11345.40 | 82.86% | 86.98% | 12.30 | 80.24 | 89.38 | 108.70 | 3.1M | 2.9M
tarareba | 1798 | 1554 | 26.0075 | 488678 | 275749 | 83.19% | 91.60% | 93.06% | 3957.95 | 9910.11 | 81.09% | 86.78% | 11.79 | 77.55 | 72.70 | 99.03 | 1.1M | 1.1M
trinoline | 1955 | 1693 | 21.8147 | 545133 | 293111 | 86.06% | 92.28% | 93.42% | 3566.54 | 8023.33 | 84.68% | 87.79% | 11.26 | 76.65 | 80.29 | 96.84 | 1.3M | 1.2M
tsujidou ren'ai | 2270 | 2027 | 16.9892 | 1280257 | 669131 | 81.59% | 87.67% | 87.67% | 6398.03 | 12285.76 | 80.20% | 82.68% | 9.67 | 83.16 | 75.49 | 104.50 | 2.9M | 2.7M
tsujidou virgin | 2065 | 1803 | 17.4525 | 712362 | 375775 | 82.35% | 87.86% | 87.86% | 6346.58 | 12129.30 | 80.81% | 83.39% | 10.17 | 81.79 | 74.57 | 104.01 | 1.6M | 1.5M
tsuriotsu | 2391 | 2104 | 38.4322 | 1093199 | 640516 | 85.86% | 89.66% | 91.69% | 5253.54 | 14083.12 | 84.31% | 87.61% | 14.58 | 78.91 | 78.84 | 97.40 | 2.4M | 2.4M
tsuushinbo | 1869 | 1620 | 25.2615 | 656951 | 355073 | 84.66% | 92.72% | 93.95% | 3327.51 | 7294.42 | 83.77% | 87.52% | 10.47 | 83.27 | 98.16 | 107.74 | 1.5M | 1.5M
twinklecrusaders | 2587 | 2314 | 23.2599 | 1609755 | 871084 | 81.78% | 86.72% | 87.31% | 7098.56 | 14193.71 | 79.57% | 83.23% | 12.01 | 77.72 | 69.56 | 100.25 | 3.6M | 3.6M
white_album | 1832 | 1489 | 18.5374 | 492329 | 252542 | 88.73% | 92.00% | 93.35% | 3831.16 | 9091.12 | 87.30% | 91.00% | 11.62 | 77.16 | 85.50 | 101.85 | 1.1M | 992K
yoakena | 2135 | 1903 | 15.9464 | 894861 | 478687 | 82.64% | 89.12% | 91.84% | 5680.41 | 11646.05 | 82.14% | 85.99% | 9.25 | 77.66 | 71.40 | 97.13 | 2.0M | 1.9M

Kanji (unique): The number of kanji codepoints that occur at least once in the entire script.

Kanji (2+): Same as Kanji (unique), but counting only kanji that occur at least twice.
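
A minimal sketch of how the two kanji columns could be computed. The ranges used to decide what counts as a kanji (CJK Unified Ideographs plus Extension A) are an assumption made for this example; the actual tools may define kanji differently. The function names are hypothetical.

```python
from collections import Counter


def is_kanji(ch: str) -> bool:
    # Assumption: "kanji" means CJK Unified Ideographs or Extension A.
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def kanji_counts(text: str):
    # Count how often each kanji codepoint occurs in the whole script.
    counts = Counter(ch for ch in text if is_kanji(ch))
    unique = len(counts)                                      # occurs 1+ times
    at_least_twice = sum(1 for n in counts.values() if n >= 2)  # occurs 2+ times
    return unique, at_least_twice
```

For the string 日本語の本と日記, this returns four unique kanji, of which two (日 and 本) occur at least twice.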

Chars per line: Number of characters per line in file, after stripping whitespace from the front/back of the line, excluding " " from the line, and ignoring blank lines, in that order.
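
The order of operations described above might look like the sketch below. It assumes the column is the mean over all non-blank lines (the table values are fractional) and takes the excluded character as a parameter, since the exact character is hard to tell from this page. The function name is hypothetical.

```python
def chars_per_line(text: str, excluded: str = " ") -> float:
    lengths = []
    for line in text.splitlines():
        line = line.strip()                 # 1. strip leading/trailing whitespace
        line = line.replace(excluded, "")   # 2. drop the excluded character
        if not line:                        # 3. ignore lines that are now blank
            continue
        lengths.append(len(line))
    # Assumption: report the mean; return 0.0 for a script with no usable lines.
    return sum(lengths) / len(lengths) if lengths else 0.0
```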

Characters: Total characters in the file, ignoring characters from r'『』「」[]()()【】〈〉《》«»‹›〚〛〘〙{}{} ―-~。、…‥\n\r'. That includes ignoring various whitespace characters.
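
As a rough sketch, the count could be done as follows. The ignored set is copied from the description above as it appears on this page, with \n and \r read as newline and carriage return; whether the real tool uses exactly this set is an assumption. The names are hypothetical.

```python
# Ignored characters, copied from the description; duplicates are harmless in a set.
IGNORED = set("『』「」[]()()【】〈〉《》«»‹›〚〛〘〙{}{} ―-~。、…‥\n\r")


def count_characters(text: str) -> int:
    # Count every character that is not in the ignored set.
    return sum(1 for ch in text if ch not in IGNORED)
```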

Lexemes: Number of lexeme events in the script according to kuromoji-unidic with a slightly modified dictionary. Lexemes are similar to words. 限り is a lexeme. とした may be interpreted as three separate lexemes. No lexeme that the parser understands is ignored, so some names are counted as well.

BCCWJ 5k: Coverage based on the top 5000 most common words from the BCCWJ frequency list, which was generated using mecab-unidic. The coverage value ignores grammatical lexemes, but the BCCWJ list does not; in other words, hundreds of grammatical lexemes inflate the BCCWJ word count required to reach a given coverage level.

VN 5k: Coverage based on the top 5k most common words from a frequency list generated from VNs (VNFreqList). Subject to massive change at any time as more VNs are added. This frequency list excludes grammatical lexemes.

VN 5k sans 50: Same as VN 5k, but ignoring any words in the top 50 words for that script that are not in the top 5k in the frequency list. This is slightly different from pretending that any top 50 words are known. If the entire script consisted of top 50 words that were not in the top 5k in the frequency list, the coverage would be undefined.
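
The difference between ignoring such words and pretending they are known can be sketched like this. It is only an illustration: the function names are hypothetical, tokens stand for lexemes already produced by the analyzer, and the handling of grammatical lexemes is left out.

```python
from collections import Counter


def coverage(tokens, known):
    # Fraction of token occurrences that are in the known-word set.
    if not tokens:
        return None  # undefined when nothing is left to measure
    return sum(1 for t in tokens if t in known) / len(tokens)


def coverage_sans_top(tokens, known, top_n=50):
    # Words among the script's own top_n that are not in the known set
    # are removed from both the numerator and the denominator.
    counts = Counter(tokens)
    ignored = {w for w, _ in counts.most_common(top_n) if w not in known}
    kept = [t for t in tokens if t not in ignored]
    return coverage(kept, known)
```

For a script of five "a", three "b" and two "c" where only "a" is known, plain coverage is 5/10. Ignoring the unknown top-2 word "b" gives 5/7, whereas pretending "b" is known would give 8/10.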

Core6k: Coverage based on the lexemes the analyzer recognizes from Core6k, with several hundred manual corrections. This will inherently cover less than VN 5k, because VN 5k is derived from the scripts it is ranking.

Core6k sans 250: Like VN 5k sans 50, but with Core6k and the top 250 from the script.

VNFreqList X% target: The number of common words from VNFreqList that you need to know to have X% coverage over the entire VN. Decimal points are because of linear interpolation. See this pastebin. Round up to the nearest whole number if you don't like that.
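
One way the interpolation could work is sketched below: walk down the frequency list, track cumulative coverage, and interpolate linearly between the last rank below the target and the first rank at or above it. This is an assumption about the method, not a copy of the actual script, and the function name is hypothetical.

```python
from collections import Counter


def words_needed(tokens, freq_list, target):
    # tokens: lexemes of one script; freq_list: words ordered most common first;
    # target: desired coverage as a fraction, e.g. 0.90.
    total = len(tokens)
    if total == 0:
        return None
    if target <= 0:
        return 0.0
    counts = Counter(tokens)
    covered = 0
    prev_cov = 0.0
    for rank, word in enumerate(freq_list, start=1):
        covered += counts.get(word, 0)
        cov = covered / total
        if cov >= target:
            # Linear interpolation between (rank - 1, prev_cov) and (rank, cov).
            return (rank - 1) + (target - prev_cov) / (cov - prev_cov)
        prev_cov = cov
    return None  # the list never reaches the target for this script
```

For example, if knowing one word gives 60% coverage and knowing two gives 90%, a 75% target lands halfway between them, at 1.5 words.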

Chars per sentence: Like chars per line, but attempts to identify sentence boundaries.

Hayashi: An estimate of structural complexity intended for the school grade level of textbooks and reading material. For more information: http://www.lrec-conf.org/proceedings/lrec2008/pdf/165_paper.pdf

modified Hayashi: Above, but recalibrated to the ratio of kanji/hiragana/katakana in each VN's script, adjusting for a flaw in the design of the original Hayashi metric. This recalibration is fuzzy, and causes the scale to have a different linear correlation.

modified Hayashi 2: Same, but ignoring the contribution of katakana sequences entirely.

JIS MiB: Bytes in the raw script after removing blank lines and converting to Shift-JIS. Measured in units of 1024^2 bytes (M) or 1024^1 bytes (K).

JIS MiB (deduplicated): Same, but duplicate lines of any length are removed (reduced to one line). Measured in units of 1024^2 bytes (M) or 1024^1 bytes (K).
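
A hedged sketch of both size columns follows. It assumes that whitespace-only lines count as blank, that one newline byte per remaining line break is included, and that characters with no Shift-JIS encoding are replaced rather than dropped; none of these details are confirmed by this page. The function names and the rounding in `human` are hypothetical.

```python
def jis_size(text: str, deduplicate: bool = False) -> int:
    # Drop blank lines (assumption: whitespace-only lines count as blank).
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if deduplicate:
        # Reduce duplicate lines of any length to their first occurrence.
        lines = list(dict.fromkeys(lines))
    data = "\n".join(lines).encode("shift_jis", errors="replace")
    return len(data)


def human(n: int) -> str:
    # Express a byte count in units of 1024^2 (M) or 1024 (K).
    if n >= 1024 ** 2:
        return f"{n / 1024 ** 2:.1f}M"
    return f"{n / 1024:.0f}K"
```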

Stats (last edited 2017-10-18 06:52:57 by weh)