Revision 1 as of 2018-06-05 16:10:43
Stats for random anime subtitles. Just what was easy to dump. No quality guarantees. Not going to be maintained at all. Just for fun.
12-episode anime series contain very little text, roughly 100 to 200 KB in UTF-8, so any analysis of them is going to be extremely unstable. For reference, Hanahira is about 180 KB in UTF-8.
Custom metric is nonsensical here, don't bother with it.
script name | kanji | kanji | lines | sentences | chars | chars | characters | lexemes | sjis bytes | sjis (dedup) | hours estimate | Hayashi | custom | freqlist | freqlist | freqlist |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
amagami brilliant park | 1198 | 888 | 6351 | 6377 | 9.44 | 9.09 | 59968 | 31252 | 122344 | 117303 | 1.89 | 81.33 | 76.19 | 5047.96 | 7095.28 | 10407.23 |
bakemonogatari | 1266 | 959 | 9237 | 9225 | 9.36 | 8.98 | 86446 | 47088 | 180066 | 170112 | 2.86 | 84.67 | 85.42 | 3714.10 | 5360.33 | 8079.55 |
black lagoon | 1588 | 1274 | 12360 | 12600 | 9.01 | 8.63 | 111390 | 60146 | 235635 | 226258 | 3.64 | 81.43 | 64.27 | 7882.46 | 10386.23 | 13947.02 |
cardcaptor sakura | 1362 | 1142 | 32645 | 32309 | 8.01 | 7.30 | 261594 | 126381 | 553771 | 463862 | 7.73 | 84.87 | 78.08 | 2805.26 | 4153.60 | 7043.47 |
cowboy bebop | 1304 | 1011 | 9382 | 9091 | 9.67 | 9.16 | 90735 | 45920 | 185698 | 171931 | 2.72 | 76.54 | 62.44 | 6433.00 | 8755.69 | 11828.17 |
demi-chan wa kataritai | 1047 | 767 | 4183 | 4267 | 11.55 | 10.93 | 48318 | 26910 | 103967 | 100954 | 1.66 | 83.84 | 75.08 | 4048.70 | 6160.28 | 9386.85 |
devilman crybaby | 996 | 647 | 3193 | 3554 | 10.62 | 8.91 | 33912 | 17285 | 71529 | 67021 | 1.08 | 82.80 | 77.71 | 4889.00 | 6567.50 | 10222.00 |
eromanga sensei | 954 | 702 | 6708 | 7178 | 8.66 | 7.76 | 58105 | 30433 | 122488 | 115845 | 1.84 | 84.37 | 74.75 | 3250.91 | 4298.81 | 6333.55 |
eureka seven | 1489 | 1240 | 24996 | 25379 | 8.20 | 7.57 | 204927 | 101555 | 432784 | 388647 | 6.16 | 79.36 | 53.95 | 6004.07 | 8039.99 | 12652.60 |
flying witch | 904 | 643 | 3860 | 3659 | 11.17 | 10.51 | 43099 | 22243 | 94274 | 83625 | 1.36 | 89.10 | 84.61 | 3177.43 | 4512.98 | 6976.65 |
fractale | 554 | 319 | 960 | 1439 | 11.87 | 7.50 | 11398 | 5777 | 27014 | 26416 | 0.36 | 86.49 | 64.06 | 5873.00 | 9855.00 | 18224.16 |
fune wo amu | 1115 | 789 | 3236 | 2993 | 12.66 | 12.26 | 40960 | 21690 | 88702 | 78438 | 1.38 | 80.40 | 74.81 | 6622.45 | 7890.63 | 11149.16 |
gabriel dropout | 1099 | 826 | 7242 | 7691 | 8.95 | 8.07 | 64811 | 31584 | 135685 | 129955 | 1.97 | 81.79 | 70.55 | 3940.10 | 5384.08 | 7638.05 |
gekkan shoujo nozaki-kun | 1066 | 752 | 4173 | 4194 | 12.99 | 11.67 | 54221 | 28293 | 118960 | 107660 | 1.76 | 81.32 | 72.36 | 3072.30 | 4236.44 | 6146.25 |
gochiusa | 1186 | 853 | 4246 | 4159 | 11.97 | 12.04 | 50844 | 27741 | 103554 | 99978 | 1.73 | 74.60 | 72.10 | 4894.10 | 6026.91 | 9014.77 |
hyouka | 1419 | 1167 | 13490 | 13460 | 8.85 | 8.60 | 119434 | 65924 | 251850 | 236702 | 4.06 | 79.33 | 75.45 | 5121.25 | 7605.94 | 10470.62 |
inu x boku | 1119 | 815 | 5691 | 5721 | 8.75 | 8.19 | 49772 | 26497 | 104383 | 96575 | 1.59 | 82.99 | 78.53 | 5550.25 | 8128.12 | 13213.75 |
jinrui | 1483 | 1090 | 5062 | 4713 | 11.92 | 11.98 | 60336 | 33090 | 129157 | 118913 | 2.06 | 76.14 | 71.82 | 6237.70 | 8171.39 | 10783.85 |
jojo | 1393 | 1142 | 14145 | 14769 | 8.28 | 7.66 | 117159 | 61076 | 251507 | 235637 | 3.71 | 79.54 | 51.31 | 6628.25 | 8652.69 | 12442.08 |
joukamachi no dandelion | 1109 | 834 | 7446 | 7293 | 9.35 | 8.67 | 69609 | 34207 | 142146 | 126896 | 2.09 | 86.91 | 86.14 | 3620.60 | 5161.35 | 7159.80 |
kono bijutsubu | 969 | 681 | 6042 | 6327 | 8.61 | 7.91 | 52008 | 25949 | 108025 | 100336 | 1.59 | 85.86 | 81.05 | 3246.28 | 4335.04 | 6378.02 |
lucky star | 1608 | 1276 | 11664 | 11555 | 13.38 | 12.43 | 156029 | 79273 | 339741 | 311283 | 4.83 | 83.25 | 74.73 | 5128.70 | 7112.67 | 10483.63 |
mahoutsukai no yome | 1222 | 923 | 9314 | 9617 | 8.24 | 7.60 | 76756 | 41461 | 163354 | 150031 | 2.46 | 87.73 | 82.88 | 3919.15 | 5679.91 | 9724.15 |
mawaru penguindrum | 1332 | 1009 | 10363 | 10877 | 10.20 | 8.56 | 105669 | 52161 | 220701 | 196808 | 3.15 | 82.64 | 77.04 | 4413.99 | 6546.18 | 11071.54 |
mikakunin | 942 | 681 | 6840 | 7289 | 8.67 | 7.76 | 59294 | 31370 | 125675 | 118803 | 1.90 | 85.11 | 81.07 | 3698.50 | 5659.50 | 9497.35 |
mob psycho 100 | 1240 | 944 | 6886 | 7473 | 9.04 | 7.92 | 62249 | 33280 | 131039 | 125729 | 2.02 | 81.91 | 75.17 | 5289.20 | 6395.90 | 9045.60 |
nagi no asukara | 1297 | 960 | 9225 | 8316 | 10.42 | 10.04 | 96147 | 50251 | 211835 | 174613 | 3.06 | 88.20 | 87.95 | 3187.85 | 5299.08 | 8652.05 |
ngsrt airantou | 1326 | 1021 | 14463 | 16011 | 9.89 | 7.81 | 143071 | 65751 | 296831 | 264531 | 4.03 | 89.39 | 86.08 | 5211.30 | 6905.48 | 10198.32 |
nichijou | 1212 | 933 | 13498 | 13719 | 7.89 | 7.22 | 106440 | 49991 | 222473 | 198180 | 3.10 | 85.32 | 69.85 | 4766.70 | 6580.79 | 9171.67 |
no game no life | 1237 | 910 | 7015 | 7192 | 8.95 | 8.44 | 62792 | 32958 | 130894 | 124833 | 2.03 | 77.91 | 60.21 | 6547.00 | 8396.25 | 10731.50 |
non non biyori | 940 | 706 | 6149 | 6435 | 8.46 | 7.65 | 52003 | 25933 | 108793 | 101528 | 1.59 | 86.99 | 82.13 | 5067.70 | 5952.78 | 8822.42 |
noragami | 1179 | 769 | 4095 | 3951 | 9.49 | 9.41 | 38869 | 22243 | 83735 | 77204 | 1.36 | 89.62 | 85.32 | 4727.70 | 6876.53 | 10335.70 |
owari no seraph 1~2 | 1125 | 890 | 10223 | 10301 | 8.34 | 7.79 | 85309 | 44984 | 180143 | 162224 | 2.76 | 82.66 | 80.02 | 3031.40 | 4059.30 | 6357.20 |
panty and stocking | 1236 | 920 | 7790 | 8175 | 8.71 | 7.77 | 67862 | 31267 | 141473 | 130912 | 1.93 | 80.34 | 36.98 | 8380.80 | 10887.67 | 13316.81 |
ping pong | 1023 | 710 | 4986 | 4956 | 8.31 | 7.86 | 41437 | 20829 | 86286 | 79877 | 1.29 | 81.17 | 58.14 | 7365.60 | 9221.06 | 12225.88 |
psycho pass | 1511 | 1240 | 10428 | 10644 | 9.40 | 9.14 | 98018 | 51624 | 208251 | 199667 | 3.27 | 69.10 | 54.69 | 7431.31 | 8925.73 | 11731.61 |
railgun 1~2 | 1555 | 1311 | 27097 | 27921 | 8.21 | 7.51 | 222468 | 114741 | 475308 | 428180 | 6.99 | 79.15 | 70.27 | 5353.60 | 7696.23 | 11423.80 |
saki | 1264 | 953 | 12185 | 12709 | 8.48 | 7.72 | 103296 | 54765 | 220784 | 197145 | 3.32 | 79.70 | 64.15 | 8363.92 | 12227.23 | 17974.68 |
samflam | 1337 | 1085 | 11742 | 12740 | 8.59 | 7.37 | 100867 | 52146 | 240007 | 225486 | 3.21 | 75.63 | 56.80 | 4993.40 | 6987.65 | 9964.52 |
samurai champloo | 1322 | 1012 | 10894 | 11001 | 7.87 | 7.38 | 85687 | 44945 | 180967 | 164467 | 2.74 | 87.90 | 84.85 | 4910.30 | 6885.58 | 10353.15 |
sayonara zetsubou sensei 1~2 | 1678 | 1317 | 13135 | 13455 | 10.26 | 9.13 | 134776 | 67299 | 281372 | 257118 | 4.19 | 80.64 | 70.23 | 6517.40 | 8918.29 | 12496.70 |
scryed | 1416 | 1108 | 10629 | 10525 | 10.82 | 10.65 | 115015 | 61643 | 238704 | 225606 | 3.80 | 78.09 | 68.70 | 6066.43 | 8285.79 | 11795.25 |
shiki | 1315 | 1003 | 6165 | 8866 | 13.65 | 8.51 | 84126 | 44718 | 194106 | 179941 | 2.76 | 87.84 | 83.60 | 4230.00 | 6232.50 | 9576.00 |
shinsekai yori | 1478 | 1195 | 11269 | 11641 | 9.26 | 8.61 | 104381 | 57416 | 219180 | 206791 | 3.51 | 80.90 | 80.40 | 7282.30 | 11578.16 | 14825.16 |
sora no woto | 1154 | 831 | 3267 | 3193 | 11.51 | 11.17 | 37596 | 20443 | 78016 | 75306 | 1.27 | 65.26 | 75.55 | 4879.00 | 6664.25 | 9202.25 |
sword art online | 1352 | 1036 | 10303 | 10234 | 8.27 | 8.26 | 85248 | 47928 | 177161 | 159707 | 2.84 | 76.63 | 58.84 | 5145.17 | 6829.05 | 10157.85 |
tamako market | 1182 | 835 | 5187 | 4789 | 10.66 | 10.58 | 55273 | 27676 | 112174 | 99236 | 1.70 | 85.56 | 87.04 | 5262.90 | 6254.85 | 10013.04 |
toradora | 1326 | 1028 | 15392 | 16280 | 8.41 | 7.48 | 129374 | 65273 | 278068 | 258639 | 4.02 | 85.23 | 75.79 | 4636.14 | 6177.10 | 9959.02 |
trigun | 1298 | 1019 | 12722 | 12879 | 8.39 | 7.65 | 106713 | 53235 | 223933 | 201750 | 3.21 | 85.57 | 71.15 | 4820.50 | 6760.75 | 9795.50 |
twintails | 1103 | 794 | 5754 | 5910 | 9.72 | 9.16 | 55914 | 27898 | 112637 | 108944 | 1.69 | 77.96 | 70.57 | 4793.77 | 6077.15 | 8647.10 |
uchouten kazoku | 1152 | 871 | 6248 | 6305 | 9.07 | 8.56 | 56662 | 31314 | 118300 | 110614 | 1.91 | 86.05 | 84.20 | 6183.60 | 8596.19 | 12169.80 |
violet evergarden | 1078 | 810 | 5926 | 5758 | 8.38 | 8.04 | 49632 | 25780 | 105498 | 95119 | 1.58 | 78.45 | 72.44 | 4716.05 | 6113.58 | 9889.52 |
youjo senki | 1330 | 1023 | 5352 | 5625 | 9.25 | 8.40 | 49531 | 26785 | 103556 | 98936 | 1.76 | 65.42 | 22.58 | 10532.60 | 13179.90 | 16504.56 |
zankyou no terror | 1095 | 782 | 2979 | 2636 | 12.42 | 11.96 | 37006 | 18453 | 80014 | 68142 | 1.16 | 72.58 | 58.55 | 6658.17 | 8697.15 | 12689.10 |
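The size figures above (characters, sjis bytes) are simple measurements of the dumped text. The exact definitions behind each column are not stated on this page, so the following is only a rough sketch of how numbers of this kind could be taken from a dumped script. The function name `size_stats` and the choice to count the joining newlines in the byte totals are assumptions made for illustration, not the method used for the table.

```python
def size_stats(text):
    """Return line, character and encoded-byte counts for a dumped script."""
    # Ignore empty lines; the dumpers emit one subtitle line per row.
    lines = [ln for ln in text.split("\n") if ln.strip()]
    joined = "\n".join(lines)
    return {
        "lines": len(lines),
        # Characters are counted per line, excluding the newlines.
        "characters": sum(len(ln) for ln in lines),
        "utf8_bytes": len(joined.encode("utf-8")),
        # errors="replace" keeps counting when a character has no Shift-JIS form.
        "sjis_bytes": len(joined.encode("shift_jis", errors="replace")),
    }


sample = "おはよう\nこんにちは、世界"
stats = size_stats(sample)
print(stats)
```

Kana and common kanji take 3 bytes each in UTF-8 and 2 bytes each in Shift-JIS, which is why the byte columns run at roughly twice the character count.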
Dumper used for .srt files:
```python
import sys
import re


def print_safe(string, end="\n"):
    # Write UTF-8 bytes directly so output does not depend on the console encoding.
    sys.stdout.buffer.write((str(string) + end).encode("utf-8"))


# Substrings removed from every line: speaker tags and literal line-break escapes.
nullify = [
    "[テレビ]",
    "[スピーカ]",
    r"\n",
    r"\N",
    "\r",
]

for arg in sys.argv[1:]:
    with open(arg, "r", encoding="utf-8-sig") as f:
        # Cues in an .srt file are separated by blank lines.
        groups = f.read().split("\n\n")

    last_group = ""

    for i in range(len(groups)):
        # Drop the cue index and timestamp lines, keeping only the text.
        groups[i] = groups[i].split("\n")[2:]

        # Skip a cue whose text repeats the previous cue verbatim.
        if "\n".join(groups[i]) == last_group:
            continue
        last_group = "\n".join(groups[i])

        did_print = False
        for j in range(len(groups[i])):
            line = groups[i][j]
            # Remove parenthesised asides: full-width parentheses first
            # (full-width form assumed), then ASCII ones.
            line = re.sub("（[^）]*）", "", line)
            line = re.sub(r"\([^\)]*\)", "", line)
            line = line.replace("《", "«")
            line = line.replace("》", "»")
            for null in nullify:
                line = line.replace(null, "")
            line = line.strip()
            if line != "":
                # Dialogue output is commented out here; uncomment to dump the text.
                #print_safe(line)
                did_print = True
        if did_print:
            #print_safe("")
            pass
    #print_safe("")
    print_safe(arg)
```
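The .srt dumper relies on the layout of a SubRip cue: an index line, a timestamp line, then the text, with a blank line between cues. A minimal sketch of that splitting step on an in-memory sample follows. The helper name `cue_text` and the sample cues are made up for illustration, only one of the `nullify` entries is applied, and the full-width parenthesis pattern is an assumption about what the first `re.sub` is meant to match.

```python
import re

# Two hypothetical cues: index line, timestamp line, then the text lines.
sample_srt = (
    "1\n"
    "00:00:01,000 --> 00:00:03,000\n"
    "（ナレーション）おはよう\n"
    "\n"
    "2\n"
    "00:00:03,500 --> 00:00:05,000\n"
    "[テレビ]こんにちは\n"
    "(笑)\n"
)


def cue_text(srt):
    """Return the cleaned text lines of every cue, skipping index and timestamp."""
    out = []
    for block in srt.split("\n\n"):
        for line in block.split("\n")[2:]:
            line = re.sub("（[^）]*）", "", line)    # full-width parenthesised asides
            line = re.sub(r"\([^\)]*\)", "", line)  # ASCII parenthesised asides
            line = line.replace("[テレビ]", "").strip()
            if line:
                out.append(line)
    return out


lines = cue_text(sample_srt)
print(lines)
```

Lines that are empty after cleaning, such as a cue line holding only a parenthesised sound effect, are dropped rather than emitted as blanks.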
Dumper used for .ass files:
```python
import sys
import re


def print_safe(string, end="\n"):
    # Write UTF-8 bytes directly so output does not depend on the console encoding.
    sys.stdout.buffer.write((str(string) + end).encode("utf-8"))


def parsecsv(string):
    # Unfinished and unused: Dialogue lines are split by hand below.
    fields = []
    insomething = False


# Substrings removed from every line: speaker tags and literal line-break escapes.
nullify = [
    "[テレビ]",
    "[スピーカ]",
    r"\n",
    r"\N",
]

# Dialogue lines with any of these values in a non-text field (such as the
# style name) are skipped: notices, titles, staff credits, opening and ending.
skip_fields = ["人类_声明", "标题", "staff", "Opening", "Ending"]

for arg in sys.argv[1:]:
    with open(arg, "r", encoding="utf-8") as f:
        events = False
        last_group = ""
        for line in f:
            line = line.strip("\n")
            if events:
                if line.startswith("Dialogue:"):
                    line = line.replace("Dialogue:", "", 1)

                    # do not use the CSV parser for this:
                    # the text is the 10th field and may itself contain commas
                    fields = line.split(",", 9)

                    if any(name in fields[:-1] for name in skip_fields):
                        continue

                    line = fields[-1]
                    basic_line = line

                    # it contains drawing instructions, which we need a parser to correctly isolate and remove
                    # line is probably just pure drawing instructions so get rid of it
                    if r"\p" in line:
                        continue

                    # Strip override tags, then parenthesised asides: full-width
                    # parentheses first (full-width form assumed), then ASCII ones.
                    line = re.sub(r"\{[^\}]*\}", "", line)
                    line = re.sub("（[^）]*）", "", line)
                    line = re.sub(r"\([^\)]*\)", "", line)
                    line = line.strip()
                    line = line.replace("《", "«")
                    line = line.replace("》", "»")
                    for null in nullify:
                        line = line.replace(null, "")
                    # probably per-character karaoke or something
                    if len(line) <= 1 and "pos" in basic_line:
                        continue
                    if line != "":
                        # Skip a line that repeats the previous one verbatim.
                        if last_group == line:
                            continue
                        last_group = line
                        print_safe(line)

            if line == "[Events]":
                events = True
```
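The .ass dumper splits each `Dialogue:` line with `split(",", 9)` rather than a CSV parser because the text is the tenth and last field and may contain commas of its own. Capping the split at nine keeps that text in one piece. A small sketch of the step on a single made-up line follows. The helper name `dialogue_text` and the sample line are hypothetical, and only the override-tag and line-break cleanup is shown.

```python
import re

# A hypothetical Dialogue line; the raw string keeps \an8 and \N as literal text.
dialogue = r"Dialogue: 0,0:00:01.00,0:00:03.00,Default,,0,0,0,,{\an8}ねえ, 聞いて\Nよ"


def dialogue_text(line):
    """Return (style, cleaned text) for one Dialogue line of an .ass file."""
    body = line.replace("Dialogue:", "", 1)
    # Nine splits give ten fields; commas inside the text field are left alone.
    fields = body.split(",", 9)
    text = fields[-1]
    text = re.sub(r"\{[^\}]*\}", "", text)              # override tags such as {\an8}
    text = text.replace(r"\N", "").replace(r"\n", "")   # literal line-break escapes
    return fields[3], text.strip()


style, text = dialogue_text(dialogue)
print(style, text)
```

The style name sits in the fourth field, which is the kind of value the dumper compares against names such as `staff`, `Opening` and `Ending` before it looks at the text.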