Revision 1 as of 2017-08-26 21:27:55
These are the tools used to generate the stats.
http://wareya.moe/extra/workspace.zip
Place scripts under workspace/. All VN scripts must be in UTF-8. Unix line endings (\n, not \r\n) are preferred.
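The encoding and line-ending requirements can be checked before running any of the tools. The following is a minimal sketch, not part of workspace.zip: the function names `check_script` and `fix_line_endings` are hypothetical, and it assumes `iconv`, `grep`, `tr`, and `mktemp` are available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; these are NOT shipped with workspace.zip.

# check_script FILE: prints "ok", "not-utf8", or "crlf".
check_script() {
    f="$1"
    cr="$(printf '\r')"
    # iconv exits non-zero when the input is not valid UTF-8.
    if ! iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
        echo "not-utf8"
        return 1
    fi
    # Any carriage return means the file is not using plain \n endings.
    if grep -q "$cr" "$f"; then
        echo "crlf"
        return 2
    fi
    echo "ok"
}

# fix_line_endings FILE: strips every carriage return, rewriting FILE
# through a temporary file.
fix_line_endings() {
    f="$1"
    tmp="$(mktemp)" || return 1
    tr -d '\r' < "$f" > "$tmp" && mv "$tmp" "$f"
}
```

A script that reports `crlf` can be passed to `fix_line_endings`; one that reports `not-utf8` has to be re-encoded by hand, since the source encoding cannot be guessed reliably.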
analyzer.jar: The core of the stats generation. Creates a lemmatized frequency list from a given VN script. Uses kuromoji-unidic, which runs a Viterbi search over a graph of candidate words, using a pre-trained Markov model of how words connect to each other and how common each lexeme is. The VN script must be in UTF-8. Invoked by a bash script. Source is on GitHub.
normalizer.jar: Merges frequency lists in the format that analyzer.jar outputs. Invoked manually on most of the frequency lists generated under the count/ directory. Used to create the frequency list for the 5k columns. Source is on GitHub.
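To illustrate what merging frequency lists means, here is a rough sketch in shell. It is not how normalizer.jar is implemented, and the list format is an assumption: each line is taken to be `count<TAB>lemma`, which may not match what analyzer.jar actually writes. The function name `merge_counts` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical illustration; ASSUMES each input line is "count<TAB>lemma".
# The real analyzer.jar output format may differ.

# merge_counts FILE...: sums the counts of identical lemmas across all
# files and prints the merged list, highest count first.
merge_counts() {
    tab="$(printf '\t')"
    awk -F '\t' '
        { total[$2] += $1 }                      # accumulate per lemma
        END { for (k in total) printf "%d\t%s\n", total[k], k }
    ' "$@" | LC_ALL=C sort -t "$tab" -k1,1nr -k2,2
}
```

Called as `merge_counts count/*.txt` (the file names here are also an assumption), this would produce a single combined list.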
dowork.sh: Generates the main frequency lists for each script in workspace/, placing the lists under count/. These lists exclude grammatical lexemes.
altwork.sh: Same as dowork.sh, but places the lists under altcount/ and does not exclude grammatical lexemes.
refresh.sh: Calculates the Hayashi score, coverages, and other stats from every frequency list and script.
newscript.sh: Generates/regenerates the frequency list in count/ and the frequency list in altcount/ for a single script in workspace/.
fullredo.sh: Runs dowork.sh, altwork.sh, and refresh.sh. Might be preferable for the first run, but when adding a single script, use newscript.sh and refresh.sh manually instead.
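The full rebuild described above amounts to running three scripts in sequence. The sketch below shows that sequence; it is not the actual contents of fullredo.sh. The function name `run_all` is hypothetical, and stopping at the first failing step is a design choice made here, not something the original script is known to do.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the full rebuild; NOT the real fullredo.sh.

# run_all [DIR]: runs dowork.sh, altwork.sh, and refresh.sh from DIR
# (default: current directory), stopping at the first step that fails.
run_all() {
    dir="${1:-.}"
    for step in dowork.sh altwork.sh refresh.sh; do
        echo "running $step"
        # Subshell so the caller's working directory is left unchanged.
        ( cd "$dir" && bash "./$step" ) || {
            echo "$step failed" >&2
            return 1
        }
    done
}
```

Stopping early avoids computing stats with refresh.sh from frequency lists that were only partly regenerated.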