'''This page is out of date.''' See the [[Stats]] page and read the readme in the tools linked there.

## page was renamed from MakingStats

These are the tools used to generate the [[Stats|stats]]. See [[Engines]] for information on script ripping for particular engines.

[[http://wareya.moe/extra/workspace.zip]]

These tools require Python 3, 64-bit Java, and a bash prompt. Place scripts under workspace/. __All VN scripts must be in utf-8.__ '''See [[Ripping]] for information on script formatting.''' Unix line endings (\n, not \r\n) are preferred.

'''analyzer.jar''': The core of the stats generation. Creates a lemmatized frequency list from a given VN script. Uses kuromoji-unidic, which uses a Viterbi graph and a pre-trained Markov model of how words connect to each other and how common each lexeme is. The VN script must be in utf-8. Invoked by a bash script. [[https://github.com/wareya/analyzer|Github]]

'''normalizer.jar''': Merges frequency lists in the format that analyzer.jar outputs. Invoked manually on most of the frequency lists generated under the count/ directory. Used to create the frequency list for the 5k columns. [[https://github.com/wareya/normalizer|Github]]

'''dowork.sh''': Generates the main frequency lists for each script in workspace/, placing the lists under count/. These lists exclude grammatical lexemes.

'''altwork.sh''': Same as dowork.sh, but places the lists under altcount/ and does not exclude grammatical lexemes.

'''refresh.sh''': Calculates the Hayashi score, coverages, and other stats from every frequency list and script.

'''newscript.sh''': Generates/regenerates the frequency list in count/ and the frequency list in altcount/ for a single script in workspace/.

'''fullredo.sh''': Runs dowork.sh, altwork.sh, and refresh.sh. Might be preferable the first time, but when adding single scripts you will want to run newscript.sh and refresh.sh manually.
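Since the toolchain requires every VN script to be utf-8 with Unix line endings, a small Python sketch of the conversion step may help. This is not part of the workspace tools; the source encoding (cp932/Shift_JIS here) is an assumption and will vary per game:

```python
# Hypothetical helper (not part of the workspace tools): re-encode a
# ripped VN script as utf-8 and normalize line endings to \n.
from pathlib import Path

def normalize_script(path, src_encoding="cp932"):
    # src_encoding is an assumption; many Japanese VN scripts ship as
    # cp932 (Shift_JIS), but check each game's actual encoding.
    raw = Path(path).read_bytes()
    text = raw.decode(src_encoding)
    # Convert \r\n (and stray \r) to Unix \n line endings.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    Path(path).write_bytes(text.encode("utf-8"))
```

After running this over a script, it should be safe to drop it into workspace/ for the bash scripts to pick up.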
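To illustrate what normalizer.jar conceptually does when it merges frequency lists, here is a minimal Python sketch. It assumes each list has already been parsed into a lemma-to-count mapping; the actual on-disk format analyzer.jar emits is documented in its own readme, not here:

```python
# Illustrative sketch only: merge several lemmatized frequency lists by
# summing counts per lemma, as normalizer.jar conceptually does.
from collections import Counter

def merge_freq_lists(freq_dicts):
    """Take an iterable of {lemma: count} dicts and return (lemma, count)
    pairs sorted by descending count, ties broken alphabetically."""
    total = Counter()
    for freqs in freq_dicts:
        total.update(freqs)
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
```

A merged list like this, built from most of the per-game lists under count/, is what backs the 5k columns on the stats page.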