Subcommand metrics
Calculate metrics based on the comparison of a hypothesis with a reference.
usage: benchmarkstt-tools metrics -r REFERENCE -h HYPOTHESIS
[-rt {infer,argument,plaintext}]
[-ht {infer,argument,plaintext}]
[-o {json,markdown,restructuredtext,simpletextbase}]
[--diffcounts [differ_class]] [--wer [mode]
[differ_class]] [--worddiffs [dialect]
[differ_class]]
[--log-level {critical,fatal,error,warn,warning,info,debug,notset}]
[--help]
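As an illustration of the usage line above, the following sketch assembles a command line from Python without running it. The helper name `build_metrics_argv` and the file names `ref.txt` and `hyp.txt` are hypothetical; only flags and choices listed in the usage text are used.

```python
import shlex

# Choices copied from the usage text above.
OUTPUT_FORMATS = ("json", "markdown", "restructuredtext", "simpletextbase")
INPUT_TYPES = ("infer", "argument", "plaintext")
METRICS = ("diffcounts", "wer", "worddiffs")


def build_metrics_argv(reference, hypothesis, metrics, output_format=None,
                       reference_type=None, hypothesis_type=None):
    """Hypothetical helper: build an argument list for `benchmarkstt-tools metrics`.

    `metrics` maps a metric name to its optional extra arguments,
    e.g. {"wer": ["levenshtein"]}.
    """
    if not metrics:
        raise ValueError("at least one metric needs to be provided")

    argv = ["benchmarkstt-tools", "metrics", "-r", reference, "-h", hypothesis]

    for flag, value in (("-rt", reference_type), ("-ht", hypothesis_type)):
        if value is not None:
            if value not in INPUT_TYPES:
                raise ValueError(f"unknown input type: {value!r}")
            argv += [flag, value]

    if output_format is not None:
        if output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output format: {output_format!r}")
        argv += ["-o", output_format]

    for name, extra in metrics.items():
        if name not in METRICS:
            raise ValueError(f"unknown metric: {name!r}")
        argv += ["--" + name, *extra]

    return argv


argv = build_metrics_argv(
    "ref.txt", "hyp.txt",
    {"wer": ["levenshtein"], "diffcounts": []},
    output_format="json",
)
# Prints: benchmarkstt-tools metrics -r ref.txt -h hyp.txt -o json --wer levenshtein --diffcounts
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run` on a system where the tool is installed.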
Named Arguments
-r, --reference
File to use as reference
-h, --hypothesis
File to use as hypothesis
-o, --output-format
Format of the output. Possible choices: json, markdown, restructuredtext, simpletextbase. Default: "restructuredtext"
--log-level
Set the logging output level. Possible choices: critical, fatal, error, warn, warning, info, debug, notset. Default: warning
reference and hypothesis types
You can specify which file type the --reference/-r and --hypothesis/-h arguments should be treated as.
Available types:
'infer': Load from a given filename. Automatically infer the file type from the filename extension.
'argument': Read the argument and treat it as plain text (without reading from a file).
'plaintext': Load from a given filename. Treat the file as plain text.
-rt, --reference-type
Type of reference file. Possible choices: infer, argument, plaintext. Default: "infer"
-ht, --hypothesis-type
Type of hypothesis file. Possible choices: infer, argument, plaintext. Default: "infer"
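The three input types can be pictured with a small sketch. This is not the tool's own loader: the function name `load_text` is hypothetical, and the assumption that only a `.txt` extension is recognised by 'infer' is made purely for illustration (the real tool may recognise other extensions).

```python
from pathlib import Path


def load_text(value, input_type="infer"):
    """Hypothetical sketch of resolving a -r/-h value according to its type."""
    if input_type == "argument":
        # The argument itself is the text; nothing is read from disk.
        return value

    path = Path(value)
    if input_type == "infer":
        # Assumption for this sketch: only ".txt" is known and maps to plain text.
        if path.suffix.lower() != ".txt":
            raise ValueError(f"cannot infer file type from {path.name!r}")
        input_type = "plaintext"

    if input_type == "plaintext":
        return path.read_text(encoding="utf-8")

    raise ValueError(f"unknown input type: {input_type!r}")


# With 'argument', the value is used directly as the text to compare.
print(load_text("hello world", "argument"))
```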
available metrics
A list of metrics to calculate. At least one metric needs to be provided.
--diffcounts
Get the number of differences between reference and hypothesis.
--wer
Word Error Rate, basically defined as:

    insertions + deletions + substitutions
    --------------------------------------
          number of reference words

See: https://en.wikipedia.org/wiki/Word_error_rate

Calculates the WER using one of two algorithms:

[Mode: 'strict' or 'hunt'] Insertions, deletions and substitutions are identified using the Hunt–McIlroy diff algorithm. The 'hunt' mode applies a weight of 0.5 to insertions and deletions. This algorithm is the one used internally by Python. See: https://docs.python.org/3/library/difflib.html

[Mode: 'levenshtein'] In the context of WER, the Levenshtein distance is the minimum edit distance computed at the word level. This implementation uses the editdistance C++ implementation by Hiroyuki Tanaka: https://github.com/aflc/editdistance. See: https://en.wikipedia.org/wiki/Levenshtein_distance
--worddiffs
Present differences on a per-word basis.
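To make the WER definition and its modes concrete, here is a minimal sketch, not the tool's own code. It assumes whitespace tokenisation, uses Python's `difflib.SequenceMatcher` for the diff-based modes, and a plain dynamic-programming Levenshtein distance in place of the editdistance package. The function names, the `indel_weight` parameter, and the way unequal-length `replace` blocks are split into substitutions plus insertions or deletions are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def count_edits(ref_words, hyp_words):
    """Count insertions, deletions and substitutions from difflib opcodes."""
    counts = {"insert": 0, "delete": 0, "replace": 0}
    matcher = SequenceMatcher(None, ref_words, hyp_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        ref_len, hyp_len = i2 - i1, j2 - j1
        if tag == "insert":
            counts["insert"] += hyp_len
        elif tag == "delete":
            counts["delete"] += ref_len
        elif tag == "replace":
            # Assumption: pair words off as substitutions; any surplus on
            # either side counts as deletions or insertions.
            paired = min(ref_len, hyp_len)
            counts["replace"] += paired
            counts["delete"] += ref_len - paired
            counts["insert"] += hyp_len - paired
    return counts


def wer_diff(ref, hyp, indel_weight=1.0):
    """Diff-based WER; indel_weight=0.5 mimics the 'hunt' weighting described above."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    c = count_edits(ref_words, hyp_words)
    errors = c["replace"] + indel_weight * (c["insert"] + c["delete"])
    return errors / len(ref_words)


def wer_levenshtein(ref, hyp):
    """WER from the word-level Levenshtein (minimum edit) distance."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1] / len(ref_words)


reference = "the quick brown fox"
hypothesis = "the quick fox jumps"
print(wer_diff(reference, hypothesis))                    # 0.5  (1 deletion + 1 insertion over 4 words)
print(wer_diff(reference, hypothesis, indel_weight=0.5))  # 0.25
print(wer_levenshtein(reference, hypothesis))             # 0.5  (edit distance 2 over 4 words)
```

In this example the diff finds one deleted word ("brown") and one inserted word ("jumps"), so halving the insertion and deletion weight halves the score, while the Levenshtein distance of 2 gives the same result as the unweighted diff.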