Subcommand metrics

Calculate metrics based on the comparison of a hypothesis with a reference.

usage: benchmarkstt-tools metrics -r REFERENCE -h HYPOTHESIS
                                  [-rt {infer,argument,plaintext}]
                                  [-ht {infer,argument,plaintext}]
                                  [-o {json,markdown,restructuredtext,simpletextbase}]
                                  [--diffcounts [differ_class]]
                                  [--wer [mode] [differ_class]]
                                  [--worddiffs [dialect] [differ_class]]
                                  [--log-level {critical,fatal,error,warn,warning,info,debug,notset}]
                                  [--help]
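
For example, a typical invocation comparing two transcript files and calculating the word error rate might look like this (the file names are illustrative):

    benchmarkstt-tools metrics -r reference.txt -h hypothesis.txt --wer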

Named Arguments

-r, --reference
 File to use as reference
-h, --hypothesis
 File to use as hypothesis
-o, --output-format

Possible choices: json, markdown, restructuredtext, simpletextbase

Format in which to output the results

Default: "restructuredtext"
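
For instance, to get machine-readable results instead of the default restructuredtext report (file names illustrative):

    benchmarkstt-tools metrics -r reference.txt -h hypothesis.txt --wer -o json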

--log-level

Possible choices: critical, fatal, error, warn, warning, info, debug, notset

Set the logging output level

Default: warning
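
For example, to see detailed logging while a metric is calculated (file names illustrative):

    benchmarkstt-tools metrics -r reference.txt -h hypothesis.txt --wer --log-level debug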

reference and hypothesis types

You can specify how the --reference/-r and --hypothesis/-h arguments should be interpreted.

Available types:

'infer': Load from the given filename; the file type is automatically inferred from the filename extension.
'argument': Treat the argument itself as plain text (nothing is read from a file).
'plaintext': Load from the given filename and treat the file as plain text.

-rt, --reference-type

Possible choices: infer, argument, plaintext

Type of reference file

Default: "infer"

-ht, --hypothesis-type

Possible choices: infer, argument, plaintext

Type of hypothesis file

Default: "infer"
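
As a quick sketch, the 'argument' type lets you pass short texts directly on the command line instead of reading them from files (the sentences below are only illustrative):

    benchmarkstt-tools metrics -r 'the quick brown fox' -rt argument \
                               -h 'the quick red fox jumps' -ht argument --wer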

available metrics

A list of metrics to calculate. At least one metric needs to be provided.

--diffcounts Get the number of differences between reference and hypothesis
--wer

Word Error Rate, basically defined as:

insertions + deletions + substitutions
---------------------------------------
       number of reference words

See: https://en.wikipedia.org/wiki/Word_error_rate
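
As an illustrative worked example (not output produced by the tool itself): for the reference "the quick brown fox" and the hypothesis "the quick red fox jumps" there is 1 substitution ("brown" -> "red"), 1 insertion ("jumps") and 0 deletions over 4 reference words, giving

    (1 + 1 + 0) / 4 = 0.5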

Calculates the WER using one of two algorithms:

[Mode: 'strict' or 'hunt'] Insertions, deletions and substitutions are identified using the Hunt–McIlroy diff algorithm, the same algorithm Python uses internally in its difflib module. In 'hunt' mode, insertions and deletions are given a weight of 0.5. See https://docs.python.org/3/library/difflib.html

[Mode: 'levenshtein'] In the context of WER, the Levenshtein distance is the minimum edit distance computed at the word level. This implementation uses the editdistance C++ implementation by Hiroyuki Tanaka: https://github.com/aflc/editdistance. See: https://en.wikipedia.org/wiki/Levenshtein_distance

param mode: 'strict' (default), 'hunt' or 'levenshtein'.
param differ_class: For future use.
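
For example, to select the Levenshtein-based calculation explicitly (file names illustrative):

    benchmarkstt-tools metrics -r reference.txt -h hypothesis.txt --wer levenshtein
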
--worddiffs

Present differences on a per-word basis

param dialect: Presentation format. Default is 'cli'.
example dialect: 'html'
param differ_class: For future use.
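
Metrics can also be combined in a single run. For example (file names illustrative), to get both the difference counts and a per-word diff rendered for HTML:

    benchmarkstt-tools metrics -r reference.txt -h hypothesis.txt --diffcounts --worddiffs html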