Subcommand metrics

Calculate metrics based on the comparison of a hypothesis with a reference.

usage: benchmarkstt-tools metrics -r REFERENCE -h HYPOTHESIS
                                  [-rt {infer,argument,plaintext}]
                                  [-ht {infer,argument,plaintext}]
                                  [-o {json,markdown,restructuredtext,simpletextbase}]
                                  [--diffcounts [differ_class]]
                                  [--wer [mode] [differ_class]]
                                  [--worddiffs [dialect] [differ_class]]
                                  [--log-level {critical,fatal,error,warn,warning,info,debug,notset}]
                                  [--help]
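
For example, a typical invocation comparing two transcript files and calculating the word error rate might look like this (the file names are illustrative):

    benchmarkstt-tools metrics -r reference.txt -h hypothesis.txt --wer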

Named Arguments

-r, --reference
 File to use as reference
-h, --hypothesis
 File to use as hypothesis
-o, --output-format

Possible choices: json, markdown, restructuredtext, simpletextbase

Format in which to output the results

Default: "restructuredtext"
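
For instance, to get machine-readable results instead of the default restructuredtext report (file names illustrative):

    benchmarkstt-tools metrics -r reference.txt -h hypothesis.txt --wer -o json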

--log-level

Possible choices: critical, fatal, error, warn, warning, info, debug, notset

Set the logging output level

Default: warning
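
For example, to see detailed logging while a metric is calculated (file names illustrative):

    benchmarkstt-tools metrics -r reference.txt -h hypothesis.txt --wer --log-level debug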

reference and hypothesis types

You can specify how the --reference/-r and --hypothesis/-h arguments should be interpreted.

Available types:

'infer': Load from the given filename; the file type is automatically inferred from the filename extension.
'argument': Treat the argument itself as plain text (nothing is read from a file).
'plaintext': Load from the given filename and treat the file as plain text.

-rt, --reference-type

Possible choices: infer, argument, plaintext

Type of reference file

Default: "infer"

-ht, --hypothesis-type

Possible choices: infer, argument, plaintext

Type of hypothesis file

Default: "infer"
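
As a quick sketch, the 'argument' type lets you pass short texts directly on the command line instead of reading them from files (the sentences below are only illustrative):

    benchmarkstt-tools metrics -r 'the quick brown fox' -rt argument \
                               -h 'the quick red fox jumps' -ht argument --wer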

available metrics

A list of metrics to calculate. At least one metric needs to be provided.

--diffcounts Get the number of differences between reference and hypothesis
--wer

Word Error Rate, basically defined as:

insertions + deletions + substitutions
---------------------------------------
       number of reference words

See: https://en.wikipedia.org/wiki/Word_error_rate
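
As an illustrative worked example (not output produced by the tool itself): for the reference "the quick brown fox" and the hypothesis "the quick red fox jumps" there is 1 substitution ("brown" -> "red"), 1 insertion ("jumps") and 0 deletions over 4 reference words, giving

    (1 + 1 + 0) / 4 = 0.5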

Calculates the WER using one of two algorithms:

[Mode: 'strict' or 'hunt'] Insertions, deletions and substitutions are identified using the Hunt–McIlroy diff algorithm, the same algorithm Python uses internally in its difflib module. In 'hunt' mode, insertions and deletions are given a weight of 0.5. See https://docs.python.org/3/library/difflib.html

[Mode: 'levenshtein'] In the context of WER, the Levenshtein distance is the minimum edit distance computed at the word level. This implementation uses the editdistance C++ implementation by Hiroyuki Tanaka: https://github.com/aflc/editdistance. See: https://en.wikipedia.org/wiki/Levenshtein_distance

param mode: 'strict' (default), 'hunt' or 'levenshtein'.
param differ_class: For future use.
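
For example, to select the Levenshtein-based calculation explicitly (file names illustrative):

    benchmarkstt-tools metrics -r reference.txt -h hypothesis.txt --wer levenshtein
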
--worddiffs

Present differences on a per-word basis

param dialect: Presentation format. Default is 'cli'.
example dialect: 'html'
param differ_class: For future use.
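
Metrics can also be combined in a single run. For example (file names illustrative), to get both the difference counts and a per-word diff rendered for HTML:

    benchmarkstt-tools metrics -r reference.txt -h hypothesis.txt --diffcounts --worddiffs html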