benchmarkstt.metrics.core module
class benchmarkstt.metrics.core.BEER(entities_file=None)
Bases: benchmarkstt.metrics.Metric
Bag of Entities Error Rate, BEER, is defined as the error rate per entity with a bag of words approach:
$$\mathrm{BEER}(entity) = \frac{|ne_{hyp} - ne_{ref}|}{ne_{ref}}$$
ne_hyp = number of detections of the entity in the hypothesis file
ne_ref = number of detections of the entity in the reference file
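A minimal sketch of the per-entity formula, assuming whitespace tokenization and single-word entities matched case-sensitively. The helper name `beer_per_entity` is hypothetical and not part of the benchmarkstt API. Raising an error when the entity is absent from the reference is also this sketch's own choice, since the formula is undefined when ne_ref is 0.

```python
from collections import Counter


def beer_per_entity(entity: str, ref_words: list[str], hyp_words: list[str]) -> float:
    """Hypothetical helper: BEER for one single-word entity (bag-of-words counts)."""
    # Bag of words: only how often the entity occurs matters, not where.
    ne_ref = Counter(ref_words)[entity]
    ne_hyp = Counter(hyp_words)[entity]
    if ne_ref == 0:
        # Assumption: treat an entity missing from the reference as an error.
        raise ValueError(f"entity {entity!r} does not occur in the reference")
    return abs(ne_hyp - ne_ref) / ne_ref


ref = "the bbc reported that the bbc studio was closed".split()
hyp = "the bbc reported that the studio was closed".split()
print(beer_per_entity("bbc", ref, hyp))  # 0.5
```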
The WA_BEER for a set of N entities is defined as the weighted average of the BEER for the set of entities:
$$\mathrm{WA\_BEER}(entity_1, \ldots, entity_N) = w_1 \cdot \mathrm{BEER}(entity_1) \cdot \frac{L_1}{L} + \ldots + w_N \cdot \mathrm{BEER}(entity_N) \cdot \frac{L_N}{L}$$
which, since L_n and ne_ref_n both count the occurrences of entity n in the reference, is equivalent to:
$$\mathrm{WA\_BEER}(entity_1, \ldots, entity_N) = \frac{w_1 \cdot |ne_{hyp,1} - ne_{ref,1}| + \ldots + w_N \cdot |ne_{hyp,N} - ne_{ref,N}|}{L}$$
L_n = number of occurrences of entity n in the reference document
L = L_1 + ... + L_N
with the weights normalised by the tool so that:
w_1 + ... + w_N = 1
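As a sanity check on the two WA_BEER forms, the sketch below computes both from per-entity counts and shows they agree. It assumes every entity occurs at least once in the reference (otherwise BEER divides by zero) and that the weights are already normalised; `wa_beer` is a hypothetical name, not a benchmarkstt function.

```python
def wa_beer(counts: dict[str, tuple[int, int]],
            weights: dict[str, float]) -> tuple[float, float]:
    """Hypothetical helper returning WA_BEER computed with both formulas.

    counts maps each entity to (ne_hyp, ne_ref); weights are already normalised.
    """
    # L_n is the number of reference occurrences, i.e. ne_ref for that entity.
    total = sum(ne_ref for _, ne_ref in counts.values())  # L = L_1 + ... + L_N
    # First form: sum of w_n * BEER(entity_n) * L_n / L.
    first = sum(weights[e] * (abs(h - r) / r) * (r / total)
                for e, (h, r) in counts.items())
    # Second form: weighted absolute count differences divided by L.
    second = sum(weights[e] * abs(h - r) for e, (h, r) in counts.items()) / total
    return first, second


print(wa_beer({"bbc": (1, 2), "london": (3, 2)}, {"bbc": 0.75, "london": 0.25}))
# (0.25, 0.25)
```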
The input file defines the list of entities and the weight per entity, W_n. It is processed as a JSON file with the following structure:
{"entity_1": W_1, "entity_2": W_2, "entity_3": W_3, ...}
W_n is the non-normalized weight; the tool normalizes the weights as:
$$w_n = \frac{W_n}{W_1 + \ldots + W_N}$$
The minimum value for a weight is 0.
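The normalisation step can be sketched as follows, assuming the entities file holds a flat JSON object of numeric weights. `load_normalised_weights` is a hypothetical helper, and rejecting an all-zero weight set is this sketch's assumption (the normalisation would otherwise divide by zero); how the tool itself handles that case is not described here.

```python
import json


def load_normalised_weights(text: str) -> dict[str, float]:
    """Hypothetical helper: parse {"entity": W_n, ...} and return w_n = W_n / sum(W)."""
    raw = json.loads(text)
    if any(w < 0 for w in raw.values()):
        raise ValueError("weights must be >= 0")
    total = sum(raw.values())
    if total == 0:
        # Assumption: at least one entity must carry a positive weight.
        raise ValueError("at least one weight must be positive")
    return {entity: w / total for entity, w in raw.items()}


weights = load_normalised_weights('{"bbc": 3, "london": 1, "paris": 0}')
print(weights)  # {'bbc': 0.75, 'london': 0.25, 'paris': 0.0}
```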
compare(ref: benchmarkstt.schema.Schema, hyp: benchmarkstt.schema.Schema)
class benchmarkstt.metrics.core.CER(mode=None, differ_class=None)
Bases: benchmarkstt.metrics.Metric
Character Error Rate, basically defined as:
$$\mathrm{CER} = \frac{\text{insertions} + \text{deletions} + \text{substitutions}}{\text{number of reference characters}}$$
The character error rate, CER, compares reference and hypothesis at the character level. The CER is usually lower than the WER: two words that differ in only one or a few characters count as a full word error in the WER, but only as one or a few character errors in the CER.
The CER metric can be a useful complement to the WER metric. Word endings might be less relevant if the text will be preprocessed with stemming, and minor spelling mistakes might be acceptable in certain situations. A CER metric might also be used to evaluate a source (such as an ASR system) that outputs a stream of characters rather than words.
Important: The current implementation of the CER metric ignores whitespace characters. A string like 'aa bb cc' will first be split into words, ['aa','bb','cc'], and then merged into a final string for evaluation: 'aabbcc'.
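The whitespace handling and the formula above can be illustrated with a short sketch. It uses a pure-Python character-level Levenshtein distance in place of the C++ editdistance package, so it is an approximation of the 'levenshtein' mode, not the library's code; `levenshtein` and `cer_sketch` are hypothetical names, and an empty reference would raise ZeroDivisionError.

```python
def levenshtein(a, b) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def cer_sketch(ref: str, hyp: str) -> float:
    # Mirror the documented behaviour: split on whitespace, then glue the words together.
    ref_chars = "".join(ref.split())
    hyp_chars = "".join(hyp.split())
    return levenshtein(ref_chars, hyp_chars) / len(ref_chars)


print(cer_sketch("aa bb cc", "aabbcc"))    # 0.0, whitespace is ignored
print(cer_sketch("aa bb cc", "aa bd cc"))  # 0.16666666666666666
```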
Parameters:
mode -- 'levenshtein' (default).
differ_class -- For future use.
MODE_LEVENSHTEIN = 'levenshtein'
compare(ref: benchmarkstt.schema.Schema, hyp: benchmarkstt.schema.Schema)
class benchmarkstt.metrics.core.DiffCounts(mode=None, differ_class: benchmarkstt.diff.Differ = None)
Bases: benchmarkstt.metrics.Metric
Get the number of differences between reference and hypothesis.
MODE_LEVENSHTEIN = 'levenshtein'
class benchmarkstt.metrics.core.OpcodeCounts(equal, replace, insert, delete)
Bases: tuple
property delete
Alias for field number 3
property equal
Alias for field number 0
property insert
Alias for field number 2
property replace
Alias for field number 1
class benchmarkstt.metrics.core.WER(mode=None, differ_class: benchmarkstt.diff.Differ = None)
Bases: benchmarkstt.metrics.Metric
Word Error Rate, basically defined as:
$$\mathrm{WER} = \frac{\text{insertions} + \text{deletions} + \text{substitutions}}{\text{number of reference words}}$$
See: https://en.wikipedia.org/wiki/Word_error_rate
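The formula can be made concrete with a word-level Levenshtein distance in which every edit costs 1, matching the default penalties of 1 listed for this class. This is a pure-Python sketch assuming whitespace tokenization, not the library's implementation (which, per the mode description, relies on the editdistance package); `word_edit_distance` and `wer_sketch` are hypothetical names.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    # Classic dynamic-programming Levenshtein distance over words instead of characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # delete reference word r
                           cur[j - 1] + 1,           # insert hypothesis word h
                           prev[j - 1] + (r != h)))  # substitute, or match for free
        prev = cur
    return prev[-1]


def wer_sketch(ref: str, hyp: str) -> float:
    ref_words, hyp_words = ref.split(), hyp.split()
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(wer_sketch("the cat sat on the mat", "the cat sit on mat"))  # 0.3333333333333333
```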
Calculates the WER using one of two algorithms:
[Mode: 'strict' or 'hunt'] Insertions, deletions and substitutions are identified using the diff algorithm of Python's difflib module, a refinement of Ratcliff and Obershelp's "gestalt pattern matching". The 'hunt' mode applies a weight of 0.5 to insertions and deletions.
See https://docs.python.org/3/library/difflib.html
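A rough sketch of the difflib-based modes, built on `difflib.SequenceMatcher.get_opcodes()`. Two points are assumptions rather than documented behaviour: how a 'replace' block of unequal length is split into substitutions plus extra insertions or deletions, and that 'hunt' simply means insertion and deletion penalties of 0.5. `difflib_wer_sketch` is a hypothetical name.

```python
from difflib import SequenceMatcher


def difflib_wer_sketch(ref: str, hyp: str, ins_penalty: float = 1.0,
                       del_penalty: float = 1.0, sub_penalty: float = 1.0) -> float:
    ref_words, hyp_words = ref.split(), hyp.split()
    # autojunk=False disables difflib's heuristic that ignores very frequent items
    # in long (200+ item) sequences.
    matcher = SequenceMatcher(None, ref_words, hyp_words, autojunk=False)
    ins = dels = subs = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            ins += j2 - j1
        elif tag == "delete":
            dels += i2 - i1
        elif tag == "replace":
            # Assumption: pair words up as substitutions; any surplus is ins/del.
            n_ref, n_hyp = i2 - i1, j2 - j1
            subs += min(n_ref, n_hyp)
            dels += max(n_ref - n_hyp, 0)
            ins += max(n_hyp - n_ref, 0)
    cost = ins * ins_penalty + dels * del_penalty + subs * sub_penalty
    return cost / len(ref_words)


ref, hyp = "the cat sat on the mat", "the cat on the mat today"
print(difflib_wer_sketch(ref, hyp))                                   # 'strict'-like
print(difflib_wer_sketch(ref, hyp, ins_penalty=0.5, del_penalty=0.5))  # 'hunt'-like
```

Here difflib reports one deleted word ("sat") and one inserted word ("today"), so the strict-style score is 2/6 and the halved penalties give 1/6.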
[Mode: 'levenshtein'] In the context of WER, the Levenshtein distance is the minimum edit distance computed at the word level. This implementation uses the editdistance C++ implementation by Hiroyuki Tanaka: https://github.com/aflc/editdistance. See: https://en.wikipedia.org/wiki/Levenshtein_distance
Parameters:
mode -- 'strict' (default), 'hunt' or 'levenshtein'.
differ_class -- For future use.
DEL_PENALTY = 1
INS_PENALTY = 1
MODE_HUNT = 'hunt'
MODE_LEVENSHTEIN = 'levenshtein'
MODE_STRICT = 'strict'
SUB_PENALTY = 1
compare(ref: benchmarkstt.schema.Schema, hyp: benchmarkstt.schema.Schema) → float
class benchmarkstt.metrics.core.WordDiffs(dialect=None, differ_class: benchmarkstt.diff.Differ = None)
Bases: benchmarkstt.metrics.Metric
Present differences on a per-word basis
Parameters:
dialect -- Presentation format. Default is 'ansi'.
differ_class -- For future use.
Example dialect: 'html'
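To show what a per-word presentation involves, the sketch below renders a word diff as plain text with `[-deleted-]` and `{+inserted+}` markers. This marker format is a hypothetical stand-in and is not one of the tool's dialects ('ansi', 'html'); `word_diff_text` is likewise a hypothetical name, and using difflib for the alignment is an assumption.

```python
from difflib import SequenceMatcher


def word_diff_text(ref: str, hyp: str) -> str:
    ref_words, hyp_words = ref.split(), hyp.split()
    matcher = SequenceMatcher(None, ref_words, hyp_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(ref_words[i1:i2])
        if tag in ("delete", "replace"):
            # Reference words missing from the hypothesis.
            out.extend(f"[-{w}-]" for w in ref_words[i1:i2])
        if tag in ("insert", "replace"):
            # Hypothesis words not present in the reference.
            out.extend(f"{{+{w}+}}" for w in hyp_words[j1:j2])
    return " ".join(out)


print(word_diff_text("the cat sat on the mat", "the cat sit on the mat"))
# the cat [-sat-] {+sit+} on the mat
```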
compare(ref: benchmarkstt.schema.Schema, hyp: benchmarkstt.schema.Schema)
benchmarkstt.metrics.core.get_differ(a, b, differ_class: benchmarkstt.diff.Differ)
benchmarkstt.metrics.core.get_opcode_counts(opcodes) → benchmarkstt.metrics.core.OpcodeCounts
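The "Alias for field number N" docstrings listed for OpcodeCounts are the ones `collections.namedtuple` generates, which suggests OpcodeCounts is a namedtuple. The sketch rebuilds it that way and tallies difflib-style opcodes into it. Whether the real get_opcode_counts counts words or opcode blocks, and how it measures 'replace' spans, is not stated here; counting affected words on the reference side is an assumption, and `opcode_counts_sketch` is a hypothetical name.

```python
from collections import namedtuple
from difflib import SequenceMatcher

# Rebuilt here for illustration, with the field order documented above.
OpcodeCounts = namedtuple("OpcodeCounts", ["equal", "replace", "insert", "delete"])


def opcode_counts_sketch(opcodes) -> OpcodeCounts:
    """Hypothetical stand-in for get_opcode_counts: tally words per opcode tag."""
    counts = dict.fromkeys(OpcodeCounts._fields, 0)
    for tag, i1, i2, j1, j2 in opcodes:
        # Insertions only span the hypothesis; other tags are measured on the reference.
        counts[tag] += (j2 - j1) if tag == "insert" else (i2 - i1)
    return OpcodeCounts(**counts)


ref = "the cat sat on the mat".split()
hyp = "the cat on the mat today".split()
result = opcode_counts_sketch(SequenceMatcher(None, ref, hyp).get_opcodes())
print(result)                      # OpcodeCounts(equal=5, replace=0, insert=1, delete=1)
print(result[3] == result.delete)  # True
```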