Available JSON-RPC methods¶

Attention

Only supported for Python versions 3.6 and above

version¶

Get the version of BenchmarkSTT

return str:	BenchmarkSTT version
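
The methods on this page are called through JSON-RPC requests. As a minimal sketch, assuming the server follows JSON-RPC 2.0 and accepts parameters by name (neither is stated on this page), a request body could be built as follows. The helper name `make_request` is hypothetical and not part of BenchmarkSTT.

```python
import json
from typing import Any, Dict, Optional


def make_request(method: str, params: Optional[Dict[str, Any]] = None, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request (hypothetical helper, not part of BenchmarkSTT)."""
    request = {"jsonrpc": "2.0", "method": method, "id": request_id}
    if params:
        # Assumption: parameters are passed by name, matching the "param" fields on this page.
        request["params"] = params
    return json.dumps(request)


# A call without parameters, and one with a single named parameter.
version_request = make_request("version")
lowercase_request = make_request(
    "normalization.lowercase",
    {"text": "Easy, Mungo, easy... Mungo..."},
    request_id=2,
)
```

How the request is transported (URL, port, HTTP client) depends on how the API server was started and is deliberately left out of this sketch.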

list.normalization¶

Get a list of available core normalizers

return object:	With key being the normalization name, and value its description

normalization.config¶

Use config file notation to define normalization rules. This notation is a list of normalizers, one per line.

Each normalizer that needs a file is followed by the name of a CSV file, optionally followed by the file encoding (if different from the default). All options are loaded from this CSV file and applied to the normalizer.

The normalizers can be any of the core normalizers, or you can refer to your own normalizer class (as you would in a Python import, e.g. my.own.package.MyNormalizerClass).

Additional rules:

Normalizer names are case-insensitive.
Arguments MAY be wrapped in double quotes.
If an argument contains a space, newline or double quote, it MUST be wrapped in double quotes.
A double quote itself is represented in this quoted argument as two double quotes: "".
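
The quoting rules above can be illustrated with a small tokenizer. This is a sketch of the rules as written, not BenchmarkSTT's own parser; the function name `split_arguments` is hypothetical.

```python
from typing import List


def split_arguments(line: str) -> List[str]:
    """Split one config line into a normalizer name and its arguments.

    Hypothetical helper that follows the rules listed above: arguments are
    separated by whitespace, may be wrapped in double quotes, and a doubled
    double quote inside a quoted argument stands for one double quote.
    """
    args = []
    i, n = 0, len(line)
    while i < n:
        if line[i].isspace():
            i += 1
            continue
        if line[i] == '"':
            i += 1  # skip the opening quote
            chars = []
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        chars.append('"')  # "" inside quotes is a literal quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                chars.append(line[i])
                i += 1
            args.append("".join(chars))
        else:
            start = i
            while i < n and not line[i].isspace():
                i += 1
            args.append(line[start:i])
    return args


print(split_arguments('regex regexrules.csv "utf 8"'))
# ['regex', 'regexrules.csv', 'utf 8']
print(split_arguments('replace "say ""hi"""'))
# ['replace', 'say "hi"']
```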

The normalization rules are applied top-to-bottom and follow this format:

[normalization]
# This is a comment

# (lowercase has no arguments)
lowercase

# loads regex expressions from regexrules.csv in "utf 8" encoding
regex regexrules.csv "utf 8"

# load another config file, [section1] and [section2]
config configfile.ini section1
config configfile.ini section2

# loads replace expressions from replaces.csv in default encoding
replace     replaces.csv
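
Because rules are applied top-to-bottom, their order can change the result. The following sketch shows this with two plain Python functions standing in for normalizers; the names `apply_in_order`, `lowercase` and `replace_nudge` are illustrative and not part of BenchmarkSTT.

```python
from typing import Callable, List

Normalizer = Callable[[str], str]


def apply_in_order(text: str, normalizers: List[Normalizer]) -> str:
    """Apply each normalizer to the output of the previous one."""
    for normalize in normalizers:
        text = normalize(text)
    return text


def lowercase(text: str) -> str:
    return text.lower()


def replace_nudge(text: str) -> str:
    # Case-sensitive search and replace of "nudge" with "wink".
    return text.replace("nudge", "wink")


lowercase_first = apply_in_order("Nudge nudge!", [lowercase, replace_nudge])
replace_first = apply_in_order("Nudge nudge!", [replace_nudge, lowercase])

print(lowercase_first)  # wink wink!
print(replace_first)    # nudge wink!
```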

param file:	The config file
param encoding:	The file encoding
param section:	The subsection of the config file to use, defaults to 'normalization'
param text:	The text to normalize
param bool return_logs:	Return normalization logs
example text:	"He bravely turned his tail and fled"
example file:	"./resources/test/normalizers/configfile.conf"
example encoding:	"UTF-8"
example return:	"ha bravalY Turnad his tail and flad"

normalization.lowercase¶

Lowercase the text

param text:	The text to normalize
param bool return_logs:	Return normalization logs
example text:	"Easy, Mungo, easy... Mungo..."
example return:	"easy, mungo, easy... mungo..."

normalization.regex¶

Simple regex replace. By default the pattern is case-sensitive.

Case-insensitivity is supported by adding inline modifiers.

You might want to use capturing groups to preserve case: when a replaced character is not captured, the information about its case is lost.

E.g. the following would replace "HAHA! Hahaha!" with "HeHe! Hehehe!":

search replace

(?i)(h)a \1e

No regex flags are set by default. You can set them yourself in the regex and combine them at will, e.g. multiline, dotall and ignorecase.

E.g. the following would replace "New<CRLF>line" with "newline":

search replace

(?msi)new.line newline
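
The two search/replace pairs above can be tried directly with Python's `re` module. This is a sketch assuming the normalizer behaves like `re.sub`, which this page does not state explicitly. Since `.` matches exactly one character, the second call uses a single line feed rather than a two-character CRLF sequence.

```python
import re

# (?i) makes the match case-insensitive; the captured "h" or "H" is reused
# through \1, so the case of that letter survives the replacement.
laughed = re.sub(r"(?i)(h)a", r"\1e", "HAHA! Hahaha!")

# (?msi) combines the multiline, dotall and ignorecase flags; with dotall,
# "." also matches a newline character.
joined = re.sub(r"(?msi)new.line", "newline", "New\nline")

print(laughed)  # HeHe! Hehehe!
print(joined)   # newline
```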

param text:	The text to normalize
param bool return_logs:	Return normalization logs
example text:	"HAHA! Hahaha!"
example search:	'(?i)(h)a'
example replace:	'\1e'
example return:	"HeHe! Hehehe!"

normalization.replace¶

Simple search replace

param search:	Text to search for
param replace:	Text to replace with
param text:	The text to normalize
param bool return_logs:	Return normalization logs
example text:	"Nudge nudge!"
example search:	"nudge"
example replace:	"wink"
example return:	"Nudge wink!"

normalization.replacewords¶

Simple search replace that only replaces whole "words". The first letter is also matched case-insensitively, and its case is preserved in the replacement.

param search:	Word to search for
param replace:	Replace with
param text:	The text to normalize
param bool return_logs:	Return normalization logs
example text:	"She has a heart of formica"
example search:	"a"
example replace:	"the"
example return:	"She has the heart of formica"
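
One way to read the behaviour described for this method is sketched below. It is an interpretation, not BenchmarkSTT's implementation: the function name `replace_words` is hypothetical, and the handling of word boundaries and of the first letter's case are assumptions.

```python
import re


def replace_words(text: str, search: str, replace: str) -> str:
    """Replace whole words only, matching the first letter in either case.

    Hypothetical sketch: the replacement takes over the case of the first
    letter of the word that was found.
    """
    if not search:
        return text
    first, rest = search[:1], search[1:]
    # First letter in either case, the rest exactly, not touching other word characters.
    pattern = re.compile(
        r"(?<!\w)["
        + re.escape(first.lower())
        + re.escape(first.upper())
        + "]"
        + re.escape(rest)
        + r"(?!\w)"
    )

    def substitute(match: "re.Match") -> str:
        found_first = match.group(0)[:1]
        if found_first.isupper():
            return replace[:1].upper() + replace[1:]
        if found_first.islower():
            return replace[:1].lower() + replace[1:]
        return replace

    return pattern.sub(substitute, text)


print(replace_words("She has a heart of formica", "a", "the"))
# She has the heart of formica
print(replace_words("A heart of formica", "a", "the"))
# The heart of formica
```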

normalization.unidecode¶

Unidecode characters to ASCII form, see Python's Unidecode package for more info.

param text:	The text to normalize
param bool return_logs:	Return normalization logs
example text:	"𝖂𝖊𝖓𝖓 𝖎𝖘𝖙 𝖉𝖆𝖘 𝕹𝖚𝖓𝖘𝖙ü𝖈𝖐 𝖌𝖎𝖙 𝖚𝖓𝖉 𝕾𝖑𝖔𝖙𝖊𝖗𝖒𝖊𝖞𝖊𝖗?"
example return:	"Wenn ist das Nunstuck git und Slotermeyer?"

list.metrics¶

Get a list of available core metrics

return object:	With key being the metrics name, and value its description

metrics.diffcounts¶

Get the number of differences between reference and hypothesis

param ref:	Reference text
param hyp:	Hypothesis text
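
To give an idea of what such counts look like, here is a sketch that tallies word-level differences with `difflib.SequenceMatcher` from the Python standard library. The function name `diff_counts`, the shape of the returned dictionary and the way unequal-length replacements are split into substitutions plus insertions or deletions are all assumptions, not the actual output format of this method.

```python
from difflib import SequenceMatcher
from typing import Dict


def diff_counts(ref: str, hyp: str) -> Dict[str, int]:
    """Count equal, replaced, inserted and deleted words (hypothetical helper)."""
    ref_words, hyp_words = ref.split(), hyp.split()
    counts = {"equal": 0, "replace": 0, "insert": 0, "delete": 0}
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        ref_len, hyp_len = i2 - i1, j2 - j1
        if tag == "equal":
            counts["equal"] += ref_len
        elif tag == "replace":
            # Assumption: pair words up as substitutions, count the surplus separately.
            paired = min(ref_len, hyp_len)
            counts["replace"] += paired
            counts["delete"] += ref_len - paired
            counts["insert"] += hyp_len - paired
        elif tag == "delete":
            counts["delete"] += ref_len
        else:  # tag == "insert"
            counts["insert"] += hyp_len
    return counts


print(diff_counts("Hello darkness my OLD friend", "Hello darkness my old foe"))
# {'equal': 3, 'replace': 2, 'insert': 0, 'delete': 0}
```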

metrics.wer¶

Word Error Rate, basically defined as:

insertions + deletions + substitutions
--------------------------------------
      number of reference words

See: https://en.wikipedia.org/wiki/Word_error_rate

Calculates the WER using one of two algorithms:

[Mode: 'strict' or 'hunt'] Insertions, deletions and substitutions are identified using the Hunt–McIlroy diff algorithm. The 'hunt' mode applies 0.5 weight to insertions and deletions. This algorithm is the one used internally by Python. See https://docs.python.org/3/library/difflib.html

[Mode: 'levenshtein'] In the context of WER, Levenshtein distance is the minimum edit distance computed at the word level. This implementation uses the Editdistance C++ implementation by Hiroyuki Tanaka: https://github.com/aflc/editdistance. See: https://en.wikipedia.org/wiki/Levenshtein_distance
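
The formula above, with a word-level Levenshtein distance as the numerator, can be sketched in pure Python as follows. This is an illustration only: it uses a textbook dynamic-programming edit distance instead of the editdistance package, and the function names `word_edit_distance` and `wer` are hypothetical.

```python
from typing import List


def word_edit_distance(ref_words: List[str], hyp_words: List[str]) -> int:
    """Minimum number of word insertions, deletions and substitutions."""
    # previous[j] is the distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: edit distance divided by the number of reference words."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference text must contain at least one word")
    return word_edit_distance(ref_words, hyp.split()) / len(ref_words)


# One substitution ("friend" -> "foe") out of five reference words.
print(wer("hello darkness my old friend", "hello darkness my old foe"))  # 0.2
```

Without prior lowercasing, 'OLD' and 'old' count as different words, so the same pair with its original casing yields two substitutions out of five words.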

param mode:	'strict' (default), 'hunt' or 'levenshtein'.
param differ_class:	For future use.
param ref:	Reference text
param hyp:	Hypothesis text

metrics.worddiffs¶

Present differences on a per-word basis

param dialect:	Presentation format. Default is 'cli'.
param differ_class:	For future use.
param ref:	Reference text
param hyp:	Hypothesis text
example dialect:	'html'

list.benchmark¶

Get a list of available core benchmarks

return object:	With key being the benchmark name, and value its description

benchmark.diffcounts¶

Get the number of differences between reference and hypothesis

param ref:	Reference text
param hyp:	Hypothesis text
param config:	The config to use
param bool return_logs:	Return normalization logs
example ref:	'Hello darkness my OLD friend'
example hyp:	'Hello darkness my old foe'
example config:	[normalization]
	# using a simple config file
	Lowercase
example result:	""

benchmark.wer¶

Word Error Rate, basically defined as:

insertions + deletions + substitutions
--------------------------------------
      number of reference words

See: https://en.wikipedia.org/wiki/Word_error_rate

Calculates the WER using one of two algorithms:

[Mode: 'strict' or 'hunt'] Insertions, deletions and substitutions are identified using the Hunt–McIlroy diff algorithm. The 'hunt' mode applies 0.5 weight to insertions and deletions. This algorithm is the one used internally by Python. See https://docs.python.org/3/library/difflib.html

[Mode: 'levenshtein'] In the context of WER, Levenshtein distance is the minimum edit distance computed at the word level. This implementation uses the Editdistance C++ implementation by Hiroyuki Tanaka: https://github.com/aflc/editdistance. See: https://en.wikipedia.org/wiki/Levenshtein_distance

param mode:	'strict' (default), 'hunt' or 'levenshtein'.
param differ_class:	For future use.
param ref:	Reference text
param hyp:	Hypothesis text
param config:	The config to use
param bool return_logs:	Return normalization logs
example ref:	'Hello darkness my OLD friend'
example hyp:	'Hello darkness my old foe'
example config:	[normalization]
	# using a simple config file
	Lowercase
example result:	""

benchmark.worddiffs¶

Present differences on a per-word basis

param dialect:	Presentation format. Default is 'cli'.
param differ_class:	For future use.
param ref:	Reference text
param hyp:	Hypothesis text
param config:	The config to use
param bool return_logs:	Return normalization logs
example dialect:	'html'
example ref:	'Hello darkness my OLD friend'
example hyp:	'Hello darkness my old foe'
example config:	[normalization]
	# using a simple config file
	Lowercase
example result:	""

help¶

Returns the available API methods

return object:	With key being the method name, and value its description