BenchmarkSTT¶
About¶
This is a command line tool for benchmarking Automatic Speech Recognition engines.
It is designed for non-academic production environments, and prioritises ease of use and relative benchmarking over scientific procedure and high-accuracy absolute scoring.
Because of the wide range of languages, algorithms and audio characteristics, no single STT engine can be expected to excel in all circumstances. For this reason, this tool places responsibility on the users to design their own benchmarking procedure and to decide, based on the combination of test data and metrics, which engine is best suited for their particular use case.
Usage examples¶
Returns the number of word insertions, deletions, replacements and matches for the hypothesis transcript compared to the reference:
benchmarkstt --reference reference.txt --hypothesis hypothesis.txt --diffcounts
Returns the Word Error Rate after lowercasing both reference and hypothesis. This normalization improves the accuracy of the Word Error Rate, as it removes diffs that might otherwise be counted as errors:
benchmarkstt -r reference.txt -h hypothesis.txt --wer --lowercase
Returns a visual diff after applying all the normalization rules specified in the config file:
benchmarkstt -r reference.txt -h hypothesis.txt --worddiffs --config conf
Further information¶
This is a collaborative project to create a library for benchmarking AI/ML applications. It was created in response to the needs of broadcasters and providers of Access Services to media organisations, but anyone is welcome to contribute. The group behind this project is the EBU's Media Information Management & AI group.
Currently the group is focussing on Speech-to-Text, but it will consider creating benchmarking tools for other AI/ML services.
For general information about this project, including the motivations and guiding principles, please see the project wiki
To install and start using the tool, go to the documentation.
License¶
Copyright 2019 EBU
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Installation¶
BenchmarkSTT requires Python version 3.5 or above. If you wish to make use of the API, Python version 3.6 or above is required.
From PyPI (preferred)¶
This is the easiest and preferred way of installing benchmarkstt.
Install Python 3.5 or above (latest stable version for your OS is preferred):
Use the guides available at The Hitchhiker’s Guide to Python
Warning
Some dependent packages require python-dev to be installed. On Debian-based systems this can be done using e.g. apt-get install python3.7-dev for Python 3.7; Red Hat-based systems would use e.g. yum install python3.7-devel.
Install the package using pip; this will also install all requirements:
python3 -m pip install benchmarkstt
Test and use
BenchmarkSTT should now be installed and usable.
$> benchmarkstt --version
benchmarkstt: 1.1
$> echo IT WORKS! | benchmarkstt-tools normalization --lowercase
it works!
Use the --help option to get all available options:
benchmarkstt --help
benchmarkstt-tools normalization --help
See Usage for more information on how to use.
From the repository¶
For building the documentation locally and working with a development copy see Development
Removing benchmarkstt¶
BenchmarkSTT can be easily uninstalled using:
python3 -m pip uninstall benchmarkstt
Docker¶
See instructions for setting up and running as a docker image at:
Using docker¶
Warning
This assumes docker is already installed on your system.
Build the image¶
Download the code from github at https://github.com/ebu/benchmarkstt/archive/master.zip
Unzip the file
Inside the benchmarkstt folder run:
docker build -t benchmarkstt:latest .
Run the image¶
You can change the port for the API: just change the 1234 to the port you want to bind to:
docker run --name benchmarkstt -p 1234:8080 --rm benchmarkstt:latest
The JSON-RPC API is then automatically available at:
http://localhost:1234/api
While the docker image is running you can use the CLI application like this (see Usage for more information about which commands are available):
docker exec -it benchmarkstt benchmarkstt --version
docker exec -it benchmarkstt benchmarkstt --help
docker exec -it benchmarkstt benchmarkstt-tools --help
Stopping the image¶
You can stop the running docker container with:
docker stop benchmarkstt
Tutorial¶
Word Error Rate and normalization¶
In this step-by-step tutorial you will compare the Word Error Rate (WER) of two machine-generated transcripts. The WER is calculated against a less-than-perfect reference made from a human-generated subtitle file. You will also use normalization rules to improve the accuracy of the results.
To follow this tutorial you will need a working installation of benchmarkstt and these source files saved to your working folder:
This demo shows the capabilities of Release 1 of the library, which benchmarks the accuracy of word recognition only. The library supports adding new metrics in future releases. Contributions are welcome.
Creating the plain text reference file¶
Creating accurate verbatim transcripts for use as reference is time-consuming and expensive. As a quick and easy alternative, we will make a "reference" from a subtitles file. Subtitles are slightly edited and they include additional text like descriptions of sounds and actions, so they are not a verbatim transcription of the speech. Consequently, they are not suitable for calculating absolute WER. However, we are interested in calculating relative WER for illustration purposes only, so this use of subtitles is deemed acceptable.
Warning
Evaluations in this tutorial are not done for the purpose of assessing tools. The use of subtitles as reference will skew the results so they should not be taken as an indication of overall performance or as an endorsement of a particular vendor or engine.
We will use the subtitles file for the BBC's Question Time Brexit debate. This program was chosen for its length (90 minutes) and because live debates are particularly challenging to transcribe.
The subtitles file includes a lot of extra text in XML tags. This text shouldn't be used in the calculation: for both reference and hypotheses, we want to run the tool on plain text only. To strip out the XML tags, we will use the benchmarkstt-tools command with the normalization subcommand:
benchmarkstt-tools normalization --inputfile qt_subs.xml --outputfile qt_reference.txt --regex "</?[?!\[\]a-zA-Z][^>]*>" " "
The normalization rule --regex takes two parameters: a regular expression pattern and the replacement string. In this case all XML tags will be replaced with a space. This will result in a lot of space characters, but these are ignored by the diff algorithm later, so we don't have to clean them up. --inputfile and --outputfile are the input and output files.
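For illustration, the same substitution can be reproduced in plain Python with the re module. Only the pattern comes from the command above; the sample subtitle line below is invented for this example:

import re

TAG_PATTERN = r"</?[?!\[\]a-zA-Z][^>]*>"
sample = '<p begin="00:00:01">Welcome to Question Time.</p> <p>APPLAUSE</p>'
print(re.sub(TAG_PATTERN, " ", sample))
# the tags are replaced by spaces; non-dialogue text like APPLAUSE remains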
The file qt_reference.txt has been created. You can see that the XML tags are gone, but the file still contains non-dialogue text like 'APPLAUSE'.
For better results you can manually clean up the text, or run the command again with a different normalization rule (not included in this demo). But we will stop the normalization at this point.
We now have a simple text file that will be used as the reference. The next step is to get the machine-generated transcripts for benchmarking.
Creating the plain text hypotheses files¶
The first release of benchmarkstt does not integrate directly with STT vendors or engines, so transcripts for benchmarking have to be retrieved separately and converted to plain text.
For this demo, two machine transcripts were retrieved for the Question Time audio: from AWS Transcribe and from the BBC's version of Kaldi, an open-source STT framework.
Both AWS and BBC-Kaldi return the transcript in JSON format, with word-level timings. They also contain a field with the entire transcript as a single string, and this is the value we will use (we don't benchmark timings in this version).
To make the hypothesis file for AWS, we will use the transcript JSON field from the transcript generated by AWS, and save it as a new document qt_aws_hypothesis.txt.
We can automate this again using benchmarkstt-tools normalization and a somewhat more complex regex parameter:
benchmarkstt-tools normalization --inputfile qt_aws.json --outputfile qt_aws_hypothesis.txt --regex '^.*"transcript":"([^"]+)".*' '\1'
To make the BBC-Kaldi transcript file we will use the text JSON field from the transcript generated by Kaldi, and save it as a new document qt_kaldi_hypothesis.txt.
Again, benchmarkstt-tools normalization with a --regex argument will be used for this:
benchmarkstt-tools normalization --inputfile qt_kaldi.json --outputfile qt_kaldi_hypothesis.txt --regex '^.*"text":"([^"]+)".*' '\1'
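If you prefer not to rely on a regular expression, the same extraction can be done with a few lines of Python using the standard json module. The nested path below is an assumption based on typical AWS Transcribe output (the tutorial only states that a transcript field exists), so adjust it to the actual structure of your file:

import json

with open("qt_aws.json", encoding="utf-8") as f:
    data = json.load(f)

# Assumed location of the full transcript string; verify against your JSON.
transcript = data["results"]["transcripts"][0]["transcript"]

with open("qt_aws_hypothesis.txt", "w", encoding="utf-8") as f:
    f.write(transcript)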
You'll end up with two files similar to these:
Benchmark!¶
We can now compare each of the hypothesis files to the reference in order to calculate the Word Error Rate. We process one file at a time, now using the main benchmarkstt command with two flags: --wer is the metric we are most interested in, while --diffcounts outputs the number of insertions, deletions, substitutions and correct words (the basis for the WER calculation).
Calculate WER for AWS Transcribe:
benchmarkstt --reference qt_reference.txt --hypothesis qt_aws_hypothesis.txt --wer --diffcounts
The output should look like this:
Now calculate the WER and "diff counts" for BBC-Kaldi:
benchmarkstt --reference qt_reference.txt --hypothesis qt_kaldi_hypothesis.txt --wer --diffcounts
The output should look like this:
After running these two commands, you can see that the WER for both transcripts is quite high (around 35%). Let's see the actual differences between the reference and the hypotheses by using the --worddiffs flag:
benchmarkstt --reference qt_reference.txt --hypothesis qt_kaldi_hypothesis.txt --worddiffs
The output should look like this (example output is truncated):
Normalize¶
You can see that a lot of the differences are due to capitalization and punctuation. Because we are only interested in the correct identification of words, these types of differences should not count as errors. To get a more accurate WER, we will remove punctuation marks and convert all letters to lowercase. We will do this for the reference and both hypothesis files by using the benchmarkstt-tools normalization subcommand again, with two rules: the built-in --lowercase rule and the --regex rule:
benchmarkstt-tools normalization -i qt_reference.txt -o qt_reference_normalized.txt --lowercase --regex "[,.-]" " "
benchmarkstt-tools normalization -i qt_kaldi_hypothesis.txt -o qt_kaldi_hypothesis_normalized.txt --lowercase --regex "[,.-]" " "
benchmarkstt-tools normalization -i qt_aws_hypothesis.txt -o qt_aws_hypothesis_normalized.txt --lowercase --regex "[,.-]" " "
We now have normalized versions of the reference and two hypothesis files.
Benchmark again¶
Let's run the benchmarkstt command again, this time calculating WER based on the normalized files:
benchmarkstt --reference qt_reference_normalized.txt --hypothesis qt_kaldi_hypothesis_normalized.txt --wer --diffcounts --worddiffs
The output should look like this (example output is truncated):
You can see that this time there are fewer differences between the reference and hypothesis. Accordingly, the WER is much lower for both hypotheses. The transcript with the lower WER is closer to the reference made from subtitles.
Do it all in one step!¶
Above, we used two commands: benchmarkstt-tools for the normalization and benchmarkstt for calculating the WER. But we can combine all these steps into a single command using a rules file and a config file that references it.
First, let's create a file for the regex normalization rules. Create a text document with this content:
# Replace XML tags with a space
"</?[?!\[\]a-zA-Z][^>]*>"," "
# Replace punctuation with a space
"[,.-]"," "
Save this file as rules.regex.
Now let's create a config file that contains all the normalization rules. They must be listed under the [normalization] section (in this release, there is only one implemented section). The section references the regex rules file we created above, and also includes one of the built-in rules:
[normalization]
# Load regex rules file and tell the processor it's a regex type
Regex rules.regex
# Built in rule
lowercase
Save the above as config.conf. These rules will be applied to both hypothesis and reference, in the order in which they are listed.
Now run benchmarkstt with the --config argument. We also need to tell the tool to treat the XML as plain text, otherwise it will look for an xml processor and fail. We do this with the reference type argument --reference-type:
benchmarkstt --reference qt_subs.xml --reference-type plaintext --hypothesis qt_kaldi_hypothesis.txt --config config.conf --wer
Output:
And we do the same for the AWS transcript, this time using the short form for arguments:
benchmarkstt -r qt_subs.xml -rt plaintext -h qt_aws_hypothesis.txt --config config.conf --wer
Output:
You now have WER scores for each of the machine-generated transcripts, calculated against a subtitles reference file.
As a next step, you could add more normalization rules or implement your own metrics or normalizer classes and submit them back to this project.
Word Error Rate variants¶
In this tutorial we used the WER parameter with the mode argument omitted, defaulting to the strict WER variant. This variant uses Python's built-in diff algorithm to calculate the WER, which is stricter and results in a slightly higher WER than the commonly used Levenshtein distance algorithm (see more detail here).
If you use BenchmarkSTT to compare different engines, this is not a problem, since the relative ranking will not be affected. However, for better compatibility with other benchmarking tools, a WER variant that uses the Levenshtein edit distance algorithm is provided. To use it, specify --wer levenshtein.
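As a rough illustration of the levenshtein variant, here is a minimal word-level Levenshtein WER in plain Python. This is a sketch of the standard definition, not benchmarkstt's own implementation:

# Minimal word-level Levenshtein WER: edit distance between the two word
# sequences divided by the number of reference words.
def wer_levenshtein(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution
        prev = cur
    return prev[-1] / len(ref)

print(wer_levenshtein("the european union headquarters",
                      "the european union headache"))  # 0.25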
Bag of Entities Error Rate (BEER)¶
In this tutorial you compute the Bag of Entities Error Rate (BEER) on a machine-generated transcript. It assumes knowledge of the first part of this tutorial.
The Word Error Rate is the standard metric for benchmarking ASR models, but it can be a blunt tool. It treats all words as equally important but in reality some words, like proper nouns and phrases, are more significant than common words. When these are recognized correctly by a model, they should be given more weight in the assessment of the model.
Consider for example this sentence in the reference transcript: 'The European Union headquarters'. If engine A returns 'The European onion headquarters' and engine B returns 'The European Union headache', the Word Error Rate would be similar for both engines since in both cases one word was transcribed inaccurately. But engine B should be 'rewarded' for preserving the phrase 'European Union'. The BEER is the metric that takes such considerations into account.
Another use for this metric is compensating for distortions of WER that are caused by normalization rules. For example, you may convert both reference and hypothesis transcripts to lower case or remove punctuation marks so that they don't affect the WER. In this case, the distinction between 'Theresa May' and 'Theresa may' is lost. But you can instruct BenchmarkSTT to score higher the engine that produced 'Theresa May'.
The BEER is useful to evaluate:
the suitability of transcript files as input to a tagging system,
the performance of STT services on key entities depending on the context, for instance highlights and player names for sport events,
the performance on a list of entities automatically selected from the reference text by a TF-IDF approach, which is intended to reflect how important a word is.
An entity is a word or an ordered sequence of words, and may include capital letters and punctuation. To calculate the BEER, BenchmarkSTT needs a list of entities. It does not make this list for you: the user is expected to create the list outside of BenchmarkSTT, manually or by using an NLP library to extract proper nouns from the reference.
BEER definition¶
The BEER is defined as the error rate per entity using a bag-of-words approach: the order in which the entities appear in the documents does not affect the measure.
The weighted average BEER of a set of entities e1, e2, ..., en measures the overall performance across the n entities; a weight wn is attributed to each entity, and the weights are normalised by the tool so that they sum to 1.
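As a minimal sketch of the calculation in Python (entity occurrences are counted here with a naive substring count; the tool's own matching may be more careful):

def beer(entity: str, reference: str, hypothesis: str) -> float:
    # BEER(entity) = |ne_hyp - ne_ref| / ne_ref
    ne_ref = reference.count(entity)
    ne_hyp = hypothesis.count(entity)
    return abs(ne_hyp - ne_ref) / ne_ref

def weighted_average_beer(entities: dict, reference: str, hypothesis: str) -> float:
    # entities maps each entity to its (non-normalized) weight; weights are
    # normalised to sum to 1, and each entity's BEER is scaled by its share
    # of occurrences in the reference.
    total_weight = sum(entities.values())
    occurrences = {e: reference.count(e) for e in entities}
    total_occurrences = sum(occurrences.values())
    return sum((w / total_weight)
               * beer(e, reference, hypothesis)
               * occurrences[e] / total_occurrences
               for e, w in entities.items())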
Calculating BEER¶
BenchmarkSTT does not have a built-in list of entities. You must provide your own in a JSON input file defining the list of entities and the weight per entity.
The file has this structure:
{ "entity1" : weight1, "entity2" : weight2, "entity3" : weight2 .. }
Let's create an example list. Save the below list as the file entities.json:
{"Theresa May" : 0.5, "Abigail" : 0.5, "EU": 0.75, "Griffin" : 0.5, "I" : 0.25}
We'll also tell BenchmarkSTT to normalize the reference and hypothesis files, but this time without lowercasing them. We do this in the config.conf file:
[normalization]
# Load regex rules file and tell the processor it's a regex type
Regex rules.regex
Now compute the BEER in one line, using the same files from the previous section of this tutorial. For each entity, the tool reports the BEER and the number of occurrences in the reference file, along with the weighted average BEER:
benchmarkstt --reference qt_subs.xml --reference-type plaintext --hypothesis qt_aws_hypothesis.txt --config config.conf --beer entities.json
To automate the task, you can generate a JSON result file by adding the -o option:
benchmarkstt --reference qt_subs.xml --reference-type plaintext --hypothesis qt_aws_hypothesis.txt --config config.conf --beer entities.json -o json >> beer_aws.json
Usage¶
The tool is accessible as:
Command line tool¶
usage: benchmarkstt -r REFERENCE -h HYPOTHESIS [-rt {infer,argument,plaintext}] [-ht {infer,argument,plaintext}] [-o {json,markdown,restructuredtext}] [--beer [entities_file]] [--cer [mode] [differ_class]] [--diffcounts [mode] [differ_class]] [--wer [mode] [differ_class]] [--worddiffs [dialect] [differ_class]] [--config file [section] [encoding]] [--file normalizer file [encoding] [path]] [--lowercase] [--regex search replace] [--replace search replace] [--replacewords search replace] [--unidecode] [--log] [--version] [--log-level {critical,fatal,error,warn,warning,info,debug,notset}] [--load MODULE_NAME [MODULE_NAME ...]] [--help]
named arguments¶
- -r, --reference
File to use as reference
- -h, --hypothesis
File to use as hypothesis
- -o, --output-format
Possible choices: json, markdown, restructuredtext
Format of the outputted results
Default: "restructuredtext"
- --log
show normalization logs (warning: for large files with many normalization rules this will cause a significant performance penalty and a lot of output data)
Default: False
- --version
Output benchmarkstt version number
Default: False
- --log-level
Possible choices: critical, fatal, error, warn, warning, info, debug, notset
Set the logging output level
Default: warning
- --load
Load external code that may contain additional classes for normalization, etc. E.g. if the classes are contained in a python file named myclasses.py in the directory where you are calling benchmarkstt from, you would pass --load myclasses. All classes that are recognized will be automatically documented in the --help command and available for use.
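As an illustration, a hypothetical myclasses.py could hold a custom normalizer such as the one sketched below. The class name and logic are invented for this example, and to actually plug into benchmarkstt the class must follow the normalizer interface defined in benchmarkstt.normalization, which is not shown here:

# myclasses.py -- hypothetical external code loaded with --load myclasses
import re

class StripFillerWords:
    """Remove common filler words ('uh', 'um', 'erm') before scoring."""
    _pattern = re.compile(r"\b(?:uh|um|erm)\b\s*", flags=re.IGNORECASE)

    def normalize(self, text: str) -> str:
        return self._pattern.sub("", text)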
reference and hypothesis types¶
You can specify which file type the --reference/-r and --hypothesis/-h arguments should be treated as.
- Available types:
'infer': Load from a given filename. Automatically infer file type from the filename extension.
'argument': Read the argument and treat it as plain text (without reading from file).
'plaintext': Load from a given filename. Treat file as plain text.
- -rt, --reference-type
Possible choices: infer, argument, plaintext
Type of reference file
Default: "infer"
- -ht, --hypothesis-type
Possible choices: infer, argument, plaintext
Type of hypothesis file
Default: "infer"
available metrics¶
A list of metrics to calculate. At least one metric needs to be provided.
- --beer
Bag of Entities Error Rate, BEER, is defined as the error rate per entity with a bag of words approach:
BEER(entity) = abs(ne_hyp - ne_ref) / ne_ref
ne_hyp = number of detections of the entity in the hypothesis file
ne_ref = number of detections of the entity in the reference file
The WA_BEER for a set of N entities is defined as the weighted average of the BEER for the set of entities:
WA_BEER([entity_1, ..., entity_N]) = w_1*BEER(entity_1)*L_1/L + ... + w_N*BEER(entity_N)*L_N/L
which is equivalent to:
WA_BEER([entity_1, ..., entity_N]) = (w_1*abs(ne_hyp_1 - ne_ref_1) + ... + w_N*abs(ne_hyp_N - ne_ref_N)) / L
L_1 = number of occurrences of entity 1 in the reference document
L = L_1 + ... + L_N
the weights being normalised by the tool:
w_1 + ... + w_N = 1
The input file defines the list of entities and the weight per entity, w_n. It is processed as a json file with the following structure:
{ "entity_1":W_1, "entity_2" : W_2, "entity_3" :W_3 .. }W_n being the non-normalized weight, the normalization of the weights is performed by the tool as:
W_n w_n = --------------- W_1 + ... +W_NThe minimum value for weight being 0.
- --cer
Character Error Rate, basically defined as:
CER = (insertions + deletions + substitutions) / number of reference characters
Character Error Rate, CER, compares the differences between reference and hypothesis on a character level. A CER measure is usually lower than a WER measure, since words might differ in only one or a few characters, yet be classified as fully different at the word level.
The CER metric might be useful as a perspective on the WER metric. Word endings might be less relevant if the text will be preprocessed with stemming, or minor spelling mistakes might be acceptable in certain situations. A CER metric might also be used to evaluate a source (an ASR) which outputs a stream of characters rather than words.
Important: The current implementation of the CER metric ignores whitespace characters. A string like 'aa bb cc' will first be split into words, ['aa','bb','cc'], and then merged into a final string for evaluation: 'aabbcc'.
- param mode
'levenshtein' (default).
- param differ_class
For future use.
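A minimal sketch of the CER definition above, using the editdistance package (the same package the levenshtein WER mode is documented to use); this illustrates the definition rather than the tool's exact implementation:

import editdistance

def cer(reference: str, hypothesis: str) -> float:
    # Mirror the documented behaviour: whitespace is ignored, so the words
    # are split out and re-joined before comparing at the character level.
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    return editdistance.eval(ref, hyp) / len(ref)

print(cer("aa bb cc", "aa bb cd"))  # 1 edit / 6 reference characters ~= 0.167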
- --diffcounts
Get the amount of differences between reference and hypothesis
- --wer
Word Error Rate, basically defined as:
WER = (insertions + deletions + substitutions) / number of reference words
See: https://en.wikipedia.org/wiki/Word_error_rate
Calculates the WER using one of two algorithms:
[Mode: 'strict' or 'hunt'] Insertions, deletions and substitutions are identified using the Hunt–McIlroy diff algorithm. The 'hunt' mode applies 0.5 weight to insertions and deletions. This algorithm is the one used internally by Python.
See https://docs.python.org/3/library/difflib.html
[Mode: 'levenshtein'] In the context of WER, Levenshtein distance is the minimum edit distance computed at the word level. This implementation uses the Editdistance c++ implementation by Hiroyuki Tanaka: https://github.com/aflc/editdistance. See: https://en.wikipedia.org/wiki/Levenshtein_distance
- param mode
'strict' (default), 'hunt' or 'levenshtein'.
- param differ_class
For future use.
- --worddiffs
Present differences on a per-word basis
- param dialect
Presentation format. Default is 'ansi'.
- example dialect
'html'
- param differ_class
For future use.
available normalizers¶
A list of normalizers to execute on the input, can be one or more normalizers which are applied sequentially. The program will automatically find the normalizer in benchmarkstt.normalization.core, then benchmarkstt.normalization and finally in the global namespace.
- --config
Use config file notation to define normalization rules. This notation is a list of normalizers, one per line.
Each normalizer that needs a file is followed by the file name of a CSV, and can optionally be followed by the file encoding (if different from the default). All options are loaded in from this CSV and applied to the normalizer.
The normalizers can be any of the core normalizers, or you can refer to your own normalizer class (like you would use in a python import, eg. my.own.package.MyNormalizerClass).
- Additional rules:
Normalizer names are case-insensitive.
Arguments MAY be wrapped in double quotes.
If an argument contains a space, newline or double quote, it MUST be wrapped in double quotes.
A double quote itself is represented in a quoted argument as two double quotes: "".
The normalization rules are applied top-to-bottom and follow this format:
[normalization]
# This is a comment
# (Normalizer2 has no arguments)
lowercase
# loads regex expressions from regexrules.csv in "utf 8" encoding
regex regexrules.csv "utf 8"
# load another config file, [section1] and [section2]
config configfile.ini section1
config configfile.ini section2
# loads replace expressions from replaces.csv in default encoding
replace replaces.csv
- param file
The config file
- param encoding
The file encoding
- param section
The subsection of the config file to use, defaults to 'normalization'
- example text
"He bravely turned his tail and fled"
- example file
"./resources/test/normalizers/configfile.conf"
- example encoding
"UTF-8"
- example return
"ha bravalY Turnad his tail and flad"
- --file
Read one per line and pass it to the given normalizer
- param str|class normalizer
Normalizer name (or class)
- param file
The file to read rules from
- param encoding
The file encoding
- example text
"This is an Ex-Parakeet"
- example normalizer
"regex"
- example file
"./resources/test/normalizers/regex/en_US"
- example encoding
"UTF-8"
- example return
"This is an Ex Parrot"
- --lowercase
Lowercase the text
- example text
"Easy, Mungo, easy... Mungo..."
- example return
"easy, mungo, easy... mungo..."
- --regex
Simple regex replace. By default the pattern is interpreted case-sensitive.
Case-insensitivity is supported by adding inline modifiers.
You might want to use capturing groups to preserve the case. When replacing a character not captured, the information about its case is lost...
E.g. this would replace "HAHA! Hahaha!" with "HeHe! Hehehe!":
search: (?i)(h)a
replace: \1e
No regex flags are set by default, you can set them yourself though in the regex, and combine them at will, eg. multiline, dotall and ignorecase.
E.g. this would replace "New<CRLF>line" with "newline":
search: (?msi)new.line
replace: newline
- example text
"HAHA! Hahaha!"
- example search
'(?i)(h)a'
- example replace
'\1e'
- example return
"HeHe! Hehehe!"
- --replace
Simple search replace
- param search
Text to search for
- param replace
Text to replace with
- example text
"Nudge nudge!"
- example search
"nudge"
- example replace
"wink"
- example return
"Nudge wink!"
- --replacewords
Simple search and replace that only replaces whole "words"; the first letter is also matched case-insensitively, with preservation of case.
- param search
Word to search for
- param replace
Replace with
- example text
"She has a heart of formica"
- example search
"a"
- example replace
"the"
- example return
"She has the heart of formica"
- --unidecode
Unidecode characters to ASCII form, see Python's Unidecode package for more info.
- example text
"𝖂𝖊𝖓𝖓 𝖎𝖘𝖙 𝖉𝖆𝖘 𝕹𝖚𝖓𝖘𝖙ü𝖈𝖐 𝖌𝖎𝖙 𝖚𝖓𝖉 𝕾𝖑𝖔𝖙𝖊𝖗𝖒𝖊𝖞𝖊𝖗?"
- example return
"Wenn ist das Nunstuck git und Slotermeyer?"
Implementation¶
The benchmarkstt command line tool links the different modules (input, normalization, metrics, etc.) in the following way:
Additional tools¶
Some additional helpful tools are available through benchmarkstt-tools, which provides these subcommands:
Subcommand api¶
See API for more information on usage and available jsonrpc methods.
Make benchmarkstt available through a rudimentary JSON-RPC interface
Attention
Only supported for Python versions 3.6 and above
usage: benchmarkstt-tools api [--debug] [--host HOST] [--port PORT] [--entrypoint ENTRYPOINT] [--list-methods] [--with-explorer] [--log-level {critical,fatal,error,warn,warning,info,debug,notset}] [--load MODULE_NAME [MODULE_NAME ...]] [--help]
Named Arguments¶
- --debug
Run in debug mode
Default: False
- --host
Hostname or ip to serve api
- --port
Port used by the server
Default: 8080
- --entrypoint
The jsonrpc api address
Default: "/api"
- --list-methods
List the available jsonrpc methods
Default: False
- --with-explorer
Also create the explorer to test api calls with, this is a rudimentary feature currently only meant for testing and debugging. Warning: the API explorer is provided as-is, without any tests or code reviews. This is marked as a low-priority feature.
Default: False
- --log-level
Possible choices: critical, fatal, error, warn, warning, info, debug, notset
Set the logging output level
Default: warning
- --load
Load external code that may contain additional classes for normalization, etc. E.g. if the classes are contained in a python file named myclasses.py in the directory where you are calling benchmarkstt from, you would pass --load myclasses. All classes that are recognized will be automatically documented in the --help command and available for use.
Subcommand normalization¶
Apply normalization to given input
usage: benchmarkstt-tools normalization [--log] [-i file] [-o file] [--config file [section] [encoding]] [--file normalizer file [encoding] [path]] [--lowercase] [--regex search replace] [--replace search replace] [--replacewords search replace] [--unidecode] [--log-level {critical,fatal,error,warn,warning,info,debug,notset}] [--load MODULE_NAME [MODULE_NAME ...]] [--help]
Named Arguments¶
- --log
show normalization logs (warning: for large files with many normalization rules this will cause a significant performance penalty and a lot of output data)
Default: False
- --log-level
Possible choices: critical, fatal, error, warn, warning, info, debug, notset
Set the logging output level
Default: warning
- --load
Load external code that may contain additional classes for normalization, etc. E.g. if the classes are contained in a python file named myclasses.py in the directory where you are calling benchmarkstt from, you would pass --load myclasses. All classes that are recognized will be automatically documented in the --help command and available for use.
input and output files¶
You can provide multiple input and output files, each preceded by -i and -o respectively. If no input file is given, only one output file can be used. If multiple input and output files are used, there should be an equal number of each. Each processed input file will then be written to the corresponding output file.
- -i, --inputfile
read input from this file, defaults to STDIN
- -o, --outputfile
write output to this file, defaults to STDOUT
available normalizers¶
A list of normalizers to execute on the input, can be one or more normalizers which are applied sequentially. The program will automatically find the normalizer in benchmarkstt.normalization.core, then benchmarkstt.normalization and finally in the global namespace.
- --config
Use config file notation to define normalization rules. This notation is a list of normalizers, one per line.
Each normalizer that needs a file is followed by the file name of a CSV, and can optionally be followed by the file encoding (if different from the default). All options are loaded in from this CSV and applied to the normalizer.
The normalizers can be any of the core normalizers, or you can refer to your own normalizer class (like you would use in a python import, eg. my.own.package.MyNormalizerClass).
- Additional rules:
Normalizer names are case-insensitive.
Arguments MAY be wrapped in double quotes.
If an argument contains a space, newline or double quote, it MUST be wrapped in double quotes.
A double quote itself is represented in a quoted argument as two double quotes: "".
The normalization rules are applied top-to-bottom and follow this format:
[normalization]
# This is a comment
# (Normalizer2 has no arguments)
lowercase
# loads regex expressions from regexrules.csv in "utf 8" encoding
regex regexrules.csv "utf 8"
# load another config file, [section1] and [section2]
config configfile.ini section1
config configfile.ini section2
# loads replace expressions from replaces.csv in default encoding
replace replaces.csv
- param file
The config file
- param encoding
The file encoding
- param section
The subsection of the config file to use, defaults to 'normalization'
- example text
"He bravely turned his tail and fled"
- example file
"./resources/test/normalizers/configfile.conf"
- example encoding
"UTF-8"
- example return
"ha bravalY Turnad his tail and flad"
- --file
Read one per line and pass it to the given normalizer
- param str|class normalizer
Normalizer name (or class)
- param file
The file to read rules from
- param encoding
The file encoding
- example text
"This is an Ex-Parakeet"
- example normalizer
"regex"
- example file
"./resources/test/normalizers/regex/en_US"
- example encoding
"UTF-8"
- example return
"This is an Ex Parrot"
- --lowercase
Lowercase the text
- example text
"Easy, Mungo, easy... Mungo..."
- example return
"easy, mungo, easy... mungo..."
- --regex
Simple regex replace. By default the pattern is interpreted case-sensitive.
Case-insensitivity is supported by adding inline modifiers.
You might want to use capturing groups to preserve the case. When replacing a character not captured, the information about its case is lost...
E.g. this would replace "HAHA! Hahaha!" with "HeHe! Hehehe!":
search: (?i)(h)a
replace: \1e
No regex flags are set by default, you can set them yourself though in the regex, and combine them at will, eg. multiline, dotall and ignorecase.
E.g. this would replace "New<CRLF>line" with "newline":
search: (?msi)new.line
replace: newline
- example text
"HAHA! Hahaha!"
- example search
'(?i)(h)a'
- example replace
'\1e'
- example return
"HeHe! Hehehe!"
- --replace
Simple search replace
- param search
Text to search for
- param replace
Text to replace with
- example text
"Nudge nudge!"
- example search
"nudge"
- example replace
"wink"
- example return
"Nudge wink!"
- --replacewords
Simple search and replace that only replaces whole "words"; the first letter is also matched case-insensitively, with preservation of case.
- param search
Word to search for
- param replace
Replace with
- example text
"She has a heart of formica"
- example search
"a"
- example replace
"the"
- example return
"She has the heart of formica"
- --unidecode
Unidecode characters to ASCII form, see Python's Unidecode package for more info.
- example text
"𝖂𝖊𝖓𝖓 𝖎𝖘𝖙 𝖉𝖆𝖘 𝕹𝖚𝖓𝖘𝖙ü𝖈𝖐 𝖌𝖎𝖙 𝖚𝖓𝖉 𝕾𝖑𝖔𝖙𝖊𝖗𝖒𝖊𝖞𝖊𝖗?"
- example return
"Wenn ist das Nunstuck git und Slotermeyer?"
Subcommand metrics¶
Calculate metrics based on the comparison of a hypothesis with a reference.
usage: benchmarkstt-tools metrics -r REFERENCE -h HYPOTHESIS [-rt {infer,argument,plaintext}] [-ht {infer,argument,plaintext}] [-o {json,markdown,restructuredtext}] [--beer [entities_file]] [--cer [mode] [differ_class]] [--diffcounts [mode] [differ_class]] [--wer [mode] [differ_class]] [--worddiffs [dialect] [differ_class]] [--log-level {critical,fatal,error,warn,warning,info,debug,notset}] [--load MODULE_NAME [MODULE_NAME ...]] [--help]
Named Arguments¶
- -r, --reference
File to use as reference
- -h, --hypothesis
File to use as hypothesis
- -o, --output-format
Possible choices: json, markdown, restructuredtext
Format of the outputted results
Default: "restructuredtext"
- --log-level
Possible choices: critical, fatal, error, warn, warning, info, debug, notset
Set the logging output level
Default: warning
- --load
Load external code that may contain additional classes for normalization, etc. E.g. if the classes are contained in a python file named myclasses.py in the directory where you are calling benchmarkstt from, you would pass --load myclasses. All classes that are recognized will be automatically documented in the --help command and available for use.
reference and hypothesis types¶
You can specify which file type the --reference/-r and --hypothesis/-h arguments should be treated as.
- Available types:
'infer': Load from a given filename. Automatically infer file type from the filename extension.
'argument': Read the argument and treat it as plain text (without reading from file).
'plaintext': Load from a given filename. Treat file as plain text.
- -rt, --reference-type
Possible choices: infer, argument, plaintext
Type of reference file
Default: "infer"
- -ht, --hypothesis-type
Possible choices: infer, argument, plaintext
Type of hypothesis file
Default: "infer"
available metrics¶
A list of metrics to calculate. At least one metric needs to be provided.
- --beer
Bag of Entities Error Rate, BEER, is defined as the error rate per entity with a bag of words approach:
BEER(entity) = abs(ne_hyp - ne_ref) / ne_ref
ne_hyp = number of detections of the entity in the hypothesis file
ne_ref = number of detections of the entity in the reference file
The WA_BEER for a set of N entities is defined as the weighted average of the BEER for the set of entities:
WA_BEER([entity_1, ..., entity_N]) = w_1*BEER(entity_1)*L_1/L + ... + w_N*BEER(entity_N)*L_N/L
which is equivalent to:
WA_BEER([entity_1, ..., entity_N]) = (w_1*abs(ne_hyp_1 - ne_ref_1) + ... + w_N*abs(ne_hyp_N - ne_ref_N)) / L
L_1 = number of occurrences of entity 1 in the reference document
L = L_1 + ... + L_N
the weights being normalised by the tool:
w_1 + ... + w_N = 1
The input file defines the list of entities and the weight per entity, w_n. It is processed as a json file with the following structure:
{ "entity_1":W_1, "entity_2" : W_2, "entity_3" :W_3 .. }W_n being the non-normalized weight, the normalization of the weights is performed by the tool as:
W_n w_n = --------------- W_1 + ... +W_NThe minimum value for weight being 0.
- --cer
Character Error Rate, basically defined as:
CER = (insertions + deletions + substitutions) / number of reference characters
Character Error Rate, CER, compares the differences between reference and hypothesis on a character level. A CER measure is usually lower than a WER measure, since words might differ in only one or a few characters, yet be classified as fully different at the word level.
The CER metric might be useful as a perspective on the WER metric. Word endings might be less relevant if the text will be preprocessed with stemming, or minor spelling mistakes might be acceptable in certain situations. A CER metric might also be used to evaluate a source (an ASR) which outputs a stream of characters rather than words.
Important: The current implementation of the CER metric ignores whitespace characters. A string like 'aa bb cc' will first be split into words, ['aa','bb','cc'], and then merged into a final string for evaluation: 'aabbcc'.
- param mode
'levenshtein' (default).
- param differ_class
For future use.
- --diffcounts
Get the amount of differences between reference and hypothesis
- --wer
Word Error Rate, basically defined as:
WER = (insertions + deletions + substitutions) / number of reference words
See: https://en.wikipedia.org/wiki/Word_error_rate
Calculates the WER using one of two algorithms:
[Mode: 'strict' or 'hunt'] Insertions, deletions and substitutions are identified using the Hunt–McIlroy diff algorithm. The 'hunt' mode applies 0.5 weight to insertions and deletions. This algorithm is the one used internally by Python.
See https://docs.python.org/3/library/difflib.html
[Mode: 'levenshtein'] In the context of WER, Levenshtein distance is the minimum edit distance computed at the word level. This implementation uses the Editdistance c++ implementation by Hiroyuki Tanaka: https://github.com/aflc/editdistance. See: https://en.wikipedia.org/wiki/Levenshtein_distance
- param mode
'strict' (default), 'hunt' or 'levenshtein'.
- param differ_class
For future use.
- --worddiffs
Present differences on a per-word basis
- param dialect
Presentation format. Default is 'ansi'.
- example dialect
'html'
- param differ_class
For future use.
Bash completion¶
Bash completion is supported through argcomplete.
Setting up bash completion¶
If you use bash as your shell, benchmarkstt and benchmarkstt-tools can use argcomplete for auto-completion. For this, argcomplete needs to be installed and enabled.
Installing argcomplete¶
Install argcomplete using:
python3 -m pip install argcomplete
For global activation of all argcomplete-enabled python applications, run:
activate-global-python-argcomplete
Alternative argcomplete configuration¶
For permanent (but not global) benchmarkstt activation, use:
register-python-argcomplete benchmarkstt >> ~/.bashrc
register-python-argcomplete benchmarkstt-tools >> ~/.bashrc
For one-time activation of argcomplete for benchmarkstt only, use:
eval "$(register-python-argcomplete benchmarkstt; register-python-argcomplete benchmarkstt-tools)"
API¶
BenchmarkSTT exposes its functionality through a JSON-RPC api.
Attention
Only supported for Python versions 3.6 and above!
Starting the server¶
You can launch a server to make the api available via:
Subcommand api (for debugging and local use only)
gunicorn, by running
gunicorn -b :8080 benchmarkstt.api.gunicorn
Usage¶
All requests must be HTTP POST requests, with the content containing valid JSON.
Using curl, for example:
curl -X POST \
  http://localhost:8080/api \
  -H 'Content-Type: application/json-rpc' \
  -d '{
    "jsonrpc": "2.0",
    "method": "help",
    "id": null
  }'
If you started the service with the parameter --with-explorer (see Subcommand api), you can easily test the available JSON-RPC api calls by visiting the api url (e.g. http://localhost:8080/api in the above example).
Important
The API explorer is provided as-is, without any tests or code reviews. This is marked as a low-priority feature.
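The same kind of request can be made from Python with the requests package. This is a sketch: the method name and parameter names are taken from the list of JSON-RPC methods below, and the host and port should match wherever your server is running:

import requests

payload = {
    "jsonrpc": "2.0",
    "method": "metrics.wer",
    "params": {"ref": "the european union", "hyp": "the european onion"},
    "id": 1,
}
response = requests.post("http://localhost:8080/api",
                         json=payload,
                         headers={"Content-Type": "application/json-rpc"})
print(response.json())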
Available JSON-RPC methods¶
Attention
Only supported for Python versions 3.6 and above
version¶ Get the version of benchmarkstt
- return str
BenchmarkSTT version
list.normalization¶ Get a list of available core normalization
- return object
With key being the normalization name, and value its description
normalization.config¶ Use config file notation to define normalization rules. This notation is a list of normalizers, one per line.
Each normalizer that needs a file is followed by the file name of a CSV, and can optionally be followed by the file encoding (if different from the default). All options are loaded in from this CSV and applied to the normalizer.
The normalizers can be any of the core normalizers, or you can refer to your own normalizer class (like you would use in a python import, eg. my.own.package.MyNormalizerClass).
- Additional rules:
Normalizer names are case-insensitive.
Arguments MAY be wrapped in double quotes.
If an argument contains a space, newline or double quote, it MUST be wrapped in double quotes.
A double quote itself is represented in a quoted argument as two double quotes: "".
The normalization rules are applied top-to-bottom and follow this format:
[normalization]
# This is a comment
# (Normalizer2 has no arguments)
lowercase
# loads regex expressions from regexrules.csv in "utf 8" encoding
regex regexrules.csv "utf 8"
# load another config file, [section1] and [section2]
config configfile.ini section1
config configfile.ini section2
# loads replace expressions from replaces.csv in default encoding
replace replaces.csv
- param file
The config file
- param encoding
The file encoding
- param section
The subsection of the config file to use, defaults to 'normalization'
- example text
"He bravely turned his tail and fled"
- example file
"./resources/test/normalizers/configfile.conf"
- example encoding
"UTF-8"
- example return
"ha bravalY Turnad his tail and flad"
- param text
The text to normalize
- param bool return_logs
Return normalization logs
normalization.file¶ Read one per line and pass it to the given normalizer
- param str|class normalizer
Normalizer name (or class)
- param file
The file to read rules from
- param encoding
The file encoding
- example text
"This is an Ex-Parakeet"
- example normalizer
"regex"
- example file
"./resources/test/normalizers/regex/en_US"
- example encoding
"UTF-8"
- example return
"This is an Ex Parrot"
- param text
The text to normalize
- param bool return_logs
Return normalization logs
normalization.lowercase¶ Lowercase the text
- example text
"Easy, Mungo, easy... Mungo..."
- example return
"easy, mungo, easy... mungo..."
- param text
The text to normalize
- param bool return_logs
Return normalization logs
normalization.regex¶ Simple regex replace. By default the pattern is interpreted case-sensitive.
Case-insensitivity is supported by adding inline modifiers.
You might want to use capturing groups to preserve the case. When replacing a character not captured, the information about its case is lost...
E.g. this would replace "HAHA! Hahaha!" with "HeHe! Hehehe!":
search: (?i)(h)a
replace: \1e
No regex flags are set by default, you can set them yourself though in the regex, and combine them at will, eg. multiline, dotall and ignorecase.
E.g. this would replace "New<CRLF>line" with "newline":
search: (?msi)new.line
replace: newline
- example text
"HAHA! Hahaha!"
- example search
'(?i)(h)a'
- example replace
'\1e'
- example return
"HeHe! Hehehe!"
- param text
The text to normalize
- param bool return_logs
Return normalization logs
normalization.replace¶ Simple search replace
- param search
Text to search for
- param replace
Text to replace with
- example text
"Nudge nudge!"
- example search
"nudge"
- example replace
"wink"
- example return
"Nudge wink!"
- param text
The text to normalize
- param bool return_logs
Return normalization logs
normalization.replacewords¶ Simple search and replace that only replaces whole "words"; the first letter is also matched case-insensitively, with preservation of case.
- param search
Word to search for
- param replace
Replace with
- example text
"She has a heart of formica"
- example search
"a"
- example replace
"the"
- example return
"She has the heart of formica"
- param text
The text to normalize
- param bool return_logs
Return normalization logs
normalization.unidecode¶ Unidecode characters to ASCII form, see Python's Unidecode package for more info.
- example text
"𝖂𝖊𝖓𝖓 𝖎𝖘𝖙 𝖉𝖆𝖘 𝕹𝖚𝖓𝖘𝖙ü𝖈𝖐 𝖌𝖎𝖙 𝖚𝖓𝖉 𝕾𝖑𝖔𝖙𝖊𝖗𝖒𝖊𝖞𝖊𝖗?"
- example return
"Wenn ist das Nunstuck git und Slotermeyer?"
- param text
The text to normalize
- param bool return_logs
Return normalization logs
list.metrics¶ Get a list of available core metrics
- return object
With key being the metrics name, and value its description
metrics.beer¶ Bag of Entities Error Rate, BEER, is defined as the error rate per entity with a bag of words approach:
BEER(entity) = abs(ne_hyp - ne_ref) / ne_ref
ne_hyp = number of detections of the entity in the hypothesis file
ne_ref = number of detections of the entity in the reference file
The WA_BEER for a set of N entities is defined as the weighted average of the BEER for the set of entities:
WA_BEER([entity_1, ..., entity_N]) = w_1*BEER(entity_1)*L_1/L + ... + w_N*BEER(entity_N)*L_N/L
which is equivalent to:
WA_BEER([entity_1, ..., entity_N]) = (w_1*abs(ne_hyp_1 - ne_ref_1) + ... + w_N*abs(ne_hyp_N - ne_ref_N)) / L
L_1 = number of occurrences of entity 1 in the reference document
L = L_1 + ... + L_N
the weights being normalised by the tool:
w_1 + ... + w_N = 1
The input file defines the list of entities and the weight per entity, w_n. It is processed as a json file with the following structure:
{ "entity_1":W_1, "entity_2" : W_2, "entity_3" :W_3 .. }W_n being the non-normalized weight, the normalization of the weights is performed by the tool as:
W_n w_n = --------------- W_1 + ... +W_NThe minimum value for weight being 0.
- param ref
Reference text
- param hyp
Hypothesis text
metrics.cer¶ Character Error Rate, basically defined as:
CER = (insertions + deletions + substitutions) / number of reference characters
Character Error Rate, CER, compares the differences between reference and hypothesis on a character level. A CER measure is usually lower than a WER measure, since words might differ in only one or a few characters, yet be classified as fully different at the word level.
The CER metric might be useful as a perspective on the WER metric. Word endings might be less relevant if the text will be preprocessed with stemming, or minor spelling mistakes might be acceptable in certain situations. A CER metric might also be used to evaluate a source (an ASR) which outputs a stream of characters rather than words.
Important: The current implementation of the CER metric ignores whitespace characters. A string like 'aa bb cc' will first be split into words, ['aa','bb','cc'], and then merged into a final string for evaluation: 'aabbcc'.
- param mode
'levenshtein' (default).
- param differ_class
For future use.
- param ref
Reference text
- param hyp
Hypothesis text
metrics.diffcounts¶ Get the amount of differences between reference and hypothesis
- param ref
Reference text
- param hyp
Hypothesis text
metrics.wer¶ Word Error Rate, basically defined as:
WER = (insertions + deletions + substitutions) / number of reference words
See: https://en.wikipedia.org/wiki/Word_error_rate
Calculates the WER using one of two algorithms:
[Mode: 'strict' or 'hunt'] Insertions, deletions and substitutions are identified using the Hunt–McIlroy diff algorithm. The 'hunt' mode applies 0.5 weight to insertions and deletions. This algorithm is the one used internally by Python.
See https://docs.python.org/3/library/difflib.html
[Mode: 'levenshtein'] In the context of WER, Levenshtein distance is the minimum edit distance computed at the word level. This implementation uses the Editdistance c++ implementation by Hiroyuki Tanaka: https://github.com/aflc/editdistance. See: https://en.wikipedia.org/wiki/Levenshtein_distance
- param mode
'strict' (default), 'hunt' or 'levenshtein'.
- param differ_class
For future use.
- param ref
Reference text
- param hyp
Hypothesis text
metrics.worddiffs¶ Present differences on a per-word basis
- param dialect
Presentation format. Default is 'ansi'.
- example dialect
'html'
- param differ_class
For future use.
- param ref
Reference text
- param hyp
Hypothesis text
list.benchmark¶ Get a list of available core benchmark
- return object
With key being the benchmark name, and value its description
benchmark.beer¶ Bag of Entities Error Rate, BEER, is defined as the error rate per entity with a bag of words approach:
BEER(entity) = abs(ne_hyp - ne_ref) / ne_ref
ne_hyp = number of detections of the entity in the hypothesis file
ne_ref = number of detections of the entity in the reference file
The WA_BEER for a set of N entities is defined as the weighted average of the BEER for the set of entities:
WA_BEER([entity_1, ..., entity_N]) = w_1*BEER(entity_1)*L_1/L + ... + w_N*BEER(entity_N)*L_N/L
which is equivalent to:
WA_BEER([entity_1, ..., entity_N]) = (w_1*abs(ne_hyp_1 - ne_ref_1) + ... + w_N*abs(ne_hyp_N - ne_ref_N)) / L
L_1 = number of occurrences of entity 1 in the reference document
L = L_1 + ... + L_N
the weights being normalised by the tool:
w_1 + ... + w_N = 1
The input file defines the list of entities and the weight per entity, w_n. It is processed as a json file with the following structure:
{ "entity_1":W_1, "entity_2" : W_2, "entity_3" :W_3 .. }W_n being the non-normalized weight, the normalization of the weights is performed by the tool as:
W_n w_n = --------------- W_1 + ... +W_NThe minimum value for weight being 0.
- param ref
Reference text
- param hyp
Hypothesis text
- param config
The config to use
- param bool return_logs
Return normalization logs
- example ref
'Hello darkness my OLD friend'
- example hyp
'Hello darkness my old foe'
- example config
[normalization]
# using a simple config file
Lowercase
- example result
""
benchmark.cer¶ Character Error Rate, basically defined as:
CER = (insertions + deletions + substitutions) / number of reference characters
Character Error Rate, CER, compares the differences between reference and hypothesis on a character level. A CER measure is usually lower than a WER measure, since words might differ in only one or a few characters, yet be classified as fully different at the word level.
The CER metric might be useful as a perspective on the WER metric. Word endings might be less relevant if the text will be preprocessed with stemming, or minor spelling mistakes might be acceptable in certain situations. A CER metric might also be used to evaluate a source (an ASR) which outputs a stream of characters rather than words.
Important: The current implementation of the CER metric ignores whitespace characters. A string like 'aa bb cc' will first be split into words, ['aa','bb','cc'], and then merged into a final string for evaluation: 'aabbcc'.
- param mode
'levenshtein' (default).
- param differ_class
For future use.
- param ref
Reference text
- param hyp
Hypothesis text
- param config
The config to use
- param bool return_logs
Return normalization logs
- example ref
'Hello darkness my OLD friend'
- example hyp
'Hello darkness my old foe'
- example config
[normalization]
# using a simple config file
Lowercase
- example result
""
benchmark.diffcounts¶ Get the amount of differences between reference and hypothesis
- param ref
Reference text
- param hyp
Hypothesis text
- param config
The config to use
- param bool return_logs
Return normalization logs
- example ref
'Hello darkness my OLD friend'
- example hyp
'Hello darkness my old foe'
- example config
[normalization]
# using a simple config file
Lowercase
- example result
""
benchmark.wer¶ Word Error Rate, basically defined as:
WER = (insertions + deletions + substitutions) / number of reference words
See: https://en.wikipedia.org/wiki/Word_error_rate
Calculates the WER using one of two algorithms:
[Mode: 'strict' or 'hunt'] Insertions, deletions and substitutions are identified using the Hunt–McIlroy diff algorithm. The 'hunt' mode applies 0.5 weight to insertions and deletions. This algorithm is the one used internally by Python.
See https://docs.python.org/3/library/difflib.html
[Mode: 'levenshtein'] In the context of WER, Levenshtein distance is the minimum edit distance computed at the word level. This implementation uses the Editdistance c++ implementation by Hiroyuki Tanaka: https://github.com/aflc/editdistance. See: https://en.wikipedia.org/wiki/Levenshtein_distance
- param mode
'strict' (default), 'hunt' or 'levenshtein'.
- param differ_class
For future use.
- param ref
Reference text
- param hyp
Hypothesis text
- param config
The config to use
- param bool return_logs
Return normalization logs
- example ref
'Hello darkness my OLD friend'
- example hyp
'Hello darkness my old foe'
- example config
[normalization]
# using a simple config file
Lowercase
- example result
""
benchmark.worddiffs¶ Present differences on a per-word basis
- param dialect
Presentation format. Default is 'ansi'.
- example dialect
'html'
- param differ_class
For future use.
- param ref
Reference text
- param hyp
Hypothesis text
- param config
The config to use
- param bool return_logs
Return normalization logs
- example ref
'Hello darkness my OLD friend'
- example hyp
'Hello darkness my old foe'
- example config
[normalization]
# using a simple config file
Lowercase
- example result
""
help¶ Returns available api methods
- return object
With key being the method name, and value its description
Development¶
Setting up environment¶
This assumes git and Python 3.5 or above are already installed on your system (see Installation).
Fork the repository source code from github to your own account.
Clone the repository from github to your local development environment (replace [YOURUSERNAME] with your github username):
git clone https://github.com/[YOURUSERNAME]/benchmarkstt.git
cd benchmarkstt
Create and activate a local environment:
python3 -m pip install venv
python3 -m venv env
source env/bin/activate
Install the package; this will also install all requirements. This does an "editable" install, i.e. it creates a symbolic link to the source code:
make dev
You now have a local development environment where you can commit and push to your own forked repository. It is recommended to run the tests to check that your local copy passes all unit tests:
make test
Warning
The development versions of benchmarkstt and benchmarkstt-tools are only available in your current venv environment. Make sure to run source env/bin/activate to activate your local venv before making calls to benchmarkstt or benchmarkstt-tools.
Building the documentation¶
First install the dependencies for building the documentation (sphinx, etc.) using:
make setupdocs
This only needs to be done once.
Then to build the documentation locally:
make docs
The documentation will be created in /docs/build/html/
Contributing¶
[Status: Draft]
This project has a Code of Conduct that we expect all of our contributors to abide by; please check it out before contributing.
Pull requests and branching¶
Before working on a feature, always create a new branch first (or fork the project).
Branches should be short-lived, except branches specifically labelled 'experiment'.
Once work is complete, push the branch up to GitHub for review. Make sure your branch is up to date with master before making a pull request, e.g. use git merge origin/master
Once a branch has been merged into master, delete it.
master is never committed to directly unless the change is very trivial or a code review is unnecessary (code formatting or documentation updates for example).
License¶
By contributing to benchmarkstt, you agree that your contributions will be licensed under the LICENSE.md file in the root directory of this source tree.
Code of Conduct¶
Status: Draft
We are committed to providing a friendly, safe and welcoming environment for all, regardless of gender, sexual orientation, disability, ethnicity, religion, or similar personal characteristic.
We’ve written this code of conduct not because we expect bad behaviour from our community—which, in our experience, is overwhelmingly kind and civil—but because we believe a clear code of conduct is one necessary part of building a respectful community space.
We are committed to providing a welcoming and inspiring community for all and expect our code of conduct to be honored. Anyone who violates this code of conduct may be banned from the community.
Please be kind and courteous. There's no need to be mean or rude. Respect that people have differences of opinion and that every design or implementation choice carries a trade-off and numerous costs. There is seldom a right answer, merely an optimal answer given a set of values and circumstances.
Our open community strives to:
Be friendly and patient.
Be considerate: Your work will be used by other people, and you in turn will depend on the work of others. Any decision you take will affect users and colleagues, and you should take those consequences into account when making decisions. Remember that we’re a world-wide community, so you might not be communicating in someone else’s primary language.
Be respectful: Not all of us will agree all the time, but disagreement is no excuse for poor behaviour and poor manners. We might all experience some frustration now and then, but we cannot allow that frustration to impact others. It’s important to remember that a community where people feel uncomfortable or threatened is not a productive one.
Be careful in the words that we choose: we are a community of professionals, and we conduct ourselves professionally. Be kind to others. Do not insult or put down other participants. Harassment and other exclusionary behaviour aren’t acceptable.
Try to understand why we disagree: Disagreements, both social and technical, happen all the time. It is important that we resolve disagreements and differing views constructively. Remember that we’re different. The strength of our community comes from its diversity, people from a wide range of backgrounds. Different people have different perspectives on issues. Being unable to understand why someone holds a viewpoint doesn’t mean that they’re wrong. Don’t forget that it is human to err and blaming each other doesn’t get us anywhere. Instead, focus on helping to resolve issues and learning from mistakes.
What goes around comes around. We believe in open source, and are excited by what happens when people add value to each other's work in a collaborative way.
Take care of each other. Alert a member of the project team if you notice a dangerous situation, someone in distress, or violations of this code of conduct, even if they seem inconsequential.
If any participant engages in harassing behaviour, the project team may take any lawful action we deem appropriate, including but not limited to warning the offender or asking the offender to leave the project.
Diversity Statement¶
We encourage everyone to participate and are committed to building a community for all. Although we will fail at times, we seek to treat everyone as fairly and equally as possible. Whenever a participant has made a mistake, we expect them to take responsibility for it. If someone has been harmed or offended, it is our responsibility to listen carefully and respectfully, and do our best to right the wrong.
Reporting Issues¶
NOTE: no contact has yet been decided
If you experience or witness unacceptable behaviour, or have any other concerns, please report it by contacting us via [TODO]. All reports will be handled with discretion. In your report please include:
Your contact information.
Names (real, nicknames, or pseudonyms) of any individuals involved. If there are additional witnesses, please include them as well.
Your account of what occurred, and if you believe the incident is ongoing. If there is a publicly available record (e.g. a mailing list archive or a public slack channel), please include a link.
Any additional information that may be helpful.
After filing a report, a representative will contact you personally, review the incident, follow up with any additional questions, and make a decision as to how to respond. If the person who is harassing you is part of the response team, they will recuse themselves from handling your incident. If the complaint originates from a member of the response team, it will be handled by a different member of the response team. We will respect confidentiality requests for the purpose of protecting victims of abuse.
Feedback¶
We welcome your feedback on this and every other aspect of this project and we thank you for working with us to make it a safe, enjoyable, and friendly experience for everyone who participates.
Attribution & Acknowledgements¶
We all stand on the shoulders of giants across many open source communities. We’d like to thank the communities and projects that established codes of conduct and diversity statements as our inspiration:
benchmarkstt package¶
Package benchmarkstt
Subpackages¶
benchmarkstt.api package¶
Responsible for providing a JSON-RPC api.
Subpackages¶
-
benchmarkstt.api.entrypoints.benchmark.
callback
(cls, ref: str, hyp: str, config: str = None, return_logs: bool = None, *args, **kwargs)[source]¶ - Parameters
ref -- Reference text
hyp -- Hypothesis text
config -- The config to use
return_logs (bool) -- Return normalization logs
- Example ref
'Hello darkness my OLD friend'
- Example hyp
'Hello darkness my old foe'
- Example config
[normalization]
# using a simple config file
Lowercase
- Example result
""
Submodules¶
Entry point for a gunicorn server, serves at /api
Make benchmarkstt available through a rudimentary JSON-RPC interface
Warning
Only supported for Python versions 3.6 and above!
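As an illustration of calling this JSON-RPC interface: the payload below is a standard JSON-RPC 2.0 request, but the host and port are assumptions that depend on how the gunicorn server is started; only the /api path and the method names (such as 'help') come from this documentation.
import json
import urllib.request

# Hypothetical call to the 'help' method; adjust host/port to your setup.
payload = {'jsonrpc': '2.0', 'id': 1, 'method': 'help'}
request = urllib.request.Request(
    'http://localhost:8080/api',
    data=json.dumps(payload).encode('utf-8'),
    headers={'Content-Type': 'application/json'},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))  # lists the available api methods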
-
class
benchmarkstt.api.jsonrpc.
MagicMethods
[source]¶ Bases:
object
-
static
is_safe_path
(path)[source]¶ Determines whether the file or path is within the current working directory
- Parameters
path (str|PathLike) --
- Returns
bool
-
load
(name, module)[source]¶ Load all possible callbacks for a given module
- Parameters
name --
module (Module) --
-
possible_path_args
= ['file', 'path']¶
-
static
benchmarkstt.cli package¶
Responsible for handling the command line tools.
-
class
benchmarkstt.cli.
CustomHelpFormatter
(*args, **kwargs)[source]¶ Bases:
argparse.HelpFormatter
Custom formatter for argparse that allows us to properly display _ActionWithArguments and docblock documentation, as well as allowing newlines inside the description.
-
benchmarkstt.cli.
action_with_arguments
(action, required_args, optional_args)[source]¶ Custom argparse action to support a variable number of arguments
- Parameters
action -- name of the action
required_args (list) -- required arguments
optional_args (list) -- optional arguments
- Return type
ActionWithArguments
Subpackages¶
Make benchmarkstt available through a rudimentary JSON-RPC interface
Attention
Only supported for Python versions 3.6 and above
-
benchmarkstt.cli.entrypoints.api.
argparser
(parser)[source]¶ Adds the help and arguments specific to this module
Do a complete flow of input -> normalization -> segmentation -> metrics
Calculate metrics based on the comparison of a hypothesis with a reference.
Apply normalization to given input
-
benchmarkstt.cli.entrypoints.normalization.
argparser
(parser: argparse.ArgumentParser)[source]¶ Adds the help and arguments specific to this module
Submodules¶
benchmarkstt.diff package¶
Responsible for calculating differences.
-
class
benchmarkstt.diff.
Differ
(a, b)[source]¶ Bases:
abc.ABC
-
abstract
get_opcodes
()[source]¶ Return list of 5-tuples describing how to turn a into b.
Each tuple is of the form (tag, i1, i2, j1, j2). The first tuple has i1 == j1 == 0, and remaining tuples have i1 equals the i2 from the tuple preceding it, and likewise for j1 equals the previous j2.
The tags are strings, with these meanings:
'replace': a[i1:i2] should be replaced by b[j1:j2]
'delete': a[i1:i2] should be deleted. Note that j1==j2 in this case.
'insert': b[j1:j2] should be inserted at a[i1:i1]. Note that i1==i2 in this case.
'equal': a[i1:i2] == b[j1:j2]
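The opcode format is the same as the one produced by Python's difflib; as a quick plain-difflib illustration (not a benchmarkstt Differ):
from difflib import SequenceMatcher

# Show the 5-tuples described above for two short strings
a, b = 'qabxcd', 'abycdf'
for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
    print(tag, repr(a[i1:i2]), '->', repr(b[j1:j2]))
# delete 'q' -> ''
# equal 'ab' -> 'ab'
# replace 'x' -> 'y'
# equal 'cd' -> 'cd'
# insert '' -> 'f'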
-
abstract
Submodules¶
Core Diff algorithms
-
class
benchmarkstt.diff.core.
RatcliffObershelp
(a, b, **kwargs)[source]¶ Bases:
benchmarkstt.diff.Differ
Diff according to Ratcliff and Obershelp (Gestalt) matching algorithm.
From difflib.SequenceMatcher (Copyright 2001-2020, Python Software Foundation.)
SequenceMatcher is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980's by Ratcliff and Obershelp under the hyperbolic name "gestalt pattern matching". The basic idea is to find the longest contiguous matching subsequence that contains no "junk" elements (R-O doesn't address junk). The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. This does not yield minimal edit sequences, but does tend to yield matches that "look right" to people.
-
get_opcodes
()[source]¶ Return list of 5-tuples describing how to turn a into b.
Each tuple is of the form (tag, i1, i2, j1, j2). The first tuple has i1 == j1 == 0, and remaining tuples have i1 equals the i2 from the tuple preceding it, and likewise for j1 equals the previous j2.
The tags are strings, with these meanings:
'replace': a[i1:i2] should be replaced by b[j1:j2]
'delete': a[i1:i2] should be deleted. Note that j1==j2 in this case.
'insert': b[j1:j2] should be inserted at a[i1:i1]. Note that i1==i2 in this case.
'equal': a[i1:i2] == b[j1:j2]
-
-
class
benchmarkstt.diff.formatter.
ANSIDiffDialect
(show_color_key=None)[source]¶ Bases:
benchmarkstt.diff.formatter.Dialect
-
delete_format
= '\x1b[31m%s\x1b[0m'¶
-
insert_format
= '\x1b[32m%s\x1b[0m'¶
-
-
class
benchmarkstt.diff.formatter.
Dialect
[source]¶ Bases:
object
-
delete_format
= '%s'¶
-
equal_format
= '%s'¶
-
insert_format
= '%s'¶
-
preprocessor
= None¶
-
replace_format
= None¶
-
property
stream
¶
-
-
class
benchmarkstt.diff.formatter.
DiffFormatter
(dialect=None, *args, **kwargs)[source]¶ Bases:
object
-
diff_dialects
= {'ansi': <class 'benchmarkstt.diff.formatter.ANSIDiffDialect'>, 'html': <class 'benchmarkstt.diff.formatter.HTMLDiffDialect'>, 'json': <class 'benchmarkstt.diff.formatter.JSONDiffDialect'>, 'list': <class 'benchmarkstt.diff.formatter.ListDialect'>, 'rst': <class 'benchmarkstt.diff.formatter.RestructuredTextDialect'>, 'text': <class 'benchmarkstt.diff.formatter.UTF8Dialect'>}¶
-
-
class
benchmarkstt.diff.formatter.
HTMLDiffDialect
[source]¶ Bases:
benchmarkstt.diff.formatter.Dialect
-
delete_format
= '<span class="delete">%s</span>'¶
-
insert_format
= '<span class="insert">%s</span>'¶
-
-
class
benchmarkstt.diff.formatter.
ListDialect
[source]¶ Bases:
benchmarkstt.diff.formatter.Dialect
-
delete_format
(txt)[source]¶ str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str
Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'.
-
equal_format
(txt)[source]¶ str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str
Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'.
-
insert_format
(txt)[source]¶ str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str
Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'.
-
-
class
benchmarkstt.diff.formatter.
RestructuredTextDialect
(show_color_key=None)[source]¶ Bases:
benchmarkstt.diff.formatter.ANSIDiffDialect
-
delete_format
= '\\ :diffdelete:`%s`\\ '¶
-
insert_format
= '\\ :diffinsert:`%s`\\ '¶
-
-
class
benchmarkstt.diff.formatter.
UTF8Dialect
[source]¶ Bases:
benchmarkstt.diff.formatter.Dialect
-
delete_format
(txt)[source]¶ str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str
Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'.
-
insert_format
(txt)[source]¶ str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str
Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'.
-
benchmarkstt.input package¶
Responsible for dealing with input formats and converting them to benchmarkstt native schema
Submodules¶
Default input formats
-
class
benchmarkstt.input.core.
File
(file, input_type=None, normalizer=None)[source]¶ Bases:
benchmarkstt.input.Input
Load from a given filename.
-
class
benchmarkstt.input.core.
PlainText
(text, normalizer=None, segmenter=None)[source]¶ Bases:
benchmarkstt.input.Input
Plain text.
benchmarkstt.metrics package¶
Responsible for calculating metrics.
-
class
benchmarkstt.metrics.
Metric
[source]¶ Bases:
abc.ABC
Base class for metrics
-
abstract
compare
(ref: benchmarkstt.schema.Schema, hyp: benchmarkstt.schema.Schema)[source]¶
-
abstract
Submodules¶
-
class
benchmarkstt.metrics.core.
BEER
(entities_file=None)[source]¶ Bases:
benchmarkstt.metrics.Metric
Bag of Entities Error Rate, BEER, is defined as the error rate per entity with a bag of words approach:
BEER(entity) = abs(ne_hyp - ne_ref) / ne_ref
ne_hyp = number of detections of the entity in the hypothesis file
ne_ref = number of detections of the entity in the reference file
The WA_BEER for a set of N entities is defined as the weighted average of the BEER for the set of entities:
WA_BEER([entity_1, ..., entity_N]) = w_1*BEER(entity_1)*L_1/L + ... + w_N*BEER(entity_N)*L_N/L
which is equivalent to:
WA_BEER([entity_1, ..., entity_N]) = (w_1*abs(ne_hyp_1 - ne_ref_1) + ... + w_N*abs(ne_hyp_N - ne_ref_N)) / L
L_1 = number of occurrences of entity 1 in the reference document
L = L_1 + ... + L_N
the weights being normalised by the tool:
w_1 + ... + w_N = 1
The input file defines the list of entities and the weight per entity, w_n. It is processed as a json file with the following structure:
{"entity_1": W_1, "entity_2": W_2, "entity_3": W_3, ...}
W_n being the non-normalized weight, the normalization of the weights is performed by the tool as:
w_n = W_n / (W_1 + ... + W_N)
The minimum value for a weight is 0.
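A small worked example of these formulas (the entity counts and weights below are made up purely for illustration; this is not the library's implementation):
# Made-up counts and weights, only to illustrate the formulas above.
entities = {
    #            ne_ref, ne_hyp, W (non-normalized weight)
    'entity_1': (4,      3,      2),
    'entity_2': (2,      2,      1),
}

total_w = sum(w for _, _, w in entities.values())      # W_1 + ... + W_N
L = sum(ne_ref for ne_ref, _, _ in entities.values())  # L = L_1 + ... + L_N

beer = {name: abs(ne_hyp - ne_ref) / ne_ref
        for name, (ne_ref, ne_hyp, _) in entities.items()}
wa_beer = sum((w / total_w) * beer[name] * ne_ref / L
              for name, (ne_ref, _, w) in entities.items())

print(beer)     # {'entity_1': 0.25, 'entity_2': 0.0}
print(wa_beer)  # (2/3)*0.25*(4/6) + (1/3)*0.0*(2/6) ≈ 0.111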
-
compare
(ref: benchmarkstt.schema.Schema, hyp: benchmarkstt.schema.Schema)[source]¶
-
class
benchmarkstt.metrics.core.
CER
(mode=None, differ_class=None)[source]¶ Bases:
benchmarkstt.metrics.Metric
Character Error Rate, basically defined as:
(insertions + deletions + substitutions) / (number of reference characters)
Character Error Rate, CER, compares the differences between reference and hypothesis on a character level. A CER measure is usually lower than a WER measure, since words might differ in only one or a few characters and yet be classified as fully different.
The CER metric might be useful as a perspective on the WER metric. Word endings might be less relevant if the text will be preprocessed with stemming, or minor spelling mistakes might be acceptable in certain situations. A CER metric might also be used to evaluate a source (an ASR) that outputs a stream of characters rather than words.
Important: The current implementation of the CER metric ignores whitespace characters. A string like 'aa bb cc' will first be split into words, ['aa','bb','cc'], and then merged into a final string for evaluation: 'aabbcc'.
- Parameters
mode -- 'levenshtein' (default).
differ_class -- For future use.
-
MODE_LEVENSHTEIN
= 'levenshtein'¶
-
compare
(ref: benchmarkstt.schema.Schema, hyp: benchmarkstt.schema.Schema)[source]¶
-
class
benchmarkstt.metrics.core.
DiffCounts
(mode=None, differ_class: benchmarkstt.diff.Differ = None)[source]¶ Bases:
benchmarkstt.metrics.Metric
Get the number of differences between reference and hypothesis
-
MODE_LEVENSHTEIN
= 'levenshtein'¶
-
-
class
benchmarkstt.metrics.core.
OpcodeCounts
(equal, replace, insert, delete)¶ Bases:
tuple
-
property
delete
¶ Alias for field number 3
-
property
equal
¶ Alias for field number 0
-
property
insert
¶ Alias for field number 2
-
property
replace
¶ Alias for field number 1
-
property
-
class
benchmarkstt.metrics.core.
WER
(mode=None, differ_class: benchmarkstt.diff.Differ = None)[source]¶ Bases:
benchmarkstt.metrics.Metric
Word Error Rate, basically defined as:
(insertions + deletions + substitutions) / (number of reference words)
See: https://en.wikipedia.org/wiki/Word_error_rate
Calculates the WER using one of two algorithms:
[Mode: 'strict' or 'hunt'] Insertions, deletions and substitutions are identified using the Hunt–McIlroy diff algorithm. The 'hunt' mode applies 0.5 weight to insertions and deletions. This algorithm is the one used internally by Python.
See https://docs.python.org/3/library/difflib.html
[Mode: 'levenshtein'] In the context of WER, Levenshtein distance is the minimum edit distance computed at the word level. This implementation uses the editdistance C++ implementation by Hiroyuki Tanaka: https://github.com/aflc/editdistance. See: https://en.wikipedia.org/wiki/Levenshtein_distance
- Parameters
mode -- 'strict' (default), 'hunt' or 'levenshtein'.
differ_class -- For future use.
-
DEL_PENALTY
= 1¶
-
INS_PENALTY
= 1¶
-
MODE_HUNT
= 'hunt'¶
-
MODE_LEVENSHTEIN
= 'levenshtein'¶
-
MODE_STRICT
= 'strict'¶
-
SUB_PENALTY
= 1¶
-
compare
(ref: benchmarkstt.schema.Schema, hyp: benchmarkstt.schema.Schema) → float[source]¶
-
class
benchmarkstt.metrics.core.
WordDiffs
(dialect=None, differ_class: benchmarkstt.diff.Differ = None)[source]¶ Bases:
benchmarkstt.metrics.Metric
Present differences on a per-word basis
- Parameters
dialect -- Presentation format. Default is 'ansi'.
differ_class -- For future use.
- Example dialect
'html'
-
compare
(ref: benchmarkstt.schema.Schema, hyp: benchmarkstt.schema.Schema)[source]¶
-
benchmarkstt.metrics.core.
get_differ
(a, b, differ_class: benchmarkstt.diff.Differ)[source]¶
-
benchmarkstt.metrics.core.
get_opcode_counts
(opcodes) → benchmarkstt.metrics.core.OpcodeCounts[source]¶
benchmarkstt.normalization package¶
Responsible for normalization of text.
-
class
benchmarkstt.normalization.
File
(normalizer, file, encoding=None, path=None)[source]¶ Bases:
benchmarkstt.normalization.Normalizer
Read rules from a file, one per line, and pass each one to the given normalizer
- Parameters
normalizer (str|class) -- Normalizer name (or class)
file -- The file to read rules from
encoding -- The file encoding
- Example text
"This is an Ex-Parakeet"
- Example normalizer
"regex"
- Example file
"./resources/test/normalizers/regex/en_US"
- Example encoding
"UTF-8"
- Example return
"This is an Ex Parrot"
-
class
benchmarkstt.normalization.
NormalizationAggregate
(title=None)[source]¶ Bases:
benchmarkstt.normalization.Normalizer
Combining normalizers
-
class
benchmarkstt.normalization.
Normalizer
[source]¶ Bases:
benchmarkstt.normalization._NormalizerNoLogs
Abstract base class for normalization
-
normalize
(text)¶ Returns normalized text with rules supplied by the called class.
-
-
class
benchmarkstt.normalization.
NormalizerWithFileSupport
[source]¶ Bases:
benchmarkstt.normalization.Normalizer
This kind of normalization class supports loading the values from a file, i.e. being wrapped in a core.File wrapper.
Submodules¶
Some basic/simple normalization classes
-
class
benchmarkstt.normalization.core.
Config
(file, section=None, encoding=None)[source]¶ Bases:
benchmarkstt.normalization.Normalizer
Use config file notation to define normalization rules. This notation is a list of normalizers, one per line.
Each normalizer that requires a file is followed by the file name of a csv, and can optionally be followed by the file encoding (if different from the default). All options are loaded from this csv and applied to the normalizer.
The normalizers can be any of the core normalizers, or you can refer to your own normalizer class (like you would use in a python import, eg. my.own.package.MyNormalizerClass).
- Additional rules:
Normalizer names are case-insensitive.
Arguments MAY be wrapped in double quotes.
If an argument contains a space, newline or double quote, it MUST be wrapped in double quotes.
A double quote itself is represented in this quoted argument as two double quotes:
""
.
The normalization rules are applied top-to-bottom and follow this format:
[normalization]
# This is a comment
# (Normalizer2 has no arguments)
lowercase
# loads regex expressions from regexrules.csv in "utf 8" encoding
regex regexrules.csv "utf 8"
# load another config file, [section1] and [section2]
config configfile.ini section1
config configfile.ini section2
# loads replace expressions from replaces.csv in default encoding
replace replaces.csv
- Parameters
file -- The config file
encoding -- The file encoding
section -- The subsection of the config file to use, defaults to 'normalization'
- Example text
"He bravely turned his tail and fled"
- Example file
"./resources/test/normalizers/configfile.conf"
- Example encoding
"UTF-8"
- Example return
"ha bravalY Turnad his tail and flad"
-
MAIN_SECTION
= <object object>¶
-
exception
benchmarkstt.normalization.core.
ConfigSectionNotFoundError
[source]¶ Bases:
ValueError
Raised when a requested config section was not found
-
class
benchmarkstt.normalization.core.
Lowercase
[source]¶ Bases:
benchmarkstt.normalization.Normalizer
Lowercase the text
- Example text
"Easy, Mungo, easy... Mungo..."
- Example return
"easy, mungo, easy... mungo..."
-
class
benchmarkstt.normalization.core.
Regex
(search: str, replace: str)[source]¶ Bases:
benchmarkstt.normalization.NormalizerWithFileSupport
Simple regex replace. By default the pattern is interpreted case-sensitively.
Case-insensitivity is supported by adding inline modifiers.
You might want to use capturing groups to preserve the case. When replacing a character not captured, the information about its case is lost...
Eg. would replace "HAHA! Hahaha!" with "HeHe! Hehehe!":
search: (?i)(h)a
replace: \1e
No regex flags are set by default; you can set them yourself inside the regex and combine them at will, e.g. multiline, dotall and ignorecase.
Eg. would replace "New<CRLF>line" with "newline":
search: (?msi)new.line
replace: newline
- Example text
"HAHA! Hahaha!"
- Example search
'(?i)(h)a'
- Example replace
'\1e'
- Example return
"HeHe! Hehehe!"
-
class
benchmarkstt.normalization.core.
Replace
(search: str, replace: str)[source]¶ Bases:
benchmarkstt.normalization.NormalizerWithFileSupport
Simple search replace
- Parameters
search -- Text to search for
replace -- Text to replace with
- Example text
"Nudge nudge!"
- Example search
"nudge"
- Example replace
"wink"
- Example return
"Nudge wink!"
-
class
benchmarkstt.normalization.core.
ReplaceWords
(search: str, replace: str)[source]¶ Bases:
benchmarkstt.normalization.NormalizerWithFileSupport
Simple search replace that only replaces whole "words"; the first letter is also checked case-insensitively, with preservation of case.
- Parameters
search -- Word to search for
replace -- Replace with
- Example text
"She has a heart of formica"
- Example search
"a"
- Example replace
"the"
- Example return
"She has the heart of formica"
-
class
benchmarkstt.normalization.core.
Unidecode
[source]¶ Bases:
benchmarkstt.normalization.Normalizer
Unidecode characters to ASCII form, see Python's Unidecode package for more info.
- Example text
"𝖂𝖊𝖓𝖓 𝖎𝖘𝖙 𝖉𝖆𝖘 𝕹𝖚𝖓𝖘𝖙ü𝖈𝖐 𝖌𝖎𝖙 𝖚𝖓𝖉 𝕾𝖑𝖔𝖙𝖊𝖗𝖒𝖊𝖞𝖊𝖗?"
- Example return
"Wenn ist das Nunstuck git und Slotermeyer?"
-
class
benchmarkstt.normalization.logger.
DiffLoggingDictFormatterDialect
[source]¶ Bases:
benchmarkstt.normalization.logger.DiffLoggingFormatterDialect
-
class
benchmarkstt.normalization.logger.
DiffLoggingFormatter
(dialect=None, diff_formatter_dialect=None, title=None, *args, **kwargs)[source]¶ Bases:
logging.Formatter
-
diff_logging_formatter_dialects
= {'dict': <class 'benchmarkstt.normalization.logger.DiffLoggingDictFormatterDialect'>, 'text': <class 'benchmarkstt.normalization.logger.DiffLoggingTextFormatterDialect'>}¶
-
format
(record)[source]¶ Format the specified record as text.
The record's attribute dictionary is used as the operand to a string formatting operation which yields the returned string. Before formatting the dictionary, a couple of preparatory steps are carried out. The message attribute of the record is computed using LogRecord.getMessage(). If the formatting string uses the time (as determined by a call to usesTime(), formatTime() is called to format the event time. If there is exception information, it is formatted using formatException() and appended to the message.
-
-
class
benchmarkstt.normalization.logger.
DiffLoggingTextFormatterDialect
[source]¶ Bases:
benchmarkstt.normalization.logger.DiffLoggingFormatterDialect
-
class
benchmarkstt.normalization.logger.
ListHandler
[source]¶ Bases:
logging.StreamHandler
-
emit
(record)[source]¶ Emit a record.
If a formatter is specified, it is used to format the record. The record is then written to the stream with a trailing newline. If exception information is present, it is formatted using traceback.print_exception and appended to the stream. If the stream has an 'encoding' attribute, it is used to determine how to do the output to the stream.
-
property
logs
¶
-
-
class
benchmarkstt.normalization.logger.
LogCapturer
(*args, **kwargs)[source]¶ Bases:
object
-
property
logs
¶
-
property
benchmarkstt.output package¶
Responsible for dealing with output formats
-
class
benchmarkstt.output.
SimpleTextBase
[source]¶ Bases:
benchmarkstt.output.Output
Submodules¶
-
class
benchmarkstt.output.core.
Json
[source]¶ Bases:
benchmarkstt.output.Output
benchmarkstt.segmentation package¶
Responsible for segmenting text.
Submodules¶
Core segmenters; each segmenter must be an Iterable returning an Item
-
class
benchmarkstt.segmentation.core.
Simple
(text: str, pattern='[\\n\\t\\s]+', normalizer=None)[source]¶ Bases:
benchmarkstt.segmentation.Segmenter
Simplest case, split into words by white space
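A rough illustration of splitting on the documented default pattern with Python's re module (this is not the Simple class itself, which returns Item objects and supports normalization):
import re

# Split a text into words on whitespace, as the default pattern above does
text = 'this is\tan example\nsentence'
print([word for word in re.split(r'[\n\t\s]+', text) if word])
# ['this', 'is', 'an', 'example', 'sentence']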
Submodules¶
benchmarkstt.config module¶
benchmarkstt.csv module¶
Module providing a custom CSV file parser with support for whitespace trimming, empty-line filtering and comment lines
-
exception
benchmarkstt.csv.
CSVParserError
(message, line, char, index)[source]¶ Bases:
ValueError
Some error occurred while attempting to parse the file
-
class
benchmarkstt.csv.
DefaultDialect
[source]¶ Bases:
benchmarkstt.csv.Dialect
-
commentchar
= '#'¶
-
delimiter
= ','¶
-
ignoreemptylines
= True¶
-
quotechar
= '"'¶
-
trimleft
= ' \t\n\r'¶
-
trimright
= ' \t\n\r'¶
-
-
class
benchmarkstt.csv.
Dialect
[source]¶ Bases:
object
-
commentchar
= None¶
-
delimiter
= None¶
-
quotechar
= None¶
-
trimleft
= None¶
-
trimright
= None¶
-
-
exception
benchmarkstt.csv.
InvalidDialectError
[source]¶ Bases:
ValueError
An invalid dialect was supplied
-
class
benchmarkstt.csv.
Reader
(file: TextIO, dialect: benchmarkstt.csv.Dialect, debug=None)[source]¶ Bases:
object
CSV-like file reader with support for comment chars, ignoring empty lines and whitespace trimming on both sides of each field.
-
exception
benchmarkstt.csv.
UnallowedQuoteError
(message, line, char, index)[source]¶ Bases:
benchmarkstt.csv.CSVParserError
A quote is not allowed there
-
exception
benchmarkstt.csv.
UnclosedQuoteError
(message, line, char, index)[source]¶ Bases:
benchmarkstt.csv.CSVParserError
A quote wasn't properly closed
-
exception
benchmarkstt.csv.
UnknownDialectError
[source]¶ Bases:
ValueError
An unknown dialect was requested
-
class
benchmarkstt.csv.
WhitespaceDialect
[source]¶ Bases:
benchmarkstt.csv.DefaultDialect
-
delimiter
= ' \t'¶
-
-
benchmarkstt.csv.
reader
(file: TextIO, dialect: Union[None, str, benchmarkstt.csv.Dialect] = None, **kwargs) → benchmarkstt.csv.Reader[source]¶
benchmarkstt.decorators module¶
-
benchmarkstt.decorators.
log_call
(logger: logging.Logger, log_level=None, result=None)[source]¶ Decorator to log all calls to decorated function to given logger
>>> import logging, sys, io
>>>
>>> logger = logging.getLogger('logger_name')
>>> logger.setLevel(logging.DEBUG)
>>> ch = logging.StreamHandler(sys.stdout)
>>> ch.setFormatter(logging.Formatter('%(levelname)s:%(name)s: %(message)s'))
>>> logger.addHandler(ch)
>>>
>>> @log_call(logger, logging.WARNING)
... def test(*args, **kwargs):
...     return 'result'
>>> test('arg1', arg2='someval', arg3='someotherval')
WARNING:logger_name: test('arg1', arg2='someval', arg3='someotherval')
'result'
>>> @log_call(logger, result=True)
... def test(*args, **kwargs):
...     return 'result'
>>> test(arg2='someval', arg3='someotherval')
DEBUG:logger_name: test(arg2='someval', arg3='someotherval')
DEBUG:logger_name: test returned: result
'result'
benchmarkstt.deferred module¶
benchmarkstt.docblock module¶
-
class
benchmarkstt.docblock.
Docblock
(docs, params, result, result_type)¶ Bases:
tuple
-
property
docs
¶ Alias for field number 0
-
property
params
¶ Alias for field number 1
-
property
result
¶ Alias for field number 2
-
property
result_type
¶ Alias for field number 3
-
property
-
class
benchmarkstt.docblock.
DocblockParam
(name, type, value)¶ Bases:
tuple
-
property
name
¶ Alias for field number 0
-
property
type
¶ Alias for field number 1
-
property
value
¶ Alias for field number 2
-
property
-
class
benchmarkstt.docblock.
Param
(name, type, type_doc, is_required, description, examples)¶ Bases:
tuple
-
property
description
¶ Alias for field number 4
-
property
examples
¶ Alias for field number 5
-
property
is_required
¶ Alias for field number 3
-
property
name
¶ Alias for field number 0
-
property
type
¶ Alias for field number 1
-
property
type_doc
¶ Alias for field number 2
-
property
-
class
benchmarkstt.docblock.
TextWriter
[source]¶ Bases:
docutils.writers.Writer
-
translate
()[source]¶ Do final translation of self.document into self.output. Called from write. Override in subclasses.
Usually done with a docutils.nodes.NodeVisitor subclass, in combination with a call to docutils.nodes.Node.walk() or docutils.nodes.Node.walkabout(). The NodeVisitor subclass must support all standard elements (listed in docutils.nodes.node_class_names) and possibly non-standard elements used by the current Reader as well.
-
benchmarkstt.factory module¶
-
class
benchmarkstt.factory.
ClassConfig
(name, cls, docs, optional_args, required_args)[source]¶ Bases:
benchmarkstt.factory.ClassConfigTuple
-
property
docs
¶ Alias for field number 2
-
property
-
class
benchmarkstt.factory.
Factory
(base_class, namespaces=None, methods=None)[source]¶ Bases:
benchmarkstt.registry.Registry
Factory class with auto-loading of namespaces according to a base class.
-
is_valid
(tocheck)[source]¶ Checks that tocheck is a valid class extending base_class
- Parameters
tocheck -- The class to check
- Return type
bool
-
static
normalize_class_name
(clsname)[source]¶ Normalizes the class name for automatic lookup of a class; by default this means lowercasing the class name, but this may be overridden by a child class.
- Parameters
clsname -- The class name
- Returns
The normalized class name
- Return type
str
-
benchmarkstt.helpers module¶
Some helper methods that can be re-used across submodules
benchmarkstt.modules module¶
benchmarkstt.registry module¶
benchmarkstt.schema module¶
Defines the main schema for comparison and implements json serialization
-
class
benchmarkstt.schema.
Item
(*args, **kwargs)[source]¶ Bases:
collections.abc.Mapping
Basic structure of each field to compare
- Raises
ValueError, SchemaInvalidItemError
-
class
benchmarkstt.schema.
JSONDecoder
(*args, **kwargs)[source]¶ Bases:
json.decoder.JSONDecoder
Custom JSON decoding for schema
-
class
benchmarkstt.schema.
JSONEncoder
(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]¶ Bases:
json.encoder.JSONEncoder
Custom JSON encoding for schema
-
default
(o)[source]¶ Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError). For example, to support arbitrary iterators, you could implement default like this:
def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
-
-
class
benchmarkstt.schema.
Meta
[source]¶ Bases:
collections.defaultdict
Containing metadata for an item, such as skipped
-
class
benchmarkstt.schema.
Schema
(data=None)[source]¶ Bases:
object
Basically a list of Item
-
append
(obj: benchmarkstt.schema.Item)[source]¶
-
-
exception
benchmarkstt.schema.
SchemaError
[source]¶ Bases:
ValueError
Top Error class for all schema related exceptions
-
exception
benchmarkstt.schema.
SchemaInvalidItemError
[source]¶ Bases:
benchmarkstt.schema.SchemaError
Attempting to add an invalid item
-
exception
benchmarkstt.schema.
SchemaJSONError
[source]¶ Bases:
benchmarkstt.schema.SchemaError
When loading incompatible JSON
Changelog¶
[Unreleased]¶
Added¶
Documentation:
add auto-generated UML diagrams
add tutorial Jupyter Notebooks
add support for loading external/local code (--load) #142
Tests:
add python 3.8 to github workflow, re-enable excluded python versions
Changed¶
Cleanup/refactors:
group cli and api entrypoints in their respective packages
moved all documentation specific code outside main package
update sphinx to latest
use more descriptive names for Base classes (Normalizer, Differ, etc.)
rename CLIDiffDialect to ANSIDiffDialect, "cli" -> "ansi"
rename NormalizationComposite -> NormalizationAggregate
allow ducktyped custom classes to be recognized as valid
proper abstract base classes
Documentation:
custom autodoc templates
Normalizer Unidecode and dependency 'Unidecode>=1.1.0' replaced by a version working for Python 3.9
Fixed¶
Makefile:
ensure pip is installed (in some cases needed for development, avoids user confusion)
use environment python if available, otherwise use python3
Dockerfile:
fixed missing python package by specifying its version #138
1.0.0 - 2020-04-23¶
Initial version