Normalisation

Normalisation is the first step in both template storage and utterance matching. It ensures that superficial textual differences (apostrophe variants, repeated whitespace, letter case) do not prevent a correct fuzzy match. The text normalisation utilities are in nebulento/bracket_expansion.py; case folding is applied separately by IntentContainer._norm in nebulento/container.py.

Functions

`normalize_utterance(text)`

nebulento/bracket_expansion.py:86

Normalise a plain query utterance before matching. Applied to every utterance by IntentContainer._norm().

Steps:

1. Replace all apostrophe variants with a single space (_drop_apostrophes).
2. Collapse runs of whitespace to a single space and strip leading/trailing whitespace (_normalize_whitespace).

Does not touch entity placeholder syntax — there are no {...} tokens in a plain utterance.

from nebulento.bracket_expansion import normalize_utterance

normalize_utterance("it's  fine")
# "it s fine"

normalize_utterance("  hello   world  ")
# "hello world"

`normalize_example(example)`

nebulento/bracket_expansion.py:69

Normalise a training template for storage. Applied to each line before expand_template in add_intent.

Steps (in order):

1. clean_braces(example): {{entity}} → {entity}
2. translate_padatious(example): each :0 → {word0}, {word1}, …
3. _drop_apostrophes(text): all apostrophe variants → space
4. _normalize_whitespace(text): collapse whitespace runs and strip the ends

Entity placeholders ({name}) are preserved through all steps.

from nebulento.bracket_expansion import normalize_example

normalize_example("it's {{item}}  please")
# "it s {item} please"

normalize_example("set :0 timer")
# "set {word0} timer"
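The four steps above can be sketched as a single function. This is a minimal, self-contained approximation written for illustration only: `normalize_example_sketch` and `APOSTROPHES` are hypothetical names, and the regular expressions are assumptions about behaviour, not the code in nebulento/bracket_expansion.py.

```python
import re
from itertools import count

# The eight apostrophe-like characters listed under "Apostrophe Normalisation".
APOSTROPHES = "\u0027\u2019\u2018\u02bc\u02b9\u0060\u00b4\uff07"


def normalize_example_sketch(example: str) -> str:
    """Illustrative stand-in for normalize_example (not the library code)."""
    # 1. Collapse doubled braces: '{{item}}' -> '{item}'.
    text = re.sub(r"\{{2,}", "{", example)
    text = re.sub(r"\}{2,}", "}", text)
    # 2. Number Padatious ':0' slots left to right, starting at 0 on each call.
    slot = count()
    text = re.sub(r":0", lambda _match: "{word%d}" % next(slot), text)
    # 3. Replace every apostrophe variant with a space.
    text = text.translate(str.maketrans({ch: " " for ch in APOSTROPHES}))
    # 4. Collapse whitespace runs and strip the ends.
    return " ".join(text.split())


print(normalize_example_sketch("what's the {{thing}} in :0"))
```

Because none of the steps rewrites single braces, a `{name}` placeholder that enters the pipeline comes out unchanged.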

`translate_padatious(example)`

nebulento/bracket_expansion.py:45

Convert Padatious :0 word-slot tokens to numbered {wordN} entity placeholders. Numbering restarts at 0 on every call (that is, per template line); each :0 within a single call takes the next index.

from nebulento.bracket_expansion import translate_padatious

translate_padatious("set a timer for :0 minutes")
# 'set a timer for {word0} minutes'

translate_padatious("play :0 by :0")
# 'play {word0} by {word1}'

# No-op when :0 is absent:
translate_padatious("hello world")
# 'hello world'
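One way to get per-call numbering is a counter that is created inside the function. The sketch below assumes a plain textual match on `:0`; `translate_padatious_sketch` is a hypothetical stand-in, and the real function may tokenise differently.

```python
import re
from itertools import count


def translate_padatious_sketch(example: str) -> str:
    """Replace each ':0' with '{word0}', '{word1}', ... from left to right."""
    slot = count()  # fresh counter per call, so numbering restarts on every line
    return re.sub(r":0", lambda _match: "{word%d}" % next(slot), example)


# Each template line is translated independently, so both start at word0.
lines = ["play :0 by :0", "call :0"]
translated = [translate_padatious_sketch(line) for line in lines]
print(translated)
```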

`clean_braces(example)`

nebulento/bracket_expansion.py:33

Normalise accidental double-braces. This is a guard against template strings copied from Python f-string contexts where {{ is the escaped form of {.

from nebulento.bracket_expansion import clean_braces

clean_braces("buy {{item}} today")
# 'buy {item} today'
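The f-string origin of doubled braces is easy to reproduce. The following sketch shows how such a template arises and one possible way to collapse it; `clean_braces_sketch` is a hypothetical re-implementation, and its handling of triple braces is an assumption rather than documented behaviour.

```python
def clean_braces_sketch(example: str) -> str:
    """Collapse doubled braces; loops so that '{{{x}}}' also becomes '{x}'."""
    while "{{" in example:
        example = example.replace("{{", "{")
    while "}}" in example:
        example = example.replace("}}", "}")
    return example


# Inside an f-string, '{{' is the escape for a literal '{' ...
item = "milk"
rendered = f"buy {{item}} today ({item})"

# ... but the same text pasted into a plain string keeps both braces.
pasted = "buy {{item}} today"
cleaned = clean_braces_sketch(pasted)
print(rendered)
print(cleaned)
```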

Apostrophe Normalisation

nebulento/bracket_expansion.py:9-19

All eight apostrophe-like Unicode characters are replaced with a single ASCII space:

Character	Unicode	Name
`'`	U+0027	ASCII apostrophe
`’`	U+2019	RIGHT SINGLE QUOTATION MARK
`‘`	U+2018	LEFT SINGLE QUOTATION MARK
`ʼ`	U+02BC	MODIFIER LETTER APOSTROPHE
`ʹ`	U+02B9	MODIFIER LETTER PRIME
`` ` ``	U+0060	GRAVE ACCENT
`´`	U+00B4	ACUTE ACCENT
`＇`	U+FF07	FULLWIDTH APOSTROPHE

The effect is that contractions like "it's", "I'm", "don't" are split into two tokens: "it s", "I m", "don t" (lowercased afterwards when ignore_case is enabled). Both the training template and the utterance go through identical normalisation, so contractions match consistently regardless of which apostrophe variant the user typed or the speech-to-text (STT) engine produced.

from nebulento.bracket_expansion import normalize_utterance

normalize_utterance("I don't want it")
# 'I don t want it'

normalize_utterance("it’s fine")   # RIGHT SINGLE QUOTATION MARK
# 'it s fine'
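The consistency claim can be checked with a small translation table built from the eight code points in the table above. This is a sketch under the assumption that a character-for-character replacement is all that happens; `drop_apostrophes_sketch` and `normalize_sketch` are hypothetical names, not the library's private helpers.

```python
# Code points taken from the table above, written as escapes to avoid
# any ambiguity between look-alike glyphs.
APOSTROPHES = "\u0027\u2019\u2018\u02bc\u02b9\u0060\u00b4\uff07"
_TABLE = str.maketrans({ch: " " for ch in APOSTROPHES})


def drop_apostrophes_sketch(text: str) -> str:
    """Replace each apostrophe-like character with one ASCII space."""
    return text.translate(_TABLE)


def normalize_sketch(text: str) -> str:
    """Apostrophe replacement followed by whitespace collapsing."""
    return " ".join(drop_apostrophes_sketch(text).split())


# The same contraction written with every variant normalises to one form.
variants = ["don" + ch + "t stop" for ch in APOSTROPHES]
normalised = {normalize_sketch(v) for v in variants}
print(normalised)
```

A template stored from one variant therefore compares equal, after normalisation, to an utterance typed with any other.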

Case Handling

nebulento/container.py:61-69 — IntentContainer._norm(text)

_norm applies normalize_utterance and then optionally lowercases:

def _norm(self, text: str) -> str:
    text = normalize_utterance(text)
    if self.ignore_case:
        text = text.lower()
    return text

ignore_case=True (the default) means all comparisons are case-insensitive. Template strings are stored lowercase; utterances are lowercased before comparison.

To preserve case sensitivity (e.g. for code or acronym matching):

from nebulento.container import IntentContainer

container = IntentContainer(ignore_case=False)
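The effect of the flag can be illustrated without the library. In this sketch, `norm_sketch` is a hypothetical free function mirroring the shape of `_norm`, and its first line is a simplified stand-in for normalize_utterance that handles only the ASCII apostrophe.

```python
def norm_sketch(text: str, ignore_case: bool = True) -> str:
    """Mirror of the _norm logic shown above, using a simplified normaliser."""
    # Simplified stand-in for normalize_utterance (ASCII apostrophe only).
    text = " ".join(text.replace("'", " ").split())
    if ignore_case:
        text = text.lower()
    return text


utterance = "Turn ON the TV"
template = "turn on the tv"

# Default: both sides are lowercased, so they compare equal.
same_when_ignoring = norm_sketch(utterance) == norm_sketch(template)

# Case-sensitive: the original casing survives, so "ON" and "on" differ.
same_when_sensitive = (
    norm_sketch(utterance, ignore_case=False)
    == norm_sketch(template, ignore_case=False)
)
print(same_when_ignoring, same_when_sensitive)
```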

Whitespace Collapsing

nebulento/bracket_expansion.py:28-30 — _normalize_whitespace(text)

Any run of one or more whitespace characters (spaces, tabs, newlines) is collapsed to a single ASCII space, and leading/trailing whitespace is stripped.

from nebulento.bracket_expansion import _normalize_whitespace

_normalize_whitespace("  hello   world\n")
# 'hello world'

This happens after apostrophe replacement, so the space introduced in place of an apostrophe is itself subject to collapse:

normalize_utterance("it's  great")
# Step 1: "it s  great"  (apostrophe → space)
# Step 2: "it s great"   (double space collapsed)
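The ordering matters, and a short experiment shows why. The sketch below assumes a regex-based collapse and handles only the ASCII apostrophe for brevity; `normalize_whitespace_sketch` and `replace_apostrophe_sketch` are hypothetical stand-ins for the private helpers.

```python
import re


def normalize_whitespace_sketch(text: str) -> str:
    """Collapse any whitespace run to one space and strip the ends."""
    return re.sub(r"\s+", " ", text).strip()


def replace_apostrophe_sketch(text: str) -> str:
    """ASCII-only stand-in for the apostrophe step."""
    return text.replace("'", " ")


phrase = "rock 'n' roll"

# Documented order: apostrophes first, then whitespace.
right_order = normalize_whitespace_sketch(replace_apostrophe_sketch(phrase))

# Reversed order: the spaces created by the apostrophe step are never collapsed.
wrong_order = replace_apostrophe_sketch(normalize_whitespace_sketch(phrase))

print(repr(right_order))
print(repr(wrong_order))
```

Running the whitespace step last guarantees that the output never contains a double space, whatever the apostrophe step produced.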
