Profanity filtering primitives for composable text moderation.
Add the GitHub Packages registry for the @textfilters scope, for example in your project .npmrc:

```ini
@textfilters:registry=https://npm.pkg.github.com
```

Install the packages once npm is authenticated to GitHub Packages. GitHub Packages requires authentication for npm installs, including public packages.

```sh
npm install @textfilters/core @textfilters/profanity
```

```ts
import { createProfanityFilter, filter } from "@textfilters/profanity";

// Shared default instance.
const safeText = filter.censor("message text");
const hasProfanity = filter.check("message text");
const matches = filter.analyze("message text");

// Isolated instance with its own strict and loose terms.
const tenantFilter = createProfanityFilter(["strict-term"], ["loose-term"]);
const tenantSafeText = tenantFilter.censor("message text");
```

The default shared instance is exported as filter and uses the built-in strict
and loose term lists. It is mutable through setStrict, setLoose,
addStrict, and addLoose, so changes affect later calls that use the same
shared instance.
Use createProfanityFilter(...) when per-request, per-tenant, or test-local
dictionaries must be isolated from the shared mutable filter.
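The difference between the shared instance and a factory-created one is a general state-isolation pattern. The following minimal, self-contained sketch illustrates it. createTermFilter, TermFilter, and the substring-based check are hypothetical stand-ins. They are not the package's implementation or its method signatures.

```typescript
// Hypothetical stand-in for a filter: a mutable set of literal strict terms.
interface TermFilter {
  addStrict(terms: string[]): void;
  check(text: string): boolean;
}

// Factory: each call closes over its own Set, so instances never share state.
function createTermFilter(initial: string[] = []): TermFilter {
  const terms = new Set(initial);
  return {
    addStrict(extra) {
      for (const term of extra) terms.add(term);
    },
    // Substring check for brevity; the real package uses token-aware matching.
    check(text) {
      return Array.from(terms).some((term) => text.includes(term));
    },
  };
}

// Module-level shared instance, analogous to the exported `filter`:
// every importer that mutates it affects every other importer.
const shared = createTermFilter();
```

A test can create its own instance with createTermFilter() and add terms to it freely, and shared remains untouched.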
analyze(text, options?) returns accepted match ranges as UTF-16 offsets into the original input. Each
range is an array-like [start, end] value with a mode and optional rule
metadata:

```ts
const matches = filter.analyze("blocked text");

for (const match of matches) {
  console.log(match[0], match[1], match.mode);
  console.log(match.ruleId, match.category, match.severity);
}
```

ruleId, category, and severity are present when the matched rule has
taxonomy metadata. Built-in Russian dictionary rules include semantic rule ids
and taxonomy metadata. Runtime string terms remain unclassified and omit those
fields unless callers provide structured runtime rules with metadata.
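Because ranges are UTF-16 offsets, a consumer can recover the matched source text with String.prototype.slice, which counts in the same unit. The following is a minimal sketch with hypothetical local types that mirror the documented shape. The "loose" mode value is an assumption, inferred from the mode names used elsewhere in this README.

```typescript
// Hypothetical local types mirroring the documented shape: a [start, end]
// pair of UTF-16 offsets with a mode and optional taxonomy fields attached.
type Mode = "strict" | "loose";

type MatchRange = [start: number, end: number] & {
  mode: Mode;
  ruleId?: string;
  category?: string;
  severity?: string;
};

// slice() indexes UTF-16 code units, the same unit the offsets use.
function matchedSnippets(text: string, ranges: readonly MatchRange[]): string[] {
  return ranges.map(([start, end]) => text.slice(start, end));
}

// Built the same way as the example output shown later in this README.
const example: MatchRange = Object.assign([0, 3] as [number, number], {
  mode: "strict" as const,
  category: "STRONG_INSULT",
  severity: "medium",
});
```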
Taxonomy options can narrow matches to rules with specific metadata:

```ts
const vulgarMatches = filter.analyze("blocked text", {
  categories: ["VULGAR"],
});

const highSeverityMatches = filter.analyze("blocked text", {
  severities: ["high"],
});

const mediumOrHigherMatches = filter.analyze("blocked text", {
  minSeverity: "medium",
});

const hasHighSeverityMatch = filter.check("blocked text", {
  severities: ["high"],
});

const censoredVulgarText = filter.censor("blocked text", {
  categories: ["VULGAR"],
  minSeverity: "low",
});
```

Severity thresholds use this package-defined order:
soft < low < medium < high. minSeverity matches rules whose severity is
equal to or stronger than the requested threshold, and applies only to
rules backed by taxonomy metadata. When both severities and minSeverity are
provided, a match's severity must be in the exact severity set and also meet
the threshold. When categories is combined with severity filters, a match must
satisfy every requested taxonomy filter.

Taxonomy filters only match rules that carry the requested metadata. Omitting taxonomy options preserves the default matching behavior.
The taxonomy filtering contract is:

- categories, severities, and minSeverity are exposed on ProfanityMatchOptions.
- Calls without taxonomy options keep the same default analyze(), check(), and censor() behavior.
- Taxonomy filters exclude metadata-less string-backed matches.
- categories combined with severities is an intersection.
- categories combined with minSeverity is an intersection.
- severities combined with minSeverity is the intersection between the exact severity set and the threshold.
- The severity order is soft < low < medium < high.
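The contract above can be expressed as a single predicate over a match's metadata. The following is a minimal, self-contained illustration of the documented rules, not the package's implementation. passesTaxonomy, MatchMetadata, TaxonomyOptions, and SEVERITY_RANK are hypothetical names, and categories are typed as plain strings because the full category list is not documented here.

```typescript
type Severity = "soft" | "low" | "medium" | "high";

// Hypothetical metadata view of one match; string-backed matches have neither field.
interface MatchMetadata {
  category?: string;
  severity?: Severity;
}

// Hypothetical mirror of the documented option names.
interface TaxonomyOptions {
  categories?: string[];
  severities?: Severity[];
  minSeverity?: Severity;
}

// Documented order: soft < low < medium < high.
const SEVERITY_RANK: Record<Severity, number> = { soft: 0, low: 1, medium: 2, high: 3 };

function passesTaxonomy(match: MatchMetadata, options: TaxonomyOptions = {}): boolean {
  const { categories, severities, minSeverity } = options;
  // No taxonomy options: default behavior, every match is kept.
  if (!categories && !severities && minSeverity === undefined) return true;
  // Each present option is a required condition, so combinations intersect.
  // Missing metadata can never satisfy a requested filter.
  if (categories && (match.category === undefined || !categories.includes(match.category))) {
    return false;
  }
  if (severities && (match.severity === undefined || !severities.includes(match.severity))) {
    return false;
  }
  if (
    minSeverity !== undefined &&
    (match.severity === undefined || SEVERITY_RANK[match.severity] < SEVERITY_RANK[minSeverity])
  ) {
    return false;
  }
  return true;
}
```

This sketch treats an omitted option as "no constraint" and a present option as a required condition, so every combination behaves as an intersection.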
For taxonomy-backed rules, runtime match output includes the available metadata:

```ts
const strict = createProfanityFilter(
  [{ source: "абв", category: "STRONG_INSULT", severity: "medium" }],
  [],
);

strict.analyze("абв ok");
// [Object.assign([0, 3], {
//   mode: "strict",
//   category: "STRONG_INSULT",
//   severity: "medium",
// })]
```

censor(text, options?) returns a censored copy of text. Matching is performed on a normalized
same-length copy of the input, and mask ranges are applied back to the original
UTF-16 string. Taxonomy options censor only matching metadata-backed ranges.
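Applying ranges back to the original string preserves its length when each masked UTF-16 code unit is replaced with exactly one mask code unit. This includes both halves of a surrogate pair. The following minimal sketch shows that idea. applyMasks is hypothetical, and the "*" mask character is an assumption because this README does not state which character the package uses.

```typescript
// [start, end) UTF-16 offsets into the original string.
type Range = readonly [start: number, end: number];

// Replace every code unit inside each range with `mask`.
// Assumes `mask` is a single UTF-16 code unit; otherwise length would change.
function applyMasks(text: string, ranges: Range[], mask = "*"): string {
  const units = text.split(""); // split("") yields UTF-16 code units
  for (const [start, end] of ranges) {
    for (let i = Math.max(0, start); i < Math.min(end, units.length); i++) {
      units[i] = mask;
    }
  }
  return units.join("");
}
```

Masking an astral symbol such as an emoji therefore produces two mask characters, which matches the length-preservation guarantee listed later in this README.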
check(text, options?) returns true when the current filter instance would censor at least one range.
Use it when a boolean moderation decision is enough and the masked text is not
needed. Taxonomy options apply the same match narrowing as analyze().
createProfanityFilter() creates a new mutable filter instance. Without arguments it uses compiled views of the built-in Russian dictionary. The first argument sets the strict side and the second sets the loose side; passing an array replaces that side with runtime dictionary terms:

```ts
const strictOnly = createProfanityFilter(["blocked"], []);
const looseOnly = createProfanityFilter([], ["banned"]);
const builtIn = createProfanityFilter();
```

All filter instances expose a stable name: "profanity" property plus the check, censor,
analyze, setStrict, setLoose, addStrict, and addLoose methods.
The package exports a minimal language dictionary API for callers that need an isolated filter built from a maintained language dictionary:

```ts
import {
  createProfanityFilterFromDictionary,
  russianProfanityDictionary,
  validateProfanityLanguageDictionary,
  type ProfanityLanguageDictionary,
} from "@textfilters/profanity";

const dictionary: ProfanityLanguageDictionary = russianProfanityDictionary;

// Validate first so an invalid dictionary never gets compiled into a filter.
const issues = validateProfanityLanguageDictionary(dictionary);
if (issues.length > 0) {
  throw new Error(JSON.stringify(issues, null, 2));
}

const russianFilter = createProfanityFilterFromDictionary(dictionary);
russianFilter.analyze("message text");
```

createProfanityFilterFromDictionary(dictionary) compiles strict and loose
views from the dictionary and returns a mutable ProfanityFilter instance. The
instance is isolated from the shared filter export, so later calls to
setStrict, setLoose, addStrict, or addLoose affect only that instance.
Dictionary-backed matches preserve semantic rule ids, categories, and
severities in analyze() output, and taxonomy filters apply to those metadata
fields. Runtime dictionary terms remain normalized literals; language
dictionaries are the supported boundary for maintained language-specific rule
data. This release intentionally keeps the public surface small and does not
add new languages or separate packages.
validateProfanityLanguageDictionary(dictionary) checks the source dictionary
contract and returns stable issues with path, code, and message fields.
Valid dictionaries return []; ordinary validation errors are reported as
issues instead of thrown exceptions. The validator does not judge moderation
quality, false-positive behavior, language coverage, taxonomy choices, or
whether a rule should exist.
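Reporting issues as data rather than throwing lets callers collect every problem in a single pass. The following minimal sketch implements only the source_not_trimmed check shown in the CLI output below. It uses the documented path, code, and message shape. SourceDictionary, SourceRule, and validateSources are hypothetical, and the real source dictionary has more fields and checks than this.

```typescript
// Hypothetical minimal source-dictionary shape for this sketch only.
interface SourceRule {
  source: string;
}
interface SourceDictionary {
  rules: SourceRule[];
}

// Issue shape documented for the validator.
interface ValidationIssue {
  path: string;
  code: string;
  message: string;
}

// Collects issues instead of throwing; a valid dictionary yields [].
function validateSources(dictionary: SourceDictionary): ValidationIssue[] {
  const issues: ValidationIssue[] = [];
  dictionary.rules.forEach((rule, index) => {
    if (rule.source !== rule.source.trim()) {
      issues.push({
        path: `rules[${index}].source`,
        code: "source_not_trimmed",
        message: "Rule source must not include leading or trailing whitespace.",
      });
    }
  });
  return issues;
}
```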
The package also includes a small CLI for validating a JSON source dictionary:

```sh
profanity-validate-language-dictionary path/to/profanity.json
```

The command exits 0 for valid dictionaries, 1 when validation issues are
found, and 2 for usage, file read, or JSON parse errors. Validation issue
output includes the same stable path, code, and message fields as the
programmatic validator.

Text output is the default:

```text
Dictionary validation failed:
- rules[0].source source_not_trimmed: Rule source must not include leading or trailing whitespace.
```
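The documented exit codes and the default text format can be modeled in a few lines. This helps when wrapping the CLI or reproducing its output in an authoring tool. The following is a sketch only. The Outcome type, exitCodeFor, and formatFailureText are hypothetical. Only the exit-code meanings and the failure text format come from this README, and the text for a successful run is not documented, so the sketch does not model it.

```typescript
// Issue shape shared by the CLI and the programmatic validator.
interface Issue {
  path: string;
  code: string;
  message: string;
}

// Hypothetical outcome type; the CLI's internals are not documented.
type Outcome =
  | { kind: "checked"; issues: Issue[] }
  | { kind: "error"; reason: "usage" | "read" | "parse" };

// Documented exit codes: 0 valid, 1 validation issues, 2 usage/read/parse errors.
function exitCodeFor(outcome: Outcome): 0 | 1 | 2 {
  if (outcome.kind === "error") return 2;
  return outcome.issues.length === 0 ? 0 : 1;
}

// Reproduces the documented default text output for a failed validation.
function formatFailureText(issues: Issue[]): string {
  const lines = issues.map((issue) => `- ${issue.path} ${issue.code}: ${issue.message}`);
  return ["Dictionary validation failed:", ...lines].join("\n");
}
```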
Machine-readable JSON output is available for CI and authoring tools:

```sh
profanity-validate-language-dictionary --format json --pretty path/to/profanity.json
```

The JSON report always includes ok, file, issueCount, issues, and
summary. Validation failures exit 1 and print the report to stdout with
stable issue objects:

```json
{
  "ok": false,
  "file": "path/to/profanity.json",
  "issueCount": 1,
  "issues": [
    {
      "path": "rules[0].source",
      "code": "source_not_trimmed",
      "message": "Rule source must not include leading or trailing whitespace."
    }
  ],
  "summary": {
    "status": "invalid",
    "message": "Dictionary validation failed with 1 issue."
  }
}
```

For future external language pack guidance, see the language pack authoring guide. It covers source dictionary shape, stable ids, taxonomy metadata, strict and loose views, human-maintained JSON, and conformance expectations. The external language pack policy defines when the project is ready to create a real external package and keeps the built-in Russian dictionary in this package for now.
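A CI step can consume the JSON report shown above by typing only the fields documented as always present. The following minimal sketch turns a report into one annotation line per issue. JsonReport and annotations are hypothetical names, and summary.status is typed as a plain string because only the "invalid" value is documented.

```typescript
interface Issue {
  path: string;
  code: string;
  message: string;
}

// Fields documented as always present in the JSON report.
interface JsonReport {
  ok: boolean;
  file: string;
  issueCount: number;
  issues: Issue[];
  summary: { status: string; message: string };
}

// Parse CLI stdout and produce one annotation line per issue.
function annotations(stdout: string): string[] {
  const report = JSON.parse(stdout) as JsonReport;
  if (report.ok) return [];
  return report.issues.map((issue) => `${report.file}: ${issue.path} [${issue.code}] ${issue.message}`);
}
```

Validation failures exit 1 but still print the report, so the CI step should read stdout before it treats the non-zero exit as fatal.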
The package also exports type-only taxonomy metadata names for callers that need to type local metadata alongside profanity filtering code:

```ts
import { filter } from "@textfilters/profanity";
import type {
  ProfanityCategory,
  ProfanityMatchRange,
  ProfanitySeverity,
  ProfanityTaxonomyMetadata,
} from "@textfilters/profanity";

const ranges: ProfanityMatchRange[] = filter.analyze("message text");
const category: ProfanityCategory = "VULGAR";
const severity: ProfanitySeverity = "high";
const metadata: ProfanityTaxonomyMetadata = {
  category,
  severity,
};
```

filter.analyze() exposes taxonomy metadata on match ranges when the matched
rule carries it. Taxonomy options are optional, so check() results,
censor() output, and mutable dictionary methods keep their existing behavior
when those options are omitted.
| Mode | Runtime term example | Matches | Does not match |
|---|---|---|---|
| Strict | bad | bad as a full normalized token | badminton, _bad, -bad |
| Loose | bad | bad, b-a-d, b a d | prefixes inside words |
Strict matching is token-oriented. Loose matching allows separators between letters, then still applies token-boundary checks before masking.
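The following is a deliberately simplified sketch of the two modes, assuming already-normalized input. The boundary character set is an assumption, chosen so that the table's examples (_bad, -bad, badminton) are rejected. So is the separator set of whitespace and hyphens. strictRanges and looseRanges are hypothetical names. The package's real normalization and boundary rules are richer than this.

```typescript
type Range = [start: number, end: number];

// Assumed boundary rule: an edge is the string start/end or any character
// that is not a hyphen, underscore, letter, or digit.
const WORDISH = /[-_\p{L}\p{N}]/u;

function isEdge(text: string, index: number): boolean {
  return index < 0 || index >= text.length || !WORDISH.test(text[index]);
}

// Runtime terms are literals, so regex syntax characters are escaped.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Strict: the literal term must occupy a whole token.
function strictRanges(text: string, term: string): Range[] {
  const out: Range[] = [];
  if (term === "") return out;
  for (let i = text.indexOf(term); i !== -1; i = text.indexOf(term, i + 1)) {
    const end = i + term.length;
    if (isEdge(text, i - 1) && isEdge(text, end)) out.push([i, end]);
  }
  return out;
}

// Loose: whitespace or hyphens may separate letters, then the same edge check applies.
function looseRanges(text: string, term: string): Range[] {
  const out: Range[] = [];
  if (term === "") return out;
  const pattern = new RegExp(Array.from(term, escapeRegExp).join("(?:\\s|-)*"), "gu");
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    const start = m.index;
    const end = start + m[0].length;
    if (isEdge(text, start - 1) && isEdge(text, end)) out.push([start, end]);
  }
  return out;
}
```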
Runtime dictionary terms are normalized literals, not regular expressions. A
term such as foo|bar matches the literal text foo|bar, not foo or bar.
Escaped punctuation from older literal spellings is accepted, so foo\\.bar
matches the literal text foo.bar.
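One way to honor both rules is to strip a backslash that precedes punctuation and then search with plain string comparison instead of new RegExp(term). The following sketch is one plausible reading of "escaped punctuation is accepted", not the package's normalizer. unescapeLegacyTerm and containsLiteral are hypothetical, and containsLiteral checks substrings only, with no token boundaries.

```typescript
// Drop a backslash that precedes a character that is not a letter, digit, or whitespace.
function unescapeLegacyTerm(term: string): string {
  return term.replace(/\\([^\p{L}\p{N}\s])/gu, "$1");
}

// Literal containment: "|" and "." carry no regex meaning here.
function containsLiteral(text: string, term: string): boolean {
  return text.includes(unescapeLegacyTerm(term));
}
```

With this approach, the term foo|bar never matches plain foo, and the legacy spelling foo\.bar matches only the literal foo.bar.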
The built-in Russian dictionary is different: package-owned data may use controlled internal rules to represent existing behavior compactly. The JSON dictionary is the human-maintained source of truth; strict and loose entries are compiled matcher views, not serialized matcher output. That internal rule syntax is not part of the public API and is not applied to runtime dictionaries.
Built-in internal rules can also carry compact, meaningful compiler metadata, such as loose stretch matching for repeated word-like atoms. Language-specific roots, aliases, guards, morphology, taxonomy, loose behavior, and false-positive protections belong in the Russian dictionary profile; generated rule ids and matcher ordering are owned by the generic compilation layer.
Generated built-in rule ids are diagnostic metadata, not stable policy or allowlist keys. They may change when the package-owned corpus is reorganized into different compiled matcher views.
- Censored output preserves JavaScript string length, including astral code points.
- Ranges are UTF-16 offsets into the original source string.
- Runtime dictionaries do not support caller-provided regular expressions.
- Runtime string terms do not receive taxonomy metadata.
- The shared filter instance is mutable; use createProfanityFilter() for isolated state.
- Built-in corpus behavior is intentionally locked by compatibility tests.
This package keeps the built-in corpus behavior covered by compatibility tests.
Intentional public-package changes:
- Runtime dictionary terms are treated as normalized literals, not arbitrary regular expressions.
- Built-in package-owned rules use an internal rule compiler that is not exposed to callers.
- The filter exposes stable name: "profanity".
- The filter exposes analyze(text): ProfanityMatchRange[] for accepted match ranges and optional taxonomy metadata.
- The filter exposes check(text): boolean for boolean-only detection.
- createProfanityFilter() without arguments creates an instance with compiled views of the built-in Russian dictionary.
- Masking preserves JavaScript string length for astral code points.
See the architecture guide for the matching pipeline, Mermaid diagrams, and the rationale behind the strict separation between runtime literals and internal corpus rules.
See the invariants guide for a short maintenance checklist covering normalization, source ranges, boundaries, loose matching, false-positive locks, and hyphen-tail behavior.
Releases are managed by Release Please from Conventional Commit history on main. When a Release Please release is created, the workflow runs npm run check and publishes the package to GitHub Packages. Release tags keep the v* pattern.
The package is prepared for publication to GitHub Packages, not the public npm registry.
See CONTRIBUTING.md for pull request scope guidance.
License: MIT