This guide walks you through adding a new small area estimation (SAE) method to the catalogue without touching any engine code. All you need is one new TypeScript file plus a short registration in `src/catalogue/index.ts`.
Every SAE method lives in its own file under `src/catalogue/<method-id>.ts`. The recommender
and code-generation engines read the catalogue automatically, so the only other file you need
to edit is `src/catalogue/index.ts`, as described below.
The general workflow is:
- Copy an existing catalogue file.
- Fill in every field.
- Write R and Stata templates with `{{PLACEHOLDER}}` tokens.
- Register the new entry in `src/catalogue/index.ts`.
- Run `npm test`; the schema tests will catch any missing or invalid fields.
- Open a pull request.
Pick a short, lower-case kebab-case identifier, e.g. calinski-eblup or spatio-temporal-fh.
Avoid spaces and special characters. This ID is used in URLs and as an internal key.
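
The naming rule can be checked mechanically. The sketch below is one possible check, not something this guide says exists in the codebase; the name `isValidMethodId` and the exact pattern (lower-case letters or digits in groups separated by single hyphens) are assumptions.

```typescript
// Hypothetical helper: checks that a method ID is lower-case kebab-case.
// Assumed rule: groups of lower-case letters or digits, separated by single hyphens.
const METHOD_ID_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/

function isValidMethodId(id: string): boolean {
  return METHOD_ID_PATTERN.test(id)
}

console.log(isValidMethodId('spatio-temporal-fh')) // true
console.log(isValidMethodId('Spatio Temporal FH')) // false
```
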
Copy a file that is similar to your new method. For an area-level method, start from
src/catalogue/fh-eblup.ts. For a unit-level method, start from src/catalogue/bhf-eblup.ts.

```bash
cp src/catalogue/fh-eblup.ts src/catalogue/my-new-method.ts
```

Open `src/catalogue/my-new-method.ts` and edit each field. The full schema is defined in
`src/types/index.ts`. Here is a description of every field:

| Field | Type | Description |
|---|---|---|
| `id` | `string` | Unique kebab-case identifier, e.g. `'my-new-method'` |
| `displayName` | `string` | Short human-readable name shown in the UI |
| `level` | `'area' \| 'unit' \| 'model-assisted'` | Whether the model operates on area aggregates or unit microdata |
| `inferenceType` | `'frequentist' \| 'bayesian' \| 'design-based'` | Statistical paradigm |

| Field | Type | Description |
|---|---|---|
| `targetTypes` | `DataAvailability['targetType'][]` | Variable types this method supports: `'continuous'`, `'binary'`, `'proportion'`, `'count'`, `'poverty'`, `'unknown'` |
| `requiredInputs.microdata` | `boolean` | Does the method need unit-level survey records? |
| `requiredInputs.areaAggregates` | `boolean` | Does the method need pre-computed direct estimates and their variances? |
| `requiredInputs.censusAuxiliaries` | `'unit' \| 'area' \| 'either' \| 'none'` | What level of auxiliary data is needed |
| `requiredInputs.weights` | `boolean` | Are sampling weights required? |
| `requiredInputs.contiguityMatrix` | `boolean` | Is a spatial adjacency or weight matrix needed? |
| `requiredInputs.coordinates` | `boolean` | Are geographic coordinates needed? |

| Field | Type | Description |
|---|---|---|
| `spatial` | `boolean` | Does the method explicitly model spatial dependence? |
| `robust` | `boolean` | Is the method robust to outliers (M-estimation or similar)? |
| `requiresAuxiliaryVariances` | `boolean` (optional) | Set `true` for measurement-error methods that need the sampling variances of the auxiliary estimates (e.g. `fh-me`). Treated as `false` when absent. When `true`, the recommender only offers the method if the user has declared sample-based auxiliaries, and it expects the variance columns to be supplied. |
| `mseMethod` | `'prasad-rao' \| 'bootstrap' \| 'both' \| 'posterior' \| 'jackknife'` | How mean squared error is estimated. Use `'jackknife'` for the measurement-error Fay–Herriot model, whose MSE is estimated by jackknife only. |
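
The gating rule described for `requiresAuxiliaryVariances` can be pictured as a small predicate. This is only an illustration of the rule as stated in the table, not the recommender's actual code; the types `MethodFlags` and `UserData`, the field `auxiliariesAreSampleBased`, and the function `isOffered` are all hypothetical names.

```typescript
// Hypothetical, simplified shapes; the real types live in src/types/index.ts.
interface MethodFlags {
  id: string
  requiresAuxiliaryVariances?: boolean
}

interface UserData {
  // Assumed field name: true when the auxiliaries are themselves survey estimates.
  auxiliariesAreSampleBased: boolean
}

function isOffered(method: MethodFlags, data: UserData): boolean {
  // An absent flag is treated as false, so such methods are never filtered out here.
  const needsVariances = method.requiresAuxiliaryVariances ?? false
  return !needsVariances || data.auxiliariesAreSampleBased
}

console.log(isOffered({ id: 'fh-eblup' }, { auxiliariesAreSampleBased: false })) // true
console.log(
  isOffered({ id: 'fh-me', requiresAuxiliaryVariances: true }, { auxiliariesAreSampleBased: false }),
) // false
```
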

| Field | Type | Description |
|---|---|---|
| `rPackage` | `string` | Primary R package name, e.g. `'sae'` |
| `rFunction` | `string` | Main function(s) used, e.g. `'eblupFH / mseFH'` |
| `stataPackage` | `string` | Stata user-written package name, or `'base'` for built-in commands |
| `stataCommand` | `string` | Main Stata command(s) used |
| `stataMinVersion` | `number` | Minimum Stata version required (≥ 14). Use `14` if the method works on Stata 14. |
| `stataV14Fallback` | `string \| null` | If `stataMinVersion > 14`, provide a `.do` template using `mixed` / `meglm` that runs on Stata 14. Otherwise `null`. |

| Field | Type | Description |
|---|---|---|
| `plainDescription` | `string` | 2–3 sentences in plain English with no statistical jargon |
| `whyChooseThis` | `string` | When should a user pick this method? Shown in the "Why this method?" panel |
| `assumptions` | `string[]` | List of model assumptions surfaced before code generation. At least one is required. |
| `caveats` | `string[]` (optional) | Any extra warnings (computational cost, edge cases, etc.) |
| `references` | `string[]` | Full citations, each including a URL where possible. At least one is required. |

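
Taken together, the field tables imply a shape roughly like the one below. This is a reconstruction from the descriptions in this guide, not a copy of `src/types/index.ts`: `TargetType` stands in for `DataAvailability['targetType']`, and the names `RequiredInputs` and `CatalogueEntrySketch` are assumptions. It also lists the `rTemplate` and `stataTemplate` string fields that every entry carries.

```typescript
// Reconstructed from the field tables; the real definition lives in src/types/index.ts.
type TargetType = 'continuous' | 'binary' | 'proportion' | 'count' | 'poverty' | 'unknown'

interface RequiredInputs {
  microdata: boolean
  areaAggregates: boolean
  censusAuxiliaries: 'unit' | 'area' | 'either' | 'none'
  weights: boolean
  contiguityMatrix: boolean
  coordinates: boolean
}

interface CatalogueEntrySketch {
  id: string
  displayName: string
  level: 'area' | 'unit' | 'model-assisted'
  inferenceType: 'frequentist' | 'bayesian' | 'design-based'
  targetTypes: TargetType[]
  requiredInputs: RequiredInputs
  spatial: boolean
  robust: boolean
  requiresAuxiliaryVariances?: boolean
  mseMethod: 'prasad-rao' | 'bootstrap' | 'both' | 'posterior' | 'jackknife'
  rPackage: string
  rFunction: string
  stataPackage: string
  stataCommand: string
  stataMinVersion: number
  stataV14Fallback: string | null
  plainDescription: string
  whyChooseThis: string
  assumptions: string[]
  caveats?: string[]
  references: string[]
  rTemplate: string
  stataTemplate: string
}

// The only fields the tables mark as optional.
const optionalFields: (keyof CatalogueEntrySketch)[] = ['requiresAuxiliaryVariances', 'caveats']
```
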
The rTemplate and stataTemplate fields contain complete, runnable scripts as template
literal strings. Use {{PLACEHOLDER}} tokens (uppercase, underscores) for values that will
be substituted from user input.
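
To make the substitution step concrete, here is a minimal sketch of how `{{PLACEHOLDER}}` tokens could be filled in. The real code-generation engine may work differently; `fillTemplate` and `TokenValues` are hypothetical names, and leaving unknown tokens in place is a design choice of this sketch, not documented behaviour.

```typescript
// Hypothetical substitution helper; the real engine's function may differ.
type TokenValues = Record<string, string>

function fillTemplate(template: string, values: TokenValues): string {
  // Matches {{NAME}} where NAME is uppercase letters, digits, or underscores.
  return template.replace(/\{\{([A-Z0-9_]+)\}\}/g, (match: string, name: string) => {
    const value = values[name]
    // Leave unknown tokens untouched so they are easy to spot in the output.
    return value === undefined ? match : value
  })
}

const script = fillTemplate('fit <- lm({{TARGET_VAR}} ~ {{AUX_VARS_R}})', {
  TARGET_VAR: 'income',
  AUX_VARS_R: 'age + hh_size',
})
console.log(script) // fit <- lm(income ~ age + hh_size)
```
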
Standard tokens (used by most methods):

| Token | Substituted with |
|---|---|
| `{{DATE}}` | Generation date |
| `{{TARGET_VAR}}` | Target variable name |
| `{{AREA_ID}}` | Small area identifier variable |
| `{{WEIGHT_VAR}}` | Sampling weight variable |
| `{{AUX_VARS_R}}` | Auxiliary variables as `var1 + var2 + ...` (R formula syntax) |
| `{{AUX_VARS_STATA}}` | Auxiliary variables as `var1 var2 ...` (space-separated) |
| `{{AUX_VARS_R_VEC}}` | Auxiliary variables as `c("var1", "var2", ...)` (R vector) |
| `{{AUX_VAR_VARIANCES_R}}` | Auxiliary sampling-variance column names as a quoted R character vector, e.g. `"var_x1", "var_x2"` (measurement-error methods) |
| `{{CI_ARRAY_BUILDER_R}}` | Generated R code that assembles the per-domain measurement-error variance–covariance array `Ci` from the variance columns (used by `fh-me`) |
| `{{SURVEY_DATA}}` | Path to the survey CSV file |
| `{{AREA_DATA}}` | Path to the area-level CSV file |
| `{{CENSUS_DATA}}` | Path to the census CSV file |
| `{{DIRECT_EST_VAR}}` | Pre-computed direct estimate column |
| `{{DIRECT_VAR_VAR}}` | Sampling variance column |
| `{{N_SIMULATIONS}}` | Number of bootstrap replications |

You may define additional tokens, but keep names descriptive and consistent with the style above.
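
The three auxiliary-variable tokens in the table differ only in how the same list of names is joined. The formatters below illustrate that; they are hypothetical helpers written for this guide, not functions taken from the engine.

```typescript
// Hypothetical formatters for the three auxiliary-variable tokens.

// {{AUX_VARS_R}}: R formula syntax, e.g. age + hh_size
function auxVarsR(vars: string[]): string {
  return vars.join(' + ')
}

// {{AUX_VARS_STATA}}: space-separated Stata varlist, e.g. age hh_size
function auxVarsStata(vars: string[]): string {
  return vars.join(' ')
}

// {{AUX_VARS_R_VEC}}: R character vector, e.g. c("age", "hh_size")
function auxVarsRVec(vars: string[]): string {
  const quoted = vars.map((name) => '"' + name + '"')
  return 'c(' + quoted.join(', ') + ')'
}

const aux = ['age', 'hh_size']
console.log(auxVarsR(aux)) // age + hh_size
console.log(auxVarsStata(aux)) // age hh_size
console.log(auxVarsRVec(aux)) // c("age", "hh_size")
```
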
Open `src/catalogue/index.ts` and add an import and an entry for your new method:

```typescript
import myNewMethod from './my-new-method.js'

export const catalogue: CatalogueEntry[] = [
  // … existing entries …
  myNewMethod,
]
```

Place the entry in a logical position (e.g., near related methods).
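
Because each `id` must be unique, it can be worth confirming after registration that no two entries share one. The check below is a sketch only; `findDuplicateIds` is a hypothetical helper, and this guide does not say whether the existing test suite already covers this case.

```typescript
// Hypothetical check: returns every ID that appears more than once.
function findDuplicateIds(entries: { id: string }[]): string[] {
  const ids = entries.map((entry) => entry.id)
  // Keep second and later occurrences, then collapse repeats of the same ID.
  const repeats = ids.filter((id, index) => ids.indexOf(id) !== index)
  return repeats.filter((id, index) => repeats.indexOf(id) === index)
}

const sample = [{ id: 'fh-eblup' }, { id: 'bhf-eblup' }, { id: 'fh-eblup' }]
console.log(findDuplicateIds(sample)) // [ 'fh-eblup' ]
```
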

```bash
npm test
```

The catalogue schema test (`src/catalogue/catalogue.test.ts`) verifies:
- All required fields are present and non-empty.
- `stataMinVersion` is ≥ 14.
- If `stataMinVersion > 14`, `stataV14Fallback` is non-null.
- `rTemplate` and `stataTemplate` each contain at least one `{{` token.
- `references` and `assumptions` each have at least one entry.

Fix any failures before proceeding.
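
For orientation, the listed checks amount to something like the function below. It is a sketch of the rules as described here, not the contents of `catalogue.test.ts`; the interface `CheckedFields` and the function `schemaProblems` are hypothetical, and the general "required fields are non-empty" check is left out for brevity.

```typescript
// Hypothetical subset of the entry shape, limited to the fields checked below.
interface CheckedFields {
  stataMinVersion: number
  stataV14Fallback: string | null
  rTemplate: string
  stataTemplate: string
  references: string[]
  assumptions: string[]
}

// Returns a list of human-readable problems; an empty list means all checks passed.
function schemaProblems(entry: CheckedFields): string[] {
  const problems: string[] = []
  if (entry.stataMinVersion < 14) problems.push('stataMinVersion must be >= 14')
  if (entry.stataMinVersion > 14 && entry.stataV14Fallback === null) {
    problems.push('stataV14Fallback is required when stataMinVersion > 14')
  }
  if (entry.rTemplate.indexOf('{{') === -1) problems.push('rTemplate has no {{ token')
  if (entry.stataTemplate.indexOf('{{') === -1) problems.push('stataTemplate has no {{ token')
  if (entry.references.length === 0) problems.push('references is empty')
  if (entry.assumptions.length === 0) problems.push('assumptions is empty')
  return problems
}

const draft: CheckedFields = {
  stataMinVersion: 16,
  stataV14Fallback: null,
  rTemplate: '# Generated on {{DATE}}',
  stataTemplate: '* Generated on {{DATE}}',
  references: ['Example citation'],
  assumptions: [],
}
console.log(schemaProblems(draft).length) // 2
```
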
Push your branch and open a PR against main. In the PR description, include:
- The method name and ID.
- A short summary of what it does and when it should be recommended.
- The key references you used.
- Confirmation that `npm run build`, `npm test`, and `npm run lint` all pass.
Suppose you want to add a spatio-temporal extension of the Fay–Herriot model.
File: src/catalogue/spatio-temporal-fh.ts

```typescript
import type { CatalogueEntry } from '../types/index.js'

const entry: CatalogueEntry = {
  id: 'spatio-temporal-fh',
  displayName: 'Spatio-Temporal Fay–Herriot (Area-Level)',
  level: 'area',
  inferenceType: 'frequentist',
  targetTypes: ['continuous', 'proportion'],
  requiredInputs: {
    microdata: false,
    areaAggregates: true,
    censusAuxiliaries: 'area',
    weights: false,
    contiguityMatrix: true,
    coordinates: false,
  },
  spatial: true,
  robust: false,
  mseMethod: 'bootstrap',
  rPackage: 'sae',
  rFunction: 'eblupSTFH',
  stataPackage: 'none',
  stataCommand: 'N/A (use R)',
  stataMinVersion: 14,
  stataV14Fallback: null,
  plainDescription:
    'Extends the Fay–Herriot model to share strength across both space and time. ' +
    'Borrows information from neighbouring areas and from the same area in previous ' +
    'rounds. Requires area-level estimates for at least two time points.',
  whyChooseThis:
    'Choose this when you have area-level data for multiple survey rounds and a spatial ' +
    'adjacency matrix. It typically produces smaller mean squared errors than a ' +
    'cross-sectional FH model.',
  assumptions: [
    'Sampling variances of the direct estimates are known for all areas and periods.',
    'The spatial and temporal correlation structure is correctly specified.',
    'Area random effects are normally distributed.',
  ],
  references: [
    'Marhuenda, Y., Molina, I. & Morales, D. (2013). Computational Statistics & Data Analysis 58, 308–325. https://doi.org/10.1016/j.csda.2012.09.002',
    'sae package: CRAN. https://cran.r-project.org/package=sae',
  ],
  rTemplate: `# ============================================================
# Spatio-Temporal Fay–Herriot EBLUP (Area-Level)
# Generated by SAE Syntax Generator on {{DATE}}
# Reference: Marhuenda et al. (2013)
# R package: sae
# Area-level data: {{AREA_DATA}}
# ============================================================
if (!requireNamespace("sae", quietly = TRUE)) install.packages("sae")
library(sae)

area_data <- read.csv("{{AREA_DATA}}")
# Required columns: {{DIRECT_EST_VAR}}, {{DIRECT_VAR_VAR}}, {{AUX_VARS_R}},
#   {{AREA_ID}}, time (integer period index)
# Rows must be sorted by area and, within each area, by time.

# Load the spatial proximity matrix (rows/columns ordered as areas in area_data).
# Uncomment and point to your own file:
# W <- as.matrix(read.csv("proximity_matrix.csv", row.names = 1))

result <- eblupSTFH(
  formula = {{DIRECT_EST_VAR}} ~ {{AUX_VARS_R}},
  D       = nrow(W),                          # number of areas
  T       = length(unique(area_data$time)),   # number of time periods
  vardir  = {{DIRECT_VAR_VAR}},               # column name, looked up in data
  proxmat = W,
  data    = area_data
)
print(result$eblup)
`,
  stataTemplate: `* ============================================================
* Spatio-Temporal Fay–Herriot: not available in Stata
* Generated by SAE Syntax Generator on {{DATE}}
* Use the generated R script with the sae package.
* ============================================================
* This method has no Stata implementation.
* Please switch to R and use the sae package.
`,
}

export default entry
```

Register it in `src/catalogue/index.ts`, run `npm test`, and open a PR.
- Keep `plainDescription` jargon-free. Imagine explaining it to a government statistician who is a survey expert but not an SAE specialist.
- Always set `stataV14Fallback` when `stataMinVersion > 14`. A `null` value with `stataMinVersion > 14` will fail the schema test.
- If the method has no Stata implementation, set `stataMinVersion: 14` and write a Stata template that clearly says "Use R" rather than leaving the field blank.
- Use realistic variable names in the template comments (e.g. `income`, `area_id`); they help users orient themselves before filling in their own names.