From 9c278232a272d1872c7607dbca315c45149c2ae8 Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Wed, 29 Apr 2026 11:32:09 -0500 Subject: [PATCH 1/5] docs: clarify hidden attribute behavior, frame as platform feature MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Hidden attributes (names starting with `_`) were primarily designed for platform operations — DataJoint itself uses them for `_job_start_time`, `_job_duration`, `_job_version` on Computed/Imported tables and for the `_singleton` implementation detail. Some functionality is intentionally exposed to users (notably: a unique index can reference a hidden column, making `_params_hash`-style derived columns useful), but the feature is not intended as a general column-hiding tool. Reframe section 3.4 around that intent, and replace the previous behavior table with a verified one drawn from the actual code paths: - Distinguishes platform-managed (auto-injected) from user-defined. - Documents the exact filter point (Heading.attributes) and lists every user-facing surface that consumes it: fetch, proj, joins, dict vs. string restrictions, insert/update1, repr, describe. - Calls out that fetch1("_name")/proj("_name") explicitly *is* allowed, matching the test_hidden_job_metadata.py spec. - Adds a round-trip caveat for describe(): platform-managed hidden columns regenerate fine because they're re-injected on declare, but user-defined hidden columns (like _params_hash) are silently dropped from describe() output. - Adds guidance on when to declare a hidden attribute vs. a regular one. Aligns with #1433 (which made user-defined hidden attributes parsable in the first place). 
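Illustration of the filter point described above. This is a minimal toy model, not DataJoint source: `Attribute`, `ToyHeading`, and `restrict_by_dict` are hypothetical names, and the only assumption carried over from the message is that a single filtered view (`attributes`) feeds every user-facing surface while `_attributes` keeps the full list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str

    @property
    def is_hidden(self) -> bool:
        # Hidden means "name starts with an underscore".
        return self.name.startswith("_")


class ToyHeading:
    """Toy stand-in for a table heading; not DataJoint's Heading class."""

    def __init__(self, names):
        # Full internal view, hidden columns included.
        self._attributes = {n: Attribute(n) for n in names}

    @property
    def attributes(self):
        # The single filter point: visible attributes only.
        return {n: a for n, a in self._attributes.items() if not a.is_hidden}

    @property
    def names(self):
        return list(self.attributes)


def restrict_by_dict(heading, rows, condition):
    # Keys absent from the visible heading are dropped before matching,
    # so a hidden key in a dict restriction filters nothing.
    usable = {k: v for k, v in condition.items() if k in heading.names}
    return [r for r in rows if all(r[k] == v for k, v in usable.items())]


heading = ToyHeading(["session_id", "result", "_job_start_time"])
rows = [
    {"session_id": 1, "result": 0.5, "_job_start_time": "2024-01-01"},
    {"session_id": 2, "result": 0.7, "_job_start_time": "2024-02-01"},
]
# The hidden key is ignored, so both rows survive the restriction.
kept = restrict_by_dict(heading, rows, {"_job_start_time": "2024-01-01"})
```

In this sketch a raw-SQL string restriction would bypass `names` entirely, which is why string restrictions still see hidden columns.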
--- src/reference/specs/table-declaration.md | 73 +++++++++++++++--------- 1 file changed, 45 insertions(+), 28 deletions(-) diff --git a/src/reference/specs/table-declaration.md b/src/reference/specs/table-declaration.md index 19596e2f..0b0fa30f 100644 --- a/src/reference/specs/table-declaration.md +++ b/src/reference/specs/table-declaration.md @@ -158,54 +158,71 @@ attribute_name [= default_value] : type [# comment] ### 3.4 Hidden Attributes -Attributes with names starting with underscore (`_`) are **hidden**: +Attributes with names starting with an underscore (`_`) are **hidden**. The hidden-attribute mechanism was designed for platform operations — bookkeeping columns DataJoint itself adds to support the data pipeline — and is filtered out of normal user-facing query results. Some hidden-attribute functionality is exposed to users as well, but the feature is not intended as a general column-hiding tool. + +**Platform-managed hidden attributes** are added automatically when DataJoint declares certain table types. Users do not write these in the definition: + +| Hidden attribute | Added to | Purpose | +|------------------|----------|---------| +| `_job_start_time` | `Computed`, `Imported` | Wall-clock start of the populate call | +| `_job_duration` | `Computed`, `Imported` | Elapsed seconds for the populate call | +| `_job_version` | `Computed`, `Imported` | Library version that produced the row | +| `_singleton` | Singleton tables | Implementation detail of the singleton pattern | + +**User-defined hidden attributes.** A definition may also declare hidden attributes directly. 
The most common use case is storing a derived value (for example, a hash of a JSON column) that backs a unique index but should not appear in query results: ```python -definition = """ -session_id : int32 ---- -result : float64 -_job_start_time : datetime(3) # hidden -_job_duration : float32 # hidden -""" +@schema +class TaskParams(dj.Manual): + definition = """ + task_id : int + --- + tool : varchar(32) + params : json + _params_hash : varchar(32) + unique index (tool, _params_hash) + """ ``` -**Behavior:** +**Behavior.** Hidden attributes are filtered out of nearly every user-facing surface. The filter is implemented in `Heading.attributes`, which all visible code paths consume; raw SQL strings bypass it. -| Context | Hidden Attributes | +| Context | Hidden attributes | |---------|-------------------| -| `heading.attributes` | Excluded | -| `heading._attributes` | Included | -| Default table display | Excluded | -| `to_dicts()` / `to_pandas()` | Excluded unless explicitly projected | -| Join matching (namesakes) | Excluded | -| Dict restrictions | Excluded (silently ignored) | -| String restrictions | Included (passed to SQL) | +| `heading.attributes`, `heading.names`, `heading.primary_key` | Excluded | +| `heading._attributes` (internal) | Included | +| Table display / `repr` / `_repr_html_` | Excluded | +| `fetch()`, `fetch1()`, `to_dicts()`, `to_pandas()` (default) | Excluded | +| `fetch("_name")` / `fetch1("_name")` (explicit) | Included | +| `proj("_name")` (explicit) | Included | +| Natural-join namesake matching | Excluded | +| Dict restriction `Table & {"_name": value}` | Silently ignored | +| String restriction `Table & "_name = ..."` | Included (passes to SQL) | +| `insert()`, `insert1()`, `update1()` | Rejected — key not in heading | +| `describe()` / reverse-engineered definition | **Excluded** — see caveat below | +| `unique index (..., _name)` | Allowed | + +**Round-trip caveat.** `describe()` walks `heading.attributes`, so it omits hidden 
attributes from the regenerated definition. For platform-managed hidden columns this is harmless: re-declaring a `Computed` or `Imported` table re-injects `_job_*` automatically. For *user-defined* hidden columns (such as `_params_hash` above), the regenerated definition is incomplete — re-applying it would create a table without the hidden column. Treat `describe()` output as a starting point for review, not as a faithful round-trip when user-defined hidden columns are present. **Accessing hidden attributes:** ```python -# Visible attributes only (default) +# Default fetch — hidden columns excluded results = MyTable.to_dicts() -# Explicitly include hidden attributes +# Explicit projection promotes a hidden column to visible results = MyTable.proj('result', '_job_start_time').to_dicts() -# Or with fetch1 for single row +# Explicit fetch by name returns hidden columns row = (MyTable & key).fetch1('result', '_job_start_time') -# String restriction works with hidden attributes +# String restriction works (passes through to SQL) MyTable & "_job_start_time > '2024-01-01'" -# Dict restriction IGNORES hidden attributes -MyTable & {'_job_start_time': some_date} # no effect +# Dict restriction is silently dropped — does NOT filter +MyTable & {'_job_start_time': some_date} # ⚠ ignored ``` -**Use cases:** - -- Job metadata (`_job_start_time`, `_job_duration`, `_job_version`) -- Internal tracking fields -- Attributes that should not participate in automatic joins +**When to declare a hidden attribute.** Reach for the `_` prefix when the column is part of the table's schema-level contract — needed for an index, a constraint, or platform bookkeeping — but should not appear in default fetches, joins, or displays. If you simply want a column that users *usually* don't see but might want to query with a dict, prefer a regular attribute and use `proj()` to control visibility at the call site. 
### 3.5 Examples From 4a0753090a3b614af7829c8cfa45befa62a23e2f Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Wed, 29 Apr 2026 11:32:37 -0500 Subject: [PATCH 2/5] docs: use int32 core type instead of native int in hidden-attribute example --- src/reference/specs/table-declaration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/reference/specs/table-declaration.md b/src/reference/specs/table-declaration.md index 0b0fa30f..b45b5528 100644 --- a/src/reference/specs/table-declaration.md +++ b/src/reference/specs/table-declaration.md @@ -175,7 +175,7 @@ Attributes with names starting with an underscore (`_`) are **hidden**. The hidd @schema class TaskParams(dj.Manual): definition = """ - task_id : int + task_id : int32 --- tool : varchar(32) params : json From d2c0360ebeb1402b26e3f63873ac20922349e9d6 Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Wed, 29 Apr 2026 11:54:45 -0500 Subject: [PATCH 3/5] docs: add write caveat for hidden attributes (insert/update1/raw-SQL) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Expand §3.4 with a write caveat covering the three observed behaviors: 1. update1 raises "Attribute '_name' not found" — heading.names is filtered (heading.py:232). 2. insert raises "Field '_name' not in table heading" — Heading.__iter__ walks the filtered view (heading.py:367). 3. insert(..., ignore_extra_fields=True) silently *drops* the hidden key without writing it. Less obvious than the loud error and easy to miss. Also note that platform-managed hidden columns (_job_start_time, etc.) are populated by DataJoint internals via raw SQL during populate() (autopopulate.py:786), not via insert/update1. There is no public-API path to write to a hidden column today; users with a declared hidden column must reach for connection.query() or compute the value inside an auto_populate step. Tracks the write side of the gap that #1441 leaves open. 
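The three write behaviors can be sketched with a toy model. This is not DataJoint code: `ToyTable` and `ToyError` are hypothetical, and the error strings are copied from the message above only to make the mapping obvious. The assumption is that validation checks incoming keys against the filtered list of names.

```python
class ToyError(Exception):
    """Stand-in for DataJointError in this sketch."""


class ToyTable:
    """Toy model of the write path; not DataJoint code."""

    def __init__(self, key, columns):
        self.key = key
        self._columns = list(columns)  # full list, hidden columns included
        # Filtered view that the write methods validate against.
        self.names = [c for c in self._columns if not c.startswith("_")]
        self.rows = {}

    def insert1(self, row, ignore_extra_fields=False):
        for field in row:
            if field not in self.names and not ignore_extra_fields:
                raise ToyError(f"Field '{field}' not in table heading")
        # Fields outside the visible heading never reach storage.
        self.rows[row[self.key]] = {k: v for k, v in row.items() if k in self.names}

    def update1(self, row):
        for field in row:
            if field not in self.names:
                raise ToyError(f"Attribute '{field}' not found.")
        self.rows[row[self.key]].update(row)


table = ToyTable("task_id", ["task_id", "tool", "_params_hash"])
# Behavior 3: the hidden key is dropped without any error.
table.insert1(
    {"task_id": 1, "tool": "t", "_params_hash": "abc"},
    ignore_extra_fields=True,
)
print(table.rows)  # {1: {'task_id': 1, 'tool': 't'}}
```

The silent case is the one worth testing for in application code, since nothing signals that the value was never written.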
--- src/reference/specs/table-declaration.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/src/reference/specs/table-declaration.md b/src/reference/specs/table-declaration.md index b45b5528..6e096cf3 100644 --- a/src/reference/specs/table-declaration.md +++ b/src/reference/specs/table-declaration.md @@ -197,10 +197,13 @@ class TaskParams(dj.Manual): | Natural-join namesake matching | Excluded | | Dict restriction `Table & {"_name": value}` | Silently ignored | | String restriction `Table & "_name = ..."` | Included (passes to SQL) | -| `insert()`, `insert1()`, `update1()` | Rejected — key not in heading | -| `describe()` / reverse-engineered definition | **Excluded** — see caveat below | +| `insert()`, `insert1()`, `update1()` | Rejected — see write caveat below | +| `insert(..., ignore_extra_fields=True)` | Silently dropped (key not written) | +| `describe()` / reverse-engineered definition | **Excluded** — see round-trip caveat below | | `unique index (..., _name)` | Allowed | +**Write caveat.** Neither `insert`/`insert1` nor `update1` accepts hidden attributes through the public API. `update1` raises `DataJointError: Attribute '_name' not found.` `insert` raises `Field '_name' not in table heading` — unless `ignore_extra_fields=True` is passed, in which case the hidden key is *silently dropped* and never written. There is currently no public-API path to populate a user-defined hidden column. Platform-managed hidden columns (the `_job_*` group) are populated by DataJoint internals via raw SQL during the `populate()` lifecycle (see `autopopulate.py`), not via the user-facing `insert`/`update1` methods. If you declare a user-defined hidden column today and need to populate it, you must do so via `connection.query()` with a raw `INSERT` or `UPDATE`, or compute it from a non-hidden column inside an `auto_populate` step. 
+ **Round-trip caveat.** `describe()` walks `heading.attributes`, so it omits hidden attributes from the regenerated definition. For platform-managed hidden columns this is harmless: re-declaring a `Computed` or `Imported` table re-injects `_job_*` automatically. For *user-defined* hidden columns (such as `_params_hash` above), the regenerated definition is incomplete — re-applying it would create a table without the hidden column. Treat `describe()` output as a starting point for review, not as a faithful round-trip when user-defined hidden columns are present. **Accessing hidden attributes:** From cacff6371294e54a9ede0a7818ec2d67d0e6a404 Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Wed, 29 Apr 2026 11:57:09 -0500 Subject: [PATCH 4/5] =?UTF-8?q?docs:=20tighten=20hidden-attribute=20guidan?= =?UTF-8?q?ce=20=E2=80=94=20high=20bar,=20app-code=20heuristic?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous "when to declare hidden" paragraph allowed too much: backing an index was treated as sufficient reason to hide. It isn't. The clean heuristic is: if application code touches the column (computes it, inserts it, queries on it, wants it in describe() output), it should be a regular attribute. Hidden is for platform/implementation concerns the application code never references — _job_* populated by autopopulate internals, _singleton's implementation pattern, or fields that would actively interfere with natural-join semantics. Use the params_hash-with-unique-index case as a concrete example of when NOT to hide: even though it backs an index, the application code computes and inserts the hash, so it should be regular and let proj() handle visibility at the call site if needed. 
--- src/reference/specs/table-declaration.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/reference/specs/table-declaration.md b/src/reference/specs/table-declaration.md index 6e096cf3..d3b16e6b 100644 --- a/src/reference/specs/table-declaration.md +++ b/src/reference/specs/table-declaration.md @@ -225,7 +225,9 @@ MyTable & "_job_start_time > '2024-01-01'" MyTable & {'_job_start_time': some_date} # ⚠ ignored ``` -**When to declare a hidden attribute.** Reach for the `_` prefix when the column is part of the table's schema-level contract — needed for an index, a constraint, or platform bookkeeping — but should not appear in default fetches, joins, or displays. If you simply want a column that users *usually* don't see but might want to query with a dict, prefer a regular attribute and use `proj()` to control visibility at the call site. +**When to declare a hidden attribute.** The bar is high. Reach for the `_` prefix only when the column is purely a platform/implementation concern that application code never reads, writes, or references — for example, `_job_start_time` (populated by `populate()` lifecycle internals), `_singleton` (an implementation detail of the singleton pattern), or a field whose values would actively interfere with natural-join semantics if visible. + +If your application code computes the column, inserts it, queries on it, or wants to see it in `describe()` output, **declare it as a regular attribute** even when you don't want it featured prominently. Backing a unique index, on its own, is not a sufficient reason to hide a column — for example, a `params_hash` column that backs `unique index (tool, params_hash)` should be a regular attribute because the application code is the one computing and inserting the hash. Hiding it forfeits `insert1`, dict restrictions, and `describe()` round-trip without buying anything you couldn't get from `proj()` at the call site for visibility control. 
### 3.5 Examples From 832c95f42639bcfa56369da2a1921c651426cbe3 Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Wed, 29 Apr 2026 12:03:19 -0500 Subject: [PATCH 5/5] docs: hidden attributes are platform-only; users should not declare them MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Updated to reflect the design decision in datajoint/datajoint-python#1441: the parser keeps rejecting leading-underscore attribute names and now returns a clear DataJointError instead of a cryptic ParseException. Reframe §3.4 around the platform-managed-only intent: - Lead paragraph states up-front that user-defined hidden attributes are not supported, and shows the new error message users will see. - Drop the "User-defined hidden attributes" subsection and the _params_hash hidden example. - Keep the platform-attributes table and the behavior matrix — both are still useful for users encountering platform-managed hidden columns (_job_start_time, etc.) in fetch results, joins, and describe output. - Add an explanation paragraph ("Why users can't declare them") covering the no-write-path / no-round-trip / silent-filter rationale. - Replace the user-defined example with a regular-attribute example (params_hash backing a unique index), demonstrating the recommended pattern: declare as a regular attribute, use proj() at the call site for visibility control. --- src/reference/specs/table-declaration.md | 62 ++++++++++++++---------- 1 file changed, 37 insertions(+), 25 deletions(-) diff --git a/src/reference/specs/table-declaration.md b/src/reference/specs/table-declaration.md index d3b16e6b..1eb55187 100644 --- a/src/reference/specs/table-declaration.md +++ b/src/reference/specs/table-declaration.md @@ -158,9 +158,16 @@ attribute_name [= default_value] : type [# comment] ### 3.4 Hidden Attributes -Attributes with names starting with an underscore (`_`) are **hidden**. 
The hidden-attribute mechanism was designed for platform operations — bookkeeping columns DataJoint itself adds to support the data pipeline — and is filtered out of normal user-facing query results. Some hidden-attribute functionality is exposed to users as well, but the feature is not intended as a general column-hiding tool. +Attributes with names starting with an underscore (`_`) are **hidden**. The hidden-attribute mechanism is reserved for **platform-managed** columns — bookkeeping that DataJoint itself adds to support the data pipeline — and is intentionally not exposed for user-defined attributes. Attempting to declare an attribute name with a leading underscore raises: -**Platform-managed hidden attributes** are added automatically when DataJoint declares certain table types. Users do not write these in the definition: +```text +DataJointError: Attribute name in line "_hidden: bool" starts with an underscore. +Names with leading underscore are reserved for platform-managed columns +(e.g. _job_start_time, _singleton). Use a regular attribute name; if you +need to control visibility at the call site, use proj(). +``` + +**Platform-managed hidden attributes** are added automatically when DataJoint declares certain table types. Users do not write these in the definition; the framework injects them programmatically after parsing. | Hidden attribute | Added to | Purpose | |------------------|----------|---------| @@ -169,22 +176,9 @@ Attributes with names starting with an underscore (`_`) are **hidden**. The hidd | `_job_version` | `Computed`, `Imported` | Library version that produced the row | | `_singleton` | Singleton tables | Implementation detail of the singleton pattern | -**User-defined hidden attributes.** A definition may also declare hidden attributes directly. 
The most common use case is storing a derived value (for example, a hash of a JSON column) that backs a unique index but should not appear in query results: - -```python -@schema -class TaskParams(dj.Manual): - definition = """ - task_id : int32 - --- - tool : varchar(32) - params : json - _params_hash : varchar(32) - unique index (tool, _params_hash) - """ -``` +These columns are populated by DataJoint internals via raw SQL during the `populate()` lifecycle, not via `insert`/`update1`. They are filtered out of default public API surfaces (fetches, joins, displays) so they don't clutter results; explicit access by name still works. -**Behavior.** Hidden attributes are filtered out of nearly every user-facing surface. The filter is implemented in `Heading.attributes`, which all visible code paths consume; raw SQL strings bypass it. +**Behavior.** The filter is implemented in `Heading.attributes`, which all visible code paths consume; raw SQL strings bypass it. | Context | Hidden attributes | |---------|-------------------| @@ -197,16 +191,14 @@ class TaskParams(dj.Manual): | Natural-join namesake matching | Excluded | | Dict restriction `Table & {"_name": value}` | Silently ignored | | String restriction `Table & "_name = ..."` | Included (passes to SQL) | -| `insert()`, `insert1()`, `update1()` | Rejected — see write caveat below | +| `insert()`, `insert1()`, `update1()` | Rejected (`Field not in table heading` / `Attribute not found`) | | `insert(..., ignore_extra_fields=True)` | Silently dropped (key not written) | -| `describe()` / reverse-engineered definition | **Excluded** — see round-trip caveat below | +| `describe()` / reverse-engineered definition | Excluded | | `unique index (..., _name)` | Allowed | -**Write caveat.** Neither `insert`/`insert1` nor `update1` accepts hidden attributes through the public API. 
`update1` raises `DataJointError: Attribute '_name' not found.` `insert` raises `Field '_name' not in table heading` — unless `ignore_extra_fields=True` is passed, in which case the hidden key is *silently dropped* and never written. There is currently no public-API path to populate a user-defined hidden column. Platform-managed hidden columns (the `_job_*` group) are populated by DataJoint internals via raw SQL during the `populate()` lifecycle (see `autopopulate.py`), not via the user-facing `insert`/`update1` methods. If you declare a user-defined hidden column today and need to populate it, you must do so via `connection.query()` with a raw `INSERT` or `UPDATE`, or compute it from a non-hidden column inside an `auto_populate` step. - -**Round-trip caveat.** `describe()` walks `heading.attributes`, so it omits hidden attributes from the regenerated definition. For platform-managed hidden columns this is harmless: re-declaring a `Computed` or `Imported` table re-injects `_job_*` automatically. For *user-defined* hidden columns (such as `_params_hash` above), the regenerated definition is incomplete — re-applying it would create a table without the hidden column. Treat `describe()` output as a starting point for review, not as a faithful round-trip when user-defined hidden columns are present. +**Why users can't declare them.** Allowing user-defined hidden attributes would expose a feature with no public-API write path (`insert`/`update1` reject the keys; `ignore_extra_fields=True` drops them silently), no `describe()` round-trip (the regenerated definition would be missing the column), and silent filtering on dict restrictions. The cases where users typically reach for hidden attributes, most commonly an index-backing derived column, are better served by a regular attribute. 
-**Accessing hidden attributes:** +**Inspecting platform-managed hidden columns:** ```python # Default fetch — hidden columns excluded @@ -225,9 +217,29 @@ MyTable & "_job_start_time > '2024-01-01'" MyTable & {'_job_start_time': some_date} # ⚠ ignored ``` -**When to declare a hidden attribute.** The bar is high. Reach for the `_` prefix only when the column is purely a platform/implementation concern that application code never reads, writes, or references — for example, `_job_start_time` (populated by `populate()` lifecycle internals), `_singleton` (an implementation detail of the singleton pattern), or a field whose values would actively interfere with natural-join semantics if visible. +**Use a regular attribute instead.** When you want a column that's part of the schema-level contract (backing an index, storing a derived value, etc.) but isn't featured in default displays, declare it as a regular attribute and use `proj()` at the call site if you want to omit it from a particular query result. For example, a hash column backing a unique index: -If your application code computes the column, inserts it, queries on it, or wants to see it in `describe()` output, **declare it as a regular attribute** even when you don't want it featured prominently. Backing a unique index, on its own, is not a sufficient reason to hide a column — for example, a `params_hash` column that backs `unique index (tool, params_hash)` should be a regular attribute because the application code is the one computing and inserting the hash. Hiding it forfeits `insert1`, dict restrictions, and `describe()` round-trip without buying anything you couldn't get from `proj()` at the call site for visibility control. 
+```python +@schema +class TaskParams(dj.Manual): + definition = """ + task_id : int32 + --- + tool : varchar(32) + params : json + params_hash : varchar(32) + unique index (tool, params_hash) + """ + +# Inserts work directly: +TaskParams.insert1({'task_id': 1, 'tool': 't', 'params': {...}, 'params_hash': h}) + +# Dict restrictions work: +TaskParams & {'params_hash': h} + +# Hide from a specific result set with proj() if needed: +TaskParams.proj('tool', 'params').fetch() +``` ### 3.5 Examples