feat: error log admin diagnostics + multi-role dean login fix #397

Merged
y4nder merged 1 commit into develop from feat/error-log-admin-diagnostics on May 12, 2026

@y4nder commented on May 12, 2026

Summary

Two related changes bundled together — the multi-role dean login 500 was the catalyst for the diagnostic page, so both ship together.

  • Error log admin diagnostics: a new error_log table; a global exception filter that captures unhandled 5xx errors with sanitized request bodies and persists them asynchronously via BullMQ; admin /error-logs endpoints; and a 90-day cleanup cron. Captured errors surface on the admin.faculytics errors page, so the "Internal server error" wall in staging is no longer opaque.
  • Multi-role dean login fix (ucmn-t-67092): guards against a null moodleCategory after populate in resolveInstitutionalRoles. Single-role deans never reach the (DEAN ∧ CHAIRPERSON) cleanup branch, so only the user holding both roles tripped the TypeError.
  • Migration: also adds the missing analysis_pipeline_trigger_check CHECK constraint (verified safe on staging: 0 violating rows, no name collision). It intentionally does not drop uq_analysis_pipeline_active_scope, the FAC-132 partial unique index that the MikroORM CLI can't model. See the new gotcha in src/entities/CLAUDE.md.

Frontend pairing

The admin diagnostics page itself lives in the companion admin.faculytics PR.

Test plan

  • Unit suite green: npm run test — 1148 passing
  • Lint clean: npm run lint (0 errors in changed files; pre-existing warnings unrelated)
  • Boots locally: npm run start:dev registers SystemErrorsModule and mounts ErrorLogController {/api/error-logs}
  • Migration applies on a fresh DB (auto-applied on dev boot)
  • Apply migration to staging: npx mikro-orm migration:up — verified non-destructive (creates error_log + adds analysis_pipeline_trigger_check constraint only)
  • Trigger a 5xx on staging (e.g. log in as ucmn-t-67092 BEFORE merging this fix) and confirm it appears on the admin errors page
  • Log in as ucmn-t-67092 AFTER merging; login should succeed (defensive guards in place)
  • Verify the 90-day cleanup cron fires nightly at 04:00 UTC (check the logs the following day)

🤖 Generated with Claude Code

Two related changes bundled together — the dean login 500 was the
catalyst for the diagnostic page, so both ship in the same PR.

## Error log admin diagnostics

Captures unhandled 5xx exceptions so they're visible on the admin
console instead of vanishing into a generic 500 response.

- new `error_log` table (entity + migration) — never soft-deleted,
  indexed on status_code / path / user_id / error_name /
  acknowledged_at / occurred_at for filter performance
- `SystemErrorsModule` (`src/modules/system-errors/`) mirrors
  `AuditModule` shape: `ErrorLogService.Emit()` async-enqueues via
  BullMQ, `ErrorLogProcessor` persists, `ErrorLogQueryService` handles
  paginated reads + acknowledge flow, `ErrorLogController` exposes
  `/error-logs` endpoints behind `@UseJwtGuard(SUPER_ADMIN)`
- `ErrorCaptureFilter` global exception filter: catches everything,
  only persists ≥500, redacts sensitive keys
  (`password`/`token`/`refreshToken`/`authorization` etc.) and caps
  payload depth/breadth/string length, then defers to
  `BaseExceptionFilter` so the wire response stays byte-identical to the current behaviour
- `ErrorLogCleanupJob` daily at 04:00 UTC — 90-day retention via
  `nativeDelete` on `occurred_at`
- 12 new unit tests covering sanitizer behaviour and filter capture
  paths (4xx skipped, 5xx captured, non-http hosts skipped, capture
  failures swallowed without re-throwing)
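
`ErrorLogQueryService` filters on the indexed columns listed above. The following is a hedged sketch of how query parameters might map onto a MikroORM-style `where` object plus `limit`/`offset` pagination; the `ErrorLogFilters` DTO, its property names, and the page-size defaults are all hypothetical, not taken from the PR.

```typescript
// Hypothetical query parameters for GET /error-logs; names are assumptions.
interface ErrorLogFilters {
  statusCode?: number;
  path?: string;
  userId?: string;
  errorName?: string;
  acknowledged?: boolean;
  page?: number; // 1-based
  pageSize?: number;
}

interface ErrorLogQuery {
  where: Record<string, unknown>;
  orderBy: { occurredAt: "desc" };
  limit: number;
  offset: number;
}

const DEFAULT_PAGE_SIZE = 20; // assumed default
const MAX_PAGE_SIZE = 100; // assumed cap

function buildErrorLogQuery(f: ErrorLogFilters): ErrorLogQuery {
  const where: Record<string, unknown> = {};
  // Each filter maps onto one of the indexed error_log columns.
  if (f.statusCode !== undefined) where.statusCode = f.statusCode;
  if (f.path) where.path = f.path;
  if (f.userId) where.userId = f.userId;
  if (f.errorName) where.errorName = f.errorName;
  if (f.acknowledged !== undefined) {
    // Unacknowledged rows have acknowledgedAt = NULL.
    where.acknowledgedAt = f.acknowledged ? { $ne: null } : null;
  }
  const pageSize = Math.min(Math.max(f.pageSize ?? DEFAULT_PAGE_SIZE, 1), MAX_PAGE_SIZE);
  const page = Math.max(f.page ?? 1, 1);
  return {
    where,
    orderBy: { occurredAt: "desc" }, // newest first, served by the occurred_at index
    limit: pageSize,
    offset: (page - 1) * pageSize,
  };
}
```

The resulting pieces line up with MikroORM's `em.findAndCount(entity, where, { orderBy, limit, offset })`, which returns both the page and the total count a paginated admin table needs.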
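
The sanitizer described in the `ErrorCaptureFilter` bullet can be sketched as a pure recursive function. This is a minimal illustration, not the PR's implementation: the function name `sanitize`, the limit values, and the `[REDACTED]` / `[truncated]` / `[max depth]` markers are assumptions; only the redacted key names come from the description above.

```typescript
// Key names mirror the PR description; matching is case-insensitive so an
// `Authorization` header is caught too. The limit values are assumptions.
const SENSITIVE_KEYS = new Set(["password", "token", "refreshtoken", "authorization"]);
const MAX_DEPTH = 4; // nesting levels kept before collapsing
const MAX_KEYS = 50; // object keys / array items kept per level
const MAX_STRING = 1000; // characters kept per string value

function sanitize(value: unknown, depth = 0): unknown {
  if (typeof value === "string") {
    return value.length > MAX_STRING ? `${value.slice(0, MAX_STRING)}[truncated]` : value;
  }
  if (value === null || typeof value !== "object") return value; // numbers, booleans, etc.
  // The depth cap also bounds recursion on circular references.
  if (depth >= MAX_DEPTH) return "[max depth]";
  if (Array.isArray(value)) {
    return value.slice(0, MAX_KEYS).map((item) => sanitize(item, depth + 1));
  }
  const out: Record<string, unknown> = {};
  for (const [key, child] of Object.entries(value).slice(0, MAX_KEYS)) {
    out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : sanitize(child, depth + 1);
  }
  return out;
}
```

For example, `sanitize({ email: "a@b.c", password: "x", nested: { refreshToken: "y" } })` keeps `email` and replaces both credential fields with `"[REDACTED]"`.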
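
The capture rules for `ErrorCaptureFilter` (HTTP hosts only, status ≥500, capture failures swallowed) reduce to a small gate function. The sketch uses hypothetical stand-ins (`HostLike`, `ErrorEmitter`, `ErrorEntry`, `captureIfServerError`) instead of the real Nest and `ErrorLogService` types so that it runs on its own; in the actual filter, a call like this would presumably be followed by `super.catch(exception, host)` on `BaseExceptionFilter`.

```typescript
// Minimal stand-ins for Nest's ArgumentsHost and the ErrorLogService.Emit()
// contract; the ErrorEntry field names are assumptions, not the PR's schema.
interface HostLike {
  getType(): string;
}
interface ErrorEntry {
  statusCode: number;
  errorName: string;
  message: string;
  occurredAt: Date;
}
interface ErrorEmitter {
  emit(entry: ErrorEntry): Promise<void>;
}

// HttpException-style errors expose getStatus(); anything else is an unhandled 500.
function resolveStatus(exception: unknown): number {
  const candidate = exception as { getStatus?: () => number } | null;
  if (candidate && typeof candidate.getStatus === "function") {
    return candidate.getStatus();
  }
  return 500;
}

// Returns true when a capture was attempted. Never throws: the caller still
// has to produce the normal error response afterwards.
function captureIfServerError(exception: unknown, host: HostLike, emitter: ErrorEmitter): boolean {
  if (host.getType() !== "http") return false; // skip rpc / ws contexts
  const statusCode = resolveStatus(exception);
  if (statusCode < 500) return false; // 4xx are expected client errors
  const entry: ErrorEntry = {
    statusCode,
    errorName: exception instanceof Error ? exception.name : "UnknownError",
    message: exception instanceof Error ? exception.message : String(exception),
    occurredAt: new Date(),
  };
  try {
    // Fire-and-forget: swallow async enqueue failures (e.g. Redis unavailable)...
    emitter.emit(entry).catch(() => undefined);
  } catch {
    // ...and synchronous ones, so capture can never replace the original error.
  }
  return true;
}
```

Not awaiting the enqueue keeps the 500 response independent of queue health, which fits the PR's choice of handing persistence to a BullMQ processor.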
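
The 90-day retention rule amounts to computing a cutoff and deleting rows whose `occurred_at` is older. In the sketch below, `retentionCutoff`, `isExpired`, and `cleanupWhere` are hypothetical helpers; `{ $lt: ... }` is MikroORM's filter operator syntax as accepted by `em.nativeDelete(ErrorLog, where)`, but the `occurredAt` property name and the decorator wiring are assumptions.

```typescript
const RETENTION_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000; // exact in UTC, which has no DST shifts

// 04:00 every day; the job would also need a UTC time zone, e.g.
// @Cron(CLEANUP_CRON, { timeZone: "UTC" }) with @nestjs/schedule (assumed wiring).
const CLEANUP_CRON = "0 4 * * *";

// Rows that occurred strictly before this instant are past retention.
function retentionCutoff(now: Date, days: number = RETENTION_DAYS): Date {
  return new Date(now.getTime() - days * MS_PER_DAY);
}

function isExpired(occurredAt: Date, now: Date): boolean {
  return occurredAt.getTime() < retentionCutoff(now).getTime();
}

// Where-clause for em.nativeDelete(ErrorLog, where); `occurredAt` is an assumed property name.
function cleanupWhere(now: Date): { occurredAt: { $lt: Date } } {
  return { occurredAt: { $lt: retentionCutoff(now) } };
}
```

`nativeDelete` issues a single `DELETE ... WHERE` without loading entities, which suits a table that is never soft-deleted and can grow quickly during an incident.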

## Multi-role dean login fix (ucmn-t-67092)

Adds defensive guards to `MoodleUserHydrationService.resolveInstitutionalRoles`
so users with both DEAN and CHAIRPERSON institutional roles can log in
when the populated `moodleCategory` relation is null (soft-delete filter
or drift after a Moodle restructure). The seven previously unguarded
`ir.moodleCategory.moodleCategoryId` dereferences now use optional
chaining and filter out the resulting nulls; orphaned auto rows are
removed instead of crashing the login.
Regression tests cover both DEAN-orphan and CHAIRPERSON-orphan shapes.

Single-role deans skip the (DEAN ∧ CHAIRPERSON) cleanup branch, which
is why only the intersection user tripped the TypeError.
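
The guard pattern in `resolveInstitutionalRoles` can be illustrated with simplified role rows. This is not the service code: the `InstitutionalRole` shape, the `source` field marking auto rows, and both helper names are assumptions; only `moodleCategory.moodleCategoryId` comes from the description above.

```typescript
// Simplified stand-in for an institutional-role row after populate.
type InstitutionalRoleName = "DEAN" | "CHAIRPERSON";
interface InstitutionalRole {
  id: string;
  role: InstitutionalRoleName;
  source: "auto" | "manual"; // hypothetical marker for auto-assigned rows
  moodleCategory: { moodleCategoryId: number } | null; // null if filtered out or drifted
}

// Unguarded version: ir.moodleCategory.moodleCategoryId throws a TypeError on null.
// Guarded version: optional-chain, then drop the undefined results.
function categoryIdsFor(roles: InstitutionalRole[], role: InstitutionalRoleName): number[] {
  return roles
    .filter((ir) => ir.role === role)
    .map((ir) => ir.moodleCategory?.moodleCategoryId)
    .filter((id): id is number => id !== undefined);
}

// Auto rows whose category no longer resolves are candidates for removal.
function orphanedAutoRows(roles: InstitutionalRole[]): InstitutionalRole[] {
  return roles.filter((ir) => ir.source === "auto" && ir.moodleCategory == null);
}
```

With a DEAN row and a CHAIRPERSON row whose category is null, `categoryIdsFor` returns only the resolvable IDs and `orphanedAutoRows` returns the orphan, instead of the whole login throwing.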

## Migration scope (verified safe on staging)

This migration also adds `analysis_pipeline_trigger_check` —
`@Enum(() => PipelineTrigger)` is declared on the entity but the CHECK
was missing from the DB. Verified: 0 violating rows, no name collision.
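
For reference, a CHECK constraint on an enum-backed column follows the `<table>_<column>_check` naming seen in `analysis_pipeline_trigger_check`. The sketch below builds that statement plus the kind of pre-flight count that would confirm "0 violating rows". It assumes PostgreSQL and a table named `analysis_pipeline`; the `PipelineTrigger` members are hypothetical placeholders because the real values are not listed here.

```typescript
// Placeholder members: the PR names PipelineTrigger but does not list its values.
enum PipelineTrigger {
  Manual = "manual",
  Scheduled = "scheduled",
}

// Quote string literals for SQL; identifiers below are assumed trusted.
function quoteList(values: readonly string[]): string {
  return values.map((v) => `'${v.replace(/'/g, "''")}'`).join(", ");
}

// CHECK constraint following the <table>_<column>_check naming convention.
function enumCheckSql(table: string, column: string, values: readonly string[]): string {
  return `alter table "${table}" add constraint "${table}_${column}_check" check ("${column}" in (${quoteList(values)}));`;
}

// Pre-flight: must return 0 before the constraint can be added. NULLs pass a
// CHECK and are not counted by NOT IN, so the two statements agree.
function violatingRowsSql(table: string, column: string, values: readonly string[]): string {
  return `select count(*) from "${table}" where "${column}" not in (${quoteList(values)});`;
}
```

Running the pre-flight query before `migration:up` is what makes adding the constraint non-destructive: if any row violated it, PostgreSQL would reject the `alter table` outright.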

Does NOT include the CLI-suggested
`drop index uq_analysis_pipeline_active_scope`. That's the FAC-132
partial unique index — MikroORM decorators can't represent it (see
new gotcha entry in `src/entities/CLAUDE.md`) and dropping it would
reintroduce the duplicate-active-pipeline bug it fixed.
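
One way to keep a generated migration from reintroducing that drop is to filter its statements against a list of hand-managed indexes before committing it. This is a hypothetical helper (`stripProtectedDrops`, `PROTECTED_INDEXES`); it is not something MikroORM provides and not necessarily the process the team follows.

```typescript
// Indexes that exist only in hand-written SQL (e.g. partial unique indexes)
// and must never be dropped by a generated migration. Hypothetical list.
const PROTECTED_INDEXES = ["uq_analysis_pipeline_active_scope"];

// Returns the generated statements minus any `drop index` aimed at a protected index.
function stripProtectedDrops(
  statements: string[],
  protectedIndexes: readonly string[] = PROTECTED_INDEXES,
): string[] {
  return statements.filter((sql) => {
    const match = /^\s*drop\s+index\s+(?:concurrently\s+)?(?:if\s+exists\s+)?([\w."]+)/i.exec(sql);
    if (!match) return true; // not a drop-index statement
    // Strip quotes and an optional schema prefix to get the bare index name.
    const name = match[1].replace(/"/g, "").split(".").pop() ?? "";
    return !protectedIndexes.includes(name);
  });
}
```

Keeping the list next to the entity gotcha makes the exception explicit, so the next schema diff that suggests the drop is caught before review rather than after a duplicate-pipeline incident.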

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@y4nder self-assigned this on May 12, 2026
@y4nder merged commit 474dd23 into develop on May 12, 2026
2 checks passed
y4nder added a commit that referenced this pull request May 12, 2026
…398)
