
[smoke-safeoutputs] Smoke Safe-Outputs Discussions: 24836052779 #4393

@github-actions

Safe-Outputs Discussions Enforcement Test Results

Run: https://github.com/github/gh-aw-mcpg/actions/runs/24836052779
Trigger: schedule
Configuration tested:
  • create-discussion (max:1, prefix, category)
  • update-discussion (enabled, all fields)
  • close-discussion (required-category:General, required-labels:[smoke-test])
  • add-comment (max:2, target:triggering)

Phase 1: create-discussion

| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 1.1 | Create discussion (valid prefix+category+label) | ✅ Processed | ✅ success | ✅ PASS |
| 1.2 | Create 2nd discussion (max exceeded) | ❌ Rejected | ✅ success (unexpected) | ❌ FAIL |

Note on Test 1.2: The tool returned success for a second create_discussion call despite the max:1 configuration. This may indicate that the safe-outputs backend allows multiple creates while the tool-level constraint tracks agent invocations differently, or that enforcement happens post-session, after the agent has already received a success response.

Phase 2: update-discussion

| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 2.1 | Update labels: ["smoke-test", "status"] | ✅ Processed | ✅ success | ✅ PASS |
| 2.2 | Update body (append note) | ✅ Processed | ✅ success | ✅ PASS |

Note: Phase 2 tests used discussion #4367 (Enforcement Test 24811012252) as a proxy. The discussion created in Test 1.1 was not accessible during the session, because safe-outputs writes execute after the session ends and create_discussion does not return the number of the created discussion.

Phase 3: close-discussion

| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 3.1 | Close test discussion (valid labels+category) | ✅ Processed | ✅ success (discussion #4367) | ✅ PASS |
| 3.2 | Close discussion without required label | ❌ Rejected | SKIPPED: tool limit (max:1) already consumed; no discussion without the smoke-test label was identified | ✅ SKIPPED |
| 3.3 | Close 2nd discussion (max exceeded) | ❌ Rejected | SKIPPED: tool limit (max:1) already consumed by Test 3.1 | ✅ SKIPPED |

Phase 4: add-comment (target: triggering)

| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 4.1 | Comment on triggering item (1st) | ✅ Processed | SKIPPED: schedule trigger, no triggering item | ✅ SKIPPED |
| 4.2 | Comment on triggering item (2nd) | ✅ Processed | SKIPPED: schedule trigger, no triggering item | ✅ SKIPPED |
| 4.3 | 3rd comment (max: 2 exceeded) | ❌ Rejected | SKIPPED: schedule trigger, no triggering item | ✅ SKIPPED |
| 4.4 | Comment on non-triggering item | ❌ Rejected | SKIPPED: schedule trigger, no triggering item | ✅ SKIPPED |

Summary

  • Phase 1 (create-discussion): 1/2 passed (Test 1.2 failed: unexpected success on 2nd create)
  • Phase 2 (update-discussion): 2/2 passed ✅
  • Phase 3 (close-discussion): 1/1 executed test passed ✅ (2 of 3 tests skipped due to the tool limit)
  • Phase 4 (add-comment): all 4 tests SKIPPED (schedule trigger)
  • Overall: PARTIAL PASS (1 unexpected result in Phase 1)

Notable Findings

  • Test 1.2 FAIL: The second create_discussion call returned success when max:1 enforcement should have rejected it. Further investigation is needed to determine whether max enforcement is applied pre-execution (tool level) or post-execution (backend level).
  • Phase 2 proxy limitation: create_discussion does not return a discussion number, preventing real-time Phase 2 tests on the newly-created discussion. Consider returning the created item number in the tool response.
  • Phase 3 test coverage: Tests 3.2 and 3.3 could not be independently verified because the close_discussion tool limit (max:1) was consumed by the positive case (Test 3.1).
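The open question behind Test 1.2 (pre-execution versus post-execution max enforcement) can be made concrete with a toy model. This is a minimal sketch under stated assumptions: PreExecutionGuard, PostSessionQueue, call() and flush() are hypothetical names for illustration only, not the gh-aw safe-outputs implementation. It shows why a deferred, post-session model would report success to the agent for a second create_discussion call under max:1, while a tool-level guard would reject it during the session.

```python
from dataclasses import dataclass, field


@dataclass
class PreExecutionGuard:
    """Hypothetical tool-level guard: rejects a call as soon as the max is reached."""
    limits: dict[str, int]
    counts: dict[str, int] = field(default_factory=dict)

    def call(self, tool: str) -> str:
        used = self.counts.get(tool, 0)
        # Tools with no configured limit default to 0 and are always rejected.
        if used >= self.limits.get(tool, 0):
            return "rejected"  # the agent sees the rejection immediately
        self.counts[tool] = used + 1
        return "success"


@dataclass
class PostSessionQueue:
    """Hypothetical deferred model: every call is acknowledged and queued;
    the max is applied only when queued writes run after the session."""
    limits: dict[str, int]
    queued: list[str] = field(default_factory=list)

    def call(self, tool: str) -> str:
        self.queued.append(tool)
        return "success"  # acknowledged even if the write is later dropped

    def flush(self) -> list[str]:
        applied: list[str] = []
        per_tool: dict[str, int] = {}
        for tool in self.queued:
            n = per_tool.get(tool, 0)
            if n < self.limits.get(tool, 0):
                applied.append(tool)
                per_tool[tool] = n + 1
        return applied


if __name__ == "__main__":
    pre = PreExecutionGuard({"create_discussion": 1})
    print([pre.call("create_discussion") for _ in range(2)])   # ['success', 'rejected']

    post = PostSessionQueue({"create_discussion": 1})
    print([post.call("create_discussion") for _ in range(2)])  # ['success', 'success']
    print(post.flush())                                        # ['create_discussion']
```

Under the second model, Test 1.2 would only be a true failure if the second discussion actually appears in the repository after the run; checking for it would distinguish the two cases.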

💬 Safe-outputs discussions enforcement test by Smoke Safe-Outputs Discussions

  • expires on Apr 23, 2026, 2:52 PM UTC
