Safe-Outputs Discussions Enforcement Test Results
Run: https://github.com/github/gh-aw-mcpg/actions/runs/24836052779
Trigger: schedule
Configuration tested: create-discussion (max:1, prefix, category), update-discussion (enabled, all fields), close-discussion (required-category:General, required-labels:[smoke-test]), add-comment (max:2, target:triggering)
Phase 1: create-discussion
| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 1.1 | Create discussion (valid prefix, category, and label) | ✅ Processed | ✅ success | ✅ PASS |
| 1.2 | Create 2nd discussion (max exceeded) | ❌ Rejected | ✅ success (unexpected) | ❌ FAIL |
Note on Test 1.2: The tool returned success for a second create_discussion call despite the max:1 configuration. This may indicate that the tool-level constraint counts invocations differently from the safe-outputs backend, or that the max limit is enforced post-session rather than at call time.
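The two hypotheses for Test 1.2 can be made concrete with a small model. This is a hypothetical sketch: `MaxLimitedTool`, its `eager` flag, and the queue-then-apply flow are illustrative assumptions, not the gh-aw-mcpg implementation. It shows why a deferred (post-session) design would report `success` for a second call even though only one discussion is ultimately written.

```python
from dataclasses import dataclass, field


@dataclass
class MaxLimitedTool:
    """Hypothetical model of a safe-output tool with a `max` limit.

    eager=True: over-limit calls are rejected when they are made (pre-execution).
    eager=False: every call is queued and reported as "success"; the limit is
    only applied when the queue is written out after the session (post-execution).
    """

    max_calls: int
    eager: bool
    queued: list = field(default_factory=list)

    def call(self, payload: dict) -> str:
        if self.eager and len(self.queued) >= self.max_calls:
            return "rejected"
        self.queued.append(payload)
        # In deferred mode, "success" only means "queued", not "written".
        return "success"

    def apply(self) -> list:
        # Post-session step: only the first `max_calls` queued items are written.
        return self.queued[: self.max_calls]


eager = MaxLimitedTool(max_calls=1, eager=True)
deferred = MaxLimitedTool(max_calls=1, eager=False)
print([eager.call({"title": "first"}), eager.call({"title": "second"})])
# ['success', 'rejected']
print([deferred.call({"title": "first"}), deferred.call({"title": "second"})])
# ['success', 'success']
print(len(deferred.apply()))
# 1
```

Under the deferred model, Test 1.2 could only be judged by counting the discussions the run actually created, not by the tool's return value.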
Phase 2: update-discussion
| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 2.1 | Update labels: `["smoke-test", "status"]` | ✅ Processed | ✅ success | ✅ PASS |
| 2.2 | Update body (append note) | ✅ Processed | ✅ success | ✅ PASS |
Note: Phase 2 tests used discussion #4367 (Enforcement Test 24811012252) as a proxy. The discussion created in Test 1.1 was not accessible during the session, because safe-outputs writes execute post-session and create_discussion does not return the created discussion number.
Phase 3: close-discussion
| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 3.1 | Close test discussion (valid labels and category) | ✅ Processed | ✅ success (discussion #4367) | ✅ PASS |
| 3.2 | Close discussion without required label | ❌ Rejected | SKIPPED: tool limit (max:1) already consumed; no non-smoke-test discussion identified | ✅ SKIPPED |
| 3.3 | Close 2nd discussion (max exceeded) | ❌ Rejected | SKIPPED: tool limit (max:1) already consumed by Test 3.1 | ✅ SKIPPED |
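Test 3.2 targets the close-discussion precondition (required-category: General, required-labels: [smoke-test]), which could in principle be checked without spending the tool's call budget. The sketch below is hypothetical: `Discussion` and `close_allowed` are illustrative names, and it assumes that every listed label must be present and that the category is compared by exact name.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Discussion:
    number: int
    category: str
    labels: FrozenSet[str]


def close_allowed(d: Discussion, required_category: str,
                  required_labels: Set[str]) -> Tuple[bool, str]:
    """Hypothetical pre-close check mirroring required-category / required-labels."""
    if d.category != required_category:
        return False, f"category {d.category!r} != {required_category!r}"
    # Assumption: all required labels must be present, not just one of them.
    missing = required_labels - d.labels
    if missing:
        return False, f"missing labels: {sorted(missing)}"
    return True, "ok"


test_item = Discussion(1, "General", frozenset({"smoke-test", "status"}))
other_item = Discussion(2, "General", frozenset({"status"}))
print(close_allowed(test_item, "General", {"smoke-test"}))
# (True, 'ok')
print(close_allowed(other_item, "General", {"smoke-test"}))
# (False, "missing labels: ['smoke-test']")
```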
Phase 4: add-comment (target: triggering)
| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 4.1 | Comment on triggering item (1st) | ✅ Processed | SKIPPED: schedule trigger, no triggering item | ✅ SKIPPED |
| 4.2 | Comment on triggering item (2nd) | ✅ Processed | SKIPPED: schedule trigger, no triggering item | ✅ SKIPPED |
| 4.3 | 3rd comment (max: 2 exceeded) | ❌ Rejected | SKIPPED: schedule trigger, no triggering item | ✅ SKIPPED |
| 4.4 | Comment on non-triggering item | ❌ Rejected | SKIPPED: schedule trigger, no triggering item | ✅ SKIPPED |
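The expected outcomes for Tests 4.1 to 4.4 follow from combining target: triggering with max: 2. The sketch below is a hypothetical model (`plan_comments` and its status strings are illustrative, not the tool's API) showing why every case is skipped on a schedule run, and what the same requests would yield if a triggering item existed.

```python
from typing import List, Optional


def plan_comments(triggering_number: Optional[int], requested_targets: List[int],
                  max_comments: int = 2) -> List[str]:
    """Hypothetical model of add-comment with target: triggering and a max limit.

    Returns one status string per requested comment, in request order.
    """
    results = []
    accepted = 0
    for target in requested_targets:
        if triggering_number is None:
            # e.g. a schedule event: there is no issue, PR, or discussion to target.
            results.append("skipped: no triggering item")
        elif target != triggering_number:
            results.append("rejected: not the triggering item")
        elif accepted >= max_comments:
            results.append("rejected: max exceeded")
        else:
            accepted += 1
            results.append("processed")
    return results


# Tests 4.1-4.4 on a schedule trigger: all four are skipped.
print(plan_comments(None, [1, 1, 1, 2]))
# Same requests with a hypothetical triggering item #42:
print(plan_comments(42, [42, 42, 42, 7]))
# ['processed', 'processed', 'rejected: max exceeded', 'rejected: not the triggering item']
```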
Summary
- Phase 1 (create-discussion): 1/2 ✅ (Test 1.2 failed — unexpected success on 2nd create)
- Phase 2 (update-discussion): 2/2 ✅
- Phase 3 (close-discussion): 1/1 executed ✅ (2/3 skipped due to tool limit)
- Phase 4 (add-comment): SKIPPED (schedule trigger)
- Overall: PARTIAL PASS (1 unexpected result in Phase 1)
Notable Findings
- Test 1.2 FAIL: A second `create_discussion` call returned `success` even though `max:1` enforcement should have rejected it. Investigation is needed to clarify whether the max limit is enforced pre-execution (at the tool level) or post-execution (at the backend level).
- Phase 2 proxy limitation: `create_discussion` does not return a discussion number, which prevented Phase 2 from running against the newly created discussion during the session. Consider returning the created item number in the tool response.
- Phase 3 test coverage: Tests 3.2 and 3.3 could not be independently verified because the close_discussion tool limit (max:1) was consumed by the positive case (Test 3.1).
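The Phase 2 suggestion can be illustrated with a small consumer of the tool response. Everything here is an assumption: the JSON response shape, the `number` field (which the current tool does not return), and the helper name `next_update_target` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreateDiscussionResult:
    status: str
    # Hypothetical field: the current tool does not return a discussion number.
    number: Optional[int] = None


def next_update_target(raw_response: str) -> str:
    """Pick the discussion that follow-up update tests should use."""
    result = CreateDiscussionResult(**json.loads(raw_response))
    if result.status != "success" or result.number is None:
        return "fallback: use a proxy discussion"
    return f"update target: #{result.number}"


print(next_update_target('{"status": "success"}'))
# fallback: use a proxy discussion
print(next_update_target('{"status": "success", "number": 123}'))
# update target: #123
```

This only helps if the discussion exists at call time; with safe-outputs writes executing post-session, as noted for Phase 2, there would be no number to return during the session.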
💬 Safe-outputs discussions enforcement test by Smoke Safe-Outputs Discussions