Closed
73 commits
bfd5cbf
[AISOS-1888] Define StatsState mixin and StageStats TypedDict
ekuris-redhat Jun 24, 2026
4e0974b
[AISOS-1889] Integrate StatsState mixin into FeatureState and BugState
ekuris-redhat Jun 24, 2026
4922ace
[AISOS-1890] Implement core stats recording utility functions
ekuris-redhat Jun 24, 2026
83f2b81
[AISOS-1891] Add cost alert threshold configuration
ekuris-redhat Jun 24, 2026
becb2df
[AISOS-1892] Define workflow stage constants for stats tracking
ekuris-redhat Jun 24, 2026
ab3e012
[AISOS-1893] Integrate stats recording into PRD and Spec generation n…
ekuris-redhat Jun 24, 2026
93f3b49
[AISOS-1894] Implement Stats Summary Formatter Module
ekuris-redhat Jun 24, 2026
42ea5f4
[AISOS-1895] Create Stats Comment Posting Service
ekuris-redhat Jun 24, 2026
6c9a430
[AISOS-1896] Implement idempotency guard for stats comments
ekuris-redhat Jun 24, 2026
575838a
[AISOS-1897] Implement re-post mechanism for final stats comment
ekuris-redhat Jun 24, 2026
147d73e
[AISOS-1898] Create terminal event stats posting node
ekuris-redhat Jun 24, 2026
63cef09
[AISOS-1899] Integrate stats posting into Feature and Bug workflow gr…
ekuris-redhat Jun 24, 2026
8c81297
[AISOS-1900] Add cost alert posting to stats summary
ekuris-redhat Jun 24, 2026
3b283e3
[AISOS-1901] Implement /forge stats Jira Comment Command Handler
ekuris-redhat Jun 24, 2026
3bfc8df
[AISOS-1902] Implement /forge stats retry subcommand handler
ekuris-redhat Jun 24, 2026
36c1a61
[AISOS-1903] Implement forge stats CLI command
ekuris-redhat Jun 24, 2026
d0286a5
[AISOS-1904] Create stats retrieval service module
ekuris-redhat Jun 24, 2026
d4c1077
[AISOS-1905] Add CLI stats formatter for terminal output
ekuris-redhat Jun 24, 2026
12d1f2b
[AISOS-1906] Add integration tests for on-demand stats commands
ekuris-redhat Jun 24, 2026
885c34f
[AISOS-1907] Implement Weekly Report Data Aggregation Module
ekuris-redhat Jun 24, 2026
14d6a12
[AISOS-1908] Implement per-feature rollup aggregation for epic-linked…
ekuris-redhat Jun 24, 2026
e37ee5b
[AISOS-1909] Implement Weekly Report Formatters (CLI, Markdown, JSON)
ekuris-redhat Jun 24, 2026
0e5371c
[AISOS-1910] Implement forge weekly-report CLI Command
ekuris-redhat Jun 24, 2026
6d92fd6
[AISOS-1911] Implement Report Ticket Resolution and Auto-Creation
ekuris-redhat Jun 24, 2026
58afae0
[AISOS-1912] Implement Jira-native notification delivery to project r…
ekuris-redhat Jun 24, 2026
c778fb2
[AISOS-1913] Add integration tests for weekly reporting system
ekuris-redhat Jun 24, 2026
421f3ed
[AISOS-1883-review] Fix lint issues found during local code review
ekuris-redhat Jun 24, 2026
62cb18b
[AISOS-1883-docs] docs: update documentation for /forge stats Jira co…
ekuris-redhat Jun 24, 2026
c4d5eff
[AISOS-1883-ci-fix] Fix Python 3.11 incompatibility in TypedDict inhe…
ekuris-redhat Jun 24, 2026
a16c29f
[AISOS-1883-rebase] Resolve merge conflicts with main for AISOS-1883
ekuris-redhat Jun 24, 2026
076defc
[AISOS-1883-rebase] Resolve merge conflicts with main for AISOS-1883
ekuris-redhat Jun 24, 2026
50376c2
[AISOS-1883-ci-fix] Remove feature-specific nodes from _TERMINAL_NODES
ekuris-redhat Jun 24, 2026
57ae24e
[AISOS-1883] review: address PR feedback
ekuris-redhat Jun 25, 2026
a508126
[AISOS-1883-review-fix] Implement PR review plan for AISOS-1883
ekuris-redhat Jun 25, 2026
02bc55b
[AISOS-1883-review-review-impl] Fix bug formatter stages and Redis by…
ekuris-redhat Jun 25, 2026
8119240
[AISOS-1883-rebase] Resolve merge conflicts with main for AISOS-1883
ekuris-redhat Jun 25, 2026
43b81d8
[AISOS-1883] review: address PR feedback
ekuris-redhat Jun 25, 2026
a6dc133
[AISOS-1883-review-review-impl] Fix incorrect keyword argument name i…
ekuris-redhat Jun 25, 2026
cb1ffad
Merge remote-tracking branch 'origin/main' into forge/aisos-1883
ekuris-redhat Jun 28, 2026
9507f01
[AISOS-1883] review: address PR feedback
Jun 28, 2026
7af92a1
[AISOS-1883] review: address PR feedback
ekuris-redhat Jun 28, 2026
d4a61c6
[AISOS-1883-review-fix] Implement PR review plan for AISOS-1883
ekuris-redhat Jun 28, 2026
73c8164
[AISOS-1883-ci-analyze] Analyze CI failures (attempt 1)
ekuris-redhat Jun 28, 2026
6553a49
[AISOS-1883-ci-fix] Fix task generation unit test mock return values
ekuris-redhat Jun 28, 2026
005a5c0
[AISOS-1883-review-ci-fix-1] Fix Mocking issues in PRD rejected tests
ekuris-redhat Jun 28, 2026
dc74954
[AISOS-1883] review: address PR feedback
ekuris-redhat Jun 28, 2026
c511b70
[AISOS-1883-review-review-impl] Post-review-impl code review and form…
ekuris-redhat Jun 28, 2026
dd93a2e
[AISOS-1883-review-review-impl] Post-review-impl code review and vali…
ekuris-redhat Jun 28, 2026
674a844
[AISOS-1883-review-analyze] Analyze PR review feedback for AISOS-1883
ekuris-redhat Jun 28, 2026
98b1bad
[AISOS-1883] review: address PR feedback
Jun 28, 2026
2151989
[AISOS-1883] review: address PR feedback
ekuris-redhat Jun 28, 2026
fea489c
[AISOS-1883-review-review-impl] Remove redundant sequential timing ca…
ekuris-redhat Jun 28, 2026
58e45d0
[AISOS-1883] review: address PR feedback
ekuris-redhat Jun 28, 2026
abb7c96
[AISOS-1883-review-fix] review: address PR feedback
ekuris-redhat Jun 29, 2026
5949782
[AISOS-1883-review-review-impl] Resolve all type-checking and type-sa…
ekuris-redhat Jun 29, 2026
37291c2
[AISOS-1883-review-analyze] Analyze PR review feedback for AISOS-1883
ekuris-redhat Jun 29, 2026
fe8ab0d
[AISOS-1883] review: address PR feedback
ekuris-redhat Jun 29, 2026
1cc78fb
[AISOS-1883-review-review-impl] Resolve remaining stats-related type-…
ekuris-redhat Jun 29, 2026
da09359
[AISOS-1883-review-review-impl] Post-review-impl code review and test…
ekuris-redhat Jun 29, 2026
bb9cb37
[AISOS-1883] review: address PR feedback
ekuris-redhat Jun 29, 2026
35d6e94
[AISOS-1883-review-review-impl] Fix Redis connection errors in weekly…
ekuris-redhat Jun 29, 2026
27e246e
[AISOS-1883] review: address PR feedback
ekuris-redhat Jun 29, 2026
92936db
[AISOS-1883-review-review-impl] Post-review-impl code review fixes
ekuris-redhat Jun 29, 2026
98349c4
[AISOS-1883] review: address PR feedback
ekuris-redhat Jun 29, 2026
f404d03
[AISOS-1883-review-review-impl] Post-review-impl code review and type…
ekuris-redhat Jun 29, 2026
9291359
[AISOS-1883] review: address PR feedback
ekuris-redhat Jun 29, 2026
74a2223
[AISOS-1883-review-review-impl] Post-review-impl code review
ekuris-redhat Jun 29, 2026
6981331
[AISOS-1883] review: address PR feedback
ekuris-redhat Jun 29, 2026
1fae412
[AISOS-1883] review: address PR feedback
ekuris-redhat Jun 30, 2026
8ed5744
[AISOS-1883-review-review-impl] Fix breaking issues found in code rev…
ekuris-redhat Jun 30, 2026
c313d26
[AISOS-1883-review-fix] Revert formatting-only changes in test_ci_att…
ekuris-redhat Jun 30, 2026
0eb8412
Merge remote-tracking branch 'origin/main' into forge/aisos-1883
ekuris-redhat Jun 30, 2026
4740972
[AISOS-1883] review: address PR feedback
ekuris-redhat Jun 30, 2026
18 changes: 18 additions & 0 deletions .env.example
Original file line number Diff line number Diff line change
Expand Up @@ -245,3 +245,21 @@ CI_FIX_MAX_RETRIES=5
CI_IGNORED_CHECKS=tide
# Webhook acknowledgment timeout in seconds
WEBHOOK_ACK_TIMEOUT=0.5

# =============================================================================
# Stats Cost Alert Configuration
# =============================================================================
# Enable cost alerting in workflow stats summaries. When enabled and aggregate
# token usage (input + output across all stages) exceeds the threshold, the
# stats summary will include a cost alert.
STATS_ALERT_ENABLED=true
# Total token count threshold that triggers a cost alert (default: 1,000,000).
# Applies to aggregate token usage across all workflow stages.
STATS_ALERT_THRESHOLD_TOKENS=1000000
# Dollar cost threshold for cost alerts. When set, compares total dollar cost against
# this value instead of using the token-based threshold above.
# STATS_ALERT_THRESHOLD_COST=10.00
# LLM pricing table as a JSON-encoded string mapping model name substrings to
# per-million-token rates (input and output in $/MTok). Longest key match wins.
# Default rates are pre-populated; override only if prices change.
# LLM_PRICING={"claude-opus-4":{"input":15.00,"output":75.00},"claude-sonnet-4":{"input":3.00,"output":15.00}}
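A minimal sketch of how the `LLM_PRICING` lookup described above might work, assuming the table is parsed from JSON and that "longest key match wins" means the longest key that is a substring of the model name. The helper names `lookup_rates` and `estimate_cost`, the example model name, and the choice to price unknown models at zero are illustrative assumptions, not code from this PR.

```python
import json
from typing import Optional

# Rates copied from the commented LLM_PRICING example above ($ per million tokens).
PRICING_JSON = (
    '{"claude-opus-4":{"input":15.00,"output":75.00},'
    '"claude-sonnet-4":{"input":3.00,"output":15.00}}'
)


def lookup_rates(model: str, pricing: dict) -> Optional[dict]:
    """Return the rates for the longest pricing key that is a substring of `model`."""
    matches = [key for key in pricing if key in model]
    if not matches:
        return None
    return pricing[max(matches, key=len)]


def estimate_cost(model: str, input_tokens: int, output_tokens: int, pricing: dict) -> float:
    """Estimate dollar cost from token counts (hypothetical: unknown models cost 0.0)."""
    rates = lookup_rates(model, pricing)
    if rates is None:
        return 0.0
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


pricing = json.loads(PRICING_JSON)
cost = estimate_cost("claude-sonnet-4-20250514", 200_000, 50_000, pricing)
print(f"${cost:.2f}")  # prints $1.35
```

Matching on the longest key lets a generic entry coexist with a more specific one without the generic entry shadowing it.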
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -42,6 +42,7 @@ ENV/

# Testing
.pytest_cache/
.mypy_cache/
.coverage
htmlcov/
*.cover
2 changes: 2 additions & 0 deletions CLAUDE.md
Expand Up @@ -122,6 +122,8 @@ podman rm $(podman ps -a --filter name=forge- -q)
| `!` | Revision request — triggers regeneration with feedback |
| `?` or `@forge ask` | Question — triggers Q&A answer |
| `>option N` | RCA option selection (RCA Option Gate only) |
| `/forge stats` | Post current workflow statistics as a Jira comment (read-only) |
| `/forge stats retry` | Re-post stats comment, forcing a fresh calculation |
| _(no prefix)_ | Informational — workflow ignores it |

## GitHub PR Comment Commands
48 changes: 48 additions & 0 deletions containers/entrypoint.py
Expand Up @@ -457,6 +457,54 @@ async def run_agent_task(
else:
result = await agent.ainvoke(initial_message, config=config)

# Extract and aggregate tokens from usage_metadata

@danchild danchild Jun 30, 2026

Contributor


We need to clarify our intentions to support stats. 1) Will we support stats if users do not configure langfuse or another supported LLM observability tool? If the answer is yes, metrics gathering in agent containers is compulsory. 2) Either way, manual token accounting increases complexity and risk of langfuse drifting from the manually counted metrics. We should consider what it would look like to gather the same token data directly from langfuse and weigh its added complexity against the manual account in metrics.json.

try:
    total_input_tokens = 0
    total_output_tokens = 0
    messages = result.get("messages", []) if isinstance(result, dict) else []
    for message in messages:
        msg_type = type(message).__name__
        if msg_type in ("AIMessage", "AIMessageChunk"):
            usage = getattr(message, "usage_metadata", None)
            if not usage:
                resp_metadata = getattr(message, "response_metadata", {})
                if isinstance(resp_metadata, dict):
                    usage = resp_metadata.get("token_usage") or resp_metadata.get("usage")

            if isinstance(usage, dict):
                total_input_tokens += (
                    usage.get("input_tokens", 0) or usage.get("prompt_tokens", 0) or 0
                )
                total_output_tokens += (
                    usage.get("output_tokens", 0) or usage.get("completion_tokens", 0) or 0
                )
            elif usage is not None:
                total_input_tokens += (
                    getattr(usage, "input_tokens", 0)
                    or getattr(usage, "prompt_tokens", 0)
                    or 0
                )
                total_output_tokens += (
                    getattr(usage, "output_tokens", 0)
                    or getattr(usage, "completion_tokens", 0)
                    or 0
                )

    metrics_dir = workspace / ".forge"
    metrics_dir.mkdir(parents=True, exist_ok=True)
    metrics_file = metrics_dir / "metrics.json"
    metrics_file.write_text(
        json.dumps(
            {"input_tokens": total_input_tokens, "output_tokens": total_output_tokens},
            indent=2,
        )
    )
    logger.info(
        f"Saved container metrics to {metrics_file}: "
        f"input_tokens={total_input_tokens}, output_tokens={total_output_tokens}"
    )
except Exception as e:
    logger.warning(f"Failed to record token usage inside sandbox: {e}")

# Flush Langfuse traces before exit
if langfuse_enabled:
try:
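For context on the hunk above: the container writes its token totals to `.forge/metrics.json` in the workspace. A minimal sketch of how a caller outside the container might read that file back is shown below. The function name `read_container_metrics` and the decision to fall back to zero counts on a missing or malformed file are assumptions for illustration; the PR's actual reader is not shown in this diff.

```python
import json
from pathlib import Path


def read_container_metrics(workspace: Path) -> dict:
    """Read .forge/metrics.json, returning zero counts if it is missing or malformed."""
    metrics_file = workspace / ".forge" / "metrics.json"
    empty = {"input_tokens": 0, "output_tokens": 0}
    try:
        data = json.loads(metrics_file.read_text())
    except (OSError, json.JSONDecodeError):
        # Hypothetical policy: a failed sandbox write should not fail the workflow.
        return empty
    if not isinstance(data, dict):
        return empty
    result = {}
    for key in empty:
        value = data.get(key)
        # Ignore anything that is not an integer count.
        result[key] = value if isinstance(value, int) else 0
    return result
```

Treating a missing file as zero usage keeps stats best-effort, which matches the `logger.warning` fallback in the writer, but it also means undercounting is silent.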
1 change: 1 addition & 0 deletions docs/guide/bug-workflow.md
Expand Up @@ -119,6 +119,7 @@ At any approval gate, Forge classifies your comment by its prefix:
- **`!` prefix** — revision request: Forge regenerates the current artifact with your feedback
- **`?` prefix or `@forge ask`** — question: Forge answers and stays paused
- **`>option N`** — RCA option selection (RCA Option Gate only)
- **`/forge stats`** — posts current workflow statistics as a Jira comment (read-only)
- **No prefix** — informational: ignored by the workflow

---
48 changes: 47 additions & 1 deletion docs/guide/feature-workflow.md
Expand Up @@ -185,7 +185,7 @@ Start a comment with `!` followed by your feedback. Forge regenerates the curren
```

!!! note
Comments without a recognized prefix (`!`, `?`, `@forge ask`) are treated as informational and ignored by the workflow. Only `!`-prefixed comments trigger regeneration.
Comments without a recognized prefix (`!`, `?`, `@forge ask`, `/forge stats`) are treated as informational and ignored by the workflow. Only `!`-prefixed comments trigger regeneration.

## Handling Failures

Expand All @@ -199,6 +199,52 @@ To retry, add the `forge:retry` label. Forge resumes from the exact node that fa
!!! tip "CI retries"
If CI fix attempts are exhausted, `forge:retry` resets the attempt counter for a fresh budget of retries.

## Workflow Statistics

At the end of a workflow execution (when the ticket reaches a terminal state, including **Completed**, **Blocked**, or **Failed**), Forge aggregates execution data and automatically posts a comprehensive summary on the Jira ticket. This ensures that even when a workflow is blocked or fails, stakeholders can inspect the resource usage and performance metrics up to that point. This helps teams track efficiency, analyze execution bottlenecks, and monitor LLM token costs.

### Summary Format

The summary is generated as a Markdown table with the following columns:

| Column | Description |
|---|---|
| **Stage** | The name of the pipeline stage (e.g., PRD, Spec, Epics, Tasks, Implementation, CI, Review). |
| **Iterations** | The number of attempts or iterations executed during that stage. |
| **Machine Time** | Monotonic duration of active processing by Forge during that stage (formatted as `1h 2m 3s`). |
| **Input Tokens** | Estimated number of LLM input tokens consumed during that stage. |
| **Output Tokens** | Estimated number of LLM output tokens generated during that stage. |
| **Cost** | Calculated cost based on the stage's token consumption and LLM pricing mappings. |

At the bottom of the table, a **Total** rollup row displays sum totals across all executed stages.
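
As a rough illustration of the format described above, the sketch below renders per-stage records into a Markdown table with a **Total** row. The record field names (`name`, `iterations`, `seconds`, `input_tokens`, `output_tokens`, `cost`) and the helper names are hypothetical and may not match Forge's actual formatter.

```python
def format_duration(seconds: float) -> str:
    """Render a duration as `1h 2m 3s`, omitting leading zero units."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{secs}s")
    return " ".join(parts)


def _row(label: str, stats: dict) -> str:
    """Format one table row from a stats record."""
    return (
        f"| {label} | {stats['iterations']} | {format_duration(stats['seconds'])} "
        f"| {stats['input_tokens']:,} | {stats['output_tokens']:,} | ${stats['cost']:.2f} |"
    )


def format_stats_table(stages: list) -> str:
    """Build the summary table and append a Total row summing every stage."""
    rows = [
        "| Stage | Iterations | Machine Time | Input Tokens | Output Tokens | Cost |",
        "|---|---|---|---|---|---|",
    ]
    totals = {"iterations": 0, "seconds": 0.0, "input_tokens": 0, "output_tokens": 0, "cost": 0.0}
    for stage in stages:
        rows.append(_row(stage["name"], stage))
        for key in totals:
            totals[key] += stage[key]
    rows.append(_row("**Total**", totals))
    return "\n".join(rows)


example_stages = [
    {"name": "PRD", "iterations": 1, "seconds": 123, "input_tokens": 1000, "output_tokens": 500, "cost": 0.02},
    {"name": "Spec", "iterations": 2, "seconds": 3600, "input_tokens": 500, "output_tokens": 200, "cost": 0.03},
]
print(format_stats_table(example_stages))
```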

### Cost Alerting

If the cumulative resource consumption exceeds specified safety thresholds, a prominent warning alert is appended to the statistics summary comment.

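Alert thresholds are set globally through environment variables (see `.env.example`), and alerting can be switched off with `STATS_ALERT_ENABLED`:

- **Token Threshold** (`STATS_ALERT_THRESHOLD_TOKENS`): Triggers if cumulative input + output tokens exceed the configured value (default: `1,000,000` tokens).
- **Dollar Threshold** (`STATS_ALERT_THRESHOLD_COST`): Triggers if the cumulative calculated cost exceeds the configured dollar amount (default: unset). When set, it is used instead of the token threshold.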

When triggered, a cost warning similar to the following is displayed directly below the summary table:

```text
⚠️ WARNING: This workflow run exceeded the configured cost/token limits!
Please review the resource usage details above for potential optimizations.
```
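
The threshold rules can be sketched as a single check, assuming (per the comments in `.env.example`) that a configured dollar threshold replaces the token threshold rather than combining with it. The function name `should_alert` and its parameter names are illustrative, not Forge's API.

```python
from typing import Optional


def should_alert(
    total_tokens: int,
    total_cost: float,
    *,
    enabled: bool = True,
    threshold_tokens: int = 1_000_000,
    threshold_cost: Optional[float] = None,
) -> bool:
    """Return True when the run exceeds the active threshold."""
    if not enabled:
        return False
    if threshold_cost is not None:
        # A dollar threshold, when set, is used instead of the token threshold.
        return total_cost > threshold_cost
    return total_tokens > threshold_tokens
```

For example, `should_alert(1_200_000, 4.50)` is true under the defaults, while `should_alert(1_200_000, 4.50, threshold_cost=10.00)` is false because only the dollar threshold is consulted.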

---

## On-Demand Stats Commands

In addition to the automatic summary posted when a workflow reaches a terminal state, team members can request or force-refresh stats at any time using Jira comment commands.

| Command | Action | Description |
|---|---|---|
| `/forge stats` | Request Stats | Generates the current statistics table and posts it as a comment on the Jira ticket, reflecting metrics up to the current stage of execution. |
| `/forge stats retry` | Refresh Stats | Forces a fresh recalculation of statistics and re-posts the summary table. This ensures the stats comment remains updated as the final comment on the Jira issue. |
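
A minimal sketch of how a comment body might be classified into these two commands. It assumes matching is case-insensitive, ignores surrounding whitespace, and rejects trailing text after the command; none of that is confirmed by this guide, and the name `parse_stats_command` is hypothetical.

```python
from typing import Optional


def parse_stats_command(comment_body: str) -> Optional[str]:
    """Return "stats", "stats_retry", or None for any other comment."""
    tokens = comment_body.strip().lower().split()
    if tokens[:2] != ["/forge", "stats"]:
        return None
    if len(tokens) == 2:
        return "stats"
    if tokens[2:] == ["retry"]:
        return "stats_retry"
    # Assumption: unknown subcommands are treated as not-a-command.
    return None
```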


## Labels Summary

See [Jira Labels](labels.md) for the complete reference.
2 changes: 1 addition & 1 deletion docs/guide/labels.md
Expand Up @@ -42,7 +42,7 @@ These labels advance the pipeline. Forge watches for label changes via Jira webh

**Asking questions:** Start a comment with `?` or `@forge ask`. Forge answers without advancing or regenerating.

**Informational comments:** Comments without a recognized prefix (`!`, `?`, `@forge ask`, `>option`) are ignored by the workflow — use them for team discussion without triggering Forge.
**Informational comments:** Comments without a recognized prefix (`!`, `?`, `@forge ask`, `>option`, `/forge stats`) are ignored by the workflow — use them for team discussion without triggering Forge.

**Handling failures:** When `forge:blocked` appears, read the Forge comment for the error. Fix the underlying issue if needed, then add `forge:retry`.

63 changes: 63 additions & 0 deletions docs/guide/weekly-reporting.md
@@ -0,0 +1,63 @@
# Weekly Reporting System

Forge includes an automated, weekly aggregation and reporting system that compiles and publishes metrics across all managed tickets for a specific Jira project. This documentation explains how the reporting system operates behind the scenes.

## Quick Start

Generate a weekly report for your project (e.g., `PROJ`) with the following command:

```bash
forge weekly-report --project PROJ
```

> **Note:** The `forge weekly-report` command requires active Redis access and must be run from the Forge project directory containing `.env` to load configurations.

## Aggregation Logic

When you run `forge weekly-report` (or trigger it via automated schedules), the reporting system performs the following steps:

1. **Query Active/Historical Checkpoints:** Forge scans the Redis event and state checkpoints for the specified project (`PROJECT_KEY`). It uses a key scanning pattern `checkpoint:{PROJECT_KEY}-*` to find all state checkpoints.
2. **Filter by Sliding Window:** Metrics are collected and filtered based on a sliding window of `N` days (by default, `7` days). A checkpoint falls within the reporting window if its `updated_at` timestamp or any stage `started_at`/`ended_at` timestamp is greater than or equal to the cutoff (`now - N days`).
3. **Aggregate Stats per Stage:** Data is aggregated across all feature and bug workflows, tracking:
    - **Ticket Rollups:** Total numbers of active, completed, or blocked workflows.
    - **Machine Time:** Cumulative active machine processing time (monotonic durations) across all stages.
    - **LLM Token Costs:** Sum of all input and output tokens consumed, translating them into actual dollar costs based on LLM pricing mappings.
    - **Feature Rollups:** Metrics aggregated per epic-linked ticket and feature. Ancestry traversal resolves the parent/grandparent Feature for each ticket in Jira up to two hops (e.g., ticket -> Epic -> Feature). Tickets without a resolved Feature are grouped under the "Unassigned" bucket.
    - **Bottleneck Analysis:** Identifies the slowest stage by average duration, ranks stages by iteration count, and calculates the CI fix rate.
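
The sliding-window rule from step 2 can be sketched as follows. The checkpoint shape used here (an `updated_at` value plus a `stages` mapping whose entries carry `started_at` and `ended_at`) is an assumed structure for illustration, with timestamps represented as timezone-aware `datetime` objects.

```python
from datetime import datetime, timedelta, timezone


def in_window(checkpoint: dict, now: datetime, days: int = 7) -> bool:
    """Return True if any checkpoint timestamp is at or after `now - days`."""
    cutoff = now - timedelta(days=days)
    timestamps = [checkpoint.get("updated_at")]
    for stage in checkpoint.get("stages", {}).values():
        timestamps.append(stage.get("started_at"))
        timestamps.append(stage.get("ended_at"))
    # Missing timestamps (None) never qualify a checkpoint on their own.
    return any(ts is not None and ts >= cutoff for ts in timestamps)


now = datetime(2026, 6, 30, tzinfo=timezone.utc)
checkpoint = {
    "updated_at": now - timedelta(days=30),
    "stages": {"ci": {"started_at": now - timedelta(days=8), "ended_at": now - timedelta(days=6)}},
}
print(in_window(checkpoint, now))  # True: the CI stage ended inside the window
```

Checking stage timestamps as well as `updated_at` means a ticket whose checkpoint was last touched long ago still counts if one of its stages ran during the reporting week.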

## Idempotency & Ticket Publishing

To avoid cluttering Jira with duplicate reports every week, the reporting system is designed to be completely **idempotent** when publishing to Jira via the `--create-ticket` flag.

- **Ticket Naming Convention:** The ticket summary is formatted dynamically based on the project key and current date:
    ```text
    Forge Weekly Report - {PROJECT} - Week of {date}
    ```
    Where `{PROJECT}` is the project key, and `{date}` is the first day of the reporting week (i.e. `today - N + 1 days`).
- **Label Identification:** The system uses the special `forge:weekly-report` and `forge:generated` labels to identify and tag report tickets.
- **Idempotency Guard:**
    - When `--create-ticket` is run, Forge first searches Jira using the following JQL:
        ```jql
        project = "{PROJECT}" AND labels = "forge:weekly-report" AND summary ~ "Week of {date}"
        ```
    - If a matching ticket is found, Forge updates that existing ticket's description with the newly compiled statistics instead of creating a new one.
    - If no matching ticket exists, Forge creates a new Jira Task issue, assigns the `forge:weekly-report` and `forge:generated` labels, and sets the description.
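
The guard above can be sketched without a Jira client by separating string construction from the update-or-create decision. The ISO date format for `{date}` and all function names below are assumptions; the guide does not specify how the date is rendered.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def week_start(today: date, days: int = 7) -> date:
    """First day of the reporting week: today - N + 1 days."""
    return today - timedelta(days=days - 1)


def report_summary(project: str, start: date) -> str:
    """Ticket summary following the naming convention (ISO date assumed)."""
    return f"Forge Weekly Report - {project} - Week of {start.isoformat()}"


def search_jql(project: str, start: date) -> str:
    """JQL used to look for an existing report ticket for the same week."""
    return (
        f'project = "{project}" AND labels = "forge:weekly-report" '
        f'AND summary ~ "Week of {start.isoformat()}"'
    )


def plan_publish(matching_keys: list) -> Tuple[str, Optional[str]]:
    """Update the first matching ticket if one exists, otherwise create a new one."""
    if matching_keys:
        return ("update", matching_keys[0])
    return ("create", None)


start = week_start(date(2026, 6, 30))
print(report_summary("PROJ", start))  # Forge Weekly Report - PROJ - Week of 2026-06-24
```

Because the summary and the search both derive from the same week-start date, re-running the command within the same window resolves to the same ticket.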

## Stakeholder Notifications

When using the `--notify` option alongside `--create-ticket`, Forge automatically mentions and notifies designated stakeholders.

### Notification List Compilation

The notification list is compiled hierarchically to allow easy overriding (highest priority first):

1. **Jira Project Property (Highest Priority):** Forge attempts to read the `forge.weekly-report.notify` project property from Jira. This property must contain a JSON array of Jira Account IDs (e.g., `["account-id-1", "account-id-2"]`) or a comma-separated string of account IDs.
2. **Environment Variable (Global Fallback):** If no project-specific property is set, Forge falls back to the `FORGE_WEEKLY_REPORT_NOTIFY` environment variable in `.env`. This variable should contain a comma-separated list of Jira Account IDs or the keyword `"project-leads"`. The special value `"project-leads"` instructs Forge to query the per-project Jira property.
3. **No Recipients:** If neither is configured, no notifications are triggered.
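
The priority order above might be implemented along these lines. This is a sketch under assumptions: the project property value and the environment variable are passed in as already-fetched values, the function names are hypothetical, and the `"project-leads"` keyword is deliberately not expanded here because resolving it requires a Jira lookup.

```python
import json
from typing import Any, List


def parse_account_ids(raw: Any) -> List[str]:
    """Accept a list, a JSON array string, or a comma-separated string of account IDs."""
    if raw is None:
        return []
    if isinstance(raw, list):
        return [str(item).strip() for item in raw if str(item).strip()]
    text = str(raw).strip()
    if text.startswith("["):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            return []  # Malformed JSON yields no recipients in this sketch.
        return parse_account_ids(parsed) if isinstance(parsed, list) else []
    return [part.strip() for part in text.split(",") if part.strip()]


def resolve_recipients(project_property: Any, env_value: Any) -> List[str]:
    """Project property wins; the environment variable is the fallback."""
    ids = parse_account_ids(project_property)
    if ids:
        return ids
    if isinstance(env_value, str) and env_value.strip() == "project-leads":
        # Keyword handling omitted: it needs a per-project Jira query.
        return []
    return parse_account_ids(env_value)
```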

### How Notifications are Delivered

Once the recipient account IDs are resolved:

- Forge posts a comment directly on the generated weekly report Jira ticket.
- The comment mentions each stakeholder using Jira's native `[~accountid:{id}]` mention syntax.
- This triggers email and/or Slack notifications based on the users' individual Atlassian notification preferences, ensuring visibility to project leads and management.
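
A short sketch of building the notification comment body with the mention syntax described above. The default message text and the function name are placeholders; only the `[~accountid:{id}]` form comes from this guide.

```python
from typing import List


def build_mention_comment(account_ids: List[str], message: str = "The weekly Forge report is ready.") -> str:
    """Prefix the message with one Jira mention per recipient."""
    mentions = " ".join(f"[~accountid:{account_id}]" for account_id in account_ids)
    return f"{mentions} {message}".strip()


print(build_mention_comment(["account-id-1", "account-id-2"]))
# [~accountid:account-id-1] [~accountid:account-id-2] The weekly Forge report is ready.
```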
2 changes: 2 additions & 0 deletions docs/index.md
Expand Up @@ -22,6 +22,8 @@ graph TD

- [Getting Started](getting-started.md) — Set up Forge in 10 minutes
- [Feature Workflow](guide/feature-workflow.md) — How features flow through Forge
- [Weekly Reporting Guide](guide/weekly-reporting.md) — Automated project-wide metrics and notifications
- [CLI Reference](reference/cli.md) — Command-line interface documentation
- [Developer Guide](developer-guide.md) — Full local development reference
- [Skills System](skills/index.md) — Customize Forge for your stack
- [Contributing](dev/contributing.md) — How to contribute
2 changes: 1 addition & 1 deletion docs/reference/api.md
Expand Up @@ -44,7 +44,7 @@ Receives Jira webhook events. Validates the signature and enqueues the event for

- `jira:issue_created` — triggers new workflow if `forge:managed` label is present
- `jira:issue_updated` — handles label changes (approvals, retry)
- `jira:issue_commented` — handles Q&A and revision requests
- `jira:issue_commented` — handles Q&A, revision requests, and `/forge stats` commands

Returns HTTP 200 immediately. Processing is asynchronous.
