Agent Harness v2 Phase 1: Rebuild agent_harness on deepagents, following the DeerFlow structure

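## Background

The current `backend/app/agent_harness` in the PSOP project and its related objects (such as `AgentSpec`, `AgentRun`, `AgentEvent`, `AgentModelCall`, `AgentToolCall`, and `AgentPlanner`) were built on a fairly crude harness abstraction and are not suitable as a long-term foundation for agent development.

The goal of this issue is to start **Agent Harness v2 Phase 1**: adopt **deepagents** as the new core agent harness and, using its underlying LangChain + LangGraph capabilities, rebuild PSOP's agent development and runtime foundation inside the existing `backend/app/agent_harness` directory.

This issue covers only the rebuild of the agent-related implementation. PSOP business flows, Runtime Kernel, Execution Graph, Session Token, RunEvent, RunTrace, and similar components are out of scope for replacement.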

---

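## Key decisions

1. **Delete or rewrite the old `agent_harness` implementation**
   - Do not keep patching the old harness.
   - Do not keep the old `AgentPlanner` / `AgentLoopState` / `AgentDecision` / `AgentRunner` implementations as the basis of the new platform.
   - The directory name and import boundary may be kept, but the internals should be reorganized for v2.

2. **Do not add an `agent_platform_v2` directory**
   - All new implementation goes under: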

```text
backend/app/agent_harness/
```

3. **Directory structure and engineering layout must follow the DeerFlow harness**
   - Reference project:
     - https://github.com/bytedance/deer-flow/tree/main/backend/packages/harness/deerflow
   - The top-level layout of the DeerFlow harness includes:
     - `agents`
     - `community`
     - `config`
     - `guardrails`
     - `mcp`
     - `models`
     - `persistence`
     - `reflection`
     - `runtime`
     - `sandbox`
     - `skills`
     - `subagents`
     - `tools`
     - `tracing`
     - `uploads`
     - `utils`
     - `client.py`
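   - PSOP does not copy DeerFlow's full business semantics, but it should adopt a similar harness layering rather than designing a new generic `platform/backends/gateways/profiles` style structure.

4. **Agent Harness v2 must support an observable, auditable, replayable, and attributable agent execution loop**
   - The new harness must do more than "run an agent"; it must also provide structured evidence for the later `pskill.evaluator`, `psop.governance`, and test feedback loop.
   - Phase 1 does not require a full governance agent, but the evidence / artifact / trace / validation models must be reserved.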

---

## Design principles

1. **Business modules do not call deepagents directly**
   - PSOP business flows call specific agents, for example:
     - `pskill.builder`
     - `pskill.compiler`
     - `pskill.tester`
     - `pskill.runner`
     - `pskill.evaluator`
     - `psop.governance`
   - Each of these agents is implemented internally by the new `agent_harness` on top of deepagents.

2. **deepagents is responsible only for how an agent runs internally**
   - planning
   - subagents
   - filesystem / workspace
   - memory / context offloading
   - skills
   - streaming
   - HITL
   - LangGraph checkpoint / thread state

3. **PSOP remains responsible for platform contracts and business boundaries**
   - agent invocation contract
   - tool policy / authorization gateway
   - workspace / artifact mapping
   - output schema validation
   - trace / model call / tool call archiving
   - evidence-chain integration with Replay / Runtime / Governance

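4. **No compatibility with the old agent_harness implementation**
   - Breaking refactors are allowed.
   - The goal of Phase 1 is to rebuild, not to stay compatible with old interfaces.
   - The six agent types will be migrated to the new harness gradually afterwards.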

---

## DeerFlow reference points and PSOP mapping

### Reference points in the DeerFlow harness

- `client.py`: stable external client / facade that hides internal harness details.
- `agents/`: agent factory, lead agent, thread state, features, middlewares, agent memory, and so on.
- `runtime/`: runs, events, checkpointer, store, stream bridge, journal, serialization, user context, and so on.
- `tools/`: builtins, tool types, MCP metadata, skill management, tool sync, and so on.
- `models/`: model provider factory, provider patch, credential loader, payload replay, and so on.
- `persistence/`: persistence boundary for runtime state.
- `skills/`: skill loading / management.
- `subagents/`: subagent definition and scheduling.
- `sandbox/`: controlled execution environment.
- `guardrails/`: safety and output constraints.
- `tracing/`: execution flow, model calls, tool calls, event recording.
- `uploads/`: management of file and multimodal input objects.

### PSOP mapping principles

- DeerFlow's `runtime/` maps to the **Agent Harness runtime**, not the PSOP Runtime Kernel.
- DeerFlow's `tools/` maps to a **PSOP Tool Gateway wrapper**; every tool must still go through PSOP tool policy / authorization.
- DeerFlow's `uploads/` maps to the **agent workspace / artifact object bridge** and does not replace PSOP RunEventPart.
- DeerFlow's `skills/` maps to **PSOP SkillPackage -> deepagents skill backend**.
- DeerFlow's `tracing/` maps to the **agent invocation/execution trace**, which ultimately feeds the PSOP Replay evidence chain.
- DeerFlow's `models/` maps to the **PSOP model route/provider abstraction**; provider details are not exposed directly to business modules.

---

## Proposed directory structure

The new `backend/app/agent_harness` should be organized in the DeerFlow style, not split along the lines of the old harness or a generic platform/backend/gateway/profile layout.

```text
backend/app/agent_harness/
  __init__.py
  client.py

  agents/
    __init__.py
    factory.py
    features.py
    thread_state.py
    memory/
      __init__.py
    middlewares/
      __init__.py
    pskill_builder/
      __init__.py
    pskill_compiler/
      __init__.py
    pskill_tester/
      __init__.py
    pskill_runner/
      __init__.py
    pskill_evaluator/
      __init__.py
    psop_governance/
      __init__.py

  config/
    __init__.py
    agent_manifest.py
    profiles.py
    settings.py

  guardrails/
    __init__.py
    input.py
    output.py
    tool.py

  mcp/
    __init__.py
    metadata.py
    registry.py

  models/
    __init__.py
    factory.py
    provider.py
    credential_loader.py
    payload_replay.py

  persistence/
    __init__.py
    models.py
    repository.py
    schemas.py

  reflection/
    __init__.py

  runtime/
    __init__.py
    runs.py
    events.py
    checkpointer.py
    store.py
    stream_bridge.py
    converters.py
    journal.py
    serialization.py
    user_context.py

  sandbox/
    __init__.py
    workspace.py
    policy.py

  skills/
    __init__.py
    loader.py
    registry.py
    psop_adapter.py

  subagents/
    __init__.py
    registry.py
    definitions.py

  tools/
    __init__.py
    builtins/
      __init__.py
    mcp_metadata.py
    psop_gateway.py
    skill_manage_tool.py
    sync.py
    tools.py
    types.py

  tracing/
    __init__.py
    event_mapper.py
    stream_recorder.py
    model_invocations.py
    tool_invocations.py

  uploads/
    __init__.py
    artifact_bridge.py
    workspace_objects.py

  utils/
    __init__.py
```

### Structures to avoid

Do not adopt the following structures, which were suggested earlier without sufficient reference to DeerFlow:

```text
agent_harness/backends/
agent_harness/gateways/
agent_harness/manifests/
agent_harness/profiles/
```

These concepts may exist, but they should be folded into DeerFlow-style modules:

- backend adapters go into `runtime/`, `agents/factory.py`, and `client.py`.
- gateways go into `tools/psop_gateway.py`, `uploads/artifact_bridge.py`, and `skills/psop_adapter.py`.
- manifests/profiles go into `config/agent_manifest.py` and `config/profiles.py`.

---

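## Core data models

Phase 1 may define the following new models in `agent_harness/persistence/models.py` and create the tables through an Alembic migration.

### `agent_manifest`

Describes what a PSOP agent is, rather than exposing deepagents configuration directly.

Key fields: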

- `id`
- `agent_key`
- `name`
- `description`
- `status`
- `created_at`
- `updated_at`

### `agent_manifest_version`

Key fields:

- `id`
- `agent_manifest_id`
- `version`
- `backend`: `deepagents`
- `profile`: `long_horizon_builder | runtime_node_agent | evaluator | governance | demo | ...`
- `input_schema_json`
- `output_schema_json`
- `model_policy_json`
- `tool_policy_json`
- `workspace_policy_json`
- `memory_policy_json`
- `subagent_policy_json`
- `deepagents_config_json`
- `status`
- `created_at`

### `agent_invocation`

A single agent call from the PSOP business perspective.

Key fields:

- `id`
- `agent_key`
- `agent_manifest_version_id`
- `caller_kind`
- `caller_id`
- `runtime_run_id` nullable
- `input_payload_json`
- `context_refs_json`
- `status`: `queued | running | succeeded | failed | waiting_tool_authorization | cancelled`
- `output_payload_json`
- `error_message`
- `created_at`
- `started_at`
- `ended_at`

### `agent_execution`

The actual execution instance in the underlying harness.

Key fields:

- `id`
- `agent_invocation_id`
- `backend`: `deepagents`
- `backend_thread_id`
- `backend_checkpoint_id`
- `profile`
- `status`
- `usage_summary_json`
- `error_message`
- `created_at`
- `started_at`
- `ended_at`

### `agent_trace_event`

The PSOP archival projection of the deepagents / LangGraph stream.

Key fields:

- `id`
- `agent_invocation_id`
- `agent_execution_id`
- `seq_no`
- `event_kind`
- `source`: `deepagents | langgraph | psop_tool_gateway | psop_output_validator`
- `payload_json`
- `evidence_refs_json`
- `created_at`

Suggested event kinds:

```text
agent.started
agent.message.delta
agent.model.started
agent.model.completed
agent.tool.requested
agent.tool.completed
agent.subagent.started
agent.subagent.completed
agent.filesystem.read
agent.filesystem.write
agent.memory.loaded
agent.memory.write_candidate
agent.output.validated
agent.failed
```
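
The event kinds and the `seq_no` / `source` fields above can be sketched as a closed enum plus a small in-memory recorder. This is only an illustration under stated assumptions: `AgentEventKind`, `TraceEvent`, and `TraceRecorder` are hypothetical names, `seq_no` is assumed to start at 1 and increase per execution, and a real implementation would persist rows through `persistence/repository.py` instead of keeping a list.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count
from typing import Any, Optional


class AgentEventKind(str, Enum):
    """Suggested event kinds from this issue, as a closed set of strings."""
    STARTED = "agent.started"
    MESSAGE_DELTA = "agent.message.delta"
    MODEL_STARTED = "agent.model.started"
    MODEL_COMPLETED = "agent.model.completed"
    TOOL_REQUESTED = "agent.tool.requested"
    TOOL_COMPLETED = "agent.tool.completed"
    SUBAGENT_STARTED = "agent.subagent.started"
    SUBAGENT_COMPLETED = "agent.subagent.completed"
    FILESYSTEM_READ = "agent.filesystem.read"
    FILESYSTEM_WRITE = "agent.filesystem.write"
    MEMORY_LOADED = "agent.memory.loaded"
    MEMORY_WRITE_CANDIDATE = "agent.memory.write_candidate"
    OUTPUT_VALIDATED = "agent.output.validated"
    FAILED = "agent.failed"


# Allowed values of the `source` column, as listed in the key fields.
SOURCES = frozenset(
    {"deepagents", "langgraph", "psop_tool_gateway", "psop_output_validator"}
)


@dataclass(frozen=True)
class TraceEvent:
    agent_invocation_id: str
    agent_execution_id: str
    seq_no: int
    event_kind: AgentEventKind
    source: str
    payload: dict[str, Any] = field(default_factory=dict)
    evidence_refs: list[dict[str, Any]] = field(default_factory=list)


class TraceRecorder:
    """Hypothetical in-memory recorder that assigns a monotonic seq_no."""

    def __init__(self, agent_invocation_id: str, agent_execution_id: str) -> None:
        self._invocation_id = agent_invocation_id
        self._execution_id = agent_execution_id
        self._seq = count(1)  # assumption: sequence numbers start at 1
        self.events: list[TraceEvent] = []

    def record(
        self, kind: str, source: str, payload: Optional[dict[str, Any]] = None
    ) -> TraceEvent:
        # Validate before consuming a sequence number, so rejected events
        # leave no gap in seq_no.
        event_kind = AgentEventKind(kind)  # raises ValueError on unknown kinds
        if source not in SOURCES:
            raise ValueError(f"unknown source: {source}")
        event = TraceEvent(
            agent_invocation_id=self._invocation_id,
            agent_execution_id=self._execution_id,
            seq_no=next(self._seq),
            event_kind=event_kind,
            source=source,
            payload=payload or {},
        )
        self.events.append(event)
        return event
```

Rejecting unknown kinds at write time keeps the archived stream queryable by `event_kind` without string cleanup later; whether new kinds should be rejected or stored as-is is a design choice this issue leaves open.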

### `model_invocation`

Key fields:

- `id`
- `agent_invocation_id`
- `agent_execution_id`
- `trace_event_id`
- `provider`
- `model`
- `route`
- `input_summary_json`
- `output_summary_json`
- `usage_json`
- `latency_ms`
- `status`
- `error_message`
- `raw_request_ref`
- `raw_response_ref`
- `evidence_refs_json`
- `created_at`

### `tool_invocation`

Key fields:

- `id`
- `agent_invocation_id`
- `agent_execution_id`
- `trace_event_id`
- `tool_name`
- `tool_provider`
- `requested_args_summary_json`
- `side_effect_level`
- `policy_decision_json`
- `authorization_id`
- `status`
- `result_summary_json`
- `artifact_refs_json`
- `evidence_refs_json`
- `started_at`
- `ended_at`
- `created_at`

### `agent_workspace_object`

Maps the deepagents virtual filesystem to the PSOP artifact/object store.

Key fields:

- `id`
- `agent_invocation_id`
- `agent_execution_id`
- `path`
- `object_kind`
- `artifact_object_id`
- `content_hash`
- `size_bytes`
- `metadata_json`
- `artifact_refs_json`
- `evidence_refs_json`
- `created_at`

### `agent_memory_candidate`

Memory that an agent proposes to write; it does not become formal memory directly.

Key fields:

- `id`
- `agent_invocation_id`
- `agent_execution_id`
- `memory_scope`
- `candidate_payload_json`
- `evidence_refs_json`
- `status`: `candidate | approved | rejected | promoted`
- `created_at`

### `agent_output_validation`

Key fields:

- `id`
- `agent_invocation_id`
- `agent_execution_id`
- `schema_name`
- `passed`
- `diagnostics_json`
- `validated_payload_json`
- `evidence_refs_json`
- `created_at`

---

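## Observability, audit, and improvement-proposal loop requirements

Agent Harness v2 must turn agent execution into structured evidence rather than just saving ordinary logs. The later `pskill.evaluator`, `psop.governance`, and test feedback system need to generate improvement proposals from this evidence.

### Target optimization objects

Agent Harness v2 must reserve evidence and artifact reference capability for the following optimization objects:

- PSkill iteration
- Execution Graph / EG iteration
- test case iteration
- optimizing PSkill and EG together based on test case feedback
- generating governance proposals from run results, Replay, test failures, tool calls, model calls, and human feedback

### Loops to support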

```text
AgentInvocation / AgentExecution
  -> AgentTraceEvent / ModelInvocation / ToolInvocation / WorkspaceObject
  -> OutputValidation / TestResult / ReplayEvidence
  -> Evaluation / Attribution / Findings
  -> ImprovementProposal
  -> PSkill patch / EG patch / TestCase patch
  -> Re-run tests / dry-run / canary
  -> Compare before/after metrics
  -> Accept / reject / rollback
```

### Evidence refs requirements

`agent_trace_event.evidence_refs_json` and the `evidence_refs_json` fields in related models must be able to reference:

- agent invocation / execution
- model invocation
- tool invocation
- workspace object
- output validation
- runtime replay / run / run_trace / run_event
- pskill version / compile artifact
- test case / test run / test assertion
- evaluation / finding / governance proposal
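
One possible shape for an `evidence_refs_json` entry is a small typed reference that round-trips through JSON. The sketch below is an assumption, not a decided format: the `EvidenceRef` class, its `kind` / `ref_id` / `note` fields, and the kind names (derived from the list above) are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical kind names derived from the referenceable objects listed above.
EVIDENCE_KINDS = frozenset({
    "agent_invocation", "agent_execution", "model_invocation",
    "tool_invocation", "workspace_object", "output_validation",
    "runtime_replay", "run", "run_trace", "run_event",
    "pskill_version", "compile_artifact",
    "test_case", "test_run", "test_assertion",
    "evaluation", "finding", "governance_proposal",
})


@dataclass(frozen=True)
class EvidenceRef:
    kind: str        # what type of record is referenced
    ref_id: str      # primary key of the referenced record
    note: str = ""   # optional free-text reason for the reference

    def __post_init__(self) -> None:
        if self.kind not in EVIDENCE_KINDS:
            raise ValueError(f"unsupported evidence kind: {self.kind}")


def dump_evidence_refs(refs: list[EvidenceRef]) -> str:
    """Serialize refs to the JSON text stored in an evidence_refs_json column."""
    return json.dumps([asdict(ref) for ref in refs], sort_keys=True)


def load_evidence_refs(raw: str) -> list[EvidenceRef]:
    """Parse stored JSON back into validated EvidenceRef objects."""
    return [EvidenceRef(**item) for item in json.loads(raw)]
```

Keeping the reference as `(kind, ref_id)` rather than a foreign key lets one column point at records owned by other subsystems (Replay, tests, governance) without coupling the schemas.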

### Artifact refs requirements

`agent_workspace_object.artifact_refs_json` and `tool_invocation.artifact_refs_json` must later be able to point to:

- PSkill draft / patch
- EG compile artifact / patch
- test case files
- diagnostics report
- evaluation report
- governance proposal draft

### Attribution-ready trace requirements

`agent_trace_event`, `model_invocation`, and `tool_invocation` must keep enough fields for quality attribution:

- which agent
- which input/context
- which model/provider/route
- which tool and side-effect policy
- which output schema
- which validation diagnostics
- which test failed or passed
- which runtime evidence was used

### Proposal-ready output requirements

Output validation and tracing must support later agents generating structured improvement proposals, for example:

```text
ImprovementProposal
  target_kind: pskill | execution_graph | test_case | test_suite | agent_profile | tool_policy
  target_ref
  problem_statement
  evidence_refs
  proposed_changes
  risk_assessment
  required_tests
  activation_plan
  rollback_plan
```
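
For illustration, the proposal outline above could be carried as a typed record with a basic completeness check. The field names come from the outline; the field types, the `problems()` helper, and the rule that a proposal needs at least one evidence ref and a rollback plan are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any

# Allowed target kinds, taken from the outline above.
TARGET_KINDS = frozenset({
    "pskill", "execution_graph", "test_case",
    "test_suite", "agent_profile", "tool_policy",
})


@dataclass
class ImprovementProposal:
    target_kind: str
    target_ref: str
    problem_statement: str
    evidence_refs: list[dict[str, Any]]
    proposed_changes: list[dict[str, Any]]
    risk_assessment: str = ""
    required_tests: list[str] = field(default_factory=list)
    activation_plan: str = ""
    rollback_plan: str = ""

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means it looks complete."""
        issues: list[str] = []
        if self.target_kind not in TARGET_KINDS:
            issues.append(f"unknown target_kind: {self.target_kind}")
        if not self.evidence_refs:
            # Assumption: a proposal without evidence is not reviewable.
            issues.append("evidence_refs must not be empty")
        if not self.rollback_plan:
            # Assumption: every proposal should say how to undo it.
            issues.append("rollback_plan must not be empty")
        return issues
```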

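### Test feedback optimization requirements

Phase 1 does not implement the full test feedback loop, but the data structures must support the later flow: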

```text
Test failure
  -> evaluator attribution
  -> finding
  -> governance proposal
  -> PSkill / EG / TestCase patch
  -> re-test
  -> before/after metrics comparison
  -> approve / reject / rollback
```

---

## Core flow

```text
PSOP business module
  -> agent_harness.client.AgentHarnessClient.invoke(agent_key, input, context)
  -> agents.factory builds deepagents-backed agent from config/profile
  -> runtime.runs creates agent_invocation + agent_execution
  -> runtime.stream_bridge records stream events
  -> tools.psop_gateway handles every tool call
  -> uploads.artifact_bridge maps workspace files to PSOP artifacts
  -> guardrails.output validates output schema
  -> persistence.repository stores trace/model/tool/workspace/memory records
  -> returns typed PSOP agent result
```

Business modules do not call deepagents directly and do not depend on LangGraph thread/checkpoint details.
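
The facade boundary in the flow above can be sketched without touching deepagents at all. In the sketch below only the `AgentHarnessClient.invoke(agent_key, input, context)` shape comes from this issue; the `AgentBackend` protocol, `InvocationResult`, the factory registry, and `EchoBackend` are hypothetical stand-ins, and the real backend would be a deepagents-backed agent built by `agents/factory.py`.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Optional, Protocol


class AgentBackend(Protocol):
    """Hypothetical seam: anything that can run one agent call."""

    def run(self, agent_input: dict[str, Any]) -> dict[str, Any]: ...


@dataclass
class InvocationResult:
    invocation_id: str
    status: str  # "succeeded" or "failed" in this sketch
    output: Optional[dict[str, Any]]
    error_message: Optional[str] = None


class AgentHarnessClient:
    """The only entry point business modules see; backends stay hidden."""

    def __init__(self, factories: dict[str, Callable[[], AgentBackend]]) -> None:
        self._factories = factories
        # Stand-in for the agent_invocation table.
        self.invocations: dict[str, InvocationResult] = {}

    def invoke(
        self,
        agent_key: str,
        agent_input: dict[str, Any],
        context: Optional[dict[str, Any]] = None,
    ) -> InvocationResult:
        # `context` would be stored as context_refs_json; unused in this sketch.
        invocation_id = str(uuid.uuid4())
        factory = self._factories.get(agent_key)
        if factory is None:
            result = InvocationResult(
                invocation_id, "failed", None, f"unknown agent_key: {agent_key}"
            )
        else:
            try:
                output = factory().run(agent_input)
                result = InvocationResult(invocation_id, "succeeded", output)
            except Exception as exc:  # record the failure instead of leaking it
                result = InvocationResult(invocation_id, "failed", None, str(exc))
        self.invocations[invocation_id] = result
        return result


class EchoBackend:
    """Stub backend used only to exercise the facade."""

    def run(self, agent_input: dict[str, Any]) -> dict[str, Any]:
        return {"summary": agent_input["task"], "steps": ["echo"]}
```

A business module would then call `AgentHarnessClient({"psop.demo.deepagent": EchoBackend}).invoke("psop.demo.deepagent", {"task": "..."})` and receive a typed result, never a LangGraph thread or checkpoint handle.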

---

## Tool Gateway requirements

Every deepagents tool call must go through the PSOP Tool Gateway:

```text
deepagents tool call
  -> agent_harness.tools.psop_gateway
  -> PSOP policy check
  -> optional authorization
  -> actual tool executor
  -> result returned to deepagents
  -> tool_invocation persisted
```

Phase 1 only needs a minimal loop:

- three tool classes: read / compute / low_write
- high_write / external_action / physical_action may initially return `authorization_required` or `blocked_not_implemented`
- a full tool authorization UI is not required
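
A minimal sketch of that policy check, assuming each tool is registered with a side-effect level. The level names and the `authorization_required` / `blocked_not_implemented` statuses come from this issue; `ToolSpec`, `ToolInvocationRecord`, `PsopToolGateway`, and the `succeeded` / `failed` statuses are hypothetical, and the real gateway would persist `tool_invocation` rows rather than append to a list.

```python
from dataclasses import dataclass
from typing import Any, Callable

EXECUTABLE_LEVELS = frozenset({"read", "compute", "low_write"})
AUTHORIZATION_LEVELS = frozenset({"high_write", "external_action", "physical_action"})


@dataclass
class ToolSpec:
    name: str
    side_effect_level: str
    executor: Callable[..., Any]


@dataclass
class ToolInvocationRecord:
    tool_name: str
    side_effect_level: str
    status: str
    result: Any = None


class PsopToolGateway:
    """Every tool call passes through call(); nothing runs without a policy check."""

    def __init__(self, tools: list[ToolSpec]) -> None:
        self._tools = {tool.name: tool for tool in tools}
        self.records: list[ToolInvocationRecord] = []  # stand-in for persistence

    def call(self, tool_name: str, **args: Any) -> ToolInvocationRecord:
        spec = self._tools[tool_name]  # KeyError for unregistered tools
        level = spec.side_effect_level
        if level in EXECUTABLE_LEVELS:
            try:
                record = ToolInvocationRecord(
                    tool_name, level, "succeeded", spec.executor(**args)
                )
            except Exception as exc:
                record = ToolInvocationRecord(tool_name, level, "failed", str(exc))
        elif level in AUTHORIZATION_LEVELS:
            # Phase 1: do not execute, only report that authorization is needed.
            record = ToolInvocationRecord(tool_name, level, "authorization_required")
        else:
            record = ToolInvocationRecord(tool_name, level, "blocked_not_implemented")
        self.records.append(record)
        return record
```

Recording blocked and unauthorized calls alongside successful ones keeps the attempt itself available as evidence for later attribution.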

---

## Workspace / Uploads requirements

The deepagents virtual filesystem must not write directly to arbitrary host paths.

Phase 1 supports:

```text
/workspace/{agent_invocation_id}/**  read/write
/materials/**                        read-only, optional
/artifacts/**                        read-only or mapped object access, optional
```

Every written object must be mapped through `uploads/artifact_bridge.py` to an `agent_workspace_object` or a PSOP artifact object.
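
The path rules above can be checked on virtual paths before anything touches real storage. The following is a sketch under assumptions: `is_access_allowed` is a hypothetical helper, the invocation id is assumed not to contain `/`, and `/artifacts` is treated as read-only (the mapped-object-access variant is not modeled).

```python
import posixpath


def is_access_allowed(agent_invocation_id: str, virtual_path: str, mode: str) -> bool:
    """Return True if `mode` ("read" or "write") is allowed on the virtual path."""
    if mode not in ("read", "write"):
        raise ValueError(f"unknown mode: {mode}")
    if not virtual_path.startswith("/"):
        return False  # only absolute virtual paths are accepted

    # Collapse "." and ".." segments so traversal cannot escape a root.
    normalized = posixpath.normpath(virtual_path)

    def under(root: str) -> bool:
        return normalized == root or normalized.startswith(root + "/")

    if under(f"/workspace/{agent_invocation_id}"):
        return True  # the invocation's own workspace is read/write
    if under("/materials") or under("/artifacts"):
        return mode == "read"  # shared roots are read-only in this sketch
    return False
```

Normalizing before the prefix check is what stops a path such as `/workspace/{id}/../other/file` from reaching another invocation's workspace.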

---

## Output validation requirements

Every Agent Manifest Version must declare an output schema.

Phase 1 supports at least:

- JSON schema validation
- validation diagnostics persistence
- output validation event
- failed validation should mark invocation failed or return structured validation error
- validation diagnostics can be referenced as evidence refs by later evaluation / governance
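
To show the shape of the diagnostics that would be persisted in `diagnostics_json`, here is a standard-library sketch. It is not a JSON Schema implementation: it handles only `type`, `required`, `properties`, and `items`, and stands in for whichever JSON Schema library the project chooses. `validate_output` and `DEMO_OUTPUT_SCHEMA` are hypothetical names; the schema mirrors the demo agent output `{ "summary": string, "steps": string[] }` described in this issue.

```python
from typing import Any

# Type predicates for the small subset of JSON Schema types handled here.
_TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def _collect(value: Any, schema: dict[str, Any], path: str) -> list[dict[str, str]]:
    expected = schema.get("type")
    if expected is not None and not _TYPE_CHECKS[expected](value):
        # Stop descending once the type is wrong at this node.
        return [{"path": path, "message": f"expected {expected}"}]
    diagnostics: list[dict[str, str]] = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                diagnostics.append(
                    {"path": f"{path}.{key}", "message": "required property missing"}
                )
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                diagnostics.extend(_collect(value[key], sub_schema, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            diagnostics.extend(_collect(item, schema["items"], f"{path}[{index}]"))
    return diagnostics


def validate_output(payload: Any, schema: dict[str, Any]) -> dict[str, Any]:
    """Return a record shaped like agent_output_validation: passed + diagnostics."""
    diagnostics = _collect(payload, schema, "$")
    return {"passed": not diagnostics, "diagnostics": diagnostics}


DEMO_OUTPUT_SCHEMA = {
    "type": "object",
    "required": ["summary", "steps"],
    "properties": {
        "summary": {"type": "string"},
        "steps": {"type": "array", "items": {"type": "string"}},
    },
}
```

Each diagnostic carries a path into the payload, so a later finding can cite the exact failing field rather than the whole output.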

---

## Demo agent

Implement a minimal demo agent. Suggested:

```text
agent_key: psop.demo.deepagent
profile: demo
input: { "task": string }
output: { "summary": string, "steps": string[] }
```

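Acceptance targets:

- an invocation can be created through the new `agent_harness.client.AgentHarnessClient` or the API
- it can be executed by the deepagents-backed runtime
- `agent_invocation` is recorded
- `agent_execution` is recorded
- `agent_trace_event` is recorded
- at least one `model_invocation` is recorded
- if a tool is called, `tool_invocation` is recorded
- output validation completes
- the demo agent produces at least one referenceable output validation record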

---

## API suggestions

Phase 1 may provide a minimal API:

```text
POST /api/v1/agent-harness/invocations
GET  /api/v1/agent-harness/invocations/{id}
GET  /api/v1/agent-harness/invocations/{id}/events
GET  /api/v1/agent-harness/invocations/{id}/executions
GET  /api/v1/agent-harness/invocations/{id}/model-invocations
GET  /api/v1/agent-harness/invocations/{id}/tool-invocations
```

---

## Acceptance Criteria

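- [ ] Delete or rewrite the old `backend/app/agent_harness` implementation; do not keep depending on the old harness execution semantics.
- [ ] Do not add a `backend/app/agent_platform_v2` directory; all new implementation lives under `backend/app/agent_harness`.
- [ ] The module structure of the new `agent_harness` follows the top-level layout of the DeerFlow harness: `agents/config/guardrails/mcp/models/persistence/reflection/runtime/sandbox/skills/subagents/tools/tracing/uploads/utils/client.py`.
- [ ] Implementation docs explain the DeerFlow reference points, the PSOP mapping points, and the boundary of not copying DeerFlow business semantics.
- [ ] Add the Agent Harness v2 data models and base repository / service / schema implementations.
- [ ] Add an Alembic migration that creates the tables required for Phase 1.
- [ ] Implement `client.py` as the only entry point for business code, so that business modules do not depend on deepagents directly.
- [ ] Implement `agents/factory.py`, which builds a deepagents-backed agent from config/profile.
- [ ] Implement a minimal loop across `runtime/runs.py`, `runtime/events.py`, and `runtime/stream_bridge.py`.
- [ ] Implement `tools/psop_gateway.py`; deepagents tool calls must go through the PSOP gateway.
- [ ] Implement `uploads/artifact_bridge.py` or `sandbox/workspace.py` to prevent deepagents from writing directly to arbitrary host paths.
- [ ] Implement `guardrails/output.py` and persist the output validation result.
- [ ] `agent_trace_event`, `model_invocation`, `tool_invocation`, and `agent_workspace_object` all support evidence / artifact refs.
- [ ] Output validation diagnostics can be referenced as evidence by later evaluation / governance.
- [ ] Implement a minimal demo agent manifest/profile.
- [ ] Implement the invocation API.
- [ ] A demo agent invocation executes successfully and produces structured output.
- [ ] The demo agent produces at least one referenceable output validation record.
- [ ] At least one `agent_trace_event` is persisted.
- [ ] At least one `model_invocation` is persisted, or an equivalent call summary in mock model mode.
- [ ] The tool call path is testable; if the demo does not call tools, add a separate tool invocation unit test.
- [ ] Unit tests cover: config/profile loading, agent factory, runtime invocation, stream event persistence, output validation, tool gateway.
- [ ] Docs state that the new `agent_harness` does not replace Runtime Kernel / Execution Graph.
- [ ] Docs explain how Agent Harness v2 supports the optimization loop for PSkill / EG / TestCase.
- [ ] Docs make clear that Phase 1 only reserves the observability, audit, and evidence models and does not require a full governance agent.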

---

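## Non-goals / explicitly out of scope

- Do not migrate production agents such as `pskill.builder` and `pskill.runner`.
- Do not change the EG execution semantics of RuntimeService.
- Do not change Session Token / RunEvent / RunTrace.
- Do not implement a full HITL authorization UI.
- Do not implement full Replay UI integration.
- Do not implement a full governance agent.
- Do not implement the full test feedback loop.
- Do not add an `agent_platform_v2` directory.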

---

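## Later phases (for reference)

Phase 2: Rebuild `pskill.builder` on the new `agent_harness`, with support for PSkill draft / patch and evidence refs.

Phase 3: Rebuild `pskill.evaluator` and `psop.governance`, performing quality attribution and producing improvement proposals based on Replay, test results, and agent trace.

Phase 4: Rebuild `pskill.compiler` repair and `pskill.tester`, with support for EG patch, test case generation, and test failure attribution.

Phase 5: Rebuild `pskill.runner` with a restricted profile, integrated with Runtime / Execution Graph node execution.

Phase 6: Establish the test feedback loop: test failure -> evaluator finding -> governance proposal -> PSkill / EG / TestCase patch -> re-test -> before/after metrics comparison.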
