nes-datagen: capture cross-file next edits (xtab-cross-file sample task) #323168

Open: g1910 wants to merge 2 commits into microsoft:main from g1910:gamit/xtab-cross-file-datagen


g1910 commented Jun 26, 2026


Summary

Extends the nes-datagen simulation pipeline to produce cross-file next-edit
training samples via a new xtab-cross-file sample task. Until now the pipeline only
labeled the next edit inside the active file; this captures edits that span multiple files.

For each recording the pipeline now:

  • Buckets post-request edits per document and composes per-file replacements.
  • Resolves each target file's request-time content.
  • Emits a multi-file CustomDiffPatch response (one filename:line patch block per touched file).
  • Records targetFiles / targetFilePaths / isCrossFile in the sample metadata,
    ordered to match the response blocks.
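
The first step above (bucketing edits per document) can be sketched roughly as follows. This is a minimal illustration, not the code in processor.ts: the `RecordedEdit` and `FileEdits` shapes and the `bucketEditsByFile` name are hypothetical, and composing each bucket into a single replacement is left out.

```typescript
// Hypothetical shapes for illustration only; the real types live in the pipeline.
interface RecordedEdit {
	filePath: string; // document the edit applies to
	start: number;    // start offset in that document
	end: number;      // exclusive end offset
	newText: string;  // replacement text
}

interface FileEdits {
	filePath: string;
	edits: RecordedEdit[];
}

// Group edits by file. A file's bucket is created the first time the file is
// seen, so the returned array lists files in first-touch order.
function bucketEditsByFile(edits: readonly RecordedEdit[]): FileEdits[] {
	const ordered: FileEdits[] = [];
	const byPath = new Map<string, FileEdits>();
	for (const edit of edits) {
		let bucket = byPath.get(edit.filePath);
		if (!bucket) {
			bucket = { filePath: edit.filePath, edits: [] };
			byPath.set(edit.filePath, bucket);
			ordered.push(bucket);
		}
		bucket.edits.push(edit);
	}
	return ordered;
}

const buckets = bucketEditsByFile([
	{ filePath: 'src/a.ts', start: 0, end: 0, newText: 'x' },
	{ filePath: 'src/b.ts', start: 3, end: 5, newText: 'y' },
	{ filePath: 'src/a.ts', start: 1, end: 1, newText: 'z' },
]);
console.log(buckets.map(b => `${b.filePath}:${b.edits.length}`).join(' '));
```

In this example src/a.ts is touched first, so its bucket comes first and holds two edits, followed by src/b.ts with one.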

A new --patch-order option (first-touch (default) | anchor-first) controls per-file
block ordering.
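
One plausible reading of the two ordering policies is sketched below. It assumes that anchor-first moves the active (anchor) file's block to the front and leaves the remaining files in first-touch order, and that the input list already arrives in first-touch order. The `parsePatchOrder` and `orderTargetsSketch` names and the `TargetFile` shape are hypothetical; this is not the PR's orderTargetFiles.

```typescript
type PatchOrder = 'first-touch' | 'anchor-first';

// Hypothetical minimal shape; the real target entries carry more fields.
interface TargetFile {
	relativePath: string;
}

// Validate the raw option value, defaulting to first-touch when it is absent.
function parsePatchOrder(value: string | undefined): PatchOrder {
	if (value === undefined) {
		return 'first-touch';
	}
	if (value === 'first-touch' || value === 'anchor-first') {
		return value;
	}
	throw new Error(`Invalid --patch-order "${value}"; expected first-touch or anchor-first`);
}

// Assumption: `targets` is already in first-touch order.
function orderTargetsSketch<T extends TargetFile>(
	targets: readonly T[],
	activeFilePath: string,
	order: PatchOrder,
): T[] {
	if (order === 'first-touch') {
		return targets.slice();
	}
	// anchor-first: the active file's entry leads; the rest keep their relative order.
	const anchor = targets.filter(t => t.relativePath === activeFilePath);
	const rest = targets.filter(t => t.relativePath !== activeFilePath);
	return anchor.concat(rest);
}

const touched: TargetFile[] = [
	{ relativePath: 'b.ts' },
	{ relativePath: 'a.ts' },
	{ relativePath: 'c.ts' },
];
const anchorFirst = orderTargetsSketch(touched, 'a.ts', 'anchor-first').map(t => t.relativePath);
const firstTouch = orderTargetsSketch(touched, 'a.ts', 'first-touch').map(t => t.relativePath);
```

Under these assumptions, `anchorFirst` is a.ts, b.ts, c.ts, while `firstTouch` keeps the recorded order b.ts, a.ts, c.ts.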

What changed

Pipeline:

  • test/base/simulationOptions.ts: xtab-cross-file task + --patch-order option, with validation and help text.
  • test/pipeline/alternativeAction/processor.ts, types.ts: bucket multi-file edits into per-file fileEdits[].
  • test/pipeline/replayRecording.ts: expose each target's request-time content (targetFileEdits).
  • test/pipeline/responseStep.ts: multi-file CustomDiffPatch generation (anchor-only fallback when not cross-file).
  • test/pipeline/pipeline.ts: task dispatch + orderTargetFiles (first-touch / anchor-first).
  • test/pipeline/output.ts: discriminated-union SampleClassification arm carrying cross-file metadata.
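
For readers unfamiliar with the pattern, a discriminated-union arm for cross-file metadata might look roughly like the sketch below. The `kind` discriminant, the `SampleClassificationSketch` name, and the field types are assumptions for illustration; the actual SampleClassification in output.ts may differ, and targetFiles is omitted because its shape is not stated in this description.

```typescript
// Hypothetical sketch of a discriminated union; not the real SampleClassification.
type SampleClassificationSketch =
	| { kind: 'single-file' }
	| {
		kind: 'cross-file';
		targetFilePaths: string[]; // assumed: ordered to match the response blocks
		isCrossFile: boolean;
	};

// Switching on the discriminant narrows the type, so the cross-file fields
// are only accessible in the arm that actually carries them.
function describeClassification(c: SampleClassificationSketch): string {
	switch (c.kind) {
		case 'single-file':
			return 'single-file sample';
		case 'cross-file':
			return `cross-file sample: ${c.targetFilePaths.join(', ')}`;
	}
}

const described = describeClassification({
	kind: 'cross-file',
	targetFilePaths: ['a.ts', 'b.ts'],
	isCrossFile: true,
});
```

The benefit of this shape is that existing single-file consumers cannot accidentally read cross-file fields that are not present.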

Tests:

  • test/pipeline/test/xtabCrossFilePipeline.e2e.spec.ts + test/pipeline/test/fixtures/xtabCrossFileFixtureData.ts
    — synthetic, oracle-based e2e covering the multi-file label, --patch-order ordering,
    cross-file metadata, and the xtab single-file fallback. No model/network calls.

How to test

From extensions/copilot/:

npx vitest run test/pipeline --pool=forks

All pipeline suites pass (87 passed, 1 skipped), including the new xtab-cross-file cases.

Generate cross-file samples from a recording:

npm run simulate -- --config-file=config.json nes-datagen --input=data.jsonl --sample-task=xtab-cross-file --patch-order=anchor-first

(--config-file must include ...xtabProvider.modelConfiguration; output defaults to <input>_output.jsonl.)

Notes

  • Backward compatible: default sample task remains xtab; existing xtab / cursor-* tasks are unchanged.
  • No recordings/telemetry or model config are included in this PR.

Copilot AI review requested due to automatic review settings June 26, 2026 17:05
g1910 added 2 commits June 26, 2026 11:11
Previously the pipeline only labeled same-file next edits. This adds support for cross-file targets: bucket post-request edits per document, compose per-file replacements, resolve each target file's request-time content, and generate multi-file CustomDiffPatch responses. Adds an optional --patch-order (first-touch|anchor-first) option.
…ta by --patch-order

Synthetic, oracle-based e2e (xtabCrossFilePipeline.e2e.spec.ts + xtabCrossFileFixtureData.ts) covering the multi-file CustomDiffPatch label, --patch-order block ordering (first-touch vs anchor-first), targetFiles/targetFilePaths/isCrossFile metadata, and the xtab single-file fallback. Also makes buildXtabCrossFileClassification take the patch-ordered target list so metadata matches the label block order.
@g1910 g1910 force-pushed the gamit/xtab-cross-file-datagen branch from a68c2f4 to aa63015 Compare June 26, 2026 17:11

Copilot AI left a comment


Pull request overview

This pull request extends the nes-datagen training-data simulation pipeline in extensions/copilot/test/pipeline/ to support a new xtab-cross-file sample task that can label the “next edit” across multiple files, including ordering control via --patch-order, and emits corresponding per-file metadata.

Changes:

  • Adds xtab-cross-file task selection + --patch-order option to simulation CLI options.
  • Buckets post-request edits across files, reconstructs request-time content per file, and generates multi-file CustomDiffPatch assistant labels.
  • Adds an end-to-end test and synthetic fixture data to validate multi-file labeling, ordering policies, and the single-file xtab fallback.
Summary per file

| File | Description |
| --- | --- |
| extensions/copilot/test/base/simulationOptions.ts | Adds xtab-cross-file task and --patch-order option parsing/help text. |
| extensions/copilot/test/pipeline/alternativeAction/types.ts | Extends NextUserEdit to include per-file composed edits (fileEdits). |
| extensions/copilot/test/pipeline/alternativeAction/processor.ts | Buckets post-request edits by file, composes per-file replacements, preserves first-touch order. |
| extensions/copilot/test/pipeline/replayRecording.ts | Captures targetFileEdits with request-time content per touched file for label generation. |
| extensions/copilot/test/pipeline/responseStep.ts | Introduces multi-file response generation for CustomDiffPatch labels. |
| extensions/copilot/test/pipeline/pipeline.ts | Dispatches xtab-cross-file, orders target files (first-touch/anchor-first), wires response + metadata classification. |
| extensions/copilot/test/pipeline/output.ts | Adds discriminated-union classification payload for cross-file metadata. |
| extensions/copilot/test/pipeline/test/xtabCrossFilePipeline.e2e.spec.ts | New e2e test validating multi-file label ordering + metadata + xtab fallback. |
| extensions/copilot/test/pipeline/test/fixtures/xtabCrossFileFixtureData.ts | Synthetic fixture recordings for cross-file and anchor-only scenarios. |

Review details

  • Files reviewed: 9/9 changed files
  • Comments generated: 2
  • Review effort level: Low

Comment on lines +433 to +438
    for (const file of nonEmptyFiles) {
        const result = generateResponse(responseFormat, file.edit, file.docContent, file.filePath, input.userPrompt, log);
        if (!('error' in result) && result.assistant) {
            blocks.push(result.assistant);
        }
    }
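
To make the behavior of the quoted loop easier to reason about in isolation, here is a self-contained sketch of the same pattern with a stubbed generator. The `BlockResult` and `FileInput` shapes, the `collectBlocks` name, and the stub are hypothetical; the real generateResponse takes more arguments and produces actual CustomDiffPatch text.

```typescript
// Hypothetical result shape: either an error, or an optional assistant block.
type BlockResult = { error: string } | { assistant?: string };

interface FileInput {
	filePath: string;
	docContent: string;
}

// Collect one block per file, skipping files whose generation failed or
// produced no assistant text. Skipped files leave no trace in the output.
function collectBlocks(
	files: readonly FileInput[],
	generate: (file: FileInput) => BlockResult,
): string[] {
	const blocks: string[] = [];
	for (const file of files) {
		const result = generate(file);
		if (!('error' in result) && result.assistant) {
			blocks.push(result.assistant);
		}
	}
	return blocks;
}

// Stub generator for illustration: fails on empty documents.
const stubGenerate = (file: FileInput): BlockResult =>
	file.docContent === ''
		? { error: 'empty document' }
		: { assistant: `<patch for ${file.filePath}>` };

const collected = collectBlocks(
	[
		{ filePath: 'a.ts', docContent: 'const a = 1;' },
		{ filePath: 'b.ts', docContent: '' },
		{ filePath: 'c.ts', docContent: 'const c = 3;' },
	],
	stubGenerate,
);
```

With this stub, the failing b.ts is dropped, so `collected` holds two blocks while three files were requested. Any metadata listing target files would need to account for such skips to stay aligned with the emitted blocks.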
Comment on lines +186 to 190
    const targets = crossFile
        ? orderTargetFiles(p.targetFileEdits, p.activeFilePath, patchOrder)
        : p.targetFileEdits.filter(f => f.relativePath === p.activeFilePath);
    const files = targets.map(f => ({ filePath: f.relativePath, docContent: f.docContent, edit: f.edit }));
    responseInputs.push({