fix: strengthen Git repository URL validation (#22) #62
1ebc191 to 3c3f023
This PR has been updated based on the review comments. The main adjustments are as follows:
keting
left a comment
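[PRD / user documentation needs updating]
The current README and Quick Start only say "Git repository URL must be accessible from container", for example at README.md:182, README.zh-CN.md:166, docs/quickstart.md:92, and docs/quickstart.zh-CN.md:90.
PR #62 changes this to more explicit product behavior: only repository root addresses or clone URLs are accepted; issues/pull/tree/blob page URLs, unsafe protocols such as http/file/ext::, URLs carrying credentials/query/fragment, and local/intranet addresses are rejected. The user documentation should add these rules with examples; otherwise users will still assume that "accessible from the container" is enough.
In addition, the current code allows git_repo_url to be empty and allows it to be cleared on update. If this is intentional, the documentation should also state that "the repository address may be left blank for now, but polling/task detection will be limited". If the product requires a repository address when creating a project, the implementation in this PR needs to be tightened. (Presumably this should not be allowed to be empty?)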
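[Tech Spec / architecture documentation needs updating]
The "key behavior constraints" at docs/architecture.md:186 currently contain no Git URL validation rules and only cover path safety. I suggest adding a "project Git repository address validation" item:
- Both frontend and backend validate; the backend is the last line of defense.
- Supported forms: https://host/org/repo(.git), ssh://user@host/org/repo.git, git@host:org/repo.git.
- For known Web Git hosts such as GitHub/GitLab/Bitbucket/Codeberg/Gitee, a repository root address without .git is allowed.
- For unknown hosts, the repo path must end with .git.
- Reject page URLs, query/fragment, credentials, private/local/metadata IPs, dangerous protocols, and leading dashes.
- This is format and safety validation, not a repository existence check.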
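The last point is important: the PR description says "verifies repository accessibility with non-interactive git ls-remote", but the PR tests explicitly include test_create_project_accepts_valid_but_unverified_repo_url, which means a repository URL that is validly formatted but nonexistent or unverified can still be saved. So the documentation should not say "repository existence or accessibility is verified at creation time"; if the PRD really requires that, the code and tests need to change.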
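The shape rules proposed above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: every name (validate_repo_url_shape, KNOWN_WEB_GIT_HOSTS, STRICT_ROOT_HOSTS, PAGE_SEGMENTS) is hypothetical, the host list is only the one named in the review, and the "exactly owner/repo" rule for non-GitLab hosts is taken from the PR summary. Private/local IP rejection is left out here, and nothing contacts the remote.

```python
import re
from urllib.parse import urlsplit

# All names below are hypothetical; only the rules come from the review and PR text.
KNOWN_WEB_GIT_HOSTS = {"github.com", "gitlab.com", "bitbucket.org", "codeberg.org", "gitee.com"}
# Hosts whose non-.git repository root must be exactly owner/repo (GitLab allows subgroups).
STRICT_ROOT_HOSTS = KNOWN_WEB_GIT_HOSTS - {"gitlab.com"}
PAGE_SEGMENTS = {"issues", "pull", "tree", "blob"}
# SCP-style remote: user@host:path (for example git@host:org/repo.git).
SCP_STYLE = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]*@(?P<host>[A-Za-z0-9.-]+):(?P<path>[^:]+)$")


def validate_repo_url_shape(url: str) -> None:
    """Raise ValueError when url is not an acceptable Git remote address.

    Shape and safety only: the repository is never checked for existence.
    """
    url = url.strip()
    if not url:
        raise ValueError("repository URL is required")
    if url.startswith("-"):
        raise ValueError("leading dash is not allowed")
    if "?" in url or "#" in url:
        raise ValueError("query and fragment are not allowed")

    scp = SCP_STYLE.match(url)
    if scp:
        host, path = scp["host"].lower(), scp["path"]
    else:
        parts = urlsplit(url)
        if parts.scheme not in ("https", "ssh"):
            raise ValueError("use https://, ssh://, or git@host:org/repo.git")
        # ssh:// may carry a user name; passwords and HTTPS user info are rejected.
        if parts.password is not None or (parts.scheme == "https" and parts.username):
            raise ValueError("embedded credentials are not allowed")
        host, path = parts.hostname or "", parts.path
    if not host:
        raise ValueError("host is missing")

    segments = path.strip("/").split("/")
    if len(segments) < 2 or any(not s or s.startswith("-") for s in segments):
        raise ValueError("owner/repo path is missing or malformed")
    if PAGE_SEGMENTS & set(segments[2:]):
        raise ValueError("repository page URLs are not allowed")

    ends_with_git = segments[-1].endswith(".git") and len(segments[-1]) > len(".git")
    if host not in KNOWN_WEB_GIT_HOSTS and not ends_with_git:
        raise ValueError("unknown hosts require a path ending in .git")
    if host in STRICT_ROOT_HOSTS and not ends_with_git and len(segments) != 2:
        raise ValueError("expected a root owner/repo path")


def is_valid_repo_url(url: str) -> bool:
    """Boolean wrapper around validate_repo_url_shape."""
    try:
        validate_repo_url_shape(url)
    except ValueError:
        return False
    return True
```

Under these assumptions, is_valid_repo_url("https://github.com/org/repo") passes while is_valid_repo_url("https://github.com/org/repo/issues/1") fails, and a well-formed URL for a repository that does not exist still passes, which matches the last bullet.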
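This change allows the repository address to be empty; testing showed that polling can then only wait passively for a timeout. This should be tightened so that a repository address is required when creating or modifying a project.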
Adjusted based on the review:
Further tightened based on the two rounds of review comments. The main changes are as follows:



Summary
Strengthen Git repository URL validation for project create/update flows.
This change is needed because invalid repository addresses, missing owner/repo paths, repository-internal page URLs, unsafe protocols, local/private network targets, query/fragment URLs, and embedded credentials could previously be saved as project repository URLs.
The change adds stricter Git remote URL validation on both backend and frontend. It accepts repository root URLs and clone URLs, including HTTPS, ssh://, and SCP-style Git URLs. It rejects missing-protocol inputs, missing repository path information, repository-internal pages, unsafe protocols or prefixes, query/fragment values, embedded credentials or tokens, leading-dash inputs, invalid hostnames, localhost/private/link-local/metadata targets, non-standard private IP forms, and IPv4-mapped private addresses.
For known Web Git hosts, validation now applies host-specific root path rules: GitHub, Gitee, Bitbucket, and Codeberg require a root owner/repo path when using a non-.git URL, while GitLab subgroup repository roots remain supported.
Project git_repo_url is now required for both create and update flows. Users cannot create a project without a repository URL or clear the repository URL on edit.
This PR validates URL shape and safety only. It does not perform remote repository reachability checks during project create/update, does not limit HALF to GitHub-only repositories, and does not automatically clean existing historical invalid repository URLs. If a repository is validly shaped but does not exist or cannot be accessed, the later polling/git clone step shows a user-facing access error and HALF retries automatically.
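As an illustration of the localhost/private/link-local/metadata rejection described in this summary, one possible host check using Python's standard ipaddress module is sketched below. The function names and the numeric-host heuristic are assumptions made for this example, not the PR's code, and the sketch does no DNS resolution, so a public hostname that resolves to a private address is not caught.

```python
import ipaddress
import re

# Hosts made only of decimal or hex numeric labels, e.g. "127.1", "2130706433", "0x7f.0.0.1".
# Hypothetical heuristic for the "non-standard private IP forms" mentioned in the summary.
NUMERIC_HOST = re.compile(r"(0x[0-9a-f]+|\d+)(\.(0x[0-9a-f]+|\d+))*")


def is_unsafe_host(host: str) -> bool:
    """Return True when host names a local, private, or otherwise non-public target."""
    host = host.strip().strip("[]").lower().rstrip(".")
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a canonical IP literal. A purely numeric host is treated as a
        # non-standard IP spelling and rejected; anything else is a hostname.
        return NUMERIC_HOST.fullmatch(host) is not None
    # Unwrap IPv4-mapped IPv6 addresses such as ::ffff:10.0.0.1 before classifying.
    mapped = getattr(addr, "ipv4_mapped", None)
    if mapped is not None:
        addr = mapped
    # 169.254.169.254 (a common cloud metadata address) falls under link-local.
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_unspecified
        or addr.is_reserved
        or addr.is_multicast
    )
```

With this sketch, is_unsafe_host("::ffff:192.168.0.1") and is_unsafe_host("127.1") both return True, while is_unsafe_host("github.com") returns False. Rejecting every purely numeric host is deliberately conservative: it avoids re-implementing how different resolvers interpret short or hex IP spellings.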
User-facing and architecture documentation were updated to describe the accepted URL forms, rejected URL forms, required repository URL behavior, private repository credential guidance, host-specific root path rules, and the fact that save-time validation does not verify repository existence.
Closes #22
Testing
- python src\backend\tests\test_git_service.py
- python src\backend\tests\test_project_agent_availability.py
- python src\backend\tests\test_polling_service.py
- python -m unittest discover -s src\backend\tests
- cd src/frontend && npm test -- ProjectNewPage.test.ts
- cd src/frontend && npm test
- cd src/frontend && npm run build
Checklist