feat: fetch and store repo license via licensee IN-1105#4095

Open
gaspergrom wants to merge 7 commits into main from feat/IN-1105-fetch-store-repo-license-via-licensee

gaspergrom commented May 8, 2026

Summary

  • Adds license column (VARCHAR(255)) to public.repositories via a new migration
  • Installs the licensee Ruby gem (v9.15.3, the last version compatible with Ruby 2.7 on Debian Bullseye) in the git integration Docker image, along with libgit2 build and runtime deps required by the rugged gem
  • Implements LicenseService that runs licensee detect --json <repo_path> and extracts the SPDX identifier from the JSON output
  • Wires the service into the repository worker's first-batch hook, alongside the existing software-value and vulnerability-scanner calls
  • Persists the detected SPDX ID (e.g. MIT, Apache-2.0, BSD-3-Clause) to public.repositories.license via a new update_repository_license CRUD helper
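
The detection step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `extract_spdx_id` and `detect_license` are hypothetical, and the JSON shape (a top-level `licenses` array whose entries carry an `spdx_id` field) is an assumption about the output of `licensee detect --json`. Treating `NOASSERTION` and `NONE` as "no license" is likewise an assumption.

```python
import asyncio
import json
from typing import Optional

# Assumed placeholder values that should not be stored as a real license.
_PLACEHOLDER_IDS = ("NOASSERTION", "NONE")


def extract_spdx_id(raw_json: str) -> Optional[str]:
    """Return the first usable SPDX ID from licensee-style JSON, or None."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    # Assumption: detected licenses are listed under "licenses".
    for entry in payload.get("licenses") or []:
        spdx_id = entry.get("spdx_id") if isinstance(entry, dict) else None
        if spdx_id and spdx_id not in _PLACEHOLDER_IDS:
            return spdx_id
    return None


async def detect_license(repo_path: str) -> Optional[str]:
    """Run `licensee detect --json <repo_path>` and parse its stdout."""
    proc = await asyncio.create_subprocess_exec(
        "licensee", "detect", "--json", repo_path,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _stderr = await proc.communicate()
    # Assumption: a non-zero exit code is treated as "nothing detected".
    if proc.returncode != 0:
        return None
    return extract_spdx_id(stdout.decode("utf-8", errors="replace"))
```

Keeping the parsing in a pure function separate from the subprocess call makes it possible to unit-test the extraction without the licensee binary installed.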

Changes

  • backend/src/database/migrations/V1778154987__addLicenseToRepositories.sql — add license column
  • backend/src/database/migrations/U1778154987__addLicenseToRepositories.sql — undo migration
  • scripts/services/docker/Dockerfile.git_integration — install licensee v9.15.3 + libgit2 deps
  • services/apps/git_integration/src/crowdgit/services/license/license_service.py — new service
  • services/apps/git_integration/src/crowdgit/services/license/__init__.py — module init
  • services/apps/git_integration/src/crowdgit/services/__init__.py — export LicenseService
  • services/apps/git_integration/src/crowdgit/worker/repository_worker.py — wire service
  • services/apps/git_integration/src/crowdgit/database/crud.py — add update_repository_license
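
A helper along the lines of update_repository_license might look like the sketch below. The real signature in crud.py is not shown in this PR page, so everything here is an assumption: the injected `execute` callable, the `id` key column, and the `$1`/`$2` placeholder style are hypothetical. Only the table name, the `license` column, and the 255-character limit come from the description above.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

# Matches the VARCHAR(255) column added by the migration.
MAX_LICENSE_LENGTH = 255

# Assumption: repositories are keyed by an "id" column and the driver
# accepts positional "$n" placeholders.
UPDATE_LICENSE_SQL = "UPDATE public.repositories SET license = $1 WHERE id = $2"

# Hypothetical executor type: takes SQL text and parameters.
Executor = Callable[[str, Sequence[object]], Awaitable[None]]


async def update_repository_license(
    execute: Executor, repository_id: str, license_id: Optional[str]
) -> None:
    """Persist an SPDX ID (or NULL) for one repository."""
    if license_id is not None and len(license_id) > MAX_LICENSE_LENGTH:
        # Fail early instead of letting the database reject the value.
        raise ValueError("license identifier exceeds column length")
    await execute(UPDATE_LICENSE_SQL, (license_id, repository_id))
```

Passing the executor in as a parameter is a choice made for this sketch so the helper can be exercised with a fake; the actual helper may obtain its connection differently.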

Note

Medium Risk
Adds a new DB column and wires an external licensee binary into the git integration processing path, which could affect worker runtime behavior/image size and introduce new failure/performance modes during first-batch processing.
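
One way to limit the performance risk noted above is to bound the external call with a timeout. The sketch below is illustrative only and is not taken from this PR: `run_with_timeout` is a hypothetical helper, and whether the actual service applies a timeout is not stated here.

```python
import asyncio
import sys
from typing import Optional, Sequence


async def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> Optional[bytes]:
    """Run a command, returning stdout on success or None on failure/timeout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.DEVNULL,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Kill the child so a hung process cannot stall the worker.
        proc.kill()
        await proc.wait()
        return None
    return stdout if proc.returncode == 0 else None
```

For example, `await run_with_timeout(["licensee", "detect", "--json", repo_path], 60.0)` would give up after one minute; the 60-second value is arbitrary.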

Overview
Adds repository license persistence by introducing a nullable public.repositories.license column (with forward/undo migrations) and plumbing it through the data-access layer queries.

Extends the git-integration worker to install and run the Ruby licensee gem inside its Docker image, adds a new LicenseService that extracts an SPDX identifier from licensee detect --json, and updates the repository worker to detect/update the license on the first clone batch via a new update_repository_license CRUD helper.
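
The first-batch wiring described above could be sketched as follows. The names `run_license_step`, `detect`, and `persist` are hypothetical stand-ins for the worker hook, LicenseService, and update_repository_license. Swallowing detection errors so that they do not fail the batch is an assumption of this sketch, not confirmed behavior of the PR.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)

Detect = Callable[[str], Awaitable[Optional[str]]]
Persist = Callable[[str, str], Awaitable[None]]


async def run_license_step(
    repo_id: str,
    repo_path: str,
    is_first_batch: bool,
    detect: Detect,
    persist: Persist,
) -> Optional[str]:
    """Detect and store the license once, on the first clone batch only."""
    if not is_first_batch:
        return None
    try:
        spdx_id = await detect(repo_path)
    except Exception:
        # Assumption: license detection is best-effort and must not
        # abort the rest of the batch processing.
        logger.exception("license detection failed for %s", repo_id)
        return None
    if spdx_id is None:
        return None
    await persist(repo_id, spdx_id)
    return spdx_id
```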

Separately removes redundant try/catch wrappers in members enrichment activity helpers without changing behavior.

Reviewed by Cursor Bugbot for commit e51b77c.

@gaspergrom gaspergrom self-assigned this May 8, 2026
@gaspergrom gaspergrom requested review from Copilot and themarolt May 8, 2026 09:11
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

themarolt
themarolt previously approved these changes May 8, 2026

Copilot AI left a comment

Pull request overview

This PR adds repository license detection to the git integration pipeline and persists the detected SPDX identifier into the main public.repositories table, enabling downstream consumers to query repository license metadata.

Changes:

  • Adds a license column to public.repositories (with rollback migration).
  • Extends the git integration Docker image to install the licensee gem and its libgit2 build/runtime dependencies.
  • Introduces LicenseService (invokes licensee detect --json) and wires it into the repository worker’s first-batch processing, persisting results via a new CRUD helper.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| backend/src/database/migrations/V1778154987__addLicenseToRepositories.sql | Adds license column to public.repositories. |
| backend/src/database/migrations/U1778154987__addLicenseToRepositories.sql | Drops license column on rollback. |
| scripts/services/docker/Dockerfile.git_integration | Installs Ruby + licensee and required libgit2/toolchain deps in the git integration image. |
| services/apps/git_integration/src/crowdgit/services/license/license_service.py | New async service to execute licensee and parse SPDX from JSON output. |
| services/apps/git_integration/src/crowdgit/services/license/__init__.py | Exports LicenseService from the license service module. |
| services/apps/git_integration/src/crowdgit/services/__init__.py | Re-exports LicenseService at the services package level. |
| services/apps/git_integration/src/crowdgit/worker/repository_worker.py | Runs license detection on first clone batch and writes the result to DB. |
| services/apps/git_integration/src/crowdgit/database/crud.py | Adds update_repository_license helper to persist SPDX ID. |

Comment thread services/apps/git_integration/src/crowdgit/services/license/license_service.py Outdated
Comment thread services/apps/git_integration/src/crowdgit/database/crud.py Outdated
Comment thread services/libs/data-access-layer/src/repositories/index.ts
gaspergrom added 3 commits May 8, 2026 10:37
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
…N-1105

Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
@gaspergrom gaspergrom force-pushed the feat/IN-1105-fetch-store-repo-license-via-licensee branch from b02ba60 to 58d4968 Compare May 8, 2026 09:37
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Copilot AI review requested due to automatic review settings May 8, 2026 09:39

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Comment thread services/apps/git_integration/src/crowdgit/database/crud.py Outdated
Comment thread services/libs/data-access-layer/src/repositories/index.ts
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Comment thread services/apps/git_integration/src/crowdgit/database/crud.py
@gaspergrom gaspergrom requested a review from themarolt May 8, 2026 10:18
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Copilot AI review requested due to automatic review settings May 8, 2026 10:59

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Signed-off-by: Gašper Grom <gasper.grom@gmail.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit e51b77c.

4 participants