Sub-Issue 3: Add Source Metadata to Retrieved Documents #36

Description

@aleenaharoldpeter

As part of the IPCC integration, each retrieved chunk should carry metadata identifying its source. This will let us verify whether chatbot responses draw on CABook (Climate Academy), the IPCC reports, or both.

Since IPCC is intended to function as a fallback knowledge source, source attribution is important for validating that fallback retrieval is working correctly.

This metadata should be attached when documents are ingested and preserved throughout chunking, indexing, and retrieval.
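A minimal sketch of tagging at ingestion time and carrying the tag through chunking, assuming plain Python dataclasses and a fixed-size character chunker. The names `Document`, `Chunk`, `ingest`, `chunk_document`, and the corpus keys `"cabook"`/`"ipcc"` are hypothetical. The project's real loaders and text splitter may differ, but the key step is the same: copy the parent document's metadata into every chunk.

```python
from dataclasses import dataclass, field

# Hypothetical corpus keys mapped to the source labels required by this issue.
SOURCE_LABELS = {"cabook": "climate_academy", "ipcc": "ipcc"}


@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Chunk:
    content: str
    metadata: dict = field(default_factory=dict)


def ingest(text: str, corpus: str) -> Document:
    """Attach the source label when a document is first loaded."""
    if corpus not in SOURCE_LABELS:
        raise ValueError(f"unknown corpus: {corpus!r}")
    return Document(text=text, metadata={"source": SOURCE_LABELS[corpus]})


def chunk_document(doc: Document, size: int = 500, overlap: int = 50) -> list[Chunk]:
    """Split a document into overlapping character chunks, copying its metadata into each."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, max(len(doc.text), 1), step):
        piece = doc.text[start:start + size]
        if not piece:
            break
        # Copy the dict so editing one chunk's metadata cannot change its siblings.
        chunks.append(Chunk(content=piece, metadata=dict(doc.metadata)))
        if start + size >= len(doc.text):
            break
    return chunks
```

Copying the dict per chunk, rather than sharing one object across all chunks, keeps a later per-chunk annotation from silently leaking into the rest of the document.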

Source information should also be accessible during testing through localhost:8800/docs (e.g., via POST /ask) to simplify debugging before any frontend changes are made.

Tasks

  • Add source metadata during document ingestion:
      • CABook documents → climate_academy
      • IPCC documents → ipcc
  • Ensure source metadata is preserved during chunking.
  • Store source metadata in the vector database/index.
  • Ensure source metadata is returned with retrieved chunks.
  • Expose source information in API responses or debug output for testing.
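The issue does not name the vector store, so the sketch below uses a toy in-memory index (cosine similarity over caller-supplied embeddings) as a stand-in. It illustrates two of the tasks above: metadata is stored alongside each vector, and search results return it. `InMemoryIndex` and its methods are hypothetical. With a real store, the equivalent step is passing each chunk's metadata through whatever per-record metadata field that store provides.

```python
import math
from dataclasses import dataclass


@dataclass
class IndexedChunk:
    content: str
    embedding: list[float]
    metadata: dict


class InMemoryIndex:
    """Hypothetical toy vector index that keeps metadata next to each embedding."""

    def __init__(self) -> None:
        self._items = []

    def add(self, content: str, embedding: list[float], metadata: dict) -> None:
        if "source" not in metadata:
            # Fail at indexing time instead of discovering untagged chunks at query time.
            raise ValueError("chunk metadata must include 'source'")
        self._items.append(IndexedChunk(content, list(embedding), dict(metadata)))

    def search(self, query: list[float], k: int = 3) -> list[dict]:
        """Return the top-k chunks by cosine similarity, with their metadata."""

        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        ranked = sorted(self._items, key=lambda item: cosine(query, item.embedding), reverse=True)
        # Return metadata in the same shape as the Example chunks.
        return [{"content": item.content, "metadata": dict(item.metadata)} for item in ranked[:k]]
```

Rejecting chunks without a `source` at `add` time is a design choice: it surfaces ingestion bugs immediately rather than as missing attribution during testing.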

Example

Retrieved CABook chunk

{
  "content": "...",
  "metadata": {
    "source": "climate_academy"
  }
}

Retrieved IPCC chunk

{
  "content": "...",
  "metadata": {
    "source": "ipcc"
  }
}
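One way to expose source information through POST /ask is to return the retrieved chunks in the shape shown above, plus a deduplicated list of sources. This is a sketch under assumptions: `build_ask_response` and the `answer`, `chunks`, and `sources` response keys are hypothetical, and the endpoint is assumed to return a dict that gets serialized to JSON.

```python
import json


def build_ask_response(answer: str, retrieved: list[dict]) -> dict:
    """Build a hypothetical /ask payload that keeps per-chunk source metadata.

    Each item in `retrieved` is assumed to look like
    {"content": ..., "metadata": {"source": ...}}.
    """
    chunks = [
        {"content": c["content"], "metadata": dict(c.get("metadata", {}))}
        for c in retrieved
    ]
    # Deduplicate in retrieval order; "unknown" makes untagged chunks visible in testing.
    sources = list(dict.fromkeys(c["metadata"].get("source", "unknown") for c in chunks))
    return {"answer": answer, "chunks": chunks, "sources": sources}


if __name__ == "__main__":
    example = build_ask_response(
        "...",
        [
            {"content": "...", "metadata": {"source": "climate_academy"}},
            {"content": "...", "metadata": {"source": "ipcc"}},
        ],
    )
    print(json.dumps(example, indent=2))
```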

Testing

Using localhost:8800/docs, call POST /ask and:

  • Submit a query expected to use CABook content.
  • Verify that climate_academy appears in the returned source metadata.
  • Submit a query expected to require IPCC fallback.
  • Verify that ipcc appears in the returned source metadata.
  • Confirm that source information remains available when both sources contribute to a response.
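For the checks above, a small helper can classify a response as `climate_academy`, `ipcc`, or `both`. It assumes the /ask response contains a `chunks` list whose items carry `metadata.source` as in the Example section. `sources_used`, `ask`, and the `question` request field are hypothetical, so check the actual request and response schema at localhost:8800/docs before relying on `ask`.

```python
import json
import urllib.request


def sources_used(response: dict) -> str:
    """Classify a /ask response as 'climate_academy', 'ipcc', 'both', or 'none'.

    Assumes (hypothetically) a "chunks" list whose items look like
    {"content": ..., "metadata": {"source": ...}}.
    """
    found = {
        (chunk.get("metadata") or {}).get("source")
        for chunk in response.get("chunks", [])
    }
    has_cabook = "climate_academy" in found
    has_ipcc = "ipcc" in found
    if has_cabook and has_ipcc:
        return "both"
    if has_cabook:
        return "climate_academy"
    if has_ipcc:
        return "ipcc"
    return "none"


def ask(question: str, base_url: str = "http://localhost:8800") -> dict:
    """POST a question to the running API and return the decoded JSON body."""
    # The "question" field name is an assumption; match the request schema shown at /docs.
    request = urllib.request.Request(
        f"{base_url}/ask",
        data=json.dumps({"question": question}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the server running, `sources_used(ask(...))` should return `"climate_academy"` for a query CABook covers, and `"ipcc"` or `"both"` for a query that needs the IPCC fallback.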

Acceptance Criteria

  • Retrieved chunks contain source metadata.
  • Source metadata survives ingestion, chunking, indexing, and retrieval.
  • Source information can be inspected through localhost:8800/docs.
  • Developers can determine whether a response used climate_academy, ipcc, or both.

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests