ETT-1459: Record first ingest date#185

Open
aelkiss wants to merge 4 commits into main from ETT-1459-first-ingest-date

Conversation

@aelkiss aelkiss commented Jun 4, 2026

Member

This change moves recording items in feed_audit to the end of the Collate stage, rather than leaving it to a particular storage class.

It also takes the opportunity to:

  • clean up some confusing configuration regarding link_dir and obj_dir
  • improve testing functionality around temporary directories
  • remove LinkedPairtree entirely (since we no longer deposit using symlinked anything)
  • log how long each collate operation takes (ETT-824); add additional fields to info & warn level logging in collate
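As a rough illustration of the duration logging mentioned in the last bullet, here is a minimal sketch that times one operation with the core Time::HiRes module. The `timed` helper and the log line format are assumptions for illustration only; they are not HTFeed's actual Stage or logging API.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical wrapper (not part of HTFeed): run one operation and return
# its result together with the elapsed wall-clock seconds, so the caller
# can include the duration in a log message.
sub timed {
    my ($operation) = @_;
    my $start   = time();
    my $result  = $operation->();
    my $elapsed = time() - $start;
    return ($result, $elapsed);
}

# Stand-in for a collate run; the real stage would do the deposit here.
my ($ok, $seconds) = timed(sub { return 1 });
printf "collate finished: ok=%s duration=%.3fs\n", $ok, $seconds;
```

Returning the duration rather than logging inside the helper leaves the choice of log level and extra fields to the caller.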

See the comments on each commit for more detail.

I had looked into options for rolling back failed deposits to S3, but the ideas I had didn't work out (see ETT-1483).

aelkiss added 3 commits June 4, 2026 15:25
* add first ingest date column to feed_audit table
* record item in feed_audit at the end of collate
* remove record_audit functionality from LocalPairtree (now unused except in
  development); emit warning (could record to feed_storage for
  consistency if we want instead, but we aren't really using it..?)
* testing with storage classes in collate is a bit messy because of the
  distinction between depositing to the repo and reading back from the
  repo
* add additional logging options in Stage (need to DRY out though)
* additional logging for collate (should log duration; see ETT-824)
* add some notes towards ETT-1687
* Mock depositing item for collate tests with mocked storage
This addresses two issues:
* We are no longer using symlinks to deposit material into the repository.
* When we read from the repo, we just care about the root of the repo
  (just like e.g. babel apps reading from the repo), not about any
  symlinks, etc.

Specific changes:
* remove LinkedPairtree
* remove "repository" key in config & references to link_dir / obj_dir;
  replace with a "repository_root" key
* TempDirs keeps track of what it creates; callers can create additional
  temp dirs that will get cleaned up at the end of a test.
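The last item, temp-dir tracking, can be sketched roughly as follows. This is a minimal stand-in built on the core File::Temp and File::Path modules; the `Sketch::TempDirs` package, its `dir` and `cleanup` methods, and the labels are hypothetical, and the real TempDirs test helper may be structured differently.

```perl
use strict;
use warnings;
use File::Temp ();
use File::Path ();

# Hypothetical stand-in for the test helper; not the real TempDirs class.
package Sketch::TempDirs;

sub new {
    my ($class) = @_;
    return bless { dirs => [] }, $class;
}

# Create a temp dir under the system temp location and remember it,
# so cleanup can remove it later.
sub dir {
    my ($self, $label) = @_;
    my $dir = File::Temp::tempdir("sketch-$label-XXXXXX", TMPDIR => 1);
    push @{ $self->{dirs} }, $dir;
    return $dir;
}

# Remove every directory this object created, then forget them.
sub cleanup {
    my ($self) = @_;
    File::Path::remove_tree(@{ $self->{dirs} }) if @{ $self->{dirs} };
    $self->{dirs} = [];
    return;
}

package main;

my $tmpdirs   = Sketch::TempDirs->new;
my $repo_root = $tmpdirs->dir('repository_root');   # label is illustrative
my $extra     = $tmpdirs->dir('extra');             # caller-requested dir
# ... a test would run against $repo_root and $extra here ...
$tmpdirs->cleanup;
```

Keeping the list of created directories in one object means a test only needs a single cleanup call, no matter how many extra directories it asked for.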
@aelkiss aelkiss requested a review from moseshll June 4, 2026 20:09
Comment thread etc/ingest.sql
  `id` varchar(30) NOT NULL,
  `sdr_partition` tinyint(4) DEFAULT NULL,
  `zip_size` bigint(20) DEFAULT NULL,
+ `first_ingest_date` datetime NULL DEFAULT CURRENT_TIMESTAMP,

@aelkiss aelkiss Jun 4, 2026

Member Author


This is the change that records the first ingest date. The application side doesn't need to handle it directly at all, beyond making sure that something is recorded in feed_audit.

Comment thread lib/HTFeed/Stage.pm Outdated
Comment thread lib/HTFeed/Storage.pm
  my $self = shift;

- return $self->{volume}->get_zip_path(get_config('staging', 'zipfile')) . '.gpg';
+ return $self->{volume}->get_zip_path(get_config('staging', 'zipfile')) . "-$self->{name}.gpg";

Member Author


This avoids collisions with encrypted zips left over from other storages. They should get cleaned up but don't always in practice.
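The effect of the new suffix can be sketched in isolation. The `encrypted_zip_path` helper, the staging path, and the storage names below are all made up for illustration; the real method reads the zip path and storage name from `$self`.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTFeed): build the encrypted zip path
# for one storage by appending the storage name before the .gpg extension.
sub encrypted_zip_path {
    my ($zip_path, $storage_name) = @_;
    return "${zip_path}-${storage_name}.gpg";
}

# Two storages staging the same zip now get distinct encrypted files,
# so a leftover file from one cannot be mistaken for the other's.
my $zip    = '/tmp/staging/zipfile/example.zip';       # illustrative path
my $first  = encrypted_zip_path($zip, 'storage_one');  # illustrative names
my $second = encrypted_zip_path($zip, 'storage_two');

print "$first\n$second\n";
```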

Comment thread lib/HTFeed/Stage/Collate.pm Outdated

@moseshll moseshll left a comment

Contributor


See inline suggestion re logging.

I'm pleased with all the cleanup happening here.

There still appears to be a brittle test, but it has failed sporadically for me for a long time, so it is probably out of scope for this PR. I tend to suspect a race condition because it's a pretty simple test.

#   Failed test 'HTFeed::Storage::ObjectStore with encryption enabled stores the mets and encrypted zip'
#   at t/storage_object_store.t line 194.

aelkiss commented Jun 11, 2026

Member Author

Re: the brittle test, I haven't seen it fail locally or on GitHub. I can keep an eye out for it, though.

@aelkiss aelkiss force-pushed the ETT-1459-first-ingest-date branch from 8b17af6 to a7bd638 Compare June 11, 2026 18:56
aelkiss commented Jun 11, 2026

Member Author

I will wait to merge & deploy; I'd like to get the schema updated in production and see about populating info using the existing audit stuff.
