12036 remove minio as s3 storage option #91

Open
srmanda-cs wants to merge 20 commits into develop from 12036-remove-minio-as-s3-storage-option

@srmanda-cs
Member

What this PR does / why we need it:

Which issue(s) this PR closes:

  • Closes #

Special notes for your reviewer:

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@srmanda-cs srmanda-cs self-assigned this May 26, 2026
Copilot AI review requested due to automatic review settings May 26, 2026 20:58
@srmanda-cs srmanda-cs requested a review from pdurbin as a code owner May 26, 2026 20:58
srmanda-cs added 17 commits May 26, 2026 17:01
…load

Switch testNonDirectUpload from localstack1 (upload-redirect=true,
download-redirect=true) to the new localstack_noredirect driver
(both redirects disabled), so the test genuinely exercises the
non-redirect proxy-through-Dataverse code path.

Also replace the plain downloadFile call with downloadFileNoRedirect
and assert statusCode(200). This makes the assertion self-documenting:
a 303 response would now cause an explicit test failure instead of
being silently followed by RestAssured.
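
To illustrate why turning off redirect-following makes the status assertion meaningful, here is a minimal sketch. It uses the JDK's `java.net.http.HttpClient` and a throwaway local server as stand-ins for RestAssured and Dataverse. The class `RedirectVisibilityDemo`, its methods, and the `/file` and `/target` paths are hypothetical and not part of the test suite.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: shows how a client's redirect policy changes the observed status. */
final class RedirectVisibilityDemo {

    /** Starts a local server where /file answers 303 (pointing at /target) and /target answers 200. */
    static HttpServer startServer() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/file", exchange -> {
                int port = exchange.getLocalAddress().getPort();
                exchange.getResponseHeaders().add("Location", "http://127.0.0.1:" + port + "/target");
                exchange.sendResponseHeaders(303, -1); // -1 means no response body
                exchange.close();
            });
            server.createContext("/target", exchange -> {
                byte[] body = "payload".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Requests /file with the given redirect policy and returns the final status code seen by the client. */
    static int statusWith(HttpClient.Redirect policy) {
        HttpServer server = startServer();
        try {
            HttpClient client = HttpClient.newBuilder().followRedirects(policy).build();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/file");
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            server.stop(0);
        }
    }
}
```

With `HttpClient.Redirect.NEVER` the 303 is surfaced to the caller, so an assertion on 200 would fail visibly. With `HttpClient.Redirect.NORMAL` the same request ends in 200 because the redirect is followed before the status is inspected.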
…alstack_noredirect

Using the same bucket name as localstack1 would cause a collision in
the test environment when tasks/localstack_create_bucket.yml runs
aws s3 mb on each bucket entry. Use mybucket-noredirect to avoid
this. Update driver configs in both docker-compose files and switch
S3AccessIT.testNonDirectUpload to use the new BUCKET_NAME_NOREDIRECT
constant.
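
As a rough sketch of how such a collision could be spotted before the bucket-creation task runs, the snippet below scans a string of JVM options for `bucket-name` entries and reports any bucket shared by more than one driver. The class `BucketNameCheck` and its methods are hypothetical. The `localstack1` option shown in the usage note is an assumption that follows the same pattern as the `localstack_noredirect` options in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds S3 drivers that point at the same bucket. */
final class BucketNameCheck {

    // Matches options of the form -Ddataverse.files.<driverId>.bucket-name=<bucket>
    private static final Pattern BUCKET_OPTION =
            Pattern.compile("-Ddataverse\\.files\\.([^.\\s=]+)\\.bucket-name=(\\S+)");

    /** Maps each bucket name to the driver ids that reference it, in order of appearance. */
    static Map<String, List<String>> driversByBucket(String jvmOptions) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher matcher = BUCKET_OPTION.matcher(jvmOptions);
        while (matcher.find()) {
            result.computeIfAbsent(matcher.group(2), key -> new ArrayList<>()).add(matcher.group(1));
        }
        return result;
    }

    /** Returns the bucket names that are used by more than one driver. */
    static List<String> sharedBuckets(String jvmOptions) {
        List<String> shared = new ArrayList<>();
        driversByBucket(jvmOptions).forEach((bucket, drivers) -> {
            if (drivers.size() > 1) {
                shared.add(bucket);
            }
        });
        return shared;
    }
}
```

For example, passing `-Ddataverse.files.localstack1.bucket-name=mybucket -Ddataverse.files.localstack_noredirect.bucket-name=mybucket` to `sharedBuckets` yields a list containing `mybucket`, while using `mybucket-noredirect` for the second driver yields an empty list.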
Updated comment for clarity regarding S3 tags implementation.
@srmanda-cs srmanda-cs force-pushed the 12036-remove-minio-as-s3-storage-option branch from 0e9f779 to 420101b Compare May 26, 2026 21:02

Copilot AI left a comment

Pull request overview

Removes MinIO as a supported/dev S3-compatible storage option and standardizes S3 testing/dev setup around LocalStack, including an explicit “no redirect” LocalStack driver to exercise Dataverse’s proxy download path.

Changes:

  • Updated S3 integration tests to use localstack_noredirect (no download redirect) instead of MinIO for non-direct upload/download coverage.
  • Removed MinIO from dev docker-compose configurations and associated dev-start script setup.
  • Updated installation docs to remove MinIO-specific guidance and references.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/test/java/edu/harvard/iq/dataverse/api/S3AccessIT.java | Switches non-direct upload test coverage from MinIO to LocalStack no-redirect driver; adjusts bucket usage and download behavior. |
| src/main/java/edu/harvard/iq/dataverse/dataaccess/S3AccessIO.java | Removes MinIO-specific mention from object-tagging warning comment. |
| scripts/dev/dev-start-frd.sh | Removes creation of MinIO dev volume directory. |
| docker-compose-dev.yml | Removes MinIO service/config; adds localstack_noredirect S3 driver config. |
| conf/keycloak/docker-compose-dev.yml | Same as above for Keycloak dev stack. |
| doc/sphinx-guides/source/installation/config.rst | Removes MinIO mentions; updates S3 endpoint example. |
| doc/sphinx-guides/source/installation/big-data-support.rst | Removes MinIO CORS tooling references and MinIO tagging note. |
Comments suppressed due to low confidence (1)

doc/sphinx-guides/source/installation/big-data-support.rst:112

  • After removing the MinIO-specific docs, this section still instructs users to run mc cors set ... (the MinIO client) and includes cors.xml under the AWS CLI tab. Either restore a separate MinIO/"mc" tab, replace this with an AWS CLI equivalent, or remove the mc instructions to keep the guidance consistent.
        :language: xml

    Proceed with making the changes:

    :code:`mc cors set <STORE_NAME>/<BUCKET_NAME> ./cors.xml`

Comment on lines 56 to 58
static final String BUCKET_NAME = "mybucket";
static final String BUCKET_NAME_NOREDIRECT = "mybucket-noredirect";
static S3Client s3localstack = null;
Comment thread docker-compose-dev.yml
-Ddataverse.files.localstack_noredirect.label=LocalStackNoRedirect
-Ddataverse.files.localstack_noredirect.custom-endpoint-url=http://localstack:4566
-Ddataverse.files.localstack_noredirect.custom-endpoint-region=us-east-2
-Ddataverse.files.localstack_noredirect.bucket-name=mybucket-noredirect
Comment thread doc/sphinx-guides/source/installation/config.rst
Comment thread doc/sphinx-guides/source/installation/big-data-support.rst
Comment thread src/test/java/edu/harvard/iq/dataverse/api/S3AccessIT.java
@coveralls

coveralls commented May 26, 2026

Coverage Report for CI Build 98

Coverage remained the same at 24.95%

Details

  • Coverage remained the same as the base build.
  • Patch coverage: No coverable lines changed in this PR.
  • 1 coverage regression across 1 file.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

1 previously-covered line in 1 file lost coverage.

| File | Lines Losing Coverage | Coverage |
| --- | --- | --- |
| src/main/java/edu/harvard/iq/dataverse/dataaccess/S3AccessIO.java | 1 | 25.29% |

Coverage Stats

Coverage Status
Relevant Lines: 94695
Covered Lines: 23626
Line Coverage: 24.95%
Coverage Strength: 0.25 hits per line

💛 - Coveralls

@github-actions

github-actions Bot commented May 26, 2026

Test Results

396 tests ±0 | 381 ✅ +1 | 31m 42s ⏱️ +10m 16s
53 suites ±0 | 15 💤 ±0
53 files ±0 | 0 ❌ -1

Results for commit ca0b4cf. ± Comparison against base commit d5e2cda.

♻️ This comment has been updated with latest results.

@srmanda-cs
Member Author

@copilot Please investigate this test. It passes just fine in the Jenkins run, fails only in containers:

1 expectation failed.
Expected status code <200> but was <400>.
java.lang.AssertionError:
1 expectation failed.
Expected status code <200> but was <400>.

at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:73)
at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrapNoCoerce.callConstructor(ConstructorSite.java:108)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:57)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:263)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:277)
at io.restassured.internal.ResponseSpecificationImpl$HamcrestAssertionClosure.validate(ResponseSpecificationImpl.groovy:512)
at io.restassured.internal.ResponseSpecificationImpl$HamcrestAssertionClosure$validate$1.call(Unknown Source)
at io.restassured.internal.ResponseSpecificationImpl.validateResponseIfRequired(ResponseSpecificationImpl.groovy:696)
at io.restassured.internal.ResponseSpecificationImpl.this$2$validateResponseIfRequired(ResponseSpecificationImpl.groovy)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:198)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:62)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:185)
at io.restassured.internal.ResponseSpecificationImpl.statusCode(ResponseSpecificationImpl.groovy:135)
at io.restassured.specification.ResponseSpecification$statusCode$0.callCurrent(Unknown Source)
at io.restassured.internal.ResponseSpecificationImpl.statusCode(ResponseSpecificationImpl.groovy:143)
at io.restassured.internal.ValidatableResponseOptionsImpl.statusCode(ValidatableResponseOptionsImpl.java:89)
at edu.harvard.iq.dataverse.api.S3AccessIT.testNonDirectUpload(S3AccessIT.java:158)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)

{
"status": "OK",
"data": {
"LocalStack": "localstack1",
"LocalStackNoRedirect": "localstack_noredirect",
"Local": "local",
"Filesystem": "file1"
}
}
{
"status": "OK",
"data": {
"id": 805,
"alias": "dv2f88f58b",
"name": "dv2f88f58b",
"dataverseContacts": [
{
"displayOrder": 0,
"contactEmail": "10a46a76@mailinator.com"
}
],
"permissionRoot": true,
"dataverseType": "UNCATEGORIZED",
"isMetadataBlockRoot": false,
"isFacetRoot": false,
"ownerId": 1,
"creationDate": "2026-05-26T21:36:20Z",
"effectiveRequiresFilesToPublishDataset": false,
"isReleased": false
}
}
{
"status": "OK",
"data": {
"name": "undefined",
"directUpload": false,
"directDownload": false,
"uploadOutOfBand": false
}
}
{
"status": "OK",
"data": {
"message": "Storage set to: LocalStackNoRedirect/localstack_noredirect"
}
}
{
"status": "OK",
"data": {
"name": "localstack_noredirect",
"type": "s3",
"label": "LocalStackNoRedirect",
"directUpload": false,
"directDownload": false,
"uploadOutOfBand": false
}
}
{
"status": "OK",
"data": {
"id": 806,
"persistentId": "doi:10.5072/FK2/0S4XP2"
}
}
{
"status": "OK",
"data": {
"id": 806,
"identifier": "FK2/0S4XP2",
"persistentUrl": "https://doi.org/10.5072/FK2/0S4XP2",
"protocol": "doi",
"authority": "10.5072",
"separator": "/",
"publisher": "Root",
"storageIdentifier": "localstack_noredirect://10.5072/FK2/0S4XP2",
"datasetType": "dataset",
"locks": [

    ],
    "latestVersion": {
        "id": 333,
        "datasetId": 806,
        "datasetPersistentId": "doi:10.5072/FK2/0S4XP2",
        "datasetType": "dataset",
        "storageIdentifier": "localstack_noredirect://10.5072/FK2/0S4XP2",
        "internalVersionNumber": 1,
        "versionState": "DRAFT",
        "latestVersionPublishingState": "DRAFT",
        "lastUpdateTime": "2026-05-26T21:36:20Z",
        "createTime": "2026-05-26T21:36:20Z",
        "license": {
            "name": "CC0 1.0",
            "uri": "http://creativecommons.org/publicdomain/zero/1.0",
            "iconUri": "https://licensebuttons.net/p/zero/1.0/88x31.png",
            "rightsIdentifier": "CC0-1.0",
            "rightsIdentifierScheme": "SPDX",
            "schemeUri": "https://spdx.org/licenses/",
            "languageCode": "en"
        },
        "fileAccessRequest": true,
        "metadataBlocks": {
            "citation": {
                "displayName": "Citation Metadata",
                "name": "citation",
                "fields": [
                    {
                        "typeName": "title",
                        "multiple": false,
                        "typeClass": "primitive",
                        "value": "Darwin's Finches"
                    },
                    {
                        "typeName": "author",
                        "multiple": true,
                        "typeClass": "compound",
                        "value": [
                            {
                                "authorName": {
                                    "typeName": "authorName",
                                    "multiple": false,
                                    "typeClass": "primitive",
                                    "value": "Finch, Fiona"
                                },
                                "authorAffiliation": {
                                    "typeName": "authorAffiliation",
                                    "multiple": false,
                                    "typeClass": "primitive",
                                    "value": "Birds Inc."
                                }
                            }
                        ]
                    },
                    {
                        "typeName": "datasetContact",
                        "multiple": true,
                        "typeClass": "compound",
                        "value": [
                            {
                                "datasetContactName": {
                                    "typeName": "datasetContactName",
                                    "multiple": false,
                                    "typeClass": "primitive",
                                    "value": "Finch, Fiona"
                                },
                                "datasetContactEmail": {
                                    "typeName": "datasetContactEmail",
                                    "multiple": false,
                                    "typeClass": "primitive",
                                    "value": "finch@mailinator.com"
                                }
                            }
                        ]
                    },
                    {
                        "typeName": "dsDescription",
                        "multiple": true,
                        "typeClass": "compound",
                        "value": [
                            {
                                "dsDescriptionValue": {
                                    "typeName": "dsDescriptionValue",
                                    "multiple": false,
                                    "typeClass": "primitive",
                                    "value": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds."
                                }
                            }
                        ]
                    },
                    {
                        "typeName": "subject",
                        "multiple": true,
                        "typeClass": "controlledVocabulary",
                        "value": [
                            "Medicine, Health and Life Sciences"
                        ]
                    }
                ]
            }
        },
        "files": [
            
        ]
    }
}

}
{
"status": "ERROR",
"message": "Failed to save the content of the uploaded file."
}
May 26, 2026 9:36:18 PM edu.harvard.iq.dataverse.api.UtilIT createRandomUser
INFO: Creating random test user user5cdedcaa
May 26, 2026 9:36:19 PM edu.harvard.iq.dataverse.api.UtilIT getApiTokenFromResponse
INFO: API token found in create user response: 40cf68b2-2666-461f-84f5-323dfb2a66e9
May 26, 2026 9:36:19 PM edu.harvard.iq.dataverse.api.UtilIT getUsernameFromResponse
INFO: Username found in create user response: user5cdedcaa
May 26, 2026 9:36:19 PM edu.harvard.iq.dataverse.api.UtilIT createRandomUser
INFO: Creating random test user usere246cad7
May 26, 2026 9:36:20 PM edu.harvard.iq.dataverse.api.UtilIT getUsernameFromResponse
INFO: Username found in create user response: usere246cad7
May 26, 2026 9:36:20 PM edu.harvard.iq.dataverse.api.UtilIT getApiTokenFromResponse
INFO: API token found in create user response: 6f286d3c-2d3f-4ed6-aa64-79d248d2787b
May 26, 2026 9:36:20 PM edu.harvard.iq.dataverse.api.UtilIT getAliasFromResponse
INFO: Alias found in create dataverse response: dv2f88f58b


Copilot AI commented May 26, 2026

@copilot Please investigate this test. It passes just fine in the Jenkins run, fails only in containers:

1 expectation failed.
...

I investigated the container workflow failure (run 26474971605) and applied a fix. The issue was that localstack_noredirect uses mybucket-noredirect, but that bucket wasn’t guaranteed to exist in container startup/test setup. I updated LocalStack init to create it and updated S3AccessIT setup to ensure both S3 buckets exist (including LocalStack’s generic 404 response handling). Addressed in commits c564762 and ca0b4cf.
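
The "ensure both buckets exist" idea can be sketched roughly as follows. This is not the code from the commits above: `BucketStore` is a hypothetical stand-in for an S3 client (it is not the AWS SDK API), and `InMemoryStore` is a fake used only to make the sketch runnable. The assumption is that a missing bucket is recognised by a plain 404 status from the existence check rather than by a specific error type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: create any bucket whose existence check answers 404. */
final class EnsureBuckets {

    /** Hypothetical stand-in for an S3 client; not the AWS SDK interface. */
    interface BucketStore {
        /** Returns the HTTP status of a bucket existence check, for example 200 or 404. */
        int headStatus(String bucket);

        void create(String bucket);
    }

    /** Creates every missing bucket and returns the names that had to be created. */
    static List<String> ensureAll(BucketStore store, List<String> buckets) {
        List<String> created = new ArrayList<>();
        for (String bucket : buckets) {
            int status = store.headStatus(bucket);
            if (status == 404) {
                // Treat any 404 as "bucket missing", whatever error type accompanies it.
                store.create(bucket);
                created.add(bucket);
            } else if (status != 200) {
                throw new IllegalStateException("Unexpected status " + status + " for bucket " + bucket);
            }
        }
        return created;
    }

    /** In-memory fake used only to exercise the sketch. */
    static final class InMemoryStore implements BucketStore {
        private final Set<String> buckets = new HashSet<>();

        InMemoryStore(String... existing) {
            buckets.addAll(Arrays.asList(existing));
        }

        @Override
        public int headStatus(String bucket) {
            return buckets.contains(bucket) ? 200 : 404;
        }

        @Override
        public void create(String bucket) {
            buckets.add(bucket);
        }
    }
}
```

Calling `ensureAll` with `mybucket` and `mybucket-noredirect` against a store that only holds `mybucket` creates the second bucket. A repeated call creates nothing, which keeps the setup step safe to run more than once.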

@srmanda-cs
Copy link
Copy Markdown
Member Author

Yep, AI is almost certainly going to take my job.

4 participants