A Docker Compose stack for running enough of the EODH backend infrastructure locally to test data download and processing workflows without needing access to a Kubernetes cluster.
The stack currently provides:
- RustFS as an S3-compatible object store
- Apache Pulsar in standalone mode
- An S3 init container that creates buckets and uploads seed data
- An optional harvest-transformer container wired to the local S3 and Pulsar services
This is intended for local development and test harnesses. It is not intended to be a production deployment.
| Service | Purpose | Host URL / port |
|---|---|---|
| s3 | RustFS S3-compatible object store | S3 API: http://localhost:9000, console: http://localhost:9001 |
| s3-init | One-shot AWS CLI container that creates and seeds buckets | No exposed ports |
| pulsar | Apache Pulsar standalone broker | Binary protocol: pulsar://localhost:6650, admin API: http://localhost:8080 |
| harvest-transformer | Local harvest-transformer runner configured against this stack | No exposed ports |
Inside Docker Compose, containers should use service names rather than `localhost`:

- S3 endpoint: `http://s3:9000`
- Pulsar service: `pulsar://pulsar:6650`
- Pulsar admin API: `http://pulsar:8080`

From the host machine, use `localhost`:

- S3 endpoint: `http://localhost:9000`
- Pulsar service: `pulsar://localhost:6650`
- Pulsar admin API: `http://localhost:8080`
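The two endpoint sets above can be captured in a small helper so application code does not hard-code either form. This is a sketch; the `EODH_IN_COMPOSE` flag is a hypothetical convention for your own code, not something this stack sets for you:

```python
import os

# Hypothetical flag: set EODH_IN_COMPOSE=1 in a container's environment
# to switch from host-facing localhost URLs to Compose service names.
def local_endpoints(in_compose=None):
    if in_compose is None:
        in_compose = os.environ.get("EODH_IN_COMPOSE") == "1"
    host = "s3" if in_compose else "localhost"
    pulsar_host = "pulsar" if in_compose else "localhost"
    return {
        "s3": f"http://{host}:9000",
        "pulsar": f"pulsar://{pulsar_host}:6650",
        "pulsar_admin": f"http://{pulsar_host}:8080",
    }
```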
- Docker
- Docker Compose
- AWS CLI, if you want to inspect the local S3 service from the host
Copy the example environment file:

```bash
cp .env.example .env
```

The main settings are:

```bash
S3_DATA_DIR=./data/s3
S3_LOG_DIR=./data/logs
S3_ACCESS_KEY=dev_access_key
S3_SECRET_KEY=dev_secret_key_123
S3_BUCKET=transformed
OUTPUT_ROOT=http://localhost/
```

If `S3_DATA_DIR` or `S3_LOG_DIR` are not set, Compose defaults to `./data/s3` and `./data/logs` respectively.
Create those directories before starting the stack:

```bash
mkdir -p ${S3_DATA_DIR:-./data/s3} ${S3_LOG_DIR:-./data/logs}
```

RustFS must be able to write to these directories. The RustFS container commonly runs as UID 10001, so on Linux you may need:

```bash
sudo chown -R 10001:10001 ${S3_DATA_DIR:-./data/s3} ${S3_LOG_DIR:-./data/logs}
```

Set to `true` to bypass RustFS disk topology checks. Useful on systems where RustFS refuses to start due to disk layout constraints. Defaults to `false`.
Start the stack:

```bash
docker compose up -d
```

Check service status:

```bash
docker compose ps
```

View logs:

```bash
docker compose logs -f s3
docker compose logs -f pulsar
docker compose logs -f s3-init
```

Stop the stack:

```bash
docker compose down
```

To remove Docker-managed Pulsar data as well:

```bash
docker compose down -v
```

RustFS exposes:

- S3 API: http://localhost:9000
- Console: http://localhost:9001
Log in to the console using:

- Username: `<S3_ACCESS_KEY>`
- Password: `<S3_SECRET_KEY>`

For the default `.env.example` values:

- Username: `dev_access_key`
- Password: `dev_secret_key_123`
Create a local AWS profile.

In `~/.aws/config`:

```ini
[profile local_s3]
region = eu-west-2
output = json
s3 =
    addressing_style = path
```

In `~/.aws/credentials`:

```ini
[local_s3]
aws_access_key_id = dev_access_key
aws_secret_access_key = dev_secret_key_123
```

List buckets:

```bash
aws --profile local_s3 --endpoint-url http://localhost:9000 s3 ls
```

Upload a file:

```bash
aws --profile local_s3 --endpoint-url http://localhost:9000 s3 cp ./example.txt s3://transformed/example.txt
```

Download a file:

```bash
aws --profile local_s3 --endpoint-url http://localhost:9000 s3 cp s3://transformed/example.txt ./example.txt
```

The `s3-init` service runs `scripts/init.sh` after RustFS becomes healthy.
The init script:

- Waits until the S3 endpoint responds
- Creates a minimal `spdx-public-eodhp` bucket used by `harvest-transformer`
- Treats each first-level directory in `seed/` as an S3 bucket name
- Uploads files from each seed directory into the matching bucket
- Ignores `.gitkeep` files
Example:

```
seed/
└── transformed/
    └── example.json
```

becomes:

```
s3://transformed/example.json
```

Nested directories are preserved as S3 key prefixes:

```
seed/transformed/foo/bar/example.json
```

becomes:

```
s3://transformed/foo/bar/example.json
```

To create an empty bucket that is committed to Git, create a directory under `seed/` and add a `.gitkeep` file:

```
seed/my-empty-bucket/.gitkeep
```
The bucket will be created, but .gitkeep will not be uploaded.
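The seed-to-bucket mapping described above can be expressed as a small pure function. This is an illustrative sketch of the convention, not code from `scripts/init.sh`, and the function name is made up:

```python
from pathlib import PurePosixPath

def seed_path_to_object(path):
    """Map a path under seed/ to (bucket, key), or None if it is skipped.

    The first directory level becomes the bucket; the rest of the path
    becomes the object key. .gitkeep files are never uploaded.
    """
    parts = PurePosixPath(path).parts
    if parts and parts[0] == "seed":
        parts = parts[1:]
    if len(parts) < 2 or parts[-1] == ".gitkeep":
        return None  # bucket-only dirs and .gitkeep markers are not uploaded
    return parts[0], "/".join(parts[1:])
```

For example, `seed_path_to_object("seed/transformed/foo/bar/example.json")` returns `("transformed", "foo/bar/example.json")`.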
You can re-run the S3 init step with:

```bash
docker compose up s3-init
```

Pulsar runs in standalone mode and exposes:

- Binary protocol: pulsar://localhost:6650
- Admin REST API: http://localhost:8080

Check that Pulsar is healthy:

```bash
curl http://localhost:8080/admin/v2/clusters
```

Expected response:

```json
["standalone"]
```

List topics in the default namespace:

```bash
curl http://localhost:8080/admin/v2/persistent/public/default
```

Create a topic manually, if needed:

```bash
docker exec -it local-pulsar \
  bin/pulsar-admin topics create persistent://public/default/transformed
```

Pulsar can also auto-create topics in this local standalone setup when a producer connects. In application code, use the Pulsar protocol URL, not HTTP:

```python
import pulsar

client = pulsar.Client("pulsar://localhost:6650")
producer = client.create_producer("persistent://public/default/transformed")
```

Using `http://localhost:6650` is incorrect because port 6650 is the binary protocol port.
The `harvest-transformer` service is built from:

https://github.com/EO-DataHub/harvest-transformer.git#main

It is configured to use the local services:

```bash
PULSAR_URL=pulsar://pulsar:6650
AWS_ENDPOINT_URL_S3=http://s3:9000
S3_BUCKET=${S3_BUCKET}
S3_SPDX_BUCKET=spdx-public-eodhp
OUTPUT_ROOT=${OUTPUT_ROOT}
```

It starts only after:

- `s3-init` has completed successfully
- `pulsar` is healthy
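Applications on the host that want the same ordering guarantee can poll their dependencies before starting work. A minimal sketch (the helper and its names are illustrative, not part of this repository):

```python
import time

def wait_for(check, attempts=30, delay=1.0):
    """Poll check() until it returns True, or raise after `attempts` tries.

    `check` should swallow connection errors and return False, e.g. a probe
    of http://localhost:9000 for S3 or http://localhost:8080 for Pulsar.
    """
    for _ in range(attempts):
        if check():
            return
        time.sleep(delay)
    raise TimeoutError("dependency did not become ready")
```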
`local.env.example` contains useful values for applications running on the host:

```bash
AWS_PROFILE=local_s3
S3_ENDPOINT=http://localhost:9000
S3_FORCE_PATH_STYLE=true
PULSAR_SERVICE_URL=pulsar://localhost:6650
```

Applications running inside Compose should use Docker service names instead:

```bash
S3_ENDPOINT=http://s3:9000
PULSAR_SERVICE_URL=pulsar://pulsar:6650
```

List S3 buckets:

```bash
aws --profile local_s3 --endpoint-url http://localhost:9000 s3 ls
```

List Pulsar clusters:

```bash
curl http://localhost:8080/admin/v2/clusters
```

List Pulsar topics:

```bash
curl http://localhost:8080/admin/v2/persistent/public/default
```

Check containers:

```bash
docker compose ps
```

This usually means the AWS CLI is talking to real AWS instead of RustFS. Always include the local endpoint:

```bash
aws --profile local_s3 --endpoint-url http://localhost:9000 s3 ls
```

Also check that credentials are in `~/.aws/credentials`, not only in `~/.aws/config`.
Make sure the host directories exist and are writable by the container UID:

```bash
mkdir -p ${S3_DATA_DIR:-./data/s3} ${S3_LOG_DIR:-./data/logs}
sudo chown -R 10001:10001 ${S3_DATA_DIR:-./data/s3} ${S3_LOG_DIR:-./data/logs}
```

Check that clients are using:

```
pulsar://localhost:6650
```

not:

```
http://localhost:6650
```

Port 6650 is Pulsar's binary protocol port. HTTP is available on port 8080 for the admin API.
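A defensive check in client code can catch this misconfiguration before the connection hangs or fails confusingly. A hedged sketch (this helper is hypothetical, not part of the `pulsar` client library):

```python
from urllib.parse import urlparse

def require_pulsar_url(url):
    """Raise if `url` does not use the pulsar:// binary-protocol scheme."""
    scheme = urlparse(url).scheme
    if scheme not in ("pulsar", "pulsar+ssl"):
        raise ValueError(
            f"expected a pulsar:// service URL, got scheme {scheme!r}; "
            "port 6650 speaks the binary protocol, not HTTP"
        )
    return url
```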
Pulsar topics may not appear until a producer or consumer uses them. You can create the topic manually:

```bash
docker exec -it local-pulsar \
  bin/pulsar-admin topics create persistent://public/default/transformed
```

This repository is deliberately small. It aims to provide just enough local infrastructure to exercise EODH backend services that normally run against Kubernetes-managed S3-compatible storage and Pulsar.