Add CDA-ETL module with initial implementation and basic functionality#1732
RyanM-RMA wants to merge 2 commits into
Includes the addition of the following:
- Dockerfile and docker-compose for containerizing the service.
- Gradle build configuration for the module.
- Core ETL pipeline components: configuration, session management, location, project, and timeseries processing.
- Environment variable management with `etl.env.example`.
- Utility functions and cache handling.
- Initial set of unit tests for basic validation (e.g., configuration handling).
| @@ -0,0 +1,14 @@ | |||
| services: | |||
Why is this separate from the root docker-compose.yml? I'd think the defaults should use CWBI Test as the source and the CDA service container as the destination.
| final def envFile = 'etl.env' | ||
| final def reqFile = 'requirements.txt' | ||
|
|
||
| tasks.register('installRequirements', Exec) { |
Take a look at what Stephen did in https://github.com/DOI-BOR/WTMP-Python-Plotting/blob/main/build.gradle, which uses a Gradle plugin to manage Python.
We already do this for Node.js, so it makes sense to use a plugin for Python as well.
| # SOFTWARE. | ||
| import cwms | ||
|
|
||
| class SessionManager: |
Take a look at Python's contextmanager, as I think it will simplify the session management. See my regi-python PR as an example:
- usage: https://github.com/USACE-WaterManagement/regi-python/pull/1/changes#diff-884cdfd74221e802652f52dace72c4e14e4c0d67ad1a57a2f8b02b4155362786R41
- context definition: https://github.com/USACE-WaterManagement/regi-python/pull/1/changes#diff-48673334bb966021f118fc5fd7b8632e14b648abc64e10d3554f38060c0a65e8R23
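As a rough illustration of the suggestion above, here is a minimal sketch of session handling built on `contextlib.contextmanager`. The `FakeSession` class and the `cda_session` helper are hypothetical names invented for this example; they are not taken from this PR, from `cwms`, or from the linked regi-python change.

```python
from contextlib import contextmanager


class FakeSession:
    """Hypothetical stand-in for a real API session (an assumption, not cwms API)."""

    def __init__(self, api_root):
        self.api_root = api_root
        self.closed = False

    def close(self):
        # A real session would release connections here.
        self.closed = True


@contextmanager
def cda_session(api_root):
    """Yield a session and always close it, even if the body raises."""
    session = FakeSession(api_root)
    try:
        yield session
    finally:
        session.close()
```

A caller would write `with cda_session(url) as session: ...` and would not need to track setup and teardown in a separate manager class.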
There might be some ideas in here we can use with cwms-cli. I thought this was a novel idea: setting locations in the env for reuse, i.e.
- Split core functionality into modular components for improved clarity and maintainability, including separate processing for locations, projects, and timeseries.
- Introduced caching logic in `cache_util.py` for optimized data retrieval and storage.
- Added threading utilities for concurrent task execution in `threading_util.py`.
- Enhanced `SessionManager` logic with dynamic session initialization.
- Updated Gradle build to include a `runEtlUnitTests` task for streamlined testing.
- Improved environment variable examples in `etl.env.example`.
- Introduced comprehensive unit tests for locations, projects, and timeseries modules.
- Various bug fixes and restructured imports for consistency.
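To make the caching and threading items above concrete, here is a minimal sketch of cached lookups run concurrently with the standard library. It does not reflect the actual contents of `cache_util.py` or `threading_util.py`; `fetch_location` and `fetch_all` are hypothetical names, and the fetch body is a placeholder rather than a real CDA call.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=None)
def fetch_location(location_id):
    """Placeholder fetch (an assumption); a real version would call the CDA API."""
    return {"id": location_id, "name": location_id.upper()}


def fetch_all(location_ids, max_workers=4):
    """Fetch locations concurrently; results keep the order of the input."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_location, location_ids))
```

Note that `lru_cache` is safe to call from multiple threads, but two threads that miss on the same key at the same time may both run the fetch once before the result is cached.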
Summary
Add a module for Extract, Transform, and Load (ETL) between CDA APIs.
Related Issue
Closes https://jira.hecdev.net/browse/REGI-481
Validation
Tested by running the Gradle and docker-compose processes and verifying that the data is valid by running CWMSVue and REGI.
Checklist