Perf: 1Million Challenge for CDA TS GET#1692
Conversation
|
Test failures appear to be related to the updated CWMS DB schema / |
MikeNeilson
left a comment
There was a problem hiding this comment.
Conceptually still looks reasonable. I don't have much more to add over what Adam has already said. Few suggestions on readability.
krowvin
left a comment
There was a problem hiding this comment.
Add a few code change notes to be aware of on this round
MikeNeilson
left a comment
There was a problem hiding this comment.
Looks reasonable @adamkorynta should determine if is comments were appropriately resolve.
I think so, but he may have been thinking of something else.
| int pageSize = queryParamAsClass(ctx, new String[]{PAGE_SIZE }, | ||
| Integer.class, DEFAULT_PAGE_SIZE, metrics, | ||
| name(TimeSeriesController.class.getName(), GET_ALL)); | ||
| pageSize = Controllers.validateTimeSeriesPageSize(pageSize); |
There was a problem hiding this comment.
if you pass queryParamAsClass(....) into validateTimeSeriesPageSize you can have pageSize be final here and avoid the extra assignment later.
There was a problem hiding this comment.
Should be good! Let me know
232b2b3#diff-4cfdd6fc7fb21cf33d99cbbbd646237b20b13f326edc6c0010dded41223442d1R475
There was a problem hiding this comment.
I'm still seeing the original code. Not a huge deal for this one though, more a style choice than anything.
|
I asked @inguyen314 for a list of TSIDs I could potentially use for testing against the 1M and current version of the api for integration testing MVS TSIDs (Click to expand) |
MikeNeilson
left a comment
There was a problem hiding this comment.
I'm overall happy with it, some of the test code that loads the large data isn't using bind variables or otherwise efficient operations. Bind vars are a must, but efficiency could go either way but it likely affecting performance results given how abusive it would be the the query planner cache in the database.
| int pageSize = queryParamAsClass(ctx, new String[]{PAGE_SIZE }, | ||
| Integer.class, DEFAULT_PAGE_SIZE, metrics, | ||
| name(TimeSeriesController.class.getName(), GET_ALL)); | ||
| pageSize = Controllers.validateTimeSeriesPageSize(pageSize); |
There was a problem hiding this comment.
I'm still seeing the original code. Not a huge deal for this one though, more a style choice than anything.
| List<SeedRow> sortedRows = new ArrayList<>(rows); | ||
| sortedRows.sort(Comparator.comparing(seedRow -> seedRow.dateTime)); | ||
|
|
||
| for (SeedRow row : sortedRows) { |
There was a problem hiding this comment.
This should really be handled as a batch operation. Though if it wasn't exceptionally slow in your initial testing we can patch it up later.
e.g.
try ( stmt = prepare )
{
for (var row: rows)
{
stmt.set...
stmt.add
}
stmt.executeBatch
}
alternatively, you call executeBatch every X rows, where X is found through experimentation.
All let's the query planner be a bit more efficient.
|
|
||
| for (SeedRow row : sortedRows) { | ||
| int year = OffsetDateTime.ofInstant(row.dateTime, ZoneOffset.UTC).getYear(); | ||
| String sql = "insert into at_tsv_" + year |
There was a problem hiding this comment.
Okay, this should be fixed, even in tests... bind variables, always bind variables.
341f74b to
5696cd7
Compare
| private static final FieldMapping AV_CWMS_TS_ID2_FIELD_MAP = new CwmsTsId2FieldMapping(); | ||
| private static final FieldMapping AV_CWMS_TS_ID_FIELD_MAP = new CwmsTsIdFieldMapping(); | ||
|
|
||
| private static final class DirectReadMetadata { |
There was a problem hiding this comment.
I don't think this is in our checkstyle, but java conventions generally put inner classes at the bottom of parent classes
There was a problem hiding this comment.
If we can add that to the checkstyle file, we definitely should.
Emphasize the importance of using bind variables in SQL queries to prevent string concatenation / security issues. ## Summary Mike noticed string concat in one of the test files in #1692
|
Failing test needs to be fixed. Suspect some of the logic change isn't getting it down the right code path: |
Signed-off-by: Charles Graham, SWT <charles.r.graham@usace.army.mil>
Signed-off-by: Charles Graham, SWT <charles.r.graham@usace.army.mil>
Signed-off-by: Charles Graham, SWT <charles.r.graham@usace.army.mil>
|
Test failure is due to recent change to schema on how nulls are handled. This will also affect the next release RC database build once that RC is deployed, but is not related to this PR. Going to merge so we can test this in the dev environment. |
Work on #1685
Improve unfiltered /timeseries read performance + perf coverage
/timeseriesread path off the Oracleretrieve_ts_out_tabhot path and onto directAV_TSV_DQU/jOOQ reads while preservingretrieve_tsresponse semantics in JavaBenchmark
Local single-request benchmarks against the Docker/Testcontainers Oracle dev environment, using
page-sizeequal to the requested row count:250000points:40.679s250000points:4.652501s500000points:HTTP 500The current Java benchmark harness also confirms the direct path now returns the expected count for the large seeded series:
250000points:250000reported rows, correct final timestamp,4.652501sThe benchmark task writes JSON artifacts under
load_data/performance/results/.Validation
./gradlew :cwms-data-api:integrationTests --tests cwms.cda.api.TimeSeriesDirectReadParityIT./gradlew :cwms-data-api:timeseriesReadBenchmark -Pbenchmark.pointCount=250000 -Pbenchmark.runs=1 -Pbenchmark.forceReseed=trueNotes
/timeseriesread path;/timeseries/filteredis still on the legacy pathSeeding 250,000 TS values for SPK and reading them back via CDA returns in
4.6sBefore changes
Changes
Tool Usage