Fix SDPR initialization and seed handling #484
hsun3163 wants to merge 3 commits into StatFunGen:main
PRS-CS follow-up finding
As a follow-up, I ran the same seed-controlled prediction-scale check for PRS-CS on the two OTTERS regression fixtures used for lassosum/SDPR evidence: fixture 161 and fixture 206. The PRS-CS experiment is useful because it separates three targets that can otherwise be confused:
For PRS-CS, exact beta identity is achievable when the method path and seed are identical:
However, changing only the seed within the same PRS-CS method changes the per-SNP beta allocation:
Comparing old Python PRS-CS to pecotmr PRS-CS with the same frozen inputs and the same seed also gives low raw beta correlation but high prediction-scale agreement:
This supports the same validation principle used for SDPR: raw per-SNP beta Pearson can be too strict for stochastic shrinkage methods under LD. Exact beta equality is a valid
This PR does not change PRS-CS behavior. The PRS-CS follow-up is included only to clarify the validation standard: prediction-scale agreement is the practical model-level metric,
This PR was prompted by a testing artifact from an earlier version and is therefore no longer needed.
Summary
This PR fixes the SDPR initialization and reproducibility issue in pecotmr by exposing the SDPR initialization mode and passing deterministic seed handling through the R API into the C++ sampler.
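As a rough illustration of what "exposing the initialization mode and passing the seed through" means, here is a minimal Python sketch. It is not the pecotmr C++ code: the function name `init_cluster_assignments`, the `n_clusters` value, and the uniform draw over clusters are assumptions made only for this example. The mode names `"null"` and `"legacy_random"` are the ones this PR introduces.

```python
import random


def init_cluster_assignments(num_snp, init="legacy_random", seed=None, n_clusters=1000):
    """Return an initial cluster index per SNP; cluster 0 plays the role of the null cluster.

    Hypothetical helper for illustration only. n_clusters=1000 and the uniform
    draw are assumptions, not values taken from pecotmr.
    """
    if init == "null":
        # Every SNP starts in the null cluster (analogous to cls_assgn.assign(num_snp, 0)).
        return [0] * num_snp
    if init == "legacy_random":
        # A local, explicitly seeded generator: the result does not depend on global RNG state.
        rng = random.Random(seed)
        return [rng.randrange(n_clusters) for _ in range(num_snp)]
    raise ValueError(f"unknown init mode: {init!r}")


run_a = init_cluster_assignments(10, init="legacy_random", seed=0)
run_b = init_cluster_assignments(10, init="legacy_random", seed=0)
null_start = init_cluster_assignments(10, init="null")
print(run_a == run_b, set(null_start))
```

With the seed passed explicitly, two calls with the same seed start from the same state, which is the property the fixed-seed reproducibility test relies on.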
The default is now init = "legacy_random", which restores original SDPR-style random cluster initialization. The current/null initialization path is not removed; it remains available explicitly as init = "null".
What is reverted and why
The current pecotmr SDPR path initialized all SNPs in the null cluster:
cls_assgn.assign(num_snp, 0);
That behavior was introduced as a defensive implementation choice for the rewritten sampler, because original SDPR-style random initialization can make the first sample_beta() step allocate a large dense matrix when many SNPs start as non-null. However, the OTTERS regression experiments showed that this current/null initialization is not equivalent to the original SDPR behavior and can be unstable as a prediction model when run without controlled seed handling.
This PR therefore reverts the default from current/null initialization to the OTTERS-compatible original SDPR-style behavior. It does not delete the current/null implementation; it keeps it as an explicit opt-in mode with init = "null" for debugging or intentional use.
The option name legacy_random is new in pecotmr. It means original SDPR-style random initialization, not an upstream SDPR API name.
Prediction-scale validation rationale
For each comparison, beta1 and beta2 are two SDPR weight vectors being compared, for example two repeated pecotmr runs or Old OTTERS weights versus new pecotmr weights. I used the LD-weighted prediction correlation:
where R is the same LD matrix used by SDPR. This measures whether the two weight vectors produce the same genetically predicted expression under the reference LD.
Before the fix, the current/null SDPR path was not reproducible as a prediction model:
More precisely, the current/null path was not a stable prediction model across random states. It was deterministic when the exact same seed was reused, but changing the seed changed the fitted prediction substantially:
pecotmr current/null init, seed0 vs seed1:
raw beta Pearson = 0.026
LD prediction correlation = 0.308
opposite signs = 871 / 3404
The current/null initialization made SDPR highly sensitive to the random seed, to the point that two valid seeded runs on the same input produced different predicted expressions.
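For readers who want to reproduce this kind of comparison, the three reported quantities can be sketched in plain Python. The sketch assumes the usual form of the LD-weighted prediction correlation, beta1' R beta2 / sqrt((beta1' R beta1)(beta2' R beta2)), and assumes "opposite signs" counts SNPs whose two weights are both nonzero with different signs. Neither definition is confirmed by the PR text, and the inputs below are made-up toy values, not the OTTERS fixtures.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def quad_form(u, R, v):
    """Compute u' R v for a dense LD matrix R given as a list of rows."""
    return sum(u[i] * R[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def ld_prediction_correlation(beta1, beta2, R):
    """Assumed definition: b1' R b2 / sqrt((b1' R b1) * (b2' R b2))."""
    denom = math.sqrt(quad_form(beta1, R, beta1) * quad_form(beta2, R, beta2))
    return quad_form(beta1, R, beta2) / denom


def opposite_signs(beta1, beta2):
    """Assumed definition: SNPs where both weights are nonzero and differ in sign."""
    return sum(1 for a, b in zip(beta1, beta2) if a * b < 0)


# Toy inputs for illustration only.
R = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
beta1 = [0.2, 0.0, -0.1]
beta2 = [0.1, 0.15, 0.05]

raw_pearson = pearson(beta1, beta2)
ld_corr = ld_prediction_correlation(beta1, beta2, R)
n_opposite = opposite_signs(beta1, beta2)
print(raw_pearson, ld_corr, n_opposite, len(beta1))
```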
After switching to deterministic original-SDPR-style initialization, the SDPR output becomes stable on the prediction scale:
Trying to force every SNP beta to match exactly fails. Even after increasing the old SDPR run to 10,000 iterations, beta-vector agreement did not fully converge:
This likely reflects the fact that, under LD, correlated SNPs can exchange weights. Therefore beta Pearson can remain modest even when the predicted expression is nearly identical.
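A toy example makes the weight-exchange argument concrete. Assume two independent LD blocks, each containing two SNPs with LD r = 0.99, and two weight vectors that put the same effect on different SNPs within each block. The numbers are invented for illustration, and the prediction correlation again uses the assumed form beta1' R beta2 / sqrt((beta1' R beta1)(beta2' R beta2)).

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def quad_form(u, R, v):
    """Compute u' R v for a dense matrix R given as a list of rows."""
    return sum(u[i] * R[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def ld_corr(b1, b2, R):
    """LD-weighted prediction correlation (assumed form)."""
    return quad_form(b1, R, b2) / math.sqrt(quad_form(b1, R, b1) * quad_form(b2, R, b2))


r = 0.99  # LD between the two SNPs inside each block
R = [[1.0, r, 0.0, 0.0],
     [r, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, r],
     [0.0, 0.0, r, 1.0]]

beta_a = [1.0, 0.0, 0.5, 0.0]  # weight on the first SNP of each block
beta_b = [0.0, 1.0, 0.0, 0.5]  # same weight moved to the LD partner

beta_pearson = pearson(beta_a, beta_b)
pred_corr = ld_corr(beta_a, beta_b, R)
print(round(beta_pearson, 3), round(pred_corr, 3))
```

Here the raw beta Pearson is about -0.82 while the LD-weighted prediction correlation is 0.99: the two vectors disagree SNP by SNP yet predict almost the same expression.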
After the fix, pecotmr also agrees well with Old OTTERS on the prediction scale:
Old OTTERS vs pecotmr original-SDPR-style init seed0:
Old OTTERS vs pecotmr original-SDPR-style init seed1:
What this PR changes
- Adds init = c("legacy_random", "null") to sdpr().
- Makes init = "legacy_random" the default to restore original SDPR-style random initialization.
- Keeps init = "null" as an explicit option instead of silently using it by default.
- Passes seed and initialization mode into the C++ sampler.
- Updates man/sdpr.Rd.
- Adds tests covering init, fixed-seed reproducibility with n_threads = 1, explicit null initialization, and sdpr_weights() argument forwarding.
Interpretation