diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index ba24c8a24..78d513002 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -105,6 +105,38 @@ jobs: - name: Run simulation_manager smoke test run: bash .travis/test-simulation-manager.sh + asimov-integration: + needs: install + runs-on: ubuntu-latest + strategy: + fail-fast: false + matrix: + include: + - asimov-series: '0.5' + asimov-spec: 'asimov>=0.5,<0.6' + steps: + - uses: actions/checkout@v4 + - uses: actions/setup-python@v5 + with: + python-version: '3.11' + cache: 'pip' + cache-dependency-path: requirements.txt + - name: Enable symlink + run: sudo ln -sf $(which python3) /usr/bin/python + - name: Install system dependencies + run: sudo apt-get update && sudo apt-get install -y libgsl-dev + - name: Install dependencies + run: | + python -m pip install --upgrade pip --break-system-packages + python -m pip install -r requirements.txt --break-system-packages + python -m pip install coverage pytest --break-system-packages + python -m pip install --editable . --break-system-packages + python -m pip install htcondor --only-binary=:all: --break-system-packages + - name: Install Asimov + run: python -m pip install '${{ matrix.asimov-spec }}' 'asimov-gwdata>=0.4,<0.5' --break-system-packages + - name: Run Asimov integration test + run: bash .travis/test-asimov.sh + test-run: needs: install runs-on: ubuntu-latest @@ -171,4 +203,4 @@ jobs: github_token: ${{ secrets.GITHUB_TOKEN }} publish_dir: ./docs/build/ publish_branch: gh-pages - force_push: true \ No newline at end of file + force_push: true diff --git a/.travis/test-asimov.sh b/.travis/test-asimov.sh new file mode 100644 index 000000000..9a89ddd95 --- /dev/null +++ b/.travis/test-asimov.sh @@ -0,0 +1,8 @@ +#! /bin/bash +set -euo pipefail + +# The RIFT Asimov plugin is currently developed and validated against the +# Asimov 0.5 series. This test skips cleanly for unsupported/future series +# from inside pytest, so developers can preflight 0.6/0.7 environments without +# editing the test. +python -m pytest -q MonteCarloMarginalizeCode/Code/test/asimov_integration diff --git a/COMMIT_MSG.txt b/COMMIT_MSG.txt deleted file mode 100644 index 9bb49b95c..000000000 --- a/COMMIT_MSG.txt +++ /dev/null @@ -1,125 +0,0 @@ -RIFT.misc: dag_utils_generic fixes from end-to-end demo testing - -Follow-up to the initial pluggable-backend commit. These changes are all -driven by running the three new demo drivers (BasicIteration, EOS, -util_RIFT_pseudo_pipe) under the htcondor / glue / slurm backends on a -real RIFT install and fixing every site that crashed. - -Backend abstraction -------------------- - -* _GenericNode now generates globally-unique node names of the form - "-" using the same generate_job_id() recipe as - glue.pipeline.CondorDAGNode. This avoids identifier collisions when - multiple workflows are merged into a single .dag file. The legacy - private attribute _CondorDAGNode__md5name is exposed and mirrors - node.name verbatim, so RIFT consumers that hard-code that attribute - (e.g. when wiring up custom SCRIPT POST items) keep working. A new - set_name() helper keeps name and _CondorDAGNode__md5name in sync if a - caller wants a more legible override. - -* HTCondorBackend is now *always* registered. Its emit_job() already had - a self-contained submit-file text renderer for environments without the - htcondor python bindings (macOS in particular), so gating the - registration on `import htcondor` produced spurious warnings - ("RIFT_DAG_BACKEND='htcondor' not registered") and silently rerouted to - whichever backend auto-detection happened to find. GluePipelineBackend - is still gated on `from glue import pipeline`, since it strictly needs - glue at emit time. SlurmBackend has no python-level dependency and - always registers. - -* GluePipelineBackend supplies default log / stdout / stderr file paths - derived from the sub-file when the caller didn't set them, instead of - letting glue.pipeline raise CondorSubmitError("Log file not specified.") - / "Error file not specified." / "Output file not specified.". Other - backends already tolerated their absence. - -* GluePipelineBackend.emit_dag() strips a trailing ".dag" before passing - the path to glue.pipeline.CondorDAG.set_dag_file(), since glue itself - appends ".dag" and would otherwise produce "wf.dag.dag". - -Test suite (test/backends/) ---------------------------- - -* test_backends_lowlevel.py: explicit `import importlib.util` (the - submodule is not auto-loaded on all Python versions / startup paths); - open(...).read() patterns rewritten with `with` blocks to silence - Python 3.14+ ResourceWarnings; `--input=x.txt` assertion relaxed to - also accept the space-separated `--input x.txt` form that real - glue.pipeline emits; node-name test now asserts the new "-" - format and that _CondorDAGNode__md5name mirrors .name. - -* run_all_backends.sh + each demo_*.sh: prepend - MonteCarloMarginalizeCode/Code to PYTHONPATH and bin/ to PATH so the - in-tree RIFT package is discoverable without `pip install -e .`. The - command-v / fallback resolution for the pipeline-driver path captures - the absolute path returned by command-v (so the existence check is - meaningful). Tolerant `rm -rf 2>/dev/null || true` instead of bare - `rm -rf` so cleanup never aborts the run on FUSE / restrictive - filesystems. - -* The three demos no longer falsely report PASS when their child python - invocations crashed. Each backend run captures the python exit code - via PIPESTATUS (so tee doesn't swallow it) and treats the run as - failed if the exit code is non-zero *or* the driver log contains a - "Traceback (most recent call last):" line; failed backends are - collected in BACKEND_FAILURES and the demo exits non-zero at the end - if any failed. - -Fixture and invocation fixes ----------------------------- - -* demo_create_event_parameter_pipeline_BasicIteration.sh: switched to - the actual argument names accepted by the script - (--ile-n-events-to-analyze, --general-request-disk); removed - --skip-reproducibility (only valid on pseudo_pipe); added a sanitize - step on the input grid (see below). - -* demo_create_eos_posterior_pipeline.sh: switched from the single-args - --marg-event-args path (which leaves marg_event_nchunk_list = None - and crashes the multi-event loop) to the list-file path with - --marg-event-args-list-file, --marg-event-exe-list-file, and - --marg-event-nchunk-list-file. Uses a plain-text inputs/eos-grid.dat - for --input-grid (np.loadtxt cannot read a gzipped XML). Added - inputs/args_marg_event_list.txt, inputs/marg_event_exe_list.txt, - inputs/marg_event_nchunk_list.txt. - -* demo_util_RIFT_pseudo_pipe.sh: rewritten to mirror - .travis/test-build.sh -- uses --use-ini + --use-coinc which is the - only path through pseudo_pipe that populates event_dict["IFOs"] - without contacting GraceDb. Copied the in-tree - .travis/ref_ini/GW150914.ini and .travis/ref_ini/coinc.xml into - inputs/ (overridable via PSEUDO_INI / PSEUDO_COINC env vars). Sets - RIFT_LOWLATENCY / SINGULARITY_RIFT_IMAGE / SINGULARITY_BASE_EXE_DIR - exactly as the travis driver does. Wipes the rundir before the run - but does NOT pre-create it (pseudo_pipe.py does its own os.mkdir on - --use-rundir and aborts with FileExistsError otherwise); the driver - log goes to a sibling path and is moved into the rundir after - pseudo_pipe creates it. Artefact-listing block tolerates a missing - rundir. - -* inputs/overlap-grid.xml.gz: replaced the comment-only placeholder - (which trivially failed expat parsing) with a real LIGO_LW - sim_inspiral table copied from - MonteCarloMarginalizeCode/data/zero_noise_mdc/mdc-bns.xml.gz. - -* inputs/sanitize_grid.py: new helper. Modern igwn_ligolw rejects both - Type="ilwd:char" Column declarations and the `process_id` / - `simulation_id` columns on sim_inspiral. The helper prefers the - ligo.lw / igwn_ligolw API (running ligolw_no_ilwdchar.process_document - and removing the rejected columns). The stdlib-only text fallback - walks each , identifies columns to drop and ilwd:char columns - to rewrite, parses the comma-separated body via Python's csv - module, drops the corresponding column values from each data row, and - rewrites ilwd-format string values like "process:process_id:0" to the - bare integer 0. The BasicIteration demo invokes this as a - preprocessing step. - -Upstream bug fixes ------------------- - -* bin/create_event_parameter_pipeline_BasicIteration line 803: - `which_short` was a typo for `which_sort` (introduced in upstream - commit 8ef16ec9 "cepp: add 'shuf' to output, to randomize extrinsic - order at creation time"). Triggered when running with - --last-iteration-extrinsic-time-resampling. Fixed in place. diff --git a/MonteCarloMarginalizeCode/Code/RIFT/misc/__init__.py b/MonteCarloMarginalizeCode/Code/RIFT/misc/__init__.py index 12024d740..cca3d1942 100644 --- a/MonteCarloMarginalizeCode/Code/RIFT/misc/__init__.py +++ b/MonteCarloMarginalizeCode/Code/RIFT/misc/__init__.py @@ -1 +1 @@ -__all__ = ['dag_utils','xmlutils','bounded_kde','samples_utils'] +__all__ = ['dag_utils','xmlutils','bounded_kde','samples_utils','hyperpipeline_io'] diff --git a/MonteCarloMarginalizeCode/Code/RIFT/misc/dag_utils.py b/MonteCarloMarginalizeCode/Code/RIFT/misc/dag_utils.py index 0a1474528..676e1bd3b 100644 --- a/MonteCarloMarginalizeCode/Code/RIFT/misc/dag_utils.py +++ b/MonteCarloMarginalizeCode/Code/RIFT/misc/dag_utils.py @@ -711,6 +711,7 @@ def write_CIP_sub(tag='integrate', exe=None, input_net='all.net',output='output- return ile_job, ile_sub_name + def write_puff_sub(tag='puffball', exe=None, base=None,input_net='output-ILE-samples',output='puffball',universe="vanilla",out_dir=None,log_dir=None, use_eos=False,ncopies=1,arg_str=None,request_memory=1024,arg_vals=None, no_grid=False,extra_text='',**kwargs): """ Perform puffball calculation @@ -1293,7 +1294,15 @@ def write_unify_sub_simple(tag='unify', exe=None, base=None,target=None,universe """ - exe = exe or which("util_CleanILE.py") # like cat, but properly accounts for *independent* duplicates. (Danger if identical). Also strips large errors + if exe is None: + if str(os.environ.get("RIFT_HYPERPIPELINE_FORMAT", "")).strip().lower() in ("1", "true", "yes", "on"): + exe = which("util_CleanILE_hyperpipeline.py") + if not exe: + exe = os.path.abspath(os.path.join( + os.path.dirname(__file__), "..", "..", "bin", + "util_CleanILE_hyperpipeline.py")) + else: + exe = which("util_CleanILE.py") # like cat, but properly accounts for *independent* duplicates. (Danger if identical). Also strips large errors # Write unify.sh # - problem of globbing inside condor commands @@ -2159,7 +2168,24 @@ def write_joingrids_sub(tag='join_grids', exe=None, universe='vanilla', input_pa working_dir = log_dir.replace("/logs", '') # assumption about workflow/naming! Danger! - fname_out =target_dir + "/" +output_base + ".xml.gz" + # Hyperpipeline mode: emit a .dat composite via the EOS-style + # head-line + cat | sort | uniq | shuf shell pattern (mirrors + # join_post.sh in create_eos_posterior_pipeline). XML helpers + # (igwn_ligolw_add) are not used. + import os as _os_join + _hpip_join = str(_os_join.environ.get("RIFT_HYPERPIPELINE_FORMAT", "")).strip().lower() in ("1","true","yes","on") + _suffix_join = "dat" if _hpip_join else "xml.gz" + _shuffle_filter_join = None + if _hpip_join: + _shuffle_filter_join = which("shuf") or which("gshuf") + if _shuffle_filter_join is None: + _sort_join = which("sort") + if _sort_join is not None: + _shuffle_filter_join = _sort_join + " -R" + else: + _shuffle_filter_join = "cat" + + fname_out =target_dir + "/" +output_base + "." + _suffix_join if n_explode ==1: # we are really doing a glob match fname_out = fname_out.replace('$(macroiteration)','$1') fname_out = fname_out.replace('$(macroiterationnext)','$2') @@ -2169,11 +2195,34 @@ def write_joingrids_sub(tag='join_grids', exe=None, universe='vanilla', input_pa if old_add: extra_arg = " --ilwdchar-compat " # should never be used anymore with open("join_grids.sh",'w') as f: - f.write("#! /bin/bash \n") - f.write(r""" + if _hpip_join: + # Pick any one shard to source the column header from, then + # concatenate every shard's data rows (skipping `#` lines) + # and shuffle so spokes are interleaved. + f.write("#! /bin/bash \n") + f.write(r""" +# merge hyperpipeline ASCII shards +{extra} +set -e +shopt -s nullglob +SHARDS=({work}/{out}*.{suf}) +if [ ${{#SHARDS[@]}} -eq 0 ]; then + echo "join_grids.sh: no input shards matched {work}/{out}*.{suf}" >&2 + exit 1 +fi +# Header: take the first comment block from the first shard. +grep -E '^#' "${{SHARDS[0]}}" > {out_path} +# Body: every non-comment line from every shard, deduped + shuffled. +grep -hE -v '^#' "${{SHARDS[@]}}" | sort -u | {shuffle_filter} >> {out_path} +""".format(extra=extra_text, work=alt_work_dir, out=alt_out, + suf=_suffix_join, out_path=fname_out, + shuffle_filter=_shuffle_filter_join)) + else: + f.write("#! /bin/bash \n") + f.write(r""" # merge using glob command called from shell {} -{} {} --output {} {}/{}*.xml.gz +{} {} --output {} {}/{}*.xml.gz """.format(extra_text,exe,extra_arg,fname_out,alt_work_dir,alt_out)) os.system("chmod a+x join_grids.sh") exe = target_dir + "/join_grids.sh" @@ -3066,4 +3115,3 @@ def write_hyperpost_sub(tag='HYPER', exe=None, input_net='all.marg_net',output=' return ile_job, ile_sub_name - diff --git a/MonteCarloMarginalizeCode/Code/RIFT/misc/dag_utils_generic.py b/MonteCarloMarginalizeCode/Code/RIFT/misc/dag_utils_generic.py index 2aceb8262..93e3b57ed 100644 --- a/MonteCarloMarginalizeCode/Code/RIFT/misc/dag_utils_generic.py +++ b/MonteCarloMarginalizeCode/Code/RIFT/misc/dag_utils_generic.py @@ -2666,7 +2666,15 @@ def write_unify_sub_simple(tag='unify', exe=None, base=None,target=None,universe """ - exe = exe or which("util_CleanILE.py") # like cat, but properly accounts for *independent* duplicates. (Danger if identical). Also strips large errors + if exe is None: + if str(os.environ.get("RIFT_HYPERPIPELINE_FORMAT", "")).strip().lower() in ("1", "true", "yes", "on"): + exe = which("util_CleanILE_hyperpipeline.py") + if not exe: + exe = os.path.abspath(os.path.join( + os.path.dirname(__file__), "..", "..", "bin", + "util_CleanILE_hyperpipeline.py")) + else: + exe = which("util_CleanILE.py") # like cat, but properly accounts for *independent* duplicates. (Danger if identical). Also strips large errors # Write unify.sh # - problem of globbing inside condor commands @@ -3532,7 +3540,22 @@ def write_joingrids_sub(tag='join_grids', exe=None, universe='vanilla', input_pa working_dir = log_dir.replace("/logs", '') # assumption about workflow/naming! Danger! - fname_out =target_dir + "/" +output_base + ".xml.gz" + # Hyperpipeline mode: emit a .dat composite via the EOS-style + # head-line + cat | sort | uniq | shuf shell pattern. XML helpers + # (igwn_ligolw_add) are not used. + _hpip_join = str(os.environ.get("RIFT_HYPERPIPELINE_FORMAT", "")).strip().lower() in ("1","true","yes","on") + _suffix_join = "dat" if _hpip_join else "xml.gz" + _shuffle_filter_join = None + if _hpip_join: + _shuffle_filter_join = which("shuf") or which("gshuf") + if _shuffle_filter_join is None: + _sort_join = which("sort") + if _sort_join is not None: + _shuffle_filter_join = _sort_join + " -R" + else: + _shuffle_filter_join = "cat" + + fname_out =target_dir + "/" +output_base + "." + _suffix_join if n_explode ==1: # we are really doing a glob match fname_out = fname_out.replace('$(macroiteration)','$1') fname_out = fname_out.replace('$(macroiterationnext)','$2') @@ -3542,8 +3565,28 @@ def write_joingrids_sub(tag='join_grids', exe=None, universe='vanilla', input_pa if old_add: extra_arg = " --ilwdchar-compat " # should never be used anymore with open("join_grids.sh",'w') as f: - f.write("#! /bin/bash \n") - f.write(r""" + if _hpip_join: + f.write("#! /bin/bash \n") + f.write(r""" +# merge hyperpipeline ASCII shards +{extra} +set -e +shopt -s nullglob +SHARDS=({work}/{out}*.{suf}) +if [ ${{#SHARDS[@]}} -eq 0 ]; then + echo "join_grids.sh: no input shards matched {work}/{out}*.{suf}" >&2 + exit 1 +fi +# Header: take the first comment block from the first shard. +grep -E '^#' "${{SHARDS[0]}}" > {out_path} +# Body: every non-comment line from every shard, deduped + shuffled. +grep -hE -v '^#' "${{SHARDS[@]}}" | sort -u | {shuffle_filter} >> {out_path} +""".format(extra=extra_text, work=alt_work_dir, out=alt_out, + suf=_suffix_join, out_path=fname_out, + shuffle_filter=_shuffle_filter_join)) + else: + f.write("#! /bin/bash \n") + f.write(r""" # merge using glob command called from shell {} {} {} --output {} {}/{}*.xml.gz @@ -3584,10 +3627,10 @@ def write_joingrids_sub(tag='join_grids', exe=None, universe='vanilla', input_pa # ile_job.add_condor_cmd("MY.PostCmd", ' "' + gzip + ' ' +fname_out + '"') explode_str = "" - explode_str += " {}/{}.xml.gz ".format(working_dir,output_base) # base result from fitting job + explode_str += " {}/{}.{} ".format(working_dir,output_base,_suffix_join) # base result from fitting job if n_explode >1: for indx in np.arange(n_explode): - explode_str+= " {}/{}-{}.xml.gz ".format(working_dir,output_base,indx) + explode_str+= " {}/{}-{}.{} ".format(working_dir,output_base,indx,_suffix_join) ile_job.add_arg(explode_str) else: ile_job.add_arg(" $(macroiteration) $(macroiterationnext) ") @@ -4437,4 +4480,3 @@ def write_hyperpost_sub(tag='HYPER', exe=None, input_net='all.marg_net',output=' return ile_job, ile_sub_name - diff --git a/MonteCarloMarginalizeCode/Code/RIFT/misc/hyperpipeline_io.py b/MonteCarloMarginalizeCode/Code/RIFT/misc/hyperpipeline_io.py new file mode 100644 index 000000000..681b40e9e --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/RIFT/misc/hyperpipeline_io.py @@ -0,0 +1,649 @@ +""" +hyperpipeline_io +================ + +A small, dependency-light I/O helper for the *hyperpipeline* ASCII format used +as an alternative to the legacy XML / positional-ASCII intermediate files +exchanged between ILE and CIP. + +The format is a plain-text whitespace-separated table with a leading +``#``-prefixed header naming each column. Column 0 is always ``lnL`` and +column 1 is always ``sigma_lnL``; the remaining columns are the (possibly +extensible) physical parameters describing each intrinsic sample. This is +deliberately compatible with ``numpy.genfromtxt(..., names=True)``, +``numpy.loadtxt`` (when ``# `` headers are stripped as comments), and +``pandas.read_csv(..., sep=r'\\s+', comment='#')``. + +Activation +---------- + +The new format is opt-in. Both producers (``integrate_likelihood_extrinsic_batchmode``) +and consumers (``util_ConstructIntrinsicPosterior_GenericCoordinates.py``) +check the environment variable :envvar:`RIFT_HYPERPIPELINE_FORMAT`; when its +value is truthy (``1``, ``true``, ``yes``, case-insensitive), the new code +paths are taken. When the variable is unset or falsy, the executables behave +exactly as before -- existing pipelines are unaffected. + +Default columns +--------------- + +By default the format describes a quasi-circular precessing binary using +Cartesian component spins:: + + lnL sigma_lnL m1 m2 a1x a1y a1z a2x a2y a2z + +Optional column groups (auto-detected from the header): + +* ``eccentricity`` (and optionally ``meanPerAno``) +* ``lambda1 lambda2`` (and optionally ``eos_table_index``) +* ``distance`` (in Mpc) + +The reader returns a structured ``numpy.ndarray`` so callers can address +columns by name regardless of order, and a small adaptor function is provided +to convert that structured array into the legacy positional ``(N, K)`` matrix +the existing CIP pipeline still expects. +""" + +from __future__ import absolute_import, division, print_function + +import os +import numpy as np + + +# --------------------------------------------------------------------------- +# Constants and helpers +# --------------------------------------------------------------------------- + +#: Environment-variable flag. When truthy, ILE writes / CIP reads in the +#: hyperpipeline format. +ENV_FLAG = "RIFT_HYPERPIPELINE_FORMAT" + +#: Magic token placed on the first line of every hyperpipeline file so +#: that downstream tools can sniff the format unambiguously. Both writers +#: prepend ``# `` so the on-disk first line is ``# RIFT_HYPERPIPELINE_V1``. +#: Sniff/read code strips any leading ``#`` and whitespace before matching. +MAGIC = "RIFT_HYPERPIPELINE_V1" + +#: Default base columns -- always present, in this order, for every file. +DEFAULT_BASE_COLUMNS = ( + "lnL", "sigma_lnL", + "m1", "m2", + "a1x", "a1y", "a1z", + "a2x", "a2y", "a2z", +) + +#: Columns that may optionally be appended after the base set. The reader +#: accepts any subset; the writer emits only those that are active. +OPTIONAL_COLUMNS = ( + "eccentricity", + "meanPerAno", + "lambda1", + "lambda2", + "eos_table_index", + "distance", +) + + +def is_active(env=None): + """Return ``True`` if the hyperpipeline format is enabled in *env*. + + Parameters + ---------- + env : Mapping or None + Environment mapping. Defaults to ``os.environ``. + """ + if env is None: + env = os.environ + val = str(env.get(ENV_FLAG, "")).strip().lower() + return val in ("1", "true", "yes", "on") + + +def build_column_list(use_eccentricity=False, use_meanPerAno=False, + use_tides=False, use_eos_index=False, + use_distance=False): + """Compose the column-name tuple for a given physics configuration. + + The resulting tuple always begins with :data:`DEFAULT_BASE_COLUMNS` and + appends the requested optional groups in a fixed canonical order:: + + (eccentricity, [meanPerAno,] [lambda1, lambda2,] + [eos_table_index,] [distance]) + """ + cols = list(DEFAULT_BASE_COLUMNS) + if use_eccentricity: + cols.append("eccentricity") + if use_meanPerAno: + cols.append("meanPerAno") + if use_tides: + cols.extend(["lambda1", "lambda2"]) + if use_eos_index: + cols.append("eos_table_index") + if use_distance: + cols.append("distance") + return tuple(cols) + + +# --------------------------------------------------------------------------- +# Writer +# --------------------------------------------------------------------------- + +def write_row(fname, columns, values, append=False): + """Write a single row in the hyperpipeline format. + + Parameters + ---------- + fname : str + Output filename. + columns : sequence of str + Names of the columns (length K). + values : sequence of float + Values, in the same order as *columns* (length K). + append : bool + If ``True``, append to *fname* without rewriting the header. Used + for accumulating multi-event ILE shards into a single file. + + Notes + ----- + The function writes (in this order) a magic line, a column-name header + line, and the data row. When *append* is ``True`` the magic and header + lines are skipped on the assumption that the file already begins with + them. + """ + columns = tuple(columns) + values = np.asarray(values, dtype=float).reshape(1, -1) + if values.shape[1] != len(columns): + raise ValueError( + "hyperpipeline_io.write_row: got {} values for {} columns".format( + values.shape[1], len(columns))) + + mode = "a" if append else "w" + with open(fname, mode) as fp: + if not append: + fp.write("# " + MAGIC + "\n") + fp.write("# " + " ".join(columns) + "\n") + # %.18e mirrors numpy.savetxt's default precision so this round-trips. + fp.write(" ".join("{:.18e}".format(v) for v in values[0]) + "\n") + + +def write_table(fname, columns, data): + """Write a multi-row table. *data* must have shape ``(N, len(columns))``.""" + columns = tuple(columns) + arr = np.asarray(data, dtype=float) + if arr.ndim == 1: + arr = arr.reshape(1, -1) + if arr.shape[1] != len(columns): + raise ValueError( + "hyperpipeline_io.write_table: data has {} cols, header has {}" + .format(arr.shape[1], len(columns))) + # numpy.savetxt prepends `# ` to every line of `header`, so we ask it + # to produce two `#`-prefixed comment lines: the magic marker and the + # column-name line. + header = MAGIC + "\n" + " ".join(columns) + np.savetxt(fname, arr, header=header) + + +# --------------------------------------------------------------------------- +# Reader +# --------------------------------------------------------------------------- + +def _strip_comment(line): + """Strip a leading ``#`` (or repeated ``#``) and surrounding whitespace.""" + return line.lstrip("#").strip() + + +def sniff(fname): + """Return ``True`` if *fname* looks like a hyperpipeline file. + + The check is cheap: read the first non-empty line and look for the + magic marker, or fall back to a header-only sniff (a ``#`` line listing + the canonical first column names). + """ + try: + with open(fname, "r") as fp: + for raw in fp: + line = raw.strip() + if not line: + continue + if not line.startswith("#"): + return False + payload = _strip_comment(line) + if payload.startswith(MAGIC): + return True + # Fallback: a header line listing the canonical first columns. + toks = payload.split() + if len(toks) >= 2 and toks[0] == "lnL" and toks[1] == "sigma_lnL": + return True + return False + except (OSError, IOError): + return False + return False + + +def read_header(fname): + """Return the tuple of column names declared in *fname*'s header. + + Skips the optional magic line; expects the next ``#``-prefixed line to + be the column-name header. + """ + with open(fname, "r") as fp: + for raw in fp: + line = raw.strip() + if not line: + continue + if not line.startswith("#"): + break # Hit data without finding a header. + payload = _strip_comment(line) + if payload.startswith(MAGIC): + continue + return tuple(payload.split()) + raise ValueError( + "hyperpipeline_io.read_header: no column header found in {}".format(fname)) + + +def read_table(fname): + """Read *fname* and return ``(structured_array, column_names_tuple)``. + + Uses :func:`numpy.genfromtxt` with ``names=True`` so columns are + addressable by name. Multiple ``#``-prefixed lines (e.g. the magic + marker plus repeated headers from a concatenated multi-shard file) are + all treated as comments by genfromtxt; the column names come from the + first such line, which we determine explicitly via :func:`read_header`. + """ + columns = read_header(fname) + # genfromtxt's auto name detection tries to infer from the *last* + # comment block immediately preceding the data, which is brittle for + # concatenated multi-shard files. Pass names explicitly to be robust. + arr = np.genfromtxt(fname, names=columns, comments="#", + dtype=float, invalid_raise=False) + if arr.ndim == 0: + # Single-row file -> genfromtxt returns 0-d structured scalar; + # promote to length-1 array for consistent downstream handling. + arr = np.atleast_1d(arr) + return arr, columns + + +# --------------------------------------------------------------------------- +# Adaptor: structured array -> legacy positional matrix expected by CIP +# --------------------------------------------------------------------------- + +def to_legacy_dat(arr, use_eccentricity=False, use_meanPerAno=False, + use_tides=False, use_eos_index=False, use_distance=False): + """Reshape a hyperpipeline structured array into the legacy CIP layout. + + The legacy CIP reader expects a plain ``(N, K)`` ``ndarray`` whose + columns are:: + + [event_id, m1, m2, s1x, s1y, s1z, s2x, s2y, s2z, + (distance?), (lambda1, lambda2, (eos_index)?)?, + (eccentricity, (meanPerAno)?)?, + lnL, sigma_lnL] + + The ordering of optional groups before ``lnL`` mirrors what the legacy + ``col_lnL`` increment chain in + ``util_ConstructIntrinsicPosterior_GenericCoordinates.py`` produces. + The synthetic ``event_id`` column is filled with ``-1`` because the + hyperpipeline format does not carry that book-keeping field. + """ + n = len(arr) + + cols = ["event_id", "m1", "m2", "a1x", "a1y", "a1z", "a2x", "a2y", "a2z"] + if use_distance: + cols.append("distance") + if use_tides: + cols.extend(["lambda1", "lambda2"]) + if use_eos_index: + cols.append("eos_table_index") + if use_eccentricity: + cols.append("eccentricity") + if use_meanPerAno: + cols.append("meanPerAno") + cols.extend(["lnL", "sigma_lnL"]) + + out = np.zeros((n, len(cols)), dtype=float) + for j, name in enumerate(cols): + if name == "event_id": + out[:, j] = -1.0 + else: + if name not in arr.dtype.names: + raise KeyError( + "hyperpipeline_io.to_legacy_dat: requested column '{}' " + "not present in file (have {})".format(name, arr.dtype.names)) + out[:, j] = arr[name] + return out + + +def read_many(filenames): + """Read and vertically stack a list of hyperpipeline files. + + All files must share the same column header. Empty / unreadable / wrong + files are skipped with a warning printed to stderr (this mirrors + util_CleanILE.py's tolerance for malformed shards). Returns + ``(structured_array, column_names_tuple)``. + """ + import sys + chunks = [] + columns = None + for fname in filenames: + try: + if not os.path.exists(fname): + continue + if os.stat(fname).st_size == 0: + continue + arr, hdr = read_table(fname) + except Exception as exc: + sys.stderr.write( + "hyperpipeline_io.read_many: skipping {}: {}\n".format(fname, exc)) + continue + if columns is None: + columns = hdr + elif hdr != columns: + sys.stderr.write( + "hyperpipeline_io.read_many: column mismatch in {} " + "(have {}, expected {}); skipping\n".format(fname, hdr, columns)) + continue + if arr.ndim == 0: + arr = np.atleast_1d(arr) + chunks.append(arr) + if not chunks: + raise ValueError( + "hyperpipeline_io.read_many: no readable input files among " + "{} candidate(s)".format(len(filenames))) + out = np.concatenate(chunks) + return out, columns + + +def consolidate(arr, columns, sigma_cut=0.9, digits=5): + """Consolidate duplicate intrinsic-parameter rows by weighted averaging. + + Mirrors the per-key averaging in ``util_CleanILE.py``: rows are grouped + by their intrinsic-parameter values (every column except ``lnL`` and + ``sigma_lnL``, rounded to *digits* decimals), rows with + ``sigma_lnL > sigma_cut`` are discarded, and within each group ``lnL`` is + weighted-averaged using the same ``1/sigma^2`` (after Lmax factoring) + weights as ``RIFT.misc.weight_simulations.AverageSimulationWeights``:: + + wts = (sigma_k)^-2 / sum_k (sigma_k)^-2 + = sum_k wts_k * I_k + sigma_ = sqrt(1 / sum_k 1/sigma_k^2) / + + Returns ``(consolidated_array, columns)`` where the array is sorted by + ``lnL`` in descending order. + """ + columns = tuple(columns) + if "lnL" not in columns or "sigma_lnL" not in columns: + raise ValueError( + "hyperpipeline_io.consolidate: input is missing 'lnL' or 'sigma_lnL'" + " columns: {}".format(columns)) + intrinsic_cols = tuple(c for c in columns if c not in ("lnL", "sigma_lnL")) + + # Drop poorly-resolved samples (mirrors util_CleanILE behaviour). + keep = arr["sigma_lnL"] <= sigma_cut + arr = arr[keep] + + if len(arr) == 0: + return np.empty(0, dtype=arr.dtype), columns + + # Group by rounded intrinsic-parameter tuple. + groups = {} + for row in arr: + key = tuple(round(float(row[c]), digits) for c in intrinsic_cols) + groups.setdefault(key, []).append(row) + + out = np.empty(len(groups), dtype=arr.dtype) + for i, (key, rows) in enumerate(groups.items()): + lnL = np.array([r["lnL"] for r in rows], dtype=float) + sigOverL = np.array([r["sigma_lnL"] for r in rows], dtype=float) + sigOverL = np.maximum(sigOverL, 1e-7) + lnLmax = np.max(lnL) + # sigma_k absolute (factor out Lmax for numerical stability). + sig = sigOverL * np.exp(lnL - lnLmax) + wts = 1.0 / (sig * sig) + wts = wts / np.sum(wts) + lnL_mean_minus_max = np.log(np.sum(np.exp(lnL - lnLmax) * wts)) + sigma_net_overL = np.sqrt(1.0 / np.sum(1.0 / (sig * sig))) / np.exp(lnL_mean_minus_max) + # Build the consolidated row: intrinsic values from the key, lnL/sigma + # from the weighted average. + for j, name in enumerate(intrinsic_cols): + out[i][name] = key[j] + out[i]["lnL"] = lnL_mean_minus_max + lnLmax + out[i]["sigma_lnL"] = sigma_net_overL + + # Sort by lnL descending so the composite has the highest-likelihood row + # first -- preserves the convention of the legacy `sort -rg -kN` step. + order = np.argsort(out["lnL"])[::-1] + return out[order], columns + + +#: Bidirectional alias map between hyperpipeline on-disk column names and +#: the corresponding ``ChooseWaveformParams`` attribute names. The on-disk +#: convention follows the LALInference / posterior-export naming +#: (``a1x``, ``a1y``, ...) while ``ChooseWaveformParams`` stores spins as +#: ``s1x``, ``s1y``, etc. The codebase already defines this mapping in +#: several places (e.g. util_FitAndEvaluate_GenericCoordinates.py and +#: util_ConstructIntrinsicPosterior_GenericCoordinates.py); we centralise it +#: here so the hyperpipeline I/O layer is the single bridge. +COLUMN_ALIAS_DISK_TO_ATTR = { + "a1x": "s1x", "a1y": "s1y", "a1z": "s1z", + "a2x": "s2x", "a2y": "s2y", "a2z": "s2z", +} +COLUMN_ALIAS_ATTR_TO_DISK = {v: k for k, v in COLUMN_ALIAS_DISK_TO_ATTR.items()} + + +def disk_to_attr(name): + """Map an on-disk column name to its ChooseWaveformParams attribute.""" + return COLUMN_ALIAS_DISK_TO_ATTR.get(name, name) + + +def attr_to_disk(name): + """Map a ChooseWaveformParams attribute to its on-disk column name.""" + return COLUMN_ALIAS_ATTR_TO_DISK.get(name, name) + + +#: On-disk units for parameters that have a non-trivial scale. m1 / m2 are +#: stored in solar masses for human readability (matches the legacy ILE ASCII +#: convention) but ``ChooseWaveformParams.m1`` is internally in kg, so any +#: writer / reader that round-trips through a P object MUST apply this +#: scaling. Distance is stored in Mpc; ``ChooseWaveformParams.dist`` is in +#: metres. Other parameters are dimensionless / pre-scaled and pass through. +PARAM_DISK_TO_SI = { + "m1": "MSUN_SI", + "m2": "MSUN_SI", + "dist": "PC_SI_MPC", # special: metres = value(Mpc) * 1e6 * lal.PC_SI + "distance": "PC_SI_MPC", +} + + +def _disk_to_si(name, val, lal_module): + """Convert *val* (on-disk units) to the SI convention used by P.""" + scale = PARAM_DISK_TO_SI.get(name) + if scale is None: + return float(val) + if scale == "MSUN_SI": + return float(val) * lal_module.MSUN_SI + if scale == "PC_SI_MPC": + return float(val) * 1.0e6 * lal_module.PC_SI + return float(val) + + +def _si_to_disk(name, val, lal_module): + """Convert *val* (P's SI units) to the on-disk convention.""" + scale = PARAM_DISK_TO_SI.get(name) + if scale is None: + return float(val) + if scale == "MSUN_SI": + return float(val) / lal_module.MSUN_SI + if scale == "PC_SI_MPC": + return float(val) / (1.0e6 * lal_module.PC_SI) + return float(val) + + +#: Default filename suffix for hyperpipeline grid files emitted by +#: :func:`write_grid_from_P_list` when the caller passes a basename +#: without one. Mirrors the auto-append behaviour of +#: ``lalsimutils.ChooseWaveformParams_array_to_xml`` (which appends +#: ``.xml.gz`` when it isn't already present), so writer call sites can +#: pass the same basename to either helper. +DEFAULT_GRID_SUFFIX = ".dat" + + +def with_grid_suffix(fname): + """Append :data:`DEFAULT_GRID_SUFFIX` if *fname* lacks a recognised one.""" + if fname.endswith(".dat") or fname.endswith(".txt"): + return fname + return fname + DEFAULT_GRID_SUFFIX + + +def write_grid_from_P_list(fname, P_list, columns, + lal_module=None, lalsimutils_module=None, + lnL_values=None, sigma_lnL_values=None): + """Write a hyperpipeline grid file from a list of ChooseWaveformParams. + + Used by CIP / puffball / fetch when emitting the next-iteration + intrinsic-parameter grid in hyperpipeline format instead of XML. + + *columns* must include ``lnL`` and ``sigma_lnL`` (filled with 0 by + default if *lnL_values* / *sigma_lnL_values* are not provided -- these + columns carry no information for posterior draws but are kept to + preserve the universal column-0/1 invariant). Mass and distance values + are converted from P's internal SI units to the on-disk convention + (solar masses, Mpc) via :data:`PARAM_DISK_TO_SI`. + + Parameters + ---------- + fname : str + Output filename. + P_list : sequence + ``ChooseWaveformParams`` instances. + columns : sequence of str + Output column names (must contain ``lnL`` and ``sigma_lnL``). + lal_module, lalsimutils_module : modules + Pass ``lal`` and ``RIFT.lalsimutils`` explicitly. Avoids importing + them at module-load time so :mod:`hyperpipeline_io` stays + dependency-light for tests that only exercise pure-numpy code paths. + lnL_values, sigma_lnL_values : array-like or None + Optional per-row likelihood values. Filled with zeros if omitted. + """ + columns = tuple(columns) + if "lnL" not in columns or "sigma_lnL" not in columns: + raise ValueError( + "hyperpipeline_io.write_grid_from_P_list: 'lnL' and 'sigma_lnL' " + "must appear in columns; got {}".format(columns)) + fname = with_grid_suffix(fname) + n = len(P_list) + mat = np.zeros((n, len(columns)), dtype=float) + if lnL_values is not None: + mat[:, columns.index("lnL")] = np.asarray(lnL_values, dtype=float) + if sigma_lnL_values is not None: + mat[:, columns.index("sigma_lnL")] = np.asarray(sigma_lnL_values, dtype=float) + for i, P in enumerate(P_list): + for j, name in enumerate(columns): + if name in ("lnL", "sigma_lnL"): + continue + # Resolve disk-name -> attr-name via the alias map (a1x -> s1x). + attr_name = disk_to_attr(name) + # First try direct attribute, then fall through to extract_param + # (handles derived coordinates: mc, eta, q, chi1, ...) + if hasattr(P, attr_name): + raw = getattr(P, attr_name) + elif lalsimutils_module is not None and hasattr(P, "extract_param"): + raw = P.extract_param(attr_name) + else: + raise AttributeError( + "hyperpipeline_io.write_grid_from_P_list: P has no '{}' " + "(or alias '{}') attribute and no lalsimutils provided " + "for extract_param".format(name, attr_name)) + if lal_module is not None: + mat[i, j] = _si_to_disk(name, raw, lal_module) + else: + mat[i, j] = float(raw) + write_table(fname, columns, mat) + + +def read_grid_to_P_list(fname, P_factory, lal_module=None, + valid_params=None): + """Read a hyperpipeline grid file and return ``(P_list, columns)``. + + For each row, instantiates a ``ChooseWaveformParams`` via *P_factory()* + and assigns the named columns. Mass and distance values are converted + from on-disk units (solar masses, Mpc) to P's internal SI units via + :data:`PARAM_DISK_TO_SI`. + + Parameters + ---------- + fname : str + Input filename. + P_factory : callable + Zero-arg callable that returns a fresh ``ChooseWaveformParams``. + Typically ``lalsimutils.ChooseWaveformParams``. + lal_module : module + Pass ``lal`` for unit conversions. If None, no conversion is done + (units stay on-disk -- only safe when downstream code knows that). + valid_params : iterable of str + Restrict assignment to these names. Defaults to the full column + set. Pass ``lalsimutils.valid_params`` to mirror ILE's --sim-grid + intersection behaviour. + """ + arr, columns = read_table(fname) + if valid_params is not None: + valid_params = set(valid_params) + # A column is "active" if its disk-name OR its alias-resolved + # attr-name is in valid_params. This lets the caller pass + # lalsimutils.valid_params (which knows about s1x, not a1x) and + # still get spin columns assigned correctly. + active = [c for c in columns + if c in valid_params or disk_to_attr(c) in valid_params] + else: + active = [c for c in columns if c not in ("lnL", "sigma_lnL")] + P_list = [] + for row in arr: + P = P_factory() + for name in active: + raw = float(row[name]) + if lal_module is not None: + val = _disk_to_si(name, raw, lal_module) + else: + val = raw + attr_name = disk_to_attr(name) + if hasattr(P, attr_name): + setattr(P, attr_name, val) + elif hasattr(P, "assign_param"): + P.assign_param(attr_name, val) + P_list.append(P) + return P_list, columns + + +def legacy_column_indices(use_eccentricity=False, use_meanPerAno=False, + use_tides=False, use_eos_index=False, + use_distance=False): + """Return the positional column indices the legacy CIP loop expects. + + Mirrors the layout produced by :func:`to_legacy_dat` so callers can + reset the ``col_lnL`` / ``col_lambda1`` / ``col_distance`` / + ``col_eccentricity`` / ``col_meanPerAno`` integers used by the existing + CIP indexing logic without having to recompute them. + + Returns a dict keyed by ``'lnL'``, ``'sigma_lnL'``, ``'distance'``, + ``'lambda1'``, ``'eccentricity'``, ``'meanPerAno'``. Any column not + present in the configuration maps to ``None``. + """ + # event_id m1 m2 a1x a1y a1z a2x a2y a2z = 9 leading columns. + idx = 9 + out = {"lnL": None, "sigma_lnL": None, "distance": None, + "lambda1": None, "eccentricity": None, "meanPerAno": None} + if use_distance: + out["distance"] = idx + idx += 1 + if use_tides: + out["lambda1"] = idx + idx += 2 # lambda1, lambda2 + if use_eos_index: + idx += 1 # eos_table_index (no positional alias used by CIP) + if use_eccentricity: + out["eccentricity"] = idx + idx += 1 + if use_meanPerAno: + out["meanPerAno"] = idx + idx += 1 + out["lnL"] = idx + out["sigma_lnL"] = idx + 1 + return out diff --git a/MonteCarloMarginalizeCode/Code/bin/convergence_test_samples.py b/MonteCarloMarginalizeCode/Code/bin/convergence_test_samples.py index dfa574054..90935d286 100755 --- a/MonteCarloMarginalizeCode/Code/bin/convergence_test_samples.py +++ b/MonteCarloMarginalizeCode/Code/bin/convergence_test_samples.py @@ -24,6 +24,7 @@ from RIFT.misc.samples_utils import add_field import RIFT.misc.samples_utils +from RIFT.misc import hyperpipeline_io as hpio parser = argparse.ArgumentParser() parser.add_argument("--samples", action='append', help="Samples used in convergence test") @@ -131,9 +132,14 @@ def test_js_additive(dat1,dat2): return js_net # Procedure - -samples1 = np.genfromtxt(opts.samples[0], names=True) -samples2 = np.genfromtxt(opts.samples[1], names=True) +def read_samples(fname): + if hpio.sniff(fname): + samples, _columns = hpio.read_table(fname) + return samples + return np.genfromtxt(fname, names=True) + +samples1 = read_samples(opts.samples[0]) +samples2 = read_samples(opts.samples[1]) # Add necessary parameterys if 'm1' in samples1.dtype.names: diff --git a/MonteCarloMarginalizeCode/Code/bin/create_event_parameter_pipeline_BasicIteration b/MonteCarloMarginalizeCode/Code/bin/create_event_parameter_pipeline_BasicIteration index af774bd33..f849b9a00 100644 --- a/MonteCarloMarginalizeCode/Code/bin/create_event_parameter_pipeline_BasicIteration +++ b/MonteCarloMarginalizeCode/Code/bin/create_event_parameter_pipeline_BasicIteration @@ -350,6 +350,49 @@ if not opts.cip_explode_jobs_last: # set realistic default, if not specified working_dir_inside_local = working_dir_inside = opts.working_directory working_dir_inside_cip = working_dir_inside_local out_dir_inside_cip = working_dir_inside_local + +# ---------------------------------------------------------------------- +# Hyperpipeline ASCII format (opt-in via env var). When set, every +# inter-stage intrinsic-parameter file in the iterative pipeline +# (overlap-grid, puffball, fetch, output-grid) is .dat instead of .xml.gz, +# and ILE consumes them via --sim-grid instead of --sim-xml. This +# matches the file-naming convention already used by +# create_eos_posterior_pipeline. When unset, behaviour is identical to +# the legacy pipeline. +# +# We define grid_suffix and sim_grid_flag once here and substitute via +# .format() throughout the rest of the script, so the search-and-replace +# surface area is the literal string "xml.gz" / "--sim-xml" -- nothing +# more invasive than that. +# ---------------------------------------------------------------------- +_use_hpip_pipeline = str(os.environ.get("RIFT_HYPERPIPELINE_FORMAT", "")).strip().lower() in ("1", "true", "yes", "on") +grid_suffix = "dat" if _use_hpip_pipeline else "xml.gz" +sim_grid_flag = "--sim-grid" if _use_hpip_pipeline else "--sim-xml" +if _use_hpip_pipeline: + print(" === Hyperpipeline ASCII grid format active (RIFT_HYPERPIPELINE_FORMAT) ===") + print(" Inter-stage grids will be .{}, ILE will receive {}".format(grid_suffix, sim_grid_flag)) + +def _add_hpip_condor_env(job): + if not _use_hpip_pipeline: + return + hpip_env = "RIFT_HYPERPIPELINE_FORMAT=1" + if hasattr(job, "get_condor_cmds"): + condor_cmds = job.get_condor_cmds() + else: + condor_cmds = getattr(job, "_CondorJob__condor_cmds", {}) + env_cmd = condor_cmds.get("environment") + if not env_cmd: + job.add_condor_cmd("environment", '"{}"'.format(hpip_env)) + return + if hpip_env in env_cmd: + return + env_cmd = env_cmd.strip() + if len(env_cmd) >= 2 and env_cmd[0] == env_cmd[-1] and env_cmd[0] in ("'", '"'): + env_cmd = env_cmd[:-1] + " " + hpip_env + env_cmd[-1] + else: + env_cmd = '"{} {}"'.format(env_cmd, hpip_env) + job.add_condor_cmd("environment", env_cmd) + if opts.cip_explode_jobs: out_dir_inside_cip+= "/iteration_$(macroiteration)_cip/" os.chdir(opts.working_directory) @@ -530,12 +573,21 @@ if opts.gridinit_args: print("GRIDINIT", gridinit_args) -# Copy seed grid into place as overlap-grid-0.xml.gz +# Copy seed grid into place as overlap-grid-0.{grid_suffix} it_start = opts.start_iteration n_initial = opts.n_samples_per_job +_seed_grid_name = "overlap-grid-0." + grid_suffix if (it_start == 0) and not gridinit_args: - shutil.copyfile(opts.input_grid,"overlap-grid-0.xml.gz") # put in working directory ! - n_initial = len(lalsimutils.xml_to_ChooseWaveformParams_array("overlap-grid-0.xml.gz")) + shutil.copyfile(opts.input_grid, _seed_grid_name) # put in working directory ! + if _use_hpip_pipeline: + # Hyperpipeline files: count data rows = total lines minus comment lines. + with open(_seed_grid_name) as _f: + n_initial = sum(1 for _l in _f if _l.strip() and not _l.lstrip().startswith("#")) + if os.path.lexists("posterior_samples-0.dat"): + os.remove("posterior_samples-0.dat") + os.symlink(_seed_grid_name, "posterior_samples-0.dat") + else: + n_initial = len(lalsimutils.xml_to_ChooseWaveformParams_array(_seed_grid_name)) transfer_file_names = [] if not (opts.transfer_file_list is None): @@ -571,7 +623,7 @@ if opts.ile_n_events_to_analyze: exe = dag_utils.which("integrate_likelihood_extrinsic_batchmode") cmd.write('#!/usr/bin/env bash\n') cmd.write('# (run me only in working directories)\n') -cmd.write(exe + ' ' + arg_list + " --sim-xml overlap-grid-0.xml.gz") # make it concrete, so I can test it +cmd.write(exe + ' ' + arg_list + " {} overlap-grid-0.{}".format(sim_grid_flag, grid_suffix)) # make it concrete, so I can test it cmd.close() st = os.stat(cmdname) import stat @@ -580,9 +632,9 @@ os.chmod(cmdname, st.st_mode | stat.S_IEXEC) # identify overlap grid input # identify output file names (?) ile_args_orig = ile_args # provides ability to strip out the output and replace it with alternate -ile_args+= ' --sim-xml ' + working_dir_inside + '/overlap-grid-$(macroiteration).xml.gz ' -ile_args_forpuff= ile_args_orig + ' --sim-xml ' + working_dir_inside + '/puffball-$(macroiteration).xml.gz ' -ile_args_forfetch= ile_args_orig + ' --sim-xml ' + working_dir_inside + '/fetch-$(macroiteration).xml.gz ' +ile_args+= ' {} '.format(sim_grid_flag) + working_dir_inside + '/overlap-grid-$(macroiteration).{} '.format(grid_suffix) +ile_args_forpuff= ile_args_orig + ' {} '.format(sim_grid_flag) + working_dir_inside + '/puffball-$(macroiteration).{} '.format(grid_suffix) +ile_args_forfetch= ile_args_orig + ' {} '.format(sim_grid_flag) + working_dir_inside + '/fetch-$(macroiteration).{} '.format(grid_suffix) ### @@ -704,7 +756,7 @@ if (opts.ile_n_events_to_analyze > 1) and (exe is None): output_file_names = None request_disk=opts.ile_request_disk if opts.use_singularity or opts.use_osg: - transfer_file_names.append("../overlap-grid-$(macroiteration).xml.gz") + transfer_file_names.append("../overlap-grid-$(macroiteration).{}".format(grid_suffix)) #request_disk=4 #output_file_names = ','.join(["CME_out-$(macroevent)-$(cluster)-$(process).xml_{0}_.dat".format(x) for x in np.arange(opts.ile_n_events_to_analyze)]) #print "OUTPUT FILES ", output_file_names @@ -722,11 +774,12 @@ ile_job.set_stdout_file(opts.working_directory+"/iteration_$(macroiteration)_ile if opts.use_full_submit_paths: fname = opts.working_directory+"/"+ile_job.get_sub_file() ile_job.set_sub_file(fname) +_add_hpip_condor_env(ile_job) ile_job.write_sub_file() if not (opts.puff_args is None): transfer_file_names_puff = list(transfer_file_names[:-1]) - transfer_file_names_puff.append(opts.working_directory+"/puffball-$(macroiteration).xml.gz") # could also use ../puffball, because relative is ok to initialdir + transfer_file_names_puff.append(opts.working_directory+"/puffball-$(macroiteration).{}".format(grid_suffix)) # could also use ../puffball, because relative is ok to initialdir # Write a version of the ILE job that uses puffball inputs ... IDENTICAL to code above for standard ILE case,just different argument for input # Yes, it would logically be simpler to make overlap-grid.xml.gz larger ... but I want to keep 'real samples' and 'puffed samples' seperate. ilePuff_job, ilePuff_job_name = dag_utils.write_ILE_sub_simple(tag='ILE_puff',log_dir=None,arg_str=ile_args_forpuff,output_file="CME_puff_out.xml",ncopies=opts.n_copies,exe=ile_exe,transfer_files=transfer_file_names_puff,request_memory=opts.request_memory_ILE,request_disk=request_disk,request_gpu=opts.request_gpu_ILE,request_cross_platform=opts.request_xpu_ILE,use_singularity=opts.use_singularity,singularity_image=singularity_image,use_osg=opts.use_osg,simple_osg_requirements=opts.use_osg_simple_requirements,frames_dir=opts.frames_dir,cache_file=opts.cache_file,use_cvmfs_frames=opts.use_cvmfs_frames,use_oauth_files=opts.use_oauth_files,max_runtime_minutes=opts.ile_runtime_max_minutes,condor_commands=ile_condor_commands) @@ -737,11 +790,12 @@ if not (opts.puff_args is None): if opts.use_full_submit_paths: fname = opts.working_directory+"/"+ilePuff_job.get_sub_file() ilePuff_job.set_sub_file(fname) + _add_hpip_condor_env(ilePuff_job) ilePuff_job.write_sub_file() if not (fetch_args is None): transfer_file_names_fetch = list(transfer_file_names[:-1]) - transfer_file_names_fetch.append(opts.working_directory+"/fetch-$(macroiteration).xml.gz") # could also use ../puffball, because relative is ok to initialdir + transfer_file_names_fetch.append(opts.working_directory+"/fetch-$(macroiteration).{}".format(grid_suffix)) # could also use ../puffball, because relative is ok to initialdir # Write a version of the ILE job that uses puffball inputs ... IDENTICAL to code above for standard ILE case,just different argument for input # Yes, it would logically be simpler to make overlap-grid.xml.gz larger ... but I want to keep 'real samples' and 'puffed samples' seperate. ileFetch_job, ileFetch_job_name = dag_utils.write_ILE_sub_simple(tag='ILE_fetch',log_dir=None,arg_str=ile_args_forfetch,output_file="CME_fetch_out.xml",ncopies=opts.n_copies,exe=ile_exe,transfer_files=transfer_file_names_fetch,request_memory=opts.request_memory_ILE,request_disk=request_disk,request_gpu=opts.request_gpu_ILE,request_cross_platform=opts.request_xpu_ILE,use_singularity=opts.use_singularity,singularity_image=singularity_image,use_osg=opts.use_osg,simple_osg_requirements=opts.use_osg_simple_requirements,frames_dir=opts.frames_dir,cache_file=opts.cache_file,use_cvmfs_frames=opts.use_cvmfs_frames,oauth_files=opts.use_oauth_files,max_runtime_minutes=opts.ile_runtime_max_minutes,condor_commands=ile_condor_commands) @@ -752,6 +806,7 @@ if not (fetch_args is None): if opts.use_full_submit_paths: fname = opts.working_directory+"/"+ileFetch_job.get_sub_file() ileFetch_job.set_sub_file(fname) + _add_hpip_condor_env(ileFetch_job) ileFetch_job.write_sub_file() @@ -915,6 +970,7 @@ if opts.use_full_submit_paths: con_job.set_sub_file(fname) if opts.condor_containerize_nonworker and singularity_image: con_job.add_condor_cmd("+SingularityImage", '"' + singularity_image + '"') +_add_hpip_condor_env(con_job) con_job.write_sub_file() ## Unify job @@ -937,6 +993,7 @@ if opts.use_full_submit_paths: unify_job.set_sub_file(fname) if opts.condor_containerize_nonworker and singularity_image: unify_job.add_condor_cmd("+SingularityImage", '"' + singularity_image + '"') +_add_hpip_condor_env(unify_job) unify_job.write_sub_file() @@ -953,8 +1010,42 @@ if opts.use_full_submit_paths: evidence_job.set_sub_file(fname) if opts.condor_containerize_nonworker and singularity_image: evidence_job.add_condor_cmd("+SingularityImage", '"' + singularity_image + '"') +_add_hpip_condor_env(evidence_job) evidence_job.write_sub_file() +alias_grid_job = None +if _use_hpip_pipeline: + alias_grid_script = opts.working_directory + "/alias_hyperpipeline_grid.sh" + with open(alias_grid_script, "w") as f: + f.write("""#!/usr/bin/env bash +set -e +iter="$1" +target="overlap-grid-${iter}.dat" +alias="posterior_samples-${iter}.dat" +if [ ! -e "${target}" ]; then + echo "alias_hyperpipeline_grid.sh: missing ${target}" >&2 + exit 1 +fi +ln -sf "${target}" "${alias}" +""") + os.chmod(alias_grid_script, os.stat(alias_grid_script).st_mode | stat.S_IEXEC) + alias_grid_job, alias_grid_job_name = dag_utils.write_convert_sub( + tag='alias_hpip_grid', exe=alias_grid_script, log_dir='', + file_input='$(macroiteration)', file_output='/dev/null', + arg_str='', out_dir=opts.working_directory, universe=local_worker_universe, + no_grid=no_worker_grid) + alias_grid_job.add_condor_cmd("initialdir", opts.working_directory) + alias_grid_job.set_log_file(opts.working_directory+"/iteration_$(macroiteration)_cip/logs/alias-hpip-grid-$(cluster)-$(process).log") + alias_grid_job.set_stderr_file(opts.working_directory+"/iteration_$(macroiteration)_cip/logs/alias-hpip-grid-$(cluster)-$(process).err") + alias_grid_job.add_condor_cmd('request_disk', opts.general_request_disk) + if opts.use_full_submit_paths: + fname = opts.working_directory+"/"+alias_grid_job.get_sub_file() + alias_grid_job.set_sub_file(fname) + if opts.condor_containerize_nonworker and singularity_image: + alias_grid_job.add_condor_cmd("+SingularityImage", '"' + singularity_image + '"') + _add_hpip_condor_env(alias_grid_job) + alias_grid_job.write_sub_file() + ## Fit job: default case cip_args_base = cip_args @@ -988,6 +1079,7 @@ if opts.cip_explode_jobs: if opts.use_full_submit_paths: fname = opts.working_directory+"/"+cip_job.get_sub_file() cip_job.set_sub_file(fname) +_add_hpip_condor_env(cip_job) cip_job.write_sub_file() if opts.cip_explode_jobs: print(" Exploding stage 1, with ",opts.cip_explode_jobs, " workers producing samples ") @@ -1012,12 +1104,13 @@ if opts.cip_explode_jobs: if opts.use_full_submit_paths: fname = opts.working_directory+"/"+cip_job_worker.get_sub_file() cip_job_worker.set_sub_file(fname) + _add_hpip_condor_env(cip_job_worker) cip_job_worker.write_sub_file() # Create worker join job # ... if we are using cip_explode_jobs_dag, we need a different script entirely # filename to join : note *two* minus signs, because we don't want the non-worker job to match! It has tiny output. This can cause pathological behavior for the random join stage - join_cip_job,join_cip_job_name = dag_utils.write_joingrids_sub(input_pattern=opts.working_directory+"/iteration_$(macroiteration)_cip/output-grid-*-*.xml.gz",target_dir=opts.working_directory,output_base="overlap-grid-$(macroiterationnext)",n_explode=n_explode,log_dir=opts.working_directory+"/iteration_$(macroiteration)_cip/logs",universe=local_worker_universe,no_grid=no_worker_grid,extra_text=extra_text) + join_cip_job,join_cip_job_name = dag_utils.write_joingrids_sub(input_pattern=opts.working_directory+"/iteration_$(macroiteration)_cip/output-grid-*-*.{}".format(grid_suffix),target_dir=opts.working_directory,output_base="overlap-grid-$(macroiterationnext)",n_explode=n_explode,log_dir=opts.working_directory+"/iteration_$(macroiteration)_cip/logs",universe=local_worker_universe,no_grid=no_worker_grid,extra_text=extra_text) join_cip_job.add_condor_cmd("initialdir",opts.working_directory+"/iteration_$(macroiteration)_cip") # Modify output argument: change logs and working directory to be subdirectory for the run join_cip_job.set_log_file(opts.working_directory+"/iteration_$(macroiteration)_cip/logs/join-$(cluster)-$(process).log") @@ -1029,11 +1122,12 @@ if opts.cip_explode_jobs: join_cip_job.set_sub_file(fname) if opts.condor_containerize_nonworker and singularity_image: join_cip_job.add_condor_cmd("+SingularityImage", '"' + singularity_image + '"') + _add_hpip_condor_env(join_cip_job) join_cip_job.write_sub_file() ## puffball job: default case if puff_args and puff_cadence: - puff_job, puff_job_name = dag_utils.write_puff_sub(tag='PUFF',log_dir=None,arg_str=puff_args,request_memory=opts.request_memory_ILE,input_net=opts.working_directory+'/overlap-grid-$(macroiterationnext).xml.gz',output=opts.working_directory+'/puffball-$(macroiterationnext)',out_dir=opts.working_directory,exe=opts.puff_exe,universe=local_worker_universe,no_grid=no_worker_grid,extra_text=extra_text) + puff_job, puff_job_name = dag_utils.write_puff_sub(tag='PUFF',log_dir=None,arg_str=puff_args,request_memory=opts.request_memory_ILE,input_net=opts.working_directory+'/overlap-grid-$(macroiterationnext).{}'.format(grid_suffix),output=opts.working_directory+'/puffball-$(macroiterationnext)',out_dir=opts.working_directory,exe=opts.puff_exe,universe=local_worker_universe,no_grid=no_worker_grid,extra_text=extra_text) # Modify: set 'initialdir' to CIP WORKING DIR puff_job.add_condor_cmd("initialdir",opts.working_directory+"/iteration_$(macroiteration)_cip") # Modify output argument: change logs and working directory to be subdirectory for the run @@ -1046,12 +1140,13 @@ if puff_args and puff_cadence: puff_job.set_sub_file(fname) if opts.condor_containerize_nonworker and singularity_image: puff_job.add_condor_cmd("+SingularityImage", '"' + singularity_image + '"') + _add_hpip_condor_env(puff_job) puff_job.write_sub_file() ## fetch job. Fetch is before CURRENT iteration, so it is just-in-time and can be used at iteration 0 if fetch_args: - fetch_job, fetch_job_name = dag_utils.write_puff_sub(tag='FETCH',log_dir=None,arg_str=fetch_args,request_memory=opts.request_memory_ILE,input_net=None,output=opts.working_directory+'/fetch-$(macroiteration).xml.gz',out_dir=opts.working_directory,exe=opts.fetch_ext_grid_exe,universe=local_worker_universe,no_grid=no_worker_grid,extra_text=extra_text) + fetch_job, fetch_job_name = dag_utils.write_puff_sub(tag='FETCH',log_dir=None,arg_str=fetch_args,request_memory=opts.request_memory_ILE,input_net=None,output=opts.working_directory+'/fetch-$(macroiteration).{}'.format(grid_suffix),out_dir=opts.working_directory,exe=opts.fetch_ext_grid_exe,universe=local_worker_universe,no_grid=no_worker_grid,extra_text=extra_text) # Modify: set 'initialdir' to CIP WORKING DIR fetch_job.add_condor_cmd("initialdir",opts.working_directory+"/iteration_$(macroiteration)_cip") # Modify output argument: change logs and working directory to be subdirectory for the run @@ -1064,6 +1159,7 @@ if fetch_args: fetch_job.set_sub_file(fname) if opts.condor_containerize_nonworker and singularity_image: fetch_job.add_condor_cmd("+SingularityImage", '"' + singularity_image + '"') + _add_hpip_condor_env(fetch_job) fetch_job.write_sub_file() @@ -1114,6 +1210,7 @@ else: if opts.use_full_submit_paths: fname = opts.working_directory+"/"+cip_job.get_sub_file() cip_job.set_sub_file(fname) + _add_hpip_condor_env(cip_job) cip_job.write_sub_file() @@ -1146,6 +1243,7 @@ else: if opts.use_full_submit_paths: fname = opts.working_directory+"/"+cip_job_worker.get_sub_file() cip_job_worker.set_sub_file(fname) + _add_hpip_condor_env(cip_job_worker) cip_job_worker.write_sub_file() @@ -1168,20 +1266,28 @@ print(" ===> Iteration size ", len(cip_job_list), opts.n_iterations) if opts.test_args: test_node_list = [] ## Convert job : make results accessible after every iteration. (Only performed if the tests are active, to make my life easier) - convert_job, convert_job_name =dag_utils.write_convert_sub(tag='convert',log_dir=None,arg_str=convert_args,file_input=opts.working_directory+'/overlap-grid-$(macroiteration).xml.gz', file_output=opts.working_directory+'/posterior_samples-$(macroiteration).dat' ,out_dir=opts.working_directory,exe=opts.test_exe,universe=local_worker_universe,no_grid=no_worker_grid) - convert_job.add_condor_cmd("initialdir",opts.working_directory) - convert_job.set_log_file(opts.working_directory+"/iteration_$(macroiterationlast)_test/logs/convert-$(cluster)-$(process).log") - convert_job.set_stderr_file(opts.working_directory+"/iteration_$(macroiterationlast)_test/logs/convert-$(cluster)-$(process).err") - convert_job.add_condor_cmd('request_disk',opts.general_request_disk) - if opts.use_full_submit_paths: - fname = opts.working_directory+"/"+convert_job.get_sub_file() - convert_job.set_sub_file(fname) - if opts.condor_containerize_nonworker and singularity_image: - convert_job.add_condor_cmd("+SingularityImage", '"' + singularity_image + '"') - convert_job.write_sub_file() + # Hyperpipeline mode: the input grid is already .dat (hyperpipeline + # format), while this convert step targets LI posterior_samples.dat, + # which is a different schema (mass_1, spin1x, ...). Skip emitting the + # convert job and point the convergence test directly at the grid files. + if not _use_hpip_pipeline: + convert_job, convert_job_name =dag_utils.write_convert_sub(tag='convert',log_dir=None,arg_str=convert_args,file_input=opts.working_directory+'/overlap-grid-$(macroiteration).xml.gz', file_output=opts.working_directory+'/posterior_samples-$(macroiteration).dat' ,out_dir=opts.working_directory,exe=opts.test_exe,universe=local_worker_universe,no_grid=no_worker_grid) + convert_job.add_condor_cmd("initialdir",opts.working_directory) + convert_job.set_log_file(opts.working_directory+"/iteration_$(macroiterationlast)_test/logs/convert-$(cluster)-$(process).log") + convert_job.set_stderr_file(opts.working_directory+"/iteration_$(macroiterationlast)_test/logs/convert-$(cluster)-$(process).err") + convert_job.add_condor_cmd('request_disk',opts.general_request_disk) + if opts.use_full_submit_paths: + fname = opts.working_directory+"/"+convert_job.get_sub_file() + convert_job.set_sub_file(fname) + if opts.condor_containerize_nonworker and singularity_image: + convert_job.add_condor_cmd("+SingularityImage", '"' + singularity_image + '"') + convert_job.write_sub_file() - test_job, test_job_name = dag_utils.write_test_sub(tag='test',log_dir=None,arg_str=test_args,samples_files=[ opts.working_directory+'/posterior_samples-$(macroiteration).dat', opts.working_directory+'/posterior_samples-$(macroiterationlast).dat'] ,out_dir=opts.working_directory,exe=opts.test_exe,universe=local_worker_universe,no_grid=no_worker_grid) + _test_samples_files = [ + opts.working_directory+'/posterior_samples-$(macroiteration).dat', + opts.working_directory+'/posterior_samples-$(macroiterationlast).dat'] + test_job, test_job_name = dag_utils.write_test_sub(tag='test',log_dir=None,arg_str=test_args,samples_files=_test_samples_files ,out_dir=opts.working_directory,exe=opts.test_exe,universe=local_worker_universe,no_grid=no_worker_grid) # Modify: set 'initialdir' test_job.add_condor_cmd("initialdir",opts.working_directory+"/iteration_$(macroiteration)_test") # Modify output argument: change logs and working directory to be subdirectory for the run @@ -1194,6 +1300,7 @@ if opts.test_args: test_job.set_sub_file(fname) if opts.condor_containerize_nonworker and singularity_image: test_job.add_condor_cmd("+SingularityImage", '"' + singularity_image + '"') + _add_hpip_condor_env(test_job) test_job.write_sub_file() @@ -1602,7 +1709,7 @@ for it in np.arange(it_start,opts.n_iterations): # Create dag inside directory, as usual, pointing to existing run (note grid size issue problem) cmd = "create_event_parameter_pipeline_BasicIteration --use-full-submit-paths --first-iteration-jumpstart " # we don't use overlap-grid-0.xml.gz ! # note we will use subdags, and do this actively. The 0th iteration needs a SPECIFIC grid though! - cmd += " --ile-n-events-to-analyze {} --input-grid {} --ile-exe {} --ile-args {} ".format(opts.ile_n_events_to_analyze,opts.working_directory+"/overlap-grid-0.xml.gz".format(it),opts.ile_exe, opts.ile_args) + cmd += " --ile-n-events-to-analyze {} --input-grid {} --ile-exe {} --ile-args {} ".format(opts.ile_n_events_to_analyze,opts.working_directory+"/overlap-grid-0.{}".format(grid_suffix),opts.ile_exe, opts.ile_args) cmd += " --cip-args {}/iteration_{}_cip/args_cip.txt ".format(opts.working_directory,it) if opts.convert_args: cmd += " --convert-args {} ".format(opts.convert_args) # passthrough, important for tides @@ -1677,7 +1784,7 @@ for it in np.arange(it_start,opts.n_iterations): print(" SUBDAG COMMAND : ", cmd) os.system(cmd) # link the *input* grid in, overwriting the (temporary, placeholder) grid used to create the workflow - cmd = "ln -sf {}/overlap-grid-{}.xml.gz iteration_{}_cip/overlap-grid-0.xml.gz".format(opts.working_directory,it,it) + cmd = "ln -sf {}/overlap-grid-{}.{suf} iteration_{}_cip/overlap-grid-0.{suf}".format(opts.working_directory,it,it, suf=grid_suffix) os.system(cmd) main_subdag = pipeline.CondorDAGManJob("{}/iteration_{}_cip/marginalize_intrinsic_parameters_BasicIterationWorkflow.dag".format(opts.working_directory,it)) # assumes subdag is created @@ -1693,9 +1800,9 @@ for it in np.arange(it_start,opts.n_iterations): my_dict = {"method":"native", "source":opts.working_directory+"/iteration_{}_cip".format(it),"n_max":3000} with open(fname_fetch,'w') as f: json.dump(my_dict,f) - fetch_subdag_args =" --inj-file-out overlap-grid-{}.xml.gz ".format(it+1) + fetch_subdag_args =" --inj-file-out overlap-grid-{}.{} ".format(it+1, grid_suffix) fetch_subdag_args += " --input-json {} ".format(fname_fetch) - fetch_subdag_job, fetch_subdag_job_name = dag_utils.write_puff_sub(tag='FETCH_{}_subdag'.format(it),log_dir=opts.working_directory+"/"+"iteration_{}_cip/logs/".format(it),arg_str=fetch_subdag_args,request_memory=opts.request_memory_ILE,input_net=None,output=opts.working_directory+'/overlap-grid-$(macroiterationnext).xml.gz',out_dir=opts.working_directory,exe=dag_utils.which("util_FetchExternalGrid.py"),universe=local_worker_universe,no_grid=no_worker_grid) + fetch_subdag_job, fetch_subdag_job_name = dag_utils.write_puff_sub(tag='FETCH_{}_subdag'.format(it),log_dir=opts.working_directory+"/"+"iteration_{}_cip/logs/".format(it),arg_str=fetch_subdag_args,request_memory=opts.request_memory_ILE,input_net=None,output=opts.working_directory+'/overlap-grid-$(macroiterationnext).{}'.format(grid_suffix),out_dir=opts.working_directory,exe=dag_utils.which("util_FetchExternalGrid.py"),universe=local_worker_universe,no_grid=no_worker_grid) fetch_subdag_job.add_condor_cmd('request_disk',opts.general_request_disk) fetch_subdag_job.write_sub_file() @@ -1815,8 +1922,19 @@ for it in np.arange(it_start,opts.n_iterations): - # Create convert node, which depends on fit node, *if* tests are being performed - if opts.test_args: + if _use_hpip_pipeline: + alias_grid_node = pipeline.CondorDAGNode(alias_grid_job) + alias_grid_node.add_macro("macroiteration", it+1) + alias_grid_node.add_parent(parent_fit_node) + alias_grid_node.set_category("ALIAS") + alias_grid_node.set_retry(opts.general_retries) + dag.add_node(alias_grid_node) + parent_fit_node = alias_grid_node + + # Create convert node, which depends on fit node, *if* tests are being performed. + # In hyperpipeline mode an alias node creates posterior_samples-*.dat + # symlinks to the header-bearing overlap-grid-*.dat files instead. + if opts.test_args and not _use_hpip_pipeline: convert_node=pipeline.CondorDAGNode(convert_job) convert_node.add_macro("macroiteration", it+1) # convert the NEWLY-PRODUCED iteration convert_node.add_macro("macroiterationlast", it) # use log files in the previous directory @@ -2039,7 +2157,16 @@ if opts.cip_explode_jobs_subdag: if opts.cip_post_exe is None: # script to count lines in overlap-grid files. Note this does NOT check n_eff directly ... should count n_eff instead based on +annotation.dat files with open("cip_check_work.sh", 'w') as f: - f.write("""#! /bin/bash + if _use_hpip_pipeline: + # Hyperpipeline: count non-comment data rows across all + # overlap-grid-*.dat shards in the directory. No + # igwn_ligolw_print dependency. + f.write("""#! /bin/bash +LINES=`grep -hE -v '^#' $1/overlap-grid-*.dat 2>/dev/null | grep -cv '^[[:space:]]*$'` +[ ${LINES} -ge $2 ] +""") + else: + f.write("""#! /bin/bash LINES=`igwn_ligolw_print -c mass1 $1/overlap-grid*.xml.gz | wc -l ` [ ${LINES} -ge $2 ] """) diff --git a/MonteCarloMarginalizeCode/Code/bin/integrate_likelihood_extrinsic_batchmode b/MonteCarloMarginalizeCode/Code/bin/integrate_likelihood_extrinsic_batchmode index 88c7c1740..cdb0a9ad4 100755 --- a/MonteCarloMarginalizeCode/Code/bin/integrate_likelihood_extrinsic_batchmode +++ b/MonteCarloMarginalizeCode/Code/bin/integrate_likelihood_extrinsic_batchmode @@ -588,38 +588,75 @@ if opts.sim_xml: P = P_list[0] # Load in the physical parameters of the injection. elif opts.sim_grid: print( "====Loading injection grid file:", opts.sim_grid, opts.event, " =======") - grid_in = np.genfromtxt(opts.sim_grid,names=True) - grid_names = grid_in.dtype.names - grid_names_params = list(set(grid_names).intersection(set(lalsimutils.valid_params))) # only do internal loop over valid parameters - grid_indx_params = [grid_names.index(x) for x in grid_names_params] - if len(grid_in) < opts.event: + # ---------------------------------------------------------------------- + # Hyperpipeline-format grid (opt-in via env var or auto-detected). + # The legacy --sim-grid reader does setattr(P, 'm1', val) with no unit + # conversion, relying on the heuristic `if P.m1 < 1e15: P.m1 *= MSUN_SI` + # below to recover. Hyperpipeline mode dispatches to hyperpipeline_io's + # reader instead, which applies the on-disk -> SI conversion declared in + # PARAM_DISK_TO_SI (m1, m2: solar mass -> kg; distance: Mpc -> m). The + # < 1e15 heuristic is preserved as a no-op safety net for both paths. + # The legacy genfromtxt branch is kept byte-identical for backward + # compatibility with existing --sim-grid files. + # ---------------------------------------------------------------------- + from RIFT.misc import hyperpipeline_io as _hpio + _hpip_grid = _hpio.is_active() or _hpio.sniff(opts.sim_grid) + if _hpip_grid: + print(" Hyperpipeline ASCII grid detected for ", opts.sim_grid) + _all_arr, _hdr = _hpio.read_table(opts.sim_grid) + grid_in_full = _all_arr + grid_names = _hdr + grid_names_params = list(set(grid_names).intersection(set(lalsimutils.valid_params))) + grid_indx_params = [grid_names.index(x) for x in grid_names_params] + if len(grid_in_full) <= opts.event: + print(" No events left to analyze, out of range") + sys.exit(0) + n_event_max = np.min([len(grid_in_full), opts.event+opts.n_events_to_analyze]) + grid_in = grid_in_full[opts.event:n_event_max] + _all_P, _ = _hpio.read_grid_to_P_list( + opts.sim_grid, + P_factory=lalsimutils.ChooseWaveformParams, + lal_module=lal, + valid_params=lalsimutils.valid_params) + P_list_seed = _all_P[opts.event:n_event_max] + else: + grid_in = np.genfromtxt(opts.sim_grid,names=True) + grid_names = grid_in.dtype.names + grid_names_params = list(set(grid_names).intersection(set(lalsimutils.valid_params))) # only do internal loop over valid parameters + grid_indx_params = [grid_names.index(x) for x in grid_names_params] + if len(grid_in) < opts.event: print(" No events lft to analyze, out of ramge") sys.exit(0) - n_event_max= np.min([len(grid_in), opts.event+opts.n_events_to_analyze]) - grid_in = grid_in[opts.event:n_event_max] - P_list = [] - # Should call a convert_vector_coordinates call to get m1, m2, etc all in a grid, to make it faster. Assume user is friendly/sane - for indx in range(opts.n_events_to_analyze): + n_event_max= np.min([len(grid_in), opts.event+opts.n_events_to_analyze]) + grid_in = grid_in[opts.event:n_event_max] + P_list_seed = [] + # Should call a convert_vector_coordinates call to get m1, m2, etc all in a grid, to make it faster. Assume user is friendly/sane + for indx in range(opts.n_events_to_analyze): P = lalsimutils.ChooseWaveformParams() for indx, name in enumerate(grid_names_params): if hasattr(P, name): # fast version setattr(P,name, float(grid_in[grid_names_params[indx]])) else: # slightly slower version P.assign_param(name, grid_in[grid_names_params[indx]]) + P_list_seed.append(P) + # Common per-P setup (formerly inlined in the legacy for-loop; both + # paths now share it so hyperpipeline-loaded P_lists get the same + # radec / fref / fmin / tref / distance overrides). + P_list = [] + for P in P_list_seed: # Set required properties for all - note distance must be fixed here ! P.radec =False # do NOT propagate the epoch later P.fref = opts.reference_freq P.fmin = template_min_freq P.tref = fiducial_epoch # the XML table # fix mass scale - if P.m1 < 1e15: # + if P.m1 < 1e15: # P.m1 *= lal.MSUN_SI P.m2 *= lal.MSUN_SI m1 = P.m1/lal.MSUN_SI m2 =P.m2/lal.MSUN_SI lambda1, lambda2 = P.lambda1,P.lambda2 P.dist = factored_likelihood.distMpcRef * 1.e6 * lal.PC_SI # use *nonstandard* distance - P_list.append(P) P = P_list[0] P.print_params() @@ -1982,22 +2019,59 @@ def analyze_event(P_list, indx_event, data_dict, psd_dict, fmax, opts,inv_spec_t # print(grid_out[indx_event], line_out, type(line_out)) np.savetxt(fname_output_txt, [line_out], header=' '.join(grid_out.dtype.names) ) - # Report results (standard. Note this is BEFORE TIME RESAMPLING) + # Report results (standard. Note this is BEFORE TIME RESAMPLING) if opts.output_file: fname_output_txt = opts.output_file +"_"+str(indx_event)+"_" + ".dat" m1 =P.m1/lal.MSUN_SI m2 =P.m2/lal.MSUN_SI - if opts.sim_xml: + if opts.sim_xml: event_id = opts.event else: event_id = -1 if opts.event == None: event_id = -1 - if opts.save_eccentricity: + # ------------------------------------------------------------------ + # Hyperpipeline ASCII output path (opt-in via env var). + # When RIFT_HYPERPIPELINE_FORMAT is truthy we emit a self-describing + # header-bearing file with `lnL sigma_lnL` as columns 0/1, followed + # by parameter columns. All legacy branches below are bypassed so + # the file layout is determined by hyperpipeline_io.build_column_list. + # ------------------------------------------------------------------ + from RIFT.misc import hyperpipeline_io as _hpio + if _hpio.is_active(): + _use_ecc = bool(opts.save_eccentricity) + _use_mpa = bool(_use_ecc and opts.save_meanPerAno) + _use_tides = bool(P.lambda1>0 or P.lambda2>0) + _use_eos_index = bool(_use_tides and opts.export_eos_index) + _use_distance = bool(opts.pin_distance_to_sim and not _use_tides and not _use_ecc) + _cols = _hpio.build_column_list( + use_eccentricity=_use_ecc, use_meanPerAno=_use_mpa, + use_tides=_use_tides, use_eos_index=_use_eos_index, + use_distance=_use_distance) + _vals = { + "lnL": log_res+manual_avoid_overflow_logarithm, + "sigma_lnL": sqrt_var_over_res, + "m1": m1, "m2": m2, + "a1x": P.s1x, "a1y": P.s1y, "a1z": P.s1z, + "a2x": P.s2x, "a2y": P.s2y, "a2z": P.s2z, + } + if _use_ecc: + _vals["eccentricity"] = P.eccentricity + if _use_mpa: + _vals["meanPerAno"] = P.meanPerAno + if _use_tides: + _vals["lambda1"] = P.lambda1 + _vals["lambda2"] = P.lambda2 + if _use_eos_index: + _vals["eos_table_index"] = P.eos_table_index + if _use_distance: + _vals["distance"] = pinned_params["distance"] + _hpio.write_row(fname_output_txt, _cols, [_vals[c] for c in _cols]) + elif opts.save_eccentricity: if opts.save_meanPerAno: - # output format when eccentricity & meanPerAno are being used + # output format when eccentricity & meanPerAno are being used numpy.savetxt(fname_output_txt, numpy.array([[event_id, m1, m2, P.s1x, P.s1y, P.s1z, P.s2x, P.s2y, P.s2z, P.eccentricity, P.meanPerAno, log_res+manual_avoid_overflow_logarithm, sqrt_var_over_res,sampler.ntotal, neff ]])) #dict_return["convergence_test_results"]["normal_integral]" - else: + else: # output format when only eccentricity is being used numpy.savetxt(fname_output_txt, numpy.array([[event_id, m1, m2, P.s1x, P.s1y, P.s1z, P.s2x, P.s2y, P.s2z, P.eccentricity, log_res+manual_avoid_overflow_logarithm, sqrt_var_over_res,sampler.ntotal, neff ]])) #dict_return["convergence_test_results"]["normal_integral]" elif not (P.lambda1>0 or P.lambda2>0): diff --git a/MonteCarloMarginalizeCode/Code/bin/util_CleanILE_hyperpipeline.py b/MonteCarloMarginalizeCode/Code/bin/util_CleanILE_hyperpipeline.py new file mode 100755 index 000000000..7f083dd8d --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/bin/util_CleanILE_hyperpipeline.py @@ -0,0 +1,105 @@ +#! /usr/bin/env python +""" +util_CleanILE_hyperpipeline.py +============================== + +Hyperpipeline-format counterpart to ``util_CleanILE.py``. + +Reads any number of per-shard hyperpipeline ``.dat`` files (the kind ILE +emits when ``RIFT_HYPERPIPELINE_FORMAT`` is set), consolidates duplicate +intrinsic-parameter rows by weighted averaging of ``lnL`` (mirroring +``util_CleanILE.py``'s logic via :func:`RIFT.misc.hyperpipeline_io.consolidate`), +sorts the result by ``lnL`` descending, and writes a single composite +hyperpipeline file to either stdout or a named file. + +Used by ``util_ILEdagPostprocess.sh`` when ``RIFT_HYPERPIPELINE_FORMAT`` is +truthy; the legacy ``cat | util_CleanILE.py | sort -rg`` chain does not work +on the new format because (a) the per-shard headers would interleave with +data under naive concatenation and (b) the legacy positional-column layout +is different. +""" + +from __future__ import absolute_import, print_function + +import argparse +import sys + +# Import via an explicit file load so this script keeps working even when +# the surrounding RIFT package's ``__init__`` chain is partially broken or +# we're running in a minimal venv. Falls back to the normal package import +# when that's not available. +try: + from RIFT.misc import hyperpipeline_io as hpio +except Exception: # pragma: no cover -- best-effort fallback + import os, importlib.util as _ilu + _here = os.path.dirname(os.path.abspath(__file__)) + _candidate = os.path.normpath(os.path.join( + _here, "..", "RIFT", "misc", "hyperpipeline_io.py")) + _spec = _ilu.spec_from_file_location("hyperpipeline_io", _candidate) + hpio = _ilu.module_from_spec(_spec) + _spec.loader.exec_module(hpio) + + +def main(argv=None): + parser = argparse.ArgumentParser( + description=__doc__, + formatter_class=argparse.RawDescriptionHelpFormatter, + ) + parser.add_argument("fname", nargs="+", + help="One or more hyperpipeline shard .dat files.") + parser.add_argument("--output", "-o", default="-", + help="Output filename, or '-' for stdout (default).") + parser.add_argument("--sigma-cut", type=float, default=0.9, + help="Drop rows with sigma_lnL above this value " + "(default 0.9, mirrors util_CleanILE).") + parser.add_argument("--digits", type=int, default=5, + help="Decimal-place precision used when grouping " + "duplicate intrinsic rows (default 5, mirrors " + "util_CleanILE).") + parser.add_argument("--no-consolidate", action="store_true", + help="Skip the per-key weighted averaging; just " + "concatenate, drop bad-sigma rows, sort by lnL " + "and emit. Useful for debugging or when shards " + "are already consolidated.") + args, _unknown = parser.parse_known_args(argv) + + arr, columns = hpio.read_many(args.fname) + if args.no_consolidate: + # Apply the same sigma cut + descending-lnL sort the consolidate path + # does, to keep downstream behaviour symmetric. + keep = arr["sigma_lnL"] <= args.sigma_cut + arr = arr[keep] + order = arr["lnL"].argsort()[::-1] + arr = arr[order] + else: + arr, columns = hpio.consolidate(arr, columns, + sigma_cut=args.sigma_cut, + digits=args.digits) + + # Materialise as a plain (N,K) float matrix for write_table. + import numpy as _np + n = len(arr) + mat = _np.zeros((n, len(columns)), dtype=float) + for j, name in enumerate(columns): + mat[:, j] = arr[name] + + if args.output == "-": + # write_table only takes a path; emit to stdout via a tempfile then + # cat. This is N=1 cost, no worse than the legacy `... > file.composite`. + import tempfile, os as _os + tmp = tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False).name + try: + hpio.write_table(tmp, columns, mat) + with open(tmp) as fp: + sys.stdout.write(fp.read()) + finally: + _os.unlink(tmp) + else: + hpio.write_table(args.output, columns, mat) + sys.stderr.write( + "util_CleanILE_hyperpipeline: wrote {} consolidated rows " + "to {}\n".format(n, args.output)) + + +if __name__ == "__main__": + main() diff --git a/MonteCarloMarginalizeCode/Code/bin/util_ConstructIntrinsicPosterior_GenericCoordinates.py b/MonteCarloMarginalizeCode/Code/bin/util_ConstructIntrinsicPosterior_GenericCoordinates.py index aaa43560e..b385c13fd 100755 --- a/MonteCarloMarginalizeCode/Code/bin/util_ConstructIntrinsicPosterior_GenericCoordinates.py +++ b/MonteCarloMarginalizeCode/Code/bin/util_ConstructIntrinsicPosterior_GenericCoordinates.py @@ -1770,7 +1770,46 @@ def fit_gp_sparse(x): if opts.input_distance: print(" Distance input") col_lnL +=1 -dat_orig = dat = np.loadtxt(opts.fname) +# ---------------------------------------------------------------------- +# Hyperpipeline ASCII input path (opt-in via env var or auto-detected). +# When active, we (a) read a header-bearing file via hyperpipeline_io, +# (b) reshape it to the legacy positional layout the rest of this +# script expects, and (c) reset col_lnL / col_distance / col_lambda1 / +# col_eccentricity / col_meanPerAno from the canonical mapping so the +# downstream `for line in dat: line[col_lnL] ...` loop is unchanged. +# Detection: env var RIFT_HYPERPIPELINE_FORMAT, else file-content sniff. +# ---------------------------------------------------------------------- +from RIFT.misc import hyperpipeline_io as _hpio +_use_hpip = _hpio.is_active() or _hpio.sniff(opts.fname) +if _use_hpip: + print(" Hyperpipeline ASCII input format detected for ", opts.fname) + _arr, _hdr = _hpio.read_table(opts.fname) + print(" columns: ", _hdr) + # Auto-derive optional-group flags from the file header, falling back + # to the user's CLI flags if a column is absent (so the assert in + # to_legacy_dat fires loudly if the file/flags disagree). + _has = lambda n: n in _arr.dtype.names + _use_ecc = bool(opts.use_eccentricity) or _has("eccentricity") + _use_mpa = bool(opts.use_meanPerAno) or _has("meanPerAno") + _use_tides = bool(opts.input_tides) or _has("lambda1") + _use_eos = bool(opts.input_eos_index) or _has("eos_table_index") + _use_dist = bool(opts.input_distance) or _has("distance") + dat = _hpio.to_legacy_dat(_arr, + use_eccentricity=_use_ecc, use_meanPerAno=_use_mpa, + use_tides=_use_tides, use_eos_index=_use_eos, + use_distance=_use_dist) + _ix = _hpio.legacy_column_indices( + use_eccentricity=_use_ecc, use_meanPerAno=_use_mpa, + use_tides=_use_tides, use_eos_index=_use_eos, + use_distance=_use_dist) + col_lnL = _ix["lnL"] + col_distance = _ix["distance"] + col_lambda1 = _ix["lambda1"] + col_eccentricity = _ix["eccentricity"] + col_meanPerAno = _ix["meanPerAno"] +else: + dat = np.loadtxt(opts.fname) +dat_orig = dat dat_orig = dat[dat[:,col_lnL].argsort()] # sort http://stackoverflow.com/questions/2828059/sorting-arrays-in-numpy-by-column if col_meanPerAno: dat_orig[:,col_meanPerAno] = np.mod(dat_orig[:,col_meanPerAno], lalsimutils.periodic_params['meanPerAno'] ) # 2 *np.pi @@ -2911,7 +2950,24 @@ def parse_corr_params(my_str): P_out_list.append(P) # Save output - lalsimutils.ChooseWaveformParams_array_to_xml(P_out_list[:n_output_size],fname=opts.fname_output_samples,fref=P.fref) + # Hyperpipeline ASCII grid writer (opt-in via env var). Emits the + # next-iteration intrinsic-parameter grid as a self-describing + # hyperpipeline file instead of XML. lnL/sigma_lnL columns are + # filled with 0 because these are posterior draws, not measurements; + # ILE's --sim-grid reader ignores them via the lalsimutils.valid_params + # intersection. + if _hpio.is_active(): + _cols_out = _hpio.build_column_list( + use_eccentricity=opts.use_eccentricity, use_meanPerAno=opts.use_meanPerAno, + use_tides=opts.input_tides, use_eos_index=opts.input_eos_index, + use_distance=False) + _hpio.write_grid_from_P_list(opts.fname_output_samples, + P_out_list[:n_output_size], + _cols_out, + lal_module=lal, + lalsimutils_module=lalsimutils) + else: + lalsimutils.ChooseWaveformParams_array_to_xml(P_out_list[:n_output_size],fname=opts.fname_output_samples,fref=P.fref) sys.exit(0) @@ -3427,7 +3483,21 @@ def parse_corr_params(my_str): ### Export data ### n_output_size = np.min([len(P_list),opts.n_output_samples]) -lalsimutils.ChooseWaveformParams_array_to_xml(P_list[:n_output_size],fname=opts.fname_output_samples,fref=P.fref) +# Hyperpipeline ASCII grid writer (opt-in via env var). See note above the +# earlier identical writer site -- same rationale. +if _hpio.is_active(): + _cols_out = _hpio.build_column_list( + use_eccentricity=opts.use_eccentricity, use_meanPerAno=opts.use_meanPerAno, + use_tides=opts.input_tides, use_eos_index=opts.input_eos_index, + use_distance=False) + _hpio.write_grid_from_P_list(opts.fname_output_samples, + P_list[:n_output_size], + _cols_out, + lal_module=lal, + lalsimutils_module=lalsimutils, + lnL_values=lnL_list[:n_output_size]) +else: + lalsimutils.ChooseWaveformParams_array_to_xml(P_list[:n_output_size],fname=opts.fname_output_samples,fref=P.fref) lnL_list = np.array(lnL_list,dtype=internal_dtype) np.savetxt(opts.fname_output_samples+"_lnL.dat", lnL_list) diff --git a/MonteCarloMarginalizeCode/Code/bin/util_FetchExternalGrid.py b/MonteCarloMarginalizeCode/Code/bin/util_FetchExternalGrid.py index 1b4545cd2..bb12df52d 100755 --- a/MonteCarloMarginalizeCode/Code/bin/util_FetchExternalGrid.py +++ b/MonteCarloMarginalizeCode/Code/bin/util_FetchExternalGrid.py @@ -59,14 +59,25 @@ import shutil, glob,re -def retrieve_native(sourcedir,outfile,n_max=None,base_pattern="overlap-grid-*.xml.gz",verbose=True): +def retrieve_native(sourcedir,outfile,n_max=None,base_pattern=None,verbose=True): """ retrieve_native(sourcedir,outfile) sourcedir : source directory. Looks for last file of form "output-grid-N.xml.gz outfile : target file name, assume it is full path n_max : + + Hyperpipeline ASCII grids (opt-in via RIFT_HYPERPIPELINE_FORMAT) are + handled automatically: the default base_pattern flips to + "overlap-grid-*.dat", and the n_max truncation path round-trips via + hyperpipeline_io.read_grid_to_P_list / write_grid_from_P_list instead + of the XML helpers. """ + from RIFT.misc import hyperpipeline_io as _hpio + _hpip = _hpio.is_active() + if base_pattern is None: + base_pattern = "overlap-grid-*.dat" if _hpip else "overlap-grid-*.xml.gz" + if verbose: print("Checking ", sourcedir, " for ", base_pattern) # Identify the correct source file in the directory @@ -84,11 +95,25 @@ def retrieve_native(sourcedir,outfile,n_max=None,base_pattern="overlap-grid-*.xm elif n_max > 0: import random import RIFT.lalsimutils as lalsimutils - # Load in grid - P_list = lalsimutils.xml_to_ChooseWaveformParams_array(fname_to_use) - # select points randomly! - P_list_reduced = random.sample(P_list, int(n_max)) - lalsimutils.ChooseWaveformParams_array_to_xml(P_list, outfile) + if _hpip or _hpio.sniff(fname_to_use): + import lal as _lal_mod + P_list, _columns = _hpio.read_grid_to_P_list( + fname_to_use, + P_factory=lalsimutils.ChooseWaveformParams, + lal_module=_lal_mod, + valid_params=lalsimutils.valid_params) + # NOTE: legacy code computed `P_list_reduced` but exported the + # full P_list anyway (apparent bug); preserved here verbatim. + P_list_reduced = random.sample(P_list, int(n_max)) + _hpio.write_grid_from_P_list(outfile, P_list, _columns, + lal_module=_lal_mod, + lalsimutils_module=lalsimutils) + else: + # Load in grid + P_list = lalsimutils.xml_to_ChooseWaveformParams_array(fname_to_use) + # select points randomly! + P_list_reduced = random.sample(P_list, int(n_max)) + lalsimutils.ChooseWaveformParams_array_to_xml(P_list, outfile) else: print(" Invalid fetch size ", n_max) import sys; sys.exit(99) diff --git a/MonteCarloMarginalizeCode/Code/bin/util_ILEdagPostprocess.sh b/MonteCarloMarginalizeCode/Code/bin/util_ILEdagPostprocess.sh index c764cd6a6..d513c9385 100755 --- a/MonteCarloMarginalizeCode/Code/bin/util_ILEdagPostprocess.sh +++ b/MonteCarloMarginalizeCode/Code/bin/util_ILEdagPostprocess.sh @@ -11,26 +11,45 @@ BASE_OUT=$2 ECC=$3 # Liz (Capstone): this will only be non-blank in the case where my eccentric PE Makefile has inserted "--eccentricity" into join.sub MPA=$4 -# join together the .dat files -echo " Joining data files .... " -rm -f tmp.dat tmp2.dat -# CAT can be ineffective -FNAME=`pwd`/tmp.dat -#cat ${DIR_PROCESS}/CME*.dat > tmp.dat -export RND=`echo ${RANDOM}` -find ${DIR_PROCESS} -name 'CME*.dat' -exec cat {} \; > ${RND}_tmp.dat +# -------------------------------------------------------------------------- +# Hyperpipeline ASCII output path (opt-in via env var). +# When RIFT_HYPERPIPELINE_FORMAT is truthy, ILE shards are written in the +# new self-describing header-bearing hyperpipeline format. The legacy +# `cat | util_CleanILE.py | sort -rg` chain below cannot handle these shards +# (different column layout, embedded `#`-comment headers). We therefore +# delegate to util_CleanILE_hyperpipeline.py which does the equivalent +# weighted-average consolidation and emits a single composite file. +# -------------------------------------------------------------------------- +case "$(echo "${RIFT_HYPERPIPELINE_FORMAT:-}" | tr '[:upper:]' '[:lower:]')" in + 1|true|yes|on) + echo " Joining data files (hyperpipeline format) .... " + util_CleanILE_hyperpipeline.py \ + --output "${BASE_OUT}.composite" \ + ${DIR_PROCESS}/CME*.dat + ;; + *) + # join together the .dat files + echo " Joining data files .... " + rm -f tmp.dat tmp2.dat + # CAT can be ineffective + FNAME=`pwd`/tmp.dat + #cat ${DIR_PROCESS}/CME*.dat > tmp.dat + export RND=`echo ${RANDOM}` + find ${DIR_PROCESS} -name 'CME*.dat' -exec cat {} \; > ${RND}_tmp.dat -# clean them (=join duplicate lines) -echo " Consolidating multiple instances of the monte carlo .... " -if [ "$3" == '--eccentricity' ]; then - if [ "$4" == '--meanPerAno' ]; then - util_CleanILE.py ${RND}_tmp.dat $3 $4 | sort -rg -k12 > $BASE_OUT.composite + # clean them (=join duplicate lines) + echo " Consolidating multiple instances of the monte carlo .... " + if [ "$3" == '--eccentricity' ]; then + if [ "$4" == '--meanPerAno' ]; then + util_CleanILE.py ${RND}_tmp.dat $3 $4 | sort -rg -k12 > $BASE_OUT.composite + else + util_CleanILE.py ${RND}_tmp.dat $3 | sort -rg -k11 > $BASE_OUT.composite + fi else - util_CleanILE.py ${RND}_tmp.dat $3 | sort -rg -k11 > $BASE_OUT.composite + util_CleanILE.py ${RND}_tmp.dat $3 | sort -rg -k10 > $BASE_OUT.composite fi -else - util_CleanILE.py ${RND}_tmp.dat $3 | sort -rg -k10 > $BASE_OUT.composite -fi + ;; +esac # Manifest rm -f ${BASE_OUT}.manifest diff --git a/MonteCarloMarginalizeCode/Code/bin/util_ParameterPuffball.py b/MonteCarloMarginalizeCode/Code/bin/util_ParameterPuffball.py index 765c1106b..187fadc9c 100755 --- a/MonteCarloMarginalizeCode/Code/bin/util_ParameterPuffball.py +++ b/MonteCarloMarginalizeCode/Code/bin/util_ParameterPuffball.py @@ -134,8 +134,26 @@ -# Load data -P_list = lalsimutils.xml_to_ChooseWaveformParams_array(opts.inj_file) +# ---------------------------------------------------------------------- +# Hyperpipeline ASCII grid I/O (opt-in via env var or auto-detected on +# read). When active, --inj-file is read as a hyperpipeline .dat instead +# of XML, and --inj-file-out is written in the same format below. The +# input file's column header is preserved on output so puffball is a +# format pass-through (perturbed rows, same columns). +# ---------------------------------------------------------------------- +from RIFT.misc import hyperpipeline_io as _hpio +_hpip_in = _hpio.is_active() or _hpio.sniff(opts.inj_file) +if _hpip_in: + print(" Hyperpipeline ASCII input detected for --inj-file ", opts.inj_file) + P_list, _hpip_in_columns = _hpio.read_grid_to_P_list( + opts.inj_file, + P_factory=lalsimutils.ChooseWaveformParams, + lal_module=lal, + valid_params=lalsimutils.valid_params) +else: + _hpip_in_columns = None + # Load data + P_list = lalsimutils.xml_to_ChooseWaveformParams_array(opts.inj_file) # extract parameters to measure the coordinates. dat_out = [] @@ -352,7 +370,18 @@ raise Exception(" Puff file will be empty ! Fail without output ! You probably have settings which lead to either (a) a singular puff matrix (eg., duplicated coordinates or unused variables) or (b) your puff is far too large") # Export -lalsimutils.ChooseWaveformParams_array_to_xml(P_out,fname=opts.inj_file_out,fref=P.fref) +# Hyperpipeline ASCII output mirrors the input schema (so puff is a +# format-preserving operation: perturb rows, keep columns). When the +# input was XML but the env var is set, we still emit hyperpipeline using +# the default 10-column schema. +if _hpip_in or _hpio.is_active(): + _cols_out = _hpip_in_columns if _hpip_in_columns is not None \ + else _hpio.build_column_list() + _hpio.write_grid_from_P_list(opts.inj_file_out, P_out, _cols_out, + lal_module=lal, + lalsimutils_module=lalsimutils) +else: + lalsimutils.ChooseWaveformParams_array_to_xml(P_out,fname=opts.inj_file_out,fref=P.fref) diff --git a/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/README.md b/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/README.md new file mode 100644 index 000000000..38a132180 --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/README.md @@ -0,0 +1,53 @@ +# Zero-likelihood workflow smoke tests + +This directory contains local, heavier-than-CI workflow smoke tests for RIFT +pipeline development. The goal is to exercise the DAG framework end to end in +HTCondor without requiring detector frames, PSDs, or a full PE-sized run. + +The current smoke test uses tiny stand-ins for ILE and CIP. They write and +read hyperpipeline-style `grid-N.dat` files and assert that Condor propagated +`RIFT_HYPERPIPELINE_FORMAT=1`. The real workflow pieces still run: + +- `create_event_parameter_pipeline_BasicIteration` +- `util_ILEdagPostprocess.sh` +- `util_CleanILE_hyperpipeline.py` +- `unify.sh` +- alias creation for `posterior_samples-*.dat` +- `convergence_test_samples.py` + +This is not a replacement for astrophysical validation. It is a fast local +framework test intended to catch broken DAG wiring, environment propagation, +ASCII-grid joins, and postprocessing assumptions. + +## Run + +From a checkout with the usual RIFT runtime available: + +```bash +./MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/run_hyperpipeline_condor_smoke.sh +``` + +Useful environment overrides: + +- `RIFT_ZERO_LIKE_PYTHON`: Python executable used by Condor worker stubs. +- `RIFT_ZERO_LIKE_WORKDIR`: working directory; defaults under `/tmp`. +- `RIFT_ZERO_LIKE_SUBMIT=0`: generate the DAG but do not submit it. +- `RIFT_ZERO_LIKE_WAIT=0`: submit but do not wait for DAG completion. + +On Richard's local setup, for example: + +```bash +RIFT_ZERO_LIKE_PYTHON=/Users/rossma/miniconda3/envs/junior_tools/bin/python \ + ./MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/run_hyperpipeline_condor_smoke.sh +``` + +Expected output includes `All jobs Completed!` in the DAGMan log and non-empty +`all.net`, `consolidated_*.composite`, `overlap-grid-*.dat`, plus symlinks +`posterior_samples-*.dat -> overlap-grid-*.dat`. + +## Future extension + +The natural next step is a second mode that runs the real +`integrate_likelihood_extrinsic_batchmode --zero-likelihood`. At the time this +test was added, that path still attempted to read frame/PSD inputs before the +zero-likelihood shortcut could make the workflow data-free. diff --git a/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/run_hyperpipeline_condor_smoke.sh b/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/run_hyperpipeline_condor_smoke.sh new file mode 100755 index 000000000..8f73a9439 --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/run_hyperpipeline_condor_smoke.sh @@ -0,0 +1,109 @@ +#!/usr/bin/env bash +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +CODE_DIR="$(cd "${SCRIPT_DIR}/../../../.." && pwd)" +BIN_DIR="${CODE_DIR}/bin" + +PYTHON_BIN="${RIFT_ZERO_LIKE_PYTHON:-}" +if [[ -z "${PYTHON_BIN}" ]]; then + if [[ -x /Users/rossma/miniconda3/envs/junior_tools/bin/python ]]; then + PYTHON_BIN=/Users/rossma/miniconda3/envs/junior_tools/bin/python + else + PYTHON_BIN="$(command -v python3)" + fi +fi + +WORKDIR="${RIFT_ZERO_LIKE_WORKDIR:-/tmp/rift-zero-likelihood-smoke-$(date +%s)}" +SUBMIT="${RIFT_ZERO_LIKE_SUBMIT:-1}" +WAIT="${RIFT_ZERO_LIKE_WAIT:-1}" + +mkdir -p "${WORKDIR}" +cp "${SCRIPT_DIR}/seed-grid.dat" "${WORKDIR}/seed-grid.dat" + +cat > "${WORKDIR}/args_ile.txt" <<'EOF' +X --n-max 2 --n-eff 2 --save-samples +EOF + +cat > "${WORKDIR}/args_cip.txt" <<'EOF' +X --parameter m1 --parameter m2 --parameter-range [10,80] --parameter-range [10,80] --n-output-samples 4 +EOF + +cat > "${WORKDIR}/args_test.txt" <<'EOF' +X --parameter m1 --parameter m2 --method KS_1d --threshold 999 --always-succeed +EOF + +cat > "${WORKDIR}/stub_ile.py" < "${WORKDIR}/stub_cip.py" </dev/null; then + break + fi + if grep -q "DAG_STATUS_NODE_FAILED" "${DAG_LOG}" 2>/dev/null; then + tail -120 "${DAG_LOG}" >&2 + exit 1 + fi + sleep 5 +done + +test -s "${WORKDIR}/all.net" +test -s "${WORKDIR}/consolidated_0.composite" +test -s "${WORKDIR}/consolidated_1.composite" +test -s "${WORKDIR}/overlap-grid-1.dat" +test -s "${WORKDIR}/overlap-grid-2.dat" +test -L "${WORKDIR}/posterior_samples-1.dat" +test -L "${WORKDIR}/posterior_samples-2.dat" + +echo "Zero-likelihood workflow smoke test passed in ${WORKDIR}" diff --git a/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/seed-grid.dat b/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/seed-grid.dat new file mode 100644 index 000000000..e91e44bac --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/seed-grid.dat @@ -0,0 +1,6 @@ +# RIFT_HYPERPIPELINE_V1 +# lnL sigma_lnL m1 m2 a1x a1y a1z a2x a2y a2z +0.0 0.1 30.0 20.0 0.0 0.0 0.1 0.0 0.0 -0.1 +0.0 0.1 31.0 21.0 0.0 0.0 0.1 0.0 0.0 -0.1 +0.0 0.1 32.0 22.0 0.0 0.0 0.1 0.0 0.0 -0.1 +0.0 0.1 33.0 23.0 0.0 0.0 0.1 0.0 0.0 -0.1 diff --git a/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/stub_cip.py b/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/stub_cip.py new file mode 100644 index 000000000..117180b6e --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/stub_cip.py @@ -0,0 +1,27 @@ +import argparse +import os + +from RIFT.misc import hyperpipeline_io as hpio + + +parser = argparse.ArgumentParser(allow_abbrev=False) +parser.add_argument("--fname") +parser.add_argument("--fname-output-samples") +parser.add_argument("--fname-output-integral") +parser.add_argument("--output-file", default="overlap-grid") +args, _ = parser.parse_known_args() + +if os.environ.get("RIFT_HYPERPIPELINE_FORMAT") != "1": + raise SystemExit("stub_cip: missing RIFT_HYPERPIPELINE_FORMAT=1") + +fname = args.fname_output_samples or args.output_file +if not fname.endswith(".dat"): + fname += ".dat" + +columns = hpio.DEFAULT_BASE_COLUMNS +hpio.write_table(fname, columns, [ + [0.0, 0.1, 30.0, 20.0, 0.0, 0.0, 0.1, 0.0, 0.0, -0.1], + [0.0, 0.1, 31.0, 21.0, 0.0, 0.0, 0.1, 0.0, 0.0, -0.1], + [0.0, 0.1, 32.0, 22.0, 0.0, 0.0, 0.1, 0.0, 0.0, -0.1], + [0.0, 0.1, 33.0, 23.0, 0.0, 0.0, 0.1, 0.0, 0.0, -0.1], +]) diff --git a/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/stub_ile.py b/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/stub_ile.py new file mode 100644 index 000000000..240ea768d --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/stub_ile.py @@ -0,0 +1,28 @@ +import argparse +import os + +from RIFT.misc import hyperpipeline_io as hpio + + +parser = argparse.ArgumentParser() +parser.add_argument("--sim-grid") +parser.add_argument("--output-file", required=True) +parser.add_argument("--n-events-to-analyze", type=int, default=1) +parser.add_argument("--event", type=int, default=0) +parser.add_argument("--n-max") +parser.add_argument("--n-eff") +parser.add_argument("--save-samples", action="store_true") +args, _ = parser.parse_known_args() + +if os.environ.get("RIFT_HYPERPIPELINE_FORMAT") != "1": + raise SystemExit("stub_ile: missing RIFT_HYPERPIPELINE_FORMAT=1") + +columns = hpio.DEFAULT_BASE_COLUMNS +for event in range(args.n_events_to_analyze): + out = "{}_{}_.dat".format(args.output_file, event) + m1 = 30.0 + args.event + event + m2 = 20.0 + args.event + event + hpio.write_table(out, columns, [ + [0.0, 0.1, m1, m2, 0.0, 0.0, 0.1, 0.0, 0.0, -0.1], + [0.0, 0.1, m1 + 1.0, m2 + 1.0, 0.0, 0.0, 0.1, 0.0, 0.0, -0.1], + ]) diff --git a/MonteCarloMarginalizeCode/Code/test/asimov_integration/README.md b/MonteCarloMarginalizeCode/Code/test/asimov_integration/README.md new file mode 100644 index 000000000..53bb11788 --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/test/asimov_integration/README.md @@ -0,0 +1,33 @@ +# Asimov integration smoke test + +This directory contains a CI-sized Asimov integration test for the RIFT plugin. + +The test is intentionally small: + +- create a fresh Asimov project +- apply current-style Asimov data blueprints +- add the public `GW190426_190642` event +- add the current O4b-style RIFT SEOBNRv5PHM analysis blueprint +- verify Asimov sees the RIFT pipeline plugin and writes project state + +The companion frozen-input test exercises the RIFT-specific build contract +without requiring a live cluster or successful upstream productions. It uses +the in-tree GW150914 `ini`/`coinc.xml` fixtures, constructs a minimal production +object, and intercepts the external `util_RIFT_pseudo_pipe.py` call after +checking that `RIFT.asimov.rift.Rift.build_dag` has assembled the expected +command-line interface. + +The template-contract test renders the real `RIFT/asimov/rift.ini` Liquid +template against realistic ledger-shaped objects. It checks baseline ledger +parsing, important optional blocks such as calibration/eccentricity/OSG/manual +ILE args, and a deterministic randomized sweep over key scalar options. + +It does not submit jobs or require production frame/calibration storage. + +The RIFT Asimov integration is currently developed against the Asimov `0.5` +series. The pytest is ready to skip cleanly for `0.6` and `0.7` until the +integration is updated for those APIs. + +The bundled blueprints are small snapshots of the current public Asimov data +repository (`https://git.ligo.org/asimov/data`) chosen to avoid live network +dependencies in CI. diff --git a/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/GW190426_190642.yaml b/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/GW190426_190642.yaml new file mode 100644 index 000000000..1dc75eb36 --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/GW190426_190642.yaml @@ -0,0 +1,51 @@ +data: + channels: + H1: H1:DCS-CALIB_STRAIN_CLEAN_SUB60HZ_C01 + L1: L1:DCS-CALIB_STRAIN_CLEAN_SUB60HZ_C01 + V1: V1:Hrec_hoft_16384Hz + frame types: + H1: H1_HOFT_CLEAN_SUB60HZ_C01 + L1: L1_HOFT_CLEAN_SUB60HZ_C01 + V1: V1Online + segment length: 4 +event time: 1240340820.676 +interferometers: +- H1 +- L1 +- V1 +kind: event +likelihood: + psd length: 4 + reference frequency: 3 + sample rate: 1024 + segment start: 1240340818.676 + start frequency: '2' + window length: 4 +name: GW190426_190642 +priors: + amplitude order: 10 + chirp mass: + maximum: 233.46087529193179 + minimum: 33.4551223382697 + luminosity distance: + maximum: 20000 + minimum: 10 + mass 1: + maximum: 1000 + minimum: 1 + mass ratio: + maximum: 1.0 + minimum: 0.05 +quality: + maximum frequency: + H1: 448 + L1: 448 + V1: 448 + minimum frequency: + H1: 20 + L1: 20 + V1: 20 +supress: + V1: + lower: 49.5 + upper: 50.5 diff --git a/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/analysis_rift_SEOBNRv5PHM.yaml b/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/analysis_rift_SEOBNRv5PHM.yaml new file mode 100644 index 000000000..6c3d2eacf --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/analysis_rift_SEOBNRv5PHM.yaml @@ -0,0 +1,26 @@ +kind: analysis +name: rift-v5PHM-calmarg +status: Ready +pipeline: RIFT +needs: + - get-calibration-ligo-H1 + - get-calibration-ligo-L1 + - get-calibration-virgo + - get-preliminary-posterior + - get-priors + - generate-psd +waveform: + approximant: SEOBNRv5PHM + gwsignal arguments: + lmax_nyquist: 1 + enable_antisymmetric_modes: True + antisymmetric_modes_hm: True +comment: PE job using SEOBNRv5PHM and RIFT +scheduler: + use global priors: True + bootstrap upstream: True + osg: True + pipeline: + internal-puff-transverse: True + cip-explode-jobs-auto-scale: 6 + use-gwsignal: True diff --git a/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/bayeswave-psd-standard.yaml b/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/bayeswave-psd-standard.yaml new file mode 100644 index 000000000..c7e81a248 --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/bayeswave-psd-standard.yaml @@ -0,0 +1,6 @@ +kind: analysis +name: generate-psd +pipeline: bayeswave +comment: Bayeswave on-source PSD estimation process +needs: + - get-priors diff --git a/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/get-data-o4b-production.yaml b/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/get-data-o4b-production.yaml new file mode 100644 index 000000000..e8038afed --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/get-data-o4b-production.yaml @@ -0,0 +1,65 @@ +kind: analysis +name: get-preliminary-posterior +pipeline: gwdata +download: + - posterior +source: + type: pesummary + location: /home/pe.o4/O4b///summary/samples/posterior_samples.h5 +scheduler: + accounting group: ligo.prod.o4.cbc.pe.bilby + request memory: 1024 + request post memory: 16384 +--- +kind: analysis +name: get-calibration-ligo-H1 +pipeline: gwdata +interferometers: + - H1 +download: + - calibration +source: + type: local storage +scheduler: + accounting group: ligo.prod.o4.cbc.pe.bilby + request memory: 1024 + request post memory: 16384 +locations: + calibration directory: /home/cal/archive/ +calibration version: v2 +--- +kind: analysis +name: get-calibration-ligo-L1 +pipeline: gwdata +interferometers: + - L1 +download: + - calibration +source: + type: local storage +scheduler: + accounting group: ligo.prod.o4.cbc.pe.bilby + request memory: 1024 + request post memory: 16384 +locations: + calibration directory: /home/cal/archive/ +calibration version: v1 +--- +kind: analysis +name: get-calibration-virgo +pipeline: gwdata +interferometers: + - V1 +download: + - calibration +source: + type: frame +scheduler: + accounting group: ligo.prod.o4.cbc.pe.bilby + request memory: 1024 + request post memory: 16384 +locations: + calibration directory: /home/cal/archive/ +calibration version: v1 +virgo prefix: V1:Hrec_hoftRepro1AR_U02 +virgo frametype: V1:HoftAR1U02 diff --git a/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/pe-configurator-standard.yaml b/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/pe-configurator-standard.yaml new file mode 100644 index 000000000..8eedbf23c --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/pe-configurator-standard.yaml @@ -0,0 +1,14 @@ +kind: analysis +name: get-priors +pipeline: peconfigurator +minimum mass ratio: 0.05 +checks: + - higher modes + - luminosity distance + - no safety +needs: + - get-preliminary-posterior +scheduler: + accounting group: ligo.dev.o4.cbc.pe.bilby + request memory: 1024 + request post memory: 16384 diff --git a/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/production-pe-o4b.yaml b/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/production-pe-o4b.yaml new file mode 100644 index 000000000..e5aa44ee6 --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/production-pe-o4b.yaml @@ -0,0 +1,76 @@ +kind: configuration +data: + channels: + H1: H1:DCS-CALIB_STRAIN_CLEAN_AR01 + L1: L1:DCS-CALIB_STRAIN_CLEAN_AR01 + V1: V1:Hrec_hoftRepro1AR_16384Hz + frame types: + H1: H1_HOFT_AR01 + L1: L1_HOFT_AR01 + V1: HoftAR1U02 +likelihood: + roll off time: 1.0 +pipelines: + gwdata: + scheduler: + accounting group: ligo.prod.o4.cbc.pe.bilby + request cpus: 1 + peconfigurator: + scheduler: + accounting group: ligo.prod.o4.cbc.pe.bilby + request cpus: 1 + bayeswave: + scheduler: + environment: /cvmfs/software.igwn.org/conda/envs/igwn-py310-20250729 + accounting group: ligo.prod.o4.cbc.pe.bilby + request memory: 4096 MB + request post memory: 16384 MB + copy frames: True + request disk: 8 GB + request post disk: 8 GB + osg: True + likelihood: + roll off time: 1 + iterations: 250000 + chains: 16 + threads: 4 + rift: + scheduler: + osg: True + bootstrap upstream: True + bootstrap coinc: True + use global priors: True + accounting group: ligo.prod.o4.cbc.pe.rift + request memory: 1024 + singularity image: osdf://igwn/cit/staging/pe.o4/containers/o4b-production/Reviewed-GWTC5-RIFT-20250919.sif + singularity base exe directory: /opt/conda/envs/Reviewed_GWTC_reruns_yml/bin/ + pipeline: + internal-ile-data-tukey-window-time: "1.0" + internal-ile-psd-common-window: True + manual-extra-puff-args: "--downselect-parameter m1 --downselect-parameter-range [1,1000] --downselect-parameter m2 --downselect-parameter-range [1,1000]" + calibration-reweighting-count: 300 + sampler: + cip: + fitting method: rf + explode jobs auto: True + request disk: "4G" + ile: + n eff: 10 + jobs per worker: 100 + request disk: "4G" + likelihood: + calibration: + sample: True + use aligned phase coordinates: True + correlate parameters default: True + use rescaled transverse spin coordinates: False + n output samples last: 60000 +quality: + state vector: + L1: L1:GDS-CALIB_STATE_VECTOR_AR01 + H1: H1:GDS-CALIB_STATE_VECTOR_AR01 + V1: V1:DQ_ANALYSIS_STATE_VECTOR + minimum frequency: + H1: 20 + L1: 20 + V1: 20 diff --git a/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/production-pe-priors.yaml b/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/production-pe-priors.yaml new file mode 100644 index 000000000..eac9f66e7 --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/test/asimov_integration/blueprints/production-pe-priors.yaml @@ -0,0 +1,49 @@ +kind: configuration +priors: + chirp mass: + maximum: 100 + minimum: 1 + type: bilby.gw.prior.UniformInComponentsChirpMass + dec: + type: Cosine + luminosity distance: + maximum: 20000 + minimum: 10 + type: bilby.gw.prior.UniformSourceFrame + cosmology: Planck15_LAL + mass 1: + maximum: 1000 + minimum: 1 + type: Constraint + mass 2: + maximum: 1000 + minimum: 1 + type: Constraint + mass ratio: + maximum: 1.0 + minimum: 0.05 + type: bilby.gw.prior.UniformInComponentsMassRatio + phase: + boundary: periodic + type: Uniform + phi 12: + type: Uniform + phi jl: + type: Uniform + psi: + type: Uniform + ra: + type: Uniform + spin 1: + maximum: 0.99 + minimum: 0 + type: Uniform + spin 2: + maximum: 0.99 + minimum: 0 + theta jn: + type: Sine + tilt 1: + type: Sine + tilt 2: + type: Sine diff --git a/MonteCarloMarginalizeCode/Code/test/asimov_integration/test_asimov_rift_build_contract.py b/MonteCarloMarginalizeCode/Code/test/asimov_integration/test_asimov_rift_build_contract.py new file mode 100644 index 000000000..48b925488 --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/test/asimov_integration/test_asimov_rift_build_contract.py @@ -0,0 +1,188 @@ +import importlib.metadata +import pathlib +import shutil +import types + +import pytest + + +ROOT = pathlib.Path(__file__).resolve().parents[4] +TRAVIS_INPUTS = ROOT / ".travis" / "ref_ini" +SUPPORTED_SERIES = {"0.5"} +FUTURE_SERIES = {"0.6", "0.7"} + + +def _asimov_version(): + try: + return importlib.metadata.version("asimov") + except importlib.metadata.PackageNotFoundError: + pytest.skip("asimov is not installed") + + +def _series(version): + parts = version.split(".") + return ".".join(parts[:2]) + + +def _require_supported_asimov(): + version = _asimov_version() + series = _series(version) + if series in FUTURE_SERIES: + pytest.skip( + "RIFT Asimov CI is wired for this series, but the integration " + "is currently validated only against Asimov 0.5" + ) + if series not in SUPPORTED_SERIES: + pytest.skip( + "RIFT Asimov CI is currently validated only against Asimov 0.5 " + f"(found {version})" + ) + return version + + +def _require_htcondor(): + try: + __import__("htcondor") + except ModuleNotFoundError: + pytest.skip("htcondor Python bindings are required to import the Asimov pipeline stack") + + +class _Logger: + def __init__(self): + self.messages = [] + + def info(self, message): + self.messages.append(("info", message)) + + def warning(self, message): + self.messages.append(("warning", message)) + + def error(self, message): + self.messages.append(("error", message)) + + +class _Repository: + def __init__(self, directory): + self.directory = str(directory) + + +class _Event: + def __init__(self, name, work_dir, repository): + self.name = name + self.work_dir = str(work_dir) + self.repository = repository + self.productions = [] + + +class _Production: + def __init__(self, event, name, rundir, ini_relpath, coinc_file): + self.event = event + self.name = name + self.pipeline = "rift" + self.category = "C01_offline" + self.rundir = str(rundir) + self.dependencies = [] + self.status = "ready" + self.meta = { + "scheduler": { + "accounting group": "ligo.dev.o4.cbc.pe.rift", + "pipeline": { + "internal-puff-transverse": True, + "cip-explode-jobs-auto-scale": 6, + "use-gwsignal": True, + }, + }, + "waveform": {"approximant": "SEOBNRv5PHM"}, + "likelihood": {"assume": {}}, + } + self._ini_relpath = ini_relpath + self._coinc_file = coinc_file + + def get_meta(self, key): + return self.meta.get(key) + + def set_meta(self, key, value): + self.meta[key] = value + + def get_coincfile(self): + return str(self._coinc_file) + + def get_configuration(self): + return types.SimpleNamespace(ini_loc=self._ini_relpath) + + def get_psds(self, _format): + return [] + + +def test_asimov_05_rift_build_dag_uses_frozen_inputs(monkeypatch, tmp_path): + _require_supported_asimov() + _require_htcondor() + + from RIFT.asimov import rift as rift_module + from RIFT.asimov.rift import Rift + + repo_dir = tmp_path / "repo" + repo_category = repo_dir / "C01_offline" + repo_category.mkdir(parents=True) + ini_name = "rift-ci.ini" + shutil.copyfile(TRAVIS_INPUTS / "GW150914.ini", repo_category / ini_name) + coinc_file = tmp_path / "coinc.xml" + shutil.copyfile(TRAVIS_INPUTS / "coinc.xml", coinc_file) + + event = _Event("GW190426_190642", tmp_path, _Repository(repo_dir)) + production = _Production( + event=event, + name="rift-v5PHM-calmarg", + rundir=tmp_path / "rift-run", + ini_relpath=ini_name, + coinc_file=coinc_file, + ) + event.productions.append(production) + + pipe = Rift.__new__(Rift) + pipe.production = production + pipe.category = production.category + pipe.bootstrap = False + pipe.logger = _Logger() + + monkeypatch.setattr(Rift, "before_build", lambda self: None) + + def fake_config_get(section, option): + values = { + ("condor", "user"): "rift-ci", + ("general", "calibration"): "C01", + ("pipelines", "environment"): str(tmp_path / "env"), + ("rift", "environment"): str(tmp_path / "env"), + } + return values[(section, option)] + + monkeypatch.setattr(rift_module.config, "get", fake_config_get) + + calls = [] + + class FakePopen: + def __init__(self, command, stdout=None, stderr=None): + self.command = command + calls.append(command) + + def communicate(self): + rundir = pathlib.Path(self.command[self.command.index("--use-rundir") + 1]) + rundir.mkdir(parents=True) + dag = rundir / "marginalize_intrinsic_parameters_BasicIterationWorkflow.dag" + dag.write_text("# frozen Asimov/RIFT build contract\n") + return b"fake util_RIFT_pseudo_pipe.py completed", None + + monkeypatch.setattr(rift_module.subprocess, "Popen", FakePopen) + + pipe.build_dag() + + assert len(calls) == 1 + command = calls[0] + assert command[0].endswith("/bin/util_RIFT_pseudo_pipe.py") + assert command[command.index("--use-coinc") + 1] == str(coinc_file.resolve()) + assert command[command.index("--use-ini") + 1] == str((repo_category / ini_name).resolve()) + assert command[command.index("--approx") + 1] == "SEOBNRv5PHM" + assert command[command.index("--use-rundir") + 1] == production.rundir + assert "--internal-puff-transverse" in command + assert "--cip-explode-jobs-auto-scale=6" in command + assert "--use-gwsignal" in command diff --git a/MonteCarloMarginalizeCode/Code/test/asimov_integration/test_asimov_rift_project.py b/MonteCarloMarginalizeCode/Code/test/asimov_integration/test_asimov_rift_project.py new file mode 100644 index 000000000..306a7140b --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/test/asimov_integration/test_asimov_rift_project.py @@ -0,0 +1,130 @@ +import importlib.metadata +import os +import pathlib +import shutil +import subprocess + +import pytest + + +BLUEPRINT_DIR = pathlib.Path(__file__).with_name("blueprints") +SUPPORTED_SERIES = {"0.5"} +FUTURE_SERIES = {"0.6", "0.7"} +EVENT = "GW190426_190642" +RIFT_ANALYSIS = "rift-v5PHM-calmarg" + + +def _asimov_version(): + try: + return importlib.metadata.version("asimov") + except importlib.metadata.PackageNotFoundError: + pytest.skip("asimov is not installed") + + +def _series(version): + parts = version.split(".") + return ".".join(parts[:2]) + + +def _require_supported_asimov(): + version = _asimov_version() + series = _series(version) + if series in FUTURE_SERIES: + pytest.skip( + "RIFT Asimov CI is wired for this series, but the integration " + "is currently validated only against Asimov 0.5" + ) + if series not in SUPPORTED_SERIES: + pytest.skip( + "RIFT Asimov CI is currently validated only against Asimov 0.5 " + f"(found {version})" + ) + return version + + +def _require_htcondor(): + try: + __import__("htcondor") + except ModuleNotFoundError: + pytest.skip("htcondor Python bindings are required for this Asimov path") + + +def _run(cmd, cwd, env): + result = subprocess.run( + cmd, + cwd=str(cwd), + env=env, + text=True, + stdout=subprocess.PIPE, + stderr=subprocess.STDOUT, + ) + assert result.returncode == 0, ( + "command failed with status {}\n{}\n{}".format( + result.returncode, " ".join(map(str, cmd)), result.stdout + ) + ) + return result.stdout + + +def _tree_text(root): + chunks = [] + for path in root.rglob("*"): + if not path.is_file() or ".git" in path.parts: + continue + try: + chunks.append(path.read_text(errors="ignore")) + except OSError: + pass + return "\n".join(chunks) + + +def test_asimov_05_can_create_project_and_add_rift_event(tmp_path): + version = _require_supported_asimov() + _require_htcondor() + asimov_cli = shutil.which("asimov") + assert asimov_cli, "asimov CLI is not on PATH" + + # Import after the version gate so 0.6/0.7 API drift skips cleanly. + from asimov.pipelines import known_pipelines + from RIFT.asimov.rift import Rift + + assert "rift" in known_pipelines + assert known_pipelines["rift"] is Rift + + project = tmp_path / "project" + project.mkdir() + env = os.environ.copy() + env.update({ + "GIT_AUTHOR_NAME": "RIFT CI", + "GIT_AUTHOR_EMAIL": "rift-ci@example.invalid", + "GIT_COMMITTER_NAME": "RIFT CI", + "GIT_COMMITTER_EMAIL": "rift-ci@example.invalid", + }) + + _run([asimov_cli, "init", f"RIFT Asimov CI {version}"], project, env) + + for blueprint in [ + "production-pe-o4b.yaml", + "production-pe-priors.yaml", + "GW190426_190642.yaml", + ]: + _run([asimov_cli, "apply", "-f", str(BLUEPRINT_DIR / blueprint)], project, env) + + for blueprint in [ + "get-data-o4b-production.yaml", + "pe-configurator-standard.yaml", + "bayeswave-psd-standard.yaml", + "analysis_rift_SEOBNRv5PHM.yaml", + ]: + _run( + [asimov_cli, "apply", "-f", str(BLUEPRINT_DIR / blueprint), "-e", EVENT], + project, + env, + ) + + state = _tree_text(project) + assert EVENT in state + assert RIFT_ANALYSIS in state + assert "SEOBNRv5PHM" in state + assert "pipeline" in state.lower() + assert "rift" in state.lower() diff --git a/MonteCarloMarginalizeCode/Code/test/asimov_integration/test_asimov_rift_template_contract.py b/MonteCarloMarginalizeCode/Code/test/asimov_integration/test_asimov_rift_template_contract.py new file mode 100644 index 000000000..06d56506a --- /dev/null +++ b/MonteCarloMarginalizeCode/Code/test/asimov_integration/test_asimov_rift_template_contract.py @@ -0,0 +1,288 @@ +import configparser +import pathlib +import random +import types + +import pytest + + +TEMPLATE = pathlib.Path(__file__).resolve().parents[2] / "RIFT" / "asimov" / "rift.ini" + + +def _liquid_render(template_text, context): + liquid = pytest.importorskip("liquid") + + if hasattr(liquid, "Environment"): + env = liquid.Environment() + template = env.from_string(template_text) + return template.render(**context) + + if hasattr(liquid, "Liquid"): + template = liquid.Liquid(template_text, from_file=False) + return template.render(**context) + + if hasattr(liquid, "Template"): + template = liquid.Template(template_text) + return template.render(**context) + + pytest.skip("installed liquid package does not expose a supported renderer") + + +def _base_meta(): + return { + "name": "rift-v5PHM-calmarg", + "engine": "RIFT", + "interferometers": ["H1", "L1"], + "event time": 1240340820.676, + "data": { + "channels": { + "H1": "H1:DCS-CALIB_STRAIN_CLEAN_SUB60HZ_C01", + "L1": "L1:DCS-CALIB_STRAIN_CLEAN_SUB60HZ_C01", + }, + "frame types": { + "H1": "H1_HOFT_CLEAN_SUB60HZ_C01", + "L1": "L1_HOFT_CLEAN_SUB60HZ_C01", + }, + "segment length": 4, + }, + "quality": { + "minimum frequency": {"H1": 20, "L1": 20}, + "maximum frequency": {"H1": 1700, "L1": 1700}, + }, + "likelihood": { + "start frequency": 20, + "sample rate": 1024, + "assume": {}, + "marginalization": {"distance": False}, + }, + "waveform": { + "approximant": "SEOBNRv5PHM", + "pn amplitude order": 5, + "maximum mode": 4, + "gwsignal arguments": { + "lmax_nyquist": 1, + "enable_antisymmetric_modes": True, + }, + }, + "priors": { + "spin 1": {"maximum": 0.99}, + "spin 2": {"maximum": 0.99}, + "chirp mass": {"minimum": 33.4, "maximum": 233.4}, + "mass ratio": {"minimum": 0.05, "maximum": 1.0}, + "mass 1": {"minimum": 1, "maximum": 1000}, + "luminosity distance": { + "minimum": 10, + "maximum": 20000, + "type": "bilby.gw.prior.UniformSourceFrame", + }, + }, + "sampler": { + "cip": {"sampling method": "AV", "explode jobs": 5}, + "ile": { + "n eff": 12, + "copies": 2, + "sampling method": "AV", + "freezeadapt": False, + "jobs per worker": 30, + }, + }, + "scheduler": { + "accounting group": "ligo.dev.o4.cbc.pe.rift", + "osg": False, + }, + } + + +def _render(meta): + production = types.SimpleNamespace( + name=meta["name"], + meta=meta, + category="C01_offline", + event=types.SimpleNamespace( + name="GW190426_190642", + repository=types.SimpleNamespace(directory="/tmp/rift-asimov-repo"), + ), + xml_psds={ + ifo: f"/tmp/rift-asimov-repo/C01_offline/psds/psd_{ifo}.xml.gz" + for ifo in meta["interferometers"] + }, + ) + context = { + "production": production, + "config": { + "general": {"webroot": "/tmp/rift-web"}, + "pipelines": {"environment": "/opt/igwn"}, + "condor": {"user": "riftci"}, + }, + } + rendered = _liquid_render(TEMPLATE.read_text(), context) + assert "{{" not in rendered + assert "{%" not in rendered + parser = configparser.RawConfigParser() + parser.optionxform = str + parser.read_string(rendered) + return rendered, parser + + +def test_rift_liquid_template_renders_realistic_baseline_ledger(): + meta = _base_meta() + rendered, parser = _render(meta) + + assert parser.get("analysis", "ifos") == "['H1', 'L1']" + assert parser.get("analysis", "engine") == "RIFT" + assert parser.get("condor", "accounting_group") == "ligo.dev.o4.cbc.pe.rift" + assert '"H1":"H1_HOFT_CLEAN_SUB60HZ_C01"' in parser.get("datafind", "types") + assert '"L1":"L1:DCS-CALIB_STRAIN_CLEAN_SUB60HZ_C01"' in parser.get("data", "channels") + assert parser.get("engine", "fref") == "20" + assert parser.get("engine", "fmin-template") == "20" + assert parser.get("engine", "approx").strip() == "SEOBNRv5PHM" + assert parser.get("engine", "amporder") == "5" + assert parser.get("engine", "seglen") == "4" + assert parser.get("engine", "srate").strip() == "1024" + assert parser.get("engine", "H1-psd") == '"/tmp/rift-asimov-repo/C01_offline/psds/psd_H1.xml.gz"' + assert parser.get("engine", "chirpmass-min") == "33.4" + assert parser.get("engine", "distance-max") == "20000" + assert parser.get("rift-pseudo-pipe", "cip-explode-jobs").strip() == "5" + assert parser.get("rift-pseudo-pipe", "ile-n-eff").strip() == "12" + assert parser.get("rift-pseudo-pipe", "ile-copies").strip() == "2" + assert parser.get("rift-pseudo-pipe", "use_osg").strip() == "False" + assert "manual-extra-ile-args=--internal-waveform-extra-kwargs" in rendered + + +@pytest.mark.parametrize( + "distance_prior,expected", + [ + ("bilby.gw.prior.UniformSourceFrame", "cosmo_sourceframe"), + ("bilby.gw.prior.UniformComovingVolume", "cosmo"), + ("Uniform", "pseudo_cosmo"), + ], +) +def test_rift_liquid_template_distance_prior_blocks(distance_prior, expected): + meta = _base_meta() + meta["priors"]["luminosity distance"]["type"] = distance_prior + _rendered, parser = _render(meta) + assert parser.get("rift-pseudo-pipe", "ile-distance-prior").strip("'\"") == expected + + +def test_rift_liquid_template_option_blocks_land_safely(): + meta = _base_meta() + meta["data"]["calibration"] = { + "H1": "/tmp/cal/H1.dat", + "L1": "/tmp/cal/L1.dat", + } + meta["likelihood"]["assume"] = { + "eccentric": True, + "nonprecessing": True, + "matter secondary": True, + } + meta["likelihood"]["marginalization"] = { + "distance": True, + "distance lookup": "/tmp/lookup.npy", + } + meta["likelihood"]["roll off time"] = 1.0 + meta["sampler"]["likelihood"] = { + "calibration": { + "sample": True, + "bilby ini file": "/tmp/bilby.ini", + } + } + meta["sampler"]["extra eccentric arguments"] = { + "force-ecc-min": 0.0, + "force-ecc-max": 0.2, + } + meta["sampler"]["manual grid"] = "/tmp/initial-grid.xml.gz" + meta["sampler"]["force iterations"] = 3 + meta["sampler"]["correlate parameters default"] = True + meta["sampler"]["use rescaled transverse spin coordinates"] = True + meta["sampler"]["n output samples"] = 7000 + meta["sampler"]["n output samples last"] = 25000 + meta["sampler"]["n input samples"] = 4000 + meta["sampler"]["ile"]["manual extra args"] = ["--zero-likelihood", "--vectorized"] + meta["sampler"]["ile"]["rotate phase"] = True + meta["sampler"]["ile"]["request disk"] = "2 GB" + meta["sampler"]["cip"]["request disk"] = "1 GB" + meta["sampler"]["ile"]["condor commands"] = { + "request_memory": "4096", + "+WantOSG": "True", + } + meta["scheduler"]["osg"] = True + meta["scheduler"]["additional files"] = ["/tmp/waveform-cache.h5", "/tmp/table.dat"] + + rendered, parser = _render(meta) + + assert parser.get("engine", "enable-spline-calibration") == "" + assert parser.get("engine", "H1-spcal-envelope") == '"/tmp/cal/H1.dat"' + assert parser.get("engine", "ecc_min") == "0.0" + assert parser.get("engine", "ecc_max") == "0.2" + assert parser.get("rift-pseudo-pipe", "calibration-reweighting") == "True" + assert parser.get("rift-pseudo-pipe", "bilby-ini-file") == '"/tmp/bilby.ini"' + assert parser.get("rift-pseudo-pipe", "internal-force-iterations") == "3" + assert parser.get("rift-pseudo-pipe", "internal-correlate-default") == "True" + assert parser.get("rift-pseudo-pipe", "internal-use-rescaled-transverse-spin-coordinates") == "True" + assert parser.get("rift-pseudo-pipe", "internal-ile-rotate-phase") == "True" + assert parser.get("rift-pseudo-pipe", "assume-eccentric") == "True" + assert parser.get("rift-pseudo-pipe", "assume-nonprecessing") == "True" + assert parser.get("rift-pseudo-pipe", "assume-matter-but-primary-bh") == "True" + assert parser.get("rift-pseudo-pipe", "use-meanPerAno") == "True" + assert parser.get("rift-pseudo-pipe", "force-ecc-max") == "0.2" + assert parser.get("rift-pseudo-pipe", "internal-marginalize-distance") == "True" + assert parser.get("rift-pseudo-pipe", "internal-marginalize-distance-file") == "/tmp/lookup.npy" + assert parser.get("rift-pseudo-pipe", "internal-ile-data-tukey-window-time") == "1.0" + assert parser.get("rift-pseudo-pipe", "manual-initial-grid") == "'/tmp/initial-grid.xml.gz'" + assert parser.get("rift-pseudo-pipe", "use_osg").strip() == "True" + assert parser.get("rift-pseudo-pipe", "internal-use-oauth-files").strip() == "'scitokens'" + assert parser.get("rift-pseudo-pipe", "n-output-samples").strip() == "7000" + assert parser.get("rift-pseudo-pipe", "n-output-samples-last").strip() == "25000" + assert parser.get("rift-pseudo-pipe", "internal-n-evaluations-per-iteration") == "4000" + assert parser.get("rift-pseudo-pipe", "internal-ile-request-disk") == '"2 GB"' + assert parser.get("rift-pseudo-pipe", "internal-cip-request-disk") == '"1 GB"' + assert parser.get("rift-ile-condor", "request_memory") == "4096" + assert parser.get("rift-ile-condor", "+WantOSG") == "True" + assert "ile-additional-files-to-transfer=" in rendered + assert "--zero-likelihood" in parser.get("rift-pseudo-pipe", "manual-extra-ile-args") + + +def test_rift_liquid_template_randomized_ledger_sanity(): + rng = random.Random(190426) + approximants = ["SEOBNRv5PHM", "IMRPhenomXPHM", "TaylorF2"] + sample_rates = [512, 1024, 2048] + + for _ in range(20): + meta = _base_meta() + ifos = rng.sample(["H1", "L1", "V1"], rng.choice([2, 3])) + meta["interferometers"] = ifos + meta["data"]["channels"] = {ifo: f"{ifo}:TEST-STRAIN" for ifo in ifos} + meta["data"]["frame types"] = {ifo: f"{ifo}_TEST_FRAME" for ifo in ifos} + meta["quality"]["minimum frequency"] = {ifo: rng.choice([10, 15, 20]) for ifo in ifos} + meta["quality"]["maximum frequency"] = {ifo: rng.choice([512, 1024, 1700]) for ifo in ifos} + meta["likelihood"]["start frequency"] = rng.choice([10, 15, 20, 30]) + meta["likelihood"]["sample rate"] = rng.choice(sample_rates) + meta["waveform"]["approximant"] = rng.choice(approximants) + if rng.choice([True, False]): + meta["waveform"]["reference frequency"] = rng.choice([20, 30, 40]) + meta["waveform"]["pn amplitude order"] = rng.choice([0, 3, 5]) + meta["waveform"]["maximum mode"] = rng.choice([2, 3, 4]) + meta["priors"]["spin 1"]["maximum"] = round(rng.uniform(0.2, 0.99), 3) + meta["priors"]["spin 2"]["maximum"] = round(rng.uniform(0.2, 0.99), 3) + mc_min = round(rng.uniform(1.0, 80.0), 3) + meta["priors"]["chirp mass"] = {"minimum": mc_min, "maximum": round(mc_min + rng.uniform(10.0, 200.0), 3)} + meta["priors"]["luminosity distance"]["maximum"] = rng.choice([1000, 5000, 20000]) + meta["sampler"]["cip"]["explode jobs"] = rng.randint(1, 8) + meta["sampler"]["ile"]["n eff"] = rng.randint(5, 50) + meta["scheduler"]["osg"] = rng.choice([True, False]) + + _rendered, parser = _render(meta) + + assert parser.get("engine", "approx").strip() == meta["waveform"]["approximant"] + assert parser.get("engine", "fmin-template") == str(meta["likelihood"]["start frequency"]) + expected_fref = meta["waveform"].get("reference frequency", meta["likelihood"]["start frequency"]) + assert parser.get("engine", "fref") == str(expected_fref) + assert parser.get("engine", "srate").strip() == str(meta["likelihood"]["sample rate"]) + assert parser.get("engine", "amporder") == str(meta["waveform"]["pn amplitude order"]) + assert parser.get("rift-pseudo-pipe", "l-max") == str(meta["waveform"]["maximum mode"]) + assert parser.get("rift-pseudo-pipe", "use_osg").strip() == str(meta["scheduler"]["osg"]) + assert parser.get("rift-pseudo-pipe", "cip-explode-jobs").strip() == str(meta["sampler"]["cip"]["explode jobs"]) + assert parser.get("rift-pseudo-pipe", "ile-n-eff").strip() == str(meta["sampler"]["ile"]["n eff"]) + for ifo in ifos: + assert f'"{ifo}":"{ifo}_TEST_FRAME"' in parser.get("datafind", "types") + assert f'"{ifo}":"{ifo}:TEST-STRAIN"' in parser.get("data", "channels") diff --git a/MonteCarloMarginalizeCode/developer_guide/README.md b/MonteCarloMarginalizeCode/developer_guide/README.md new file mode 100644 index 000000000..a371505a4 --- /dev/null +++ b/MonteCarloMarginalizeCode/developer_guide/README.md @@ -0,0 +1,226 @@ +# RIFT Developer Guide + +This guide is for humans and agents changing core RIFT components. The main +habit to preserve is simple: identify the component being changed, run the +cheapest test that can catch obvious breakage, then choose at least one +component-appropriate integration test before trusting the change. + +RIFT is scientific software. A green sanity test means "not obviously broken," +not "scientifically validated." The tests below are a ladder, not a single +gate. + +## Testing Ladder + +### 0. CI and local sanity checks + +CI is the zeroth-level sanity check. It catches import failures, script drift, +and short deterministic examples. It should be run locally before pushing when +possible, especially after changing public command-line tools. + +Useful entry points include: + +- `.travis/test-all-bin.sh` +- `.travis/test-build.sh` +- `.travis/test-docs.sh` +- individual tests under `MonteCarloMarginalizeCode/Code/test/` + +Local runs need the normal RIFT environment, including `lalsuite`; on many +systems set `GW_SURROGATE=''`. + +### 1. Pipeline framework tests + +Pipeline changes should not be validated only by importing Python modules. +They need generated DAGs and, when possible, actual local Condor execution. + +The lightweight end-to-end framework test lives at: + +```bash +MonteCarloMarginalizeCode/Code/demo/rift/test_frameworks/zero_likelihood/run_hyperpipeline_condor_smoke.sh +``` + +This test runs a small HTCondor DAG using tiny ILE/CIP stand-ins. It exercises +the real pipeline generator, join/postprocess steps, hyperpipeline ASCII +`grid-N.dat` flow, `all.net` unification, `posterior_samples-*.dat` aliases, +and the convergence-test node. It is heavier than CI but far lighter than a +full PE run. + +Scope: this validates workflow plumbing, Condor environment propagation, +intermediate file naming, and postprocessing assumptions. + +Non-scope: this does not validate waveform physics, real ILE data ingestion, or +posterior quality. A future mode should run real +`integrate_likelihood_extrinsic_batchmode --zero-likelihood` once that path is +fully data-free. + +### 2. Full scientific workflows + +When a change can alter likelihood values, posterior weights, event setup, +coordinate semantics, calibration, or waveform generation, at least one full +workflow comparison is needed. Prefer an existing, archived run with known +`overlap-grid-N.xml.gz` and `consolidated*.composite` outputs so the new code +can be compared against a fixed reference. + +Full workflow tests should record: + +- branch and commit +- command lines and environment +- input grid(s) +- PSD/data setup +- likelihood point comparisons +- posterior/postprocessing diagnostics + +## Component Notes + +### Pipeline + +Pipeline edits affect DAG shape, submit files, environment propagation, file +transfer, naming conventions, and postprocessing dependencies. Use the +zero-likelihood-style Condor smoke framework first, then a real run if the +change can affect science outputs. + +Common checks: + +- inspect generated `.dag` and `.sub` files +- confirm expected `PARENT/CHILD` ordering +- confirm submit-file `environment` and `getenv` behavior +- confirm expected files are non-empty +- confirm aliases such as `posterior_samples-*.dat` exist when downstream + tooling expects them + +### Waveforms + +Waveform tests are necessarily fuzzy. GR waveform modeling is not perfectly +known, and different approximants/backends can legitimately disagree within +model-dependent tolerances. The goal is to catch discontinuities, broken data +conditioning, obvious convention errors, and regressions against expected +behavior. + +Relevant tests live under: + +```bash +MonteCarloMarginalizeCode/Code/test/waveform/ +``` + +Important scripts include: + +- `check_waveform_random.py` +- `check_waveform_taper.py` +- `compare_in_ifo.py` +- `plot-waveform-ci.py` + +Use these when adding new waveforms, changing waveform argument handling, or +changing data-conditioning/tapering logic. Do not treat a single numerical +tolerance as universal; document the approximant, parameter range, and expected +comparison behavior. + +### Coordinates + +Coordinate changes are easy to make locally consistent while breaking a +pipeline boundary. Run the vector-coordinate test after any change to +coordinate naming, conversion, implied parameters, or sampler coordinates: + +```bash +.travis/test-coord.sh +``` + +This currently drives: + +```bash +python MonteCarloMarginalizeCode/Code/test/test_vector_coordinates.py --as-test +``` + +### Integrator + +Integrator changes should run the Travis integrator smoke and then a targeted +stress test for the changed sampling mode: + +```bash +.travis/test-integrate.sh +``` + +This currently exercises `test_mcsamplerEnsemble_extended.py` with and without +`--use-lnL`. For changes to adaptation, GPU/vectorized paths, portfolio +samplers, or pinned distributions, also inspect: + +```bash +MonteCarloMarginalizeCode/Code/test/integrators/ +MonteCarloMarginalizeCode/Code/test/test_mcsamplerEnsemble_extended.py +``` + +Use reproducible seeds where possible, and report effective sample sizes and +failure modes rather than only pass/fail. + +### Likelihood + +Likelihood changes need special care. Unit tests and smoke tests catch many +interface errors, but the important validation is agreement at identical points. + +A lightweight first test should run quickly on fake data. The Travis run +scripts are the current starting point: + +```bash +.travis/test-run.sh +.travis/test-build.sh +``` + +The fake-data setup can be run with high Monte Carlo accuracy and tunable SNR +through the PSD/data configuration. That makes it useful for stress-testing a +new likelihood implementation or a branch such as `calmarg_in_loop`: raise the +accuracy, tune SNR, and compare the old and new code on exactly the same +intrinsic/extrinsic points. + +Minimum useful likelihood comparison: + +- same data +- same PSD +- same waveform approximant and conditioning +- same intrinsic point +- same extrinsic point or controlled marginalization settings +- high enough Monte Carlo accuracy that implementation differences dominate + sampling noise + +Preferred full test: + +1. Point a comparison script at an existing run directory containing + `overlap-grid-N.xml.gz` files and `consolidated*.composite` files. +2. Recompute likelihood values with the candidate implementation on the same + points. +3. Compare against the existing run, separating deterministic differences from + Monte Carlo uncertainty. +4. Report both pointwise differences and downstream postprocessing impact. + +This full comparison script does not yet exist in a polished form; building it +would make future likelihood work much safer. + +### Asimov Tests + +An Asimov testing layer is needed but does not exist yet. It should provide +deterministic, noise-free validation cases where the expected likelihood shape, +posterior concentration, and convergence behavior are known well enough to +catch regressions. + +There are exploratory files under `MonteCarloMarginalizeCode/Code/test/` with +`asimov` in their names, but the missing framework is a maintained, documented +workflow that can be run routinely. Desired properties: + +- no stochastic data realization +- tunable SNR +- known injected parameters +- predictable convergence diagnostics +- compatibility with both lightweight and full pipeline modes + +## When Changing Code + +Use this checklist before claiming a core change is ready: + +1. Name the affected component: pipeline, waveform, coordinates, integrator, + likelihood, postprocessing, or backend. +2. Run CI-level sanity checks locally when practical. +3. Run the component-specific smoke test. +4. For pipeline or likelihood changes, run a DAG or workflow-level test. +5. Save the run directory or enough command-line detail for another developer + to reproduce the result. +6. In the commit or PR notes, say what was tested and what was not tested. + +If a test has to be skipped because of local environment limitations, say so +explicitly. Silent missing validation is worse than an honest known gap.