A hypergraph-based model for tumor prognosis using local and global information fusion on H&E-stained histology images

This repo is the PyTorch implementation for the HeMiCoRe described in the paper "A hypergraph-based model for tumor prognosis using local and global information fusion on H&E-stained histology images", which has been published on Medical Image Analysis (https://doi.org/10.1016/j.media.2026.103991).

Folder structures

│  environment.yaml
│  pretrain_HeMiCoRe.py  # pre-training HeMiCoRe
│  train_HeMiCoRe.py  # training, fine-tuning and evaluating HeMiCoRe
├─HGNN
|      H_GNN.py
|      hypergraph_util.py
|      layers.py
|            
├─models  # the model's architecture
|      __init__.py
│      cl.py
│      rlmil.py
│      
├─runs  # the training scripts 
│      finetune.sh
|      get_key_patch.sh
│      pretrain.sh
│      scratch.sh
│      
├─utils
│      __init__.py
│      datasets.py  # WSI class and function for WSIs
│      general.py   # help function
│      losses.py    # loss function

OS Requirements

The code was designed on Ubuntu 22.04 LTS

Requirements

environment.yaml

Datasets

Download

TCGA: Use GDC data portal with a manifest file and configuration file. Please see the section Results in the paper for the sample selection standard.

Private dataset: Use [link]([https://www.scidb.cn /en/anonymous/UVpOclV2]). We provide the private dataset with de-identified data only for replicating the results.

WSI Processing

cd wsi_processing

Tile all the WSI into patches

python create_patches.py --slide_dir /dir/to/silde --save_dir /save/dir/patch --overview --save_mask --wsi_format .tif --overview_level 3

Extract patch features

 python extract_features_multiprocess.py --patch_dir /dir/to/patch --image_encoder resnet18 --device 0 --coord coord

Clustering patch features

python features_clustering.py --feat_dir /dir/to/patch/features --num_clusters 10

Data Organization

The format of input csv file:

case_id	features_filepath	label	clusters_filepath	clusters_json_filepath
2014_00322-3-HE-DX1	/path/to/patch_features/2014_00322-3-HE-DX1.npz	0	/path/to/cluster_indices/2014_00322-3-HE-DX1.npz	/path/to/cluster_indices/2014_00322-3-HE-DX1.json
...	...	...	...	...

case_id: [str] the index for each WSI.

features_filepath: [str] the .npz file path for each WSI, this .npz file contains several keywords as follows:

filename: [str] case_id.

img_features: [numpy.ndarray] the all patch's features as a numpy.ndarray, the shape is (num_patches, dim_features), like (9004, 512).

label: [int] the label of the WSI.

clusters_filepath: [str] this .npz file indicates the clustering category corresponding to each patch in WSI. It contaions several keywords as follows:

filename: [str] case_id.

features_cluster_indices: [numpy.ndarray] This array represents the clustering category of each patch feature in WSI, it's shape is (num_pathces, 1).

clusters_json_filepath: [str] This JSON file represents the patch index contained in each category of clustering, like:
[
    [55, 68, 86, 112, 126, 131, 145, 149, ...],
    [0, 1, 5, 12, 27, 53, 74, 92, 134, 178, 185, ...],
    ...
    [6, 19, 30, 36, 45, 48, 61, 77, 78, ...]
]
Each list represents a cluster, and a WSI is divide into multiple subregions. Based on the affiliation between patches and subregions, a WSI can be regarded as a cluster-constrained hypergraph. MIL can be regarded as a form of graph augmentation.

Pre-training

pre-training by hypergraph-level contrastive learning. Note: maybe you need to change the final row of code in utils/datasets.py.

cd runs
sh pretrain.sh

Training from scratch, fine-tuning, and exporting the desicion-making patches

evaluation of our proposed framework HeMiCoRe. Note: maybe you need to change the final row of code in utils/datasets.py.

cd runs
# training from scatch with the initial weights, does not load pre-trained weights
sh scratch.sh
# fine-tuning with pre-training weights 
sh fintune.sh
# exporting the desicion-making patches by testing with the weights of the target stage. Note: when you are running get_key_patch.sh，maybe you need to change the run() function in train_HeMiCoRe.py to another. 
sh get_key_patch.sh

Visualization

This code could create the distribution map mentioned in our paper.

cd my_utils
python visualization.py

Training on your own datasets

You can simply process your own dataset into a format acceptable to our code, see [WSI Processing](###WSI Processing) and [Data Organization](###Data Organization).
Then modify the input parameters of the training script in the runs directory.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

A hypergraph-based model for tumor prognosis using local and global information fusion on H&E-stained histology images

Folder structures

OS Requirements

Requirements

Datasets

Download

WSI Processing

Data Organization

Pre-training

Training from scratch, fine-tuning, and exporting the desicion-making patches

Visualization

Training on your own datasets

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
HGNN		HGNN
models		models
my_utils		my_utils
runs		runs
stain_normalization		stain_normalization
utils		utils
wsi_processing		wsi_processing
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
pretrain_HeMiCoRe.py		pretrain_HeMiCoRe.py
train_HeMiCoRe.py		train_HeMiCoRe.py

Folders and files

Latest commit

History

Repository files navigation

A hypergraph-based model for tumor prognosis using local and global information fusion on H&E-stained histology images

Folder structures

OS Requirements

Requirements

Datasets

Download

WSI Processing

Data Organization

Pre-training

Training from scratch, fine-tuning, and exporting the desicion-making patches

Visualization

Training on your own datasets

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages