Reproduction of Our Experiments

This file explains how to reproduce our experiments.

Requirements

To reproduce our experiment, the following software must be installed.

Procedure

Download the experimental dataset

The experimental data can be downloaded from the following link: https://waseda.box.com/s/qcbqhdft7jt2p85gw75ez6mq4tcw9er2

Download the archive and extract it.

tar xzf chr21_40x_datasets.tar.gz

Build indexes

Suffix Tree

The code to build a suffix tree is make_st.cpp in this folder. Compile it, then run it on each dataset file.

g++ -o make_st.out make_st.cpp -I {your_sdsl_install_path}/include -L {your_sdsl_install_path}/lib -lsdsl -ldivsufsort -ldivsufsort64 -std=c++17 -O3 -DNDEBUG
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_st.out ./chr21_40x_err.fastq chr21_st_err
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_st.out ./chr21_40x_noerr.fastq chr21_st_noerr

When the run finishes, an index file is generated, and the peak memory and building time are printed to the console.
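Every measurement in this file uses the same /usr/bin/time format string, "Memory:%MKB,time:%E", so each run ends with one line such as Memory:123456KB,time:1:23.45 (illustrative values, not measured results). If you collect these lines in a log, a small helper along the following lines can split them into fields. This is only a sketch: RunReport and parse_time_line are hypothetical names, not part of this repository.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical container for one line printed by
//   /usr/bin/time -f "Memory:%MKB,time:%E"
struct RunReport {
    long peak_memory_kb;   // %M: maximum resident set size in KB
    std::string elapsed;   // %E: elapsed real time, kept as text
};

// Parses a line of the form "Memory:<digits>KB,time:<elapsed>".
// Returns std::nullopt if the line does not match that shape.
std::optional<RunReport> parse_time_line(const std::string& line) {
    const std::string mem_tag = "Memory:";
    const std::string time_tag = "KB,time:";
    if (line.compare(0, mem_tag.size(), mem_tag) != 0) return std::nullopt;
    const std::size_t sep = line.find(time_tag, mem_tag.size());
    if (sep == std::string::npos) return std::nullopt;
    const std::string digits = line.substr(mem_tag.size(), sep - mem_tag.size());
    if (digits.empty() ||
        digits.find_first_not_of("0123456789") != std::string::npos) {
        return std::nullopt;
    }
    return RunReport{std::stol(digits), line.substr(sep + time_tag.size())};
}
```

The helper needs C++17 for std::optional, which matches the -std=c++17 flag used in the compile commands above.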

Previous method

The code to build the index of the previous method is make_prev.cpp in this folder. Compile it, then run it on each dataset file.

g++ -o make_prev.out make_prev.cpp -I {your_colored_bos_install_path}/include -L {your_colored_bos_install_path}/lib -lcol_boss -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz -O3
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_prev.out ./chr21_40x_err.fastq chr21_prev_err.boss
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_prev.out ./chr21_40x_noerr.fastq chr21_prev_noerr.boss

When the run finishes, an index file is generated, and the peak memory and building time are printed to the console.

Our method

The code to build the index of our method is make_hash_cdbg.cpp in this folder. Compile it, then run it on each dataset file.

g++ -o make_hash_cdbg.out make_hash_cdbg.cpp -I {your_hash_cdbg_install_path}/include -L {your_hash_cdbg_install_path}/lib -lhash_dbg -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz -std=c++17 -O3
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_hash_cdbg.out ./chr21_40x_err.fastq chr21_hash_cdbg_err.cdbg
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_hash_cdbg.out ./chr21_40x_noerr.fastq chr21_hash_cdbg_noerr.cdbg

When the run finishes, an index file is generated, and the peak memory and building time are printed to the console.

To adjust the parameters of the Bloom filter, you need to modify the hash_cdbg code. Modify the code as instructed below and reinstall hash_cdbg.

Try Case #1 (ds = |α|)

Change each occurrence of the following line to filter_size_add = col_num;

filter_size_add = 1;

In addition, comment out the following sections.

// if (sibling_color_num <= col_num * 2) {
// filter_size_start = 2;
// filter_size_add = 1;
// } else {
// filter_size_start = col_num;
// filter_size_add = col_num;
// }
// if (filter_size / col_num >= 2) filter_size_add = col_num;

Try Case #2 (ds = 1 if |β| < 2|α| and s < 2|α|, and ds = |α| otherwise)

Set each occurrence of the following line to filter_size_add = 1; (leave it unchanged if it already reads this way).

filter_size_add = 1;

In addition, uncomment the following sections (shown here in their commented-out form).

// if (sibling_color_num <= col_num * 2) {
// filter_size_start = 2;
// filter_size_add = 1;
// } else {
// filter_size_start = col_num;
// filter_size_add = col_num;
// }
// if (filter_size / col_num >= 2) filter_size_add = col_num;

Try Case #3 (ds = 1)

Set each occurrence of the following line to filter_size_add = 1; (leave it unchanged if it already reads this way).

filter_size_add = 1;

In addition, comment out the following sections.

// if (sibling_color_num <= col_num * 2) {
// filter_size_start = 2;
// filter_size_add = 1;
// } else {
// filter_size_start = col_num;
// filter_size_add = col_num;
// }
// if (filter_size / col_num >= 2) filter_size_add = col_num;
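The three cases differ only in how the increment ds of the filter size s is chosen. As a rough sketch (not the actual hash_cdbg code), the rule stated in the case headings can be written as one function. GrowthPolicy and filter_size_step are hypothetical names, and the mapping of |α| to col_num, |β| to sibling_color_num, and s to filter_size is an assumption read off the snippets above. The snippet compares sibling_color_num with <=, whereas the heading of Case #2 uses <; the sketch follows the heading.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the three cases described above.
enum class GrowthPolicy {
    AlwaysColNum,  // Case #1: ds = |α|
    Mixed,         // Case #2: ds = 1 while the filter is small, |α| otherwise
    AlwaysOne      // Case #3: ds = 1
};

// Returns the increment ds added to the filter size.
// Assumed correspondence with the paper's symbols:
//   col_num ~ |α|, sibling_color_num ~ |β|, filter_size ~ s.
std::uint64_t filter_size_step(GrowthPolicy policy,
                               std::uint64_t col_num,
                               std::uint64_t sibling_color_num,
                               std::uint64_t filter_size) {
    assert(col_num > 0);
    switch (policy) {
    case GrowthPolicy::AlwaysColNum:
        return col_num;
    case GrowthPolicy::AlwaysOne:
        return 1;
    case GrowthPolicy::Mixed: {
        // ds = 1 only if |β| < 2|α| and s < 2|α|.
        const bool few_siblings = sibling_color_num < 2 * col_num;
        const bool small_filter = filter_size < 2 * col_num;
        return (few_siblings && small_filter) ? 1 : col_num;
    }
    }
    return 1;  // not reached; keeps compilers quiet about missing returns
}
```

For example, with col_num = 4, the Mixed policy returns 1 for sibling_color_num = 3 and filter_size = 2, and returns 4 once filter_size reaches 8.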

Calculate the time to build BOSS

To measure the building time of BOSS, which is the same for both the previous method and our method, run the following.

g++ -o make_boss.out make_boss.cpp -I {your_colored_bos_install_path}/include -L {your_colored_bos_install_path}/lib -lcol_boss -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz -O3
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_boss.out ./chr21_40x_err.fastq
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_boss.out ./chr21_40x_noerr.fastq

Subtracting this BOSS building time from the overall index building time yields "Building time of C".
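The subtraction is simple, but the %E field is printed as [hours:]minutes:seconds rather than as plain seconds, so it has to be converted first. The following sketch shows one way to do that; elapsed_to_seconds and building_time_of_c are hypothetical helper names, not part of this repository.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Converts a GNU time %E string ("[hours:]minutes:seconds") to seconds.
double elapsed_to_seconds(const std::string& elapsed) {
    std::vector<double> parts;
    std::stringstream ss(elapsed);
    std::string field;
    while (std::getline(ss, field, ':')) {
        parts.push_back(std::stod(field));
    }
    // Expect either minutes:seconds or hours:minutes:seconds.
    assert(parts.size() == 2 || parts.size() == 3);
    double seconds = 0.0;
    for (double p : parts) {
        seconds = seconds * 60.0 + p;  // shift previous fields up one unit
    }
    return seconds;
}

// "Building time of C" = overall index building time - BOSS building time.
double building_time_of_c(const std::string& total_elapsed,
                          const std::string& boss_elapsed) {
    return elapsed_to_seconds(total_elapsed) - elapsed_to_seconds(boss_elapsed);
}
```

With illustrative (not measured) values, building_time_of_c("10:00", "2:30") gives 450 seconds.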

Rebuilding original reads

Suffix Tree

The code to rebuild the original reads from the suffix tree index is rebuild_st.cpp in this folder. Compile it, then run it on each index file.

g++ -o rebuild_st.out rebuild_st.cpp -I {your_sdsl_install_path}/include -L {your_sdsl_install_path}/lib -lsdsl -ldivsufsort -ldivsufsort64 -std=c++17 -O3 -DNDEBUG
/usr/bin/time -f "Memory:%MKB,time:%E" ./rebuild_st.out ./chr21_st_err.cst chr21_st_err.re
/usr/bin/time -f "Memory:%MKB,time:%E" ./rebuild_st.out ./chr21_st_noerr.cst chr21_st_noerr.re

When the run finishes, a file of rebuilt reads is generated, and the peak memory and rebuilding time are printed to the console.

Build FM index

Build an FM-index to count the reads rebuilt by the previous method and our method more accurately. The FM-index is not required if you only want to rebuild the reads.

g++ -o build_fm_index.out build_fm_index.cpp -I {your_colored_bos_install_path}/include -L {your_colored_bos_install_path}/lib -lcol_boss -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz
./build_fm_index.out chr21_40x_err.fastq chr21_40x_err
./build_fm_index.out chr21_40x_noerr.fastq chr21_40x_noerr

Previous method

The code to rebuild the original reads from the index of the previous method is rebuild_prev.cpp in this folder. Compile it, then run it on each index file.

g++ -o rebuild_prev.out rebuild_prev.cpp -I {your_colored_bos_install_path}/include -L {your_colored_bos_install_path}/lib -lcol_boss -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz -O3
/usr/bin/time -f "Memory:%MKB,time:%E" ./rebuild_prev.out ./chr21_prev_err.boss ./chr21_40x_err.fm_index 16 chr21_prev_err.re
/usr/bin/time -f "Memory:%MKB,time:%E" ./rebuild_prev.out ./chr21_prev_noerr.boss ./chr21_40x_noerr.fm_index 16 chr21_prev_noerr.re

When the run finishes, a file of rebuilt reads is generated, and the peak memory and rebuilding time are printed to the console.

Our method

The code to rebuild the original reads from the index of our method is rebuild_hash_cdbg.cpp in this folder. Before compiling, set the hash_cdbg parameters to those used when you built the index and reinstall hash_cdbg. Then compile rebuild_hash_cdbg.cpp and run it on each index file.

g++ -o rebuild_hash_cdbg.out rebuild_hash_cdbg.cpp -I {your_hash_cdbg_install_path}/include -L {your_hash_cdbg_install_path}/lib -lhash_dbg -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz -std=c++17 -O3
/usr/bin/time -f "Memory:%MKB,time:%E" ./rebuild_hash_cdbg.out ./chr21_hash_cdbg_err.cdbg ./chr21_40x_err.fm_index 16 chr21_hash_cdbg_err.re
/usr/bin/time -f "Memory:%MKB,time:%E" ./rebuild_hash_cdbg.out ./chr21_hash_cdbg_noerr.cdbg ./chr21_40x_noerr.fm_index 16 chr21_hash_cdbg_noerr.re

When the run finishes, a file of rebuilt reads is generated, and the peak memory and rebuilding time are printed to the console.

Count ambiguous sequences

You can find out how many reads have become ambiguous by running the following.

g++ -o fastq2fasta.out fastq2fasta.cpp
./fastq2fasta.out chr21_40x_err.fastq chr21_40x_err.fasta
./fastq2fasta.out chr21_40x_noerr.fastq chr21_40x_noerr.fasta
g++ -o check_rebuild_read.out check_rebuild_read.cpp
./check_rebuild_read.out chr21_40x_err.fasta chr21_{method}_err.re.fasta
./check_rebuild_read.out chr21_40x_noerr.fasta chr21_{method}_noerr.re.fasta
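The check above compares FASTA files, which is why the FASTQ inputs are converted first. For reference, a minimal sketch of such a conversion follows. It is not the repository's fastq2fasta.cpp, and it assumes the plain four-line FASTQ layout (header, sequence, '+' line, quality) with no wrapped sequences. The function name fastq_to_fasta is hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Reads four-line FASTQ records from `in` and writes two-line FASTA
// records to `out`. Returns the number of records converted.
std::size_t fastq_to_fasta(std::istream& in, std::ostream& out) {
    std::string header, sequence, plus, quality;
    std::size_t records = 0;
    while (std::getline(in, header) && std::getline(in, sequence) &&
           std::getline(in, plus) && std::getline(in, quality)) {
        // Stop at the first record that does not start with '@'.
        if (header.empty() || header[0] != '@') break;
        out << '>' << header.substr(1) << '\n' << sequence << '\n';
        ++records;
    }
    return records;
}

// Convenience wrapper for in-memory text.
std::string fastq_to_fasta(const std::string& fastq_text) {
    std::istringstream in(fastq_text);
    std::ostringstream out;
    fastq_to_fasta(in, out);
    return out.str();
}
```

The quality line is read only to keep the record boundaries aligned; it is discarded, since FASTA has no place for it.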