In this file we explain how to reproduce our experiment.
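To reproduce our experiment, the following software must be installed: sdsl (together with libdivsufsort), the colored BOSS library (col_boss), and hash_cdbg (hash_dbg). The commands below refer to their install locations as {your_sdsl_install_path}, {your_colored_bos_install_path}, and {your_hash_cdbg_install_path}.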
Experimental data can be downloaded from the following link: https://waseda.box.com/s/qcbqhdft7jt2p85gw75ez6mq4tcw9er2
Download the archive and unpack it:
tar xzf chr21_40x_datasets.tar.gz
The code to build the suffix tree index is make_st.cpp in this folder. Compile it and run it on each dataset:
g++ -o make_st.out make_st.cpp -I {your_sdsl_install_path}/include -L {your_sdsl_install_path}/lib -lsdsl -ldivsufsort -ldivsufsort64 -std=c++17 -O3 -DNDEBUG
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_st.out ./chr21_40x_err.fastq chr21_st_err
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_st.out ./chr21_40x_noerr.fastq chr21_st_noerr
When the program finishes, the index file is written, and /usr/bin/time prints the peak memory and the building time to the console (on standard error).
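Because /usr/bin/time prints its report on standard error, the report can be captured to a file and split into its two fields. The helper below is a minimal sketch: the function name parse_time_line and the log file name are hypothetical, and it assumes the report is the last line written to standard error. In GNU time, %M is the peak resident set size in KB and %E is the elapsed wall-clock time in [hours:]minutes:seconds.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a report line produced by
#   /usr/bin/time -f "Memory:%MKB,time:%E" ...
# into peak memory (KB) and elapsed wall-clock time ([h:]mm:ss).
parse_time_line() {
  local line=$1
  local mem=${line#Memory:}      # drop the "Memory:" prefix
  mem=${mem%%KB,*}               # keep only the number before "KB,"
  local elapsed=${line##*time:}  # everything after the last "time:"
  printf 'peak_memory_kb=%s elapsed=%s\n' "$mem" "$elapsed"
}
```

For example, run the build with `2> st_err.log` appended, then call `parse_time_line "$(tail -n 1 st_err.log)"`. The report is printed after the program exits, so it is the last line of the log even if the program writes its own messages to standard error.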
The code to build the index of the previous method (a colored BOSS) is make_prev.cpp in this folder. Compile it and run it on each dataset:
g++ -o make_prev.out make_prev.cpp -I {your_colored_bos_install_path}/include -L {your_colored_bos_install_path}/lib -lcol_boss -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz -O3
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_prev.out ./chr21_40x_err.fastq chr21_prev_err.boss
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_prev.out ./chr21_40x_noerr.fastq chr21_prev_noerr.boss
When the program finishes, the index file is written, and the building time and peak memory are printed to the console.
The code to build the hash_cdbg index is make_hash_cdbg.cpp in this folder. Compile it and run it on each dataset:
g++ -o make_hash_cdbg.out make_hash_cdbg.cpp -I {your_hash_cdbg_install_path}/include -L {your_hash_cdbg_install_path}/lib -lhash_dbg -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz -std=c++17 -O3
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_hash_cdbg.out ./chr21_40x_err.fastq chr21_hash_cdbg_err.cdbg
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_hash_cdbg.out ./chr21_40x_noerr.fastq chr21_hash_cdbg_noerr.cdbg
When the program finishes, the index file is written, and the building time and peak memory are printed to the console.
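The three build steps share one pattern: a build program, two datasets, and a timed run. The function below is a hypothetical convenience wrapper, not part of this repository. It assumes GNU time, whose -o option writes the report to a file instead of the console, and it uses a hypothetical DRY_RUN variable to print the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one build program on both datasets and write
# each GNU time report to <index>.time via -o.
# Set DRY_RUN=1 to print the commands instead of running them.
run_builds() {
  local prog=$1 prefix=$2 ext=$3   # e.g. make_prev.out chr21_prev .boss
  local data out
  local -a cmd
  for data in err noerr; do
    out="${prefix}_${data}${ext}"
    cmd=(/usr/bin/time -f "Memory:%MKB,time:%E" -o "${out}.time"
         "./${prog}" "./chr21_40x_${data}.fastq" "$out")
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      printf '%s\n' "${cmd[*]}"   # show the command that would run
    else
      "${cmd[@]}"
    fi
  done
}
```

Usage, with the output names used above: `run_builds make_st.out chr21_st ""`, `run_builds make_prev.out chr21_prev .boss`, and `run_builds make_hash_cdbg.out chr21_hash_cdbg .cdbg`. Keeping each report in its own file makes it easier to collect all build times afterwards.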
To adjust the parameters of the Bloom filter, you need to modify the hash_cdbg code. Apply one of the following configurations and reinstall hash_cdbg. All line numbers refer to hash_cdbg/include/bloom_filter.hpp at commit f0ea0b3.
Configuration 1: rewrite line 189 as filter_size_add = col_num; and comment out lines 219 to 225 and line 281.
Configuration 2: rewrite line 189 as filter_size_add = 1; and uncomment lines 219 to 225 and line 281.
Configuration 3: rewrite line 189 as filter_size_add = 1; and comment out lines 219 to 225 and line 281.
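Because the edits are specified by line number, it can help to print exactly those lines before and after editing. A minimal sketch, assuming a local git clone of hash_cdbg in ./hash_cdbg checked out at commit f0ea0b3 (the line numbers only hold for that commit); show_bloom_filter_lines is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print lines 189, 219-225 and 281 of bloom_filter.hpp,
# each prefixed with its line number, so the edits can be checked.
show_bloom_filter_lines() {
  local file=${1:-hash_cdbg/include/bloom_filter.hpp}
  awk 'NR == 189 || (NR >= 219 && NR <= 225) || NR == 281 {
         printf "%d: %s\n", NR, $0
       }' "$file"
}
```

For example, run `git -C hash_cdbg checkout f0ea0b3` once, then call `show_bloom_filter_lines` after each edit to confirm that line 189 and the commented or uncommented sections match the chosen configuration.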
To measure the BOSS building time, which is common to both the previous method and our method, compile make_boss.cpp and run it on each dataset:
g++ -o make_boss.out make_boss.cpp -I {your_colored_bos_install_path}/include -L {your_colored_bos_install_path}/lib -lcol_boss -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz -O3
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_boss.out ./chr21_40x_err.fastq
/usr/bin/time -f "Memory:%MKB,time:%E" ./make_boss.out ./chr21_40x_noerr.fastq
Subtracting this BOSS building time from the overall index building time yields "Building time of C".
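The %E values reported by /usr/bin/time (for example 1:02:03 or 0:45.10) cannot be subtracted directly. A minimal sketch that converts them to seconds and computes the difference; the function names are hypothetical, and it assumes both times were measured with the -f format used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a GNU time %E value
# ([hours:]minutes:seconds) to seconds.
elapsed_to_seconds() {
  awk -F: '{
    s = 0
    for (i = 1; i <= NF; i++) s = s * 60 + $i   # fold h:m:s left to right
    printf "%.2f\n", s
  }' <<< "$1"
}

# "Building time of C" = overall index building time - BOSS building time.
building_time_of_c() {
  local total boss
  total=$(elapsed_to_seconds "$1")
  boss=$(elapsed_to_seconds "$2")
  awk -v t="$total" -v b="$boss" 'BEGIN { printf "%.2f\n", t - b }'
}
```

Pass the overall building time first and the BOSS building time second, for example `building_time_of_c 2:30.50 1:00.25`, using the times measured on the same dataset.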
The code to rebuild the original reads from the suffix tree index is rebuild_st.cpp in this folder. Compile it and run it on each index:
g++ -o rebuild_st.out rebuild_st.cpp -I {your_sdsl_install_path}/include -L {your_sdsl_install_path}/lib -lsdsl -ldivsufsort -ldivsufsort64 -std=c++17 -O3 -DNDEBUG
/usr/bin/time -f "Memory:%MKB,time:%E" ./rebuild_st.out ./chr21_st_err.cst chr21_st_err.re
/usr/bin/time -f "Memory:%MKB,time:%E" ./rebuild_st.out ./chr21_st_noerr.cst chr21_st_noerr.re
When the program finishes, the rebuilt reads are written to the output file, and the rebuilding time and peak memory are printed to the console.
Build an FM-index of the original reads so that the reads rebuilt by the previous method and our method can be counted more accurately. The FM-index is not strictly required if you only want to rebuild the reads.
g++ -o build_fm_index.out build_fm_index.cpp -I {your_colored_bos_install_path}/include -L {your_colored_bos_install_path}/lib -lcol_boss -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz
./build_fm_index.out chr21_40x_err.fastq chr21_40x_err
./build_fm_index.out chr21_40x_noerr.fastq chr21_40x_noerr
The code to rebuild the original reads from the index of the previous method is rebuild_prev.cpp in this folder. Compile it and run it on each index:
g++ -o rebuild_prev.out rebuild_prev.cpp -I {your_colored_bos_install_path}/include -L {your_colored_bos_install_path}/lib -lcol_boss -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz -O3
/usr/bin/time -f "Memory:%MKB,time:%E" ./rebuild_prev.out ./chr21_prev_err.boss ./chr21_40x_err.fm_index 16 chr21_prev_err.re
/usr/bin/time -f "Memory:%MKB,time:%E" ./rebuild_prev.out ./chr21_prev_noerr.boss ./chr21_40x_noerr.fm_index 16 chr21_prev_noerr.re
When the program finishes, the rebuilt reads are written to the output file, and the rebuilding time and peak memory are printed to the console.
The code to rebuild the original reads from the hash_cdbg index is rebuild_hash_cdbg.cpp in this folder. Before compiling, set the Bloom filter parameters in hash_cdbg to the ones used when you built the index, and reinstall hash_cdbg. Then compile rebuild_hash_cdbg.cpp and run it on each index:
g++ -o rebuild_hash_cdbg.out rebuild_hash_cdbg.cpp -I {your_hash_cdbg_install_path}/include -L {your_hash_cdbg_install_path}/lib -lhash_dbg -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz -std=c++17 -O3
/usr/bin/time -f "Memory:%MKB,time:%E" ./rebuild_hash_cdbg.out ./chr21_hash_cdbg_err.cdbg ./chr21_40x_err.fm_index 16 chr21_hash_cdbg_err.re
/usr/bin/time -f "Memory:%MKB,time:%E" ./rebuild_hash_cdbg.out ./chr21_hash_cdbg_noerr.cdbg ./chr21_40x_noerr.fm_index 16 chr21_hash_cdbg_noerr.re
When the program finishes, the rebuilt reads are written to the output file, and the rebuilding time and peak memory are printed to the console.
To find out how many reads became ambiguous, run the following, replacing {method} with st, prev, or hash_cdbg:
g++ -o fastq2fasta.out fastq2fasta.cpp
./fastq2fasta.out chr21_40x_err.fastq chr21_40x_err.fasta
./fastq2fasta.out chr21_40x_noerr.fastq chr21_40x_noerr.fasta
g++ -o check_rebuild_read.out check_rebuild_read.cpp
./check_rebuild_read.out chr21_40x_err.fasta chr21_{method}_err.re.fasta
./check_rebuild_read.out chr21_40x_noerr.fasta chr21_{method}_noerr.re.fasta
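To express the number of ambiguous reads as a fraction of all reads, the total read count is also needed. The sketch below is a shell stand-in for the conversion and counting steps, not the repository's fastq2fasta.cpp. It assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function names are hypothetical. The ambiguous count itself has to be read from the output of check_rebuild_read.out, whose format is not described here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a FASTQ -> FASTA conversion for 4-line records:
# the '@' header becomes a '>' header and the '+' and quality lines are dropped.
fastq_to_fasta() {
  awk 'NR % 4 == 1 { print ">" substr($0, 2) }
       NR % 4 == 2 { print }' "$1"
}

# Number of records in a FASTA file (lines starting with '>').
count_fasta_records() {
  grep -c '^>' "$1"
}

# Percentage of ambiguous reads, given the ambiguous count and the total count.
ambiguous_percent() {
  awk -v a="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 100 * a / n }'
}
```

For example, `ambiguous_percent <ambiguous count> "$(count_fasta_records chr21_40x_err.fasta)"` gives the share of ambiguous reads for the dataset with errors.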