Missing Data Preprocessing Pipeline and Required Input Files

When attempting to reproduce the results of this repository, I found that the current code cannot be run as provided. The main reason is that the repository provides only some execution entry points and configuration files; it does not include a complete data preprocessing pipeline, nor does it explain how the raw data should be converted into the format the code expects.

For example, the code depends directly on several preprocessed files, as well as on the corresponding trained model files. However, the repository does not provide the scripts used to generate these files, a description of their data formats, or any example data. As a result, even after installing the required dependencies and following the commands in the README, it is still not possible to construct the input data the code requires.

In addition, although the EDU-graphRAG component provides a settings.yaml file and an example command for running it, the input/*.txt data directory specified in the configuration is not included, and there is no explanation of how these text files should be generated from the original dataset. Consequently, the GraphRAG indexing process cannot be run to completion either.
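To make the missing-input problem easy to confirm, here is a minimal pre-flight check. It is only a sketch and assumes that the EDU-graphRAG project root contains the input/ directory named in settings.yaml. The function name find_input_texts is hypothetical and not part of the repository.

```python
from pathlib import Path


def find_input_texts(root: str) -> list[Path]:
    """Return the sorted .txt files under <root>/input.

    Assumption: GraphRAG reads its documents from an input/ directory
    directly under the project root, as the settings.yaml in
    EDU-graphRAG appears to specify (input/*.txt).
    """
    input_dir = Path(root) / "input"
    if not input_dir.is_dir():
        # The directory itself is missing, so there is nothing to index.
        return []
    return sorted(input_dir.glob("*.txt"))


if __name__ == "__main__":
    # Hypothetical usage: run from the EDU-graphRAG project root.
    files = find_input_texts(".")
    if files:
        print(f"Found {len(files)} input file(s):")
        for f in files:
            print(f"  {f.name}")
    else:
        print("No input/*.txt files found; GraphRAG indexing has nothing to index.")
```

On a fresh clone this reports that no input files are present, which is the situation described above.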

I would appreciate it if the authors could provide the missing data preprocessing scripts, example input files, or detailed instructions for preparing the required data. Resolving this issue would greatly improve the reproducibility and usability of the repository.

Missing Data Preprocessing Pipeline and Required Input Files #3