Missing Data Preprocessing Pipeline and Required Input Files #3

@DDCY220

While attempting to reproduce this repository, I found that the current code cannot be run as provided. The repository includes only some execution entry points and configuration files. It does not include a complete data preprocessing pipeline, and it does not explain how the raw data should be converted into the format the code requires.

For example, the code depends directly on several preprocessed files and on the corresponding trained model files. However, the repository does not provide the scripts used to generate these files, descriptions of the data formats, or any example data. As a result, even after installing the required dependencies and following the commands in the README, it is not possible to construct the input data structure the code requires.

In addition, although the EDU-graphRAG component provides a settings.yaml file and an example command, the input/*.txt data directory specified in the configuration is not provided. There is also no explanation of how these text files should be generated from the original dataset, so the GraphRAG indexing process cannot be run to completion either.

I would appreciate it if the authors could provide the missing data preprocessing scripts, example input files, or detailed instructions for preparing the required data. Resolving this issue would greatly improve the reproducibility and usability of the repository.
