In this project, a GAN for synthesizing electricity load profiles was developed.
The recommended Python version for running this code is 3.11.
- Clone the repository to your local machine:
    git clone https://github.com/MODERATE-Project/Synthetic-Load-Profiles.git
- Navigate to the project directory:
    cd path/to/repository/Synthetic-Load-Profiles
- Create an environment:
  Conda command for creating a suitable environment (replace myenv with the desired environment name):
    conda create --name myenv python=3.11
- Activate the environment:
  Conda command for activating the created environment (replace myenv with the selected name):
    conda activate myenv
- Install required Python packages:
    conda install pip
    pip install -r requirements.txt

The input data needs to be provided in the form of a CSV file.
The data should roughly cover one year (min 365, max 368 days) of hourly electricity consumption values.
Each column of the CSV file should correspond to a single profile/household.
The first column of the CSV file needs to be an index column (ideally containing timestamps).
Example:
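A file in this layout can be sketched and checked with the Python standard library alone. The column names, the placeholder values and the helper names `make_example_csv` and `check_load_profile_csv` are hypothetical and not part of the project; the row limits assume that 365 to 368 days of hourly data means 8760 to 8832 rows.

```python
import csv
import io
from datetime import datetime, timedelta

HOURS_PER_DAY = 24
MIN_ROWS = 365 * HOURS_PER_DAY  # 8760 hourly values
MAX_ROWS = 368 * HOURS_PER_DAY  # 8832 hourly values


def check_load_profile_csv(handle):
    """Return (row_count, profile_count) or raise ValueError if the layout is off."""
    reader = csv.reader(handle)
    header = next(reader)
    profile_count = len(header) - 1  # the first column is the index column
    if profile_count < 1:
        raise ValueError("expected an index column plus at least one profile column")
    row_count = 0
    for row in reader:
        if len(row) != len(header):
            raise ValueError(f"data row {row_count + 1} has {len(row)} fields, expected {len(header)}")
        for value in row[1:]:
            float(value)  # raises ValueError for non-numeric consumption values
        row_count += 1
    if not MIN_ROWS <= row_count <= MAX_ROWS:
        raise ValueError(f"{row_count} rows found, expected {MIN_ROWS} to {MAX_ROWS}")
    return row_count, profile_count


def make_example_csv(days=365, profiles=("household_1", "household_2")):
    """Build a synthetic file in memory (values are placeholders, not real data)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["timestamp", *profiles])
    start = datetime(2023, 1, 1)
    for hour in range(days * HOURS_PER_DAY):
        stamp = start + timedelta(hours=hour)
        writer.writerow([stamp.isoformat(sep=" "), *[0.5 for _ in profiles]])
    buffer.seek(0)
    return buffer


rows, profiles = check_load_profile_csv(make_example_csv())
print(rows, profiles)  # 8760 2
```

To check a real file, the same function can be called on `open("path/to/file.csv", newline="")`.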
There are two ways to run the code:
A marimo notebook is provided for easily uploading files, creating projects and training models.
The notebook can be accessed by running the following command in the project directory:
    marimo run marimo.py
After uploading the required input file(s) and adjusting the settings, the program can be started by pressing the "Start" button below the options menu.
⚠ marimo notebooks only allow file sizes up to 100 MB; for larger input files, the Python script has to be used ⚠
As an alternative to the marimo notebook, a Python script ("run.py") can be used to create projects and train models.
Settings have to be adjusted directly in the script, and file paths have to be provided for the input files. Advanced options are not exposed in the script itself; however, many underlying parameters can be adjusted in "model" → "GAN_params.py" and "WGAN_params.py".
The following (hyper)parameters can be adjusted:
Model & hardware
- modelType: Lets you choose between an ordinary GAN and a WGAN model. The WGAN is usually more stable in training, but for some use cases the GAN might be more suitable.
- device: Lets you choose between CPU and GPUs for creating and training a model. Leave the default value to enable automatic GPU detection.
Training loop
- epochCount: Number of epochs for training.
- batchSize: The batch size (number of training examples processed together in one forward and backward pass) used for training.
Optimizer
- lrGen/lrDis: Define the learning rates of the generator and the discriminator.
- betas: By default, the Adam optimizer is used for both the generator and the discriminator. The beta values are the decay rates of the moving averages of the gradient and of its square.
- genLoopCount: Number of times the generator is trained per iteration. At the beginning of training, the discriminator might outperform the generator, so that training has no effect; training the generator multiple times per iteration can counteract this.
- lossFct: Loss function defined in GAN_params.py. Note that the training code always overrides this with BCEWithLogitsLoss for numerical stability; changing this value has no effect without modifying the training code. This is only valid for the GAN.
- lambdaGP: Gradient penalty coefficient used during training to enforce the Lipschitz constraint. This is only valid for the WGAN.
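To illustrate what the learning rate and the beta values control, here is a plain-Python sketch of a single Adam update for one scalar parameter. It is not the project's training code; the function name `adam_step` and the default values for `lr` and `betas` are illustrative assumptions and are not taken from GAN_params.py or WGAN_params.py.

```python
import math


def adam_step(param, grad, m, v, step, lr=2e-4, betas=(0.5, 0.999), eps=1e-8):
    """Apply one Adam update to a scalar parameter and return (param, m, v)."""
    beta1, beta2 = betas
    m = beta1 * m + (1 - beta1) * grad         # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad  # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** step)            # bias correction for early steps
    v_hat = v / (1 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step from param = 1.0 with gradient 2.0 (made-up numbers).
param, m, v = adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, step=1, lr=0.1)
print(param, m, v)
```

A smaller first beta makes the gradient average react faster to recent gradients, which is why it is a common knob when GAN training oscillates.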
Regularization
- dropout: Dropout probability applied in each dropout layer of the generator and discriminator during training. This is only valid for the GAN.
- dropoutOffEpoch: Defines the epoch after which all dropout layers in the generator are deactivated (might improve the results). This is only valid for the GAN.
Architecture
- dimNoise: Dimension of the noise vector that is fed into the generator as input.
- dimHidden: Base number of hidden channels in the convolutional layers. Actual channel counts in deeper layers scale as multiples of this value, controlling the overall model capacity.
- channelCount: Number of channels in the generator output and discriminator input. Should be set to 1 for single-channel load profile data.
Output & logging
- outputFormat: Lets you choose between three possible file formats for the synthetic data: ".npy", ".csv" and ".xlsx".
- saveFreq: Interval, in epochs, at which results are saved. If saveFreq is greater than epochCount, plots, models and synthetic data samples are saved only for the best performing epoch, and only if logStats is enabled.
- saveModels: Whether or not to save models at the specified frequency or for the best performing epoch if logStats is set to true.
- savePlots: Whether or not to save plots at the specified frequency or for the best performing epoch if logStats is set to true.
- saveSamples: Whether or not to save samples at the specified frequency or for the best performing epoch if logStats is set to true.
- logStats: Whether or not to log a composite metric for checking the quality of the results in every epoch. If enabled, plots, models and samples are saved for the best performing epoch within the training.
- checkForMinStats: Epoch after which the model starts checking whether the composite metric has improved before deciding whether to plot and save the model state and results. If set to 0, every epoch is checked for better results, which might slow down training in the first few epochs because the model improves in almost every epoch at the beginning.
- useWandb: Whether or not to track certain parameters online via Weights & Biases. Requires a Wandb account.
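How saveFreq, logStats and checkForMinStats interact can be sketched as a small decision function. This is only an interpretation of the descriptions above, not the project's code: the name `save_decision` is hypothetical, epochs are assumed to be counted from 1, a lower composite metric is assumed to be better, and the metric values are made up.

```python
import math


def save_decision(epoch, metric, best_metric, save_freq, log_stats, check_for_min_stats):
    """Return (save_now, best_metric) for one finished epoch."""
    # Periodic saving: never triggers if save_freq exceeds the total epoch count.
    periodic = epoch % save_freq == 0
    # Best-epoch saving: only with logging enabled and after the warm-up epochs.
    improved = log_stats and epoch > check_for_min_stats and metric < best_metric
    if improved:
        best_metric = metric
    return periodic or improved, best_metric


best = math.inf
saved_epochs = []
metrics = [0.9, 0.7, 0.8, 0.6, 0.65, 0.5]  # made-up composite metric per epoch
for epoch, metric in enumerate(metrics, start=1):
    save, best = save_decision(
        epoch, metric, best, save_freq=4, log_stats=True, check_for_min_stats=2
    )
    if save:
        saved_epochs.append(epoch)
print(saved_epochs)  # [3, 4, 6]
```

In this run, epochs 1 and 2 are skipped because of the warm-up, epoch 4 is saved both periodically and as a new best, and epoch 5 is skipped because the metric got worse.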
