MiniDXNN: MLP Inference & Training on DirectX 12 with LinAlg Matrix


An implementation of MLP (Multi-Layer Perceptron) inference and training using the DirectX 12 LinAlg Matrix API. The library demonstrates GPU-accelerated neural network inference and training built on cutting-edge shader features.

  • 🚀 High Performance: GPU-accelerated inference and training using LinAlg Matrix
  • 🔧 Flexible Architecture: Configurable layers, activations, and data types
  • 🎯 Single-header HLSL: Easy to integrate into any DX12 project

Requirements

  • OS: Windows 11 with Developer Mode enabled
  • GPU: support for Shader Model 6.10 and LinAlg Matrix in D3D12 (AMD Radeon™ RX 9000 Series or equivalent NVIDIA GPUs)
  • Build: CMake ≥ 3.21, Visual Studio 2022 (C++20), Windows SDK
  • DX12 Runtime: Agility SDK 1.720-preview, DXC v1.10.2605.2
  • Python: Python 3.8+ with PyTorch (optional, for the example training scripts)

Getting Started

# Clone with submodules (gfx, GoogleTest, CLI11)
git clone --recursive https://github.com/GPUOpen-LibrariesAndSDKs/MiniDXNN.git
cd MiniDXNN

# Build (library + examples)
cmake -B build
cmake --build build --config Release

# Optional: build & run tests
cmake -B build -DMINIDXNN_BUILD_TESTS=ON
cmake --build build --config Release
cd build/unittest && ctest -C Release

Example binaries are output to build/example/Release/. Run them from build/example/ as the working directory. See example/README.md for details.

DX12 Setup

⚠️ Important: As of early 2026, LinAlg Matrix requires experimental feature support.

  1. Install a driver with LinAlg Matrix support
  2. Enable the experimental shader models with D3D12EnableExperimentalFeatures before creating the device
  3. Compile shaders with Shader Model 6.10

For a detailed walkthrough, including feature checks, weight matrix conversion (GetLinearAlgebraMatrixConversionDestinationInfo / ConvertLinearAlgebraMatrix), bias alignment, and full sample code, see the LinAlg Matrix MLP Guide.

HLSL Usage

Include the header-only library in your compute shader:

#include <minidxnn/hlsl/mlp.hlsl>

static const uint NUM_LAYERS = 3;       // total layers (hidden + 1)
static const int  INPUT_DIM  = 2;
static const int  HIDDEN_DIM = 64;
static const int  OUTPUT_DIM = 2;

using LayerData = mininn::InferenceLayerDataRef<
    NUM_LAYERS, HIDDEN_DIM,
    dx::linalg::DATA_TYPE_FLOAT16,
    dx::linalg::MATRIX_LAYOUT_ROW_MAJOR,
    dx::linalg::DATA_TYPE_FLOAT16,      // bias type
    dx::linalg::DATA_TYPE_FLOAT16,      // accumulator type
    mininn::LeakyReluActivation,        // hidden activation
    mininn::SigmoidActivation           // output activation
>;

ByteAddressBuffer g_weights : register(t0);
ByteAddressBuffer g_biases  : register(t1);

[numthreads(32, 1, 1)]
void main(uint3 tid : SV_DispatchThreadID)
{
    LayerData layerData;
    // firstLayerMatSize / hiddenLayerMatSize: per-layer weight matrix byte
    // sizes supplied by the application (see the LinAlg Matrix MLP Guide)
    layerData.setWeightData(g_weights, uint2(firstLayerMatSize, hiddenLayerMatSize));
    layerData.setBiasData(g_biases);

    vector<half, INPUT_DIM> input = ...;   // e.g. coordinates derived from tid
    vector<half, OUTPUT_DIM> output;
    mininn::forward(output, input, layerData);
}
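Conceptually, the forward pass evaluates y = activation(W·x + b) layer by layer, with the hidden-layer and output activations chosen via the template parameters above. A plain C++ sketch of that computation (float instead of half, hypothetical helper names, not the library's API):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative CPU reference of an MLP forward pass: for each layer,
// y = activation(W * x + b), with W stored row-major.
using Vec = std::vector<float>;
using Mat = std::vector<Vec>; // Mat[row] holds one output neuron's weights

float leakyRelu(float x) { return x > 0.0f ? x : 0.01f * x; }
float sigmoid(float x)   { return 1.0f / (1.0f + std::exp(-x)); }

Vec forwardRef(const std::vector<Mat>& W, const std::vector<Vec>& b, Vec x) {
    for (size_t l = 0; l < W.size(); ++l) {
        Vec y(W[l].size());
        for (size_t r = 0; r < W[l].size(); ++r) {
            float acc = b[l][r];
            for (size_t c = 0; c < x.size(); ++c)
                acc += W[l][r][c] * x[c];
            // Hidden layers use Leaky ReLU, the final layer Sigmoid,
            // matching the activations chosen in the HLSL example above.
            y[r] = (l + 1 == W.size()) ? sigmoid(acc) : leakyRelu(acc);
        }
        x = std::move(y);
    }
    return x;
}
```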

See the HLSL API Reference for the full API including training (backward).

C++ Fallback Mode

MiniDXNN can be built without DirectX 12 or GPU dependencies, using a pure C++ fallback path. This is useful for CI, unit testing, or platforms without DX12 support (e.g. Linux).

# Build with CPU fallback only (no GPU/DX12 required)
cmake -B build \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DMINIDXNN_CPP_FALLBACK_ONLY=ON \
  -DMINIDXNN_BUILD_TESTS=ON \
  -DMINIDXNN_BUILD_EXAMPLES=ON

cmake --build build -j$(nproc)

# Run tests
cd build && ctest --output-on-failure

How It Works

The CPU fallback compiles include/minidxnn/hlsl/mlp.hlsl as standard C++ by providing HLSL-compatible shims in include/minidxnn/cpp/hlsl_compat.hpp. This header maps HLSL intrinsics (vector, ByteAddressBuffer, dx::linalg::*) to C++ equivalents. The half-precision type is configurable via the MINIDXNN_CPP_FALLBACK_HALF_TYPE CMake compile definition (defaults to half_float::half).
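The shim idea can be illustrated in a few lines: define a C++ type that mimics HLSL's `vector<T, N>` so shader-style source compiles as standard C++. This is a toy sketch of the technique only, not the actual contents of hlsl_compat.hpp:

```cpp
#include <array>
#include <cassert>

// Toy HLSL-to-C++ shim: a fixed-size vector with componentwise
// arithmetic, so shader-style code such as "vector<float, 4> v = a + b;"
// compiles as plain C++. A real shim (like hlsl_compat.hpp) would also
// cover swizzles, intrinsics, ByteAddressBuffer, dx::linalg::*, etc.
template <typename T, int N>
struct vector : std::array<T, N> {
    vector operator+(const vector& o) const {
        vector r{};
        for (int i = 0; i < N; ++i) r[i] = (*this)[i] + o[i];
        return r;
    }
    vector operator*(T s) const {
        vector r{};
        for (int i = 0; i < N; ++i) r[i] = (*this)[i] * s;
        return r;
    }
};

// A CPU stand-in for HLSL's half; MiniDXNN defaults to half_float::half.
using half = float;
```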

Project Structure

MiniDXNN/
├── include/minidxnn/
│   ├── hlsl/mlp.hlsl              # Header-only HLSL library: MLP forward & backward
│   └── cpp/hlsl_compat.hpp        # HLSL → C++ shim for CPU-only builds
├── docs/                          # Documentation
│   └── mlp_hlsl.md                #   HLSL API reference
├── example/                       # Example applications
│   ├── common/                    #   Shared C++ utilities
│   │   ├── mlp_layer.hpp          #     MLP layer data, CPU forward/backward, weight init
│   │   ├── cpp_fallback.hpp       #     C++ fallback infrastructure (buffer packing, etc.)
│   │   ├── optimizer.hpp          #     Optimizer implementations (SGD, Adam, Lion)
│   │   ├── activation.hpp         #     Activation functions (Identity, Sigmoid, ReLU, Leaky ReLU, Tanh)
│   │   ├── loss.hpp               #     Loss functions (MSE)
│   │   └── ...                    #     GPU helpers, image I/O, textures, matrix, RNG
│   ├── kernel/                    #   HLSL compute shaders
│   │   ├── optimizer.hlsl         #     GPU optimizer kernels (SGD, Adam, Lion)
│   │   └── ...                    #     Training, inference, and shared kernel headers
│   ├── 01_texture_inference/      #   Inference from a pre-trained MLP binary
│   ├── 02_texture_training/       #   On-GPU training + reconstruction
│   └── README.md                  #   Example documentation
├── scripts/pyreference/           # Python reference implementation
│   ├── texture_reconstruction_mlp.py  # Train & export MLP model
│   └── xoshiro128p.py                 # RNG matching C++ xoshiro128+
├── unittest/                      # GoogleTest unit tests
├── third_party/                   # Submodules & vendored deps
├── cmake/                         # CMake scripts
└── tools/natvis/                  # Debugging helpers (Visual Studio natvis)

Features

  • Architecture: MLP with 0–N hidden layers, independent input/hidden/output dimensions
  • Operations: forward pass (inference), backward pass (training with gradient accumulation)
  • Activations: Identity, Sigmoid, ReLU, Leaky ReLU (custom activations supported, e.g. Tanh)
  • Data type: float16 (DATA_TYPE_FLOAT16), currently the only tested type
  • Matrix layout: Row-major, Column-major, Mul-optimal, Outer-product-optimal
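The "training with gradient accumulation" feature pairs a backward pass with an optimizer step. A minimal sketch of that pattern, assuming plain SGD on a mean gradient (illustrative only, not the code in example/common/optimizer.hpp):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative training step: gradients are accumulated across a batch
// (e.g. by the backward pass), then one SGD update applies the mean
// gradient and resets the accumulator for the next batch.
struct Sgd {
    float lr; // learning rate

    void step(std::vector<float>& weights,
              std::vector<float>& gradAccum,
              int batchSize) {
        for (size_t i = 0; i < weights.size(); ++i) {
            weights[i] -= lr * gradAccum[i] / static_cast<float>(batchSize);
            gradAccum[i] = 0.0f; // clear accumulator for the next batch
        }
    }
};
```

The repo additionally provides Adam and Lion variants (both CPU-side and as GPU kernels in example/kernel/optimizer.hlsl), which follow the same accumulate-then-step structure with different update rules.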

Examples

  • 01 Texture Inference: load a pre-trained MLP binary and reconstruct a texture on the GPU
  • 02 Texture Training: train an MLP on-GPU to learn a 2D texture pattern, then reconstruct it
  • 03 Texture Compression with Input Encoding: train with positional/grid input encoding for higher-quality texture compression

See example/README.md for step-by-step instructions.

Documentation

  • HLSL API Reference: docs/mlp_hlsl.md
  • Example walkthroughs: example/README.md

License

MIT License; see LICENSE.

Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved.

Third-Party Notices

See NOTICE.md for details.

CMake dependency downloads

When building with GPU support (without MINIDXNN_CPP_FALLBACK_ONLY), CMake auto-downloads dependencies to third_party/gfx_dep/gfx/third_party/.

