Performance enhancements (batched predictions using GEMM)

Hi,

Depending on the application (model and problem sizes), inference can be made much faster by doing it in batches: pack the vector-sized inputs into a 2D array and replace the matrix-vector multiplications with matrix-matrix multiplications, which are delegated to a BLAS library. I have a Fortran application based on FKB, or rather its earlier incarnation neural-Fortran, where I did exactly that (I referenced neural-Fortran in my paper). It works well, and the nice thing is that it's trivial to run the code on GPUs too:

```
#ifdef USE_CUDA
#define sgemm cublassgemm
#endif
```
plus some OpenACC directives above the bias addition and activation loops. You can find my code [here](https://github.com/peterukk/rte-rrtmgp-nn/blob/master/neural/mod_network.F90#L328). I think a similar batched output procedure for 2D arrays would be a valuable contribution to the main repo, and I am happy to work on a pull request if you agree. If so, let me know whether you'd like to keep the GPU stuff: I'd have to add a few things to make it more general, such as copying the input array to the device and creating the intermediate arrays for the hidden layers (in my code I can get away with just two intermediate arrays and pointer swapping, because my models had the same number of neurons in all hidden layers).
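To make the idea concrete, here is a minimal sketch of a single batched dense layer, not the code from my repo or from neural-Fortran. It assumes weights stored as `w(n_out, n_in)`, one input sample per column of `x`, and a ReLU activation; the sizes and all variable names are made up for illustration. It computes the whole batch with one `sgemm` call and checks the result against the per-sample matrix-vector version.

```fortran
! Sketch only: one dense layer applied to a whole batch with a single SGEMM.
! Assumed build line: gfortran batched_layer_demo.f90 -lblas
program batched_layer_demo
  implicit none
  ! Hypothetical sizes, chosen only for the demonstration
  integer, parameter :: n_in = 4, n_out = 3, n_batch = 5
  real :: w(n_out, n_in)      ! weights (assumed layout: outputs x inputs)
  real :: b(n_out)            ! biases
  real :: x(n_in, n_batch)    ! inputs, one sample per column
  real :: y(n_out, n_batch)   ! batched outputs
  real :: y_ref(n_out)        ! per-sample reference result
  integer :: i, j
  external :: sgemm           ! standard BLAS routine

  call random_number(w)
  call random_number(b)
  call random_number(x)

  ! y = 1.0 * w * x + 0.0 * y, for all samples at once
  call sgemm('N', 'N', n_out, n_batch, n_in, 1.0, w, n_out, &
             x, n_in, 0.0, y, n_out)

  ! Bias addition and activation (ReLU is an assumption here).
  ! These are the loops that would carry OpenACC directives on a GPU.
  do j = 1, n_batch
    do i = 1, n_out
      y(i, j) = max(0.0, y(i, j) + b(i))
    end do
  end do

  ! Compare with the unbatched matrix-vector formulation
  do j = 1, n_batch
    y_ref = max(0.0, matmul(w, x(:, j)) + b)
    if (maxval(abs(y(:, j) - y_ref)) > 1.0e-5) error stop 'mismatch'
  end do
  print *, 'batched and per-sample results agree'
end program batched_layer_demo
```

For a multi-layer network the same call would be repeated per layer, with the output array of one layer serving as the input of the next.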

There are a few other points too:
- Should DGEMM be called if the input data is in double precision?
- If the pointer-based activation functions of the current code are used, [last I checked those can't be elemental functions](https://github.com/modern-fortran/neural-fortran/issues/18#issuecomment-571668880), and elemental functions are what I used.
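On the first point, one possible approach (a sketch, not something the current code does) is a generic interface that resolves to SGEMM or DGEMM from the kind of the arguments. The module name `gemm_generic` and the wrapper `gemm_nn` are hypothetical, and the sketch assumes `real32` is the default real kind and `real64` is double precision, which holds for common compilers with default flags.

```fortran
! Sketch only: choose SGEMM or DGEMM from the argument kind.
! Assumed build line: gfortran gemm_generic_demo.f90 -lblas
module gemm_generic
  use iso_fortran_env, only: real32, real64
  implicit none
  private
  public :: gemm_nn

  ! Hypothetical generic name, not part of BLAS or neural-Fortran
  interface gemm_nn
    module procedure gemm_nn_sp, gemm_nn_dp
  end interface gemm_nn

contains

  ! c = a * b in single precision (assumes real32 is default real)
  subroutine gemm_nn_sp(a, b, c)
    real(real32), intent(in)  :: a(:, :), b(:, :)
    real(real32), intent(out) :: c(:, :)
    external :: sgemm
    call sgemm('N', 'N', size(a, 1), size(b, 2), size(a, 2), 1.0_real32, &
               a, size(a, 1), b, size(b, 1), 0.0_real32, c, size(c, 1))
  end subroutine gemm_nn_sp

  ! c = a * b in double precision (assumes real64 is double precision)
  subroutine gemm_nn_dp(a, b, c)
    real(real64), intent(in)  :: a(:, :), b(:, :)
    real(real64), intent(out) :: c(:, :)
    external :: dgemm
    call dgemm('N', 'N', size(a, 1), size(b, 2), size(a, 2), 1.0_real64, &
               a, size(a, 1), b, size(b, 1), 0.0_real64, c, size(c, 1))
  end subroutine gemm_nn_dp

end module gemm_generic

program gemm_generic_demo
  use iso_fortran_env, only: real32, real64
  use gemm_generic, only: gemm_nn
  implicit none
  real(real32) :: a4(2, 2), b4(2, 2), c4(2, 2)
  real(real64) :: a8(2, 2), b8(2, 2), c8(2, 2)

  a4 = reshape([1.0, 2.0, 3.0, 4.0], [2, 2])
  b4 = reshape([1.0, 0.0, 0.0, 1.0], [2, 2])   ! identity matrix
  a8 = real(a4, real64)
  b8 = real(b4, real64)

  call gemm_nn(a4, b4, c4)   ! resolves to the SGEMM wrapper
  call gemm_nn(a8, b8, c8)   ! resolves to the DGEMM wrapper

  ! Multiplying by the identity must return the original matrix
  if (any(abs(c4 - a4) > 1.0e-6)) error stop 'single precision mismatch'
  if (any(abs(c8 - a8) > 1.0d-12)) error stop 'double precision mismatch'
  print *, 'both precisions agree with the expected product'
end program gemm_generic_demo
```

With this kind of wrapper the calling code would not need to know the precision of the data, at the cost of duplicating each routine per kind.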
