Figures: lid-driven cavity (LDC) velocity magnitude at Re=100 | u benchmark results | v benchmark results
A basic 2D D2Q9 LBM implementation in Mojo, using only the standard library, written as a learning exercise. If you are interested in simulation (and are coming from Python), this is a great exercise, as it covers:
- Correct typing and parameterization in Mojo
  - Supports any floating-point DType (mainly fp32 or fp64)
- GPU kernels and TileTensor layouts
- Calling Python modules from Mojo
  - Passing buffers into NumPy arrays with unsafe pointers
  - Using PyVista for visualisation
- Creating custom structs and functions to reduce repeated code (e.g. vector, ContextTileTensor)
- Basic origin tracking
- Mojo packaging
- 2026/06/05: Implemented TiledLayouts for LBM
- 2026/06/04: Implemented first variation that uses thread reordering
- 2026/05/20: LBM working with mid-grid bounce-back and moving-wall BC. Row major. Base example.
The Lattice Boltzmann Method (LBM) is a fluid simulation method based on the Boltzmann equation and well suited to GPU-like compute. It is an explicit time-stepping algorithm (so there are no systems of equations to solve) and is performed on a structured grid. The single-relaxation-time (SRT) model implemented here is intended for incompressible flow (Mach number less than 0.3).
Its simplicity allows one to capture fluid motion in a single tight kernel of about 50 lines.
- Stream populations and apply BCs (I use a pull approach here)
- Calculate velocity and density from the streamed, post-BC populations
- Compute the collision step
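The three steps above can be sketched for a single time step as follows. This is a minimal pure-Python sketch, not the repository's Mojo kernel: the function names (`init`, `equilibrium`, `step`), the `f[q][y][x]` list layout, the lid sitting above the top row, and the use of the local node density as the wall density are all assumptions made for illustration.

```python
# D2Q9 velocity set, weights, and the index of each opposite direction.
CX = (0, 1, 0, -1, 0, 1, -1, -1, 1)
CY = (0, 0, 1, 0, -1, 1, 1, -1, -1)
W = (4 / 9, 1 / 9, 1 / 9, 1 / 9, 1 / 9, 1 / 36, 1 / 36, 1 / 36, 1 / 36)
OPP = (0, 3, 4, 1, 2, 7, 8, 5, 6)


def equilibrium(q, rho, ux, uy):
    """Second-order equilibrium population for direction q (c_s^2 = 1/3)."""
    cu = CX[q] * ux + CY[q] * uy
    return W[q] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * (ux * ux + uy * uy))


def init(nx, ny, rho0=1.0):
    """Fluid at rest with uniform density, stored as f[q][y][x]."""
    return [[[W[q] * rho0 for _ in range(nx)] for _ in range(ny)] for q in range(9)]


def step(f, nx, ny, tau, u_lid):
    """One fused stream + BC + collide update; returns a new population array."""
    f_new = [[[0.0] * nx for _ in range(ny)] for _ in range(9)]
    for y in range(ny):
        for x in range(nx):
            # 1. Pull streaming: read each population from the upstream node.
            pulled = [0.0] * 9
            for q in range(9):
                xs, ys = x - CX[q], y - CY[q]
                if 0 <= xs < nx and 0 <= ys < ny:
                    pulled[q] = f[q][ys][xs]
                else:
                    # Mid-grid bounce-back: the wall sits half a cell away, so
                    # the opposite population at this node is reflected back.
                    pulled[q] = f[OPP[q]][y][x]
                    if ys >= ny:
                        # Moving lid (assumed at the top): momentum correction.
                        rho_w = sum(f[k][y][x] for k in range(9))
                        pulled[q] += 6.0 * W[q] * rho_w * CX[q] * u_lid
            # 2. Density and velocity from the streamed, post-BC populations.
            rho = sum(pulled)
            ux = sum(CX[q] * pulled[q] for q in range(9)) / rho
            uy = sum(CY[q] * pulled[q] for q in range(9)) / rho
            # 3. SRT (BGK) collision: relax towards equilibrium with time tau.
            for q in range(9):
                feq = equilibrium(q, rho, ux, uy)
                f_new[q][y][x] = pulled[q] - (pulled[q] - feq) / tau
    return f_new
```

Starting from `f = init(32, 32)` and repeatedly calling `f = step(f, 32, 32, tau=0.8, u_lid=0.05)`, the total mass stays constant while the top row picks up velocity in the lid direction. Keeping all three steps in one loop body mirrors the idea of a single fused kernel: each node reads its nine upstream populations once and writes its nine results once.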
A stack-allocated vector with value semantics (i.e. it implements the ImplicitlyCopyable trait and so behaves like a number) and support for the standard operators (+, -, *, /) with the same vector type or with scalars. It also supports sum and prod over its own elements and a dot product with another vector. An InlineArray stores the data inside the vector.
Currently not SIMD-optimized for large vectors (uses simple for loops).
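For readers coming from Python, the interface described above can be approximated with an immutable wrapper around a tuple. This is only an analogy under stated assumptions: the class name `Vec` and its methods are hypothetical, and Python has no equivalent of stack allocation or InlineArray, so only the value-like behaviour and the operator set are illustrated.

```python
import math
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec:
    """Hypothetical fixed-size vector: every operation returns a new value."""

    data: tuple

    def _apply(self, other, op):
        # Element-wise with another Vec of the same size, or broadcast a scalar.
        if isinstance(other, Vec):
            if len(other.data) != len(self.data):
                raise ValueError("vector sizes differ")
            return Vec(tuple(op(a, b) for a, b in zip(self.data, other.data)))
        return Vec(tuple(op(a, other) for a in self.data))

    def __add__(self, other):
        return self._apply(other, operator.add)

    def __sub__(self, other):
        return self._apply(other, operator.sub)

    def __mul__(self, other):
        return self._apply(other, operator.mul)

    def __truediv__(self, other):
        return self._apply(other, operator.truediv)

    def sum(self):
        """Sum of the vector's own elements."""
        return sum(self.data)

    def prod(self):
        """Product of the vector's own elements."""
        return math.prod(self.data)

    def dot(self, other):
        """Dot product with another Vec of the same size."""
        if not isinstance(other, Vec):
            raise TypeError("dot expects another Vec")
        return (self * other).sum()
```

Because the dataclass is frozen, an expression such as `b = a + 1.0` leaves `a` untouched, which is the "behaves like a number" property the struct aims for.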
A simple struct that manages the host and device buffers together and keeps the two in sync. It uses the familiar .cpu() and .gpu() getters to obtain the tensor as a CPU or GPU tile tensor respectively. A copy between the two buffers only occurs when consecutive calls request different sides.
```mojo
a = ContextTileTensor(ctx, layout)
cpu_tensor = a.cpu()  # No copy: this is the initial call
# Some CPU work here
# ...
gpu_tensor = a.gpu()  # Copy from host buffer (CPU) to device buffer (GPU)
# Some GPU work here...
gpu_tensor2 = a.gpu()  # No copy: the previous call was also .gpu()
cpu_tensor = a.cpu()  # Copy from GPU to CPU
```
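The copy-on-switch rule can be modelled with a small state machine. The sketch below is a plain-Python stand-in under stated assumptions: `SyncedBuffer`, its `copies` log, and the use of two lists in place of real host and device memory are hypothetical and do not reflect how ContextTileTensor is actually implemented.

```python
class SyncedBuffer:
    """Hypothetical model of a host/device buffer pair with lazy syncing."""

    def __init__(self, size):
        self._host = [0.0] * size
        self._device = [0.0] * size  # a second list stands in for GPU memory
        self._last = None  # side handed out last: None, "cpu", or "gpu"
        self.copies = []  # log of the copies performed, for inspection

    def cpu(self):
        # Copy device -> host only if the device side was used last.
        if self._last == "gpu":
            self._host[:] = self._device
            self.copies.append("gpu->cpu")
        self._last = "cpu"
        return self._host

    def gpu(self):
        # Copy host -> device only if the host side was used last.
        if self._last == "cpu":
            self._device[:] = self._host
            self.copies.append("cpu->gpu")
        self._last = "gpu"
        return self._device
```

Tracking only the last side handed out keeps the bookkeeping to a single field, at the cost of copying even when the caller only read from the buffer.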
- [x] Create function to set BC (moving wall and no-slip)
- [x] Create LBM kernel with mid-grid bounce-back
- [ ] Use benchmarking to determine speed-ups and optimisations
- [ ] Add SIMD optimisation
- [ ] Add layout analysis
- [ ] Swizzling analysis
- [ ] Implement 3D lattice models
- [ ] Implement custom floating point
- [ ] Equilibrium conditions
2026/05/12
- Awkward slicing syntax
- The type system can be annoying
- Type mismatches between Int and Scalar[DType.int32] in GPU kernels
- Lack of clarity about what can be passed to the GPU
- Very barebones, so you basically have to build everything from scratch
- Maybe too low level for now to incentivise a switch from CUDA or Python DSLs
2026/05/14
- Optional is weird and doesn't make sense
- Bool does not implement `is`, so `foo is False` does not work
2026/05/19
- While they are building some awesome stuff, the QA and actual usage of the language features in more realistic contexts can be a bit lacking
- As a Python user: because Mojo targets systems (i.e. "low-level") programming, there is a significant gap between using std builtins and Python functions. Might be unavoidable.


