FriendLLM is a mini LLM inference project written in C++.
The goal of this repository is not to compete with llama.cpp, but to provide a smaller, easier-to-follow codebase for beginners who want to understand how an LLM inference engine is built from scratch.
You can think of it as a learning-oriented, mini version of llama.cpp.
This project is mainly for learning:
- how tensor metadata is represented
- how operators are described
- how a compute graph is built
- how a simple executor runs the graph
- how model weights and runtime memory are managed
The design aims to stay small, readable, and easy to modify.
FriendLLM is being built step by step from the bottom up:
- basic internal types
- tensor structure
- operator definitions
- compute graph
- executor
- model loading
- simple generation
Right now the repository is still in the early stage, with the focus on building the core data structures first.
What FriendLLM is:
- a mini inference engine for learning
- a readable C++ codebase
- a playground for understanding LLM internals

What FriendLLM is not:
- a production inference engine
- a performance-first implementation
- a full replacement for llama.cpp
Projects like llama.cpp are excellent, but for beginners they can feel large and dense.
FriendLLM tries to keep the important ideas while reducing the amount of code you need to hold in your head at once.
The hope is that you can read the code, follow the data flow, and gradually build intuition for how LLM inference works.
Current structure:
```
FriendLLM/
├── include/
│   ├── core/
│   │   └── tensor.hpp
│   └── utils/
│       └── utils.hpp
├── CMakeLists.txt
└── README.md
```
As the project grows, more core modules will be added, such as graph, executor, backend, and model loader.
Design principles:
- start from the smallest useful abstraction
- prefer clarity over cleverness
- build one layer at a time
- keep the code beginner-friendly
If you want to follow the project in order, the recommended reading path is:
- include/utils/utils.hpp
- include/core/tensor.hpp
- graph-related structures
- operator factory
- executor
- model parser and weight loading
Early work in progress.
The current focus is on the internal type system and tensor representation.