PlaygroundRL is a real-time reinforcement learning playground that runs entirely in the browser. Watch autonomous agents explore stylized grid worlds, adapt to obstacles, and chase rewards using Proximal Policy Optimization (PPO).
PlaygroundRL turns PPO training into an interactive visual experience. Multiple agents learn concurrently inside richly lit Three.js environments, helping you understand how policy gradients behave under different levels of difficulty.
- Real-time AI Training: Watch PPO agents improve directly in the browser
- Multiple Difficulty Levels: Two distinct environments with increasing complexity
- Smooth 3D Visualization: Powered by React Three Fiber for performant 3D graphics
- Multi-Agent System: Ten agents learn simultaneously for richer dynamics
- Dynamic Environments: Level 2 introduces moving obstacles for an added challenge
- Frontend: Next.js 14, React, TypeScript
- 3D Graphics: React Three Fiber, Three.js
- AI/ML: ONNX Runtime Web for in-browser inference
- Styling: Tailwind CSS, shadcn/ui components
- State Management: Zustand
- Animation: React Spring
- Node.js 18.17+ (the minimum version supported by Next.js 14)
- npm or yarn
# Clone the repository
git clone https://github.com/yourusername/playgroundrl.git
# Navigate to project directory
cd playgroundrl
# Install dependencies
npm install
# or
yarn install
# Run the development server
npm run dev
# or
yarn dev

Open http://localhost:3000 to see the application.
# Build for production and start the server
npm run build
npm start

- Grid World: 25x25 tile-based environment
- Agents: Bunny agents start from random positions
- Goal: Find the pink reward tile while avoiding hologram tiles
- Obstacles:
  - Level 1: Static hologram tiles (instant failure)
  - Level 2: Moving hologram tiles + vision-based navigation
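The rules above can be sketched as a small step function. This is a minimal illustration, not the project's actual implementation: the names `GridState`, `step`, and `moveHolograms` are hypothetical, and the reward values (+1, -1, 0) are assumptions, since this README only states the sign of each reward. The Level 2 motion model shown here (a one-tile shift to the right with wrap-around) is also assumed purely for illustration.

```typescript
// Hypothetical sketch of the grid-world rules. Names, reward values, and the
// hologram motion model are illustrative assumptions, not project code.
const GRID_SIZE = 25;

type Cell = { x: number; y: number };
type Action = "up" | "down" | "left" | "right";

interface GridState {
  agent: Cell;
  goal: Cell;
  holograms: Cell[];
}

interface StepResult {
  state: GridState;
  reward: number;
  done: boolean;
}

// Movement offsets for the four discrete actions (y grows downward here).
const DELTAS: Record<Action, Cell> = {
  up: { x: 0, y: -1 },
  down: { x: 0, y: 1 },
  left: { x: -1, y: 0 },
  right: { x: 1, y: 0 },
};

// Keep a coordinate inside the 25x25 grid.
const clamp = (v: number): number => Math.max(0, Math.min(GRID_SIZE - 1, v));
const sameCell = (a: Cell, b: Cell): boolean => a.x === b.x && a.y === b.y;

// Apply one action and report the resulting state, reward, and episode end.
function step(state: GridState, action: Action): StepResult {
  const d = DELTAS[action];
  const agent: Cell = {
    x: clamp(state.agent.x + d.x),
    y: clamp(state.agent.y + d.y),
  };
  const next: GridState = { ...state, agent };

  // Touching a hologram tile ends the episode with a penalty (assumed -1).
  if (state.holograms.some((h) => sameCell(h, agent))) {
    return { state: next, reward: -1, done: true };
  }
  // Reaching the reward tile ends the episode with a bonus (assumed +1).
  if (sameCell(agent, state.goal)) {
    return { state: next, reward: 1, done: true };
  }
  return { state: next, reward: 0, done: false };
}

// Level 2 (assumed motion): each hologram shifts one tile right, wrapping at the edge.
function moveHolograms(holograms: Cell[]): Cell[] {
  return holograms.map((h) => ({ x: (h.x + 1) % GRID_SIZE, y: h.y }));
}
```

In a Level 2 loop, `moveHolograms` would be called between agent steps so that the obstacle layout changes while the agent is navigating.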
The bunnies use PPO (Proximal Policy Optimization) to learn optimal policies:
- State Space: Agent position, target position, distance to goal (+ vision in Level 2)
- Action Space: 4 discrete actions (up, down, left, right)
- Reward Structure: Positive reward for reaching the goal, negative for hitting obstacles
- Architecture: Actor-Critic neural network
- Training: Python implementation with stable-baselines3
- Deployment: ONNX models running in-browser via ONNX Runtime Web
- Hyperparameters: See in-app "Model Details" for complete configuration
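To make the state and action spaces listed above concrete, here is a hedged sketch of how a Level 1 observation vector could be built and how a discrete action could be chosen from the actor's output. The feature layout (normalized positions plus a normalized Manhattan distance), the action ordering, and the function names `encodeObservation`, `softmax`, and `selectAction` are all assumptions; the real input layout is defined by the exported ONNX model and may differ. In the app, the logits would come from an ONNX Runtime Web session rather than a hand-written array.

```typescript
// Hypothetical observation layout and action ordering, for illustration only.
const GRID = 25;
const ACTIONS = ["up", "down", "left", "right"] as const;
type ActionName = (typeof ACTIONS)[number];

interface Position {
  x: number;
  y: number;
}

// Builds [agentX, agentY, goalX, goalY, distance], each scaled to [0, 1].
function encodeObservation(agent: Position, goal: Position): Float32Array {
  const maxIndex = GRID - 1;
  const manhattan = Math.abs(agent.x - goal.x) + Math.abs(agent.y - goal.y);
  return Float32Array.from([
    agent.x / maxIndex,
    agent.y / maxIndex,
    goal.x / maxIndex,
    goal.y / maxIndex,
    manhattan / (2 * maxIndex),
  ]);
}

// Numerically stable softmax over the actor head's logits.
function softmax(logits: ArrayLike<number>): number[] {
  const values = Array.from(logits);
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Greedy selection: return the action with the highest probability.
function selectAction(logits: ArrayLike<number>): ActionName {
  if (logits.length !== ACTIONS.length) {
    throw new Error(`expected ${ACTIONS.length} logits, got ${logits.length}`);
  }
  const probs = softmax(logits);
  let best: ActionName = ACTIONS[0];
  let bestProb = -Infinity;
  ACTIONS.forEach((name, i) => {
    const p = probs[i] ?? 0;
    if (p > bestProb) {
      bestProb = p;
      best = name;
    }
  });
  return best;
}
```

Greedy selection is a common choice at inference time. During training, PPO samples actions from the probability distribution instead so that the agent keeps exploring.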
├── app/
│ ├── (game)/
│ │ ├── page.tsx # Main game page
│ │ ├── LevelOne.tsx # Level 1 implementation
│ │ ├── LevelTwo.tsx # Level 2 implementation
│ │ ├── Player.tsx # Player bunny component
│ │ ├── runModel.ts # ONNX inference logic
│ │ └── store/ # Zustand stores
│ └── components/ # UI components
├── public/
│ └── models/ # 3D models and ONNX files
└── train/ # Python training scripts
The train/ directory contains Python scripts for training new PPO models:
cd train
python ppo.py # Train the model
python torch2onnx.py # Convert to ONNX format

This project is licensed under the MIT License. See the LICENSE file for details.
