Multi-Agent Spectrum Sharing for Cognitive Radar
MASS is a research-oriented simulation framework for studying autonomous spectrum sharing in congested radio frequency (RF) environments using reinforcement learning, heuristic control methods, and evolutionary optimization techniques.
The framework models dynamic spectrum occupancy, interference, adaptive transmission behavior, and multi-agent competition within radar-relevant RF environments. Agents learn or act to maximize spectrum efficiency while minimizing collisions and interference.
The simulation compares multiple cognitive agent architectures operating in a shared RF spectrum.
Implemented agent types include:
- Static Agents: Fixed spectrum users with predefined behavior
- Random Start Agents: Baseline random transmission strategy
- SAA Agents: Sense-and-Avoid heuristic agents
- PPO Agents: Proximal Policy Optimization agents
- DQN Agents: Deep Q-Network agents
- DPG Agents: Deterministic Policy Gradient agents
- MFOS Agents: Evolutionary/meta-learning agents
- Ablated MFOS Agents: Simplified MFOS variants used for ablation studies
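To make the Sense-and-Avoid idea concrete, here is a minimal sketch of one plausible SAA decision rule: transmit in the widest contiguous run of bins that the latest sensing pass reported as free. The function name `saa_select_band` and the widest-gap rule are assumptions for illustration; the framework's SAA agents may sense and choose bands differently.

```python
from __future__ import annotations

import numpy as np


def saa_select_band(occupied: np.ndarray) -> tuple[int, int] | None:
    """Return (start_bin, stop_bin) of the widest free run (stop exclusive), or None.

    Hypothetical sense-and-avoid rule: transmit in the largest contiguous block
    of bins that the most recent sensing pass reported as unoccupied.
    """
    free = ~np.asarray(occupied, dtype=bool)
    # Pad with "occupied" on both ends so every free run has a start and an end edge.
    padded = np.concatenate(([False], free, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first free bin of each run
    stops = np.flatnonzero(edges == -1)   # one past the last free bin of each run
    if starts.size == 0:
        return None  # no free bins: the agent stays silent this step
    widest = int(np.argmax(stops - starts))
    return int(starts[widest]), int(stops[widest])


occupancy = np.zeros(16, dtype=bool)
occupancy[3:6] = True     # another user on bins 3-5
occupancy[12:14] = True   # another user on bins 12-13
print(saa_select_band(occupancy))  # (6, 12)
```

Ties go to the lowest-frequency gap, because `np.argmax` returns the first maximum.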
The environment supports both synthetic and recorded spectrum data for evaluating adaptive RF behavior under realistic interference conditions.
The RF spectrum is represented using FFT-based frequency bins and simulated over discrete time intervals.
Core environment characteristics:
- FFT-based spectrum representation (FFT_SIZE = 1024)
- CPI/PRI radar timing structure
- Dynamic and static occupancy generation
- Collision tracking via spectrum ownership
- Multi-agent simultaneous transmission support
- Real or synthetic spectrum occupancy modeling
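As a rough illustration of the FFT-based representation, the sketch below turns one capture of `FFT_SIZE` complex baseband samples into a boolean occupancy vector by windowing, taking the FFT, and thresholding per-bin power. The Hann window, the fixed dB threshold, and the function name `occupancy_from_iq` are assumptions, not necessarily what the framework's signal processing does.

```python
import numpy as np

FFT_SIZE = 1024  # value listed above


def occupancy_from_iq(iq: np.ndarray, threshold_db: float) -> np.ndarray:
    """Boolean occupancy per FFT bin for one FFT_SIZE-sample capture (assumed rule)."""
    if iq.shape[-1] != FFT_SIZE:
        raise ValueError(f"expected {FFT_SIZE} samples, got {iq.shape[-1]}")
    window = np.hanning(FFT_SIZE)  # taper to limit spectral leakage
    # fftshift reorders bins so index 0 is the most negative frequency.
    spectrum = np.fft.fftshift(np.fft.fft(iq * window))
    power_db = 10.0 * np.log10(np.abs(spectrum) ** 2 + 1e-12)
    return power_db > threshold_db


rng = np.random.default_rng(0)
n = np.arange(FFT_SIZE)
noise = 0.01 * (rng.standard_normal(FFT_SIZE) + 1j * rng.standard_normal(FFT_SIZE))
tone = np.exp(2j * np.pi * 100 * n / FFT_SIZE)  # complex tone on unshifted bin 100
occ = occupancy_from_iq(tone + noise, threshold_db=20.0)
# After fftshift the tone sits at bin 100 + FFT_SIZE // 2 = 612 (plus window leakage).
print(np.flatnonzero(occ))
```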
Key concepts:
- Occupancy state: Boolean map representing active spectrum usage
- Bin ownership: Tracks which agent controls each frequency bin
- Collisions: Overlapping or invalid spectrum transmissions
- Dead space: Unused spectrum available for exploitation
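The four concepts above fit together in a single time step roughly as follows. This is a hypothetical sketch: each agent is assumed to request one contiguous `[start, stop)` bin range, overlapping requests are marked as collisions, and any bin nobody claims counts as dead space. Background occupancy from synthetic or recorded data is left out for brevity, and the sentinel values `FREE` and `COLLIDED` are illustrative.

```python
from __future__ import annotations

import numpy as np

FREE, COLLIDED = -1, -2  # hypothetical sentinel owner IDs


def resolve_step(n_bins: int, requests: dict[int, tuple[int, int]]):
    """Resolve one time step of transmissions.

    requests maps agent_id -> (start_bin, stop_bin), stop exclusive.
    Returns (owner, occupancy, collided_agents, dead_space_bins).
    """
    claims = np.zeros(n_bins, dtype=int)           # agents claiming each bin
    owner = np.full(n_bins, FREE, dtype=int)       # bin ownership
    for agent_id, (start, stop) in requests.items():
        claims[start:stop] += 1
        owner[start:stop] = agent_id
    owner[claims > 1] = COLLIDED                   # overlap: no single owner
    occupancy = claims > 0                         # boolean occupancy state
    collided_agents = {
        agent_id
        for agent_id, (start, stop) in requests.items()
        if np.any(claims[start:stop] > 1)
    }
    dead_space_bins = int(np.count_nonzero(~occupancy))
    return owner, occupancy, collided_agents, dead_space_bins


owner, occupancy, collided, dead = resolve_step(10, {0: (0, 4), 1: (3, 6), 2: (8, 10)})
print(owner.tolist())   # [0, 0, 0, -2, 1, 1, -1, -1, 2, 2]
print(collided, dead)   # {0, 1} 2
```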
Agents are rewarded based on:
- Successful transmission bandwidth
- Collision avoidance
- Spectrum efficiency
- Stable bandwidth selection
- Stable center frequency selection
- Dead-space utilization efficiency
The reward function is configurable through config.py.
Important reward parameters include:
- collision_ratio
- beta
- transmission_weight
- collision_weight
- bandwidth_distortion
- center_distortion
- deadspace_penalty_scale
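One way the reward terms listed above could combine is a weighted sum, sketched below. The formula, the placeholder weight values, and the normalization by the number of bins are assumptions; `collision_ratio` and `beta` are omitted because their exact role is not described here. The framework's actual reward is configured through config.py.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RewardParams:
    # Names mirror config.py reward parameters; values are placeholders.
    transmission_weight: float = 1.0
    collision_weight: float = 2.0
    bandwidth_distortion: float = 0.5
    center_distortion: float = 0.5
    deadspace_penalty_scale: float = 0.25


def step_reward(p: RewardParams, n_bins: int, band: tuple[int, int],
                prev_band: tuple[int, int], collided_bins: int,
                unused_free_bins: int) -> float:
    """Hypothetical per-step reward for one agent; bands are [start, stop) bins."""
    start, stop = band
    prev_start, prev_stop = prev_band
    bw = stop - start
    clean_bw = max(bw - collided_bins, 0)                # bins sent without collision
    d_bw = abs(bw - (prev_stop - prev_start))            # bandwidth change (ΔBW)
    d_cf = abs((start + stop) / 2 - (prev_start + prev_stop) / 2)  # ΔCF in bins
    return (
        p.transmission_weight * clean_bw / n_bins         # successful bandwidth
        - p.collision_weight * collided_bins / n_bins     # collision avoidance
        - p.bandwidth_distortion * d_bw / n_bins          # stable bandwidth
        - p.center_distortion * d_cf / n_bins             # stable center frequency
        - p.deadspace_penalty_scale * unused_free_bins / n_bins  # dead space left
    )


params = RewardParams()
print(step_reward(params, 100, (10, 30), (10, 30), collided_bins=0, unused_free_bins=0))  # 0.2
```

Normalizing every term by the bin count keeps the weights comparable across FFT sizes; whether the framework does this is not stated.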
Implemented learning frameworks include:
- On-policy actor-critic reinforcement learning
- Generalized Advantage Estimation (GAE)
- Clipped policy updates
- Value-based reinforcement learning
- Experience replay memory
- Target network updates
- Deterministic policy gradient optimization
- Evolutionary/meta-learning optimization
- Population-based policy mutation
- Elite selection and exploration
- SAA heuristic control
- Random baseline policies
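Two of the on-policy building blocks listed above, Generalized Advantage Estimation and the clipped policy objective, reduce to a few lines of arithmetic. The NumPy sketch below shows that arithmetic only; the framework's PPO agents are PyTorch-based and would compute the same quantities on tensors with autograd. The defaults `gamma=0.99`, `lam=0.95`, and `clip_eps=0.2` are common choices, assumed here rather than taken from config.py.

```python
import numpy as np


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout of length T.

    values[t] is V(s_t); last_value bootstraps V(s_T). dones[t] = 1 marks an
    episode end after step t, which cuts the bootstrap and the GAE recursion.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * not_done - values[t]  # TD error
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = advantages + np.asarray(values, dtype=np.float64)  # value targets
    return advantages, returns


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss (negated so it can be minimised)."""
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))


adv, ret = compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [0, 0, 0],
                       last_value=0.0, gamma=1.0, lam=1.0)
print(adv.tolist())  # [3.0, 2.0, 1.0]
print(ppo_clip_loss([np.log(2.0)], [0.0], np.array([1.0])))  # -1.2 (ratio 2 is clipped to 1.2)
```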
The framework produces multiple forms of evaluation data, including:
- Time vs frequency occupancy maps
- Agent ownership visualization
- Collision highlighting
- Spectrum utilization analysis
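For the ownership and collision views, a per-step owner map (one agent ID per bin, with sentinel values for free and collided bins) can be turned into an RGB image: free bins black, collided bins red, and each agent its own color. The sentinel values, the function name `ownership_to_rgb`, and the color scheme are assumptions for illustration.

```python
import numpy as np

FREE, COLLIDED = -1, -2  # hypothetical sentinel owner IDs
RED = (1.0, 0.0, 0.0)


def ownership_to_rgb(owner_map: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Map a (time_steps, bins) array of owner IDs to an RGB image in [0, 1].

    Agent k is drawn with palette[k], free bins stay black, collisions are red.
    """
    img = np.zeros(owner_map.shape + (3,), dtype=np.float64)  # black = free
    for agent_id, color in enumerate(palette):
        img[owner_map == agent_id] = color
    img[owner_map == COLLIDED] = RED
    return img


palette = np.array([[0.2, 0.4, 1.0], [0.2, 0.8, 0.2]])  # agent 0 blue, agent 1 green
owner_map = np.array([[0, 0, FREE, 1],
                      [COLLIDED, 0, 1, 1]])
img = ownership_to_rgb(owner_map, palette)
print(img.shape)  # (2, 4, 3)
```

The returned array can be passed to matplotlib's `imshow` (for example with `aspect="auto"` and `origin="lower"`) to draw a time vs frequency map with time on the vertical axis.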
Tracked metrics include:
- Reward over time
- Collision rate
- Bandwidth usage
- Center frequency drift (ΔCF)
- Bandwidth adaptation (ΔBW)
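ΔCF and ΔBW can be computed from the sequence of bands an agent selects. In this sketch each step's band is an assumed `[start_bin, stop_bin)` pair and `bin_hz` converts bins to hertz. If the FFT spans the full 100 MHz of a provided dataset, one bin would be 100e6 / 1024 = 97656.25 Hz, but that span is an assumption. Reporting absolute per-step changes (rather than signed drift) is also a choice made here.

```python
import numpy as np


def cf_bw_deltas(bands, bin_hz: float):
    """Absolute per-step center-frequency drift (ΔCF) and bandwidth change (ΔBW).

    bands: sequence of (start_bin, stop_bin) pairs, stop exclusive, one per step.
    bin_hz: width of one FFT bin in Hz (depends on the assumed capture span).
    """
    bands = np.asarray(bands, dtype=np.float64)
    cf = (bands[:, 0] + bands[:, 1]) / 2.0 * bin_hz
    bw = (bands[:, 1] - bands[:, 0]) * bin_hz
    return np.abs(np.diff(cf)), np.abs(np.diff(bw))


d_cf, d_bw = cf_bw_deltas([(100, 200), (110, 200), (110, 260)], bin_hz=1.0)
print(d_cf.tolist(), d_bw.tolist())  # [5.0, 30.0] [10.0, 60.0]

bin_hz = 100e6 / 1024  # assumption: 100 MHz span across FFT_SIZE = 1024 bins
d_cf_hz, _ = cf_bw_deltas([(100, 200), (110, 200)], bin_hz)
print(float(d_cf_hz.mean()))  # 488281.25 (5 bins of 97656.25 Hz)
```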
Evaluation summaries are exported to Excel files.
Example:
./Output/agent_eval_summarySimMulti.xlsx
Reported metrics include:
- Average reward
- Reward variance
- Collision statistics
- Bandwidth utilization
- Frequency stability metrics
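A summary like the one described above can be built with pandas from a step-level log. The column names, the aggregations, and the output filename below are hypothetical; only the general approach (group by agent, aggregate, then write with `DataFrame.to_excel`, which uses openpyxl for .xlsx files) is meant to carry over.

```python
import pandas as pd


def summarize(log: pd.DataFrame) -> pd.DataFrame:
    """Per-agent evaluation summary from a step-level log.

    Expected (hypothetical) columns: agent, reward, collided (bool),
    bandwidth_bins, d_cf, d_bw.
    """
    return log.groupby("agent").agg(
        avg_reward=("reward", "mean"),
        reward_var=("reward", "var"),          # sample variance (ddof=1)
        collision_rate=("collided", "mean"),   # fraction of steps with a collision
        collisions=("collided", "sum"),
        avg_bandwidth_bins=("bandwidth_bins", "mean"),
        mean_d_cf=("d_cf", "mean"),
        mean_d_bw=("d_bw", "mean"),
    )


log = pd.DataFrame({
    "agent": ["ppo", "ppo", "dqn", "dqn"],
    "reward": [1.0, 3.0, 0.5, 0.5],
    "collided": [False, True, False, False],
    "bandwidth_bins": [40, 60, 20, 20],
    "d_cf": [0.0, 2.0, 1.0, 1.0],
    "d_bw": [0.0, 20.0, 0.0, 0.0],
})
summary = summarize(log)
print(summary)
# Writing to Excel requires openpyxl (see the install step):
# summary.to_excel("./Output/summary_sketch.xlsx")
```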
All runtime configuration is controlled through:
config.py
The main execution script (main.py) does not accept command-line arguments.
Key configurable settings include:
SIM_MODE = True # True if no live data is to be used
MULTI_AGENT = True # Determines if the agents will be run against each other or just against data
EVAL_MODE = False # Always leave this False; it is set to True automatically once 80% of iterations have passed
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
SEED = 42069
DATA_CHOICE selects the recorded spectrum dataset. Set it to one of:
DATA_CHOICE = "245"
DATA_CHOICE = "u245"
DATA_CHOICE = "264"
DATA_CHOICE = "u264"
Provided spectrum datasets:
- 245: 1M snapshots recorded at ARRC at OU in the 2.4–2.5 GHz band using an X310
- u245: 4.1M snapshots recorded at the Oklahoma Memorial Union in the 2.4–2.5 GHz band using an X310
- 264: 1M snapshots recorded at ARRC at OU in the 2.59–2.69 GHz band using an X310
- u264: 2.4M snapshots recorded at the Oklahoma Memorial Union in the 2.59–2.69 GHz band using an X310
AGENTS = {
"static": { # Static agents provide simulation data
"fat": 3, # wide bandwidth occupancy with large gaps between transmissions
"skinny": 4, # narrow bandwidth occupancy with equal resting vs transmission
"pulsed": 5, # narrow bandwidth occupancy that pulses for a long period then waits a long period
"rectangular": 0 # wide bandwidth occupancy with less modulation for short time and random new transmission locations. Good for simulating 2.59-2.69 GHz bandwidth
},
"random_start": 0,
"saa": 0,
"ppo": 1,
"dqn": 1,
"mfos": 1,
"dpg": 0,
"ablated_mfos": 0
}
ITERATIONS = 1_000_000 # Overridden when live data is used; only used in simulation mode
SPECTRUM_SAMPLE_SIZE = 30_000
PRINT_INTERVAL = 100_000
TIMESTEP_US = 10.24
Repository structure:
MASS/
│
├── Agents/
│   ├── PPO/
│   ├── DQN/
│   ├── DPG/
│   ├── MFOS/
│   └── Control/
│
├── Data/
│   ├── spectrum_245ghz.dat
│   ├── union_spectrum_245ghz.dat
│   ├── spectrum_264ghz.dat
│   └── union_spectrum_264ghz.dat
│
├── DataVisualization/
│
├── Output/
│
├── config.py
├── environment.py
├── rewards.py
├── signal_processing.py
└── main.py
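Before launching main.py, it can help to sanity-check what the config.py values above imply. The sketch copies those values and computes the total number of agent instances, the simulated duration, and the iteration at which evaluation would begin. `count_agents`, `eval_mode`, and `EVAL_FRACTION` are hypothetical names, and ITERATIONS only applies to simulation (it is overridden when live data is used).

```python
# Values copied from the config.py excerpt above.
AGENTS = {
    "static": {"fat": 3, "skinny": 4, "pulsed": 5, "rectangular": 0},
    "random_start": 0,
    "saa": 0,
    "ppo": 1,
    "dqn": 1,
    "mfos": 1,
    "dpg": 0,
    "ablated_mfos": 0,
}
ITERATIONS = 1_000_000
TIMESTEP_US = 10.24
EVAL_FRACTION = 0.8  # EVAL_MODE switches on after 80% of iterations


def count_agents(agents: dict) -> int:
    """Total agent instances, expanding the nested static-agent counts."""
    return sum(sum(v.values()) if isinstance(v, dict) else v for v in agents.values())


def eval_mode(iteration: int, iterations: int = ITERATIONS) -> bool:
    """Hypothetical check mirroring the EVAL_MODE rule described in config.py."""
    return iteration >= EVAL_FRACTION * iterations


print(count_agents(AGENTS))                    # 15
print(ITERATIONS * TIMESTEP_US / 1e6)          # 10.24 (seconds of simulated time)
print(eval_mode(799_999), eval_mode(800_000))  # False True
```

With these settings, 12 of the 15 agents are static data generators and three (PPO, DQN, MFOS) are learners.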
Run the simulation with:
python main.py
All simulation parameters are configured through config.py.
Install required Python packages:
pip install numpy torch pandas matplotlib openpyxl
PyTorch-based agents support GPU acceleration when CUDA is available.
Device selection is handled automatically:
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
- This framework is intended for research and experimentation.
- The simulation uses a centralized environment loop.
- Runtime scales significantly with the number of agents and iterations.
- Large-scale experiments may require substantial CPU/GPU resources.
- Real spectrum datasets are loaded from .dat files and cached .npz files.
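A load-then-cache pattern for the recorded datasets might look like the sketch below: read the raw .dat file once, reshape it into snapshots, and store a compressed .npz next to it so later runs skip the parse. The on-disk dtype (`float32`), the snapshot length (1024 values), the cache naming, and the function name `load_spectrum` are assumptions; the real .dat layout is not documented here.

```python
from pathlib import Path

import numpy as np


def load_spectrum(dat_path, dtype=np.float32, row_len=1024):
    """Load a .dat file as (snapshots, row_len), caching it as a sibling .npz.

    dtype and row_len describe an assumed on-disk layout; adjust them to the
    real format of the Data/*.dat files.
    """
    dat_path = Path(dat_path)
    cache_path = dat_path.with_suffix(".npz")
    # Reuse the cache only if it is at least as new as the raw file.
    if cache_path.exists() and cache_path.stat().st_mtime >= dat_path.stat().st_mtime:
        with np.load(str(cache_path)) as cached:
            return cached["spectrum"]
    raw = np.fromfile(str(dat_path), dtype=dtype)
    usable = (raw.size // row_len) * row_len  # drop a trailing partial snapshot
    spectrum = raw[:usable].reshape(-1, row_len)
    np.savez_compressed(str(cache_path), spectrum=spectrum)
    return spectrum
```

For example, `load_spectrum("Data/spectrum_245ghz.dat")` would parse the file on the first call and read `Data/spectrum_245ghz.npz` on later calls, provided the assumed layout matches.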
MASS is designed for experimentation involving:
- Cognitive radar
- Autonomous spectrum access
- RF coexistence
- Multi-agent reinforcement learning
- Electronic warfare simulation
- Adaptive communications
- Dynamic spectrum management