# Pipeline Parallelism Scheduler and Visualizer
This tool simulates and visualizes pipeline parallelism scheduling strategies, focusing on the 1F1B (One-Forward-One-Backward) scheduling algorithm commonly used in distributed deep learning.
## Usage

```bash
python pipeline.py --num-stages 4 --num-batches 8
```

### Example Output

*(pipeline schedule visualization image)*
## Command Line Interface

| Option | Short | Description |
|---|---|---|
| `--config` | `-c` | Path to config file (JSON or YAML) |
| `--num-stages` | `-s` | Number of pipeline stages (devices) |
| `--num-batches` | `-b` | Number of micro-batches |
| `--forward-times` | `-f` | Forward-pass time for each stage (space-separated list) |
| `--backward-times` | `-bw` | Backward-pass time for each stage (space-separated list) |
| `--output` | `-o` | Output file path for the visualization |
| `--no-visualization` | | Skip visualization generation |
| `--p2p-time` | | Point-to-point (P2P) communication time between pipeline stages |
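For example, to schedule 8 micro-batches over 4 stages with per-stage timings (the flag values below are illustrative):

```bash
python pipeline.py -s 4 -b 8 \
    -f 1.0 1.2 1.1 0.9 \
    -bw 2.0 2.4 2.2 1.8 \
    -o pipeline_1f1b.png
```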
## Using Configuration Files
You can use either JSON or YAML configuration files:
Example JSON configuration (`sample_config.json`):

```json
{
  "num_stages": 6,
  "num_batches": 12,
  "forward_times": [0.8, 1.0, 1.2, 1.0, 0.9, 1.1],
  "backward_times": [1.6, 2.0, 2.4, 2.0, 1.8, 2.2],
  "output_file": "pipeline_1f1b_custom.png"
}
```
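A config file is then passed via the `--config` flag:

```bash
python pipeline.py --config sample_config.json
```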
Example YAML configuration (`sample_config.yaml`):

```yaml
# Pipeline Parallelism Configuration
num_stages: 5
num_batches: 8
forward_times:
  - 0.9
  - 1.1
  - 1.0
  - 0.8
  - 1.2
backward_times:
  - 1.8
  - 2.2
  - 2.0
  - 1.6
  - 2.4
output_file: "pipeline_1f1b_yaml.png"
```
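Since both formats describe the same flat dictionary of parameters, a loader can dispatch on the file extension. A minimal sketch (the `load_config` helper below is hypothetical, not the tool's actual code; YAML parsing assumes the PyYAML package is installed):

```python
import json
from pathlib import Path

import yaml  # PyYAML


def load_config(path: str) -> dict:
    # Hypothetical loader: dispatch on extension; both formats
    # yield the same flat dict of scheduler parameters.
    text = Path(path).read_text()
    if path.endswith((".yaml", ".yml")):
        return yaml.safe_load(text)
    return json.loads(text)


config = load_config("sample_config.json")
print(config["num_stages"])  # 6 for the JSON example above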
## About Pipeline Parallelism
Pipeline parallelism is a distributed deep learning training strategy that splits model layers across multiple devices. Each device processes a different stage of the neural network, creating a pipeline where multiple micro-batches can be processed simultaneously.
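As a toy illustration of the idea (the layer names and even split below are assumptions, not this tool's behavior), a model's layers can be partitioned into contiguous stages:

```python
# Hypothetical even split of a 12-layer model across 4 pipeline stages.
layers = [f"layer_{i}" for i in range(12)]
num_stages = 4
per_stage = len(layers) // num_stages
stages = [layers[i * per_stage:(i + 1) * per_stage] for i in range(num_stages)]
# stages[0] runs on device 0, stages[1] on device 1, and so on;
# activations flow stage-to-stage over P2P communication.
print(stages)
```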
The 1F1B (One-Forward-One-Backward) scheduling algorithm is an efficient strategy for pipeline parallelism that balances throughput with memory usage. It follows these phases (sketched in code after the list):
- Warmup Phase: Forward passes for the first several micro-batches
- Steady State: Each device alternates between forward and backward passes
- Cooldown Phase: Backward passes to complete the computation for the remaining micro-batches
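A minimal sketch of the per-stage operation order implied by these phases (an assumption-level illustration, not the tool's actual scheduler; timing and communication gaps are ignored):

```python
def one_f_one_b_order(stage: int, num_stages: int, num_batches: int) -> list[str]:
    """Operation order for one stage under 1F1B (Fi = forward, Bi = backward
    of micro-batch i). Idle time and P2P latency are ignored."""
    # Warmup: earlier stages run more forwards before their first backward arrives.
    warmup = min(num_stages - stage - 1, num_batches)
    ops = [f"F{m}" for m in range(warmup)]
    fwd, bwd = warmup, 0
    # Steady state: alternate one forward with one backward.
    while fwd < num_batches:
        ops.append(f"F{fwd}")
        fwd += 1
        ops.append(f"B{bwd}")
        bwd += 1
    # Cooldown: drain the remaining backwards.
    while bwd < num_batches:
        ops.append(f"B{bwd}")
        bwd += 1
    return ops


print(one_f_one_b_order(stage=0, num_stages=4, num_batches=8))
# ['F0', 'F1', 'F2', 'F3', 'B0', 'F4', 'B1', 'F5', 'B2', 'F6',
#  'B3', 'F7', 'B4', 'B5', 'B6', 'B7']
```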
The "bubble rate" metric measures the inefficiency in the pipeline, representing the percentage of time devices spend idle waiting for dependencies.
## References

- PipeDream: Generalized Pipeline Parallelism for DNN Training (SOSP'19)
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (NeurIPS'19)
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (arXiv'19)