Distributed Jobs
Manage multi-node distributed training jobs.
| Total | Running | Completed | Failed |
|---|---|---|---|
| 406 | 3,353 | 8,662 | 8,879 |
| Name | Status | Framework | Nodes | GPUs/Node | Total GPUs | Duration | Owner |
|---|---|---|---|---|---|---|---|
| fsdp-10 | Failed | DeepSpeed | 3 | 8 | 114 | | henry.zhao |
| fsdp-5 | Queued | Horovod | 15 | 8 | 47 | | alice.liu |
| ray-train-27 | Completed | PyTorch FSDP | 4 | 8 | 38 | | bob.zhang |
| fsdp-15 | Running | PyTorch FSDP | 8 | 4 | 14 | | bob.zhang |
| pretrain-48 | Completed | Horovod | 13 | 4 | 14 | | bob.zhang |
| ray-train-50 | Failed | Ray Train | 3 | 8 | 80 | | alice.liu |
| fsdp-11 | Failed | PyTorch FSDP | 2 | 4 | 122 | | henry.zhao |
| fsdp-24 | Failed | Megatron-LM | 15 | 8 | 122 | | bob.zhang |
| ray-train-21 | Running | PyTorch FSDP | 7 | 4 | 8 | | bob.zhang |
| ray-train-33 | Completed | Megatron-LM | 7 | 8 | 119 | | henry.zhao |