Argonne Leadership Computing Facility

Deploy Qwen2.5-VL on a Single Node

Overview

Qwen2.5-VL 72B is a flagship multimodal large language model with 72 billion parameters and advanced capabilities in vision and language integration. It handles a wide range of tasks, including sophisticated visual understanding, robust multilingual OCR, and complex document and video analysis. Unlike its predecessors, Qwen2.5-VL introduces dynamic resolution and temporal video alignment, allowing it to accurately process and summarize long-form videos and pinpoint events with second-level granularity. A key feature is its “agentic” ability: it can act as a visual agent for interactive tasks, such as operating a computer or mobile device based on visual input and instructions. The model also offers precise object grounding with bounding boxes and can generate structured outputs in formats like JSON, making it well suited to applications that extract data from tables, forms, and other complex layouts.
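As a rough illustration of how the grounding and structured-output features might be consumed downstream, the sketch below parses a JSON response and clamps each box to the image bounds. It assumes the model was prompted to return a JSON array whose items carry a `bbox_2d` field (`[x1, y1, x2, y2]` in pixels) and a `label` field. The sample response, the field names, and the helper `parse_boxes` are illustrative assumptions, not output or code taken from the post.

```python
import json

# Hypothetical model response. The "bbox_2d" and "label" field names are an
# assumption about the prompt/output format, not something the post specifies.
SAMPLE_RESPONSE = (
    '[{"bbox_2d": [12, 30, 220, 410], "label": "invoice table"},'
    ' {"bbox_2d": [-5, 400, 700, 520], "label": "signature"}]'
)


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))


def parse_boxes(response_text, image_width, image_height):
    """Parse a JSON list of boxes and clamp each one to the image bounds."""
    boxes = []
    for item in json.loads(response_text):
        x1, y1, x2, y2 = item["bbox_2d"]
        x1, x2 = sorted((clamp(x1, 0, image_width), clamp(x2, 0, image_width)))
        y1, y2 = sorted((clamp(y1, 0, image_height), clamp(y2, 0, image_height)))
        # Drop boxes that collapse to zero area after clamping.
        if x2 > x1 and y2 > y1:
            boxes.append({"label": item["label"], "bbox": (x1, y1, x2, y2)})
    return boxes


boxes = parse_boxes(SAMPLE_RESPONSE, image_width=640, image_height=480)
for box in boxes:
    print(box["label"], box["bbox"])
```

Clamping is a defensive choice here: model-generated coordinates can fall slightly outside the image, and downstream cropping code usually expects in-bounds boxes.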

Sunday, September 7, 2025 | 4 minutes Read

Distributed Training on ALCF Polaris

Overview

Polaris is a high-performance computing (HPC) system at the Argonne Leadership Computing Facility (ALCF) that provides robust support for distributed training workflows and advanced scientific computing applications. This post walks through distributed training of a deep learning model with Hugging Face Accelerate, a library that simplifies distributed training across multiple GPUs and nodes.

Prerequisites

Before starting this tutorial, ensure you have:

ALCF Account: Active account with access to the Polaris system
Project Allocation: Computing time allocation on a project (you’ll need the project name)
MFA Setup: CRYPTOCard or MobilePASS+ token configured for authentication
Basic Knowledge: Familiarity with SSH, the Linux command line, and Python virtual environments
Python Experience: Understanding of deep learning concepts and PyTorch/Transformers

What is DeepSpeed?

DeepSpeed is Microsoft’s deep learning optimization library that enables efficient distributed training. It provides memory optimization techniques such as ZeRO (Zero Redundancy Optimizer) and supports large model training across multiple GPUs and nodes with minimal code changes.
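To make the ZeRO memory savings concrete, here is a back-of-the-envelope estimate of per-GPU model-state memory at each ZeRO stage. It is a sketch under stated assumptions: mixed-precision training with Adam, using the commonly cited accounting of 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states. It ignores activations, temporary buffers, and fragmentation, and the function name `zero_memory_gb`, the 1-billion-parameter model, and the 4-GPU count are illustrative choices, not values from the post.

```python
def zero_memory_gb(num_params, num_gpus, stage):
    """Approximate per-GPU model-state memory in GB for mixed-precision Adam.

    stage 0: nothing partitioned (plain data parallelism)
    stage 1: optimizer states partitioned across GPUs
    stage 2: optimizer states and gradients partitioned
    stage 3: optimizer states, gradients, and parameters partitioned
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    weights = 2 * num_params   # fp16 parameters
    grads = 2 * num_params     # fp16 gradients
    optim = 12 * num_params    # fp32 master weights, momentum, variance
    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        weights /= num_gpus
    return (weights + grads + optim) / 1e9


# Illustrative numbers only: a 1B-parameter model on 4 GPUs.
for stage in range(4):
    print(f"ZeRO stage {stage}: {zero_memory_gb(1e9, 4, stage):.1f} GB per GPU")
```

The estimate covers model states only, so real usage will be higher once activations and framework overhead are included.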

Saturday, September 6, 2025 | 4 minutes Read