Ray's Blog
Deploy Qwen2.5-VL on a Single Node

Overview

Qwen2.5-VL 72B is a flagship multimodal large language model, distinguished by its 72 billion parameters and advanced capabilities in vision and language integration. The model excels at a wide range of tasks, including sophisticated visual understanding, robust multilingual OCR, and complex document and video analysis. Unlike its predecessors, Qwen2.5-VL introduces dynamic resolution and temporal video alignment, allowing it to accurately process and summarize long-form videos and pinpoint events with second-level granularity. A key feature is its “agentic” ability, which enables it to act as a visual agent for interactive tasks, such as operating a computer or mobile device based on visual input and instructions. The model also offers precise object grounding with bounding boxes and can generate structured outputs in formats like JSON, making it highly suitable for applications requiring data extraction from tables, forms, and other complex layouts.
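As a rough starting point, here is a minimal single-node inference sketch using the Hugging Face Transformers API. It is an illustration, not the post's exact recipe: it assumes a recent transformers release with Qwen2.5-VL support, the model ID, image path, and prompt are placeholders, and the 72B checkpoint generally needs several GPUs on the node (device_map="auto" shards it across whatever is visible), so a smaller checkpoint is used here for a single-GPU test.

    # Hedged sketch: single-node inference with Qwen2.5-VL via Hugging Face Transformers.
    # Assumes a recent transformers version with Qwen2.5-VL support; the model ID,
    # image path, and prompt below are placeholders, not values from the post.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # swap in the 72B checkpoint on a multi-GPU node
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    # Ask for structured (JSON) output from a document image.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Extract the table on this page as JSON."},
        ],
    }]
    prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image = Image.open("document_page.png")  # placeholder input image
    inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=512)
    answer = processor.batch_decode(
        output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )[0]
    print(answer)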

Sunday, September 7, 2025 | 4 minute read
Distributed Training on ALCF Polaris

Overview

Polaris is a high-performance computing (HPC) system at the Argonne Leadership Computing Facility (ALCF) that provides robust support for distributed training workflows and advanced scientific computing applications. This post goes through how to train a deep learning model in a distributed, data-parallel fashion using Hugging Face Accelerate, a library that simplifies distributed training across multiple GPUs and nodes.

Prerequisites

Before starting this tutorial, ensure you have:
  • ALCF Account: an active account with access to the Polaris system
  • Project Allocation: a computing time allocation on a project (you’ll need the project name)
  • MFA Setup: a CRYPTOCard or MobilePASS+ token configured for authentication
  • Basic Knowledge: familiarity with SSH, the Linux command line, and Python virtual environments
  • Python Experience: an understanding of deep learning concepts and PyTorch/Transformers

What is DeepSpeed?

DeepSpeed is Microsoft’s deep learning optimization library that enables efficient distributed training. It provides memory optimization techniques like ZeRO (Zero Redundancy Optimizer) and supports large model training across multiple GPUs and nodes with minimal code changes.
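For orientation, here is a minimal Hugging Face Accelerate training-loop sketch. It is a toy example rather than the full Polaris recipe from the post: the tiny model and random dataset are placeholders, and it assumes you have run accelerate config (optionally selecting DeepSpeed/ZeRO) and start the script with accelerate launch on the allocated nodes.

    # Hedged sketch of a Hugging Face Accelerate training loop. The model and
    # dataset are placeholders; the distributed/DeepSpeed configuration comes
    # from `accelerate config`, and the script is started with `accelerate launch`.
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from accelerate import Accelerator

    def main():
        accelerator = Accelerator()  # reads the distributed setup from the launcher

        model = torch.nn.Linear(128, 2)  # stand-in for a real model
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
        dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,)))
        loader = DataLoader(dataset, batch_size=32, shuffle=True)

        # prepare() wraps everything for DDP, DeepSpeed ZeRO, mixed precision, etc.
        model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

        model.train()
        for epoch in range(3):
            for features, labels in loader:
                optimizer.zero_grad()
                loss = torch.nn.functional.cross_entropy(model(features), labels)
                accelerator.backward(loss)  # use this instead of loss.backward()
                optimizer.step()
            accelerator.print(f"epoch {epoch}: last loss {loss.item():.4f}")

    if __name__ == "__main__":
        main()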

Saturday, September 6, 2025 | 4 minute read