June 11, 2024

PyTorch Foundation Welcomes New Executive Director

The PyTorch Foundation is excited to welcome Matt White, our new executive director. The PyTorch Foundation was formed in 2022 with the goal of driving adoption of AI tooling by fostering and sustaining an ecosystem of open source, vendor-neutral projects with PyTorch. Over the past two years, we've seen excellent growth across the project, with increases in both contributors and members.

Read More

June 06, 2024

INT4 Decoding GQA CUDA Optimizations for LLM Inference

Efficient Grouped-Query Attention decoding with a low-precision KV cache
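
For readers who want a rough picture of what this computes, below is a minimal, hedged sketch of GQA decoding against a quantized KV cache. It uses int8 per-row quantization as a stand-in for the post's INT4 scheme and plain PyTorch ops rather than the custom CUDA kernels described in the post; all names and shapes are illustrative assumptions.

```python
# Illustrative sketch only: GQA decoding with a quantized KV cache.
# int8 stands in for the post's INT4; shapes and names are assumptions.
import torch
import torch.nn.functional as F

def quantize_rows(x):
    # Symmetric per-row int8 quantization: int8 values plus a float scale.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(x / scale).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.to(torch.float32) * scale

B, H_q, H_kv, T, D = 2, 8, 2, 16, 64     # 8 query heads share 2 KV heads (groups of 4)
q = torch.randn(B, H_q, 1, D)            # single decode-step query
k = torch.randn(B, H_kv, T, D)           # KV cache, kept quantized in memory
v = torch.randn(B, H_kv, T, D)

k_q, k_s = quantize_rows(k)
v_q, v_s = quantize_rows(v)

# Dequantize and expand KV heads so each group of query heads reads its shared KV head.
k_deq = dequantize(k_q, k_s).repeat_interleave(H_q // H_kv, dim=1)
v_deq = dequantize(v_q, v_s).repeat_interleave(H_q // H_kv, dim=1)

out = F.scaled_dot_product_attention(q, k_deq, v_deq)
print(out.shape)  # torch.Size([2, 8, 1, 64])
```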

Read More

June 04, 2024

Ready, Set, Contribute: PyTorch Docathon Kickoff H1 2024

The PyTorch Docathon is now live! This event is dedicated to enhancing the quality of the PyTorch documentation with the invaluable assistance of our community. Our hope with this Docathon is to make it easier for new users to get started with PyTorch, guide them in effectively utilizing its features, and ultimately expedite the transition from research to production in machine learning.

Read More

May 21, 2024

The Path to Achieve PyTorch Performance Boost on Windows CPU

PyTorch's lower CPU performance on Windows compared to Linux has been a significant issue. Multiple factors contribute to this performance disparity. Through our investigation, we've identified one of the primary causes: the default malloc memory allocator on Windows.
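
The full post walks through the investigation in detail. As a rough, hedged illustration of the kind of allocator-sensitive overhead involved, the sketch below times an op that allocates a fresh output tensor on every call against one that writes into a preallocated buffer; the sizes are arbitrary and the results depend entirely on your platform and allocator.

```python
# Illustrative only: a tiny microbenchmark of allocation-heavy vs. preallocated
# CPU ops. Results vary by platform and allocator; this is not the post's methodology.
import torch
from torch.utils.benchmark import Timer

x = torch.randn(1024, 1024)
y = torch.randn(1024, 1024)

# Each call allocates a fresh output tensor, exercising the system allocator.
t_alloc = Timer(stmt="x + y", globals={"x": x, "y": y}).blocked_autorange()

# Writing into a preallocated buffer avoids the per-call allocation.
out = torch.empty_like(x)
t_prealloc = Timer(
    stmt="torch.add(x, y, out=out)",
    globals={"torch": torch, "x": x, "y": y, "out": out},
).blocked_autorange()

print(f"fresh allocation per call: {t_alloc.median * 1e6:.1f} us")
print(f"preallocated output:       {t_prealloc.median * 1e6:.1f} us")
```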

Read More

May 21, 2024

Maximizing Training Throughput Using PyTorch FSDP and Torch.compile

Recently, we demonstrated how FSDP and selective activation checkpointing can be used to achieve 57% MFU (Model Flops Utilization) when training a 7B model on A100 GPUs. We also demonstrated how this setup can train a high-quality model, which we open sourced as the Granite 7B base model on the Hugging Face Hub under the Apache v2.0 license.
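
As a rough sketch of how these ingredients fit together (not the authors' actual training stack), the example below wraps a toy model with selective activation checkpointing, FSDP, and torch.compile. The model, the "every other block" selection, and all sizes are placeholder assumptions, and it is meant to be launched with torchrun.

```python
# Hedged sketch: selective activation checkpointing + FSDP + torch.compile
# on a toy model. Launch with: torchrun --nproc_per_node=1 this_script.py
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing, checkpoint_wrapper,
)

class Block(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
    def forward(self, x):
        return x + self.net(x)

def main():
    dist.init_process_group("nccl" if torch.cuda.is_available() else "gloo")
    device = "cuda" if torch.cuda.is_available() else "cpu"
    if torch.cuda.is_available():
        torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", 0)))

    model = nn.Sequential(*[Block() for _ in range(8)]).to(device)

    # Selective activation checkpointing: recompute only every other block
    # (an arbitrary choice here, purely for illustration).
    blocks = [m for m in model.modules() if isinstance(m, Block)]
    selected = set(blocks[::2])
    apply_activation_checkpointing(
        model,
        checkpoint_wrapper_fn=checkpoint_wrapper,
        check_fn=lambda m: m in selected,
    )

    # Shard parameters, gradients, and optimizer state, then compile.
    model = FSDP(model, use_orig_params=True)
    model = torch.compile(model)

    x = torch.randn(4, 256, device=device)
    model(x).sum().backward()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```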

Read More

May 15, 2024

Achieving Sustainability Goals with PyTorch and Intel AI

This post was contributed by Intel AI in partnership with the PyTorch Foundation.

Read More

May 14, 2024

Speeding up ViTs using Block Sparsity

TL;DR: We show promising results of up to a 1.46x speedup with a <2% drop in accuracy for float32 Vision Transformers on A100 GPUs by applying block sparsity to the MLP module's weights. This approach can potentially be applied to other types of transformers, including large language models. Our implementation and the benchmarks to reproduce our results are available at https://github.com/pytorch-labs/superblock.
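
As a hedged illustration of the idea (see the linked superblock repo for the actual implementation and benchmarks), the sketch below zeroes whole tiles of an MLP weight and stores it in PyTorch's Block Sparse Row (BSR) layout; the block size and sparsity level are arbitrary assumptions.

```python
# Illustrative only: block-sparsifying an MLP weight and converting it to BSR.
# Block size and sparsity level are arbitrary assumptions, not the post's settings.
import torch

torch.manual_seed(0)
out_features, in_features, block = 768, 768, 64
weight = torch.randn(out_features, in_features)

# Build a block-level mask: keep ~20% of (block x block) tiles, zero the rest.
n_blk_rows, n_blk_cols = out_features // block, in_features // block
block_mask = (torch.rand(n_blk_rows, n_blk_cols) < 0.2).float()
mask = block_mask.repeat_interleave(block, dim=0).repeat_interleave(block, dim=1)
sparse_weight = weight * mask

# Store in Block Sparse Row (BSR) format, which block-sparse kernels can exploit.
bsr = sparse_weight.to_sparse_bsr(blocksize=(block, block))

x = torch.randn(32, in_features)
y = x @ sparse_weight.t()                             # dense reference linear
print(y.shape)                                        # torch.Size([32, 768])
print(bsr.values().shape)                             # (num_kept_blocks, 64, 64)
print(torch.allclose(bsr.to_dense(), sparse_weight))  # True
```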

Read More