NEWS

ORNL Debuts JACC for Performance-Portable Julia on Four GPU Vendors

ORNL’s open-source JACC framework lets a single Julia codebase run on NVIDIA, AMD, Intel, and Apple GPUs. Early adopters include CERFACS and Riken.

Published

June 24, 2026

Logan Pierce

The Department of Energy’s Oak Ridge National Laboratory has shipped the first stable release of JACC, an open-source Julia framework that runs a single Julia codebase on NVIDIA, AMD, Intel, and Apple graphics processors alongside conventional CPUs. The framework is now publicly available, and the code lives on GitHub as part of the JuliaGPU organization. The release lands as accelerator diversity grows inside Top500 systems.

Short for Julia for ACCelerators, JACC follows a long scientific-computing tradition of vendor-neutral, performance-portable programming layers. ORNL says the framework replaces the vendor-specific kernels, undocumented workflows, and duplicated code paths that have forced Julia developers to commit to one GPU vendor at a time.

One Julia Codebase, Four GPU Vendors

JACC is the first high-level, performance-portable programming model for Julia built on the language’s just-in-time compilation and the LLVM compiler infrastructure, according to ORNL’s announcement. It targets GPUs from all four vendors (NVIDIA, AMD, Intel, and Apple) as well as Intel, AMD, and Arm CPUs.

Portability is the aim, per ORNL. Domain scientists write a single Julia file and have it execute on any of those targets. JACC is conceptually aligned with established performance-portable layers in other languages, including Kokkos, RAJA, and SYCL in C++ and OpenMP and OpenACC in C and Fortran. ORNL argues that Julia’s just-in-time compilation and metaprogramming expose capabilities, including multi-GPU node programmability, shared memory access, and asynchronous kernel execution, that are not necessarily available in those C++ and Fortran models.

Hosted on GitHub as part of the JuliaGPU organization, JACC builds directly on the existing CUDA, AMDGPU, Metal, and oneAPI backends developed by the Julia community, per the JACC.jl package and its source code. Developers pick their backend once, with “threads” as the default, and the rest of the code stays the same. ORNL says the framework keeps the Department of Energy science stack portable across the next round of system procurements.

By the Numbers

4 GPU vendors supported: NVIDIA, AMD, Intel, Apple
3 core API primitives: array, parallel_for, parallel_reduce
5 execution backends: threads, CUDA, AMDGPU, Metal, oneAPI

How the API Works

JACC’s public surface is small. It exposes three abstractions, array, parallel_for, and parallel_reduce, that let developers express parallelism and data movement in a portable way that still feels native to Julia users.

The default backend uses Julia’s multi-threaded CPU execution; developers switch to GPU execution by selecting one of the four other backends: CUDA, AMDGPU, Metal, or oneAPI. For non-expert users, the framework picks reasonable execution strategies and data layouts automatically, with minimal configuration. Performance specialists can instead drop down to low-level controls for thread blocks, synchronization, multi-GPU execution, streams, and shared memory. Both modes work on the same source code.

The choice of backend is stored in Julia’s LocalPreferences.toml file, and a companion macro, @init_backend, inserts the matching import statement at top-level scope, so a single Julia file moves from one HPC platform to another without code changes.
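To make the single-source idea concrete, here is a rough Python analogue of the pattern described above. It is not JACC code and does not reproduce JACC’s Julia API. The sketch assumes a kernel is written once as a function of an index, that parallel_for applies it across a range, and that parallel_reduce sums its results. The backend is chosen once from a preferences mapping standing in for LocalPreferences.toml. Only a thread-pool backend is implemented, and the names select_backend, axpy, and dot, along with the "backend" key, are hypothetical.

```python
# Hypothetical Python analogue of JACC's single-source model; this is NOT JACC's API.
from concurrent.futures import ThreadPoolExecutor


def _threads_parallel_for(n, kernel, *args):
    """Apply kernel(i, *args) to every index 0..n-1 on a thread pool."""
    with ThreadPoolExecutor() as pool:
        # list() waits for every task and re-raises any kernel error.
        list(pool.map(lambda i: kernel(i, *args), range(n)))


def _threads_parallel_reduce(n, kernel, *args):
    """Sum kernel(i, *args) over indices 0..n-1; map() keeps index order."""
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(lambda i: kernel(i, *args), range(n)))


# Backend registry. Only "threads" exists in this sketch; GPU backends
# (CUDA, AMDGPU, Metal, oneAPI in JACC) would register their own pair.
BACKENDS = {"threads": (_threads_parallel_for, _threads_parallel_reduce)}


def select_backend(prefs):
    """Pick a backend once from a preferences mapping (a stand-in for
    LocalPreferences.toml), defaulting to "threads" as JACC does."""
    name = prefs.get("backend", "threads")
    if name not in BACKENDS:
        raise ValueError(f"backend {name!r} is not available in this sketch")
    return BACKENDS[name]


# The backend is chosen once; the kernels below never mention it.
parallel_for, parallel_reduce = select_backend({})


def axpy(i, alpha, x, y):
    # Each index writes a distinct element, so threads never collide.
    x[i] += alpha * y[i]


def dot(i, x, y):
    return x[i] * y[i]


x = [1.0, 1.0, 1.0, 1.0]
y = [2.0, 2.0, 2.0, 2.0]
parallel_for(len(x), axpy, 0.5, x, y)
result = parallel_reduce(len(x), dot, x, y)
print(x, result)  # [2.0, 2.0, 2.0, 2.0] 16.0
```

The backend lookup happens once, at module load, so kernels such as axpy and dot never branch on the target. That mirrors the article’s point that only the backend selection changes between platforms.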

JACC Backend Support

Feature	CPU (threads)	CUDA	AMDGPU	Metal	oneAPI
Float64	Yes	Yes	Yes	No	If supported
Multi-GPU	N/A	Yes	No	No	Yes
Shared memory	N/A	Yes	Yes	Yes	Yes
@atomic	Yes	Yes	Yes	Yes	Yes
rand in kernels	Yes	Yes	Yes	Yes	No

Why the Lab Built It

The push for JACC sits inside a broader Department of Energy bet on vendor-neutral computing. As accelerator diversity grows inside Top500 systems, software that locks to a single GPU vendor at the source level becomes a maintenance liability that few research teams can keep up with.

Julia has been gaining ground in scientific computing for its expressive syntax, high-level productivity, and strong performance. At leadership-class scale, however, Julia developers seeking GPU performance have historically had to rely on vendor-specific programming models, undocumented workflows, or duplicated code paths. JACC was designed, per the lab, to remove those barriers. Patrick Diehl, a research scientist at Los Alamos National Laboratory, framed it as a continuation of an established DOE tradition of contributions to vendor-neutral computing as Julia adoption expands into AI and quantum science.

ORNL says JACC is built for the heterogeneous CPU-plus-accelerator systems that increasingly populate the Top500 supercomputer list. The lab argues that scientific applications must be prepared to run on rapidly evolving, vendor-diverse systems. The release lists three outcomes the framework targets: reducing software complexity, improving maintainability, and lowering the cost of adapting applications to current and future leadership-class platforms.

JACC also brings Julia closer to peer languages that already have established performance-portable layers: C++ has Kokkos, RAJA, and SYCL, and C and Fortran have OpenMP and OpenACC. Until JACC, Julia had community efforts but no broadly adopted equivalent.

Running in Production Outside ORNL

Early adopters outside ORNL are already using the framework. The adopters the lab lists span Europe, Japan, and the United States, and they include a production CFD code, university research groups, and a Japanese national lab scoping out its next leadership-class system.

Early Adopters Outside ORNL

CERFACS, the European Center for Research and Advanced Training in Scientific Computing, using JACC for its flagship Lattice Boltzmann code, BLAST
The University of Tokyo
The New Jersey Institute of Technology
Riken, exploring programming systems for the upcoming FugakuNEXT supercomputer

Our flagship Lattice Boltzmann code BLAST was quickly running on NVIDIA, AMD, Intel and Apple GPUs and multi-threaded CPUs, with competitive performance and without maintaining separate backend-specific kernels.

Jean-François Boussuge, head of the Advanced Aerodynamics and Multiphysics team at CERFACS, in the ORNL release.

Riken’s focus is FugakuNEXT. Hitoshi Murai, Software Development Technology Unit Leader for the Riken Center for Computational Science, said the lab is exploring JACC for portable CPU and GPU programming as it transitions to the next system. The exploration is also consistent with the spirit of the long-standing U.S. Department of Energy and Japan MEXT collaboration on advanced computing, per the ORNL release.

The Gaps the Support Table Shows

As a first stable release, JACC has rough edges, and the project’s own support table documents them. Double-precision floating point, the workhorse of scientific simulation, is not supported on Apple’s Metal backend, and the AMDGPU backend lacks multi-GPU execution. Intel’s oneAPI support lists Float64 as conditional (“if supported”) and marks random-number generation inside kernels as unsupported. Shared memory and @atomic are available on all four GPU backends (CUDA, AMDGPU, Metal, and oneAPI), and random-number generation inside kernels works on the CPU, CUDA, AMDGPU, and Metal backends. Continuous integration runs on physical hardware: x86 and Arm hosts, an NVIDIA RTX A4000, an NVIDIA GTX 1080, an AMD MI100, an Apple M1, and an Intel A770.

Going by the support table, workloads that depend on both Float64 and multi-GPU scaling, such as CFD and molecular dynamics, would hit the Metal and AMDGPU limits first. Only CUDA lists unconditional support for both features. oneAPI supports multi-GPU execution but marks Float64 as device-dependent.
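The support-table reasoning above can be checked mechanically. The sketch below encodes the JACC backend support table as printed in this article and lists the backends that support every feature a workload needs. Two choices are assumptions of this sketch: “If supported” counts only when explicitly allowed, and “N/A” always counts as unsupported. The helper name backends_supporting is hypothetical.

```python
# JACC backend support table as printed in the article: feature -> status per backend.
BACKEND_COLUMNS = ["CPU (threads)", "CUDA", "AMDGPU", "Metal", "oneAPI"]

SUPPORT = {
    "Float64":         ["Yes", "Yes", "Yes", "No",  "If supported"],
    "Multi-GPU":       ["N/A", "Yes", "No",  "No",  "Yes"],
    "Shared memory":   ["N/A", "Yes", "Yes", "Yes", "Yes"],
    "@atomic":         ["Yes", "Yes", "Yes", "Yes", "Yes"],
    "rand in kernels": ["Yes", "Yes", "Yes", "Yes", "No"],
}


def backends_supporting(features, allow_conditional=False):
    """Return backends whose status is "Yes" for every requested feature.

    With allow_conditional=True, device-dependent "If supported" also counts.
    "No" and "N/A" always count as unsupported.
    """
    accepted = {"Yes", "If supported"} if allow_conditional else {"Yes"}
    return [
        backend
        for col, backend in enumerate(BACKEND_COLUMNS)
        if all(SUPPORT[feature][col] in accepted for feature in features)
    ]


print(backends_supporting(["Float64", "Multi-GPU"]))                          # ['CUDA']
print(backends_supporting(["Float64", "Multi-GPU"], allow_conditional=True))  # ['CUDA', 'oneAPI']
```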

DOE Funding and the Roadmap

JACC sits inside a wider Department of Energy pipeline. Funding flows through the Department of Energy’s Advanced Scientific Computing Research (ASCR) program, with current support from the S4PST project under the Next Generation of Scientific Software Technologies program, which also sponsors the Consortium for the Advancement of Scientific Software. Earlier support came from ASCR’s Bluestone X-Stack and the Exascale Computing Project’s PROTEAS-TUNE effort, and the MAGMA and Fairbanks projects are listed as additional sponsors, per the project page.

The contributing institutions span Oak Ridge, Lawrence Berkeley, and Argonne national laboratories, per the SC’24 paper on JACC’s metaprogramming design, which credits Pedro Valero-Lara, William F. Godoy, Het Mankad, Keita Teranishi, Jeffrey S. Vetter, Johannes Blaschke, and Michel Schanen.

Recognition is also arriving. JACC capabilities were presented at the SC’24 XLOOP workshop, where the paper “Integrating ORNL’s HPC and Neutron Facilities with a Performance-Portable CPU/GPU Ecosystem” won a best paper award. Julian Samaroo, a research software engineer at MIT and one of the original developers of JuliaGPU’s AMDGPU.jl, credited the JuliaGPU community in the ORNL release with making JACC possible through its unified, backend-first development model. The roadmap names JACC.BLAS, for kernel-level linear algebra routines, and broader scientific application integration as near-term priorities, per the ORNL release.