Extension for CUDA.jl

This is an extension to support GPU (CUDA.jl) acceleration for solving the time evolution and spectra. This improves the execution time and memory usage especially when the HEOMLS matrix is super large.

Compat

The described feature requires Julia 1.9+.

The functions of calculating time evolution (only supports ODE method with time-independent system Hamiltonian) and spectra will automatically choose to solve on CPU or GPU depend on the type of the sparse matrix in M::AbstractHEOMLSMatrix objects (i.e., the type of the field M.data).

typeof(M.data) <:   SparseMatrixCSC # solve on CPU
typeof(M.data) <: CuSparseMatrixCSC # solve on GPU

Therefore, we wrapped several functions in CUDA and CUDA.CUSPARSE in order to return a new HEOMLS-matrix-type object with M.data is in the type of CuSparseMatrix, and also change the element type into ComplexF32 and Int32 (since GPU performs better in this type). The functions are listed as follows:

  • cu(M::AbstractHEOMLSMatrix) : Translate M.data into the type CuSparseMatrixCSC{ComplexF32, Int32}
  • CuSparseMatrixCSC(M::AbstractHEOMLSMatrix) : Translate M.data into the type CuSparseMatrixCSC{ComplexF32, Int32}

Demonstration

The extension will be automatically loaded if user imports the package CUDA.jl :

using HierarchicalEOM
using LinearSolve # to change the solver for better GPU performance
using CUDA
CUDA.allowscalar(false) # Avoid unexpected scalar indexing

Setup

Here, we demonstrate this extension by using the example of the single-impurity Anderson model.

ϵ  = -5
U  = 10
Γ  = 2
μ  = 0
W  = 10
kT = 0.5
N  = 5
tier  = 3

tlist = 0:0.1:10
ωlist = -10:1:10

σm = [0 1; 0  0]
σz = [1 0; 0 -1]
II = [1 0; 0  1]
d_up = kron(     σm, II)
d_dn = kron(-1 * σz, σm)
ρ0   = kron([1 0; 0 0], [1 0; 0 0])
Hsys = ϵ * (d_up' * d_up + d_dn' * d_dn) + U * (d_up' * d_up * d_dn' * d_dn)

bath_up = Fermion_Lorentz_Pade(d_up, Γ, μ, W, kT, N)
bath_dn = Fermion_Lorentz_Pade(d_dn, Γ, μ, W, kT, N)
bath_list = [bath_up, bath_dn]

# even HEOMLS matrix
M_even_cpu = M_Fermion(Hsys, tier, bath_list)
M_even_gpu = cu(M_even_cpu)

# odd HEOMLS matrix
M_odd_cpu  = M_Fermion(Hsys, tier, bath_list, ODD)
M_odd_gpu  = cu(M_odd_cpu)

# solve steady state with CPU
ados_ss = steadystate(M_even_cpu);
Note

This extension does not support for solving stationary state on GPU since it is not efficient and might get wrong solutions. If you really want to obtain the stationary state with GPU, you can repeatedly solve the time evolution until you find it.

Solving time evolution with CPU

ados_list_cpu = HEOMsolve(M_even_cpu, ρ0, tlist)

Solving time evolution with GPU

ados_list_gpu = HEOMsolve(M_even_gpu, ρ0, tlist)

Solving Spectrum with CPU

dos_cpu = DensityOfStates(M_odd_cpu, ados_ss, d_up, ωlist)

Solving Spectrum with GPU

dos_gpu = DensityOfStates(M_odd_gpu, ados_ss, d_up, ωlist; solver=KrylovJL_BICGSTAB(rtol=1f-10, atol=1f-12))