Memory Optimization with Lazy Operators

Introduction

HierarchicalEOM.jl provides a memory-efficient lazy evaluation feature for HEOM Liouvillian superoperator (HEOMLS) matrices using SciMLOperators.TensorProductOperator. This feature can significantly reduce memory usage for large systems.

By default, a HEOMLS matrix is assembled as a single large sparse matrix with space complexity (memory scaling) as:

\[\mathcal{O}(N_{\text{ADO}}^2 \cdot d^4)\]

where $d$ is the system dimension, and $N_{\text{ADO}}$ is the number of auxiliary density operators (ADOs) which grows rapidly with hierarchy tier and number of bath terms.

Instead of assembling HEOMLS into a single large sparse matrix, the lazy operator feature represents it as a sum of Kronecker products:

\[\hat{\mathcal{M}} = \sum_i A_i \otimes B_i,\]

where $B_i$ are the system coupling superoperators, and $A_i$ encodes the hierarchy (ADO) connectivity pattern together with the corresponding prefactors of $B_i$. In this case, the space complexity (memory allocation) is reduced to

\[\mathcal{O}(N_{\text{ADO}}^2 + d^4),\]

which significantly lowers the memory requirements for high-dimensional simulations. The key insight is that the $B_i$ matrices (system coupling superoperators) are small (size $d^2 \times d^2$) and shared across many couplings between ADOs, while only the $A_i$ matrices (hierarchy connectivity patterns) need to scale with $N_{\text{ADO}}^2$.
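To make this concrete, here is a minimal, self-contained sketch (not taken from HierarchicalEOM.jl) in which A and B are arbitrary placeholder matrices standing in for a hierarchy connectivity pattern $A_i$ and a system coupling superoperator $B_i$: the lazy SciMLOperators.TensorProductOperator keeps the two factors separate, while kron materializes the full product, yet both act identically on a vector.

using SparseArrays
using SciMLOperators

A = sprand(500, 500, 0.01)  # placeholder for a hierarchy connectivity pattern A_i
B = sprand(4, 4, 0.5)       # placeholder for a small system coupling superoperator B_i

L_materialized = kron(A, B)                                                   # full Kronecker product
L_lazy         = TensorProductOperator(MatrixOperator(A), MatrixOperator(B))  # stores A and B separately

v = rand(size(L_materialized, 2))
L_materialized * v ≈ L_lazy * v   # true: identical action on a vector

# compare the memory footprints of the two representations
Base.summarysize(L_materialized), Base.summarysize(L_lazy)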

Memory savings scale with the number of ADOs, making this especially beneficial for:

  • High hierarchy tiers
  • Multiple bath terms
  • Large system dimensions
  • Long-time dynamics requiring high accuracy

The assemble keyword argument

The assemble keyword argument controls how the HEOMLS matrix is constructed. It is available in the HEOMLS matrix constructors (such as M_Boson used below) and accepts the following values:

assemble = Val(:full) - Full Sparse Matrix (Default)

Assembles the complete HEOMLS matrix as a single sparse matrix. This is the default behavior for backward compatibility.

L_full = M_Boson(Hsys, tier, bath; assemble = Val(:full))

assemble = Val(:combine) - Combined Lazy Operators (Recommended)

Combines terms with identical system coupling superoperators ($B_i$) but does not materialize the full matrix. This provides excellent memory savings while maintaining computational efficiency. The lazy representation can leverage matrix-matrix multiplication patterns, which are more parallelizable than the sparse matrix-vector products used by full assembly.

L_combine = M_Boson(Hsys, tier, bath; assemble = Val(:combine))

Recommended for most applications, especially on memory-constrained devices.
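The effect of combining can be illustrated with a small, self-contained sketch (the matrices below are arbitrary placeholders, not HEOMLS internals): two tensor-product terms that share the same system coupling superoperator B are mathematically equivalent to a single term whose hierarchy part is the sum A1 + A2, so grouping them halves the number of terms that must be applied.

using SparseArrays
using SciMLOperators

A1 = sprand(100, 100, 0.02)  # placeholder hierarchy connectivity patterns
A2 = sprand(100, 100, 0.02)
B  = sprand(4, 4, 0.5)       # shared system coupling superoperator

L_separate = TensorProductOperator(MatrixOperator(A1), MatrixOperator(B)) +
             TensorProductOperator(MatrixOperator(A2), MatrixOperator(B))      # two lazy terms
L_combined = TensorProductOperator(MatrixOperator(A1 + A2), MatrixOperator(B)) # one lazy term

v = rand(400)
L_separate * v ≈ L_combined * v   # true: same action, fewer terms to evaluate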

assemble = Val(:none) - Individual Lazy Operators (For Sparsity Analysis)

Keeps all tensor product terms separate without any combination. This mode actually uses more memory than Val(:combine) because Val(:combine) reduces the number of terms in the SciMLOperators.AddedOperator by grouping terms with identical system coupling superoperators. However, Val(:none) provides flexibility for investigating sparsity patterns and analyzing the structure of individual tensor product terms.

L_none = M_Boson(Hsys, tier, bath; assemble = Val(:none))
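Since the lazy HEOMLS is described above as a SciMLOperators.AddedOperator, its separate terms can in principle be inspected directly. The sketch below assumes the operator is stored in the data field and relies on the internal ops field of AddedOperator, which is an implementation detail that may change between versions; treat it only as a starting point for sparsity analysis.

# count how many tensor-product terms each lazy mode keeps (internal field access)
n_terms_none    = length(L_none.data.ops)
n_terms_combine = length(L_combine.data.ops)

# materialize a single term as a matrix to inspect its sparsity pattern
term1 = convert(AbstractMatrix, L_none.data.ops[1])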

Demonstrating Memory Savings

using HierarchicalEOM
using SparseArrays # for nnz

# spin-boson model
Hsys = 1.0 * sigmaz() + 0.25 * sigmax()

Δ  = 0.01 # coupling strength
W  = 0.5  # band-width of the environment
kT = 0.5  # the product of the Boltzmann constant k and the absolute temperature T
N  = 3    # number of exponential terms for the Padé expansion
baths = [
    Boson_DrudeLorentz_Pade(sigmax(), Δ, W, kT, N),
    Boson_DrudeLorentz_Pade(sigmax(), Δ, W, kT, N),
]

# Create matrices with different assembly modes
tier = 10
L_full    = M_Boson(Hsys, tier, baths; assemble=Val(:full),    verbose=false)
L_combine = M_Boson(Hsys, tier, baths; assemble=Val(:combine), verbose=false)
L_none    = M_Boson(Hsys, tier, baths; assemble=Val(:none),    verbose=false)

Use Base.summarysize to measure and compare the memory usage of each representation (converted to MB):

mem_full = Base.summarysize(L_full.data) / 1024^2
mem_combine = Base.summarysize(L_combine.data) / 1024^2
mem_none = Base.summarysize(L_none.data) / 1024^2

print(
    "System dimension (d)     : $(size(Hsys, 1))\n",
    "Number of ADOs   (N_ADO) : $(L_full.N)\n",
    "L_full matrix size       : $(size(L_full, 1)) × $(size(L_full, 2))\n",
    "L_full non-zero elements : $(nnz(L_full.data.A))\n",
    "\n",
    "Memory usage:\n",
    "  full    : $(round(mem_full, digits=3)) MB\n",
    "  combine : $(round(mem_combine, digits=3)) MB\n",
    "  none    : $(round(mem_none, digits=3)) MB\n",
    "Memory reduction:\n",
    "  combine vs full : $(round(100 * (1 - mem_combine/mem_full), digits=1))%\n",
    "   none   vs full : $(round(100 * (1 - mem_none/mem_full), digits=1))%\n",
)
System dimension (d)     : 2
Number of ADOs   (N_ADO) : 43758
L_full matrix size       : 175032 × 175032
L_full non-zero elements : 3636774

Memory usage:
  full    : 93.477 MB
  combine : 15.359 MB
  none    : 16.027 MB
Memory reduction:
  combine vs full : 83.6%
   none   vs full : 82.9%

In this example, the lazy representations reduce memory usage by more than 80% compared with the fully assembled sparse matrix. The savings grow further with higher tiers and more ADOs.
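Assuming the lazy HEOMLS matrices plug into the solvers in the same way as the fully assembled one (the assemble keyword only changes how the matrix is stored), a usage sketch could look like the following; the initial state ψ0 and the time list are placeholders chosen purely for illustration.

ψ0    = basis(2, 0)   # placeholder initial state of the two-level system
tlist = 0:0.5:10      # placeholder list of times
sol   = HEOMsolve(L_combine, ψ0, tlist; verbose = false)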