# The linear solver

The linear solver implemented in FEAT2 is designed as a black-box preconditioning approach. It covers single-grid as well as multigrid solvers, and iterative solvers can arbitrarily be combined with other solvers as preconditioners. Boundary conditions that stem from a finite element discretisation can be imposed via so-called "filter techniques", i.e., defect vectors are filtered at specific steps of the algorithms to receive the boundary conditions.

To give an introduction to the solver library, one first has to realise a very basic fact: *All linear solvers can be expressed as preconditioners.*

## The preconditioner approach

Let an arbitrary linear system be given, expressed as $$Ax=b,$$ for a matrix $A\in\mathbb{R}^{n\times n}$ and vectors $x$, $b\in\mathbb{R^n}$. Furthermore, let $P:\mathbb{R}^{n}\to \mathbb{R}^{n}$ be some mapping in the $\mathbb{R}^{n}$. If there is $P=A$, a solution $x^{*}$ of the linear system can be calculated by one step of a defect correction loop: $$x^{*} := x + P^{-1}(b-Ax),$$ or equivalently in three steps,

$$\begin{align} d :=& b-Ax \\ P y =& d \\ x^{*} :=& x + y \end{align}$$

for a *defect* vector $d\in\mathbb{R}^{n}$ and a *correction* vector $y\in\mathbb{R}^{n}$. At this point there are some things to note:

The system $P y = d$ is a "homogeneous" linear system. Assume, that there is an index set $I\subset\{1,...,n\}$ identifying the degrees of freedom on the boundary, and that $$x_i = b_i = g_i \qquad\text{for all $i\in I$}$$ for a vector $g\in\mathbb{R^n}$ which encapsules the degrees of freedom on the boundary. The rows $I$ in $A$ replaced by unit vectors. (This is the usual implementation of Dirichlet boundary conditions in FEAT2.) Then, there is necessarily $d_i=0$ in the defect vector for all $i\in I$, and the correction vector $y$ has to fulfil $y_i=0$ for all $i\in I$ as well.

The mapping $P^{-1}:d\mapsto y$ is a

*preconditioner*. It is applied to a defect vector $d$ and provides a correction vector $y$ for the solution. It only has to deal with "homogeneous" boundary conditions.The mapping $P^{-1}$ can be any approximation to the matrix $A^{-1}$. In particular, $P^{-1}$ can be expressed by any direct or iterative solver.

The last point needs additional clarification. In the above approach, every type of preconditioner (Jacobi, Gauss-Seidel,...) can be imposed as $P$. On the other hand, every linear solver which is able to solve a linear system $Ax=b$ is also capable of solving the linear system $Py=d$: One provides the matrix $A$ as system matrix and the vector $d$ instead of $b$ as right-hand side. A direct solver will compute $y$, an iterative solver an approximation to $y$.

Remark:The preconditioning system $$Py=d$$ has a very specific style. It has to deal with "homogeneous boundary conditions", i.e., with defect and correction vectors instead of right-hand side and solution vectors $x$ and $b$, respectively. This allows to do nesting with linear solvers.

## Nested solvers

Due to the preconditioning approach, linear solvers can be arbitrarily nested while still keeping a black-box style. The "outer" linear solver is always one step of the above defect correction loop. All other linear solvers are realised as preconditioners in this outer solver -- and thus, all solvers can serve as preconditioners in other linear solvers. Here some examples:

**Jacobi-Solver: DefCorr(Jac)**- A Jacobi solver is a defect correction loop with a Jacobi preconditioner. It is realised in the form $$ y_{n+1} := y_n + \omega D^{-1} (d - A y_n) $$ with $D$ the diagonal of the matrix $A$ and $\omega>0$ a damping parameter. The sequence $y_n$ converges to $y$ which is used to compute the final solution $x$, see above. To be more specific, the solver is realised in three steps: $$\begin{align} \tilde d :=& d-Ay_n \\ P_D \tilde y =& \tilde d \\ y_{n+1} :=& y_n + \omega \tilde y \end{align}$$ with $P_D:=D$ the Jacobi preconditioner.
**DefCorr(4BiCGStab(ILU))**-
This is an example of three solver levels being nested.

The outer iteration is a standard defect correction loop: $$ y_{n+1} := y_n + P_{\text{4BiCGStab}} (d - A y_n) $$

The preconditioner $P_{\text{4BiCGStab}}$ solves a local system $$ P_{\text{4BiCGStab}} \tilde y = (d - A y_n) $$ to get a correction $\tilde y$ for $y_n$. This is done by applying four steps of a BiCGStab algorithm.

BiCGStab however can internally be preconditioned by an arbitrary preconditioner. In this case, the preconditioner is the ILU preconditioner $P_{\text{ILU}}$, which solves a linear system approximately: $$ P_{\text{ILU}} y_{\text{B}} = d_{\text{B}} $$ Here, $d_{\text{B}}$ is a defect vector computed by BiCGStab and $y_{\text{B}}$ the correction vector that BiCGStab is going to use.

## The three solution phases

In FEAT2, solving a linear system always decomposes into three solution phases. These phases are motivated by modern Gauss elimination solvers and proved useful:

Symbolic decomposition

During the symbolic decomposition phase, the matrix entries are not available. Linear solvers can prepare all data which is not bound to the matrix entries. This involves allocation of temporary vectors on the one hand, but also a symbolic factorisation/pattern analysis on the other hand (e.g., for LU decomposition solvers). The symbolic decomposition is usually applied only once and repeated only if the structure of the matrix changes (e.g. due to local or global mesh refinement).

Numeric decomposition

In this phase, matrix entries are available. Numerical algorithms can apply any preparation in-advance which is necessary to be applied to a specific linear system. This involves, in particular, numerical factorisation like the computation of actual entries in a LU decomposition. The numerical decomposition has only to be repeated if the entries of a matrix change.

Solve

In the final phase, the numerical decomposition is applied to actually solve a linear system. This can be a time-consuming task (e.g., for CG algorithms or complex, nested solvers) or can be very fast (e.g. in the LU decomposition, where all the numerical work is done in the numerical decomposition).

## Using the linear solver library - the basics

The linear solver library is encapsuled in the module `linearsolver.f90`

. For setting up a linear system and solving it, one has to apply a specific sequence of commands which is summarised in the following piece of code:

```
call linsol_initXXXX (...) ! Setting up a solver tree
...
call linsol_setMatrix (...) ! Define the system matrix
call linsol_initStructure (...) ! Symbolic decomposition
call linsol_initData (...) ! Numeric decomposition
call linsol_solveAdaptively (...) ! Solve Ax=b
call linsol_doneData (...) ! Clean up numerical data
call linsol_doneStructure (...) ! Clean up symbolical data
call linsol_done (...) ! Release the solver
```

The code decomposes into the following parts:

- With
`linsol_initXXXX`

(with "XXXX" replaced by a name of a solver), a "solver tree" is set up. In the simplest case, a solver tree consists of only one (e.g., Gauss elimination) solver. However, more complicated nesting of solvers is possible by defining a solver as preconditioner of another solver, see below. - The routine
`linsol_setMatrix`

is used to attach a system matrix $A$ from the underlying linear system $Ax=b$ to the solver. - The two commands
`linsol_initStructure`

and`linsol_initData`

apply a symbolic and numeric factorisation of the matrix. They prepare the linear solver for being invoked. - The actual system is solved using
`linsol_solveAdaptively`

. - After solving the system,
`linsol_doneData`

and`linsol_doneStructure`

clean up the data that is allocated/created by the corresponing "init" routines. The final`linsol_done`

releases the linear solver.

**Remarks**:

The solver library is designed to work with tensor product spaces, so all matrices and vectors must be "block matrices" and "block vectors".

The

`linsol_initXXXX`

used for creation of a solver always return a pointer to a solver structure on the heap. Using this technique, a simple`linsol_done()`

at the end will automatically release all solvers and preconditioners in a solver tree automatically.

**Example:** The following example demonsrtates how to set up a Gauss elimination solver and solve a system $Ax=b$, where $A$ is given as a matrix `rmatrix`

, $b$ is given as a vector `rrhs`

and $x$ is given as `rx`

; the additional vector `rd`

is a temporary vector which has to be provided to the linear solver.

```
type(t_matrixBlock) :: rmatrix
type(t_vectorBlock) :: rrhs, rx, rd
type(t_linsolNode), pointer :: p_rlinsol
integer :: ierror
...
! Set up the Gauss elimination solver
call linsol_initUMFPACK4 (p_rsolver)
! Define the matrix A
call linsol_setMatrix (p_rsolver,rmatrix)
! Symbolic factorisation
call linsol_initStructure (p_rsolver,ierror)
if (ierror .ne. LINSOL_ERR_NOERROR) then
call output_line ("Error in symbolic factorisation")
call sys_halt()
end if
! Numeric factorisation
call linsol_initData (p_rsolver,ierror)
if (ierror .ne. LINSOL_ERR_NOERROR) then
call output_line ("Error in numeric factorisation")
call sys_halt()
end if
! Solve
call linsol_solveAdaptively (p_rsolver,rx,rb,rd)
! Clean up
call linsol_doneData (p_rsolver)
call linsol_doneStructure (p_rsolver)
call linsol_done (p_rsolver)
```

In the beginning, the UMFPACK Gauss elimination solver is set up by `linsol_initUMFPACK4`

. After attaching the matrix, the symbolic and numeric factorisation is invoked. These two routines return an error code `ierror`

which is `LINSOL_ERR_NBOERROR`

if everything works well. If not, the program can be aborted, e.g., by calling `sys_halt()`

. Afterwards, the system is solved and the structures are cleaned uo.

## Using the linear solver as a preconditioner

All linear solvers in the library are set up as preconditioners. As a consequence, they are ideally designed to work as a linear preconditioner in a nonlinear loop. This can be done by invoking `linsol_precondDefect`

instead of `linsol_solveAdaptively`

. The routine `linsol_precondDefect`

replaces a given defect by a correction vector obtained by a linear solver.

The following example demonstrates the use of the linear solver in a nonlinear loop. We want to solve a semilinear system of the form $$A(x)x=b.$$ A simple nonlinear loop in this case is the Picard iteration $$x_{n+1} = x_n + P(A(x_n))^{-1} (b-A(x_n)x_n)$$ which we want to realise here.

```
type(t_matrixBlock) :: rmatrix
type(t_vectorBlock) :: rrhs, rx, ry, rd
type(t_linsolNode), pointer :: p_rlinsol
integer :: ierror
...
! Create the matrix structure
...
! Set up the Gauss elimination solver
call linsol_initUMFPACK4 (p_rsolver)
! Define the matrix A
call linsol_setMatrix (p_rsolver,rmatrix)
! Symbolic factorisation
call linsol_initStructure (p_rsolver,ierror)
if (ierror .ne. LINSOL_ERR_NOERROR) then
call output_line ("Error in symbolic factorisation")
call sys_halt()
end if
! Set up the right-hand side rrhs
...
! Initialise solution
call lsysbl_clearVector (rx)
! Nonlinear loop
do while (...) ! not converged
! Compute the matrix A(x_n)
...
! Numeric factorisation
call linsol_initData (p_rsolver,ierror)
if (ierror .ne. LINSOL_ERR_NOERROR) then
call output_line ("Error in numeric factorisation")
call sys_halt()
end if
! Create the nonlinear defect d=b-A(x_n)x_n
call lsysbl_copyVector (rrhs,rd)
call lsysbl_matVec (rmatrix, rx, rd, -1.0_DP, 1.0_DP)
! Preconditioning of the defect; rd is overwritten by y with Py=d
call linsol_precondDefect (p_rsolver,rd)
! Clean up the numerical factorisation
call linsol_doneData (p_rsolver)
! Correction: x = x + y
call lsysbl_vectorLinearComb (rd,rx,1.0_DP,1.0_DP)
end do
! Clean up
call linsol_doneStructure (p_rsolver)
call linsol_done (p_rsolver)
...
```

Notice that embedded in a nonlinear loop, only the numerical factorisation has to be repeated for every nonlinear step as the matrix structure does not change. The symbolic factorisation is done once before the loop which saves approximately 30 percent of the CPU time for the UMFPACK solver.

## Setting up solver trees

Due to trhe fact that all solvers are realised as preconditioners, the solver library offers the possibility to set up an arbitrarily nested tree of solvers and preconditioners. The `linsol_initXXXX`

routines in `linearsolver.f90`

always return a pointer to a solver structure which can be used as preconditioner in another solver (provided that the other solver support preconditioners).

Roughly saying, all iterative solvers (including the multigrid solver) support preconditioners and subsolvers. A preconditioner for a solver can be specified in the initialisation of a solver as additional parameter.

Example:Here an example of aBiCGStab(ILU0)solver, i.e., a BiCGStab solver preconditioned with ILU0:

```
type(t_linsolNode), pointer :: p_rpreconditioner
type(t_linsolNode), pointer :: p_rsolver
call linsol_initILU0(p_rpreconditioner)
call linsol_initBiCGStab(p_rsolver,p_rpreconditioner)
... (Solver is identified by p_rsolver)
call linsol_solveAdaptively (p_rsolver,...)
```

As can be seen, `linsol_initILU0`

creates a solver node `p_rpreconditioner`

which is specified as preconditioner in `linsol_initBiCGStab`

.`linsol_initBiCGStab`

creates the actual solver node `p_rsolver`

.

Example:The following example creates a solverDefCorr(4 x BiCGStab(ILU0)). There is an outer defect correction loop with an inner BiCGSTab algorthm, applying exactly four iterations. BiCGStab is preconditioned by ILU0:

```
type(t_linsolNode), pointer :: p_rprecILU, p_rprecBiCGStab
type(t_linsolNode), pointer :: p_rsolver
call linsol_initILU0(p_rprecILU)
call linsol_initBiCGStab(p_rprecBiCGStab,p_rprecILU)
p_rprecBiCGStab%nminIterations = 4
p_rprecBiCGStab%nmaxIterations = 4
call linsol_initDefCorr(p_rsolver,p_rprecBiCGStab)
... (Solver is identified by p_rsolver)
call linsol_solveAdaptively (p_rsolver,...)
```

Again, every `linsol_initXXXX`

creates a `t_linsolNode`

strucure which can be used in another solver as preconditioner: `p_rprecILU`

is used as precondoitioner for BiCGStab, `p_rprecBiCGStab`

is used as preconditioner for the actual solver `p_rsolver`

. So enforce BiCGSTab applying exactly four iterations, `nminIterations`

and `nmaxIterations`

in the structure `p_rprecBiCGStab`

are set to "4".

## Basic parameters in the solver structure

THe solver structure `t_linsolNode`

contains a set of parameters with default values which can be modified by the user. The following list gives a brief overview about the most important parameters in this structure:

Parameter | Default | Description |
---|---|---|

ioutputLevel | 0 | Output level. =-1: no output =0: no output except for warning messages =1: basic output =2: extended output |

nminIterations | 1 | Minimum number of iterations |

nmaxIterations | 50 | Maximum number of iterations |

iresNorm | LINALG_NORML2 | Type of norm to use for residuals. Default is $||\cdot||_{l_2}$. |

istoppingCriterion | LINSOL_STOP_STANDARD | Type of stopping criterion to use. =LINSOL_STOP_STANDARD: Stop if relative (depsRel) and absolute (depsAbs) stopping criterion hold. =LINSOL_STOP_ONEOF: Stop if either the relative (depsRel) or the absolute (depsAbs) stopping criterion holds. |

depsRel | 1E-5 | Relative stopping criterion. Stop if $||\text{defect}|| < \text{depsRel} ||\text{initial defect}||$. =0.0: deactivated. |

depsAbs | 1E-5 | Absolute stopping criterion. Stop if $||\text{defect}|| < \text{depsAbs}$. =0.0: deactivated. |

depsDiff | 0.0 (deactivated) | Defect improvement stopping criterion. Stop if $||\text{defect}_{\text{new}}|| - ||\text{defect}_{\text{old}}|| < \text{depsDiff} ||\text{defect}_{\text{old}}||$. =0.0: deactivated. |

ddivRel | 1E6 | Relative divergence criterion. Stop if $||\text{defect}|| > \text{ddivRel} ||\text{initial defect}||$. =SYS_INFINITY: deactivated. |

depsAbs | SYS_INFINITY | Absolute divergence criterion. Stop if $||\text{defect}|| > \text{ddivAbs}$. =SYS_INFINITY: deactivated. |

drhsZero | 1E-90 | RHS-vector is treated as zero if $||\text{rhs}|| < \text{drhsZero}$. |

domega | 1.0 | Solver dependent damping parameter. Most solvers use this parameter as a final damping, realising the mapping $d \mapsto \omega P^{-1}d$. (I.e., they scale their return value by $\omega$.)Defect correction: Uses the parameter as damping parameter: $x_{n+1} = x_n + \omega P^{-1} (b-Ax_n)$ |

niteAsymptoticCVR | 0 | Numer of iterations to measure the asymptotic convergence rate from. =0: deactivate |

## Basic return values

The linear solver is invoked by

```
call linsol_solveAdaptively (p_rsolver,...)
```

or

```
call linsol_precondDefect (p_rsolver,...)
```

for a `t_linsolNode`

structure `p_rsolverNode`

, depending on whether a linear system is to be solved or a defect is to be preconditioned. The `t_linsolNode`

contains a set of status variables which describe the result of the iteration:

Variable | Description |
---|---|

iresult | =0: Success =1: iteration broke down, diverging =2: error in the parameters < 0: algorithm-specific error |

iiterations | Number of iterations, the algorithm needed |

dinitialDefect | Initial defect |

dfinalDefect | Final defect |

dconvergenceRate | Convergence rate |

dasymptoticConvergenceRate | Asymptotic convergence rate (computed from the last "niteAsymptoticCVR" steps) |

dtimeTotal | Total computation time |

dtimeInitStruct | Time for structural decomposition |

dtimeInitData | Time for numerical decomposition |

Remark:The user should be aware of the fact that not all parameters are used all the time for all solvers, and not all solvers provide meaningful return values in all return variables. Which parameters in a solver structure are used and which not depend on the actual solver. For example, the UMFPACK Gauss elimination solver does not use the number of minimum/maximum iterations and does not return the initial and final defect or the number of iterations (always set to 1). An iterative solver on the other hand usually uses these parameters and returns meaningful values here. The only variable which is guaranteed to be set by all solvers is`iresult`

.

## Extended parameters in the solver structure

Depending on the type of the solver, there can be found additional parameters in the solver structure which are algorithm dependent. Such "special" parameters are accessible via algorithm dependent "subnode" structures in the solver structure which are only valid for the appropriate solvers. For example, the following list gives a brief overview which substructures in the solver structure `t_linsolNode`

are active for which algorithm:

Algorithm | Substructure |
---|---|

Defect correction | (t_linsolNode)%p_rsubnodeDefCorr |

SOR | (t_linsolNode)%p_rsubnodeSOR |

CG | (t_linsolNode)%p_rsubnodeCG |

GMRES | (t_linsolNode)%p_rsubnodeGMRES |

BiCGStab | (t_linsolNode)%p_rsubnodeBiCGStab |

UMFPACK | (t_linsolNode)%p_rsubnodeUMFPACK4 |

Multigrid-2 | (t_linsolNode)%p_rsubnodeMultigrid2 |

... | ... |

After the creation of a solver node, the user can tweak some algorithm specific parameters in these structures if necessary - although the parmeters can usually be left at their default values. Here some examples what can be done with such "special" parameters:

Example:The following example creates an SOR algorithm and modifies the relaxation parameter.

```
...
call linsol_initSOR(p_rsolver,1.0_DP)
! Change the parameter from 1.0 to 1.2
p_rsolver%p_rsubnodeSOR%drelax = 1.2_DP
...
```

Example:The following example creates an UMFPACK solver and tells UMFPACK to write out the final matrix in human readable form to a text file on the hard disc before solving the system. (Good for debugging! :-) )

```
...
call linsol_initUMFPACK4(p_rsolver)
p_rsolver%p_rsubnodeUMFPACK4%imatrixDebugOutput = 1
p_rsolver%p_rsubnodeUMFPACK4%smatrixName = "mymatrix.txt"
...
```

Example:The following example creates a Multigrid solver and changes the cycle to W-cycle. At the end, the system is solved and the total time for smoothing is extracted from the structure.

```
...
call linsol_initMultigrid2(p_rsolver,nlevels)
p_rsolver%p_rsubnodeMultigrid2%icycle = 2 ! W-cycle
...
call linsol_solveAdaptively(p_rsolver,...)
dtimeSmoothing = p_rsolver%p_rsubnodeMultigrid2%dtimePreSmooth + &
p_rsolver%p_rsubnodeMultigrid2%dtimePostSmooth
```