## Description

Generic Linear Algebra Subprograms

## Package Information

| Field | Value |
| --- | --- |
| Version | 0.0.6 (2016-Dec-16) |
| Repository | https://github.com/libmir/mir-glas |
| License | BSL-1.0 |
| Copyright | Copyright © 2016-, Ilya Yaroshenko |
| Authors | Ilya Yaroshenko |
| Registered by | Ilya Yaroshenko |
| Dependencies | none |

## Installation

To use this package, add `mir-glas` to the dependencies section of your project's dub configuration. The dub section of the readme gives a complete example.

## Readme

## glas

LLVM-accelerated Generic Linear Algebra Subprograms (GLAS)

### Description

GLAS is a C library written in Dlang. It requires neither the C++ runtime nor the D runtime, only libc, which is available everywhere.

The library provides

- BLAS (Basic Linear Algebra Subprograms) API.
- GLAS (Generic Linear Algebra Subprograms) API.

The CBLAS API can be provided by linking with Netlib's CBLAS library.

### dub

GLAS can be used with DMD and LDC, but LDC (LLVM D Compiler) >= `1.1.0 beta 6` must be installed in the common path in either case.

GLAS can be included automatically in a project using dub (the D package manager). DUB builds GLAS and CPUID itself, using LDC.

```
{
    ...
    "dependencies": {
        "mir-glas": "~><current_mir-glas_version>",
        "mir-cpuid": "~><current_mir-cpuid_version>"
    },
    "lflags": ["-L$MIR_GLAS_PACKAGE_DIR", "-L$MIR_CPUID_PACKAGE_DIR"]
}
```

`$MIR_GLAS_PACKAGE_DIR` and `$MIR_CPUID_PACKAGE_DIR` will be replaced automatically by DUB with the appropriate directories.

### Usage

`mir-glas` can be used like a common C library. It should be linked with `mir-cpuid`.
A compiler, for example GCC, may require `mir-cpuid` to be passed after `mir-glas`: `-lmir-glas -lmir-cpuid`.

#### GLAS API and Documentation

Documentation can be found at http://docs.glas.dlang.io/.

The GLAS API is based on `ndslice`.
Both `mir.ndslice` and `std.experimental.ndslice` are supported.
Other languages can use a simple structure definition.
Examples are available for C and for Dlang.
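To make the "simple structure definition" idea concrete, here is a minimal C sketch of a strided two-dimensional view in the spirit of `ndslice`: lengths, strides, and a pointer to the first element. The names `matrix_view`, `matrix_at`, and `matrix_transposed` are hypothetical and chosen for illustration only. The real definitions live in `include/glas/ndslice.h` and may differ in naming, field order, and element type.

```c
#include <stddef.h>

/* Hypothetical 2-D strided view. These names are illustrative assumptions,
   not the definitions shipped in include/glas/ndslice.h. */
typedef struct {
    size_t    lengths[2];  /* number of rows, number of columns */
    ptrdiff_t strides[2];  /* element step between rows, between columns */
    double   *ptr;         /* pointer to element (0, 0) */
} matrix_view;

/* Address of element (i, j), computed from the strides. */
static double *matrix_at(matrix_view m, size_t i, size_t j)
{
    return m.ptr + (ptrdiff_t)i * m.strides[0] + (ptrdiff_t)j * m.strides[1];
}

/* Transposing a view only swaps lengths and strides; no data is copied. */
static matrix_view matrix_transposed(matrix_view m)
{
    matrix_view t = { { m.lengths[1], m.lengths[0] },
                      { m.strides[1], m.strides[0] },
                      m.ptr };
    return t;
}
```

A row-major 2x3 array would be described by lengths `{2, 3}` and strides `{3, 1}`. Because the layout is carried by the view rather than by the data, a transposed or sub-sampled matrix can be described without copying any elements.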

#### Headers

C/C++ headers are located in `include/`.
D headers are located in `source/`.

There are two files:

- `glas/fortran.h` / `glas/fortran.d` - for Netlib's BLAS API
- `glas/ndslice.h` / `glas/ndslice.d` - for GLAS API

### Manual Compilation

##### Compiler installation

LDC (LLVM D Compiler) >= `1.1.0 beta 6` is required to build the project.
Version `1.1.0` is not released yet.
You may want to build LDC from source or use LDC 1.1.0 beta 6.
Beta 2 generates a lot of warnings that can be ignored. Beta 3 is not supported.

LDC binary packages contain two compilers: ldc2 and ldmd2. It is recommended to use ldmd2 with mir-glas.

Recent LDC packages come with the dub package manager. dub is used to build the project.

##### Mir CPUID

Mir CPUID provides CPU identification routines.

Download `mir-cpuid`

```
dub fetch mir-cpuid --cache=local
```

Change the directory

```
cd mir-cpuid-<current-mir-cpuid-version>/mir-cpuid
```

Build `mir-cpuid`

```
dub build --build=release-nobounds --compiler=ldmd2 --build-mode=singleFile --parallel --force
```

You may need to add `--arch=x86_64` if you use Windows.

Copy `libmir-cpuid.a` to your project or add its directory to the library path.

##### Mir GLAS

Download `mir-glas`

```
dub fetch mir-glas --cache=local
```

Change the directory

```
cd mir-glas-<current-mir-glas-version>/mir-glas
```

Build `mir-glas`

```
dub build --config=static --build=target-native --compiler=ldmd2 --build-mode=singleFile --parallel --force
```

You may need to add `--arch=x86_64` if you use Windows.

Copy `libmir-glas.a` to your project or add its directory to the library path.

### Status

We are open to contributions! The hardest part (GEMM) is already implemented.

- [x] CI testing with Netlib's CBLAS test suite.
- [ ] CI testing with Netlib's LAPACKE test suite.
- [ ] Multi-threading
- [ ] GPU back-end
- [ ] Shared library support - requires only DUB configuration fixes.
- [ ] Level 3 - matrix-matrix operations
  - [x] GEMM - matrix matrix multiply
  - [x] SYMM, HEMM - symmetric / hermitian matrix matrix multiply
  - [ ] SYRK, HERK, SYR2K, HER2K - symmetric / hermitian rank-k / rank-2k update to a matrix
  - [ ] TRMM - triangular matrix matrix multiply
  - [ ] TRSM - solving triangular matrix with multiple right hand sides
- [ ] Level 2 - matrix-vector operations
  - [ ] GEMV - matrix vector multiply
  - [ ] GBMV - banded matrix vector multiply
  - [ ] HEMV - hermitian matrix vector multiply
  - [ ] HBMV - hermitian banded matrix vector multiply
  - [ ] HPMV - hermitian packed matrix vector multiply
  - [ ] TRMV - triangular matrix vector multiply
  - [ ] TBMV - triangular banded matrix vector multiply
  - [ ] TPMV - triangular packed matrix vector multiply
  - [ ] TRSV - solving triangular matrix problems
  - [ ] TBSV - solving triangular banded matrix problems
  - [ ] TPSV - solving triangular packed matrix problems
  - [ ] GERU - performs the rank 1 operation `A := alpha*x*y' + A`
  - [ ] GERC - performs the rank 1 operation `A := alpha*x*conjg( y' ) + A`
  - [ ] HER - hermitian rank 1 operation `A := alpha*x*conjg(x') + A`
  - [ ] HPR - hermitian packed rank 1 operation `A := alpha*x*conjg( x' ) + A`
  - [ ] HER2 - hermitian rank 2 operation
  - [ ] HPR2 - hermitian packed rank 2 operation
- [ ] Level 1 - vector-vector and scalar operations. Note: Mir already provides generic implementation.
  - [ ] ROTG - setup Givens rotation
  - [ ] ROTMG - setup modified Givens rotation
  - [ ] ROT - apply Givens rotation
  - [ ] ROTM - apply modified Givens rotation
  - [ ] SWAP - swap x and y
  - [x] SCAL - `x = a*x`. Note: requires additional optimization for complex numbers.
  - [ ] COPY - copy x into y
  - [ ] AXPY - `y = a*x + y`
  - [ ] DOT - dot product
  - [ ] NRM2 - Euclidean norm
  - [ ] ASUM - sum of absolute values
  - [ ] IAMAX - index of max abs value
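For readers less familiar with the BLAS routine names in the checklist, the formulas can be spelled out as plain reference loops. The C sketch below is not the GLAS API and makes no attempt at performance. The function names (`ref_scal`, `ref_axpy`, `ref_ger`, `ref_gemm`) are hypothetical, the element type is fixed to `double`, and matrices are assumed to be dense and row-major. The GEMM formula `C := alpha*A*B + beta*C` is the standard BLAS definition without transposition options.

```c
#include <stddef.h>

/* SCAL: x = a*x */
static void ref_scal(size_t n, double a, double *x)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= a;
}

/* AXPY: y = a*x + y */
static void ref_axpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Real-valued case of GERU: A := alpha*x*y' + A, with A m-by-n, row-major. */
static void ref_ger(size_t m, size_t n, double alpha,
                    const double *x, const double *y, double *A)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            A[i * n + j] += alpha * x[i] * y[j];
}

/* GEMM: C := alpha*A*B + beta*C, with A m-by-k, B k-by-n, C m-by-n. */
static void ref_gemm(size_t m, size_t n, size_t k, double alpha,
                     const double *A, const double *B,
                     double beta, double *C)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            double acc = 0.0;
            for (size_t p = 0; p < k; p++)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```

Loops like these are useful mainly as a correctness baseline when testing an optimized implementation against known results.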

### Porting to a new target

Five steps:

- Implement the `cpuid_init` function for `mir-cpuid`. This function should be implemented per platform or OS. Already implemented targets are
  - x86, any OS
  - x86_64, any OS
- Verify that source/glas/internal/memory.d contains an implementation for the OS. Already implemented targets are
  - Posix (Linux, macOS, and others)
  - Windows
- Add a new configuration for register blocking to source/glas/internal/config.d. Configurations are already available for
  - x87
  - SSE2
  - AVX / AVX2
  - AVX512 (requires LLVM bug fixes).
- Create a Pull Request.
- Coordinate with the LDC team in case of compiler bugs.
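As background for the register blocking step, the general technique can be sketched in C. This is not the code in source/glas/internal/config.d, whose contents are not shown here. The function name `matmul_2x2_blocked` and the 2x2 tile size are assumptions made for illustration, and the sketch assumes row-major matrices with even `m` and `n`. Presumably a per-target configuration tunes tile dimensions of this kind to the number and width of the available registers.

```c
#include <stddef.h>

/* Multiply A (m-by-k) by B (k-by-n) into C (m-by-n), all row-major,
   computing C one 2x2 tile at a time.
   Assumes m and n are even to keep the sketch short. */
static void matmul_2x2_blocked(size_t m, size_t n, size_t k,
                               const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < m; i += 2) {
        for (size_t j = 0; j < n; j += 2) {
            /* Four accumulators that a compiler can keep in registers
               for the whole inner loop. */
            double c00 = 0.0, c01 = 0.0, c10 = 0.0, c11 = 0.0;
            for (size_t p = 0; p < k; p++) {
                /* Each loaded value of A and B is reused twice. */
                double a0 = A[i * k + p];
                double a1 = A[(i + 1) * k + p];
                double b0 = B[p * n + j];
                double b1 = B[p * n + j + 1];
                c00 += a0 * b0;
                c01 += a0 * b1;
                c10 += a1 * b0;
                c11 += a1 * b1;
            }
            C[i * n + j]           = c00;
            C[i * n + j + 1]       = c01;
            C[(i + 1) * n + j]     = c10;
            C[(i + 1) * n + j + 1] = c11;
        }
    }
}
```

The point of the tile is reuse: four loads feed four multiply-adds per inner iteration, whereas an untiled loop needs two loads per multiply-add. Larger tiles improve that ratio until the accumulators no longer fit in registers, which is why the best tile shape depends on the target.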

### Questions & Answers

##### Why is GLAS called "Generic ..."?

- GLAS has a generic internal implementation, which can be easily ported to any other architecture with minimal effort (5 minutes).
- GLAS API provides more functionality than BLAS.
- It is written in Dlang using generic programming.

##### Why is it better than other open source BLAS libraries like OpenBLAS and Eigen?

- GLAS is faster.
- GLAS API is more user-friendly and does not require additional data copying.
- GLAS does not require the C++ runtime, unlike Eigen.
- GLAS does not require platform-specific optimizations such as Eigen's intrinsics micro kernels and OpenBLAS's assembler macro kernels.
- GLAS has a simple implementation, which can be easily ported and extended.

##### Why does GLAS not have Lazy Evaluation and Aliasing like Eigen?

GLAS is a lower level library than Eigen. For example, GLAS can be an Eigen BLAS back-end in the future.
Lazy Evaluation and Aliasing can be easily implemented in D.
Explicit composition of operations can be done using mir.ndslice.algorithm and `mapSlice` from `std.experimental.ndslice` (>=2.072), which is a generic way to perform any lazy operations you want.

## Available versions

*0.0.6* 0.0.5 0.0.4 0.0.3 ~master ~simpl ~newnd ~9il-patch-1-1 ~9il-patch-1