intel-intrinsics 1.2.7

The most practical D SIMD solution! Use SIMD intrinsics with Intel syntax in D.

The DUB package intel-intrinsics implements Intel intrinsics for D.

intel-intrinsics lets you use x86 SIMD in D with a single syntax and API across LDC, DMD, and GDC.

Add it as a dependency in your dub.json:

    "intel-intrinsics": "~>1.0"


SIMD intrinsics with _mm_ prefix

        DMD                   LDC                      GDC
MMX     Yes but slow (#16)    Yes                      Yes (slow in 32-bit)
SSE     Yes but slow (#16)    Yes                      Yes (slow in 32-bit)
SSE2    Yes but slow (#16)    Yes                      Yes (slow in 32-bit)
SSE3    Yes but slow (#16)    Yes (use -mattr=+sse3)   Yes but slow (#39)

The intrinsics implemented follow the syntax and semantics of Intel's online intrinsics guide.

The philosophy (and guarantee) of intel-intrinsics is:

  • When using LDC, intel-intrinsics should generate optimal code; anything less is a bug.
  • There is no promise that the exact instruction is generated, because it is not always the fastest option.
  • The semantics of each intrinsic are guaranteed to be preserved, above all other considerations.

SIMD types

intel-intrinsics defines the following types regardless of the compiler:

long1, float2, int2, short4, byte8, float4, int4, double2

though most of the time you will deal with these aliases:

alias __m128 = float4; 
alias __m128i = int4; // and you can rely on __m128i being int4
alias __m128d = double2;
alias __m64 = long1;
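
As a quick sanity check, the aliases above can be verified at compile time. This is a minimal sketch that assumes the types are importable from the inteli.types module; the static assertions only restate the aliases listed above, plus the sizes implied by the type names.

```d
import inteli.types; // assumed module providing the SIMD types and the __m* aliases

// The aliases from the list above, checked at compile time.
static assert(is(__m128 == float4));
static assert(is(__m128i == int4));
static assert(is(__m128d == double2));
static assert(is(__m64 == long1));

// 128-bit vectors occupy 16 bytes, the 64-bit one occupies 8.
static assert(__m128.sizeof == 16 && __m128i.sizeof == 16 && __m128d.sizeof == 16);
static assert(__m64.sizeof == 8);

void main()
{
    // Nothing to do at run time: compiling successfully is the check.
}
```

Because __m128i is guaranteed to be int4, code that indexes it as four 32-bit integers behaves the same on every supported compiler.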

Vector Operators for all

intel-intrinsics implements Vector Operators for compilers that don't have __vector support (DMD with 32-bit x86 target).


__m128 add_4x_floats(__m128 a, __m128 b)
{
    return a + b;
}

is the same as:

__m128 add_4x_floats(__m128 a, __m128 b)
{
    return _mm_add_ps(a, b);
}

See available operators...
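
The equivalence between the operator form and the intrinsic form can be checked with a small program. This is a sketch, assuming the SSE intrinsics live in the inteli.xmmintrin module; the function names addWithOperator and addWithIntrinsic are made up for this illustration.

```d
import inteli.xmmintrin; // assumed module for SSE: __m128, _mm_setr_ps, _mm_set1_ps, _mm_add_ps

// Hypothetical helper: adds four floats with the vector operator.
__m128 addWithOperator(__m128 a, __m128 b)
{
    return a + b;
}

// Hypothetical helper: adds four floats with the Intel intrinsic.
__m128 addWithIntrinsic(__m128 a, __m128 b)
{
    return _mm_add_ps(a, b);
}

void main()
{
    __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f); // element 0 is 1.0f
    __m128 b = _mm_set1_ps(10.0f);                  // all four elements are 10.0f

    __m128 r1 = addWithOperator(a, b);
    __m128 r2 = addWithIntrinsic(a, b);

    // Both forms produce the same four sums.
    assert(r1.array == r2.array);
    assert(r1.array == [11.0f, 12.0f, 13.0f, 14.0f]);
}
```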

Individual element access

For maximum portability, read and write individual elements as follows:

__m128i A;

// recommended portable way to set a single SIMD element
A.ptr[0] = 42; 

// recommended portable way to get a single SIMD element
int elem = A.array[0];
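
The same pattern as a complete program, for reference. This is a sketch that assumes __m128i and _mm_setzero_si128 come from the inteli.emmintrin module; withLane is a hypothetical helper, not part of the package.

```d
import inteli.emmintrin; // assumed module for SSE2: __m128i, _mm_setzero_si128

// Hypothetical helper: returns a copy of v with one 32-bit lane replaced.
__m128i withLane(__m128i v, size_t lane, int value)
{
    v.ptr[lane] = value; // portable way to write a single element
    return v;
}

void main()
{
    __m128i A = _mm_setzero_si128(); // start from all zeroes
    A = withLane(A, 0, 42);
    A = withLane(A, 3, -1);

    int elem = A.array[0]; // portable way to read a single element
    assert(elem == 42);
    assert(A.array == [42, 0, 0, -1]);
}
```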

Why intel-intrinsics?

  • Portability: It works the same for DMD, LDC, and GDC.

  • Capabilities: Some instructions aren't accessible through core.simd and ldc.simd. One example is pmaddwd, which is so important in digital video. Some instructions need an almost exact sequence of LLVM IR to get generated. ldc.intrinsics is a moving target, so you need a layer on top of it.

  • Familiarity: Intel intrinsic syntax is more familiar to C and C++ programmers. The Intel intrinsic names aren't good, but they are known identifiers. Introducing new names would require hundreds of new identifiers.

  • Documentation: Intel provides a convenient online guide. Without this Intel documentation, it's much more difficult to write sizeable SIMD code.
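
To illustrate the Capabilities point, pmaddwd corresponds to the Intel intrinsic _mm_madd_epi16, which multiplies eight pairs of 16-bit integers and adds adjacent products into four 32-bit sums. The sketch below assumes the intrinsic is imported from inteli.emmintrin; dotProduct8 is a hypothetical helper written for this example.

```d
import inteli.emmintrin; // assumed module for SSE2: _mm_madd_epi16, _mm_setr_epi16, _mm_set1_epi16

// Hypothetical helper: dot product of two vectors of eight 16-bit integers.
int dotProduct8(__m128i a, __m128i b)
{
    // Four 32-bit lanes, each holding a[2i]*b[2i] + a[2i+1]*b[2i+1].
    __m128i sums = _mm_madd_epi16(a, b);

    // Horizontal sum of the four lanes using portable element access.
    return sums.array[0] + sums.array[1] + sums.array[2] + sums.array[3];
}

void main()
{
    __m128i a = _mm_setr_epi16(1, 2, 3, 4, 5, 6, 7, 8);
    __m128i b = _mm_set1_epi16(2);

    // 2 * (1 + 2 + ... + 8) = 72
    assert(dotProduct8(a, b) == 72);
}
```

As stated in the philosophy above, the package guarantees the semantics of the intrinsic, not that a pmaddwd instruction is emitted on every compiler.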

Notable difference vs C/C++ or core.simd

When using intel-intrinsics, every implicit conversion of similarly-sized vectors should be done with a cast instead.

__m128i b = _mm_set1_epi32(42);
__m128 a = b;             // NO, only works in LDC
__m128 a = cast(__m128)b; // YES, works in all D compilers

This is because D does not allow user-defined implicit conversions, and core.simd might be emulated (DMD). Use this cast, or your code won't work in every D compiler variation.
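
Note that such a cast reinterprets the 128 bits and does not convert integer values to floats. The sketch below makes this visible by filling each lane with the IEEE 754 bit pattern of 1.0f; the module names are assumptions and reinterpretAsFloats is a hypothetical helper.

```d
import inteli.xmmintrin; // assumed module for SSE (__m128)
import inteli.emmintrin; // assumed module for SSE2 (__m128i, _mm_set1_epi32)

// Hypothetical helper: views four 32-bit integers as four floats, bit for bit.
__m128 reinterpretAsFloats(__m128i v)
{
    return cast(__m128) v; // explicit cast, intended to work in all D compilers
}

void main()
{
    // 0x3F800000 is the IEEE 754 single-precision bit pattern of 1.0f.
    __m128i b = _mm_set1_epi32(0x3F800000);
    __m128 a = reinterpretAsFloats(b);

    // The bits are unchanged, so every lane reads back as 1.0f.
    assert(a.array == [1.0f, 1.0f, 1.0f, 1.0f]);
}
```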

Video introduction

In this DConf 2019 talk, Auburn Sounds:

  • introduces how intel-intrinsics came to be,
  • demonstrates a 3.5x speed-up for some particular loops,
  • reminds us that normal D code can be really fast and that intrinsics might harm performance.

See the talk: intel-intrinsics: Not intrinsically about intrinsics

1.2.7 2020-Sep-14
1.2.6 2020-Jul-13
1.2.5 2020-Apr-12
1.2.4 2020-Apr-10
1.2.3 2020-Apr-09