## apulSoft Blog

I wrote a c++ filter library to design the various filters used in the apulSoft plugins. It uses a lot of `std::complex` math functions when designing filters using complex poles and zeros. During profiling on OSX I found slowness and precision issues in the implementation of `std::sqrt()` for `std::complex<double>`. I looked up the implementation inside libc++ used by LLVM/Clang on OSX here.

This is a copy of the code found:

```
template<class _Tp>
complex<_Tp>
sqrt(const complex<_Tp>& __x)
{
if (__libcpp_isinf_or_builtin(__x.imag()))
return complex<_Tp>(_Tp(INFINITY), __x.imag());
if (__libcpp_isinf_or_builtin(__x.real()))
{
if (__x.real() > _Tp(0))
return complex<_Tp>(__x.real(), __libcpp_isnan_or_builtin(__x.imag()) ? __x.imag() : copysign(_Tp(0), __x.imag()));
return complex<_Tp>(__libcpp_isnan_or_builtin(__x.imag()) ? __x.imag() : _Tp(0), copysign(__x.real(), __x.imag()));
}
return polar(sqrt(abs(__x)), arg(__x) / _Tp(2));
}
```

It's mostly a call to `std::polar()` which comes from the polar way of determining a complex square root. It calls `atan2()` (via `arg()`), `sin()` and `cos()`. If `-ffast-math` is enabled, it uses `sincos()`. LLVM/Clang on OSX uses library functions to evaluate trigonometry which slows things down and means the complex square root cannot be fully inlined. The whole thing is a mess because some CPU models have hardware trigonometry instructions that are slower than scalar polynomial approximations. Maybe that's why Clang on OSX uses the library calls - to use the fastest code depending on the CPU.

To make matters worse, precision can be problematic:

```
#include <iostream>
#include <complex>
using namespace std;
int main() {
cout << sqrt(complex<double>(-1)) << endl;
}
```

If I run this in XCode 12, I get `(6.12323e-17,1)`. The small error is annoying for filter math as it makes the number appear not pure imaginary.

So I had to come up with a replacement which avoids using `std::polar()`.
For the complex value $c=a+ib$, the components of the root $sqrt(c) = x + iy$ can be calculated like this:

$c = sqrt(c)^2 = (x+iy)^2 = x^2 - y^2 + i*2xy = a + ib $

$=>{(a,=,x^2 - y^2),(b,=,2xy):}$

${(4x^4 - 4ax^2 - b^2,=,0),(4y^4 + 4ay^2 - b^2,=,0):}$

$=> {(x,=,+-sqrt(+-0.5(sqrt(a^2+b^2) + a))), (y,=,+-sqrt(+-0.5(sqrt(a^2+b^2) - a))):}$

We are interested in the real-positive root of $c$, so $x$ has to be real and positive. From $b=xy$ follows that $y$ needs to have the same sign as $b$. With $r = |c|$ the final formula is:

${(x,=,sqrt(0.5(r + a))), (y,=,sqrt(0.5(r - a))*sgn(b)):}$

$r$ is of course always positive and larger than $a$, therefore the square roots are always real. The entire process requires calculating 3 square roots and a few multiplications and additions, surely that's going to be faster than the `atan2()` and `sincos()` of the `libc++` implementation.

I saw another library implementation using just two square roots and a division, but prefer the presented method as it doesn't have to deal with special cases (division by zero). Less conditional code means more options for the compiler to rearrange and optimize inlined code.

The resulting process also looks quite symmetrical for the real and imaginary parts of c. I like this as I hope to write a complex library for 64-bit float precision complex numbers that does everything using 128-bit SIMD registers in the future. This method should work just fine for that, and maybe a smart compiler can vectorize it already.

```
#include <complex>
template<typename T>
inline std::complex<T> sqrt_notrig(std::complex<T> c) noexcept {
T a = c.real();
T b = c.imag();
T r = std::abs(c);
return {std::sqrt(T(0.5)*(r + a)), std::copysign(std::sqrt(T(0.5)*(r - a)), b)};
}
```

View this routine on the godbolt.org compiler explorer here. Unfortunately, `sqrtpd` (2x64-bit double square root SSE2 instruction) doesn't seem to be produced by any compiler, let me know if you get one to emit it.

Bonus content: a branchless routine to calculate the primary complex root of any real value:

```
template<typename T>
inline std::complex<T> sqrt_to_complex(T x) noexcept {
T r = T(0.5)*std::sqrt(std::fabs(x));
T a = std::copysign(r, x);
return {r + a, r - a};
}
```

##### Comments

##### Latest Blog Posts

Oct 27th, 2022Clang-Format Xcode Automator Quick Action

Mar 16th, 2022Feb 19th, 2022Mar 17th, 2021##### Tags

math(6)dsp(3)

cpp(3)

filters(2)

blog(2)

webdesign(1)

gui(1)

macOS(1)

optimization(1)

lessonslearned(1)