apulSoft Blog
Motivation
When working with FFTs and phase, conversions between complex phase responses and magnitude/phase responses are sometimes unavoidable. This leads to calculating lots of sine and cosine values of phase angles. I keep looking for the fastest solutions to be used with large buffers and SIMD. Until recently,
Working on a new plugin that requires even more of these calls than previous ones, I found the Sleef call to be the main performance bottleneck in my code. And I do not need 3.5 ulp of precision and require no handling of special cases like inf or nan.
Optimized Approximation
Looking at the Sleef code, I saw that there is code to take advantage of the symmetries of sine and cosine, which leads to branches/selects in the code. Doing this in SIMD routines means doing double the work and selectively throwing half of the results away = suboptimal performance.
Looking for a truly branchless solution, I stumbled upon this very nice blog post by Colin at mooooo: https://mooooo.ooo/chebyshev-sine-approximation/
It contains Rust code to calculate $sin(x)$ to 5 ulp precision by approximating a full period of $sin(x)$, which means no branches. Colin found a nice way to solve the floating point relative error issue that happens when approximating functions that intersect the x-axis. Instead of approximating $sin(x)$ using Chebyshev polynomials, he approximates $sin(x)/((x - pi)(x + pi))$. This avoids going through zero at $+-pi$, and still keeps the function to approximate symmetrical. This is very beneficial, as then half the polynomial coefficients become zero.
I found the same idea can be applied to $cos(x)$ by approximating $cos(x)/((x - 0.5pi)(x + 0.5pi))$.
I recently wrote a useful class to create Chebyshev-Polynomial-Approximations (will share in the next blog post). As I do not need as much precision as Colin, I made my own 4th-order polynomials for sin and cos. Here are plots of their relative errors:
Green: sin(x), cos(x)
Red: relative error * $10^6$


c++ Implementation
Below is the fully branchless c++ sincos() for float values between $-pi$ and $pi$ that I use with SIMD types (with fma). The $(x-a)(x+a)$ terms could of course be mathematically simplified to $x^2-a^2$, but that would work against the increased precision near the $y=0$ spots thanks to Colin's factorization. I found this to be faster than various implementations that try to share more work between $sin(x)$ and $cos(x)$. Using Estrin's scheme leads to operations that can be pipelined nicely.
// could use std::fma if faster - depends on cpu/compiler configuration
inline float _fma(float a, float b, float c) { return a * b + c; }
// Approximate sin(x) and cos(x) between -pi and pi
// relative err |f(x)/sin(x) - 1|
// sin x: 1.32e-6 near 0
// cos x: 2.07e-6 at +-2.99
void approx_sincos (float x, float &si, float &co){
constexpr float s0 = -0.10132104963779f; // x
constexpr float s1 = 0.00662060857089096f; // x^3
constexpr float s2 = -0.000173351320734045f; // x^5
constexpr float s3 = 2.48668816803878e-06f; // x^7
constexpr float s4 = -1.97103310997063e-08f; // x^9
constexpr float c0 = -0.405284410277645f; // 1
constexpr float c1 = 0.0383849982168558f; // x^2
constexpr float c2 = -0.00132798793179218f; // x^4
constexpr float c3 = 2.37446117208029e-05f; // x^6
constexpr float c4 = -2.23984068352572e-07f; // x^8
constexpr float pi = 3.1415927410125732f;
constexpr float halfPi = 1.570796326794897f;
auto x2 = x * x;
// evaluate two 4th-order polynomials of (x^2) using estrin's scheme.
auto x4 = x2 * x2;
auto x8 = x4 * x4;
auto poly1 = _fma(x8, s4, _fma(x4, _fma(s3, x2, s2), _fma(s1, x2, s0)));
auto poly2 = _fma(x8, c4, _fma(x4, _fma(c3, x2, c2), _fma(c1, x2, c0)));
si = (x - pi) * (x + pi) * x * poly1;
co = (x - halfPi) * (x + halfPi) * poly2;
}
Comments
Latest Blog Posts
May 7th, 2025Branchless sincos() float Approximation
Jan 1st, 2025Headphone target curves using AutoEq & apQualizr2
Aug 21st, 2024Windows .msi Plugin Installer using WiX v5
Jun 1st, 2024Matched Digital Allpass IIR Filters
Tags
math(8)filters(4)
dsp(4)
cpp(4)
blog(2)
optimization(2)
webdesign(1)
gui(1)
macOS(1)
apqualizr2(1)
lessonslearned(1)
Windows(1)
WiX(1)