How do I tell GCC that a pointer argument is always double-word aligned?

2023-09-11 07:42:43 Author: 丨极速灬巅峰彡丨

In my program I have a function that does a simple vector addition c[0:15] = a[0:15] + b[0:15]. The function prototype is:

void vecadd(float * restrict a, float * restrict b, float * restrict c);

Our 32-bit embedded architecture has instructions that load or store a double word (two 32-bit registers) at once, for example:

r16 = 0x4000  ;
strd r0,[r16] ; stores r0 in [0x4000] and r1 in [0x4004]

The GCC optimizer recognizes the vector nature of the loop and generates two branches of the code: one for the case where all 3 arrays are double-word aligned (so it uses the double load/store instructions) and the other for the case where the arrays are only word-aligned (where it uses the single load/store instructions).
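
To make the cost concrete, the two branches can be pictured roughly as the hand-written C below. This is only a conceptual sketch, not the code GCC actually emits; the names `all_dword_aligned` and `vecadd_versioned` are hypothetical and chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when all three addresses are multiples of 8.
   This is the kind of runtime check the question wants to get rid of. */
int all_dword_aligned(const float *a, const float *b, const float *c)
{
    return (((uintptr_t)a | (uintptr_t)b | (uintptr_t)c) & 7u) == 0;
}

/* Hypothetical illustration of loop versioning on alignment. */
void vecadd_versioned(float *restrict a, float *restrict b, float *restrict c)
{
    if (all_dword_aligned(a, b, c)) {
        /* Double-word path: two floats per iteration, which a compiler
           could map onto double-word load/store instructions. */
        for (size_t i = 0; i < 16; i += 2) {
            c[i]     = a[i]     + b[i];
            c[i + 1] = a[i + 1] + b[i + 1];
        }
    } else {
        /* Word path: one float per iteration. */
        for (size_t i = 0; i < 16; ++i)
            c[i] = a[i] + b[i];
    }
}
```

For a loop of only 16 elements, the check and branch at the top are a noticeable fraction of the total work, which is why removing them is attractive.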

The problem is that the address alignment check is costly relative to the addition itself, and I want to eliminate it by hinting to the compiler that a, b and c are always 8-byte aligned. Is there a modifier I can add to the pointer declarations to tell the compiler this?

The arrays passed to this function have the aligned(8) attribute, but that is not reflected in the function's own code. Is it possible to add this attribute to the function parameters?

Recommended answer

Following a piece of example code I found on my system, I tried the following solution, which incorporates ideas from several of the earlier answers: basically, create a union of a small array of floats with a 64-bit type (in this case a SIMD vector of floats) and call the function with the float arrays cast to that vector type:

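/* 64-bit GCC vector type holding two floats */
typedef float f2 __attribute__((vector_size(8)));
/* Union of the vector with a plain float pair (not used in the call below) */
typedef union { f2 v; float f[2]; } simdfu;

/* The parameters are now pointers to the 64-bit vector type */
void vecadd(f2 * restrict a, f2 * restrict b, f2 * restrict c);

float a[16] __attribute__((aligned(8)));
float b[16] __attribute__((aligned(8)));
float c[16] __attribute__((aligned(8)));

int main(void)
{
    /* Cast the 8-byte-aligned float arrays to the vector pointer type */
    vecadd((f2 *) a, (f2 *) b, (f2 *) c);
    return 0;
}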

With this change, the compiler no longer generates the 4-byte-aligned branch.

However, __builtin_assume_aligned() would be the preferable solution, since it avoids the cast and its possible side effects, if only it worked...
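
For reference, on a toolchain where the builtin behaves as documented, its intended use would look roughly like the sketch below. The function name `vecadd_aligned` and the local pointer names are hypothetical; the sketch assumes a GCC version that provides `__builtin_assume_aligned` (4.7 or later) and that every caller really does pass 8-byte-aligned pointers. Whether the alignment branch actually disappears depends on the target and compiler version.

```c
#include <stddef.h>

/* Hypothetical variant of vecadd: the original float * prototype is kept,
   and the alignment promise is made inside the function body.
   Passing a pointer that is not 8-byte aligned is undefined behavior. */
void vecadd_aligned(float *restrict a, float *restrict b, float *restrict c)
{
    /* The builtin returns its first argument; the optimizer may assume
       the returned pointer is aligned to at least 8 bytes. */
    const float *pa = __builtin_assume_aligned(a, 8);
    const float *pb = __builtin_assume_aligned(b, 8);
    float *pc = __builtin_assume_aligned(c, 8);

    for (size_t i = 0; i < 16; ++i)
        pc[i] = pa[i] + pb[i];
}
```

The hint only applies to the pointers returned by the builtin, so the loop has to use `pa`, `pb` and `pc` rather than the original parameters.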

Edit: I noticed that the builtin function is actually buggy on our implementation (i.e., not only does it not work, it also causes calculation errors later in the code).