Could this technique be faster ?

Posted: Wed Apr 13, 2022 3:42 pm
by Tepeix
Hi,

What do you think of this technique? Could it be faster?
Here's an example with a small sine oscillator.

Normally we have to load 6 variables every cycle.
But 2 floats and 2 ints have the same value in all SSE channels.

The idea is to use the SSE channels to pack the 2 floats into 1 float variable and the 2 ints into 1 int variable in stage0.
That way we load 2 variables instead of 4.
Then in stage 2 we have to copy to another register and shufps to get each value back into all four channels. But I suppose this could still be faster?
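
To make the idea concrete, here is a minimal sketch in C with SSE intrinsics rather than FlowStone assembler. The names pack_constants, scale_and_offset, gain and offset are made up for illustration and are not taken from the oscillator code. Two constants are packed into lanes 0 and 1 of one vector, then broadcast back to all four lanes with shufps when needed.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_shuffle_ps, ... */

/* Done once (the "stage0" part): put two constants in lanes 0 and 1.
   _mm_set_ps lists lanes from highest to lowest, so gain lands in lane 0.
   gain and offset are hypothetical constants, used only for illustration. */
static __m128 pack_constants(float gain, float offset)
{
    return _mm_set_ps(0.0f, 0.0f, offset, gain);
}

/* Done every sample: one load of `packed`, then two shufps broadcasts
   instead of two separate loads. */
static __m128 scale_and_offset(__m128 x, __m128 packed)
{
    __m128 gain   = _mm_shuffle_ps(packed, packed, _MM_SHUFFLE(0, 0, 0, 0));
    __m128 offset = _mm_shuffle_ps(packed, packed, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_add_ps(_mm_mul_ps(x, gain), offset);
}
```

Each broadcast costs one extra instruction, so the trade is one load saved per shuffle added. Constants that are read every sample are usually already in cache, so the saving is probably small.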

Well, in this example it does not change much, but maybe it helps with code that needs more variables?
Also, do you think the aliasing is OK with this sine approximation?

Thanks for any response!

Re: Could this technique be faster ?

Posted: Wed Apr 13, 2022 6:52 pm
by Tepeix
I improved it a little, especially for the triple sine.

But it takes the risk of int/float confusion. Could that be dangerous?

It did not make much sense to extract the variables at every use.
Now the five variables are extracted once (leaving only 3 free for the calculation).

Also, each cycle only has to read 1 variable to get 2 floats and 2 ints in the 4 SSE channels.
So it's a float variable that contains 2 ints (abs and sign) and 2 floats (1, 0.2222).
That's the int/float confusion thing.
There's no conversion, only shufps extraction.
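
On the int/float question, here is a rough sketch in C intrinsics of what such a mixed variable could look like. It assumes that "abs" and "sign" are the usual bit masks 0x7FFFFFFF and 0x80000000, which is a guess on my part. The names pack_mixed, abs_ps, sign_ps and one_ps, and the coeff parameter, are hypothetical. Moves, shufps and bitwise operations (andps, orps) copy the bits unchanged, so integer patterns survive in a float register. Arithmetic on those lanes would not be safe: read as a float, 0x7FFFFFFF is a NaN.

```c
#include <emmintrin.h>  /* SSE2: _mm_set_epi32, _mm_castsi128_ps (pulls in SSE too) */

/* Lane 0: abs mask, lane 1: sign mask (integer bit patterns),
   lane 2: 1.0f, lane 3: a hypothetical float coefficient. */
static __m128 pack_mixed(float coeff)
{
    /* _mm_castsi128_ps reinterprets the bits; it does not convert int to float. */
    __m128 masks  = _mm_castsi128_ps(
        _mm_set_epi32(0, 0, (int)0x80000000u, 0x7FFFFFFF));
    __m128 floats = _mm_set_ps(coeff, 1.0f, 0.0f, 0.0f);
    return _mm_or_ps(masks, floats);  /* the zero lanes leave the other half intact */
}

/* |x|: broadcast the abs mask from lane 0 and clear the sign bit. */
static __m128 abs_ps(__m128 x, __m128 packed)
{
    __m128 m = _mm_shuffle_ps(packed, packed, _MM_SHUFFLE(0, 0, 0, 0));
    return _mm_and_ps(x, m);
}

/* Sign bit of x only: broadcast the sign mask from lane 1. */
static __m128 sign_ps(__m128 x, __m128 packed)
{
    __m128 m = _mm_shuffle_ps(packed, packed, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_and_ps(x, m);
}

/* The float constant 1.0 from lane 2, broadcast to all lanes. */
static __m128 one_ps(__m128 packed)
{
    return _mm_shuffle_ps(packed, packed, _MM_SHUFFLE(2, 2, 2, 2));
}
```

So the mix itself should be harmless as long as the mask lanes are only ever used with bitwise instructions and the float lanes only with arithmetic ones.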

Re: Could this technique be faster ?

Posted: Thu Apr 14, 2022 1:11 pm
by martinvicanek
Note that shufps won't work well in the poly section - the natural habitat of most oscillators.

Re: Could this technique be faster ?

Posted: Thu Apr 14, 2022 2:24 pm
by Tepeix
I didn't think about that.
But it seems to be OK in this special case.
The little synth I made works well.

I suppose it's because every shufps here takes the same value for every channel.
When there's a 5th voice (or 9th, 13th, ...) it has to load the variables as 4 channels,
even if there are no other voices to shufps with.
Well, maybe the variables must also be fixed and never change.

I will try to do more tests, but it seems to work well.
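
A small sketch of that difference, again in C intrinsics and with made-up names (splat_lane0, lanes_equal). It assumes that in poly each SSE lane carries a different voice. Broadcasting from a vector of fixed constants gives every voice the same, intended value. The same shufps on per-voice data would overwrite voices 1 to 3 with the value of voice 0.

```c
#include <xmmintrin.h>  /* SSE: _mm_shuffle_ps, _mm_cmpeq_ps, _mm_movemask_ps */

/* Copy lane 0 into all four lanes. */
static __m128 splat_lane0(__m128 v)
{
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
}

/* Returns 1 if a and b compare equal in all four lanes, else 0. */
static int lanes_equal(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}
```

For example, with a hypothetical per-voice vector built by _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f), splat_lane0 returns 1.0 in every lane, so three of the four voices lose their own value. With a vector of packed constants, that broadcast is exactly what is wanted.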

Re: Could this technique be faster ?

Posted: Thu Apr 14, 2022 7:45 pm
by Tepeix
Well, in the end I don't see much difference between loading the four variables normally and using this special technique.
Here's a test with 127 notes played at once. Maybe it uses about 0.2% less CPU, out of a total of about 6%... Not even sure ;) :oops: