
Re: Audio Compression & Limiting - low CPU%

Posted: Fri Feb 21, 2020 3:25 pm
by adamszabo
trogluddite wrote: The only restriction is that the number of loop iterations has to be the same for all SSE channels; other than that you can put whatever code you like as the loop body, and there's no restriction on what variables and registers you use there (so long as you keep the matching eax push/pops). The first example, with the fixed loop iterations, is exactly the same assembly that gets produced by the DSP loop(x){...} instruction.


What I mean is having each loop iteration use a different variable, so that on each iteration it gets a different value from an array or something similar, like in Ruby. Is that what you explained? Maybe I didn't understand it correctly.

Re: Audio Compression & Limiting - low CPU%

Posted: Fri Feb 21, 2020 5:32 pm
by trogluddite
Ah, sorry, I think I did misinterpret your question. Yes, that would be possible - in fact, actually quite easy in many cases, because the 'eax' register holds the loop count, and 'eax' is nearly always needed in array index calculations anyway. The only hassle is that it's tricky to get 'eax' to count forwards for variable loop lengths, as the opcodes for subtracting and comparing 'eax' with a value from memory aren't available (in FS3.0.6, at least).

Here's a quick example which sums the values in a Ruby Frame...
ASM_looped_array_access.fsm

Note that, in this example, I've also type-punned the Frame size, so that the ASM can just read it directly as an integer rather than converting it within the ASM - though, you might not want to do that if the value were needed as part of the calculations (if the example were calculating the average, say).

Re: Audio Compression & Limiting - low CPU%

Posted: Fri Feb 21, 2020 6:55 pm
by martinvicanek
Thank you, Trog, for this clear example!

Regarding forward looping and comparison of eax against variable index range, I know two methods which work in FS 3.0.8.1:

Code:
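int N=9;
mov eax,0; sub eax,N[0];    // eax = -N, so the loop test can compare against zero

loopStart:
  add eax,N[0];             // undo the bias: eax is the true index (0..N-1)
  <use eax as index>
  add eax,1; sub eax,N[0];  // advance the index, then bias by -N again
cmp eax,0; jl loopStart;    // repeat while index - N < 0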


Code:
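int N=9;
push ebx; mov ebx,N[0];     // preserve ebx, then load the loop bound into it
mov eax,0;                  // index starts at zero

loopStart:
  <use eax as index>
  add eax,1;
cmp eax,ebx; jl loopStart;  // repeat while index < N

pop ebx;                    // restore ebx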
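
As an untested illustration only, the second method might be filled in like this to sum the first N entries of an array of SSE vectors. The names values, F0 and out, the choice of N=3, and the outer eax push/pop are assumptions borrowed from other posts in this thread, not part of the method as posted. The index is scaled by 16 (shl eax,4) on the assumption that each array entry is one 16-byte SSE vector.

```asm
streamout out;

int   N = 3;        // hypothetical element count
float F0 = 0;
float values[4];    // assumed to hold at least N SSE vectors

push eax; push ebx;         // preserve both registers
mov ebx,N[0];               // loop bound
mov eax,0;                  // index counts forwards from zero
movaps xmm1,F0;             // running sum = 0

loopStart:
  push eax;                 // keep the unscaled index
  shl eax,4;                // index * 16 = byte offset
  movaps xmm2,values[eax];  // read one SSE vector
  addps xmm1,xmm2;          // accumulate
  pop eax;                  // back to the unscaled index
  add eax,1;
cmp eax,ebx; jl loopStart;  // repeat while index < N

pop ebx; pop eax;           // restore in reverse order
movaps out,xmm1;
```

Note that the test sits at the bottom of the loop, so the body runs at least once even if N is zero.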

Re: Audio Compression & Limiting - low CPU%

Posted: Fri Mar 06, 2020 5:35 pm
by adamszabo
trogluddite wrote: Here are templates for assembly loops with either a fixed loop count or the count set by a stream input.


So I tried to do a looping addition with different stream values. I would like to loop over the inputs in1, in2 and in3 and add them together, but it didn't work. What am I missing here?

Code:
// Number of loop iterations
streamin LoopCount;
streamin in1;
streamin in2;
streamin in3;

streamout out;

// For count conditioning.
int IntCount = 0;
float F1p0   = 1.0;
float F0 = 0;
float F1 = 1;

float values[4]; //values 3 + 1

// write sources to buffer
mov eax,16; movaps xmm0,in1; movaps values[eax],xmm0;
add eax,16; movaps xmm0,in2; movaps values[eax],xmm0;
add eax,16; movaps xmm0,in3; movaps values[eax],xmm0;

// Ensure count is positive and convert to integer.
movaps   xmm0, LoopCount;
maxps    xmm0, F1p0;
cvtps2dq xmm0, xmm0;
movaps   IntCount, xmm0;

// Initialize loop.
push eax;
movaps xmm1, F0;  // Init sum to zero.

mov  eax, IntCount[0];
LoopStart:
  push eax;
  //Code

  movaps xmm2,values[eax];
  addps xmm1,xmm2;
 
  pop eax;
// Test loop count and iterate.
add eax, -1;
jnz LoopStart;

movaps out, xmm1;
pop eax;

Re: Audio Compression & Limiting - low CPU%

Posted: Fri Mar 06, 2020 8:20 pm
by trogluddite
Your values[eax] isn't stepping 16 bytes per index when you de-reference it in the loop; eax is used as a raw byte offset, so the read is unaligned and steps in single bytes. You need shl eax, 4 (i.e. multiply by 16) before the de-reference, so that each index lands on a 16-byte boundary.

BTW - you just taught me something new. I had no idea that movaps with an [eax] offset was allowed! Thanks! :D
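
A minimal sketch of the shl eax, 4 fix applied to the loop from the previous post, with everything outside the loop unchanged. It assumes, as in that code, that the sources sit at indices 1 to 3 of values and that LoopCount is 3; a larger count would read past the end of the buffer. The shift comes after push eax, so the pop restores the unscaled loop counter for the test.

```asm
mov  eax, IntCount[0];
LoopStart:
  push eax;                  // save the loop counter
  shl  eax, 4;               // counter * 16 = byte offset of one SSE vector
  movaps xmm2, values[eax];  // aligned read of values[counter]
  addps  xmm1, xmm2;         // add it to the running sum
  pop  eax;                  // restore the unscaled counter
// Test loop count and iterate.
add eax, -1;
jnz LoopStart;
```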

Re: Audio Compression & Limiting - low CPU%

Posted: Sat Mar 07, 2020 3:25 am
by adamszabo
Awesome, it works now, thanks! I learned something new and so did you :D
As far as I know, movaps uses less CPU than fstp (I hope someone can confirm); that's why I used it instead.

Re: Audio Compression & Limiting - low CPU%

Posted: Sat Mar 07, 2020 7:16 pm
by trogluddite
Yes, I imagine it would be a lot quicker. The main reason for using the FPU instructions is that, very often, the four SSE channels are not looking up the same array index, especially in poly code - so you have to laboriously do everything one channel at a time. But if the SSE channels will always point to the same index, that's a rather wasteful way of doing things.