
Re: optimisation - tools (2019)

PostPosted: Tue Jun 11, 2019 3:05 pm
by trogluddite
@tulamide
Thanks for pointing that out - I'm curious to do a few experiments of my own now. I must admit I was merely parroting "gospel" from the old SM days that I don't recall having ever tested empirically.

payaDSP wrote: Ah, ZX Spectrum

You're making me nostalgic now! :lol:
My first serious electronic/computer music project was an audio sampling system for my Memotech MTX512 (also a Z80 machine). It was embarrassingly crude by today's standards - a DIY peripheral with 8-bit A/D and D/A converters and some basic audio amps, and assembly code where I had to use no-op instructions to make sure that every signal path was the same length to keep the sample timing stable. I've no idea what my top sampling rate was, but I bet not much more than a couple of kHz, and only enough memory for a second or two of sampling time. I've been playing little musical riffs using the sound of myself burping ever since! :lol:

Re: optimisation - tools (2019)

PostPosted: Tue Jun 11, 2019 6:26 pm
by payaDSP
Hi
A little new test:
I kept the simple stock inverter (VCA x -1).
I made a chain of 6 nested modules, a chain of 12 nested modules,
and again 12 linear modules.
Testing with 100 tests/sec, only this module active, no other apps running (W7 Pro).
I think there is a problem with the measuring:
At the start (nested in A, linear in B): A is faster by ~10% (surprising, but this is the aim of the test).
BUT waiting...
The gain decreases with the number of tests, so at test Nb=800 the gain is 0.9% (A faster);
at Nb=2000: gain ~0.03%;
at Nb=3000: :ugeek: gain ~0.2%, BUT IT IS B WHICH IS FASTER NOW !!!
at Nb=5000: B is now faster by 0.3%.

How can these slow changes in performance be explained?
It seems there is a problem with the tool used to evaluate speed.

Please redo the test for yourselves.
I have tested 6 nested vs 6 linear, and 6 nested x2 vs 12 linear.
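One way to reduce this kind of drift when comparing two candidates is to interleave the runs in random order and compare medians, rather than trusting a running average that slowly absorbs changes in the machine's state. Here is a minimal Python sketch of the idea (my own illustration - the `nested` and `linear` functions are hypothetical stand-ins, not the actual modules under test):

```python
import random
import statistics
import time

def bench(fn, trials=200):
    """Time many short runs of fn; return the median per-run time in seconds."""
    times = []
    for _ in range(trials):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return statistics.median(times)

# Hypothetical stand-ins for the "nested" and "linear" module chains.
def nested():
    x = 1.0
    for _ in range(1000):
        x = -x
    return x

def linear():
    x = 1.0
    for _ in range(1000):
        x = x * -1.0
    return x

# Interleave the two candidates in random order so that slow drift in the
# machine's state (cache warming, clock scaling, background tasks) hits
# both measurements roughly equally, instead of favouring whichever ran first.
results = {"nested": [], "linear": []}
for _ in range(10):
    order = [("nested", nested), ("linear", linear)]
    random.shuffle(order)
    for name, fn in order:
        results[name].append(bench(fn))

for name, ts in results.items():
    print(f"{name}: median {statistics.median(ts) * 1e6:.1f} us per run")
```

The median is deliberately used instead of the mean, because occasional OS interruptions produce large outliers that skew an average but barely move a median.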

Re: optimisation - tools (2019)

PostPosted: Tue Jun 11, 2019 7:15 pm
by trogluddite
I imagine that the unpredictability you're seeing is down to some of the CPU/memory effects that I mentioned in my earlier post. The number of clock cycles it takes to access a variable depends on where its value is found. If the required value is already in a CPU cache, it's much faster to access than if it has to be fetched from main memory as part of a new cache line (which usually also evicts older data from an existing cache line). The CPU caches are limited in size and shared by all running threads, so there will be slight variations on each test run which, for all practical purposes, are completely unpredictable. However, there are sometimes optimisations which can increase the chances that the values you need will already be in the caches (in general: keep data that will be accessed close together in time close together in memory address).
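The locality effect is visible even from a high-level language. The following Python sketch (my own illustration, not code from the tests above) reads exactly the same values twice - once in address order, once in a shuffled order - so any timing difference comes purely from memory-access patterns. CPython's interpreter overhead masks part of the effect, but on most machines the shuffled traversal still tends to run measurably slower because each access is likely to pull in a new cache line:

```python
import random
import time

N = 1_000_000
data = list(range(N))

sequential = list(range(N))   # indices visited in memory order
shuffled = sequential[:]
random.shuffle(shuffled)      # same indices, scattered order

def total(indices):
    """Sum data[i] for each i in indices - identical work either way."""
    s = 0
    for i in indices:
        s += data[i]
    return s

t0 = time.perf_counter()
a = total(sequential)
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
b = total(shuffled)
t_rand = time.perf_counter() - t0

assert a == b  # identical work, identical result
print(f"sequential: {t_seq:.3f}s, shuffled: {t_rand:.3f}s")
```

Since both traversals do the same arithmetic and touch the same addresses, only the *order* of access differs - which is precisely the cache-line behaviour described above.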

Such random variations due to the run-time context are one of the reasons that I rarely bother with optimisations that save only fractional CPU percentages these days - they can be wiped out very easily in practice, and should be balanced against the inconvenience of more convoluted code. The days when you could just count the op-codes to work out running times are long gone, I'm afraid.