memset in a loop ..or not?
From: Jake

Date: 15-09-10 15:08

I have a workbuffer with values that need to be re-arranged,
so initially I did it like this:

for (i = (N - 1); i >= 0; i--)
{
    workbuffer[Q * i] = workbuffer[i];
    memset(&workbuffer[(Q * i) + 1], 0, (Q - 1) * sizeof(int16));
}

but I was told not to use memset. I don't know exactly why I am not allowed
to use memset in a loop.
I guess it's not efficient enough? So I changed the code to this:

for (i = (N - 1); i >= 0; i--)
{
    workbuffer[Q * i] = workbuffer[i];
    for (j = 0; j < (Q - 1); j++)
    {
        workbuffer[(Q * i) + 1 + j] = 0;
    }
}

Let's say we have a 16 cell workbuffer B.

Four values have been stored in the first 4 cells of the workbuffer: B[0],
B[1], B[2] and B[3]. The remaining B[k] for k=4 to k=15 are undefined. The
code must re-arrange the 4 values so the workbuffer looks like this:

B[0],0,0,0,B[1],0,0,0,B[2],0,0,0,B[3],0,0,0

The code is for an interpolator. In the above example, N is the number of
samples in the workbuffer before re-arrangement, so N would be 4, and Q
would be the interpolation factor, also equal to 4.
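
For reference, the re-arrangement described above can be written as one self-contained function. This is only a minimal sketch: the name zero_stuff_in_place is hypothetical, and it assumes int16 corresponds to int16_t from <stdint.h> and that the buffer has room for N * Q elements.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: expand the n samples at the start of buf to n * q
 * elements in place, putting q - 1 zeros after each sample.
 * Assumes buf has room for n * q elements and q >= 1. */
void zero_stuff_in_place(int16_t *buf, size_t n, size_t q)
{
    /* Walk from the last sample down to the first. The write index q * i is
     * never below the read index i, so no unread sample is overwritten. */
    for (size_t i = n; i-- > 0; ) {
        buf[q * i] = buf[i];
        memset(&buf[q * i + 1], 0, (q - 1) * sizeof buf[0]);
    }
}
```

With the 16-cell example, calling zero_stuff_in_place(B, 4, 4) on a buffer that starts with 1, 2, 3, 4 should leave 1,0,0,0,2,0,0,0,3,0,0,0,4,0,0,0. The loop counts down with an unsigned index (i-- > 0), which avoids the signed comparison i >= 0 used in the loops above.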

Any suggestions for improvement?

Comments about not using memset in a loop are also welcome.

Thank you.


 
 
Bertel Brander (15-09-2010)
Comment
From: Bertel Brander

Date: 15-09-10 19:00

On 15-09-2010 16:07, Jake wrote:
> I have a workbuffer with values that needs to be re-arranged,
> so...initially...I did it like this:
>
> for (i = (N - 1); i >= 0; i--)
> {
> workbuffer[Q * i] = workbuffer[i];
> memset(&workbuffer[(Q * i) + 1], 0, (Q-1) * sizeof(int16));
> }
>
> but I was told not to use memset. I don't know exactly why I am not allowed
> to use memset in a loop.

The compiler has every chance to make memset at least as efficient
as anything else you can do, so if you need to set some memory
to something, go ahead and use memset.
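In general it is not a good idea to loop backwards from N to 0, but this
in-place expansion is an exception: a forward loop would overwrite samples
that have not been read yet.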
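
The loop direction matters for this particular in-place re-arrangement, which a small counter-example can illustrate. The sketch below is deliberately wrong: it runs the same expansion with a forward loop. The function name is hypothetical and int16 is assumed to be int16_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Counter-example, not a fix: the in-place expansion with a forward loop. */
void expand_forward_broken(int16_t *buf, size_t n, size_t q)
{
    for (size_t i = 0; i < n; i++) {
        /* With q >= 2 and i >= 1, buf[i] was already overwritten by an
         * earlier iteration, so the original sample is lost. */
        buf[q * i] = buf[i];
        for (size_t j = 1; j < q; j++)
            buf[q * i + j] = 0;
    }
}
```

For the example with N = 4 and Q = 4, the first iteration zeroes cells 1 to 3 before they are read, so only B[0] survives and the rest of the buffer ends up zero. Looping from N - 1 down to 0, as in the original post, avoids this.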

> I guess it's not efficient enough? So I changed the code to this:
>
> for (i = (N - 1); i >= 0; i--)
> {
> workbuffer[Q * i] = workbuffer[i];
> for (j = 0; j < (Q - 1); j++)
> {
> workbuffer[(Q * i) + 1 + j] = 0;
> }
> }
>
> Let's say we have a 16 cell workbuffer B.
>
> Four values have been stored in the first 4 cells in the workbuffer: B[0],
> B[1], B[2] and B[3] the remaining B[k] for k=4 to k=15 are undefined. The
> code must re-arrange the 4 values so the workbuffer looks like this:
>
> B[0],0,0,0,B[1],0,0,0,B[2],0,0,0,B[3],0,0,0
>
> The code is for an interpolator and in the above example N is the number of
> samples in the workbuffer before re-arrangement. So N would be 4! And Q
> would be an interpolation factor equal to 4.
>
> Any suggestions for improvement?

For small blocks of memory, it can be a good idea to
"unroll" loops, so if Q in your case is small, it might
be better to:

for(i = N - 1; i >= 0; --i) /* backwards: a forward loop would overwrite unread samples */
{
    workbuffer[Q * i] = workbuffer[i];
    workbuffer[(Q * i) + 1] = 0; /* Q - 1 = 3 zeros, written out for Q == 4 */
    workbuffer[(Q * i) + 2] = 0;
    workbuffer[(Q * i) + 3] = 0;
}

But as with any optimization, first check whether you need
the optimization at all, and then measure which solution
is the most efficient.
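
One rough way to do that measurement is a small timing harness built on the standard clock() function. This is only a sketch under stated assumptions: the names expand_memset, expand_loop and time_expand are hypothetical, int16 is taken to be int16_t, and n and q are assumed to be at least 1. Real numbers depend heavily on compiler, optimization flags and target CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void (*expand_fn)(int16_t *buf, size_t n, size_t q);

/* Variant 1: zero each gap with memset. */
void expand_memset(int16_t *buf, size_t n, size_t q)
{
    for (size_t i = n; i-- > 0; ) {
        buf[q * i] = buf[i];
        memset(&buf[q * i + 1], 0, (q - 1) * sizeof buf[0]);
    }
}

/* Variant 2: zero each gap with an inner loop. */
void expand_loop(int16_t *buf, size_t n, size_t q)
{
    for (size_t i = n; i-- > 0; ) {
        buf[q * i] = buf[i];
        for (size_t j = 1; j < q; j++)
            buf[q * i + j] = 0;
    }
}

/* Run fn `reps` times on n samples and return the elapsed CPU time in
 * seconds, or -1.0 if the work buffer could not be allocated. */
double time_expand(expand_fn fn, size_t n, size_t q, int reps)
{
    int16_t *buf = malloc(n * q * sizeof *buf);
    if (buf == NULL)
        return -1.0;

    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        for (size_t i = 0; i < n; i++)      /* refill the input samples */
            buf[i] = (int16_t)(i + 1);
        fn(buf, n, q);
    }
    clock_t stop = clock();

    /* Touch the result so the compiler has less reason to drop the work. */
    volatile int16_t sink = buf[0];
    (void)sink;

    free(buf);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_expand(expand_memset, N, Q, reps) and time_expand(expand_loop, N, Q, reps) with realistic values of N and Q, and a reps count large enough to give stable timings, gives a first indication of whether there is any difference worth caring about.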

Arne Vajhøj (16-09-2010)
Comment
From: Arne Vajhøj

Date: 16-09-10 02:31

On 15-09-2010 13:59, Bertel Brander wrote:
> On 15-09-2010 16:07, Jake wrote:
>> I guess it's not efficient enough? So I changed the code to this:
>>
>> for (i = (N - 1); i >= 0; i--)
>> {
>> workbuffer[Q * i] = workbuffer[i];
>> for (j = 0; j < (Q - 1); j++)
>> {
>> workbuffer[(Q * i) + 1 + j] = 0;
>> }
>> }
>>
>> Let's say we have a 16 cell workbuffer B.
>>
>> Four values have been stored in the first 4 cells in the workbuffer:
>> B[0],
>> B[1], B[2] and B[3] the remaining B[k] for k=4 to k=15 are undefined. The
>> code must re-arrange the 4 values so the workbuffer looks like this:
>>
>> B[0],0,0,0,B[1],0,0,0,B[2],0,0,0,B[3],0,0,0
>>
>> The code is for an interpolator and in the above example N is the
>> number of
>> samples in the workbuffer before re-arrangement. So N would be 4! And Q
>> would be an interpolation factor equal to 4.
>>
>> Any suggestions for improvement?
>
> For small blocks of memory, it can be a good idea to
> "unroll" loops, so if Q in your case is small, it might
> be better to:
>
> for(i = N - 1; i >= 0; --i) /* backwards: a forward loop would overwrite unread samples */
> {
>     workbuffer[Q * i] = workbuffer[i];
>     workbuffer[(Q * i) + 1] = 0; /* Q - 1 = 3 zeros, written out for Q == 4 */
>     workbuffer[(Q * i) + 2] = 0;
>     workbuffer[(Q * i) + 3] = 0;
> }

I consider manual loop unrolling a thing of the
past (late 80s, early 90s).

Today I would expect the compiler to do that type
of optimization

(possibly controlled by a compiler directive).
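
As one illustration of such a directive, GCC 8 and later accept "#pragma GCC unroll n" immediately before a loop. The sketch below assumes the interpolation factor is known at compile time. The macro INTERP_Q and the function name expand_fixed_q are hypothetical, and the pragma is guarded so other compilers simply see a plain loop. With a constant trip count, many compilers unroll such a loop on their own when optimizing.

```c
#include <stddef.h>
#include <stdint.h>

#define INTERP_Q 4   /* interpolation factor, assumed fixed at compile time */

/* Expand n samples in place by INTERP_Q, leaving the unrolling of the
 * zero-fill loop to the compiler. */
void expand_fixed_q(int16_t *buf, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        buf[INTERP_Q * i] = buf[i];
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
#pragma GCC unroll 3   /* request: unroll the 3-iteration loop below */
#endif
        for (size_t j = 1; j < INTERP_Q; j++)
            buf[INTERP_Q * i + j] = 0;
    }
}
```

The source stays a plain loop, so changing INTERP_Q does not require rewriting a hand-unrolled body.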

Arne

Arne Vajhøj (16-09-2010)
Comment
From: Arne Vajhøj

Date: 16-09-10 02:29

On 15-09-2010 10:07, Jake wrote:
> I have a workbuffer with values that needs to be re-arranged,
> so...initially...I did it like this:
>
> for (i = (N - 1); i >= 0; i--)
> {
> workbuffer[Q * i] = workbuffer[i];
> memset(&workbuffer[(Q * i) + 1], 0, (Q-1) * sizeof(int16));
> }
>
> but I was told not to use memset. I don't know exactly why I am not allowed
> to use memset in a loop.

I think you should ask why.

> I guess it's not efficient enough? So I changed the code to this:
>
> for (i = (N - 1); i >= 0; i--)
> {
> workbuffer[Q * i] = workbuffer[i];
> for (j = 0; j < (Q - 1); j++)
> {
> workbuffer[(Q * i) + 1 + j] = 0;
> }
> }
>
> Let's say we have a 16 cell workbuffer B.
>
> Four values have been stored in the first 4 cells in the workbuffer: B[0],
> B[1], B[2] and B[3] the remaining B[k] for k=4 to k=15 are undefined. The
> code must re-arrange the 4 values so the workbuffer looks like this:
>
> B[0],0,0,0,B[1],0,0,0,B[2],0,0,0,B[3],0,0,0
>
> The code is for an interpolator and in the above example N is the number of
> samples in the workbuffer before re-arrangement. So N would be 4! And Q
> would be an interpolation factor equal to 4.
>
> Any suggestions for improvement?

I am skeptical about this being faster than memset.

I think it is safe to assume that the memset code has been
optimized - it should not be less optimized than your loop.

memset may use a special instruction for the specific CPU
architecture instead of a loop.

The only drawback of memset I can think of is function call
overhead. But then many compilers allow inlining of that call.
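
If call overhead is the concern, one option is to call memset once instead of N times. The sketch below assumes a separate output buffer is available, which the original post does not say. The name zero_stuff_copy is hypothetical and int16 is taken to be int16_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write the zero-stuffed version of src[0..n-1] to
 * dst[0..n*q-1]. Assumes dst has room for n * q elements and that src and
 * dst do not overlap. */
void zero_stuff_copy(int16_t *dst, const int16_t *src, size_t n, size_t q)
{
    memset(dst, 0, n * q * sizeof dst[0]);   /* one call clears everything */
    for (size_t i = 0; i < n; i++)
        dst[q * i] = src[i];                 /* scatter the samples */
}
```

The trade-off is that every sample slot is written twice and extra memory is needed, so whether this beats the in-place version is something to measure rather than assume.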

Arne

