Alternative to loops in R [duplicate]

Question

Possible Duplicate:
Speed up the loop operation in R

I have a few questions regarding loops. I know that R works faster with vectorized calculations, and I would like to change the below code to take advantage of this. Looking into some other answers on the forum the sapply function seems to be able to replace the inside for loop, but I am generating a vector of zeros so there is an error. Tao remains 1000, and I think this is creating the problem.

My primary concern is speed, as I need to create a loop around the entire algorithm and plot in different V and n sizes for some further analysis.

Thanks for your help

Alternative loop

tao = 1000
L = (tao - 1)   
n = 10      
V = 5               
I = 10000                       
V_s = matrix(rnorm(I), I, 1)
V_b = matrix(rnorm(I), I, 1)

signal <- matrix(0, L, 1)  

for( j in (n:L)){

    sapply(((j-n+1):j),function (tao) signal[j] = signal[j] + abs(V_s[tao] - V_b[tao]))

    signal[j] = (signal[j] / (n * V) )

}

Original loop

tao = 1000
L = (tao - 1)   
n = 10      
V = 5               
I = 10000                       
V_s = matrix(rnorm(I), I, 1)
V_b = matrix(rnorm(I), I, 1)

signal <- matrix(0, L, 1)  

for( j in (n:L)){

    for( tao in ((j-n+1):j))    {

        signal[j] = (signal[j] + abs(V_s[tao] - V_b[tao]))

    }
        signal[j] = (signal[j] / (n * V) )

}

Is there a reason you use matrices instead of vectors, whereas V_s, V_b and signal seems to have only one column ? — juba, Jan 25, 2013 at 10:29
If you put browser() inside the for(tao) loop, you'll be able to inspect the inner workings of the function and see what's going on. — Roman Luštrik, Jan 25, 2013 at 10:37
@juba , I used the matrix formatting in a later approach, but essentially yes, it is a vector. — Morten, Jan 25, 2013 at 10:47
General advice on speeding up R code: stackoverflow.com/a/8474941/636656 — Ari B. Friedman, Jan 25, 2013 at 10:58

Thilo · Accepted Answer · 2013-01-25 11:32:12Z

Using filters, you can do your computation even without any loop (and sapply is nothing more than a hidden loop).

absdif <- abs(V_s - V_b)
signal <- filter(absdif[1:L], rep(1/(n*V), n), sides=1)
signal[is.na(signal)] <- 0

Understanding what is happening in the second line is not trivial when your not used to filters, though. Let's have a closer look:

First we compute the absolute differences of V_s and V_b, which you loop uses frequently. Then comes the filter. Your computation is nothing more than summing up the the n past values at each time value j. Thus, we have something like

signal[j] <- sum(absdif[j-n+1:j])

That is exactly what convolution filters do - summing up some values - in the general form by multiplication with some weight. As weight we choose 1/(n*V), for all values, which corresponds to the normalization you do in your outer loop. The last argument, sides=1 simply tells the filter to take values only from the past (sides=2 would mean sum(absdif[(j-n/2):(j+n/2)])).

The last line just fills up the NA values at the beginning (where the filter does not have enough data to compute the sum - this equals to skipping the first n values).

Finally, some timing:

Your full-loop solution:

   User      System       total 
  0.037       0.000       0.037

The solution of juba:

   User      System       total 
  0.007       0.000       0.008

The solution using filters:

   User      System       total 
  0.000       0.000       0.001

Note that the concept of filters is really well researched and can be done incredibly fast.

Edit: As noted in ?filter, R does not use Fast fourier transform with the standard filter command. Usually, FFT is the most efficient way to implement convolutions. However, even this can be done by replacing the filter command with

signal <- convolve(absdif[1:L], rep(1/(n*V), n), type='filter')

Note that now the first n entries are stripped of instead of set to NA. The result is the same, however. Timing this time is of no use - the total time is below the three digit output of system.time... However, note the following remark in the R help of filter:

convolve(, type="filter") uses the FFT for computations and so may be faster for long filters on univariate series, but it does not return a time series (and so the time alignment is unclear), nor does it handle missing values. filter is faster for a filter of length 100 on a series of length 1000, for example

@Paul Well, vectorization always is just a hidden loop, but with different steps of optimization in between. And filters should be on the same level of optimization than, say, the abs operator is. However, while thinking about it, we can still do better. See my edit in a second. — Thilo, Jan 25, 2013 at 11:27
Thanks for the answer, I ran your suggestions and received a different output than I was expecting. I should substitute your solution with both my for loops? — Morten, Jan 25, 2013 at 11:59

juba · Accepted Answer · 2013-01-25 10:53:56Z

3

Vectorizing calculations doesn't always mean using an *apply function.

For example, you can simplify and speed up things by replacing your second for loop with vector indexing :

for(j in (n:L)){
  sel <- (j-n+1):j
  signal[j] <- sum(abs(V_s[sel] - V_b[sel])) / (n*V)
}

For this solution, the execution time on my system is :

utilisateur     système      écoulé 
      0.008       0.004       0.009

Whereas for your for loops it is :

utilisateur     système      écoulé 
       0.06        0.00        0.06

By the way, you should't use the tao name for two different things.

edited Jan 25, 2013 at 10:53

answered Jan 25, 2013 at 10:48

juba

48.5k14 gold badges114 silver badges118 bronze badges

Exactly. Vectorization almost always results in speed improvements. *apply functions rarely do.
– Ari B. Friedman
Jan 25, 2013 at 10:57
This has shortened the processing time significantly, thanks for your help and input.
– Morten
Jan 25, 2013 at 12:26

Add a comment |

agstudy · Accepted Answer · 2013-01-25 10:48:10Z

0

Assuming your explicit loop is a correct calc, try this:

 signal[j]<- signal[j] + 
              sapply((j-n+1):j, 
                   FUN = function(iter){ 
                           abs(V_s[iter] - V_b[iter])
                   }, V_s = V_s, V_b = V_b)

Note that sapply is returning the iter-th index absolute difference between V_s and V_b. This is then added to signal[j]

edited Jan 25, 2013 at 10:48

agstudy

121k18 gold badges203 silver badges265 bronze badges

answered Jan 25, 2013 at 10:27

Aditya Sihag

5,1275 gold badges33 silver badges44 bronze badges

Add a comment |

Collectives™ on Stack Overflow

Alternative to loops in R [duplicate]

3 Answers 3

Not the answer you're looking for? Browse other questions tagged
r
for-loop
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Not the answer you're looking for? Browse other questions tagged rfor-loop or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
r
for-loop
or ask your own question.