3

Possible Duplicate:
Speed up the loop operation in R

I have a few questions regarding loops. I know that R works faster with vectorized calculations, and I would like to change the below code to take advantage of this. Looking into some other answers on the forum the sapply function seems to be able to replace the inside for loop, but I am generating a vector of zeros so there is an error. Tao remains 1000, and I think this is creating the problem.

My primary concern is speed, as I need to create a loop around the entire algorithm and plot in different V and n sizes for some further analysis.

Thanks for your help

Alternative loop

tao = 1000
L = (tao - 1)   
n = 10      
V = 5               
I = 10000                       
V_s = matrix(rnorm(I), I, 1)
V_b = matrix(rnorm(I), I, 1)

signal <- matrix(0, L, 1)  

for( j in (n:L)){

    sapply(((j-n+1):j),function (tao) signal[j] = signal[j] + abs(V_s[tao] - V_b[tao]))

    signal[j] = (signal[j] / (n * V) )

} 

Original loop

tao = 1000
L = (tao - 1)   
n = 10      
V = 5               
I = 10000                       
V_s = matrix(rnorm(I), I, 1)
V_b = matrix(rnorm(I), I, 1)

signal <- matrix(0, L, 1)  

for( j in (n:L)){

    for( tao in ((j-n+1):j))    {

        signal[j] = (signal[j] + abs(V_s[tao] - V_b[tao]))

    }
        signal[j] = (signal[j] / (n * V) )

}
4
  • Is there a reason you use matrices instead of vectors, whereas V_s, V_b and signal seems to have only one column ?
    – juba
    Jan 25, 2013 at 10:29
  • 1
    If you put browser() inside the for(tao) loop, you'll be able to inspect the inner workings of the function and see what's going on. Jan 25, 2013 at 10:37
  • @juba , I used the matrix formatting in a later approach, but essentially yes, it is a vector.
    – Morten
    Jan 25, 2013 at 10:47
  • 2
    General advice on speeding up R code: stackoverflow.com/a/8474941/636656 Jan 25, 2013 at 10:58

3 Answers 3

12

Using filters, you can do your computation even without any loop (and sapply is nothing more than a hidden loop).

absdif <- abs(V_s - V_b)
signal <- filter(absdif[1:L], rep(1/(n*V), n), sides=1)
signal[is.na(signal)] <- 0

Understanding what is happening in the second line is not trivial when your not used to filters, though. Let's have a closer look:

First we compute the absolute differences of V_s and V_b, which you loop uses frequently. Then comes the filter. Your computation is nothing more than summing up the the n past values at each time value j. Thus, we have something like

signal[j] <- sum(absdif[j-n+1:j])

That is exactly what convolution filters do - summing up some values - in the general form by multiplication with some weight. As weight we choose 1/(n*V), for all values, which corresponds to the normalization you do in your outer loop. The last argument, sides=1 simply tells the filter to take values only from the past (sides=2 would mean sum(absdif[(j-n/2):(j+n/2)])).

The last line just fills up the NA values at the beginning (where the filter does not have enough data to compute the sum - this equals to skipping the first n values).

Finally, some timing:

Your full-loop solution:

   User      System       total 
  0.037       0.000       0.037 

The solution of juba:

   User      System       total 
  0.007       0.000       0.008 

The solution using filters:

   User      System       total 
  0.000       0.000       0.001 

Note that the concept of filters is really well researched and can be done incredibly fast.

Edit: As noted in ?filter, R does not use Fast fourier transform with the standard filter command. Usually, FFT is the most efficient way to implement convolutions. However, even this can be done by replacing the filter command with

signal <- convolve(absdif[1:L], rep(1/(n*V), n), type='filter')

Note that now the first n entries are stripped of instead of set to NA. The result is the same, however. Timing this time is of no use - the total time is below the three digit output of system.time... However, note the following remark in the R help of filter:

convolve(, type="filter") uses the FFT for computations and so may be faster for long filters on univariate series, but it does not return a time series (and so the time alignment is unclear), nor does it handle missing values. filter is faster for a filter of length 100 on a series of length 1000, for example

11
  • Nice solution, didn't know about this use of filter().
    – juba
    Jan 25, 2013 at 11:13
  • +1, although filter is also probably a loop in disguise... Jan 25, 2013 at 11:16
  • @PaulHiemstra Maybe, but a very efficient one it seems :)
    – juba
    Jan 25, 2013 at 11:24
  • @Paul Well, vectorization always is just a hidden loop, but with different steps of optimization in between. And filters should be on the same level of optimization than, say, the abs operator is. However, while thinking about it, we can still do better. See my edit in a second.
    – Thilo
    Jan 25, 2013 at 11:27
  • Thanks for the answer, I ran your suggestions and received a different output than I was expecting. I should substitute your solution with both my for loops?
    – Morten
    Jan 25, 2013 at 11:59
3

Vectorizing calculations doesn't always mean using an *apply function.

For example, you can simplify and speed up things by replacing your second for loop with vector indexing :

for(j in (n:L)){
  sel <- (j-n+1):j
  signal[j] <- sum(abs(V_s[sel] - V_b[sel])) / (n*V)
}

For this solution, the execution time on my system is :

utilisateur     système      écoulé 
      0.008       0.004       0.009 

Whereas for your for loops it is :

utilisateur     système      écoulé 
       0.06        0.00        0.06 

By the way, you should't use the tao name for two different things.

2
  • Exactly. Vectorization almost always results in speed improvements. *apply functions rarely do. Jan 25, 2013 at 10:57
  • This has shortened the processing time significantly, thanks for your help and input.
    – Morten
    Jan 25, 2013 at 12:26
0

Assuming your explicit loop is a correct calc, try this:

 signal[j]<- signal[j] + 
              sapply((j-n+1):j, 
                   FUN = function(iter){ 
                           abs(V_s[iter] - V_b[iter])
                   }, V_s = V_s, V_b = V_b)

Note that sapply is returning the iter-th index absolute difference between V_s and V_b. This is then added to signal[j]

Not the answer you're looking for? Browse other questions tagged or ask your own question.