data.table objects now have a := operator. What makes this operator different from all other assignment operators? Also, what are its uses, how much faster is it, and when should it be avoided?

moodymudskipper
asked Aug 11, 2011 at 17:01
Ari B. Friedman

1 Answer

Here is an example showing 10 minutes reduced to 1 second (from NEWS on the homepage). It's like subassigning to a data.frame, but it doesn't copy the entire table each time.

    library(data.table)
    m = matrix(1, nrow=100000, ncol=100)
    DF = as.data.frame(m)
    DT = as.data.table(m)
    system.time(for (i in 1:1000) DF[i,1] <- i)
    #    user  system elapsed
    # 287.062 302.627 591.984
    system.time(for (i in 1:1000) DT[i,V1:=i])
    #    user  system elapsed
    #   1.148   0.000   1.158     ( 511 times faster )
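One way to see the "doesn't copy" claim on a small scale is to compare memory addresses before and after a `:=` sub-assignment. This is only a sketch on a made-up three-row table; `addr_before` is an illustrative name, and `address()` is the helper exported by data.table.

```r
library(data.table)

# Small illustrative table (not the 100000 x 100 benchmark above)
DT <- data.table(V1 = c(1, 2, 3), V2 = c("a", "b", "c"))

addr_before <- address(DT)   # memory address of the table object

DT[2, V1 := 20]              # sub-assign a single cell by reference

# The table should still live at the same address, i.e. it was not copied
identical(addr_before, address(DT))
DT$V1
```

The same check on a data.frame is less direct, since `address()` is a data.table helper; the timing comparison above remains the more practical evidence.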

Putting the := in j like that allows more idioms:

    DT["a",done:=TRUE]   # binary search for group 'a' and set a flag
    DT[,newcol:=42]      # add a new column by reference (no copy of existing data)
    DT[,col:=NULL]       # remove a column by reference

and:

DT[,newcol:=sum(v),by=group]  # like a fast transform() by group
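To try these idioms end to end, here is a minimal sketch on a toy table. The data, the key on `group`, and the `total` column name are assumptions made for illustration; the snippets above do not show how `DT` was built.

```r
library(data.table)

# Hypothetical toy table; column names mirror the snippets above
DT <- data.table(group = c("a", "a", "b"), v = 1:3)
setkey(DT, group)                  # a key lets DT["a", ...] use binary search

DT["a", done := TRUE]              # flag rows in group 'a'; other rows get NA
DT[, newcol := 42]                 # add a constant column by reference
DT[, total := sum(v), by = group]  # per-group sum, repeated on each row of the group
DT[, newcol := NULL]               # remove the column again by reference

print(DT)
```

Every step modifies `DT` in place, so no `DT <- ...` reassignment is needed.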

I can't think of any reason to avoid :=, other than inside a for loop. Since := appears inside DT[...], it comes with the small overhead of the [.data.table method; e.g., S3 dispatch and checking for the presence and type of arguments such as i, by, nomatch, etc. So for use inside for loops, there is a low-overhead, direct version of := called set. See ?set for more details and examples. The disadvantages of set include that i must be row numbers (no binary search) and you can't combine it with by. By making those restrictions, set can reduce the overhead dramatically.

    system.time(for (i in 1:1000) set(DT,i,"V1",i))
    #    user  system elapsed
    #   0.016   0.000   0.018
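As a small illustration of how set addresses rows and columns, the sketch below uses a made-up three-row table. `colVar` is just an illustrative variable name holding a column name; rows are given as row numbers, as described above.

```r
library(data.table)

# Hypothetical small table for illustration
DT <- data.table(V1 = c(1, 2, 3), V2 = c(10, 20, 30))

# Column named literally, rows addressed by number inside a loop
for (i in 1:3) set(DT, i, "V1", i * 100)

# Column name held in a variable instead of written literally
colVar <- "V2"
set(DT, 2L, colVar, 0)

print(DT)
```

Like :=, set modifies `DT` by reference, so the loop does not need to reassign the result.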
answered Aug 11, 2011 at 17:18
Matt Dowle

15 Comments

Thanks for developing this package. I have a feeling I'm going to be revising a lot of my code to use this package.
On chat I was asked to self ask/answer (which apparently is encouraged) - that question is here
@MatthewDowle Want to include an explanation of when not to use := and to use set() instead?
@MatthewDowle I'd +1 again if I could.
@jabberwocky No problem. set(DT, i, "V1", i) sets the "V1" column whilst set(DT, i, colVar, i) sets the column whose name is contained in the colVar variable (e.g. if colVar = "V1" was done earlier). The quotes indicate that the column name is to be taken literally rather than looked up as a variable.