
data.table's := operator is documented as:

... adds or updates or removes column(s) by reference. It makes no copies of any part of memory at all.

So what happens here?

library(data.table)
dt <- data.table(a = 1:5, b = 6:10)
address(dt$b)
# [1] "0000021cca78db58"
dt[, b := 2*a]
address(dt$b)
# [1] "0000021cc77ade10"

How come the address of the b column changes?

I'm using R 3.6.1 and data.table 1.12.8.

Gregor Thomas
asked Jul 29, 2020 at 20:02
Ofek Shilon

1 Answer


You (or perhaps the column) just got plonked ;) The plonk behaviour is rather thoroughly described in the help text (?`:=`):

Unlike <- for data.frame, the (potentially large) LHS is not coerced to match the type of the (often small) RHS. Instead the RHS is coerced to match the type of the LHS, if necessary. Where this involves double precision values being coerced to an integer column, a warning is given (whether or not fractional data is truncated). The motivation for this is efficiency. It is best to get the column types correct up front and stick to them. Changing a column type is possible but deliberately harder: provide a whole column as the RHS. This RHS is then plonked into that column slot and we call this plonk syntax, or replace column syntax if you prefer. By needing to construct a full length vector of a new type, you as the user are more aware of what is happening, and it's clearer to readers of your code that you really do intend to change the column type.

However, the relationship between plonking and memory is currently not explicitly addressed in the docs (but see below). Hence questions like yours and by others (on GitHub: ":= does not update by reference existing column if i is missing", ":= doesn't always assign in-place").

There are a lot of interesting points in the GitHub posts, but rather than me reiterating them, please just go there and enjoy! One quote from Matt Dowle though, which I believe nicely justifies the plonk behaviour:

Instead of 5 column allocations, there's just one now for the a+a expression (the RHS, which gets created anyway) which is then plonked into the column slot by reference i.e. address(DT) doesn't change but address(DT$a) will change. That's correct behaviour, and most efficient, to save copying the whole RHS into the existing column (which is only possible if they're the same type anyway). Since the RHS is as long as the number of rows, it is just plonked in.
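The distinction in that quote between the table's address and the column's address can be checked directly. The following is only a minimal sketch, assuming a small made-up table (the names DT, tbl_before and col_before are illustrative, not from the GitHub post); exact addresses will differ per session, so only equality is compared.

```r
library(data.table)

# Hypothetical toy table; columns built with c() so they are plain vectors.
DT <- data.table(a = c(1L, 2L, 3L), b = c(4L, 5L, 6L))

tbl_before <- address(DT)    # address of the data.table object itself
col_before <- address(DT$a)  # address of the vector in column slot "a"

# Full-length RHS: a + a is allocated as a new vector and then
# placed ("plonked") into the column slot.
DT[, a := a + a]

tbl_same    <- identical(address(DT), tbl_before)    # table object is unchanged
col_changed <- !identical(address(DT$a), col_before) # column vector was swapped
```

Here the table is modified by reference (same DT object), while the slot for a now points at the vector produced by a + a.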

(Disclaimer: things may have changed in both data.table and R since that post, but I think the main message is still valid.)


Regarding documentation, there is an open PR (update and clarify := docs), where a more explicit description of plonk and memory is suggested:

When a column is plonked, the original column is not updated by reference, because that needs to update every single element of that column.


Have I been plonked? Yes! For me it wasn't memory, but column classes which caused some head scratching, and I ended up here: Why is data.table casting column classes when I assign all columns by reference. After reading your question, I returned to that post and realized that the very nice answer by Matt not "only" addresses class but also memory. I think it's worth repeating here (my bold and comment in []):

if length(RHS) == nrow(DT) then the RHS (and whatever its type) is plonked into that column slot. Even if those lengths are 1. If length(RHS) < nrow(DT), the memory for the column (and its type) is kept in place [implicitly memory not kept in place when length(RHS) == nrow(DT), I assume] but the RHS is coerced and recycled to replace the (subset of) items in that column.

If I need to change a column's type in a large table I write:

DT[, col := as.numeric(col)]

here as.numeric allocates a new vector, coerces "col" into that new memory, which is then plonked into the column slot. It's as efficient as it can be. The reason that's a plonk is because length(RHS) == nrow(DT).
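Both cases described in that answer can be tried side by side. This is a rough sketch under my own assumptions (a made-up table DT with columns id and col, and a sub-assignment through i to get length(RHS) < nrow(DT)); behaviour may vary between data.table versions, so treat it as an illustration rather than a guarantee.

```r
library(data.table)

# Hypothetical toy table; "col" starts out as an integer column.
DT <- data.table(id = c(1L, 2L, 3L), col = c(10L, 20L, 30L))

type_before <- class(DT$col)
addr_before <- address(DT$col)

# Case 1: sub-assignment, length(RHS) < nrow(DT).
# The existing column memory (and its type) is expected to stay in place.
DT[2L, col := 0L]
addr_after_sub <- address(DT$col)
type_after_sub <- class(DT$col)

# Case 2: full-length RHS of a new type.
# as.numeric() allocates a new double vector, which is plonked into the slot.
DT[, col := as.numeric(col)]
addr_after_plonk <- address(DT$col)
type_after_plonk <- class(DT$col)
```

In this sketch the sub-assignment leaves both the address and the integer type alone, whereas the full-column assignment changes the type to numeric and, with it, the address of the column.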

answered Jul 30, 2020 at 12:12
Henrik