I am processing a data.table instance and want to create an extra column using :=. This worked fine until I did some double indexing.
For the following instance of a data.table:
example_data = structure(list(
  V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                 .Label = c("AUDUSD", "EURUSD", "GBPUSD", "NZDUSD", "USDCAD", "USDJPY"),
                 class = "factor"),
  V2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                 .Label = c("2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014"),
                 class = "factor"),
  V3 = c("1.RData", "10.RData", "11.RData", "12.RData", "2.RData",
         "3.RData", "4.RData", "5.RData", "6.RData", "7.RData")),
  .Names = c("V1", "V2", "V3"),
  class = c("data.table", "data.frame"),
  row.names = c(NA, -10L))

which gives this data:
example_data
        V1   V2       V3
 1: AUDUSD 2007  1.RData
 2: AUDUSD 2007 10.RData
 3: AUDUSD 2007 11.RData
 4: AUDUSD 2007 12.RData
 5: AUDUSD 2007  2.RData
 6: AUDUSD 2007  3.RData
 7: AUDUSD 2007  4.RData
 8: AUDUSD 2007  5.RData
 9: AUDUSD 2007  6.RData
10: AUDUSD 2007  7.RData
I am looking to split the "V3" column on a "." and get the preceding number as a character in a new column in the same table.
Doing this is straightforward in normal R:
example_data$MONTH = apply(example_data,1, function(x) { strsplit(as.character(x[["V3"]]),"\\.")[[1]][1]})
I thought that doing this in data.table would be even more straightforward:
example_data[,MONTH:=strsplit(as.character(V3),"\\.")[[1]][1]]
However the double indexing is not being interpreted as I intended, because it is changing all values to the outcome of the first row. Removing the indexing does perform the correct operation (just not extracting and placing the data in the right place):
example_data[,strsplit(as.character(V3),"\\.")]
I also attempted to internalize the indexing by applying a function but got the same wrong result:
myfunc <- function(x) { strsplit(as.character(x),"\\.")[[1]][1] }
example_data[,MONTH:=myfunc(V3)]
I can always use the standard R solution, but if anyone knows of a data.table based solution that would be appreciated. I am not interested in other standard R or (d)plyr based alternatives (they are great - just not what I am asking).
- I will, data.table is so much quicker than some other options it is worth the syntax learning curve. – crogg01, Nov 15, 2014 at 19:57
2 Answers
You should use sub instead of strsplit:
example_data[ , MONTH := sub("\\..*", "", V3)]
        V1   V2       V3 MONTH
 1: AUDUSD 2007  1.RData     1
 2: AUDUSD 2007 10.RData    10
 3: AUDUSD 2007 11.RData    11
 4: AUDUSD 2007 12.RData    12
 5: AUDUSD 2007  2.RData     2
 6: AUDUSD 2007  3.RData     3
 7: AUDUSD 2007  4.RData     4
 8: AUDUSD 2007  5.RData     5
 9: AUDUSD 2007  6.RData     6
10: AUDUSD 2007  7.RData     7

However, it works with strsplit too:
example_data[ , MONTH := unlist(strsplit(V3, "\\..*"))]

3 Comments
- strsplit() returns a list. You might have to unlist().
- [...] unlist.
- cSplit approach from "splitstackshape" doesn't save us any work... :-( cSplit(example_data, "V3", "\\..*", drop = FALSE, type.convert = FALSE, fixed = FALSE)

Just pushed two functions, transpose() and tstrsplit(), in data.table v1.9.5.
With this we can do:
require(data.table)
setDT(example_data)[, col := tstrsplit(V3, ".", fixed=TRUE)[[1L]]]
#         V1   V2       V3 col
#  1: AUDUSD 2007  1.RData   1
#  2: AUDUSD 2007 10.RData  10
#  3: AUDUSD 2007 11.RData  11
#  4: AUDUSD 2007 12.RData  12
#  5: AUDUSD 2007  2.RData   2
#  6: AUDUSD 2007  3.RData   3
#  7: AUDUSD 2007  4.RData   4
#  8: AUDUSD 2007  5.RData   5
#  9: AUDUSD 2007  6.RData   6
# 10: AUDUSD 2007  7.RData   7

tstrsplit is a wrapper for transpose(strsplit(...)). transpose() can also be used on lists, data frames and data tables. Please check the documentation and examples for more.
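To illustrate the wrapper relationship, here is a minimal sketch on a small standalone vector. The names v3, parts, by_position and first_piece are made up for this example and are not part of the question's data. It also shows why the [[1]][1] indexing from the question only ever sees the first row.

```r
library(data.table)

# Hypothetical sample vector mirroring a few values of the V3 column
v3 <- c("1.RData", "10.RData", "11.RData")

# strsplit() returns a list with one element per input string
parts <- strsplit(v3, ".", fixed = TRUE)

# [[1]][1] picks the first piece of the FIRST string only;
# inside := that single value is recycled down the whole column
first_of_first_row <- parts[[1]][1]

# transpose() regroups the pieces by position instead of by row:
# element 1 holds every first piece, element 2 every second piece
by_position <- transpose(parts)

# tstrsplit() does the split and the transpose in one call,
# so [[1L]] yields the first piece of every string
first_piece <- tstrsplit(v3, ".", fixed = TRUE)[[1L]]
print(first_piece)
```

Because first_piece has one value per input string, it can be assigned with := without any recycling.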

