R - Predictive power of date variable reduces when changed from as.Date to as.numeric
I'm building a regression model with several date and numeric variables. I did a quick check on one of the date variables

    lm.fit = lm(label ~ firstday, data = rawdata)
    summary(lm.fit)$r.squared

to gauge its predictive influence on the model. It accounted for 41% of the variance. I then attempted to change the date to numeric so that I could work better with the variable. I used the command

    as.numeric(as.POSIXct(rawdata$firstday, format = "%Y-%m-%d"))

Doing so reduced the variance explained to 10%, which is not what I want. What am I doing wrong, and how do I go about it?

I've looked at https://stats.stackexchange.com/questions/65900/does-it-make-sense-to-use-a-date-variable-in-a-regression but the answer is not clear to me.
Edit 1:

A reproducible code sample of what I did is shown below:

    label = c(0,1,0,0,0,1,1)
    firstday = c("2016-04-06", "2016-04-05", "2016-04-04", "2016-04-03", "2016-04-02", "2016-04-02", "2016-04-01")
    lm.fit <- lm(label ~ firstday)
    summary(lm.fit)$r.squared
    [1] 0.7083333

On changing to numeric:

    firstday = as.numeric(as.POSIXct(firstday, format="%Y-%m-%d"))

I get

    lm.fit <- lm(label ~ firstday)
    summary(lm.fit)$r.squared
    [1] 0.1035539
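It's because the original list of dates is just a list of items, without any date sequence information: lm() treats a character vector as a categorical variable, with one level per distinct date.

See below how changing the dates to arbitrary letters gives the same result. The third code snippet returns the same R-squared as the first code snippet.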
    label <- c(0,1,0,0,0,1,1)

    # 1. Dates as character strings: treated as categories
    firstday1 <- c("2016-04-06","2016-04-05","2016-04-04","2016-04-03","2016-04-02","2016-04-02","2016-04-01")
    str(firstday1)
    lm.fit1 <- lm(label ~ firstday1)
    summary(lm.fit1)$r.squared
    [1] 0.7083333

    # 2. Dates converted to numeric: a single ordered predictor
    firstday2 <- as.numeric(as.POSIXct(firstday1, format="%Y-%m-%d"))
    str(firstday2)
    lm.fit2 <- lm(label ~ firstday2)
    summary(lm.fit2)$r.squared
    [1] 0.1035539

    # 3. Arbitrary letters with the same grouping as the dates
    firstday3 <- c("a","b","c","d","e","e","f")
    str(firstday3)
    lm.fit3 <- lm(label ~ firstday3)
    summary(lm.fit3)$r.squared
    [1] 0.7083333
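
To make the difference between the two fits more concrete, here is a minimal sketch that counts the coefficients each model estimates. The names firstday_chr, firstday_num, fit_chr and fit_num are made up for this illustration, and as.Date is used in place of as.POSIXct only to get whole-day numbers that do not depend on the time zone; it is not necessarily how the original data should be handled.

```r
label <- c(0, 1, 0, 0, 0, 1, 1)
firstday_chr <- c("2016-04-06", "2016-04-05", "2016-04-04", "2016-04-03",
                  "2016-04-02", "2016-04-02", "2016-04-01")

# lm() turns a character predictor into a factor:
# one level per distinct date string
nlevels(factor(firstday_chr))

# Categorical fit: an intercept plus one dummy variable
# for each level after the first
fit_chr <- lm(label ~ firstday_chr)
length(coef(fit_chr))

# Numeric fit: dates as days since 1970-01-01,
# so the model is an intercept plus a single slope
firstday_num <- as.numeric(as.Date(firstday_chr, format = "%Y-%m-%d"))
fit_num <- lm(label ~ firstday_num)
length(coef(fit_num))

# Adjusted R-squared takes the number of coefficients into account,
# so it is a fairer comparison than the raw R-squared
summary(fit_chr)$adj.r.squared
summary(fit_num)$adj.r.squared
```

With seven observations and six distinct dates, the categorical fit spends almost one coefficient per row, so a high raw R-squared there says little about how well the date predicts the label. The numeric fit is the one that actually uses the ordering of the dates.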