落园 – All Who Come Are Guests

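R Error Messages That Don't Speak Human

This post is a translation, in memory of the bleak days spent being tormented by R's baffling, bizarre error messages...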

Translating Weird R Errors
January 20, 2013
By Slawa Rokicki

The original is written with a lot of humor; since I am short on time, I will only give a brief translation.

1. Actually, I just misspelled a variable name...

Run this code:

prob1<-as.data.frame(cbind(c(1,2,3),c(5,4,3)))
colnames(prob1)<-c("Education","Ethnicity")
table(prob1$education, prob1$Ethnicity)

R then reports this error:

all arguments must have the same length

Baffling, isn't it? The correct call is actually:

table(prob1$Education, prob1$Ethnicity)

I simply forgot the capital letter... how embarrassing.
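To see why the message talks about lengths, here is a minimal sketch using the same toy data. It relies on the fact that `$` on a data frame returns NULL for a name that does not exist. The helper `get_col()` is hypothetical (it is not part of base R) and only illustrates one way to fail with a clearer message.

```r
# Same toy data as in the post.
prob1 <- as.data.frame(cbind(c(1, 2, 3), c(5, 4, 3)))
colnames(prob1) <- c("Education", "Ethnicity")

# `$` returns NULL for a missing name, so table() ends up comparing
# a length-0 argument with a length-3 one.
is_missing <- is.null(prob1$education)

# get_col() is a hypothetical helper: it stops with an explicit message
# when the requested column is not in the data frame.
get_col <- function(df, name) {
  if (!name %in% names(df)) {
    stop("column '", name, "' not found; available: ",
         paste(names(df), collapse = ", "))
  }
  df[[name]]
}

# Correct spelling works as usual.
tab <- table(get_col(prob1, "Education"), get_col(prob1, "Ethnicity"))

# Wrong spelling now fails with a message that names the real problem.
msg <- tryCatch(get_col(prob1, "education"),
                error = function(e) conditionMessage(e))
```

Checking `names(prob1)` before building the table would catch the same mistake without any helper.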

2. I just referenced a variable that doesn't exist...

For example, I run:

prob1$gender_recode <-as.numeric(prob1$Gender==2)

and get this error:

replacement has 0 rows, data has 3

But this works fine:

prob1$Educ_recode<-as.numeric(prob1$Education==2)

The only cause is that the variable Gender doesn't exist... Couldn't you just tell me the variable can't be found?
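The chain that leads to "0 rows" can be followed step by step. This is a small sketch on the same toy data; the intermediate names `missing_col`, `recoded` and `err` are only for illustration.

```r
prob1 <- as.data.frame(cbind(c(1, 2, 3), c(5, 4, 3)))
colnames(prob1) <- c("Education", "Ethnicity")

missing_col <- prob1$Gender              # NULL: there is no such column
recoded <- as.numeric(missing_col == 2)  # comparing NULL with 2 gives a length-0 vector
n_recoded <- length(recoded)

# Assigning a length-0 vector as a new column of a 3-row data frame is
# what actually raises the error; tryCatch() captures its message.
err <- tryCatch(
  {
    prob1$gender_recode <- recoded
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
```

So the "0 rows" in the message is the length of the recoded vector, and the "3" is the number of rows in prob1.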

3. Variable not found?

This time I made sure Education exists, yet I still get an error?

nrow(prob1[prob1$Education!=1])

The error:

undefined columns selected

And all I did was leave out a single comma...

nrow(prob1[prob1$Education!=1,])

Sigh, couldn't you just report a syntax error!
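The difference the comma makes can be sketched as follows, on a fresh two-column copy of the toy data. With a single index and no comma, a data frame treats the index as a column selector, so a logical vector of length 3 asks for a third column that a two-column data frame does not have.

```r
prob1 <- as.data.frame(cbind(c(1, 2, 3), c(5, 4, 3)))
colnames(prob1) <- c("Education", "Ethnicity")

keep <- prob1$Education != 1   # FALSE TRUE TRUE

# One index, no comma: used as a column selector, which fails here.
err <- tryCatch(prob1[keep], error = function(e) conditionMessage(e))

# Row index, a comma, then an empty column index (meaning all columns).
n_rows <- nrow(prob1[keep, ])

# Counting the TRUE values gives the same number without subsetting.
n_true <- sum(keep)
```

If the row count is all that is needed, `sum(keep)` avoids the subsetting syntax altogether.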

The original text follows:

I love R. I think it's intuitive and clever and overall a great language. But I do get really annoyed sometimes at the completely ridiculous, cryptic error messages it often gives me. This post will go over some of those seemingly nonsensical errors so you don't have to go crazy trying to find the bug in your code.

1. all arguments must have the same length

To start with, I just make up some quick data:

prob1<-as.data.frame(cbind(c(1,2,3),c(5,4,3)))
colnames(prob1)<-c("Education","Ethnicity")

And now I just want to do a simple table but I get this error:

table(prob1$Eduction, prob1$Ethnicity)

all arguments must have the same length

What the heck. I look back at my dataset and make sure that both those variables are the same length, which they are. The problem here is that I misspelled "Education". There's a missing "a" in there and instead of telling me that I referenced a variable that doesn't exist, R bizarrely tells me to check the length of my variables. Remember: Anytime you get an error, check to make sure you've spelled everything right.

If I do this, everything works out great:

table(prob1$Education, prob1$Ethnicity)

2. replacement has 0 rows, data has 3

A very similar problem, with a very different error message. Let's say I forgot what columns were in my prob1 data and I thought I had a Sex indicator in there. So I try to recode it like this:

prob1$Sex_recode<-as.numeric(prob1$Sex==2)

replacement has 0 rows, data has 3

This error message is also pretty unhelpful. The syntax is totally correct; the problem is that I just don't have a variable named Sex in my dataset. If I do this instead to recode education, a variable that exists, everything is fine:

prob1$Educ_recode<-as.numeric(prob1$Education==2)

3. undefined columns selected

Ironically, the error we so badly wanted before comes up but for a completely different reason. See if you can find the problem here. I'll take that same little dataset and I just want to know how many rows there are in which Education is not equal to 1.

So, if I want to know the number of rows of the dataframe prob1, I do:

nrow(prob1)

and if I want to know how many have a value of Education not equal to 1, I do the following (incorrectly) and get an error:

nrow(prob1[prob1$Education!=1])

undefined columns selected

Now I check my variable name and I've definitely spelled Education right this time. The problem, actually, is not that I have referenced a column that doesn't exist but I've messed up the syntax to the nrow() function, in that I haven't defined what columns I want to subset. When I do,

prob1[prob1$Education!=1]

this doesn't make any sense, because I'm saying to subset prob1 but to do this I have to specify which rows I want and which columns I want. This just lists one condition in the brackets and it's unclear whether it's for the rows or columns. See my post on subsetting for more details on this.

If I do it the following way, all is good since I'm saying to subset prob1 with only rows with education !=1 and all columns:

nrow(prob1[prob1$Education!=1,])

Tags R, error messages, missing variables, translation, syntax errors

My Life

Photo of the year, 2012

Post author By Liyun
Post date January 15, 2013

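I took many beautiful photos over the past year, but for no particular reason this is the one I like best... even though it was a hurried candid shot -_-|| (apologies in advance to the passer-by, whom I don't know). As usual, click to see the full-size image.

Sometimes it is not so much the colors or the composition of the photo itself, but the charm in it, the imagination, and that gentle, refined air. An aura that is hard not to be captivated by.

Tags aura, gentle and refined, photos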

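Wandering Around

Wuxi, vanished without a trace.

Wuxi really is a photogenic city... so I couldn't help posting a few photos. Wuxi, once so glorious, is gradually declining. Many of its historic sites are poorly preserved and only merit a cursory look. My timing always seems to be off, though: last time it was midsummer, this time the depth of winter, so there is always a gloomy feel. Perhaps it is prettier in spring, when the cherry blossoms are in bloom.

That said, I probably won't go back any time soon... There is nothing left there to miss or long for; it simply drifted past without a trace. I have even become cautiously stingy with a few extra words. All the poetry and all the ornate phrases still need something to stir them before they can bloom. The heart is like still water, and like solid ice.

With nothing to be attached to, it becomes like all those cities visited in a hurry: what settles in memory is only a handful of symbols. Where there are people, there is joy.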

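A wisp of cooking smoke, beside Qingming Bridge (清名桥).

I really like this coffee and book bar.

Old versus new: "One sees only the new love smile; who hears the old love weep?"

A dilapidated old courtyard: "How deep, how deep is the courtyard? Willows heap up like mist, curtain upon curtain beyond counting."

The lingering charm of a water town.

I love the vines covering the whole wall.

猫空 (Maokong) once again: "You stand on the bridge watching the scenery; the one watching the scenery watches you from upstairs."

A brick kiln.

A broken flowerpot.

A narrow, run-down alley.

The way of the Taoists.

The bustling, artificial Nanchan Temple (南禅寺).

Tags photography, Wuxi, water town, memory, traces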

Reading Notes

The Elements of Statistical Learning: Class Notes (13)

Post author By Liyun
Post date January 13, 2013

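Notes from the last class of the semester... And just like that, there is nothing left to look forward to during each working week; how much I love classrooms and lectures. Or rather, I am simply too used to the rhythm of school life...

This session builds on the previous one and introduces some methods related to (or improving on) additive models and tree models.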

MARS

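MARS stands for Multivariate Adaptive Regression Splines, and the name gives a rough idea of what it does. MARS descends from the same line as CART (whose competitor, by the way, is the famous C4.5). But first, let's see how MARS actually works.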

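The dataset is again denoted $(x_i, y_i)$, $i = 1, \dots, N$, and then comes the spline idea: we define the pair $(x-t)_+$ and $(t-x)_+$, where $(x-t)_+ = \max(x-t, 0)$ and $(t-x)_+ = \max(t-x, 0)$. Plotted, they look like this:

[Figure: the two hinge functions $(x-t)_+$ and $(t-x)_+$, mirrored about the knot $t$]

With these we can define the function set $I$: $I_j = \{(x_j-t)_+, (t-x_j)_+ : t \in \{x_{1j}, \dots, x_{Nj}\}\}$, and $I = \bigcup_{j=1}^{p} I_j$. It is starting to look more and more like splines, isn't it?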

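Next we define $f$: $f(x) = \beta_0 + \sum_{m=1}^{M} \beta_m h_m(x)$. This is where it gets interesting: each $h_m$ is a function in $I$ or a product of several such functions. Once the $h_m$ are chosen, the corresponding $\beta_m$ can be solved by least squares. At each subsequent step we add another $h_m$, so $M$ grows step by step. Once the candidate functions have been used up, the model is clearly at risk of over-fitting, so we start removing terms one at a time, considering those that contribute least to reducing the residual sum of squares. Following the cross-validation idea, we can then define the function $GCV(\lambda) = \frac{\sum_{i=1}^{N} (y_i - \hat{f}_\lambda(x_i))^2}{(1 - M(\lambda)/N)^2}$.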
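A minimal sketch of a single forward step in one dimension may make the idea concrete. It assumes simulated data with a kink at x = 4, tries every observed x as the knot, fits the hinge pair by least squares, and keeps the knot with the smallest residual sum of squares. The names `hinge_pos`, `hinge_neg` and `rss_for_knot` are illustrative only; this is not the full MARS procedure (no products of functions, no backward deletion).

```r
# Hinge pair: (x - knot)_+ and (knot - x)_+.
hinge_pos <- function(x, knot) pmax(x - knot, 0)
hinge_neg <- function(x, knot) pmax(knot - x, 0)

# Simulated data (an assumption for this sketch): flat up to 4, then linear.
set.seed(1)
x <- sort(runif(100, 0, 10))
y <- ifelse(x < 4, 2, 2 + 1.5 * (x - 4)) + rnorm(100, sd = 0.3)

# Residual sum of squares of the least-squares fit for a given knot.
rss_for_knot <- function(knot) {
  fit <- lm(y ~ hinge_pos(x, knot) + hinge_neg(x, knot))
  sum(resid(fit)^2)
}

# Try every observed x as a candidate knot and keep the best one.
rss <- vapply(x, rss_for_knot, numeric(1))
best_knot <- x[which.min(rss)]
```

The selected knot should land close to the true kink; a full implementation would repeat this search over all dimensions and over products with terms already in the model.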

PRIM

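PRIM stands for Patient Rule Induction Method; judging by the name, it is a patient, step-by-step recursive method. Sure enough, we begin by defining "peeling": choose any $\alpha$ in the interval $(0,1)$, say 0.1, and start peeling. The strategy is roughly this: pick one dimension, remove, say, the largest 10\% or the smallest 10\% of the samples along it, and check whether the mean of y over the remaining samples has increased. There are p dimensions, so there are $2p$ ways to peel. Choose the one that raises the mean the most and peel. Repeat until the mean can no longer increase, then stop; what remains is a block of "essence" (a greedy algorithm). After that comes pasting: add a piece back on and see whether the mean goes up. This yields a region $B_1$ whose mean is $\bar{y}_1$.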
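The peeling stage can be sketched as below. This is a simplified illustration under stated assumptions, not a reference implementation: the data are simulated, the function names `peel_once` and `prim_peel` are hypothetical, the `min_n` stopping threshold is an extra assumption added to keep the box from becoming too small, and the pasting stage is left out.

```r
# One peeling step: for each of the p dimensions, try removing the lowest or
# the highest alpha fraction of points (2p candidates) and keep the candidate
# whose remaining points have the largest mean of y.
peel_once <- function(X, y, alpha = 0.1) {
  best <- list(mean = mean(y), keep = rep(TRUE, length(y)))
  for (j in seq_len(ncol(X))) {
    lo <- quantile(X[, j], alpha)
    hi <- quantile(X[, j], 1 - alpha)
    for (keep in list(X[, j] > lo, X[, j] < hi)) {
      if (sum(keep) > 0 && mean(y[keep]) > best$mean) {
        best <- list(mean = mean(y[keep]), keep = keep)
      }
    }
  }
  best
}

# Repeat peeling until no candidate raises the mean, or until the next peel
# would leave fewer than min_n points (min_n is an assumption of this sketch).
prim_peel <- function(X, y, alpha = 0.1, min_n = 10) {
  repeat {
    step <- peel_once(X, y, alpha)
    if (all(step$keep) || sum(step$keep) < min_n) break
    X <- X[step$keep, , drop = FALSE]
    y <- y[step$keep]
  }
  list(X = X, y = y, mean = mean(y))
}

# Simulated data: y is high only in the corner where both coordinates exceed 0.6.
set.seed(1)
X <- matrix(runif(400), ncol = 2)
y <- as.numeric(X[, 1] > 0.6 & X[, 2] > 0.6) + rnorm(200, sd = 0.1)
box <- prim_peel(X, y)
```

Each step is greedy: it only compares the 2p candidate peels available at that moment, which is why the method needs a small alpha and many steps to be "patient".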