查看原文
其他

双重聚类cluster咋做? 线性, logit, tobit可以双聚类吗?

因果推断研究小组 计量经济圈 2021-10-23


凡是搞计量经济的,都关注这个号了

箱:econometrics666@sina.cn

所有计量经济圈方法论丛的code程序, 宏微观数据库和各种软件都放在社群里.欢迎到计量经济圈社群交流访问.

上一日,计量经济圈发布了这个“Stata16版本可以下载使用了!!!”。由于某些客观因素限制,暂时不能直接对圈友开放,社群里放着的是Windows和Mac双系统的Stata 16版本软件安装包。


当然,Stata16版本的功能肯定是强大很多很多了,do文档允许自动填充变量,而且built-in的语句都高亮显示,Python, Markdown模式可自由切换。具体可看看这个:Stata16新增功能有哪些? 满满干货拿走不谢


今天,咱们引荐一种新方法,主要用来进行双聚类,比如同时在公司层面和时间层面对标准误进行聚类,从而解决组内相关性问题所导致的标准误偏差。中间这一层介绍可以忽略,直接到问后面的链接处下载这些ado程序,然后放到你的directory里运行cluster2, logit2, probit2, tobit2即可。

适当做一些扩展:

1.在什么级别上标准误聚类, 个体, 县, 省或行业, 时间?

2.啥时候使用聚类标准误, 以及数据聚类的修正方法?


Clustered Standard Errors – Two dimensions


The routines currently written into Stata allow you to cluster by only one variable (e.g. one dimension such as firm or time). 


Papers by Thompson (2006) and by Cameron, Gelbach and Miller (2006) suggest a way to account for multiple dimensions at the same time. This approach allows for correlations among different firms in the same year and different years in the same firm, for example. See their papers and mine for more details and caveats. 


I have written a Stata ado file to implement this estimation procedure. It runs a regression and calculates standard errors which account for two dimensions of within cluster correlation. 

The variables which record the two dimensions (e.g. a firm identifier and a time identifier) are specified in the required options: flcuster( ) and tcluster( ). 


There are also versions of the Stata ado file that estimates logit (logit2.ado), probit (probit2.ado), or tobit (tobit2.ado) models with clustering on two dimensions. 


The format is similar to the cluster2.ado command.

cluster2 dependent_variable independent_variables, fcluster(cluster_variable_one)  tcluster(cluster_variable_two)


If there are multiple observations per firm-year (e.g. loan data sets which have multiple loans per firm in a given year), then the method described in my paper needs to be modified. In this case, instead of subtracting off the White variance matrix, you need to subtract off the variance matrix clustered by firm-year (i.e. for correlation among observations with the same firm AND the same year -- see Cameron, Gelbach, and Miller (2006) for details). The program has been modified to automatically check for this condition and use the correct third matrix. The program is also now compatible with the outreg procedure.


The code for estimating clustered standard errors in two dimensions using R is available here.

下面这些短链接文章属于合集,可以收藏起来阅读,不然以后都找不到了。

2年,计量经济圈公众号近1000篇文章,

Econometrics Circle

数据系列:空间矩阵 | 工企数据 | PM2.5 | 市场化指数 | CO2数据 |  夜间灯光 | 官员方言  | 微观数据 |

计量系列:匹配方法 | 内生性 | 工具变量 | DID | 面板数据 | 常用TOOL | 中介调节 | 时间序列 | RDD断点 | 合成控制 | 

数据处理:Stata | R | Python | 缺失值 | Stata16版本 | CHIP/ CHNS/CHARLS/CFPS/CGSS等 |


干货系列:能源环境 | 效率研究 | 空间计量 | 国际经贸 | 计量软件 | 商科研究 | 机器学习 | SSCI | CSSCI | SSCI查询 |

计量经济圈组织了一个计量社群,有如下特征:热情互助最多、前沿趋势最多、社科资料最多、社科数据最多、科研牛人最多、海外名校最多。因此,建议积极进取和有强烈研习激情的中青年学者到社群交流探讨,始终坚信优秀是通过感染优秀而互相成就彼此的。


: . Video Mini Program Like ,轻点两下取消赞 Wow ,轻点两下取消在看

您可能也对以下帖子感兴趣

文章有问题?点此查看未经处理的缓存