
Mixed-Effects Models (MEM)

Compiled by 圈圈, 计量经济圈 (Econometrics Circle), 2019-06-30



S1 Definitions

 

1 Mixed effects models

(1) Mixed-effects models are characterized as containing both fixed effects and random effects.

混合效应模型是指既包含固定效应又包括随机效应的模型。


(2) Mixed-effects models or, more simply, mixed models are statistical models that incorporate both fixed-effects parameters and random effects.

混合效应模型, 或者简单些, 称作混合模型, 就是既包含固定效应参数, 又包含随机效应的统计模型。


[The fixed effects are analogous to standard regression coefficients and are estimated directly. The random effects are not directly estimated (although they may be obtained postestimation) but are summarized according to their estimated variances and covariances. Random effects may take the form of either random intercepts or random coefficients, and the grouping structure of the data may consist of multiple levels of nested groups. As such, mixed-effects models are also known in the literature as multilevel models and hierarchical models. Mixed-effects commands fit mixed-effects models for a variety of distributions of the response conditional on normally distributed random effects. 固定效应类似于标准回归系数, 可以直接估计。随机效应不直接估计(尽管可以在估计之后获得), 而是通过其方差和协方差的估计值加以概括。随机效应可以表现为随机截距或者随机系数, 数据的分组结构可以由多层嵌套的组构成。因此, 在文献中, 混合效应模型也被称为多水平模型和分层模型。混合效应命令可以在随机效应服从正态分布的条件下, 针对响应变量的多种分布拟合混合效应模型。]
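For the linear case, the description above is often written in matrix form. The notation below is the usual textbook convention and is an assumption here, since the source does not define any symbols: y is the response vector, X and Z are the design matrices for the fixed and random effects, β holds the fixed-effects parameters, b the random effects, and G their covariance matrix.

```latex
y = X\beta + Zb + \varepsilon, \qquad
b \sim N(0,\, G), \qquad
\varepsilon \sim N(0,\, \sigma^{2} I)
```

Under this notation, the fixed effects β are estimated directly, while the random effects b are summarized through the estimated elements of G.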

 

2 Effects

Parameters associated with the particular levels of a covariate are sometimes called the “effects” of the levels.

有时候, 把与某个协变量的特定水平相联系的参数称为该水平的“效应”。


[If the set of possible levels of the covariate is fixed and reproducible we model the covariate using fixed-effects parameters. 如果由协变量可能的水平组成的集合固定不变且可重复, 我们使用固定效应参数建立关于协变量的模型。


If the levels that we observed represent a random sample from the set of all possible levels we incorporate random effects in the model. 如果我们观测的水平代表了所有可能水平中的一个随机样本, 我们就把随机效应包含在模型中。]

 

3 Bias: inaccuracy of estimation, specifically the expected difference between an estimate and the true value.

偏差:估计的不准确性, 特指估计值与真值之间的期望差异。

 

4 Block random effects: effects that apply equally to all individuals within a group (experimental block, species, etc.), leading to a single level of correlation within groups.

区组随机效应:对同一组(试验区组, 物种, 等等)内的所有个体作用相同的效应, 它使组内只存在单一水平的相关关系。
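A minimal simulation sketch of this definition, using only the Python standard library. Each group receives one shared block effect, so two individuals in the same group are correlated at a single level, sd_block² / (sd_block² + sd_resid²). The standard deviations, group count and function names are illustrative assumptions, not values from the source.

```python
import random


def simulate_blocks(n_groups=2000, sd_block=2.0, sd_resid=1.0, seed=1):
    """Simulate pairs (y_i1, y_i2) that share one block effect b_i per group."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_groups):
        b = rng.gauss(0.0, sd_block)        # block effect, equal for both members
        y1 = b + rng.gauss(0.0, sd_resid)   # individual 1 in this block
        y2 = b + rng.gauss(0.0, sd_resid)   # individual 2 in this block
        pairs.append((y1, y2))
    return pairs


def pearson_correlation(xs, ys):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def intraclass_correlation(sd_block, sd_resid):
    """Theoretical within-block correlation implied by the two variances."""
    return sd_block ** 2 / (sd_block ** 2 + sd_resid ** 2)


pairs = simulate_blocks()
observed = pearson_correlation([p[0] for p in pairs], [p[1] for p in pairs])
expected = intraclass_correlation(2.0, 1.0)  # 4 / (4 + 1) = 0.8
print(f"within-block correlation: observed {observed:.3f}, theory {expected:.3f}")
```

The observed correlation should sit close to the theoretical value, which is the sense in which a block effect produces one level of correlation within groups.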

 

5 Continuous random effects: effects that lead to between-group correlations that vary with distance in space, time or phylogenetic history.

连续随机效应:引起组与组之间相关关系的效应, 这种相关性在空间、时间或者系统发生史的距离上有变化。

 

6 Crossed random effects: multiple random effects that apply independently to an individual, such as temporal and spatial blocks in the same design, where temporal variability acts on all spatial blocks equally.

交叉随机效应:是一种多重随机效应, 它独立施加于某个体之上, 例如同一设计中的时间和空间区组, 时间变化会对所有的空间区组起相同的作用。

 

7 Exponential family: a family of statistical distributions including the normal, binomial, Poisson, exponential and gamma distributions.

指数族:包括正态、二项式、泊松、指数和伽玛分布在内的统计分布族。

 

8 Fixed effects: factors whose levels are experimentally determined or whose interest lies in the specific effects of each level, such as effects of covariates, differences among treatments and interactions.

固定效应:水平由试验设定的因子, 或者关注点在于每个水平的特定效应的因子, 例如协变量的效应, 处理之间的差异, 以及交互作用。

 

9 Generalized linear models (GLMs): statistical models that assume errors from the exponential family; predicted values are determined by discrete and continuous predictor variables and by the link function (e.g. logistic regression, Poisson regression) (not to be confused with PROC GLM in SAS, which estimates general linear models such as classical ANOVA.).

广义线性模型:假设误差来自指数族的统计模型;预测值由离散和连续预测变量以及连接方程确定(例如逻辑斯蒂回归, 泊松回归)(不要与SAS中的PROC GLM混淆, 后者用于估计诸如经典ANOVA之类的一般线性模型)。

 

10 Individual random effects: effects that apply at the level of each individual (i.e. ‘blocks’ of size 1).

个体随机效应:施加于每个个体这一水平上的效应(即大小为1的“区组”)。

 

11 Linear mixed models (LMMs): statistical models that assume normally distributed errors and also include both fixed and random effects, such as ANOVA incorporating a random effect.

线性混合模型:假设误差服从正态分布, 且既包括固定效应又包括随机效应的统计模型, 例如包含随机效应的ANOVA。

 

12 Link function: a continuous function that defines the response of variables to predictors in a generalized linear model, such as logit and probit links. Applying the link function makes the expected value of the response linear and the expected variances homogeneous.

连接方程:广义线性模型中的连续方程, 由它定义变量对于预测因子的反应, 例如logit和probit连接(方程)。应用连接方程, 能使反应的期望值呈线性, 且期望方差齐性。
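As a small illustration of the two links named above, the sketch below defines the logit and probit links and their inverses with the Python standard library, then maps a linear predictor back to a probability. The intercept and slope are hypothetical values chosen for the example.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def logit(p):
    """Logit link: probability -> log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse logit: linear predictor -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit(p):
    """Probit link: probability -> standard normal quantile."""
    return _STD_NORMAL.inv_cdf(p)


def inv_probit(eta):
    """Inverse probit: linear predictor -> probability."""
    return _STD_NORMAL.cdf(eta)


# Hypothetical coefficients on the link (log-odds) scale.
intercept, slope = -1.0, 0.5
for x in (0.0, 2.0, 4.0):
    eta = intercept + slope * x  # linear on the link scale
    print(f"x={x}: eta={eta:+.1f}, logit mean={inv_logit(eta):.3f}, "
          f"probit mean={inv_probit(eta):.3f}")
```

The predictor is linear in x on the link scale, while the fitted mean stays between 0 and 1 on the response scale.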

 

13 Nested models: models that are subsets of a more complex model, derived by setting one or more parameters of the more complex model to a particular value (often zero).

嵌套模型:模型是更复杂模型的子集, 通过设定更复杂模型的一个或者更多参数值为特定值(通常为零)而得。

 

14 Nested random effects: multiple random effects that are hierarchically structured, such as species within genus or subsites within sites within regions.

嵌套随机效应:分层次结构的多重随机效应, 例如属内的物种, 或者区域内的样地内的子样地。

 

15 Overdispersion: the occurrence of more variance in the data than predicted by a statistical model.

过离散:在数据中出现比用统计模型预测的更大的方差。
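One quick, informal check for count data: under a Poisson model the variance equals the mean, so a variance-to-mean ratio well above 1 suggests overdispersion. The counts below are made up for illustration, and the ratio is a rough screening quantity rather than a formal test.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; roughly 1 for Poisson counts."""
    return variance(counts) / mean(counts)


# Hypothetical counts with many zeros and a few large values.
counts = [0, 1, 0, 7, 2, 12, 0, 3, 0, 15]
ratio = dispersion_ratio(counts)
print(f"mean={mean(counts)}, variance={variance(counts):.2f}, ratio={ratio:.2f}")
```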

 

16 Pearson residuals: residuals from a model which can be used to detect outliers and nonhomogeneity of variance.

皮尔逊残差:模型残差, 可用于检查异常值和方差的非齐性。
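A minimal sketch for the Poisson case, where the Pearson residual is the raw residual divided by the square root of the fitted mean (the model-implied standard deviation). The observed counts, the fitted means and the cutoff of 2 are hypothetical choices for illustration.

```python
import math


def pearson_residuals(observed, fitted):
    """Poisson Pearson residuals: (y - mu) / sqrt(mu), since Var(y) = mu."""
    return [(y - mu) / math.sqrt(mu) for y, mu in zip(observed, fitted)]


observed = [2, 5, 9, 30]          # hypothetical counts
fitted = [2.5, 4.0, 8.0, 12.5]    # hypothetical fitted means from some model

residuals = pearson_residuals(observed, fitted)
# Flag observations whose residual is large in absolute value.
flagged = [i for i, r in enumerate(residuals) if abs(r) > 2.0]
print([round(r, 3) for r in residuals], "flagged:", flagged)
```

Because each residual is scaled by the expected spread at its own fitted value, a large value points to a possible outlier, and a trend in the residual spread points to non-homogeneous variance.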

 

17 Random effects: factors whose levels are sampled from a larger population, or whose interest lies in the variation among them rather than the specific effects of each level. The parameters of random effects are the standard deviations of variation at a particular level (e.g. among experimental blocks). The precise definitions of ‘fixed’ and ‘random’ are controversial; the status of particular variables depends on experimental design and context.

随机效应:其水平是从较大总体中抽样得到的因子, 或者关注点在于各水平之间的变异而不是每个水平的特定效应的因子。随机效应的参数是在特定水平上的变异的标准差(例如, 在试验区组之间)。“固定”和“随机”的精确定义存在争议;特定变量属于哪一类取决于试验设计和背景。

  

S2

1 Bayesian statistics: a statistical framework based on combining data with subjective prior information about parameter values in order to derive posterior probabilities of different models or parameter values.

贝叶斯统计学:一种以具有关于参数值的主观先验信息的融合数据为基础的, 以得到不同模型或者参数值的后验概率为目的的统计学体系。

 

2 Frequentist (sampling-based) statistics: a statistical framework based on computing the expected distributions of test statistics in repeated samples of the same system. Conclusions are based on the probabilities of observing extreme events.

频率论(基于抽样)的统计学:以计算同一系统重复样本统计检验的期望分布为基础的统计学方法体系。它基于极端事件的观测概率得出结论。

 

3 Information criteria and information-theoretic statistics: a statistical framework based on computing the expected relative distance of competing models from a hypothetical true model.

信息准则和信息论统计学:以计算各竞争模型与一个假想的真实模型之间的期望相对距离为基础的一套统计学方法体系。

 

4 Markov chain Monte Carlo (MCMC): a Bayesian statistical technique that samples parameters according to a stochastic algorithm that converges on the posterior probability distribution of the parameters, combining information from the likelihood and the prior distributions.

马尔科夫链蒙特卡洛:一种贝叶斯统计技术, 按照随机算法对参数进行抽样, 该随机算法收敛于参数的后验概率分布, 融合了似然和先验分布的信息。

[In statistics, Markov chain Monte Carlo (MCMC) methods (which include random walk Monte Carlo methods) are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a large number of steps is then used as a sample of the desired distribution. The quality of the sample improves as a function of the number of steps. 在统计学中, MCMC方法(包括随机游走蒙特卡洛方法)是一类从概率分布中抽样的算法, 其基础是构造一条以目标分布为平衡分布的“马尔科夫链”。经过大量的步骤后, 就把链的状态作为目标分布的一个样本。样本质量随步骤数量的增加而提高。]
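The idea can be sketched with a random-walk Metropolis sampler, one of the simplest MCMC algorithms. The example targets the posterior of a normal mean with known standard deviation 1 and a normal prior, a case where the exact posterior mean is available for comparison. The data, prior, proposal width, chain length and seed are all illustrative assumptions.

```python
import math
import random

DATA = [1.2, 0.8, 1.9, 1.4, 0.7]  # hypothetical observations, known sd = 1


def log_posterior(mu, data, prior_sd=10.0, obs_sd=1.0):
    """Unnormalized log posterior: normal prior on mu plus normal likelihood."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_likelihood = sum(-0.5 * ((y - mu) / obs_sd) ** 2 for y in data)
    return log_prior + log_likelihood


def metropolis(data, n_steps=40000, burn_in=4000, proposal_sd=1.0, seed=0):
    """Random-walk Metropolis; returns the draws kept after burn-in."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    samples = []
    for step in range(n_steps):
        proposal = mu + rng.gauss(0.0, proposal_sd)      # random-walk proposal
        candidate = log_posterior(proposal, data)
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:                   # accept or stay put
            mu, current = proposal, candidate
        if step >= burn_in:
            samples.append(mu)
    return samples


samples = metropolis(DATA)
posterior_mean = sum(samples) / len(samples)
# Conjugate result for this model: sum(y) / (n + 1 / prior_sd**2).
exact_mean = sum(DATA) / (len(DATA) + 1.0 / 10.0 ** 2)
print(f"MCMC mean {posterior_mean:.3f} vs exact {exact_mean:.3f}")
```

Early draws depend on the starting value, which is why the sketch discards a burn-in period before summarizing the chain.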

 

5 Maximum likelihood (ML): a statistical framework that finds the parameters of a model that maximize the probability of the observed data (the likelihood). (See Restricted maximum likelihood.)

极大似然:一种统计学方法体系, 寻找使观测数据的概率(即似然)最大的模型参数。(参见限制极大似然)
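A minimal sketch of the principle for a Poisson rate: evaluate the log-likelihood over a grid of candidate rates and keep the one that maximizes it. For this model the maximizer is known to be the sample mean, which gives a simple check. The counts and the grid are illustrative assumptions, and real software uses numerical optimizers rather than a grid.

```python
import math


def poisson_log_likelihood(rate, counts):
    """Log-likelihood of independent Poisson counts for a given rate."""
    return sum(y * math.log(rate) - rate - math.lgamma(y + 1) for y in counts)


counts = [2, 4, 3, 5, 1]  # hypothetical observations

# Candidate rates 0.01, 0.02, ..., 10.00.
candidates = [i / 100 for i in range(1, 1001)]
best_rate = max(candidates, key=lambda r: poisson_log_likelihood(r, counts))

sample_mean = sum(counts) / len(counts)
print(f"grid ML estimate {best_rate:.2f}, sample mean {sample_mean:.2f}")
```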

 

6 Model selection: any approach to determining the best of a set of candidate statistical models. Information-theoretic tools such as AIC, which also allow model averaging, are generally preferred to older methods such as stepwise regression.

模型选择:从一组候选统计模型中确定最佳模型的任何方法。与逐步回归等较老的方法相比, 通常更推荐使用AIC等信息论工具, 这类工具还允许进行模型平均。
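A short sketch of AIC-based comparison, using the standard definition AIC = 2k - 2 ln L (k parameters, maximized likelihood L) and Akaike weights, which can serve as the weights in model averaging. The three models, their log-likelihoods and their parameter counts are invented for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def akaike_weights(aic_values):
    """Relative support for each model, normalized to sum to 1."""
    best = min(aic_values.values())
    relative = {name: math.exp(-0.5 * (value - best))
                for name, value in aic_values.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}


# Hypothetical fits: model name -> (maximized log-likelihood, parameter count).
fits = {"A": (-100.0, 3), "B": (-98.5, 4), "C": (-98.4, 6)}

aic_values = {name: aic(ll, k) for name, (ll, k) in fits.items()}
weights = akaike_weights(aic_values)
best_model = min(aic_values, key=aic_values.get)

for name in fits:
    print(f"model {name}: AIC={aic_values[name]:.1f}, weight={weights[name]:.3f}")
```

Model C has the highest likelihood but also the most parameters, so the penalty term leaves it with the least support in this example.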

 

7 Restricted maximum likelihood (REML): an alternative to ML that estimates the random-effect parameters (i.e. standard deviations) averaged over the values of the fixed-effect parameters; REML estimates of standard deviations are generally less biased than corresponding ML estimates.

限制极大似然法:极大似然法的一种替代方法, 在固定效应参数的取值上取平均来估计随机效应参数(也就是标准差);通常情况下, 与对应的极大似然估计相比, REML估计的标准差的偏差更小。
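The bias difference can be seen in the simplest possible case, a normal model whose only fixed effect is the mean. There the ML variance estimate divides the sum of squares by n, while the REML estimate divides by n - 1 to account for the estimated mean. The sketch below averages both estimators over many small simulated samples; the sample size, true standard deviation and seed are illustrative assumptions, and the example works on the variance scale rather than with a full mixed model.

```python
import random


def variance_estimates(sample):
    """Return (ML, REML) variance estimates for an intercept-only normal model."""
    n = len(sample)
    centre = sum(sample) / n
    ss = sum((y - centre) ** 2 for y in sample)
    return ss / n, ss / (n - 1)  # ML ignores the estimated mean; REML adjusts


def average_estimates(n=5, true_sd=2.0, n_reps=20000, seed=42):
    """Average both estimators over repeated samples of size n."""
    rng = random.Random(seed)
    ml_total = reml_total = 0.0
    for _ in range(n_reps):
        sample = [rng.gauss(10.0, true_sd) for _ in range(n)]
        ml, reml = variance_estimates(sample)
        ml_total += ml
        reml_total += reml
    return ml_total / n_reps, reml_total / n_reps


ml_mean, reml_mean = average_estimates()
print(f"true variance 4.00, mean ML {ml_mean:.2f}, mean REML {reml_mean:.2f}")
```

With n = 5 the ML estimator is expected to average about (n - 1)/n of the true variance, that is 3.2, while the REML estimator averages close to 4.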

 

[Restricted maximum likelihood estimation (REML) aims to maximize likelihood over a restricted parameter space. In a general linear model with multivariate normal error distribution, REML leads to unbiased estimators. Transformation of the data enables the log-likelihood to be split so that variances are estimated from error contrasts. There is a connection with Bayesian methods. The method has been applied to a random-coefficients model for longitudinal data, and to situations where the parameters satisfy order restrictions. 限制极大似然估计, 目的是令likelihood在一个受约束的参数空间上最大化。在具有多元正态误差分布的一般线性模型中, 由REML可以导出无偏估计量。数据变换可以使log-likelihood分开, 以便从误差对比(error contrasts)中估计方差。该方法与贝叶斯方法有联系, 已经用于面向纵向数据的随机系数模型, 以及参数满足顺序约束的情形。]

 

In statistics, the restricted (or residual, or reduced) maximum likelihood (REML) approach is a particular form of maximum likelihood estimation which does not base estimates on a maximum likelihood fit of all the information, but instead uses a likelihood function calculated from a transformed set of data, so that nuisance parameters have no effect. [1] 统计学中, 限制(或者剩余, 或者简化)极大似然法是极大似然估计的一个特殊形式, 它不以全部信息的极大似然拟合为基础, 取而代之的是使用通过数据转换计算得到的似然函数, 使冗余参数(nuisance parameters)不产生影响.

 

In the case of variance component estimation, the original data set is replaced by a set of contrasts calculated from the data, and the likelihood function is calculated from the probability distribution of these contrasts, according to the model for the complete data set. In particular, REML is used as a method for fitting linear mixed models. In contrast to the earlier maximum likelihood estimation, REML can produce unbiased estimates of variance and covariance parameters [2]. 在方差分量估计时, 从原始数据中计算得到对照, 用对照取代原始数据, 并从对照的概率分布中计算得到似然函数. REML作为一种拟合线性混合模型的方法, 用的较多. 与早期的极大似然估计相比, REML 可以得到方差和协方差的无偏估计.

 

The idea underlying REML estimation was put forward by M. S. Bartlett in 1937.[1][3] The first description of the approach applied to estimating components of variance in unbalanced data was by Desmond Patterson and Robin Thompson[1][4] of the University of Edinburgh, although they did not use the term REML. A review of the early literature was given by Harville.[5] REML估计的思路由M. S. Bartlett(1937)提出. 爱丁堡大学的Desmond Patterson and Robin Thompson[1][4]首先描述了这种方法在估计不平衡数据的方差分量时的应用, 尽管没有使用REML这一术语. 早期的综述文献见Harville.[5]

 

REML estimation is available in a number of general-purpose statistical software packages, including Genstat (the REML directive), SAS (the MIXED procedure), SPSS (the MIXED command), Stata (the xtmixed command), and R (the lme4 and older nlme packages), as well as in more specialist packages such as MLwiN, HLM, ASReml, Statistical Parametric Mapping and CropStat. REML估计在大量的多用途统计软件包中使用, 包括Genstat, SAS, SPSS, Stata, R; 在一些专用软件包中也有它的身影, MLwiN, HLM, ASReml, …

 

References

[1] Dodge, Yadolah (2006). The Oxford Dictionary of Statistical Terms. Oxford [Oxfordshire]: Oxford University Press. ISBN 0-19-920613-9. (see REML)

[2] Baker, Bob. Estimating variances and covariances. Original link broken; archived copy available at the Wayback Machine.

[3] Bartlett, M. S. (1937). "Properties of Sufficiency and Statistical Tests". Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 160 (901): 268. doi:10.1098/rspa.1937.0109.

[4] Patterson, H. D.; Thompson, R. (1971). "Recovery of inter-block information when block sizes are unequal". Biometrika 58 (3): 545. doi:10.1093/biomet/58.3.545.

[5] Harville, D. A. (1977). "Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems". Journal of the American Statistical Association 72 (358): 320–338. doi:10.2307/2286796.

 

 

 

*****************************  R Language  ****************************

>library(lme4)

Loading required package: Matrix
Loading required package: lattice

.......

 

lme4 is an R package developed by Douglas Bates and Martin Maechler for fitting linear and generalized linear mixed-effects models. For more details, search online for the lme4 documentation.

 

 

**********************  SAS NLMIXED procedure  ****************************

Introduction

The NLMIXED procedure fits nonlinear mixed models, that is, models in which both fixed and random effects enter nonlinearly. These models have a wide variety of applications, two of the most common being pharmacokinetics and overdispersed binomial data. PROC NLMIXED enables you to specify a conditional distribution for your data (given the random effects) having either a standard form (normal, binomial, Poisson) or a general distribution that you code using SAS programming statements.

NLMIXED过程拟合非线性混合模型, 即固定效应和随机效应都以非线性形式进入的模型. 这些模型用途广泛, 最常见的两类应用是药物动力学和过离散的二项数据(binomial data). 你可以用PROC NLMIXED 为数据指定一个(给定随机效应的)条件分布, 它可以是标准形式(正态分布, 二项分布, 泊松分布), 也可以是用SAS编程语句自行编码的一般分布.


PROC NLMIXED fits nonlinear mixed models by maximizing an approximation to the likelihood integrated over the random effects. Different integral approximations are available, the principal ones being adaptive Gaussian quadrature and a first-order Taylor series approximation. A variety of alternative optimization techniques are available to carry out the maximization; the default is a dual quasi-Newton algorithm.

PROC NLMIXED 通过最大化对随机效应积分后的似然的近似值来拟合非线性混合模型. 可用的积分近似方法有多种, 最主要的是“自适应高斯求积”和“一阶泰勒级数近似”. 有多种最优化技术可用于求极大值; 默认算法是对偶拟牛顿(dual quasi-Newton)算法.


Successful convergence of the optimization problem results in parameter estimates along with their approximate standard errors based on the second derivative matrix of the likelihood function. PROC NLMIXED enables you to use the estimated model to construct predictions of arbitrary functions using empirical Bayes estimates of the random effects. You can also estimate arbitrary functions of the nonrandom parameters, and PROC NLMIXED computes their approximate standard errors using the delta method.

最优化问题成功收敛后, 就可以得到参数估计及其近似标准误, 后者以似然函数的二阶导数矩阵为基础. 借助随机效应的经验贝叶斯估计, PROC NLMIXED 可以用估计得到的模型对任意函数构造预测. 同时, 你可以估计非随机参数的任意函数, PROC NLMIXED用delta方法计算它们近似的标准误.
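The delta method mentioned above can be sketched for a single parameter: the standard error of g(θ) is approximated by |g'(θ)| times the standard error of θ. The helper below uses a numerical derivative and is a generic illustration in Python, not a reproduction of what PROC NLMIXED does internally. The estimate, its standard error and the function name are hypothetical.

```python
import math


def delta_method_se(g, estimate, se, h=1e-6):
    """First-order delta method: SE[g(theta)] ~= |g'(theta)| * SE[theta]."""
    # Central finite difference as a stand-in for the analytic derivative.
    derivative = (g(estimate + h) - g(estimate - h)) / (2.0 * h)
    return abs(derivative) * se


# Hypothetical estimate on the log scale with its standard error.
log_rate, log_rate_se = 0.7, 0.2

rate = math.exp(log_rate)
rate_se = delta_method_se(math.exp, log_rate, log_rate_se)
print(f"rate {rate:.3f} with approximate SE {rate_se:.3f}")
```

For g = exp the derivative is exp(θ) itself, so the approximate standard error here is exp(0.7) × 0.2.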

