#LyX 1.6.8 created this file. For more info see http://www.lyx.org/ \lyxformat 345 \begin_document \begin_header \textclass literate-article \begin_preamble \usepackage{ae} \renewcommand{\rmdefault}{ppl} \renewcommand{\sfdefault}{aess} \renewcommand{\ttdefault}{aett} \setlength{\parskip}{\smallskipamount} \end_preamble \use_default_options true \language english \inputencoding auto \font_roman default \font_sans default \font_typewriter default \font_default_family default \font_sc false \font_osf false \font_sf_scale 100 \font_tt_scale 100 \graphics default \paperfontsize default \spacing single \use_hyperref true \pdf_title "A Demo of LyX and pgfSweave" \pdf_author "Yihui Xie" \pdf_subject "A Demo of LyX and pgfSweave" \pdf_keywords "LyX, pgfSweave, Sweave, R language, LaTeX" \pdf_bookmarks true \pdf_bookmarksnumbered true \pdf_bookmarksopen true \pdf_bookmarksopenlevel 1 \pdf_breaklinks false \pdf_pdfborder false \pdf_colorlinks false \pdf_backref false \pdf_pdfusetitle true \papersize default \use_geometry true \use_amsmath 1 \use_esint 1 \cite_engine natbib_authoryear \use_bibtopic false \paperorientation portrait \leftmargin 2.5cm \topmargin 3cm \rightmargin 2.5cm \bottommargin 3cm \secnumdepth 3 \tocdepth 3 \paragraph_separation indent \defskip smallskip \quotes_language english \papercolumns 1 \papersides 1 \paperpagestyle default \tracking_changes false \output_changes false \author "" \author "" \end_header \begin_body \begin_layout Title A Demo of LyX and pgfSweave \end_layout \begin_layout Author Yihui Xie \end_layout \begin_layout Standard \begin_inset Flex Sweave Options status open \begin_layout Plain Layout prefix.string=Rfig,echo=FALSE,results=hide,cache=FALSE,pdf=FALSE,eps=FALSE,tidy=T RUE,external=TRUE \end_layout \end_inset \end_layout \begin_layout Scrap <>= \end_layout \begin_layout Scrap options(width=65, digits=3, keep.blank.line=FALSE, SweaveHooks=list(fig=function() { \end_layout \begin_layout Scrap par(mar=c(4,4,.1,.1),cex.lab=.95,cex.axis=.9,mgp=c(2,.7,0),tcl=-.3) \end_layout \begin_layout Scrap })) \end_layout \begin_layout Scrap @ \end_layout \begin_layout Scrap \end_layout \begin_layout Abstract LaTeX is, to some degree, the \emph on de facto \emph default standard for the typesetting especially in academic publications, and R \begin_inset CommandInset citation LatexCommand citep key "R-lang" \end_inset , according to its own introduction, is the \begin_inset Quotes eld \end_inset lingua franca \begin_inset Quotes erd \end_inset of data analysis and statistical computing. Further, Sweave has been regarded as a typical tool in the reproducible research. Despite the shinning merits of all these tools that other experts told us, we often find ourselves in the position of the little boy doubting if the king really had clothes, i.e. do we really benefit from using these tools in our daily work without much price to pay? The answer is \begin_inset Quotes eld \end_inset No \begin_inset Quotes erd \end_inset at least to me. I find typing raw LaTeX code a rather tedious job, and Sweave not an ideal tool for creating beautiful PDF documents (e.g. difficult to control the size of figures). LyX was first written in 1995, and it provides a GUI front-end to LaTeX so that it is extremely easy to write LaTeX documents (almost WYSIWYG). It takes care of all the trivial details automatically with which we may easily get stuck if we deal with raw LaTeX code, for example, the special characters, huge math formulae and page margins, etc. Besides, it has great flexibility to be extended to incorporate with different document classes (beamer slides, journals, ...) and interact with external programs such as R. This makes it possible to write Sweave documents in LyX, and to generate a dynamic reproducible report requires only a single click. The only price to pay here is the configuration of all the tools, which may be an obstacle to ordinary users. We will demonstrate the advantages of the pgfSweave package \begin_inset CommandInset citation LatexCommand citep key "pgfSweave-pkg" \end_inset over the default Sweave through examples, e.g. the high quality of graphics. I have been using LyX+Sweave/pgfSweave for doing the homework and projects in several courses such as Stat 500, 544, 557 and 601, and in this talk I will share some experience from a student's perspective. Hopefully these tools can find more general use in our daily study and research. \end_layout \begin_layout Standard I just want to demonstrate the capabilities of LyX and Sweave in this \begin_inset Quotes eld \end_inset article \begin_inset Quotes erd \end_inset , so the contents below might be nonsense. It is actually my Stat 601 homework 2. The data comes from Dr Kaiser's course website. \end_layout \begin_layout Section Introduction \end_layout \begin_layout Standard The aim of this assignment is to estimate and compare two Gamma models for two groups of data respectively. We assume the observations in the same group are i.i.d, and they are independent between groups as well. The kernel density curves of the two groups are given in Figure \begin_inset CommandInset ref LatexCommand ref reference "fig:data-density" \end_inset , and the five-number summaries are as follows: \end_layout \begin_layout Scrap <>= \end_layout \begin_layout Scrap ## descriptive stats \end_layout \begin_layout Scrap dat=read.table('http://streaming.stat.iastate.edu/~stat601/Data/gammadat.txt',col.name s=c('g1','g2')) \end_layout \begin_layout Scrap print(summary(dat)) \end_layout \begin_layout Scrap @ \end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout ScrapCenter <>= \end_layout \begin_layout ScrapCenter if (!require("ggplot2")) install.packages("ggplot2") \end_layout \begin_layout ScrapCenter library(ggplot2) \end_layout \begin_layout ScrapCenter theme_set(theme_bw()) \end_layout \begin_layout ScrapCenter meltdat=melt(dat) \end_layout \begin_layout ScrapCenter colnames(meltdat)=c('group','observation') \end_layout \begin_layout ScrapCenter print(qplot(observation,data=meltdat,geom='density',fill=group,alpha=I(.5))+scale _fill_grey() ) \end_layout \begin_layout ScrapCenter @ \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout Kernal density curves for two groups: they do not differ too much from each other, except in terms of the kurtosis. \begin_inset CommandInset label LatexCommand label name "fig:data-density" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Section Formulation \end_layout \begin_layout Standard To formulate this problem, let \begin_inset Formula $Y_{ij}$ \end_inset denote the random variables for group \begin_inset Formula $i=1,2$ \end_inset and observation \begin_inset Formula $j=1,2,\cdots,n_{i}$ \end_inset . Suppose \begin_inset Formula $Y_{ij}\overset{i.i.d}{\sim}\text{Gamma}(\alpha_{i},\beta_{i})$ \end_inset , i.e. the probability density function for group \begin_inset Formula $i$ \end_inset is: \begin_inset Formula \begin{equation} f(y|\alpha_{i},\beta_{i})=\frac{\beta_{i}^{\alpha_{i}}}{\Gamma(\alpha_{i})}y^{\alpha_{i}-1}\exp(-\beta_{i}y);\quad y>0\end{equation} \end_inset so that \begin_inset Formula $E(Y_{ij})=\alpha_{i}/\beta_{i}$ \end_inset and \begin_inset Formula $Var(Y_{ij})=\alpha_{i}/\beta_{i}^{2}$ \end_inset . From this expression, the log-likelihood for observations from group \begin_inset Formula $i$ \end_inset is: \begin_inset Formula \begin{equation} L(\alpha_{i},\beta_{i})=n(\alpha_{i}\log(\beta_{i})-\log(\Gamma(\alpha_{i})))+(\alpha_{i}-1)\sum_{j=1}^{n}\log(y_{j})-\beta_{i}\sum_{j=1}^{n}y_{i}\label{eq:log-likelihood}\end{equation} \end_inset We can get the maximum likelihood estimates (MLE's) by the R function \emph on optim() \emph default with the log-likelihood as its objective function, and calculate the confidence intervals based on the Wald theory, using the observed information matrix (set \family typewriter hessian = TRUE \family default in \emph on optim() \emph default ) to provide variance-covariance matrices for the limiting Normal distribution. In particular, if \begin_inset Formula \[ I^{-1}(\hat{\alpha},\hat{\beta})=\left(\begin{array}{cc} \sigma_{\alpha,\alpha} & \sigma_{\alpha,\beta}\\ \sigma_{\alpha,\beta} & \sigma_{\beta,\beta}\end{array}\right)\] \end_inset then the \begin_inset Formula $(1-\alpha)100\%$ \end_inset confidence intervals are computed as \begin_inset Formula \begin{eqnarray*} \hat{\alpha} & \pm & z_{1-\alpha/2}\sqrt{\sigma_{\alpha,\alpha}}\\ \hat{\beta} & \pm & z_{1-\alpha/2}\sqrt{\sigma_{\beta,\beta}}\end{eqnarray*} \end_inset \end_layout \begin_layout Section Computation \end_layout \begin_layout Standard Table \begin_inset CommandInset ref LatexCommand ref reference "tab:mle-ci-loglike" \end_inset shows the MLE's, CI's and log-likelihood values for two groups as well as the combined data (i.e. the reduced model). \end_layout \begin_layout Standard \begin_inset Float table wide false sideways false status open \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout MLE's for parameters in two groups as well as the reduced model, including the confidence intervals and log-likelihood values \begin_inset CommandInset label LatexCommand label name "tab:mle-ci-loglike" \end_inset \end_layout \end_inset \end_layout \begin_layout ScrapCenter <>= \end_layout \begin_layout ScrapCenter ## given a parameter vector and data, return a log-likelihood value \end_layout \begin_layout ScrapCenter loglike=function(x, data){ \end_layout \begin_layout ScrapCenter sum(dgamma(data, shape=x[1], rate=x[2], log=TRUE)) \end_layout \begin_layout ScrapCenter } \end_layout \begin_layout ScrapCenter ## maximize loglikelihood \end_layout \begin_layout ScrapCenter g1=optim(c(1,1), loglike, data=dat[,1],control=list(fnscale=-1),hessian=TRUE) \end_layout \begin_layout ScrapCenter g2=optim(c(1,1), loglike, data=dat[,2],control=list(fnscale=-1),hessian=TRUE) \end_layout \begin_layout ScrapCenter g=optim(c(1,1), loglike, data=c(dat[,1],dat[,2]),control=list(fnscale=-1),hessia n=TRUE) \end_layout \begin_layout ScrapCenter wald.int=function(x0,hessian,level=.95){ \end_layout \begin_layout ScrapCenter sd.x=sqrt(diag(solve(-hessian))) \end_layout \begin_layout ScrapCenter z=qnorm(1-(1-level)/2) \end_layout \begin_layout ScrapCenter as.vector(t(x0+z*sd.x%*%t(c(0,-1,1)))) \end_layout \begin_layout ScrapCenter } \end_layout \begin_layout ScrapCenter ## MLE's and Wald CI's \end_layout \begin_layout ScrapCenter est=data.frame('MLE'=rbind(wald.int(g1$par,g1$hessian),wald.int(g2$par,g2$hessian), wald.int(g$par,g$hessian)),'log-likelihood'=c(g1$value,g2$value,g$value)) \end_layout \begin_layout ScrapCenter colnames(est)[-7]=c('$ \backslash \backslash alpha$','lower.CI','upper.CI','$ \backslash \backslash beta$','lower.CI','upper.CI') \end_layout \begin_layout ScrapCenter rownames(est)=c('Group 1','Group 2','Combined data') \end_layout \begin_layout ScrapCenter if (!require("xtable")) install.packages("xtable") \end_layout \begin_layout ScrapCenter library(xtable) \end_layout \begin_layout ScrapCenter print(xtable(est,digits=3),floating=FALSE,sanitize.colnames.function=I) \end_layout \begin_layout ScrapCenter @ \end_layout \end_inset \end_layout \begin_layout Standard There are no problems in the numerical optimizations -- the convergence code is 0 for all optimizations, which indicates successful completions of these optimizations. We have tried several different initial values for the parameters and the results remained almost the same, so we believe the estimates should be reliable at least numerically. \end_layout \begin_layout Standard Since our interest is whether the two groups are significantly different, we may as well perform a likelihood ratio test (LRT). The test statistic is \begin_inset Formula $-2(L(\alpha_{1},\beta_{1})+L(\alpha_{2},\beta_{2})-L(\alpha,\beta))$ \end_inset which follows the \begin_inset Formula $\chi_{2}^{2}$ \end_inset distribution asymptotically. The p-value of the test is \begin_inset Flex S/R expression status collapsed \begin_layout Plain Layout round(1-pchisq(2*(g1$value+g2$value-g$value), 2),3) \end_layout \end_inset , and this indicates a very weak evidence of difference between two groups. Therefore, we will treat the two groups as if there were from the same underlying distribution hereinafter, and the corresponding parameter estimates are given in the 3rd row of Table \begin_inset CommandInset ref LatexCommand ref reference "tab:mle-ci-loglike" \end_inset . \end_layout \begin_layout Standard Figure \begin_inset CommandInset ref LatexCommand ref reference "fig:common-density" \end_inset shows the common density function of the two groups, which is actually very similar to the two curves in Figure \begin_inset CommandInset ref LatexCommand ref reference "fig:data-density" \end_inset . Again, this tells us by intuition that there is no significant difference between the two groups. \end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout ScrapCenter <>= \end_layout \begin_layout ScrapCenter curve(dgamma(x,shape=est[3,1],rate=est[3,4]), from=min(dat),to=max(dat),ylab='Ga mma density') \end_layout \begin_layout ScrapCenter @ \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout The estimated density function for the common Gamma distribution. \begin_inset CommandInset label LatexCommand label name "fig:common-density" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Standard As mentioned before, we can denote the expected value by \begin_inset Formula $\mu=\alpha/\beta$ \end_inset , and it is not difficult to get the estimated variance of \begin_inset Formula $\hat{\mu}$ \end_inset by the delta method: \begin_inset Formula \begin{eqnarray*} D & = & \left.\left(\frac{\partial}{\partial\alpha}\mu,\frac{\partial}{\partial\beta}\mu\right)\right|_{\alpha=\hat{\alpha},\beta=\hat{\beta}}\\ & = & \left.\left(\frac{1}{\beta},-\frac{\alpha}{\beta^{2}}\right)\right|_{\alpha=\hat{\alpha},\beta=\hat{\beta}}\\ \hat{Var}(\hat{\mu}) & = & DI^{-1}(\hat{\alpha},\hat{\beta})D'\end{eqnarray*} \end_inset and we know from independence that the estimated variance for the (estimated) common expected value is \begin_inset Formula $\hat{Var}(\hat{\mu}_{\text{common}})=\hat{Var}(\hat{\mu}_{1})+\hat{Var}(\hat{\mu}_{2})$ \end_inset ; the right-hand-side is known from previous computations. Then we can compute the 95% confidence intervals for the common expected value as well as the difference in two group means, as shown in Table \begin_inset CommandInset ref LatexCommand ref reference "tab:ci-common" \end_inset . We see the 95% CI for the difference of two means covers 0, thus we again have weak evidence of significant difference between two group means. \end_layout \begin_layout Standard \begin_inset Float table wide false sideways false status open \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout 95% Wald intervals for the common expected value as well as the difference in two group means, based on the delta method. \begin_inset CommandInset label LatexCommand label name "tab:ci-common" \end_inset \end_layout \end_inset \end_layout \begin_layout ScrapCenter <>= \end_layout \begin_layout ScrapCenter Dg=c(1/est[3,4],-est[3,1]/est[3,4]^2) \end_layout \begin_layout ScrapCenter sd.common=sqrt(Dg%*%solve(-g$hessian)%*%Dg) \end_layout \begin_layout ScrapCenter \end_layout \begin_layout ScrapCenter Dg1=c(1/est[1,4],-est[1,1]/est[1,4]^2) \end_layout \begin_layout ScrapCenter var.1=Dg1%*%solve(-g1$hessian)%*%Dg1 \end_layout \begin_layout ScrapCenter Dg2=c(1/est[2,4],-est[2,1]/est[2,4]^2) \end_layout \begin_layout ScrapCenter var.2=Dg2%*%solve(-g2$hessian)%*%Dg2 \end_layout \begin_layout ScrapCenter \end_layout \begin_layout ScrapCenter CI.common=rbind(est[3,1]/est[3,4]+qnorm(.975)*c(-1,1)*sd.common, \end_layout \begin_layout ScrapCenter (est[1,1]/est[1,4]-est[2,1]/est[2,4])+qnorm(.975)*c(-1,1)*sqrt(var.1+var.2)) \end_layout \begin_layout ScrapCenter dimnames(CI.common)=list(c('Common expected value','Difference of two means'), c('lower.CI','upper.CI')) \end_layout \begin_layout ScrapCenter print(xtable(CI.common,digits=3),floating=FALSE,sanitize.colnames.function=I) \end_layout \begin_layout ScrapCenter @ \end_layout \end_inset \end_layout \begin_layout Standard Finally, we may also try other methods to seek more evidence of our findings. An easy alternative is the t-test for the difference of means. The p-value based on the original data is \begin_inset Flex S/R expression status collapsed \begin_layout Plain Layout round(t.test(observation~group,data=meltdat)$p.value,3) \end_layout \end_inset , and it becomes \begin_inset Flex S/R expression status collapsed \begin_layout Plain Layout round(t.test(sqrt(observation)~group,data=meltdat)$p.value,3) \end_layout \end_inset when we take a square-root transformation. In both cases, we have extremely weak evidence of difference in two group means. \end_layout \begin_layout Section Conclusions \end_layout \begin_layout Standard So far we have examined the distributions for the two groups of data, and we may conclude, based on the previous computations, that a common distribution for both groups is appropriate. The likelihood ratio test and two-sample t-test have reached the same conclusio n that there is almost no evidence of difference in two group means. \end_layout \begin_layout Standard \begin_inset CommandInset bibtex LatexCommand bibtex bibfiles "LyX-pgfSweave-demo-Yihui-Xie" options "jss" \end_inset \end_layout \end_body \end_document