
Plotting 25 data graphs in Stata, from simple to advanced, with full code and output!

计量经济圈 (Econometrics Circle), 2022-12-13


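Our Pakistani friend Fahad Mirza, an economist and data analyst, has long been immersed in data visualization with Stata, including during his time as a consultant for the World Bank and the Centre for Economic Research in Pakistan (CERP). He recently compiled a list of the "Top 25 Stata data visualizations, with full code". Below we share the code and the resulting graph for each one.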


*The code below can be copied directly into Stata and run.

Main text

* Before you start, install the following packages:

 ssc install schemepack, replace

 ssc install colrspace, replace

 ssc install palettes, replace

 ssc install labutil, replace


* Correlation Coef with CI (Nick Cox - corrci) 

net describe pr0041_4, from(http://www.stata-journal.com/software/sj21-3)

net install pr0041_4, from(http://www.stata-journal.com/software/sj21-3)

 ssc install violinplot, replace

 ssc install dstat, replace

 ssc install moremata, replace


 ***Correlation plot
   sysuse auto, clear
  twoway  (scatter price mpg, mcolor(%60) mlwidth(0)) (lowess price mpg), ///
    title("{bf}Scatterplot", pos(11) size(2.75)) ///
    subtitle("Price Vs. MPG", pos(11) size(2.5)) ///
    legend(off) ///
    scheme(white_tableau)

**correlation plot by group
 sysuse auto, clear
  * Start from an empty local so reruns do not accumulate plot commands
  local scatter
  levelsof foreign, local(foreign)
  foreach category of local foreign {
     local  scatter `scatter' scatter price mpg if foreign == `category', ///
     mcolor(%60) mlwidth(0) ||
  }
  
  twoway  `scatter' (lowess price mpg), ///
    title("{bf}Scatterplot", pos(11) size(2.75)) ///
    subtitle("Price Vs. MPG", pos(11) size(2.5)) ///
    legend(order(1 "Domestic" 2 "Foreign") size(2)) ///
    scheme(white_tableau)


**Jitter plot
 import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
  
twoway  (scatter hwy cty, mcolor(%60) mlwidth(0)) (lfit hwy cty), ///
    title("{bf}Scatterplot with overlapping points", pos(11) size(2.75)) ///
    subtitle("mpg: City vs Highway mileage", pos(11) size(2.5)) ///
    legend(off) ///
    scheme(white_tableau)

import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear  
twoway  (scatter hwy cty, jitter(5) mcolor(%60) mlwidth(0)) (lfit hwy cty), ///
    title("{bf}Jittered points", pos(11) size(2.75)) ///
    subtitle("mpg: City vs Highway mileage", pos(11) size(2.5)) ///
    legend(off) ///
    scheme(white_tableau)


**Counts Chart
import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
  egen total = group(cty hwy)
  bysort total: egen count = count(total)
  twoway  (scatter hwy cty [aw = count], mcolor(%60) mlwidth(0) msize(1)) (lfit hwy cty), ///
    title("{bf}Counts plot", pos(11) size(2.75)) ///
    subtitle("mpg: City vs Highway mileage", pos(11) size(2.5)) ///
    legend(off) ///
    scheme(white_tableau)  
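
The two `egen` lines above attach to every row the number of rows that share its exact (cty, hwy) pair, and that count is the analytic weight that scales the markers. A minimal sketch of an equivalent shortcut with `_N`, run on the built-in auto data so it works offline (mpg and trunk stand in for cty and hwy here, purely for illustration):

```stata
sysuse auto, clear
* Approach used above: one id per (mpg, trunk) pair, then count rows per id
egen pair_id = group(mpg trunk)
bysort pair_id: egen count_egen = count(pair_id)
* Shortcut: within a by-group, _N is the number of rows in that group
bysort mpg trunk: generate count_n = _N
* Both give each point the size of its overlap group
assert count_egen == count_n
```

Either count variable can then be passed to `scatter` as `[aw = ...]`, as in the counts plot.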

**Bubble Chart
import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
  keep if inlist(manufacturer, "audi", "ford", "honda", "hyundai")
  recode hwy  (15 16 17 18 19 = 1) (20 21 22 23 24 = 4) (25 26 27 28 29 = 8) ///
     (30 31 32 33 34 = 16) (35 36 = 32), gen(weight)
  levelsof manufacturer, local(options)
  local wordcount : word count `options'
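  local i = 1
  * Clear locals that earlier sections may have filled (e.g. scatter from the grouped scatterplot)
  local scatter
  local line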
  foreach option of local options {
   colorpalette tableau, n(`wordcount') nograph
   local  scatter `scatter' scatter cty displ [fw = weight] if manufacturer == "`option'", ///
     mcolor("`r(p`i')'%60") mlwidth(0) jitter(10) ||
   local  line `line' lfit cty displ if manufacturer == "`option'", lcolor("`r(p`i')'") ||
   local ++i
  }
  twoway  `scatter' `line', ///
    title("{bf}Bubble Chart", pos(11) size(2.75)) ///
    subtitle("mpg: Displacement vs City mileage", pos(11) size(2.5)) ///
    ytitle("City Mileage", size(2)) ///
    legend(order(3 "Honda" 4 "Hyundai" 1 "Audi" 2 "Ford" ) size(2)) ///
    scheme(white_tableau)

**Marginal Histogram
  import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
  egen total = group(cty hwy)
  bysort total: egen count = count(total)
  * Main scatter plot (marker size weighted by the count), stored in memory as graph main
  twoway  (scatter hwy cty [aw = count], mcolor(%60) mlwidth(0) msize(1) legend(off)) ///
    (lfit hwy cty), legend(off) name(main, replace) ytitle("Highway MPG") xtitle("City MPG") ///
    graphregion(margin(t=-5))
  twoway  (histogram cty, yscale(off) xscale(off) ylabel(, nogrid) xlabel(, nogrid) bin(30)), name(cty_hist, replace) graphregion(margin(l=16)) fysize(15)
  twoway  (histogram hwy, horizontal yscale(off) xscale(off) ylabel(, nogrid) xlabel(, nogrid) bin(30)), name(hwy_hist, replace) graphregion(margin(b=15 t=-5)) fxsize(20)
  graph  combine cty_hist main hwy_hist, hole(2) commonscheme scheme(white_tableau) ///
    title("{bf}Marginal Histogram - Scatter Count plot", size(2.75) pos(11)) subtitle("mpg: Highway vs. City Mileage", size(2.5) pos(11))


**Marginal Boxplot
* Load Dataset 
  import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
  egen total = group(cty hwy)
  bysort total: egen count = count(total)
  * Main scatter plot (marker size weighted by the count), stored in memory as graph main
  twoway  (scatter hwy cty [aw = count], mcolor(%60) mlwidth(0) msize(1) legend(off)) ///
    (lfit hwy cty), legend(off) name(main, replace) ytitle("Highway MPG") xtitle("City MPG") ///
    graphregion(margin(t=-5))
  local i = 1
  local j = 10
  foreach var of varlist hwy cty {
    sort `var', stable
    quietly summarize `var', detail
    local mean_`var' = `r(mean)'
    local med_p_`var' = `r(p50)'
    local p75_`var' = `r(p75)'
    local p25_`var' = `r(p25)'
    local iqr_`var' = `p75_`var'' - `p25_`var''
    generate `var'uq = `var' if `var' <= `=`p75_`var''+(1.5*`iqr_`var'')'
    generate `var'lq = `var' if `var' >= `=`p25_`var''-(1.5*`iqr_`var'')'
    quietly summarize `var'uq
    local max_`var'uq = `r(max)'
    quietly summ `var'lq
    local min_`var'lq = `r(min)' 
    if `i' == 1 {
    colorpalette tableau, nograph
    local  lines`i' ///
      (scatteri `p75_`var'' 1 `max_`var'uq' 1, recast(line) lpattern(solid) lcolor("`r(p`j')'") lwidth(1)) || ///
      (scatteri `p25_`var'' 1 `min_`var'lq' 1, recast(line) lpattern(solid) lcolor("`r(p`j')'") lwidth(1)) || ///
      (scatteri `med_p_`var'' 1, ms(square) mcolor(background) msize(2)) || ///
      (scatteri `med_p_`var'' 1, ms(square) mcolor("`r(p`j')'")) || 
    }
    else {
    colorpalette tableau, nograph
    local  lines`i' ///
      (scatteri 1 `p75_`var'' 1 `max_`var'uq', recast(line) lpattern(solid) lcolor("`r(p`j')'") lwidth(1)) || ///
      (scatteri 1 `p25_`var'' 1 `min_`var'lq', recast(line) lpattern(solid) lcolor("`r(p`j')'") lwidth(1)) || ///
      (scatteri 1 `med_p_`var'', ms(square) mcolor(background) msize(2)) || ///
      (scatteri 1 `med_p_`var'', ms(square) mcolor("`r(p`j')'")) || 
    }
    
    drop *lq *uq
    local ++i
    local j = `j' + 4
  }
  twoway `lines1', legend(off) xlabel(, nogrid) ylabel(, nogrid) yscale(off) xscale(off) name(hwy_box, replace) graphregion(margin(b=15 t=-5)) fxsize(5)
  twoway `lines2', legend(off) xlabel(, nogrid) ylabel(, nogrid) yscale(off) xscale(off) name(cty_box, replace) graphregion(margin(l=16)) fysize(5)
  graph  combine cty_box main hwy_box, hole(2) commonscheme ycommon xcommon scheme(white_tableau) ///
    title("{bf}Marginal Box Plot - Scatter Count plot", size(2.75) pos(11)) subtitle("mpg: Highway vs. City Mileage", size(2.5) pos(11))
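
The loop above hand-builds each marginal box from the quartiles and the 1.5 × IQR rule, which is the same adjacent-value rule `graph box` uses for its whiskers. Stripped of the plotting, the whisker computation reduces to the following sketch, shown on the auto data's mpg for illustration:

```stata
sysuse auto, clear
* Quartiles and interquartile range of mpg
quietly summarize mpg, detail
local p25 = r(p25)
local p75 = r(p75)
local iqr = `p75' - `p25'
* Tukey fences: 1.5 x IQR beyond the quartiles
local upper_fence = `p75' + 1.5 * `iqr'
local lower_fence = `p25' - 1.5 * `iqr'
* Whisker ends: the most extreme observed values still inside the fences
quietly summarize mpg if mpg <= `upper_fence'
local upper_whisker = r(max)
quietly summarize mpg if mpg >= `lower_fence'
local lower_whisker = r(min)
display "whiskers: " `lower_whisker' " to " `upper_whisker'
```

This is the same logic as the `uq`/`lq` helper variables in the loop, written with `if` conditions instead.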


**Correlogram
* Load Dataset
  sysuse auto, clear 
  * Only change names of variable in local var_corr. 
  * The code will hopefully do the rest of the work without any hitch
  local var_corr price mpg trunk weight length turn foreign
  local countn : word count `var_corr'
  * Use correlation command
  quietly correlate `var_corr'
  matrix C = r(C)
  local rnames : rownames C
  * Now to generate a dataset from the Correlation Matrix
  clear
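   * Allocate countn^2 rows (more than needed); only the lower triangle without the diagonal is filled, and empty rows are dropped below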
   local tot_rows : display `countn' * `countn'
   set obs `tot_rows'
   generate corrname1 = ""
   generate corrname2 = ""
   generate y = .
   generate x = .
   generate corr = .
   generate abs_corr = .
   local row = 1
   local y = 1
   local rowname = 2
   foreach name of local var_corr {
    forvalues i = `rowname'/`countn' { 
     local a : word `i' of `var_corr'
     replace corrname1 = "`name'" in `row'
     replace corrname2 = "`a'" in `row'
     replace y = `y' in `row'
     replace x = `i' in `row'
     replace corr = round(C[`i',`y'], .01) in `row'
     replace abs_corr = abs(C[`i',`y']) in `row'
     local ++row
    }
    local rowname = `rowname' + 1
    local y = `y' + 1
   }
  drop if missing(corrname1)
  replace abs_corr = 0.1 if abs_corr < 0.1 & abs_corr > 0.04
  colorpalette HCL pinkgreen, n(10) nograph intensity(0.65)
  *colorpalette CET CBD1, n(10) nograph //Color Blind Friendly option
  generate colorname = ""
  local col = 1
  forvalues colrange = -1(0.2)0.8 {
   replace colorname = "`r(p`col')'" if corr >= `colrange' & corr < `=`colrange' + 0.2'
   replace colorname = "`r(p10)'" if corr == 1
   local ++col
  } 
  * Plotting
  * Save the plotting code in a local, cleared first so earlier runs do not leak in
  local slist
  forvalues i = 1/`=_N' {
   local slist "`slist' (scatteri `=y[`i']' `=x[`i']' "`: display %3.2f corr[`i']'", mlabposition(0) msize(`=abs_corr[`i']*15') mcolor("`=colorname[`i']'"))"
  }
  * Gather Y axis labels (local cleared first)
  local ylab
  labmask y, val(corrname1)
  labmask x, val(corrname2)
  levelsof y, local(yl)
  foreach l of local yl {
   local ylab "`ylab' `l'  `" "`:lab (y) `l''" "'" 
  } 
  * Gather X axis labels (local cleared first)
  local xlab
  levelsof x, local(xl)
  foreach l of local xl {
   local xlab "`xlab' `l'  `" "`:lab (x) `l''" "'" 
  }  
  * Plot everything saved in the locals above
  twoway `slist', title("Correlogram of Auto Dataset Cars", size(3) pos(11)) ///
    note("Dataset Used: Sysuse Auto", size(2) margin(t=5)) ///
    xlabel(`xlab', labsize(2.5)) ylabel(`ylab', labsize(2.5)) ///
    xscale(range(1.75 )) yscale(range(0.75 )) ///
    ytitle("") xtitle("") ///
    legend(off) ///
    aspect(1) ///
    scheme(white_tableau)
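
The loop above fills a pre-allocated dataset row by row with `replace ... in`. As an alternative sketch (not the article's method), Stata's `postfile`/`post` can write only the lower-triangle pairs, which avoids allocating countn² rows and dropping the unused ones; for the 7 variables used here that is 7 × 6 / 2 = 21 pairs:

```stata
sysuse auto, clear
local vars price mpg trunk weight length turn foreign
local k : word count `vars'
quietly correlate `vars'
matrix C = r(C)
* Write one row per pair below the diagonal: k*(k-1)/2 rows in total
tempname h
tempfile pairs
postfile `h' str12 var1 str12 var2 double corr using `pairs', replace
forvalues j = 1/`=`k' - 1' {
    forvalues i = `=`j' + 1'/`k' {
        post `h' ("`: word `j' of `vars''") ("`: word `i' of `vars''") (C[`i', `j'])
    }
}
postclose `h'
use `pairs', clear
list, clean noobs
```

The resulting dataset has the same var1/var2/corr layout as `corrname1`/`corrname2`/`corr` above, so the plotting part can be reused after generating the x/y positions.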


**Diverging Bars
  sysuse auto, clear
  * Standardizing variable
  egen double mpg_z = std(mpg)
  * Generating indicator of below and above
  generate above = (mpg_z >= 0)
  * Sorting the mpg_z and assigning rank
  sort mpg_z, stable
  generate rank_des = _n * 2
  * Assigning label
  labmask rank_des, value(make)
  colorpalette tableau, nograph intensity(0.8)
  twoway  (bar mpg_z rank_des if above == 1, horizontal lwidth(0) barwidth(1.5) bcolor("`r(p3)'")) ///
    (bar mpg_z rank_des if above == 0, horizontal lwidth(0) barwidth(1.5) bcolor("`r(p4)'")), ///
    ytitle("") xtitle("") ///
    ylabel(2(2)148, valuelabel labsize(1.25) nogrid) xlabel(-4(1)4, nogrid) ///
    xscale(range(-4 4)) ///
    legend(off) ///
    title("{bf}Diverging Bars (Normalized MPG)", size(2.75) pos(11)) ///
    scheme(white_tableau)
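
`egen ... std()` rescales mpg to a z-score, so the sign of each bar says whether a car is above or below the average mpg. A minimal sketch confirming that the standardization is just (x - mean) / sd with summarize's sample standard deviation:

```stata
sysuse auto, clear
egen double mpg_z = std(mpg)
* Same thing by hand: (x - mean) / sd
quietly summarize mpg
generate double mpg_z_hand = (mpg - r(mean)) / r(sd)
* A non-negative z means at or above average mpg; this drives the two bar colors
generate byte above = (mpg_z >= 0)
```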


**Diverging Lollipop Graph
  sysuse auto, clear
  keep in 1/20
  * Standardizing variable
  egen double mpg_z = std(mpg)
  * Sorting the mpg_z and assigning rank
  sort mpg_z, stable
  generate rank_des = _n   
  * Assigning label
  labmask rank_des, value(make)
  * Generate 0 point
  generate zero = 0
  * Labels
  tostring mpg_z, gen(mpg_z_lab) force format(%3.2f)
  compress
  * Plot 
  twoway  (rspike zero mpg_z rank_des, horizontal) ///
    (scatter rank_des mpg_z, msize(5.3) mlabel(mpg_z_lab) mlabsize(1.5) mlabposition(0)), ///
    xlabel(-2.5(1)-0.5 0 0.5(1)2.5, labsize(2)) ylabel(1(1)20, valuelabel labsize(2)) ///
    legend(off) ///
    ytitle("Car Name") ///
    title("{bf}Diverging Lollipop Chart (Normalized MPG)", size(2.75) pos(11)) ///
    scheme(white_tableau)


**Diverging Dot Plot
sysuse auto, clear
  * Keeping first 20 observations as example
  keep in 1/20
  * Standardizing variable
  egen double mpg_z = std(mpg)
  * Sorting the mpg_z and assigning rank
  sort mpg_z, stable
  generate rank_des = _n   
  * Assigning label onto the sorted serial number
  labmask rank_des, value(make)
  * Generating indicator of below and above
  generate above = (mpg_z >= 0)
  * Labels
  tostring mpg_z, gen(mpg_z_lab) force format(%3.2f)
  compress
  * Plot 
  colorpalette tableau, nograph intensity(0.8)
  twoway  (scatter rank_des mpg_z if above == 0, mcolor("`r(p4)'") msize(5) mlabel(mpg_z_lab) mlabsize(1.3) mlabposition(0)) ///
    (scatter rank_des mpg_z if above == 1, mcolor("`r(p3)'") msize(5) mlabel(mpg_z_lab) mlabsize(1.3) mlabposition(0)) ///
    , ///
    xlabel(-2.5(1)-0.5 0 0.5(1)2.5, labsize(2)) ylabel(1(1)20, valuelabel labsize(2)) ///
    legend(off) ///
    ytitle("Car Name") ///
    title("{bf}Diverging Dot Plot (Normalized MPG)", size(2.75) pos(11)) ///
    scheme(white_tableau)

***Diverging Bars — Correlation Plot
  sysuse auto, clear 
  * Only change names of variable in local var_corr. 
  * The code will hopefully do the rest of the work without any hitch
  local var_corr price mpg trunk weight length turn foreign
  local countn : word count `var_corr'
  * Use correlation command
  * https://journals.sagepub.com/doi/pdf/10.1177/1536867X0800800307
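  * SE = (upper limit - lower limit) / 3.92, since 3.92 = 2 x 1.96 for a 95% normal interval;
  * for a correlation this holds on Fisher's z = atanh(r) scale (where corrci builds the interval), not on r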
  quietly corrci `var_corr'
  matrix C = r(corr)
  local rnames : rownames C
  * Now to generate a dataset from the Correlation Matrix
  clear
  * This will not have the diagonal of matrix (correlation of 1) 
  local tot_rows : display `countn' * `countn'
  set obs `tot_rows'
  generate corrname1 = ""
  generate corrname2 = ""
  generate byte y = .
  generate byte x = .
  generate double corr = .
  generate double abs_corr = .
  local row = 1
  local y = 1
  local rowname = 2
  foreach name of local var_corr {
   forvalues i = `rowname'/`countn' { 
    local a : word `i' of `var_corr'
    replace corrname1 = "`name'" in `row'
    replace corrname2 = "`a'" in `row'
    replace y = `y' in `row'
    replace x = `i' in `row'
    replace corr = C[`i',`y'] in `row'
    replace abs_corr = abs(C[`i',`y']) in `row'
    local ++row
   }
   local rowname = `rowname' + 1
   local y = `y' + 1
  }
  drop if missing(corrname1)
  * Generating a variable that will contain color codes
  * colorpalette HCL pinkgreen, n(20) nograph intensity(0.75) //Not Color Blind Friendly
  colorpalette CET CBD1, n(20) nograph //Color Blind Friendly option
  generate colorname = ""
  local col = 1
  forvalues colrange = -1(0.1)0.9 {
   replace colorname = "`r(p`col')'" if corr >= `colrange' & corr < `=`colrange' + 0.1'
   replace colorname = "`r(p20)'" if corr == 1
   local ++col
  } 
  * Grouped correlation of variables
  generate group_corr = corrname1 + " - " + corrname2
  compress
  * Sort the plot
  sort corr, stable
  generate rank_corr = _n
  labmask rank_corr, values(group_corr)
  * Plotting
  * Run the remaining commands in one go: locals do not survive between separately executed selections
  * Save the plotting code in a local, cleared first so earlier sections do not leak in
  local barlist
  forvalues i = 1/`=_N' {
   local barlist "`barlist' (scatteri `=rank_corr[`i']' 0 `=rank_corr[`i']' `=corr[`i']' , recast(line) lcolor("`=colorname[`i']'") lwidth(*6))"
  }
  * Save labels for the Y axis in a local (cleared first)
  local ylab
  levelsof rank_corr, local(yl)
  foreach l of local yl {
   local ylab "`ylab' `l'  `" "`:lab (rank_corr) `l''" "'" 
  } 
  twoway `barlist', ///
    legend(off) scheme(white_tableau) ylabel(`ylab', labsize(2.5)) ///
    xlab(, labsize(2.5)) ///
    ytitle("Pairs") xtitle("Correlation Coeff.") ///
    title("{bf}Correlation Coefficient (Diverging Bar Plot)", size(2.75) pos(11))
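
The comment above gives SE = (upper - lower) / 3.92. Since 3.92 = 2 × 1.96, this recovers the standard error of a symmetric 95% normal interval, and for a correlation that symmetry holds on Fisher's z = atanh(r) scale (the scale of the r(z) matrix that corrci returns), not on r itself. A minimal sketch of the interval, assuming the usual large-sample standard error 1/√(n-3); corrci's own defaults may differ in detail:

```stata
sysuse auto, clear
quietly correlate price mpg
local n = r(N)
local rho = r(rho)
* Fisher's z transform and its large-sample standard error
local z = atanh(`rho')
local se_z = 1 / sqrt(`n' - 3)
local crit = invnormal(0.975)
* 95% interval on the z scale, mapped back to the correlation scale with tanh()
local lb = tanh(`z' - `crit' * `se_z')
local ub = tanh(`z' + `crit' * `se_z')
* Back out the SE from the interval: only valid after returning to the z scale
local se_back = (atanh(`ub') - atanh(`lb')) / (2 * `crit')
display "r = " %6.3f `rho' "  95% CI: [" %6.3f `lb' ", " %6.3f `ub' "]"
display "SE from CI (z scale): " %8.5f `se_back' "   1/sqrt(n-3): " %8.5f `se_z'
```

Applying the 3.92 rule directly to the r-scale limits would give a slightly different, asymmetric-interval-based number, which is why the transform matters.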

**Diverging Bars — Correlation Plot with Confidence Intervals
sysuse auto, clear 
  * Only change names of variable in local var_corr. 
  * The code will hopefully do the rest of the work without any hitch
  local var_corr price mpg trunk weight length turn foreign
  local countn : word count `var_corr'
  * Use correlation command
  * https://journals.sagepub.com/doi/pdf/10.1177/1536867X0800800307
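  * SE = (upper limit - lower limit) / 3.92, since 3.92 = 2 x 1.96 for a 95% normal interval;
  * for a correlation this holds on Fisher's z = atanh(r) scale (where corrci builds the interval), not on r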
  quietly corrci `var_corr'
  matrix C = r(corr)
  local rnames : rownames C
  matrix LB = r(lb)
  matrix UB = r(ub)
  matrix Z = r(z) //matrix of z = atanh r
  egen miss = rowmiss(`var_corr')
  count if miss == 0
  local N = r(N)
  * Now to generate a dataset from the Correlation Matrix
  clear
  * This will not have the diagonal of matrix (correlation of 1) 
  local tot_rows : display `countn' * `countn'
  set obs `tot_rows'
  generate corrname1 = ""
  generate corrname2 = ""
  generate byte y = .
  generate byte x = .
  generate double corr = .
  generate double lb = .
  generate double ub = .
  generate double z = .
  generate double abs_corr = .
  local row = 1
  local y = 1
  local rowname = 2
  foreach name of local var_corr {
   forvalues i = `rowname'/`countn' { 
    local a : word `i' of `var_corr'
    replace corrname1 = "`name'" in `row'
    replace corrname2 = "`a'" in `row'
    replace y = `y' in `row'
    replace x = `i' in `row'
    replace corr = C[`i',`y'] in `row'
    replace lb = LB[`i',`y'] in `row'
    replace ub = UB[`i',`y'] in `row'
    replace z = Z[`i',`y'] in `row'
    replace abs_corr = abs(C[`i',`y']) in `row'
    local ++row
   }
   local rowname = `rowname' + 1
   local y = `y' + 1
  }
  drop if missing(corrname1)
  * Generating total non missing count and P-Values
  generate N = `N'
  generate double p = min(2 * ttail(N - 2, abs_corr * sqrt(N - 2) / sqrt(1 - abs_corr^2)), 1)
  * Generate stars
  generate stars = "*" if p <= 0.1 & p > 0.05
  replace stars = "**" if p <= 0.05 & p > 0.01
  replace stars = "***" if p <= 0.01
  * Generating a variable that will contain color codes
  * colorpalette HCL pinkgreen, n(20) nograph intensity(0.75) //Not Color Blind Friendly
  colorpalette CET CBD1, n(20) nograph //Color Blind Friendly option
  generate colorname = ""
  local col = 1
  forvalues colrange = -1(0.1)0.9 {
   replace colorname = "`r(p`col')'" if corr >= `colrange' & corr < `=`colrange' + 0.1'
   replace colorname = "`r(p20)'" if corr == 1
   local ++col
  } 
  * Grouped correlation of variables
  generate group_corr = corrname1 + " - " + corrname2
  compress
  * Sort the plot
  sort corr, stable
  generate rank_corr = _n
  labmask rank_corr, values(group_corr)
  * Plotting
  * Run the remaining commands in one go: locals do not survive between separately executed selections
  * Save the plotting code in a local, cleared first so earlier sections do not leak in
  local barlist
  forvalues i = 1/`=_N' {
   local barlist "`barlist' (scatteri `=rank_corr[`i']' 0 `=rank_corr[`i']' `=corr[`i']' , recast(line) lcolor("`=colorname[`i']'") lwidth(*6))"
  }
  * Save labels for the Y axis in a local (cleared first)
  local ylab
  levelsof rank_corr, local(yl)
  foreach l of local yl {
   local ylab "`ylab' `l'  `" "`:lab (rank_corr) `l''" "'" 
  } 
  twoway `barlist' ///
    (rspike lb ub rank_corr, horizontal lcolor(white) lwidth(*2)) ///
    (rspike lb ub rank_corr, horizontal lcolor(black*.5)), ///
    legend(off) scheme(white_tableau) ylabel(`ylab', labsize(2.5)) ///
    xlab(, labsize(2.5)) ///
    ytitle("Pairs") xtitle("Correlation Coeff.") ///
    title("{bf}Correlation Coefficient with Confidence Interval (Diverging Bar Plot)", size(2.75) pos(11))
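
The p-value line above uses the standard t test for a Pearson correlation, t = r√(n-2) / √(1-r²) with n-2 degrees of freedom. One way to check the formula is to compare it with the slope t statistic of the matching bivariate regression, which is algebraically the same number. A sketch on price and mpg, an arbitrary pair chosen for illustration:

```stata
sysuse auto, clear
quietly correlate price mpg
local n = r(N)
local rho = r(rho)
* t statistic for H0: rho = 0, same formula as the p-value line above
* (rho) is parenthesized: Stata applies ^ before unary minus
local t_corr = (`rho') * sqrt(`n' - 2) / sqrt(1 - (`rho')^2)
local p_corr = min(2 * ttail(`n' - 2, abs(`t_corr')), 1)
* In a bivariate OLS regression the slope t statistic is identical
quietly regress price mpg
local t_reg = _b[mpg] / _se[mpg]
display "t from r: " %8.4f `t_corr' "   t from regress: " %8.4f `t_reg'
display "two-sided p-value: " %8.6f `p_corr'
```

The article's code works with `abs_corr`, so its t is always positive, but the two-sided p-value is the same.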

**Area Chart
  import delimited "https://github.com/tidyverse/ggplot2/raw/main/data-raw/economics.csv", clear
  * Month-over-month change as a proportion (the data are monthly, so _n-1 is the previous month)
  generate yoy = (psavert[_n] - psavert[_n-1]) / psavert[_n-1]
  generate monthyear = ym(year(date(date, "YMD")), month(date(date, "YMD")))
  format monthyear %tm
  twoway  (area yoy monthyear if monthyear <= tm(1975m12), lwidth(0)), ///
    xla(84(12)185, format(%tmCY)) ///
    plotregion(lstyle(solid) lwidth(.1)) ///
    xtitle("") ///
    ytitle("% Returns for Personal savings", size(2.75)) ///
    xscale(noline) yscale(noline) ///
    title("{bf}Area Chart", pos(11) size(3)) ///
    subtitle("% Returns for Personal Savings", pos(11) size(2.5)) ///
    scheme(white_tableau) 
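
Because economics.csv is monthly, `psavert[_n-1]` is the previous month, so the plotted series is a month-over-month change rather than a year-over-year one. A minimal sketch of both, using Stata's time-series operators after `tsset` on a hypothetical synthetic monthly series (so it runs offline; the values are made up and repeat every 12 months):

```stata
clear
set obs 36
* Hypothetical monthly series that repeats every 12 months (illustration only)
generate monthyear = tm(1967m7) + _n - 1
format monthyear %tm
tsset monthyear
generate psavert = 10 + mod(_n, 12) / 10
* Month-over-month proportional change: (x_t - x_{t-1}) / x_{t-1}
generate mom = D.psavert / L.psavert
* Year-over-year proportional change: (x_t - x_{t-12}) / x_{t-12}
generate yoy = S12.psavert / L12.psavert
* Multiply by 100 if the axis is meant to show percentages
generate yoy_pct = 100 * yoy
```

Since the toy series repeats yearly, its year-over-year change is exactly zero after the first 12 months, while the month-over-month change is not.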


**Ordered Bar Charts
import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
  * Acquiring average mileage (city) by manufacturer
  collapse (mean) cty, by(manufacturer)
  graph bar  (asis) cty, over(manufacturer, sort(1) label(labsize(1.75))) scheme(white_w3d) ///
     title("{bf}Ordered Bar Chart", pos(11) size(2.75)) ///
     ytitle("City" "Mileage", orient(horizontal) size(2)) ///
     ylabel(, labsize(2)) ///
     subtitle("Make Vs. Avg. Mileage", pos(11) size(2.5))


 **Lollipop Charts (Vertical)
import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
  * Acquiring average mileage (city) by manufacturer
  collapse (mean) cty, by(manufacturer)
  sort cty, stable 
  generate order = _n
  labmask order, values(manufacturer)
  * Plotting 
  quietly summarize order
  twoway  dropline cty order, ///
    msize(2) ///
    yscale(range(0 25)) ///
    ylabel(0(5)25) ///
    ytitle("City" "Mileage", orient(horizontal)) ///    
    xscale(range(0.25)) ///
    xlabel(`r(min)'(1)`r(max)', valuelabel labsize(1.75)) ///
    xtitle("") ///
    title("{bf}Lollipop Chart", pos(11) size(2.75)) ///
    subtitle("Make Vs. Avg. Mileage", pos(11) size(2.5)) ///
    scheme(white_w3d)
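
The lollipop and dot plots rely on one trick: sort the collapsed data, number the rows, and attach each row's name as a value label so the axis shows names in sorted order. `labmask` (from labutil) does the labeling; a hand-rolled equivalent is sketched below on the auto data, with a hypothetical `brand` variable taken from the first word of `make`:

```stata
sysuse auto, clear
* Hypothetical grouping variable: the first word of the car name
generate brand = word(make, 1)
collapse (mean) mpg, by(brand)
* Rank groups by average mpg and label each rank with its group name
sort mpg, stable
generate order = _n
capture label drop order_lbl
forvalues i = 1/`=_N' {
    label define order_lbl `i' "`=brand[`i']'", add
}
label values order order_lbl
list order brand mpg in 1/5, noobs
```

With `order` labeled, `xlabel(..., valuelabel)` prints the names in ascending order of the plotted value.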


**Dot Plot (Horizontal)
import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
  * Acquiring average mileage (city) by manufacturer
  collapse (mean) cty, by(manufacturer)
  sort cty, stable 
  generate order = _n
  labmask order, values(manufacturer)
  * Plotting 
  quietly summarize cty
  local xmin = `r(min)'
  quietly summarize order
  twoway  dot cty order, horizontal ///
    msize(2) ///
    yscale(range(`r(min)' `r(max)')) ///
    ylabel(`r(min)'(1)`r(max)', valuelabel labsize(1.75)) ///
    ytitle("Make", orient(horizontal) size(2)) ///    
    xscale(range(`xmin')) ///
    xlabel(10(5)25, nogrid) ///
    xtitle("Mileage", size(2)) ///
    title("{bf}Dot Plot", pos(11) size(2.75)) ///
    subtitle("Make Vs. Avg. Mileage", pos(11) size(2.5)) ///
    scheme(white_w3d)


**Slope Chart
import delimited "https://raw.githubusercontent.com/selva86/datasets/master/gdppercap.csv", varnames(1) clear
 * The headers 1952 and 1957 are not valid Stata names, so import delimited stores these columns as v2 and v3; rename them
 rename (v2 v3) (y1952 y1957)
 * Flag continents whose mean GDP per capita fell from 1952 to 1957 (drawn in a different color)
 generate negative = (y1957 < y1952)
 generate lab1952 = continent + ", " + string(round(y1952))
 generate lab1957 = continent + ", " + string(round(y1957))
 generate continent1 = 1
 generate continent2 = 2
 colorpalette w3, nograph
 twoway  (pcspike y1952 continent1 y1957 continent2 if negative == 0, legend(off) lcolor("`r(p11)'")) ///
   (pcspike y1952 continent1 y1957 continent2 if negative == 1, legend(off) lcolor("`r(p1)'")) ///
   (scatter y1952 continent1, ms(i) mlabposition(9) mlabel(lab1952)) ///
   (scatter y1957 continent2, ms(i) mlabposition(3) mlabel(lab1957)) ///
   (scatteri 12700 1 "{bf}Year 1952", ms(i) mlabpos(9)) ///
   (scatteri 12700 2 "{bf}Year 1957", ms(i) mlabpos(3)) ///   
   , ///
   ylabel(0(4000)12000, labsize(2) nogrid) ///
   ytitle("Avg." "GDP/Capita", size(2) orient(horizontal)) ///
   yscale(range(0 13000)) ///
   xlabel(1(1)2) ///
   xscale(off) ///
   xtitle("") ///
   xscale(range(0.2 2.8)) ///
   aspect(1.3) ///
   title("{bf}Slope Chart", pos(11) size(2.75)) ///
   subtitle("Mean GDP per capita: 1952 Vs. 1957" " ", pos(11) size(2)) ///
   graphregion(margin(r=25)) ///
   scheme(white_w3d)


***Dumbbell Plot
import delimited "https://raw.githubusercontent.com/selva86/datasets/master/health.csv", varnames(1) clear 
 * Preparing Y-axis
 generate srno = _n * 3
 labmask srno, values(area)
 foreach var of varlist pct* {
  replace `var' = `var' * 100
 }
 colorpalette w3, nograph
 twoway  (rspike pct_2013 pct_2014 srno, horizontal lcolor("`r(p6)'*0.4")) ///
   (scatter srno pct_2013, mcolor("`r(p6)'*0.4")) ///
   (scatter srno pct_2014, mcolor("`r(p6)'")) ///
   , ///
   ylabel(3(3)78, valuelabel angle(horizontal) labsize(2)) ///
   legend(order(3 "2014" 2 "2013") pos(11) row(1) size(2)) ///
   ytitle("") ///
   title("{bf}Dumbbell Plot", pos(11) size(2.75)) ///
   subtitle("% Change in Health Indicators by Area: 2014 vs. 2013", pos(11) size(2)) ///
   scheme(white_tableau)

**Histogram on Continuous Variable (Over Category)
    clear frames
import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
 replace class = subinstr(class, "2", "two", .)
 frame copy default original, replace
 levelsof class, local(cls) 
 foreach l of local cls {
  frame put displ class, into(`l')
  frame change `l'
  twoway__histogram_gen displ if class == "`l'", start(1) width(0.1) frequency generate(h x, replace)
  rename (h) (h_`l')
  keep x h_`l'
  drop if missing(x)
  save `l', replace
  frame change original
 }
 frame change original
 twoway__histogram_gen displ, start(1) width(0.1) frequency generate(h x, replace)
 drop h
 generate tag = 1 if missing(x)
 replace x = _n if missing(x)
 foreach l of local cls {
  merge 1:1 x using `l', nogen
  * erase `l'.dta   // uncomment to delete the per-class .dta file saved to the working directory, once merged
 }
 replace x = . if tag == 1
 drop tag 
 keep x h_*
 drop if missing(x)
 reshape long h_, i(x) j(type) string
 bysort x (type) : gen cumul_sum_ = sum(h_) if !missing(h_)
 drop h_*
 reshape wide cumul_sum_, i(x) j(type) string
 * Plotting (clear the local that collects the bar commands)
 local bar
 ds cumul*
 local wcount: word count `r(varlist)'
 forvalues i = `wcount'(-1)1 {
  ds cumul*
  local a : word `i' of `r(varlist)'
  display "`a'"
  colorpalette tableau, nograph n(`i')
  local bar "`bar' (bar `a' x, fcolor("`r(p`i')'") barwidth(0.1) lwidth(0.1) lcolor(gs4))"
 } 
 twoway  `bar', xlabel(1(1)7) scheme(white_tableau) ///
   legend(order(1 "2 Seater" 2 "SUV" 3 "Subcompact" 4 "Pickup" 5 "Minivan" 6 "Midsize" 7 "Compact") rowgap(0) size(2)) ///
   xlabel(, labsize(2)) ylabel(, labsize(2)) ///
   ytitle("Count", size(2)) xtitle("Displacement", size(2)) ///
   title("{bf}Histogram with Auto Binning", pos(11) size(2.75)) ///
   subtitle("Engine Displacement across Vehicle Classes", pos(11) size(2)) 
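
The reshape and `sum()` steps turn per-class counts into running totals within each bin. Drawing the largest running total first and each smaller one on top makes ordinary overlaid bars look stacked, which is why the loop runs from `wcount` down to 1. A toy sketch with hypothetical counts for two bins and three classes:

```stata
clear
input float x str1 type h
1.0 "a" 2
1.0 "b" 3
1.0 "c" 1
1.1 "a" 4
1.1 "b" 0
1.1 "c" 5
end
* Running total within each bin, in a fixed (alphabetical) class order
bysort x (type): generate cum = sum(h)
* The last class's running total equals the bin's total height
bysort x: egen total = total(h)
list, sepby(x) noobs
```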

 
   ***Histogram on Categorical Variable
import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
 * Plotting (clear the local that collects the bar options)
 local barlwidth
 forvalues i = 1/7 {
  local barlwidth "`barlwidth' bar(`i', lwidth(0)) "
 }
 graph bar (count),  over(class) over(manufacturer, label(alternate labsize(2))) asyvars stack ///
      scheme(white_w3d) ///
      ylabel(, nogrid) ///
      legend(order(7 "SUV" 6 "Subcompact" 5 "Pickup" 4 "Minivan" 3 "Midsize" 2 "Compact" 1 "2 Seater") rowgap(0.25) size(2)) ///
      lintensity(*0) ///
      `barlwidth' ///
      title("{bf}Histogram on Categorical Variable", pos(11) size(2.75)) ///
      subtitle("Manufacturer across Vehicle Classes", pos(11) size(2)) 
   
   
   
 **Density Plot (By Category)
import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
 local kden
 levelsof cyl, local(cylinders)
 foreach cylinder of local cylinders {
  quietly summarize cty
  local kden "`kden' (kdensity cty if cyl == `cylinder', range(`r(min)' `r(max)') recast(area) fcolor(%70) lwidth(*0.25))"
 }
 twoway `kden',  scheme(white_w3d) ///
     legend(subtitle("Cylinders", size(2)) label(1 "4") label(2 "5") label(3 "6") label(4 "8") rowgap(0.25) size(2)) ///
     title("{bf}Density Plot", pos(11) size(2.75)) ///
     ytitle("Density", size(2) orient(horizontal)) ///
     ylabel(, nogrid labsize(2)) ///
     xtitle("City Mileage", size(2)) ///
     xlabel(, nogrid labsize(2)) ///
     subtitle("City Mileage over number of cylinders", pos(11) size(2)) 
   

 
   **The Box Plot
import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
 graph box cty,  over(class) ///
     ytitle("City Mileage", size(2.25)) ///
     ylabel(, nogrid) ///
     title("{bf}Box Plot", pos(11) size(2.75)) ///
     b1title(" " "Class of vehicle", size(2.5)) ///
     subtitle("City Mileage grouped by class of vehicle", pos(11) size(2)) ///
     scheme(white_w3d)
   

 **Tufte Style Box Plot (Over Category)
import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
 * Tufte styled box plot
 graph box cty,  over(class) ///
     box(1, color(white%0)) ///
     medtype(marker) ///
     medmarker(mcolor(black) mlwidth(0)) ///
     cwhiskers ///
     alsize(0) ///
     intensity(0) ///
     lintensity(1) ///
     lines(lpattern(solid) lwidth(medium)) ///
     ylabel(, nogrid) ///
     yscale(noline) ///
     title("{bf}Box Plot", pos(11) size(2.75)) ///
     subtitle("City Mileage grouped by class of vehicle", pos(11) size(2)) ///
     scheme(white_w3d)
   


   **Minimalistic Box Plot (Over Category & By Type)
 import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
 levelsof cyl, local(cylinders)
 local catcount: word count `cylinders'
 local boxopt
 forvalues i = 1/`catcount' {
  colorpalette tableau, nograph n(`catcount')
  local boxopt "`boxopt' box(`i', color("`r(p`i')'")) "
 }
 display `"`boxopt'"'
 graph box cty,  over(cyl) ///
     by(class, ///
      row(1) legend(pos(3)) imargin(l=1.5 r=1.5) style(compact) ///
      title("{bf}Box Plot", pos(11) size(2.75)) ///
      subtitle("City Mileage over number of cylinders" " ", pos(11) size(2)) ///
      note(, size(2)) ///
     ) ///
     asyvars ///
     `boxopt' ///
     boxgap(50) ///
     medtype(marker) ///
     medmarker(mcolor() mlwidth(0) msize(1)) ///
     cwhiskers ///
     alsize(0) ///
     intensity(0) ///
     lintensity(1) ///
     lines(lpattern(solid) lwidth(medium)) ///
     ylabel(, nogrid) ///
     yscale(noline) ///
     ytitle("City Mileage", size(2.25)) ///
     subtitle(, size(2.5)) ///          //size of group headers
     legend(size(2.25) rowgap(0.25) subtitle("Cylinders", size(2.25))) ///
     scheme(white_tableau)


**Violin Plot
import delimited "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/mpg.csv", clear
 * Original Violin Plot
 * This plot contains box and distribution
 violinplot cty,  over(class) vertical scheme(white_w3d) ///
      ytitle("City Mileage", size(2.25)) ///
      ylabel(, nogrid) ///
      title("{bf}Violin Plot", pos(11) size(2.75)) ///
      b1title(" " "Class of vehicle", size(2.5)) ///
      subtitle("City Mileage grouped by class of vehicle", pos(11) size(2))
 * To make a version without box we can use:
  violinplot cty,  over(class) vertical scheme(white_w3d) nobox nomedian noline nowhiskers ///
      ytitle("City Mileage", size(2.25)) ///
      ylabel(, nogrid) ///
      title("{bf}Violin Plot (Density Only)", pos(11) size(2.75)) ///
      b1title(" " "Class of vehicle", size(2.5)) ///
      subtitle("City Mileage grouped by class of vehicle", pos(11) size(2))
   
   
   
 ***Population Pyramid Plot
import delimited "https://raw.githubusercontent.com/selva86/datasets/master/email_campaign_funnel.csv", clear 
 format users %20.0g
 replace users = round(users)
 replace users = -(users) if users < 0 & gender == "Female"
 replace users = -(users) if users > 0 & gender == "Male"
 encode stage, gen(stage_n)
 * Clear xlab first: the correlogram section also fills a local with this name
 local xlab
 forvalues i = -15000000(5000000)15000000 {
  if `i' != 0 {
   local xlab "`xlab' `i' `"`=abs(`i')/1000000'm"'"  //Use of compound quotes to work with labels with absolute (abs) values
  }
  else {
   local xlab "`xlab' 0 `"0"'"
  }
 }
 * display `"`xlab'"'
 twoway  (bar users stage_n if gender == "Female", horizontal lwidth(0) barwidth(0.8)) ///
   (bar users stage_n if gender == "Male", horizontal lwidth(0) barwidth(0.8)) ///
   , ///
   yscale(noline) ///
   xlabel(`xlab', nogrid) ///
   ylabel(1(1)18, nogrid noticks valuelabel labsize(2)) ///
   ytitle("Stage") ///
   legend(order(1 "Female" 2 "Male") size(2)) ///
   title("{bf}Email Campaign Funnel", size(2.75)) ///
   scheme(white_tableau)
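
The pyramid works by negating one group so its bars point left, then relabeling the axis so both sides read as positive numbers. The loop above builds those value/label pairs with compound quotes; a stripped-down sketch on a hypothetical -10 to 10 axis (without the article's "m" suffix, so zero needs no special case):

```stata
* Build value/label pairs for a mirrored axis: position -10 is labeled "10", etc.
local xlab
forvalues i = -10(5)10 {
    local xlab `"`xlab' `i' "`=abs(`i')'""'
}
display `"`xlab'"'
```

The resulting local is passed to `xlabel()` in the same way as in the funnel plot above.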


Source: https://medium.com/the-stata-gallery/top-25-stata-visualizations-with-full-code-668b5df114b6
