Quick Stata Guide
by Liz Foster

Table of Contents

Part 1: Top Ten Stata Commands
    describe
    generate
    regress
    scatter
    sort
    summarize
    table
    tabulate
    test
    ttest
Part 2: Prefixes and Notes
    by var:
    capture
    use of the *
    explanation of data set
Part 3: More Examples
    interaction terms
    regression line
Appendix: Key Terms and Concepts


Part 1: Top Ten Stata Commands

describe

This command gives you information about the variables in your dataset (how big they are, what they represent, their units, and what different codes stand for), if this information is available.

Example

    . describe

    Contains data from example.dta
      obs:           281                          Child Support Awards Santa Clara County California
     vars:             5                          18 Nov 2004 15:52
     size:         4,496 (99.6% of memory free)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    award           int    %8.0g                  Child support award
    earndad         float  %9.0g                  Father's monthly earnings
    earnmom         float  %9.0g                  Mother's monthly earnings
    nkids           byte   %8.0g                  Number of kids
    petmom          byte   %8.0g       yesno      Was it the mother who petitioned for divorce?
    -------------------------------------------------------------------------------
    Sorted by:

Options

You can select only certain variables by listing them, for example:

    describe earnmom earndad

generate

This command generates new variables. In particular, it can generate dummy variables and interaction terms. It can be abbreviated gen.

Example

    . gen richmom = (earnmom >= 2500)
    . table richmom, c(freq min earnmom max earnmom mean earnmom)

    ----------------------------------------------------------------------
      richmom |      Freq.  min(earnmom)  max(earnmom)  mean(earnmom)
    ----------+-----------------------------------------------------------
            0 |        239             0       2491.67       1205.992
            1 |         42          2500          5250       2950.984
    ----------------------------------------------------------------------

This creates a new binary variable equal to 1 if the mother earns at least $2500 a month, and 0 otherwise. Or, we could generate a variable that indicates whether the mother earned more than the father:

    gen richermom = (earnmom > earndad)

If you want to see whether child support is a quadratic function of the number of children, rather than a linear one, you need to add a squared term (nkids squared).

    . gen nkidssq = nkids * nkids
    . reg award nkids nkidssq, r

    Regression with robust standard errors                 Number of obs =     281
                                                           F(  2,   278) =   31.92
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.1921
                                                           Root MSE      =   218.1

    ------------------------------------------------------------------------------
                 |               Robust
           award |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           nkids |    446.319   111.7826     3.99   0.000     226.2711    666.3669
         nkidssq |  -80.59451   32.09875    -2.51   0.013     -143.782   -17.40702
           _cons |  -106.9748   85.40495    -1.25   0.211    -275.0973    61.14779
    ------------------------------------------------------------------------------

The coefficient on the squared term is statistically significant (p = 0.013), so the quadratic form fits the data better than the linear one.

To see whether the effect of the mother being the petitioner is different for mothers who earn more than their husbands, we need an interaction term richermom * petmom.

    . gen richermom_X_petmom = richermom * petmom
    . reg award richermom petmom richermom_X_petmom, r

    Regression with robust standard errors                 Number of obs =     281
                                                           F(  3,   277) =   17.06
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.1970
                                                           Root MSE      =  217.83

    ------------------------------------------------------------------------------
                 |               Robust
           award |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       richermom |  -257.8725   66.34101    -3.89   0.000    -388.4691   -127.2759
          petmom |  -150.4746   64.45085    -2.33   0.020    -277.3503    -23.5989
    richermom_~m |   87.16457   74.13835     1.18   0.241    -58.78159    233.1107
           _cons |   596.4483   57.13669    10.44   0.000      483.971    708.9256
    ------------------------------------------------------------------------------

The interaction term is not statistically significant (p = 0.241).

Options

gen can also be used as an option of the command tab to generate a set of indicator variables, one for each category of a categorical variable. For example:

    . tab nkids, gen(nkids_)

      Number of |
           kids |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |        143       50.89       50.89
              2 |        117       41.64       92.53
              3 |         20        7.12       99.64
              4 |          1        0.36      100.00
    ------------+-----------------------------------
          Total |        281      100.00

    . sum nkids_*

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
         nkids_1 |       281    .5088968    .5008128          0          1
         nkids_2 |       281    .4163701    .4938359          0          1
         nkids_3 |       281    .0711744    .2575746          0          1
         nkids_4 |       281    .0035587     .059655          0          1

This creates four new indicator variables. For example, nkids_2 is equal to 1 if the family has two children, and 0 otherwise. We can now regress the child support award on the number of children in the most flexible way possible, without assuming the relationship to be linear or quadratic.

    . reg award nkids_*, r

    Regression with robust standard errors                 Number of obs =     281
                                                           F(  3,   277) =   54.72
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.1954
                                                           Root MSE      =  218.04

    ------------------------------------------------------------------------------
                 |               Robust
           award |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         nkids_1 |   60.06993   13.61372     4.41   0.000     33.27045    86.86941
         nkids_2 |   258.4444   23.15878    11.16   0.000     212.8549     304.034
         nkids_3 |     334.95   74.62313     4.49   0.000     188.0495    481.8505
         nkids_4 |  (dropped)
           _cons |        200          .        .       .            .           .
    ------------------------------------------------------------------------------

Stata drops nkids_4 because the four indicators always sum to one and are therefore perfectly collinear with the constant. For help in interpreting these results, see the section on test.

regress

This command runs an OLS regression. The first variable is the dependent variable (Y); the variables that follow are the independent variables (Xs). It can be abbreviated reg.

Example

    . reg award nkids

          Source |       SS       df       MS              Number of obs =     281
    -------------+------------------------------           F(  1,   279) =   55.87
           Model |  2730761.85     1  2730761.85           Prob > F      =  0.0000
        Residual |  13636695.1   279  48877.0432           R-squared     =  0.1668
    -------------+------------------------------           Adj R-squared =  0.1639
           Total |  16367456.9   280  58455.2033           Root MSE      =  221.08

    ------------------------------------------------------------------------------
           award |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           nkids |   154.1658   20.62522     7.47   0.000      113.565    194.7666
           _cons |   120.0708   34.95281     3.44   0.001     51.26609    188.8755
    ------------------------------------------------------------------------------

Options

The option , r (short for robust) makes Stata allow for heteroskedasticity and calculate standard errors that remain valid when it is present. According to Watson, you should always use it. It changes the format of the output a little:

    . reg award nkids, r

    Regression with robust standard errors                 Number of obs =     281
                                                           F(  1,   279) =   35.12
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.1668
                                                           Root MSE      =  221.08

    ------------------------------------------------------------------------------
                 |               Robust
           award |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           nkids |   154.1658   26.01487     5.93   0.000     102.9554    205.3761
           _cons |   120.0708   36.85967     3.26   0.001     47.51243    192.6292
    ------------------------------------------------------------------------------

So we have the result that award = 120.1 + 154.2 * nkids, with a standard error of 36.9 on the constant and 26.0 on the nkids coefficient. The coefficients are the same as without , r; only the standard errors, t statistics, and confidence intervals change.

With the , r option, Stata no longer automatically displays the adjusted R-squared. To make Stata display it, use:

    display _result(8)

In current versions of Stata the adjusted R-squared is also stored in e(r2_a), so display e(r2_a) can be used instead.

scatter

This command produces basic scatter plots of data.

Example

    . scatter earnmom award

Options

Add more variables. The last variable will always be on the x-axis; the other variables are plotted on the y-axis, represented by different colored dots.

    . scatter earnmom earndad award

There are dozens of other options; see Stata's help (help scatter). Almost any aspect of the scatter plot can be changed.

sort

This command sorts your data by the values of a specific variable. It must be run before you can use the prefix by var:

Example

The command sort nkids produces no output, but if you now run describe it will tell you that your dataset is sorted by nkids.

summarize

If run with no arguments, this command produces a basic summary of every variable in your data set. It may be abbreviated sum.

Example

    . summarize

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
           award |       281    362.0178    241.7751          0       1600
         earndad |       281    1363.912    2514.409          0   28333.33
         earnmom |       281    1466.809    962.5245          0       5250
           nkids |       281    1.569395    .6405823          1          4
          petmom |       281    .7793594    .4154184          0          1

This data set has 5 variables, called award, earndad, earnmom, nkids, and petmom. The first column, Obs, tells you the number of observations you have for each variable; here we have 281 for each. The second column, Mean, tells you the average value of each variable in the dataset. The third column, Std. Dev., tells you the standard deviation of the variable. The fourth and fifth columns, Min and Max, tell you the smallest and largest values of the variable in the dataset.

Options

You can add a list of variables to produce summary statistics for those variables only. For example:

    . summarize earnmom earndad

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
         earnmom |       281    1466.809    962.5245          0       5250
         earndad |       281    1363.912    2514.409          0   28333.33

You can add the option , detail to produce more detailed statistics for one or more variables.

    . summarize earnmom, detail

                               earnmom
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%            0              0       Obs                 281
    25%          900              0       Sum of Wgt.         281

    50%         1500                      Mean           1466.809
                            Largest       Std. Dev.      962.5245
    75%         2100           3700
    90%      2666.67        4083.33       Variance       926453.4
    95%         2950        4158.33       Skewness       .2368765
    99%      4083.33           5250       Kurtosis       3.094632

table

When given a list of variables, this command produces tables showing the frequency of combinations of values of those variables.
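As a rough illustration of the description of table above, the following is a minimal sketch using the child support data. It assumes the dataset is saved as example.dta in the current working directory (the file name is taken from the describe output; its location is an assumption). No output is shown, since it depends on your Stata version.

```stata
* Load the example data (the path is an assumption; adjust as needed)
use example.dta, clear

* One variable: number of families for each value of nkids
table nkids

* Two variables: number of families for each combination of
* nkids (rows) and petmom (columns)
table nkids petmom
```

With two variables, the first one listed defines the rows of the table and the second defines the columns.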