Regression: using dummy variables/selecting the reference category
When a regression includes a categorical variable, you need to add n-1 dummy variables, where ‘n’ is the number of categories in the variable. The omitted category serves as the reference.
In the example below, the variable ‘industry’ has twelve categories (type tab industry to list them, or tab industry, nolabel to see the underlying numeric codes).
The easiest way to include a set of dummies in a regression is by
using the prefix “i.” By default, the first category (or lowest value) is
used as reference. For example:
sysuse nlsw88.dta
reg wage hours i.industry, robust
To change the reference category to “Professional services”
(category number 11) instead of “Ag/Forestry/Fisheries” (category
number 1), use the prefix “ib#.”, where “#” is the number of the
reference category you want to use; in this case, 11.
sysuse nlsw88.dta
reg wage hours ib11.industry, robust
Output of reg wage hours i.industry, robust (reference category: Ag/Forestry/Fisheries):

Linear regression                                                  Number of obs =    2228
                                                                   F( 12,  2215) =   24.96
                                                                   Prob > F      =  0.0000
                                                                   R-squared     =  0.0800
                                                                   Root MSE      =  5.5454

------------------------------------------------------------------------------------------
                         |              Robust
                    wage |      Coef.  Std. Err.        t   P>|t|    [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
                   hours |   .0723658   .0110213     6.57   0.000     .0507526     .093979
                         |
                industry |
                  Mining |   9.328331   7.287849     1.28   0.201    -4.963399    23.62006
            Construction |   1.858089   1.281807     1.45   0.147    -.6555808    4.371759
           Manufacturing |   1.415641    .849571     1.67   0.096    -.2503983    3.081679
  Transport/Comm/Utility |   5.432544    1.03998     5.22   0.000     3.393107    7.471981
  Wholesale/Retail Trade |   .4583809   .8548564     0.54   0.592    -1.218023    2.134785
 Finance/Ins/Real Estate |    3.92933   .9934195     3.96   0.000     1.981199    5.877461
     Business/Repair Svc |   1.990151   1.054457     1.89   0.059    -.0776775    4.057979
       Personal Services |  -1.018771   .8439617    -1.21   0.228     -2.67381    .6362679
   Entertainment/Rec Svc |   1.111801   1.192314     0.93   0.351    -1.226369    3.449972
   Professional Services |   2.094988   .8192781     2.56   0.011     .4883548    3.701622
   Public Administration |   3.232405   .8857298     3.65   0.000     1.495457    4.969352
                         |
                   _cons |   3.126629   .8899074     3.51   0.000     1.381489    4.871769
------------------------------------------------------------------------------------------
Output of reg wage hours ib11.industry, robust (reference category: Professional Services):

Linear regression                                                  Number of obs =    2228
                                                                   F( 12,  2215) =   24.96
                                                                   Prob > F      =  0.0000
                                                                   R-squared     =  0.0800
                                                                   Root MSE      =  5.5454

------------------------------------------------------------------------------------------
                         |              Robust
                    wage |      Coef.  Std. Err.        t   P>|t|    [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
                   hours |   .0723658   .0110213     6.57   0.000     .0507526     .093979
                         |
                industry |
   Ag/Forestry/Fisheries |  -2.094988   .8192781    -2.56   0.011    -3.701622   -.4883548
                  Mining |   7.233343   7.245913     1.00   0.318     -6.97615    21.44284
            Construction |  -.2368991   1.011309    -0.23   0.815    -2.220112    1.746314
           Manufacturing |  -.6793477   .3362365    -2.02   0.043    -1.338719    -.019976
  Transport/Comm/Utility |   3.337556   .6861828     4.86   0.000     1.991927    4.683185
  Wholesale/Retail Trade |  -1.636607   .3504059    -4.67   0.000    -2.323766    -.949449
 Finance/Ins/Real Estate |   1.834342   .6171526     2.97   0.003     .6240837      3.0446
     Business/Repair Svc |  -.1048377   .7094241    -0.15   0.883    -1.496044    1.286368
       Personal Services |  -3.113759   .3192289    -9.75   0.000    -3.739779    -2.48774
   Entertainment/Rec Svc |   -.983187   .9004471    -1.09   0.275    -2.748996    .7826217
   Public Administration |   1.137416   .4176899     2.72   0.007     .3183117    1.956521
                         |
                   _cons |   5.221617   .4119032    12.68   0.000      4.41386    6.029374
------------------------------------------------------------------------------------------
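Changing the reference category does not change the fitted model, only what each industry coefficient is compared against. That is why the hours coefficient, R-squared and F statistic are identical in both outputs. As a rough sketch (assuming Mining is category number 2, as the ordering of the output suggests), the ib11 results can be recovered from the default regression with lincom:

```stata
* Sketch: recover the ib11 results from the default-base regression
sysuse nlsw88.dta, clear
reg wage hours i.industry, robust

* Mining (assumed to be code 2) relative to Professional Services (code 11);
* this should match the Mining coefficient in the ib11 output (7.233343)
lincom 2.industry - 11.industry

* Constant of the ib11 model = default constant + Professional Services coefficient
* (3.126629 + 2.094988 = 5.221617)
lincom _cons + 11.industry

* Joint test that all industry dummies are zero; it does not depend on the base chosen
testparm i.industry
```

In practice it is usually simpler to refit with “ib#.”, but lincom is handy when only one or two comparisons against a different category are needed.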
The “ib#.” prefix has been available since Stata 11 (type help fvvarlist for more options/details). For older Stata versions you need to
use “xi:” along with “i.” (type help xi for more options/details). For the examples above, type (output omitted):
xi: reg wage hours i.industry, robust
char industry[omit] 11 /*Using category 11 as reference*/
xi: reg wage hours i.industry, robust
To create the dummies as separate variables, type:
tab industry, gen(industry)
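Once the dummies exist as variables, the reference category is whichever dummy you leave out of the regression. A minimal sketch, assuming a hypothetical stub ind_ (instead of industry) so the new variables are easy to tell apart from the original one:

```stata
* Sketch: hand-made dummies, leaving out category 11 as the reference
* "ind_" is a hypothetical stub; tab creates ind_1 ... ind_12
sysuse nlsw88.dta, clear
quietly tab industry, gen(ind_)

* Include every dummy except ind_11 (Professional Services)
reg wage hours ind_1-ind_10 ind_12, robust
```

The coefficients should match those from reg wage hours ib11.industry, robust; only the row names differ (ind_1 instead of Ag/Forestry/Fisheries, and so on).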
To include all categories (no reference category) by suppressing the constant, type:
reg wage hours bn.industry, robust hascons
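An alternative way to write the same idea, shown here only as a sketch, is to combine the “ibn.” prefix (no base category) with the noconstant option. Each industry coefficient is then the intercept for that category, so the Professional Services coefficient should equal the constant of the ib11 model (5.221617).

```stata
* Sketch: all twelve industry dummies, no constant
sysuse nlsw88.dta, clear
reg wage hours ibn.industry, robust noconstant
```

Note that with noconstant Stata reports an R-squared that is not centered on the mean of wage, so it is not comparable with the R-squared of the models above.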