A graph showing the gender wage gap

In regression analysis, a dummy variable (also known as an indicator variable or just dummy) is one that takes a binary value (0 or 1) to indicate the absence or presence of some categorical effect that may be expected to shift the outcome.[1] In machine learning, the closely related technique of representing each category with its own 0/1 variable is known as one-hot encoding.

Dummy variables are commonly used in regression analysis to represent categorical variables that have more than two levels, such as education level or occupation. In this case, one dummy variable is created for each level of the variable, and exactly one of them takes the value 1 for each observation. Dummy variables are useful because they allow categorical variables, which are otherwise difficult to include because they are non-numeric, to be used in the analysis.
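As a minimal sketch of this encoding, the following Python snippet builds one 0/1 column per level of a categorical variable. The helper name `make_dummies` and the education data are hypothetical and chosen only for illustration.

```python
def make_dummies(values):
    """Return (levels, rows): one 0/1 column per distinct level, in sorted order."""
    levels = sorted(set(values))
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows


# Hypothetical education levels for five observations.
education = ["secondary", "tertiary", "primary", "secondary", "tertiary"]
levels, rows = make_dummies(education)

print(levels)
for value, row in zip(education, rows):
    print(value, row)
```

Each row contains exactly one 1, in the column of the level that the observation belongs to.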

Function and use

A dummy variable can represent a binary variable directly: for example, sex may be encoded as a dummy variable by coding females as 0 and males as 1.
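To illustrate how such a 0/1 variable behaves in a regression, the sketch below fits y = intercept + slope × dummy by ordinary least squares. The data and the function name `ols_single_regressor` are hypothetical. With a single dummy regressor, the intercept is the mean outcome of the group coded 0, and the slope is the difference between the two group means.

```python
from statistics import mean


def ols_single_regressor(x, y):
    """Ordinary least squares for y = intercept + slope * x (one regressor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: the dummy is 0 for one group and 1 for the other.
dummy = [0, 0, 0, 1, 1, 1]
outcome = [10.0, 12.0, 14.0, 15.0, 17.0, 19.0]

intercept, slope = ols_single_regressor(dummy, outcome)

# Group means, computed directly for comparison with the coefficients.
mean_group0 = mean(y for d, y in zip(dummy, outcome) if d == 0)
mean_group1 = mean(y for d, y in zip(dummy, outcome) if d == 1)

print(intercept, mean_group0)          # intercept equals the mean of group 0
print(slope, mean_group1 - mean_group0)  # slope equals the difference in means
```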

As with any addition of variables to a model, adding dummy variables cannot decrease, and will typically increase, the within-sample model fit (coefficient of determination), but at a cost of fewer degrees of freedom and loss of generality of the model (out-of-sample model fit). Too many dummy variables result in a model that does not provide any general conclusions.
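The trade-off can be sketched numerically. The example below relies on the fact that, when the only regressors are a full set of group dummies, the least-squares fitted value for each observation is its group mean. The data and groupings are hypothetical, and the groupings are nested so that each one refines the previous one.

```python
from statistics import mean


def r_squared_from_groups(y, groups):
    """R^2 of a regression of y on a full set of group dummies.

    With one dummy per group and no other regressors, the least-squares
    fitted value for each observation is the mean of its group.
    """
    group_means = {
        g: mean(yi for yi, gi in zip(y, groups) if gi == g) for g in set(groups)
    }
    y_bar = mean(y)
    ss_res = sum((yi - group_means[gi]) ** 2 for yi, gi in zip(y, groups))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical outcome for eight observations.
y = [3.0, 5.0, 4.0, 8.0, 7.0, 9.0, 6.0, 10.0]

groupings = {
    "constant only": ["all"] * 8,
    "two groups": ["a"] * 4 + ["b"] * 4,
    "four groups": ["a1", "a1", "a2", "a2", "b1", "b1", "b2", "b2"],
    "one per observation": list(range(8)),
}

results = {}
for name, groups in groupings.items():
    r2 = r_squared_from_groups(y, groups)
    residual_df = len(y) - len(set(groups))  # observations minus fitted means
    results[name] = (r2, residual_df)
    print(f"{name}: R^2 = {r2:.3f}, residual degrees of freedom = {residual_df}")
```

In the last case every observation has its own dummy, so the fit is perfect within the sample while no degrees of freedom remain, and the model says nothing about new observations.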

Dummy variables are useful in various cases. For example, in econometric time series analysis, dummy variables may be used to indicate the occurrence of wars or major strikes. A dummy variable could thus be thought of as a Boolean, i.e., a truth value represented as the numerical value 0 or 1 (as is sometimes done in computer programming).
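A minimal sketch of such an event dummy for an annual time series follows. The sample period and the strike years are hypothetical, not real data.

```python
# Hypothetical annual sample and hypothetical strike years (not real data).
years = list(range(2000, 2010))
strike_years = {2003, 2007}

# The dummy is the truth value "a strike occurred in this year", stored as 0 or 1.
strike_dummy = [int(year in strike_years) for year in years]

for year, d in zip(years, strike_dummy):
    print(year, d)
```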

Dummy variables may be extended to more complex cases. For example, seasonal effects may be captured by creating dummy variables for each of the seasons: D1 = 1 if the observation is for summer, and equals zero otherwise; D2 = 1 if and only if autumn, otherwise equals zero; D3 = 1 if and only if winter, otherwise equals zero; and D4 = 1 if and only if spring, otherwise equals zero. In the panel data fixed effects estimator, dummies are created for each of the units in cross-sectional data (e.g. firms or countries) or for each of the periods in a pooled time series.

In such regressions, either the constant term or one of the dummies has to be removed; when a dummy is removed, its category becomes the base category against which the others are assessed. This is because if dummy variables for all categories were included, their sum would equal 1 for all observations, which is identical to, and hence perfectly correlated with, the vector-of-ones variable whose coefficient is the constant term. If the vector-of-ones variable were also present, this would result in perfect multicollinearity,[2] so that the matrix inversion in the estimation algorithm would be impossible. This is referred to as the dummy variable trap.
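The trap can be checked on a small example by computing the column rank of the regressor matrix: estimation by matrix inversion requires the rank to equal the number of columns. The sketch below uses the four seasonal dummies with a hypothetical sequence of observations. The helper `matrix_rank` is an illustrative implementation based on exact Gaussian elimination, not a library routine.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


seasons = ["summer", "autumn", "winter", "spring"]
# Hypothetical sequence of observed seasons.
observed = ["summer", "autumn", "winter", "spring", "summer", "winter"]

# One column per season (D1..D4).
dummies = [[1 if s == season else 0 for season in seasons] for s in observed]

with_trap = [[1] + row for row in dummies]          # constant + all four dummies
base_dropped = [[1] + row[:-1] for row in dummies]  # constant + three dummies
no_constant = dummies                               # all four dummies, no constant

for name, x in [("constant + 4 dummies", with_trap),
                ("constant + 3 dummies", base_dropped),
                ("4 dummies, no constant", no_constant)]:
    print(f"{name}: {len(x[0])} columns, rank {matrix_rank(x)}")
```

Only the first design has a rank below its number of columns, because its constant column equals the sum of the four dummy columns. Dropping either one dummy (here spring, which becomes the base category) or the constant restores full column rank.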

References

  1. Draper, N. R.; Smith, H. (1998). Applied Regression Analysis. Wiley. Chapter 14. ISBN 0-471-17082-8.
  2. Suits, Daniel B. (1957). "Use of Dummy Variables in Regression Equations". Journal of the American Statistical Association. 52 (280): 548–551. JSTOR 2281705.

Further reading

  • Asteriou, Dimitrios; Hall, S. G. (2015). "Dummy Variables". Applied Econometrics (3rd ed.). London: Palgrave Macmillan. pp. 209–230. ISBN 978-1-137-41546-2.
  • Kooyman, Marius A. (1976). Dummy Variables in Econometrics. Tilburg: Tilburg University Press. ISBN 90-237-2919-6.