Dougherty: Introduction to Econometrics 3e
About the NLSY
NLSY panel data set
(Used in Exercises 14.5–14.8 in the text)
The data set is a subset of a major U.S. database, the National Longitudinal Survey of Youth (NLSY79). NLSY79 is a panel survey in which a nationally representative sample of young males and females aged 14 to 21 in 1979 has been re-interviewed repeatedly since 1979. Until 1994 the interviews took place annually; since then they have been conducted at two-year intervals. The core sample originally consisted of 3,003 males and 3,108 females. In addition, there are special supplementary samples (some now discontinued) of ethnic minorities, those in poverty, and those serving in the armed forces. Extensive background information was obtained in the base-year survey in 1979, and since then information has been updated at each interview on education, training, employment, marital status, fertility, health, child care, assets, and income. Special sections have also been added from time to time on other topics, for example, drug use. The surveys have been extremely detailed and the quality of their execution is very high. As a consequence, NLSY79 is regarded as one of the most important databases available to social scientists working with U.S. data.
The data relate to the years 1980–1994, 1996, 1998, and 2000. Note that many data are missing. If a respondent was not interviewed in a given year, all data for that year are missing. In addition, many data are missing for specific reasons.
The data are restricted to males whose marital status is either single or married, who are not in school, for whom ASVAB scores are available, who worked at least 30 hours per week, and whose reported hourly rate of pay was at least $2.50 and not more than $250.
The variables listed below were recorded for each respondent for each of the years 1980–1994, 1996, 1998, and 2000. Hence there are potentially 18 observations for each respondent. However, owing to non-interviews or exclusions, the actual number is lower for many respondents and the panel is of the unbalanced type.
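As an illustration of what "unbalanced" means here, the following minimal Python sketch counts the observations available for each respondent and checks whether every respondent has all 18. The function names and the toy records are hypothetical and are not part of the data set; only the list of survey years is taken from the text.

```python
from collections import Counter

# Survey years covered by the data set, as listed in the text.
SURVEY_YEARS = list(range(1980, 1995)) + [1996, 1998, 2000]


def observations_per_respondent(records):
    """Count the distinct survey years observed for each respondent ID.

    `records` is an iterable of (ID, year) pairs. Pairs for years outside
    SURVEY_YEARS are ignored and duplicate pairs are counted once.
    """
    valid_years = set(SURVEY_YEARS)
    unique_pairs = {(rid, year) for rid, year in records if year in valid_years}
    return Counter(rid for rid, _ in unique_pairs)


def is_balanced(counts):
    """A panel is balanced only if every respondent has all survey years."""
    return all(n == len(SURVEY_YEARS) for n in counts.values())


# Hypothetical toy panel: respondent 1 is observed in every survey year,
# respondent 2 in only three of them.
toy_records = [(1, year) for year in SURVEY_YEARS]
toy_records += [(2, 1980), (2, 1981), (2, 1996)]

counts = observations_per_respondent(toy_records)
print(len(SURVEY_YEARS))    # maximum number of observations per respondent
print(dict(counts))         # observations actually available per respondent
print(is_balanced(counts))  # False for this toy panel
```

With real data, the (ID, year) pairs would come from whatever file format the data set is distributed in; the counting logic would stay the same.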
Personal variables
| ID | C | Respondent identification number |
| AGE | C | age |
| AGESQ | C | square of AGE |
| S | C | years of schooling (highest grade completed) |
| Ethnicity: | | |
| ETHBLACK | D | black |
| ETHHISP | D | hispanic |
| HEIGHT85 | C | height in inches in 1985 |
| WEIGHT | C | weight in pounds |
| Score on a component of the ASVAB battery (scaled with mean 50, standard deviation 10): | | |
| ASVAB2 | C | arithmetic reasoning |
| ASVAB3 | C | word knowledge |
| ASVAB4 | C | paragraph comprehension |
| ASVABC | C | composite of ASVAB2 (with double weight), ASVAB3 and ASVAB4 |
| SM | C | mother’s years of schooling |
| SF | C | father’s years of schooling |
| SIBLINGS | C | number of siblings |
| CHILDREN | C | number of children in the household |
| YOUNGEST | C | age of youngest child |
| MARRIED | D | married in the interview year |
| SINGLE | D | single in the interview year |
| SINGBOTH | D | single in the interview year and four years later |
| SOONMARR | D | single in the interview year but married four years later |
| URBAN | D | living in an urban area |
| Region of residence (census classification): | | |
| REGNE | D | north-east |
| REGNC | D | north-central |
| REGW | D | west |
| REGS | D | south |
Work-related variables
| EARNINGS | C | current hourly earnings in 1996 constant dollars |
| HOURS | C | hours worked per week |
| TENURE | C | years worked with present employer |
| TENURESQ | C | square of TENURE |
| EXP | C | total years of work experience |
| EXPSQ | C | square of EXP |
| Sector of employment: | | |
| CLASSPRI | D | private sector employee |
| CLASSPUB | D | public sector |
| CLASSSE | D | self-employed |
| UNION | D | member of a union (question asked 1988–2000 only) |
| UNCOLB | D | wages set by collective bargaining |
C indicates a continuous variable, D a dummy variable.
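The relationships among the listed variables can be checked with a short consistency test. The Python sketch below verifies that each squared term equals the square of its base variable and that exactly one dummy is set within each group of related dummies. The function `check_observation` and the sample rows are hypothetical. That MARRIED and SINGLE are mutually exclusive follows from the sample restriction described above; treating the region and sector dummies as exhaustive, mutually exclusive sets is an assumption. The sketch also assumes a complete observation with no missing values.

```python
import math

# Squared terms listed in the variable tables and their base variables.
SQUARED_PAIRS = [("AGE", "AGESQ"), ("TENURE", "TENURESQ"), ("EXP", "EXPSQ")]

# Groups in which exactly one dummy is expected to equal 1.
# MARRIED/SINGLE follows from the sample restriction; the region and
# sector groups are an ASSUMPTION, not something the description states.
EXCLUSIVE_GROUPS = [
    ("MARRIED", "SINGLE"),
    ("REGNE", "REGNC", "REGW", "REGS"),
    ("CLASSPRI", "CLASSPUB", "CLASSSE"),
]


def check_observation(row):
    """Return a list of consistency problems found in one observation.

    `row` is a dict mapping variable names to values and is assumed to
    contain no missing values.
    """
    problems = []
    for base, squared in SQUARED_PAIRS:
        if not math.isclose(row[squared], row[base] ** 2):
            problems.append(f"{squared} is not the square of {base}")
    for group in EXCLUSIVE_GROUPS:
        values = [row[name] for name in group]
        if any(v not in (0, 1) for v in values) or sum(values) != 1:
            problems.append("expected exactly one of " + ", ".join(group))
    return problems


# Hypothetical observation, invented for illustration only.
good_row = {
    "AGE": 27, "AGESQ": 729,
    "TENURE": 2.5, "TENURESQ": 6.25,
    "EXP": 6.0, "EXPSQ": 36.0,
    "MARRIED": 1, "SINGLE": 0,
    "REGNE": 0, "REGNC": 1, "REGW": 0, "REGS": 0,
    "CLASSPRI": 1, "CLASSPUB": 0, "CLASSSE": 0,
}
# The same observation with two deliberate errors introduced.
bad_row = dict(good_row, AGESQ=700, REGS=1)

print(check_observation(good_row))
print(check_observation(bad_row))
```

A check of this kind is useful mainly after merging or recoding the data, when a squared term or a dummy can easily fall out of step with the variable it is derived from.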


