Interest in Cuban socioeconomic indicators usually centers on pre- and post-Revolutionary comparisons. That is not the purpose of this paper, which considers only the pre-Revolutionary period. Its purpose is quite specific: to see whether the available data on socioeconomic indicators are consistent with the recent direct estimate in Devereux and Ward (2009) (henceforth DW) of purchasing power parity (PPP) adjusted per capita income for Cuba in the 1950s.
Briefly, in this paper I take a set of countries, not including Cuba, augmented with individual states of the United States, and construct statistical models of PPP adjusted per capita income for 1955, using various socioeconomic variables as predictors. The models are then used to predict income for Cuba, and the predicted values are compared with that provided by DW.
The paper is organized as follows. The first section briefly discusses the data, and in the second, I describe Cuba’s relative position for several of the indicators. The next section treats income as the dependent variable, and uses regression analysis to derive predicted income for Cuba. There I estimate regressions for single and multiple indicators. In the following section I treat income as an explanatory variable, and I estimate models for each socioeconomic indicator separately. I then estimate Cuban income as the number that, given the estimated models, “best fits” the Cuban values for all the indicators. All procedures used yield similar results, and all are quite consistent with the value derived by DW that Cuban per capita GDP in 1955 was 27% of that of the United States. I end with a brief conclusion in the last section.
The basic set of countries included in this study consists of those countries for which DW provides PPP measures of relative per capita income in 1955. These are the countries of Latin America, eight countries of Europe (Belgium, Denmark, France, Germany, Italy, the Netherlands, Norway, and the United Kingdom), and the United States.2 I have augmented this set with individual American states by using relative personal per capita income in 1955 as the relevant relative income measure.
The socioeconomic indicators for the late 1950s on which data have been obtained can be grouped into the categories of consumption, education, health, and nutrition. This categorization is only for convenience of exposition, as some variables could reasonably be placed in more than one category.
Under consumption are included telephones, passenger cars, televisions, radios and energy consumed. In the category of education are newspaper circulation and school enrollment. For health, I have used the infant mortality rate (IMR), life expectancy, and number of physicians. Under nutrition I simply have used calories consumed. I will also use a group of demographic variables that consists of urbanization, percent of the population that is not white, and population density. For phones, cars, newspapers, IMR, physicians and the demographic variables I have data for the countries and the individual American states. For the other variables I only have data at the country level. Table 1 presents some summary statistics for the data.
Table 1. Summary Statistics
CUBA’S RELATIVE INTERNATIONAL POSITION
This section illustrates the relationship in the raw data between some of the socioeconomic indicators and per capita income across countries and individual states, and Cuba’s place among them. Needless to say, one does not wish to rely too heavily on a single indicator, as unaccounted-for country specific idiosyncratic factors can affect a given value of an indicator for a specific country at a point in time. Instead, one should look at the overall picture that emerges as one compares income to a wide variety of indicators.
Figure 1 shows the log of phone lines per 1000 persons in 1957 against the log of relative income in 1955. As can be seen, phones and income were highly correlated (0.98). Based on the intensity of phone lines, the DW measure of Cuban income appears to be somewhat high, but probably not significantly so.
Figure 1. Phone Lines
Figure 2 shows the relationship between passenger cars and income. Once again it is evident that the correlation between the two is very high (0.97). This time the DW measure of income is pretty much what one would expect given the intensity of passenger cars in 1957 Cuba.
Figure 2. Passenger Cars
The third of the consumption indicators shown is the availability of television sets. I did not obtain data by state, so only country data are shown in Figure 3. Furthermore, Bolivia, Ecuador, and Paraguay are excluded because they reported no televisions in 1960. I presume that for reasons other than low income those countries were extremely slow in starting up television broadcasts.3 In terms of televisions, Cuba appears to have had higher levels of ownership than the DW measure of income would imply. Alternatively, the availability of televisions in Cuba implies that DW’s measure of income is too low.
Figure 3. Televisions
Turning to health indicators, Figure 4 shows the log of the IMR against the log of relative income. As with televisions, Cuba had an IMR that was lower than would be expected given the DW measure of Cuban income. Equivalently, the level of income implied by Cuba’s IMR is higher than the DW measure.
Figure 4. Infant Mortality Rate
Figure 5 shows the relationship between newspaper circulation and income. Here Cuba appears to be low given the DW measure of income. Interestingly, with the exception of the District of Columbia, all of the U.S. states also appear low.
Figure 5. Newspaper Circulation
Finally, Figure 6 shows the relationship between calories consumed and income. If anything, Cuba appears to have had higher consumption of calories than would be expected given the DW measure of income.
Figure 6. Calories
These six socioeconomic indicators are representative of the relationship in the raw data between all the indicators and income. The correlations (in the logs) are always high (from a low of 0.78 for calories to a high of 0.98 for phones and energy consumption). For some indicators, the DW measure of income appears to be somewhat high given the international variation, but for others it seems somewhat low. For still others, it looks just about right. For none of the indicators does Cuba appear to be an outlier. This casual inspection of the data suggests that the DW measure of per capita income for Cuba in 1955 is broadly consistent with socioeconomic variables from the period 1955–60. In the next section I provide more systematic evidence in support of this proposition.
INCOME AS DEPENDENT VARIABLE
Consider the following statistical model for the log of per capita income in country or state i, yi:
yi = αj + βj xij + λj zi + εij    (1)
where xij is country i’s value of indicator j, zi is a set of demographic variables, and εij is an error term with E(εij) = 0. This error term captures unobserved variables whose effects are orthogonal to the other variables on the right hand side of (1). Equation (1) is to be interpreted merely as a statistical relationship, not a causal one. Its use is solely predictive.
The first procedure in this section is to estimate equations such as (1) for each socioeconomic indicator. The estimated coefficients, αj, βj and λj, can then be used along with the Cuban values xCj and zC to obtain a point estimate of Cuban income, yC. This estimate of the log of relative per capita income can then be compared with the DW value of yC = ln(27) = 3.30.
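To make the mechanics concrete, the following is a minimal Python sketch of this fit-then-predict step. It is an illustration only: the data are synthetic, the single demographic variable and all coefficient values are hypothetical, and the function name `fit_and_predict` is introduced here rather than taken from the paper.

```python
import numpy as np

def fit_and_predict(x, z, y, x_new, z_new):
    """OLS fit of y on [1, x, z]; return the prediction at (x_new, z_new) and the coefficients."""
    X = np.column_stack([np.ones(len(y)), x, z])   # constant, indicator, demographics
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # (alpha_j, beta_j, lambda_j)
    row = np.concatenate([[1.0], [x_new], np.atleast_1d(z_new)])
    return float(row @ coef), coef

# Synthetic sample standing in for the countries and states (NOT the paper's data).
rng = np.random.default_rng(0)
n = 60
x = rng.uniform(1.0, 6.0, n)          # log of one indicator, e.g. phone lines
z = rng.uniform(0.2, 0.9, (n, 1))     # one hypothetical demographic variable
y = 1.0 + 0.6 * x + 0.4 * z[:, 0] + rng.normal(0.0, 0.05, n)  # log relative income

# Hypothetical "Cuban" values of the indicator and the demographic variable.
y_hat_cuba, coef = fit_and_predict(x, z, y, x_new=3.5, z_new=0.6)
print(f"predicted log income: {y_hat_cuba:.2f}  vs. DW value ln(27) = {np.log(27):.2f}")
```

Because Cuba is excluded from the estimation sample, the prediction is out of sample, which is what makes the comparison with the DW value informative.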
Table 2 shows the estimates of the coefficients of equations of type (1) for 11 different socioeconomic indicators. For each indicator two regressions were estimated. The first type of regression uses the indicator as the sole regressor. The second type includes the demographic variables as additional regressors. In all regressions, the indicator variable is highly significant. In all but one of the second type of regressions at least one of the demographic variables is significant at conventional levels. The fit is high in all cases, with R² values exceeding 0.80 in all of the regressions that include the demographic variables.
Table 2. Income Regressions Using Individual Indicators
The last column of Table 2 shows the predicted values for the log of Cuban per capita GDP in 1955. While there are slightly more instances for which the predicted values are below the DW measure than for which they exceed it, the average predicted value exceeds the DW measure for both types of regressions. Focusing on the predictions from the regressions that include the demographic variables, the average predicted level of relative income is about 10% above that of DW. Under the assumption that the error terms are uncorrelated across equations (an assumption that is very unlikely to hold), the average predicted value of 3.40 is within two standard errors of the DW measure of 3.30. In any case, we have no evidence that the DW measure is too low.
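The percentage comparisons in this paper come from differences in logs. As a small worked check, the Python sketch below converts a gap between two log values into a percentage gap in levels; the helper name `percent_gap` is introduced here for illustration, and the inputs are the rounded values quoted in the text.

```python
import math

def percent_gap(log_estimate, log_reference):
    """Percentage by which exp(log_estimate) exceeds exp(log_reference)."""
    return 100.0 * (math.exp(log_estimate - log_reference) - 1.0)

dw_log = math.log(27)              # DW: Cuban income is 27% of the U.S. level
gap = percent_gap(3.40, 3.30)      # average prediction vs. the rounded DW value

print(f"ln(27) = {dw_log:.2f}")
print(f"gap implied by 3.40 vs. 3.30: {gap:.1f}%")
```

A difference of 0.10 in logs corresponds to a gap of roughly 10% in levels, which is the figure cited above.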
An alternative approach is to estimate an equation such as (1), but using multiple indicators. The corresponding equation would be:
yi = α + β xi + λ zi + εi    (2)
where xi is now a vector of socioeconomic indicators. The primary limitation with this approach is the relatively small number of observations that I currently have for several of the indicators. I have data for countries and individual states for phone lines, passenger cars, newspaper circulation, IMR, and physicians. For the other indicators I have data only at the country level. Given the small number of observations in the latter group, and the high collinearity among the indicators, it is not possible to estimate (2) using all the indicators. Instead, I have carried out two separate regressions. The first involves countries and states, and includes as independent variables those indicators that I have available for the states (phone lines, passenger cars, newspaper circulation, IMR, and physicians), along with the demographic variables. The results of this estimation are shown in the first column of Table 3.
The second estimation involves only country level data. For this regression I included as explanatory variables those independent variables from the first regression that were significant at conventional levels, plus the indicators that were not included in the first regression because they were not available for the states.4 The results from the regression are shown in the second column of Table 3.
Table 3. Income Regressions with Multiple Indicators
At the bottom of each column of Table 3 appears the predicted value of the log of Cuban per capita relative income. As can be seen, the predicted income values from both regressions (3.30 and 3.33) are very close to the DW calculation (3.30). In both cases the point estimates for Cuba are well within two standard errors of the DW measure.
Table 4. Indicator Regressions with Income as Explanatory Variable
INCOME AS EXPLANATORY VARIABLE
In this section I reverse the question of the previous section. I now ask, “What level of income would best explain the values of Cuba’s socioeconomic indicators?” To answer this question for each indicator, I first estimate equations of the following form:
xij = aj + bj yi + cj zi + uij    (3)
where uij is an error term with E(uij) = 0. Let yC be the log of per capita Cuban relative income. Using the estimated coefficients for (3), aj, bj and cj, I construct for each indicator a residual for Cuba as follows:
uCj = xCj − (aj + bj yC + cj zC)    (4)
I then search for the yC that “best fits,” where that term is made clear below.
Ordinary least squares (OLS) estimates of equation (3) for each indicator are shown in Table 4. Two equations were estimated for each indicator. The first uses the log of relative income as the only regressor. The second equation includes the demographic variables. As with the reverse of these equations in Table 2, the goodness of fit for all the indicators is quite high, and the coefficient of the log of relative income is always highly significant.
For a set of weights corresponding to the indicators, w = (w1, …, wJ), and a given level of the log of income, yC, define the following weighted sum of squared errors:
S(w, yC) = Σj wj (uCj)^2    (5)
where the residuals, uCj, j = 1,…, J, are determined according to equation (4). I now define the level of relative income that best explains the values of Cuba’s socioeconomic indicators (the one that “best fits”) as the value of yC that minimizes equation (5).
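Because each residual in (4) is linear in yC, the minimizer of (5) can be written in closed form. The short derivation below is an aside that follows from (4) and (5) under the assumption that the weights are positive and at least one bj is nonzero. The shorthand rj is introduced here and is not part of the original notation.

```latex
% Shorthand (introduced here): r_j = x_{Cj} - a_j - c_j z_C, so that u_{Cj} = r_j - b_j y_C.
\begin{align*}
S(w, y_C) &= \sum_{j=1}^{J} w_j \,(r_j - b_j y_C)^2 \\
\frac{\partial S}{\partial y_C} &= -2 \sum_{j=1}^{J} w_j b_j \,(r_j - b_j y_C) = 0 \\
\Longrightarrow \quad y_C^{*} &= \frac{\sum_{j} w_j b_j r_j}{\sum_{j} w_j b_j^{2}},
\qquad
\frac{\partial^2 S}{\partial y_C^2} = 2 \sum_{j} w_j b_j^{2} > 0 .
\end{align*}
```

In words, the best fitting income is a weighted least squares slope through the origin of the adjusted Cuban indicator values rj on the income coefficients bj.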
To be able to minimize equation (5) we need to determine a set of weights. I consider two cases. In the first case I set all weights equal to one. In the second case I use as weights the inverse of the standard error of the coefficients of the log of relative income from the two types of regressions in Table 4. Thus there are four different estimates. These are shown in Table 5. When the coefficient estimates from the regressions with the log of relative income as the sole independent variable are used, the best fitting values for the log of Cuban relative income are 3.34 with equal weights, and 3.26 with the inverse of the standard errors as weights. These estimates imply levels of income for Cuba that are 4% above and 4% below the DW measure. When the coefficient estimates from the regressions that include the demographic variables are used, the corresponding values are 3.35 and 3.23, or 5% above and 7% below the DW measure.
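The minimization of (5) can be sketched in a few lines of Python. The example is illustrative only: every coefficient, standard error, and Cuban value is hypothetical, the function names are introduced here, and the closed-form expression is a consequence of the linearity of (4) in yC rather than a procedure described in the paper. A simple grid search is included as a cross-check.

```python
import numpy as np

def weighted_sse(y_c, x_c, a, b, c, z_c, w):
    """Equation (5): weighted sum of squared Cuban residuals at income y_c."""
    u = x_c - (a + b * y_c + c @ z_c)          # equation (4), one residual per indicator
    return float(np.sum(w * u**2))

def best_fit_income(x_c, a, b, c, z_c, w):
    """Closed-form minimizer of (5): sum(w*b*r) / sum(w*b^2), with r = x_c - a - c @ z_c."""
    r = x_c - a - c @ z_c
    return float(np.sum(w * b * r) / np.sum(w * b**2))

# Hypothetical estimates for J = 3 indicators and K = 2 demographic variables.
a = np.array([-2.0, -1.0, 6.0])
b = np.array([1.4, 1.1, -0.8])                 # coefficients on log relative income
c = np.array([[0.5, 0.0], [0.2, -0.1], [-0.3, 0.1]])
se_b = np.array([0.05, 0.08, 0.10])            # standard errors of the b coefficients
z_c = np.array([0.55, 0.27])                   # hypothetical Cuban demographics

# Hypothetical Cuban indicator values: consistent with y = 3.30 up to small deviations.
x_c = a + b * 3.30 + c @ z_c + np.array([0.05, -0.03, 0.02])

y_equal = best_fit_income(x_c, a, b, c, z_c, np.ones(3))       # all weights equal to one
y_weighted = best_fit_income(x_c, a, b, c, z_c, 1.0 / se_b)    # inverse standard errors

# Cross-check the weighted case with a brute-force search over a grid of incomes.
grid = np.linspace(2.0, 5.0, 3001)
sse = [weighted_sse(y, x_c, a, b, c, z_c, 1.0 / se_b) for y in grid]
y_grid = float(grid[int(np.argmin(sse))])

print(f"equal weights: {y_equal:.3f}, inverse-SE weights: {y_weighted:.3f}, grid: {y_grid:.3f}")
```

The two weighting schemes generally give different answers, because inverse standard error weights tilt the fit toward indicators whose income coefficient is estimated more precisely.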
Table 5. Best Fitting Cuban Relative Income
To see how likely such deviations are, I computed the same type of estimates for each of the countries in the data set using the coefficient estimates from the regressions that included the demographic variables and the weighted sum of squares. Considering only the 21 countries that were included in the eleven indicator regressions, I found that for eight of them, the best fitting income level was more than 8% below the actual level, and for five of them it was more than 8% above. Among all 28 countries there is even more dispersion. Eighteen of them have best fitting income levels that are more than 8% above or below their actual measures. Consequently, the deviations between the estimates of yC and the DW measure are well within the variation we observe among the various countries.
In this paper I employ a variety of techniques for estimating Cuban per capita relative income in 1955 from socioeconomic indicators. The eight estimates nicely bracket the DW measure of 27% of U.S. per capita income. If we consider only those estimates that use the demographic variables, the estimates are within ±8% of the DW measure. The two most direct measures, the multiple regressions with income as the dependent variable, actually give estimates that are equal to, and 3% above, the DW measure. I believe this is significant evidence that the measure arrived at in Devereux and Ward (2009) through direct measurement is quite consistent with the values of Cuba’s socioeconomic indicators.
1. I wish to thank Pan Yijun and Carl Mbao for able research assistance.
2. The latest version of their paper also includes the Soviet Union, but I worked from an earlier draft.
3. It should be noted that the outlier with the lowest number of televisions per capita in Figure 3 is Chile. Neighbor Peru was also among the countries with few televisions.
4. The indicator of television availability was not included because three countries report zero televisions for 1955, and a fourth, Chile, nearly zero.