In common use, the term statistic refers to a summary of a set of data (e.g. total rainfall in February was 15 cm), but in a more technical sense a statistic is a function of a number of random variables. In the first use, no concept of probability is involved in the calculation of a statistic while, in the second, a statistic refers to a feature of a model which includes assumptions about the population and the sampling process. Researchers often use summary statistics in an informal way to judge the importance of between-group differences and to check for outliers. In principle, calculating a summary statistic is straightforward, and in many areas of public policy summary statistics are sufficient to answer important questions (e.g. is GDP higher this year than last year?). In other areas, in particular where questions of causation are important, it is usually not possible to answer policy questions directly by simply summarising observed data. In these situations, researchers need to calculate measures of uncertainty and conduct statistical tests, and therefore need to use statistical models.
For example, an article in Monday’s Guardian by Madeleine Bunting discussed whether ethnic diversity was a source of social cohesion or of social conflict. In order to examine the association between ethnic diversity and social cohesion, studies need to use statistical models rather than descriptive statistics, partly because areas that differ in ethnic diversity also differ greatly in other characteristics that might influence social cohesion, in particular area deprivation. For example, if we wanted to compare the level of trust that people in, say, Richmond-upon-Thames (where around 30 percent of the population are from non-White backgrounds) and people in Brent (where over 80 percent of the population are from non-White backgrounds) have for their neighbours, there is little point in simply comparing the responses that people give to survey questions. We know that people living in more deprived neighbourhoods have lower levels of trust in their neighbours than people in better-off neighbourhoods, so we would expect people in Richmond-upon-Thames to have much higher levels of trust in their neighbours than people in Brent, regardless of any effect of diversity. Instead of relying on a summary statistic, we can construct a statistical model of the relationship between diversity and cohesion, in which the assumptions of the model allow us to identify the effect of diversity on cohesion after adjusting for differences in deprivation between areas. There might, however, be only limited variation in ethnic diversity that is independent of deprivation (i.e. areas with high proportions of the population from ethnic minority backgrounds tend to be deprived). If this is the case, we might still not be able to say anything meaningful about the relationship between ethnic diversity and social cohesion, because the division between ethnically mixed and ethnically homogeneous areas is also a division between deprived and more affluent areas.
Modelling might still produce estimates of the effect of diversity on cohesion but, in this situation, they would largely reflect the structure imposed on the data by the model rather than information in the data itself.
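To make this concrete, here is a minimal simulation sketch. It uses entirely synthetic data, not the Citizenship Survey. The function names, the number of areas, the noise levels and the choice to set the true effect of diversity to zero are all assumptions made for illustration. It repeatedly fits a regression of cohesion on diversity and deprivation and records how much the adjusted diversity estimate varies from one simulated sample to the next, first when diversity and deprivation are only weakly correlated and then when they are almost interchangeable.

```python
import random
import statistics


def adjusted_slope(x, z, y):
    """OLS coefficient on x when y is regressed on x and z (with an intercept)."""
    mx, mz, my = statistics.fmean(x), statistics.fmean(z), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    szy = sum((b - mz) * (c - my) for b, c in zip(z, y))
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)


def spread_of_estimates(rho, n_areas=40, n_sims=500, seed=1):
    """Standard deviation of the adjusted diversity estimate across simulations.

    rho is the assumed correlation between diversity and deprivation.
    The true effect of diversity on cohesion is set to zero (an assumption),
    so any non-zero estimate is sampling noise.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sims):
        deprivation = [rng.gauss(0, 1) for _ in range(n_areas)]
        diversity = [
            rho * d + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
            for d in deprivation
        ]
        cohesion = [
            0.0 * v - 1.0 * d + rng.gauss(0, 1)
            for v, d in zip(diversity, deprivation)
        ]
        estimates.append(adjusted_slope(diversity, deprivation, cohesion))
    return statistics.stdev(estimates)


low = spread_of_estimates(0.2)    # diversity largely independent of deprivation
high = spread_of_estimates(0.95)  # diversity almost a proxy for deprivation
print(f"spread of adjusted estimate, rho=0.20: {low:.2f}")
print(f"spread of adjusted estimate, rho=0.95: {high:.2f}")
```

Under these assumptions the model returns an adjusted estimate in both cases, but when diversity and deprivation move together the estimate is several times noisier, so any single fitted value says little about the true effect.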
In order to illustrate the difficulty of separating ethnicity and deprivation at the area level, the figure below plots estimates of the social capital in neighbourhoods (measured using questions from the Citizenship Survey about trust, co-operation, shared values and belonging) against the proportion of the population from ethnic minority backgrounds, separately for four English regions. Social capital was estimated for neighbourhoods grouped into deciles of the index of multiple deprivation, giving ten observations in each region, with the most affluent neighbourhoods in decile 1 and the most deprived neighbourhoods in decile 10 (the approach is described here). The figure shows that overall there is a strong negative relationship between the level of social capital in different types of area and the proportion of the population from an ethnic minority background. The points are coloured, however, according to the decile of the deprivation index and show that in each region there is also a significant negative relationship between social capital and neighbourhood deprivation. Studies have concluded that after adjusting for neighbourhood deprivation the association between ethnic diversity and social capital is positive (i.e. the reverse of the association observed in the raw data). What needs to be considered, however, is the extent to which this result isolates the effect of varying ethnic diversity between neighbourhoods with similar levels of deprivation, rather than resting on a comparison of fundamentally different types of area that differ in both diversity and deprivation.
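The sign reversal described above can be reproduced with a small made-up example. The sketch below is not based on the data in the figure: the ten deprivation deciles, the diversity values and the coefficients (+0.5 for diversity, -2.0 for deprivation) are all invented, and the helper names are hypothetical. It "adjusts for" deprivation by removing the part of both diversity and social capital that deprivation predicts and then relating what is left over, which gives the same coefficient as including deprivation in the regression.

```python
import statistics


def slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after removing the part linearly predicted by x."""
    b = slope(x, y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


# Invented data: ten deprivation deciles, with diversity rising almost
# in step with deprivation apart from a small alternating wobble.
deprivation = list(range(1, 11))
diversity = [d + (0.5 if d % 2 else -0.5) for d in deprivation]

# Assumed "true" model: diversity helps (+0.5), deprivation hurts (-2.0).
social_capital = [0.5 * v - 2.0 * d for v, d in zip(diversity, deprivation)]

# Raw association, ignoring deprivation.
raw = slope(diversity, social_capital)

# Association after partialling deprivation out of both variables.
adjusted = slope(
    residuals(deprivation, diversity),
    residuals(deprivation, social_capital),
)

print(round(raw, 3), round(adjusted, 3))  # prints: -1.5 0.5
```

The adjusted estimate recovers the assumed +0.5 only because the data were generated from exactly the linear model being fitted. It also rests entirely on the small wobble in diversity that is unrelated to deprivation, which is the concern raised about the real studies.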
My point here is that although whether diversity is positively associated with social conditions in neighbourhoods is an important question, researchers can’t answer that or similar questions without recourse to modelling techniques which require a number of assumptions. The reliability of the conclusion that diversity is positively associated with social cohesion depends on whether those assumptions are reasonable. In most cases, modelling assumptions are made for convenience rather than realism, and moving from the results of statistical models to public policy conclusions is something that we should only do with caution.