J.F. Phillips
© 2001


Rape rates and overall violent crime rates for 1997 and 1998 in virtually all of British Columbia, Canada (divided into 144 jurisdictions and areas) are regressed against demographics from the 1996 census for those same jurisdictions and areas.  The best predictor of rape rates and violent crime rates was found to be the proportion of the population that is aboriginal (native American), probably because aboriginal proportion correlates well with multiple factors that are generally accepted as contributors to criminality.  When aboriginal proportion is excluded from the regression, a few of those factors become the best predictors of crime rates, and quite effective ones.

Each percentage point of unemployment rate was found to predict .184 rapes (s=.026) and .774 violent crimes (s=.142) per 1000 population.  Each percentage point of families having only one parent in the home was found to predict .084 rapes (s=.031) and .997 violent crimes (s=.170) per 1000 population.  Unemployment and 1-parent homes account for 23.5 and 15.0 percent, respectively, of the variation in total violent crime rates, and 30.5 and 3.3 percent, respectively, of the variation in rape rates. [MAIN RESULTS]


It is generally accepted that a number of demographic characteristics of populations can be used as predictors of the prevalence of various types of crime in those populations.  Some of those characteristics may have this predictive power because they are caused by crime, because they are caused by the same things that cause or impede crime, or because they are themselves contributing causes of crime or crime-reducing factors (or both).



This analysis involved correlation of two types of data from two different sources.  The correlation was required on two levels: first, matching the data of the two types to each other, and second, the correlation of crime rates with demographic factors that occurs in the statistical analysis.

Data on different kinds of crime for almost every policing jurisdiction in British Columbia, Canada were available in annual reports from the web site of the British Columbia (B.C.) Attorney General (AG).  The data used were for 1997 and 1998, and for all types of violent crime for each policing jurisdiction.  Data were unavailable only for a very small number of aboriginal ("First Nation") policed jurisdictions that, according to a representative of the AG office, dealt with very minor levels of crime.

The other type of data was demographics data available at the Stats Canada web site.  These data, from the 1996 census, include statistics for cities, towns, aboriginal reserves, and rural areas outside of those areas but within subdivisions of the B.C. regional districts (RDs).  The data include statistics on population, land area, family characteristics, home characteristics, and, by sex: age, ethnicity, language, income, employment, education, births and deaths.


The first level of correlation was the matching of crime data to demographics data.

Correlation of the two kinds of data was almost straightforward in the case of policing jurisdictions identified by the AG as being policed by "municipal" or "independent" police forces.  The municipal jurisdictions are cities and towns with populations greater than 5000, policed by the Royal Canadian Mounted Police (RCMP) under contracts with the cities/towns.  The "independent" jurisdictions are also cities and towns with populations greater than 5000, but these jurisdictions provide for their own policing.  In both cases, the policing jurisdictions are cities or towns that are also census units, and in some cases also include one or more aboriginal reserves that are adjacent to the cities or lie within their boundaries.

These policing jurisdictions have directly corresponding demographics data if they include no reserves.  If a jurisdiction includes one or more reserves, the reserves' demographic data can be added to the city/town data to obtain the corresponding data.  If there is only one reserve in the vicinity, it is obviously the one to be added to the city/town.  If there is more than one nearby, there is some ambiguity, since the AG reports do not identify the exact reserve(s) to add in.  If any of the reserves have significant populations, the ambiguity could usually be resolved by comparing the policing jurisdiction population with the total population of the city/town and the candidate reserve(s).  If a reserve population is insignificant, the decision as to whether or not that reserve is part of a particular policing jurisdiction has an insignificant effect on the results.
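The population comparison used to decide which nearby reserves belong with a city/town can be sketched as a small exhaustive search. This is a minimal illustration, assuming the 1996 census populations of the town and of each candidate reserve are known along with the AG population for the jurisdiction; the function name, reserve names and all numbers are hypothetical, not taken from the AG or Stats Canada data.

```python
from itertools import combinations


def best_reserve_match(town_pop, reserves, ag_pop):
    """Pick the set of nearby reserves whose 1996 census populations, added to
    the town's, come closest to the population the AG reports for the
    policing jurisdiction. Returns (tuple of reserve names, absolute gap)."""
    names = sorted(reserves)
    best_subset, best_gap = (), abs(ag_pop - town_pop)  # start with "no reserves"
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            total = town_pop + sum(reserves[name] for name in subset)
            gap = abs(ag_pop - total)
            if gap < best_gap:  # on ties, keep the earlier (smaller) subset
                best_subset, best_gap = subset, gap
    return best_subset, best_gap


# Hypothetical town with three nearby reserves.
nearby = {"Reserve X": 450, "Reserve Y": 30, "Reserve Z": 1200}
print(best_reserve_match(10000, nearby, ag_pop=10520))
# (('Reserve X', 'Reserve Y'), 40)
```

In the example the closest match adds Reserve X and the tiny Reserve Y, but leaving Reserve Y out changes the total by only 30 people, well within the expected growth between the 1996 census and the 1997/1998 AG populations. That is the case in which the decision about an insignificant reserve is itself insignificant.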

Correlating crime data and demographics data was a definite challenge for policing jurisdictions identified by the AG as "provincial."  These RCMP police forces provide policing for small towns, reserves and rural areas that don't lie within towns or reserves.  A given police office might serve multiple small towns and reserves, plus neighboring rural areas.  One of the reasons for difficulty in correlating the two sets of data is that some of the provincial policing jurisdictions are given names that are not the same as the names of any census areas.

If a particular RD subdivision includes only one provincial police jurisdiction, it was assumed that the boundary of the police jurisdiction was the same as the boundary of the RD subdivision.  This is probably not exactly true in all cases, but the errors caused by the assumption would be small because the boundaries of the RD subdivisions and policing jurisdictions are generally in sparsely populated areas if they are not at natural boundaries such as rivers.

In some cases, a single RD subdivision encloses multiple provincial policing jurisdictions serving rural areas outside of cities and towns.  In each such case, the crime data for the multiple policing jurisdictions within the RD subdivision were combined for matching with the demographics data for the subdivision and any towns and reserves within its outer boundaries (excluding any towns and reserves within those boundaries that are separately identified as policing jurisdictions).  The combination was necessary because the exact boundaries of the policing jurisdictions are not specified in the AG data, and in any case demographics data are not available at the Stats Canada site for areas within such boundaries.
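Pooling several provincial jurisdictions within one RD subdivision amounts to adding their crime counts and populations before computing a rate. A minimal sketch, with hypothetical field names and numbers:

```python
def combine_jurisdictions(jurisdictions):
    """Pool crime counts and populations of several provincial policing
    jurisdictions that lie in one RD subdivision."""
    combined = {"population": 0, "rapes": 0, "violent": 0}
    for j in jurisdictions:
        for key in combined:
            combined[key] += j[key]
    return combined


def rate_per_1000(count, population):
    """Crimes per 1000 population."""
    return 1000.0 * count / population


# Two hypothetical rural detachments in the same RD subdivision.
pooled = combine_jurisdictions([
    {"population": 3000, "rapes": 2, "violent": 45},
    {"population": 2000, "rapes": 1, "violent": 20},
])
print(pooled)  # {'population': 5000, 'rapes': 3, 'violent': 65}
print(rate_per_1000(pooled["violent"], pooled["population"]))  # 13.0
```

Summing counts and populations, rather than averaging the separate rates, weights each jurisdiction by its population, which is what treating the subdivision as a single area implies.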

The AG crime data include the population of the area served by each policing jurisdiction.  The crime data populations were for 1997 and 1998, and so were usually slightly larger than the 1996 census populations for the corresponding census areas.  These populations were used as a tool to match the policing jurisdictions with corresponding census areas or combinations thereof.  In most cases, comparing the 1997 crime-data populations with the 1996 census populations confirmed that the crime data covered essentially the same areas as the census data.  In an occasional case, for example, a provincial jurisdiction with the same name as a town also had practically the same population as the census gives for that town, leading to the assumption that the provincial jurisdiction is the town and includes little if any area outside it.

The correspondences between the policing jurisdictions and the census units are detailed in a separate document, along with the populations and notes on some correspondences that are questionable but considered of minor significance.


Crime Data Preparation

The crime data available in the AG reports include numbers of crimes of essentially all types, including violent crimes (crimes against persons).  The analysis reported here is only for rape and total violent crime.  The numbers for the various kinds of violent crime other than assault and rape were generally too small to permit statistically significant analysis.  The total violent crime rates were calculated by combining the numbers for murder, attempted murder, rape, assault and robbery.  The results would be practically the same if only assault were included, since the numbers for assault were far greater than all the others combined.

Averages of the 1997 and 1998 rape rates and violent crime rates were calculated for each policing jurisdiction to increase the numbers, minimize the impacts of rare crime events, obtain rates more typical of each jurisdiction, and increase the statistical validity of the results.  It should be noted that, for all but the largest policing jurisdictions, the numbers of crimes other than assault vary considerably from year to year, whereas the demographics of those areas change much more gradually from year to year, and even from one census to the next.
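The total violent crime figure and the two-year average rate described above can be sketched as follows. This assumes counts are available per crime type and year, along with the AG population for each year; the dictionary keys and numbers are hypothetical.

```python
# Crime types summed into "total violent crime" in this report.
VIOLENT_TYPES = ("murder", "attempted_murder", "rape", "assault", "robbery")


def violent_total(counts):
    """Sum the violent crime counts for one jurisdiction and year."""
    return sum(counts.get(kind, 0) for kind in VIOLENT_TYPES)


def two_year_average_rate(year1, year2):
    """Average of two annual rates per 1000 population.
    Each argument is a (count, population) pair."""
    rates = [1000.0 * count / pop for count, pop in (year1, year2)]
    return sum(rates) / len(rates)


# Hypothetical jurisdiction, with its AG population for each year.
c1997 = {"murder": 0, "attempted_murder": 1, "rape": 4, "assault": 110, "robbery": 5}
c1998 = {"murder": 1, "attempted_murder": 0, "rape": 3, "assault": 90, "robbery": 5}
violent_avg = two_year_average_rate((violent_total(c1997), 10000), (violent_total(c1998), 11000))
rape_avg = two_year_average_rate((c1997["rape"], 10000), (c1998["rape"], 11000))
print(violent_avg, round(rape_avg, 3))  # 10.5 0.336
```

The example also shows why averaging helps: a single rape more or less in one year would move that jurisdiction's annual rape rate by about 0.1 per 1000, a large change relative to the rate itself.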

Demographics Data Preparation

Demographics (census) data used in the analysis (for each city, town, settlement, reserve, and rural area) included: population, land area, visible minority population, aboriginal population, people speaking languages other than English/French/aboriginal, French-speaking population, number employed, their average annual income, unemployment rate, number of families with only one parent at home, number with both at home, number of people over 14 years old, and number of those over 14 that do not have highschool certificates.

The investigator calculated (from the source data) the logarithm of the population density, per-capita annual income, unemployment rates for combined areas, portions of families with only one parent in the home, portions of those over 14 years old that do not have highschool certificates, and portions of the population that are: visible minority, aboriginal, french speaking, other language speaking.
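The derived variables can be sketched as below. This is a minimal illustration assuming one dictionary of raw census counts per matched unit (the key names are hypothetical) and a base-10 logarithm for density, since the report does not state the base. Per-capita income is assumed to be the number employed times their average income, divided by total population; the remaining portions (visible minority, French speaking, other language) follow the same pattern as the aboriginal portion. For combined areas, the unemployment rate is recomputed from labour-force totals rather than averaged, so that larger units carry more weight.

```python
import math


def derived_variables(area):
    """Derived predictors for one matched census unit.
    `area` holds raw census counts; the key names are hypothetical."""
    pop = area["population"]
    families = area["one_parent_families"] + area["two_parent_families"]
    return {
        "log10_density": math.log10(pop / area["land_km2"]),  # log10 of people per km^2
        "per_capita_income": area["employed"] * area["avg_income"] / pop,
        "pct_one_parent": 100.0 * area["one_parent_families"] / families,
        "pct_no_hs_cert": 100.0 * area["over14_no_cert"] / area["over14"],
        "pct_aboriginal": 100.0 * area["aboriginal"] / pop,
    }


def combined_unemployment(parts):
    """Unemployment rate (%) for several census units taken together.
    Each part is (number employed, unemployment rate in %). The labour force
    is recovered as employed / (1 - rate), so units are weighted by labour
    force instead of averaging the rates directly."""
    labour_force = unemployed = 0.0
    for employed, rate_pct in parts:
        lf = employed / (1.0 - rate_pct / 100.0)
        labour_force += lf
        unemployed += lf - employed
    return 100.0 * unemployed / labour_force


town = {"population": 20000, "land_km2": 20, "employed": 9000, "avg_income": 30000,
        "one_parent_families": 600, "two_parent_families": 2400,
        "over14": 16000, "over14_no_cert": 4000, "aboriginal": 1000}
print(derived_variables(town))
print(round(combined_unemployment([(900, 10.0), (1900, 5.0)]), 2))  # 6.67
```

With the two hypothetical units in the last line, the combined rate is about 6.67 percent, whereas a simple average of 10 and 5 percent would give 7.5 percent.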

Statistical Analysis

The statistical analysis was done in stages, starting with separate analysis of violent crime rates for: the twelve jurisdictions that have their own police forces; the 59 mostly mid-sized towns and cities policed by the RCMP under contract; and the small town, reserve and rural areas policed by the RCMP (divided into 73 parts).  Then, the violent crime rates for the entire data set were analyzed.  Finally, the analysis was done for rape in all the jurisdictions and areas.

Each of the five analyses consisted of reviewing scatter plots of the various potential predictors against each other and the crime rates, followed by determination of the correlation coefficient for each of the variables against each of the others, and followed finally by multiple linear regression of crime rates against potential predictor variables.
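The correlation step (every variable against every other) can be sketched with the standard Pearson formula. This is a minimal illustration, not the investigator's software; the variable names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Correlation of every variable against every other.
    `columns` maps a variable name to its values, one per jurisdiction."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Four hypothetical jurisdictions.
data = {
    "unempl_pct": [4.0, 6.0, 8.0, 10.0],
    "violent_per_1000": [5.0, 9.0, 8.0, 14.0],
}
m = correlation_matrix(data)
print(round(m[("unempl_pct", "violent_per_1000")], 3))  # 0.897
```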




The twelve jurisdictions that have their own police forces are identified as "independent" in the B.C. AG reports.  They are all cities, including the largest in B.C.

The scatter plot of crime rates vs. per-capita annual income was of interest.  Half (six) of the jurisdictions had per-capita incomes of about $23,000 to $24,000.  The other half were spread out from $26,000 to $46,000.  The crime rates for the low-income group were spread out from 10 to 20 per 1000 population, while the crime rates for the high-income group were all from about 4 to 9 per 1000.

It was apparent from the plot that the low-income cities were the high-crime cities and that the high-income cities were the low-crime cities.  The calculated correlation coefficient was strongly negative, and would have been even stronger if the relationship had been closer to a straight line.  The four cities that had the low per-capita incomes and the highest crime rates were the cities with the highest population densities: New Westminster, Vancouver, Esquimalt and Victoria.

Correlation coefficients between crime and various demographic factors were:

  .790  part of population aborigine
  .735  part of families with only one parent in the home
  .680  population density (log scale)
 -.622  per-capita income
  .558  unemployment rate
  .432  part over 14 years with no highschool certificate

Correlations between some of the demographic factors are also interesting, and often contrast with the correlations for rural B.C. and overall B.C.  For these cities (including the largest in B.C.) aborigines and the French-speaking tend strongly to live where the population densities are highest.  The high population density cities also tend strongly to have high unemployment, low per-capita incomes, and high portions of one-parent households.  Population density had practically no correlation with lack of highschool certificates.  The correlation with one-parent families was even greater for aborigine and French-speaking populations than for population density.

As one would expect, unemployment had high correlation with one-parent families and high negative correlation with per-capita income, and lack of highschool certificates correlated very highly, and negatively, with per-capita income.

The multiple linear regression for the jurisdictions with their own police forces (excluding aboriginal proportion) was not very conclusive, as one should expect from the fact that the group included only twelve cities.  Portion of families with only one parent in the home yielded a moderately good result (p=.006) but with a relatively large standard deviation for the crime axis intercept value.  Population density yielded a moderately good result (p=.015) with standard deviation for the intercept value about half as great.


The 59 jurisdictions policed by the RCMP under contract are all small to mid-sized cities, and are identified as "municipal" in the AG reports.

Significant correlations with crime rates include:

  .71  part of families with only 1 parent
  .58  part of population that is aborigine
  .48  part over 14 with no highschool certificate
  .43  unemployment rate

Aboriginal population proportion correlates moderately high with unemployment, and a bit less with portion of families having only one parent in the home, and with portion of those over 14 years old having no highschool certificate.  Unemployment correlates highly with not having the highschool certificate and highly/negatively with per-capita income.  Per-capita income also correlates highly/negatively with portion of those over 14 having no highschool certificate.  The correlation between 1-parent families and lack of highschool certificate was only moderately high.

The multiple linear regression for these 59 "municipal" police jurisdictions was an unqualified success.  The final regression results were:

PREDICTOR     VALUE      s      p    EXPLAINS
CONSTANT    -36.79     9.21   <.001
INCOME ($k)   .736      .244   .004     3.0%
1PARENT (%)  1.476      .213  <.001    49.5%
NOHSCERT(%)   .422      .128   .002     7.8%


For the small town, reserve and rural areas policed by the RCMP as "provincial" jurisdictions, the investigator had combined the data into 73 groupings.

The significant correlations with crime for these rural areas were:

  .70  portion of the population that is aboriginal
  .50  unemployment rate
 -.39  per-capita annual income
 -.385  population density (people per square kilometer)
  .36  portion of those over 14 that have no highschool certificate

There was moderately high correlation between population density and per-capita income and minority population, and moderately high negative correlation between population density and aboriginal population, unemployment, 1-parent households, and adults without highschool certificates.  Aboriginal population was highly correlated with unemployment and less highly correlated with 1-parent homes and lack of highschool certificates.  Aboriginal population had a high negative correlation with per-capita income, possibly because the aboriginal populations tended to be in the countryside, where per-capita income was lower than in the towns.

Besides tending to live in the areas (towns) with the higher population densities, minorities tended strongly to be in areas where the proportions of people without highschool certificates were low, suggesting that the minorities tend to have highschool certificates.  The places where minorities tended to live were also those with higher per-capita incomes: the towns rather than the countryside.

As one should expect, unemployment correlated highly with 1-parent homes and lack of highschool certificates—and more highly (and negatively) with per-capita income.

The multiple regression for the provincial-policed jurisdictions was only moderately conclusive.  A good regression was obtained with unemployment rate and portion of families with only one parent in the home as predictors.

PREDICTOR     VALUE     s      p    EXPLAINS
CONSTANT     -4.08    3.89   .298
UNEMPL (%)     .744    .208  .001     25.4%
1PARENT (%)    .921    .333  .007      7.4%

The regression was moderately improved, especially with regard to the uncertainty of the constant (intercept), by using only the unemployment rate in the regression, with the following results.

PREDICTOR     VALUE     s      p    EXPLAINS
CONSTANT    + 4.17    2.62   .116
UNEMPL (%)     .979    .199 <.001     24.2%


The significant correlations of various demographic factors with total violent crime rates for all of B.C. (n=144) were:

  .612  aboriginal population (portion)
  .505  portion of families with only one parent in the home
  .485  unemployment rate
  .408  portion of those over 14 yrs old with no highschool cert
 -.367  per-capita annual income

Places with higher population densities (e.g., towns and cities) tended strongly to have high minority populations, low aboriginal populations, high employment, high per-capita incomes, and low portions of those over 14 years old that do not have highschool certificates.  Places with proportionally higher aboriginal populations tended strongly to have very high violent crime rates, low population densities, very high unemployment, high portions without highschool certificates and low per-capita incomes.

As would be expected, again, high per-capita incomes were associated with low unemployment (meaning there was competition for jobs) and with low portions without highschool certificates (meaning the less educated tended to be the ones without jobs).  The correlation of income with 1-parent homes was low.

The multiple linear regression, with portion aboriginal excluded, was quite conclusive:

PREDICTOR     VALUE     s      p    EXPLAINS
CONSTANT     -5.97     2.40  .014
UNEMPL (%)     .774    .142 <.001     23.5%
1PARENT(%)     .997    .170 <.001     15.0%
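The report does not define the EXPLAINS column. One reading that fits the all-B.C. tables is that it gives each predictor's sequential share of the variation in crime rates (its contribution to R squared), with unemployment entered first: 23.5 percent is very close to .485 squared and 30.5 percent to .553 squared, the squares of the unemployment correlations listed for violent crime and rape. This is an inference, not something the report states. Under that reading, the share added by a second predictor depends on how strongly the two predictors are correlated with each other. The sketch below uses the standard two-predictor formula; the inputs are illustrative only, since the report does not list the correlation between unemployment and 1-parent homes for all of B.C.

```python
def sequential_r2(r_y1, r_y2, r_12):
    """Shares of variance explained in a two-predictor linear regression.

    r_y1, r_y2: correlations of the crime rate with predictors 1 and 2
    r_12:       correlation between the two predictors
    Returns (share from predictor 1 alone, extra share added by predictor 2,
             total R squared)."""
    total = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    first = r_y1 ** 2
    return first, total - first, total


# Illustrative inputs only (hypothetical correlations).
first, added, total = sequential_r2(r_y1=0.5, r_y2=0.5, r_12=0.5)
print(f"{first:.1%} + {added:.1%} = {total:.1%}")  # 25.0% + 8.3% = 33.3%
```

With the same two outcome correlations but uncorrelated predictors (r_12 = 0), the second predictor would add its full 25 percent; the overlap between predictors is what shrinks its sequential share.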

This result is quite similar to the result for only the provincial jurisdictions.  The last parameter dropped prior to this regression was the portion of those over 14 years old that did not have highschool certificates.
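The phrase "the last parameter dropped" suggests a backward-elimination procedure: fit all candidate predictors, drop the least significant one, and refit. The sketch below assumes that procedure and uses plain least squares with an intercept. As an assumption, it treats a predictor as significant when the absolute value of its t statistic (value divided by s) is at least 2, roughly p < .05 for samples of this size; the report does not state its exact criterion or software, and the data in the example are synthetic.

```python
import math


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[k:] for row in a]


def ols(X, y):
    """Least squares with an intercept. X is a list of rows of predictor values.
    Returns (values, s), intercept first, where s is each value's standard error."""
    rows = [[1.0] + list(r) for r in X]
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    return b, [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]


def backward_eliminate(columns, y, t_min=2.0):
    """Drop the predictor with the smallest |t| (value / s) and refit, until every
    remaining predictor has |t| >= t_min. Returns (names kept, values, s)."""
    names = list(columns)
    while names:
        X = [[columns[name][i] for name in names] for i in range(len(y))]
        b, s = ols(X, y)
        t = [abs(bj / sj) for bj, sj in zip(b[1:], s[1:])]
        weakest = min(range(len(names)), key=t.__getitem__)
        if t[weakest] >= t_min:
            return names, b, s
        names.pop(weakest)
    return [], None, None  # no predictor survived


# Synthetic data: y depends on x1 only; x2 is an unrelated repeating pattern.
x1 = list(range(20))
x2 = [(7 * i) % 5 for i in range(20)]
y = [1.0 + 2.0 * a + 0.3 * (-1) ** i for i, a in enumerate(x1)]
kept, values, s = backward_eliminate({"x1": x1, "x2": x2}, y)
print(kept, [round(v, 2) for v in values])  # ['x1'] [1.04, 2.0]
```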


The most significant correlations of factors with rape rates were:

  .724  part of population aborigine (> than for all violent crime)
  .553  unemployment rate (> than for all violent crime)
  .466  part over 14 with no highschool certificate (about same)
 -.407  population density (log scale) (nonsign. for viol. crime)
 -.385  per-capita income (about same as for all violent crime)
  .326  part of families with only one parent in the home (less)

The correlation between the rape rates and the overall violent crime rates was high (.814), indicating that much of what affects one probably also affects the other in about the same way.

The multiple regression with the highest statistical significance levels, but excluding aboriginal proportion, yielded a result that higher population density (logarithmic) predicts a large reduction in rape rate.  The population density explained 16.6 percent of the rape rates.  As there are few viable explanations (other than isolated men becoming frustrated) for why reducing population density would increase rape rates, it is most likely that the population density is a good predictor mostly because it correlates highly with other factors that are known to affect criminality.  For example, high population density in B.C. correlates with reduced unemployment.  Dropping population density from the regression resulted in unemployment taking an even greater role in the regression equation than it did when population density was included, as follows:

PREDICTOR     VALUE     s      p    EXPLAINS
CONSTANT     -1.196    .444  .008
UNEMPL (%)     .184    .026 <.001     30.5% (17.3 when pop'n dens. incl)
1PARENT(%)     .084    .031  .009      3.3%
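The two final all-B.C. equations can be applied directly. This sketch plugs the printed values into a prediction for a hypothetical area with 10 percent unemployment and 15 percent of families having one parent; such predictions are only meaningful within the range of the B.C. data.

```python
# Final all-B.C. regressions as printed in the report's tables
# (rates per 1000 population; predictors in percent).
VIOLENT = {"constant": -5.97, "unempl": 0.774, "one_parent": 0.997}
RAPE = {"constant": -1.196, "unempl": 0.184, "one_parent": 0.084}


def predict(eq, unempl_pct, one_parent_pct):
    """Predicted rate per 1000 population from one of the equations above."""
    return eq["constant"] + eq["unempl"] * unempl_pct + eq["one_parent"] * one_parent_pct


# Hypothetical area: 10% unemployment, 15% of families with one parent.
print(round(predict(VIOLENT, 10, 15), 1))  # 16.7
print(round(predict(RAPE, 10, 15), 1))     # 1.9
# One more percentage point of unemployment adds the unemployment coefficient.
print(round(predict(VIOLENT, 11, 15) - predict(VIOLENT, 10, 15), 3))  # 0.774
```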



The few demographic factors found to be good predictors of violent crime rates are no surprise.  The available demographic measures that were found to be most significant have been found so in any number of studies of crime in the U.S.  As found by other studies, crime in areas with high populations of native Americans tends strongly to be higher than other areas, probably because the people in those high crime areas, and particularly the native Americans, are plagued by multiple handicaps.

Besides the fact that a few factors (chiefly unemployment and 1-parent homes) were found to be good predictors and probably causal factors of rape, assault and overall violent crime, it is important that some other factors included in the analyses were not found to be good predictors—indicating that they are unlikely to be strong causes of violent crime in B.C., Canada.  These include population density, portion French speaking, portion speaking other languages, total visible minority population, and (except for "municipal" jurisdictions) per-capita income and portion of those over 14 years old who do not have highschool certificates.

It is also significant that correlations between various factors were considerably different between the different kinds of policing jurisdictions, which is essentially between metropolitan areas, cities and rural areas.

This analysis was limited to a great extent by availability of data.  Were data available on other things likely to contribute to or mitigate rape or overall violent crime, the study could be more comprehensive.

The analysis could also be improved with the use of more complete data applicable to the demographics addressed in this study.  The income data was somewhat problematic for two reasons.  First, data was not available for the individual census units to enable correction for taxation and government subsidies to people.  The data was found for regional districts only and indicates that the income corrections could be significant.  Also, the income data was missing in the Stats Canada reports for small aboriginal reserves, to protect privacy of people.  This resulted in calculated per-capita incomes being artificially reduced by small amounts for areas including such reserves.

The analysis might also be improved if the crime data separately reported the crime that occurred on reserves and in individual towns and settlements within policing jurisdictions that also include other areas.

Data for 1999 are now posted on the AG web site, but the applicability of the 1996 demographics data to 1999 is less than for 1997 and 1998.  When 2001 census data is available, it could be averaged with the 1996 data to obtain demographics data that should be applicable to 1999 crime data.
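The averaging proposed here can be written as a linear interpolation between the two censuses. As a sketch, treating census years as whole numbers (a simplification, since each census is taken partway through its year), a simple average of the 1996 and 2001 values corresponds to mid-1998, while an estimate for 1999 would put a weight of 0.6 on the 2001 value. The function name and example values are hypothetical.

```python
def interpolate_census(value_1996, value_2001, year):
    """Straight-line estimate of a census figure for a year between censuses,
    treating census years as whole numbers (a simplification)."""
    w = (year - 1996) / (2001 - 1996)  # weight on the 2001 value
    return (1 - w) * value_1996 + w * value_2001


# Hypothetical unemployment rates (%) for one area.
print(round(interpolate_census(10.0, 8.0, 1999), 2))    # 8.8
print(round(interpolate_census(10.0, 8.0, 1998.5), 2))  # 9.0, the simple average
```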


Correlation coefficient: A numerical measure of how closely the relationship between two variables follows a straight line.  The values range from negative one to positive one.  A value of zero means there is no straight-line relationship between the two variables (they could still be related in some other way).  A value of positive one means that one variable is an exact straight-line function of the other, so that an increase of one variable always corresponds to a proportional increase of the other.  A value of negative one means the same except that an increase of one corresponds to a decrease of the other.
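For reference, the coefficients in this report are presumably the standard Pearson (product-moment) correlation, since that is what the definition above describes. For n paired values x_i and y_i with means x̄ and ȳ:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Its square, r², is the share of the variation in one variable accounted for by a straight-line fit on the other. For example, the .612 correlation between aboriginal proportion and violent crime for all of B.C. corresponds to about 37 percent (.612² ≈ .375).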

p: A "p" value is the probability that a relationship at least as strong as the one observed between a variable (like crime rate) and a possibly predictive variable (like proportion of families with only one parent in the home) would appear just as a result of chance if there were in fact no true relationship between the two variables.  If p=.001, for example, a relationship that strong would be expected to arise by chance alone only about once in a thousand times.  Strictly, p is not the probability that there is no true relationship, but a low value makes chance an unlikely explanation.  Low values of "p" are therefore desirable, and a value of .05 is typically considered the maximum acceptable as an indication of statistical significance (validity).  A finding with p at or below .05 is said to be significant at the 95 percent level of confidence.
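For regression values like those in the tables, the p value comes from the ratio of the value to its s (the t statistic). The sketch below uses Python's standard-library `statistics.NormalDist` and approximates the t distribution with a normal distribution, which is reasonable for about 140 degrees of freedom but slightly understates p; the exact figure also depends on the unrounded values behind the tables. For the 1PARENT value in the all-B.C. rape regression (.084, s=.031) it gives t of about 2.71 and an approximate p of about .007, against .009 in the table.

```python
from statistics import NormalDist


def t_and_p(value, s):
    """t statistic and approximate two-sided p value for a regression value.
    Uses the normal distribution in place of the t distribution, an
    approximation that is close for large samples such as n = 144."""
    t = value / s
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


t, p = t_and_p(0.084, 0.031)  # 1PARENT in the all-B.C. rape regression
print(round(t, 2), round(p, 4))
```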

s: This is the standard error of a value, that is, the standard deviation of the estimated value (it is called a standard deviation elsewhere in this report).  It is a measure of how much the value could be expected to vary from one sample of data to another.  One can think of it as like a "plus or minus" value, except that values can still lie outside the basic value plus or minus one standard error.  For an approximately normal distribution, about 32 percent of values lie outside the basic value plus or minus one standard error, about 5 percent outside plus or minus two, and about 0.3 percent outside plus or minus three.  It is best that an "s" value be small in relation to the basic value (no more than about 20 percent of the basic value).
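Using the plus-or-minus interpretation above, an approximate 95 percent interval is the value plus or minus about 1.96 s (a normal approximation, assumed here). For the all-B.C. unemployment value for violent crime (.774, s=.142), this gives roughly .50 to 1.05, and s is about 18 percent of the value, within the 20 percent guideline. The function names below are illustrative.

```python
def approx_interval(value, s, z=1.96):
    """Approximate 95% interval (value +/- z*s), assuming a normal sampling distribution."""
    return value - z * s, value + z * s


def relative_s(value, s):
    """s as a fraction of the value (the report suggests keeping this under about 0.20)."""
    return abs(s / value)


low, high = approx_interval(0.774, 0.142)  # UNEMPL, all-B.C. violent crime
print(round(low, 2), round(high, 2), round(relative_s(0.774, 0.142), 2))
```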

Copyright Notice: Permission is granted by the author to reproduce this report, or to reproduce any part of it with attribution.


Please contact the author with any questions or constructive comments you may have on this study and report.  Also please advise of any sources of further data that could result in an improved analysis.