rev 5-09-09

Some people assume that a statistic stated in print or in the media must, by definition, be true.  This is wrong both because people can be mistaken in what they say or write, and because some people purposely lie with statistics.  Statistics are misused extensively in the gun control debate, both purposely and by mistake.  To recognize the misuses, one must have some understanding of statistics.  Beyond this misuse, much of the information one should have and understand is in statistical form.  For both these reasons, a person wanting to know the truth about guns and their relationship to crime and violence needs to be able to understand statistics more than is necessary for most other issues.

On this page we hope to explain both the misuses and the things you need to understand about statistical methods, studies and data beyond the basics (like what an average is).  We hope the explanations are understandable even if not strictly rigorous, but even simplified explanations of statistics require some effort to understand, so don't expect it all to be easy.

Some statistics are intended just to be a measure of something, like the prevalence of gun ownership.  Other statistics try to suggest or show a relationship between two or more things, such as that one thing is partially caused by another.





Most statistical methods and studies are unable to prove that one thing is a cause of another.  The methods are usually only able to demonstrate that one thing correlates with another, or with the absence of the other.  That is, a properly done statistical study might show that one thing exists along with the other, but cannot prove that one of the things causes the other.  So, a study showing that higher gun ownership rates tend to exist alongside lower violent crime rates would not prove that higher gun ownership rates cause lower violent crime rates.  Such a result would only suggest that higher gun ownership rates might cause reduced violent crime rates.

Given that a study showed a correlation between two things "a" and "b," it would be necessary to consider the possibility that:  "a" might be the cause of "b"; "b" might be the cause of "a"; each might be partially the cause of the other; and that both might be partially or wholly caused by some other, maybe unknown, factor(s).

People are tempted to eliminate some of the competing alternative explanations based on logic.  However, doing so demands a perfect understanding of the relationships between the factors being evaluated.  In other words, the "logic" must be correct.  Relying on logic to eliminate competing explanations typically results in erroneous interpretation of study results, especially if the process is being attempted by someone who knows little about the topic (like gun controllers who have never owned a gun deciding about gun matters impacting crime, violence, accidents, etc.).  A study that finds that there is a high gun ownership rate in a time and place where there is a high murder rate, for example, might have such findings in part because a high murder rate causes people to acquire guns to protect themselves, their families and their neighbors.  It is naive or dishonest to claim on the basis of such results that high gun ownership causes high murder rates.
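The reverse-causation trap described above can be made concrete with a small simulation (all numbers are invented for illustration): gun ownership is made to respond to the murder rate, and the guns cause nothing by construction, yet high-murder counties still show high gun ownership.

```python
import random

random.seed(1)

# Hypothetical simulation of reverse causation: each "county's" murder
# rate is set at random, and gun ownership RESPONDS to it (people buy
# guns for protection).  By construction, guns cause zero murders here.
murder = [random.uniform(1, 15) for _ in range(500)]
guns = [20 + 2.0 * m + random.gauss(0, 3) for m in murder]

high = [g for g, m in zip(guns, murder) if m > 8]
low = [g for g, m in zip(guns, murder) if m <= 8]
print(f"avg gun ownership where murder rate is high: {sum(high)/len(high):.1f}")
print(f"avg gun ownership where murder rate is low:  {sum(low)/len(low):.1f}")
# A naive reading would blame the guns; the simulation's causation runs
# the other way entirely.
```

A study of this simulated data would find exactly the correlation the gun controllers point to, even though the causal arrow points the opposite way.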

There are some facts that can support the idea that one thing is the cause of another, as opposed to the reverse or both of them being caused by something else altogether.  One thing that is essential is a logical explanation (of the cause-and-effect relationship) that is consistent with known facts and other observations, and that has withstood the scrutiny of numerous people knowledgeable in the field.  That is, one must ask how one thing could possibly cause the other.

Another fact that suggests that one factor is the cause of the other is that the second factor never changes until the first is introduced or forcibly changed.  If violent crime rates in a state start dropping faster after a right-to-carry law goes into effect in that state, after accounting for any other changes that went into effect at the same time, this would be a very strong indication that the right-to-carry law causes crime reduction.  On the other hand, a Washington, DC crime rate reduction trend that starts two years before a firearm ban in Washington, DC pretty much proves that the crime rate reduction was not caused by the ban.

Another fact that can suggest that one thing is a cause of others is that the thing was included in the analysis because people suggested, for some theoretical reason, that it might be the cause of factors found to be correlated in earlier analyses, and it was then found to have a higher correlation with those factors than they have with each other.  Higher correlations are found between causes and effects than between multiple things that share the same cause.
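The claim in that last sentence can be checked with a quick simulation (the variables Z, X and Y are hypothetical): when Z causes both X and Y, the cause-effect correlations come out higher than the correlation between the two effects.

```python
import random

random.seed(4)

# Sketch: Z causes both X and Y; X and Y share Z as a common cause but
# do not affect each other at all.
n = 2000
Z = [random.gauss(0, 1) for _ in range(n)]
X = [z + random.gauss(0, 1) for z in Z]
Y = [z + random.gauss(0, 1) for z in Z]

def corr(a, b):
    # ordinary Pearson correlation coefficient
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

print(f"corr(Z, X) = {corr(Z, X):.2f}")   # cause vs. effect: ~0.71
print(f"corr(Z, Y) = {corr(Z, Y):.2f}")   # cause vs. effect: ~0.71
print(f"corr(X, Y) = {corr(X, Y):.2f}")   # two co-effects: lower, ~0.50
```

The cause-to-effect correlations beat the effect-to-effect correlation, just as the text says.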



For a statistic to be reliable, it must be based upon a large number of observations ("data points").  If the statistic is about cause and effect, the observations must have variation in both the suspected causal factors and the suspected effect.  Any other possible causes must be accounted for in the analysis.

This can be accomplished by taking observations in a given place over a long period of time, by taking observations in many representative places at a given time, or by taking observations in many representative places over a long period of time.  If the times or places of observation are selected other than randomly from the entire range of possibilities, the statistics that result can only be applied to the times and places represented by the observations.  For example, an analysis of data from only a few places may not apply to other places.  And an analysis of data from a given period of time may not apply years later, when other relevant conditions have changed.  Attempts to adjust results from one time and place to apply elsewhere by correcting for race mix, population density, income level, etc. are very likely to fail because not all relevant factors get corrected for (since no one knows everything that is relevant).

A second prerequisite for a statistic to be reliable is that it must agree with the actual observations rather than being someone's interpretation of the observations.  This is very often a problem in relation to statistics derived from surveys (asking people questions), and in relation to statistics derived from the "case control" method used extensively by medical researchers (addressed later herein).  This prerequisite is the reason that survey questions and method must be very carefully crafted in order to obtain valid results, and is the reason that dishonest researchers can purposely design a survey to show what they want rather than the truth.

For example, a survey of youths about carrying guns can be reliable about what youths say about carrying guns, but not about whether or not the youths actually do carry guns.  Statistics that say a certain number or portion of certain groups of youths carry guns or carry guns to school are simply lies or mistakes.

As another example, a survey in which people are asked if they favor banning assault weapons cannot be reliable about whether or not the public favors banning assault weapons.  It can only be reliable about the public favoring the banning of whatever they think assault weapons are.  Were the survey to ask people if they favored banning something that was already banned, they would all answer "no."  Were it to ask if they favored banning guns that could shoot just one bullet each time the trigger was pulled, as with a turn-of-the-century revolver, again they would answer "no."  Were it to ask people if they favored banning a gun that could shoot bullets continuously for as long as the trigger was held, like a real assault weapon, they would answer "yes" the same as they would answer about "assault weapons."  But the survey would only show that the public is grossly ignorant about firearms, since it would show the public asking for a ban that already exists.  Survey results must be evaluated on the basis of whether or not the survey questions are likely to be properly understood by the people asked the questions.

The same kind of rationale applies to surveys (and votes) about other nonexistent gun controller fabrications like "Saturday night specials."

It is possible that many such surveys are faulty only because of the ignorance of those authoring the surveys.  But, it is also likely that at least some of the surveys are designed by dishonest people who want to "prove" something that really isn't true.



Lies are often told with statistics by searching through all the available data and selecting out little pieces of it to report.  For example, years of data showing that rates of youth accidental deaths involving guns dropped almost every year for almost every locale, age group and race group can be searched until a pair of years is found for which some age group and/or race group in some small village showed an increase.  This typically involves only very small numbers of cases.  For example, some village never has an accidental gun death but finally did in one year.  Small populations yield small numbers of cases that are not statistically significant.  That is, the cases could just be randomly dispersed cases of the overall population rather than being indicative of a true relationship.
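A sketch of why small populations produce meaningless "increases" (the average of 0.5 deaths per year is invented): even with no underlying change at all, chance alone makes a small village show a year-over-year increase a substantial fraction of the time.

```python
import math
import random

random.seed(2)

# Hypothetical village averaging 0.5 accidental gun deaths per year.
# We simulate many pairs of years with NO underlying change and count
# how often year two "shows an increase" over year one by chance alone.
def poisson(lam):
    # Knuth's algorithm for drawing a Poisson-distributed count
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

trials = 10_000
increases = sum(poisson(0.5) < poisson(0.5) for _ in range(trials))
print(f"share of year-pairs showing an 'increase': {increases / trials:.0%}")
```

Roughly a quarter of the simulated village-year pairs show an "increase" that means nothing at all, which is exactly the kind of cherry that data-searching picks.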

Selection of cases is one of the faults of the typical claims involving comparisons of different countries relating to firearms and different kinds of violence.  The country comparisons also suffer from the fact that the comparisons never have true measures of gun possession or account for all the applicable differences (besides gun laws) between the countries compared.

When statistics are stated about small subsets of an overall population, it is a safe bet that someone has searched through the available data to find something that appears to support what they want to say (rather than showing the whole picture that doesn't).  If someone does this you can be sure they are telling you a lie.

Lies are also told by misrepresenting whatever the data of a study or survey actually shows.  For example, a study finds that, in homes of one county where the instrument of a death over a period of a few years (ending 13 years ago) was a gun, the ratio of the sum of suicide deaths plus accidental deaths plus murders to the number of self-defense killings was 42.7.  This gets misrepresented as, "a gun kept (or kept for self defense) in the home is (now and everywhere) 43 times more likely to be used to kill someone you know (or a loved one) than to be used (or used to kill) in self defense (or against an intruder)."

Another way in which lies are told using statistics is by making vague numerical (statistical) statements that are actually meaningless but make the unwary reader think they actually say something significant, or simply induce an emotional response in the reader because of the emotional topic of the statement.  For example, gun controllers make claims regarding children but do not say that they include, as "children," criminals that are 19 years old.  They claim that children have "access" to guns in the home but do not say if the access is such that the children could actually fire a gun, or if the access is unsupervised.  They make claims about undefined "assault weapons" and "Saturday night specials" or "junk" guns.

They say things like "death by firearms is the fastest growing method of suicide" because the gun suicide rate increased in one year by a larger proportion (percentage) than other types of suicide even though the rates for both years might have been insignificant in comparison to the rates for other methods.  For example, the Pacific Center for Violence Prevention, in referring to an "obviously unbiased" "study" by the fanatical gun control Violence Policy Center, indicated on their web site in late 1996 that the study's findings include that

"the rate at which Florida's concealed weapons law is arming criminals is increasing; of those who committed a crime after having received their concealed weapons license, one in five committed their crime with a gun; the number of revocations over the past year due to gun-related crimes committed after licensure jumped 35 percent over the previous seven-year tally."

Besides the question of how a permit to carry can have anything to do with getting a firearm, the initial statement could simply be a reflection of there being one single criminal act having been committed by a person with a permit where there had been none in the preceding year, the number being insignificant in both years.

The second statement misrepresents the facts differently by referring to gun crimes that were actually just instances of people with permits forgetting that they were carrying a gun or that they happened to be entering a place in which they were not allowed to carry the gun.

The final statement does the same, but also misrepresents the truth by using actual numbers rather than per capita rates.  If the number of people with permits to carry concealed weapons rises dramatically as a result of law permitting it, the number of mistakes people with permits make will naturally go up by a large percentage.  But, this does not mean that the percentage of permit holders that make mistakes per year after the law takes effect is any greater than the percentage of permit holders that made mistakes per year before it takes effect.

When someone claims something like "rising," "fastest growing" or "most," the reader should ask if the claim is about actual numbers, per capita rates, or percent change and should ask if the raw numbers involved are or can be large enough to be the result of something other than chance.  If the claim is not specific about these things, it's a sure bet that those making the claim are purposely trying to mislead.  Doubling, or 100 percent increase, of nothing is still nothing.  In most cases, only per capita rates are legitimate bases for comparisons.
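A minimal worked example of the raw-numbers trick, using made-up figures: the rate of mistakes per permit holder stays identical, yet the raw count "jumps 900 percent" simply because the number of permit holders grew tenfold.

```python
# Hypothetical figures: after a right-to-carry law, the number of
# permit holders grows tenfold.  The RATE of mistakes per holder is
# held exactly constant -- only the raw count changes.
before_holders, after_holders = 10_000, 100_000
mistake_rate = 0.001                              # 0.1% of holders per year

before_mistakes = before_holders * mistake_rate   # 10 mistakes
after_mistakes = after_holders * mistake_rate     # 100 mistakes

raw_increase = (after_mistakes - before_mistakes) / before_mistakes
print(f"raw mistakes rose {raw_increase:.0%}")    # a scary-sounding 900%
print(f"rate before: {before_mistakes / before_holders:.2%}, "
      f"rate after: {after_mistakes / after_holders:.2%}")  # identical rates
```

The honest comparison is the per capita rate, which did not move; the dishonest one is the raw count, which "jumped 900 percent."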

Statistics can also be used to unduly impress people with true statements that are nonetheless irrelevant or meaningless.  Every claim should be examined for "what does it actually mean, and does it actually prove anything or mean anything to the discussion at hand."  For example, what is proved by, "Since 1979, more children (60,008) have died from gunfire in the United States than American soldiers died during the Vietnam and Gulf wars and in U.S. engagements in Haiti, Somalia, and Bosnia combined"?  What ages of "children"?  Why not include World War 2?  Why make the comparison at all?  What does it prove?

A common kind of lie from the medical community is saying that something increases your (or the) risk of this or that.  Largely this comes from the fact that most of the medical community, and the authors of their studies, don't understand the thing called a "risk factor" addressed in most of their studies--or they just use the confusing term to mislead.  You see, a risk factor is not something that causes risk to go up or down.  A risk factor is something for which the correlation with some condition is being checked or has been checked.  For example, the gun controllers examine possession of guns as a risk factor regarding the condition of dying or being injured.

But, when the docs find that a risk factor is highly significant, the kinds of studies they do are incapable of determining that the risk factor has any causal relationship to the condition of interest.  The values they associate with the risk factors are simply indications of the degree to which the factors correlate with the conditions of interest.

The statistics can even be used to predict levels of the conditions.  For example, one could predict levels of violent crime that might be observed in specific cities using a formula that would account for various things, including the portion of the population that has a gun.  But the ability to predict does not mean cause and effect.  Gun possession rate could be a good predictor of crime levels simply because more people get guns if people need them to protect themselves from violent crime.  After the medical community identifies some significant risk factor for some particular medical condition, they still have to do a lot of work to determine the extent to which the factor actually causes the condition.

If you read anything about "risk" out of the medical community, you will understand the material better if you replace their references to "risk factor" with "predictor."  And replace "risk" with "prediction ability."




If many different checks ("studies"), in various times and places, demonstrate uniformly that more of one thing occurs when more of another occurs--or if they uniformly demonstrate that more of one occurs when less of another occurs--then it becomes pretty much a certainty that the two things are related (though not necessarily that one causes the other).  Similar assurance can be obtained with fewer studies involving larger numbers of cases.  It might be just that the things are related, or it might be that one partially or totally causes the other.  If the studies together show that a thing occurs ONLY when another exists, or ONLY when it doesn't exist, it becomes definite that the one thing causes the other.  But a few small scale studies showing that two things tend to occur together are only sufficient to suggest possible relationships between factors.
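A sketch of why uniformity across many small studies matters (the "studies" here are simulated with an assumed weak true relationship): any one small study is shaky, but when nearly all of them point the same direction, chance is no longer a plausible explanation.

```python
import random

random.seed(3)

# Each simulated "study" draws 25 cases from a world with a modest true
# relationship (y depends weakly on x plus lots of noise) and reports
# only the DIRECTION of the association it found.
def one_study(n=25):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [0.4 * x + random.gauss(0, 1) for x in xs]
    mx, my = sum(xs) / n, sum(ys) / n
    # sign of the sample covariance = direction the study found
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

studies = [one_study() for _ in range(100)]
positive = sum(s > 0 for s in studies)
print(f"{positive} of 100 small studies found a positive association")
```

No single 25-case study here would be convincing on its own, but their near-unanimity is.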


The case-control method is used extensively by researchers in medical fields.  With this method, the researchers start with a number of instances (cases) of whatever the researchers are trying to find causes for.  For example, they might start with a number of people who had lung cancer.

The researchers then look among the cases for an abnormal frequency of some factor they suspect might be the cause of the end result they started with.  If they find that some factor, such as a history of cigarette smoking, exists among the cases considerably more than it exists among the population at large, represented by a "control" group selected at random from the overall population, the researchers can conclude that the factor might be the cause of the condition that was the original characteristic of the cases.  Unfortunately, medical researchers use the term "risk factor" to refer to the strength of the association between the end effect and the possible cause.  It is natural then that other people would mistakenly infer from the word "risk" that a causal relationship has been established by the research.  But even some of the medical researchers themselves imply or state that it has, whether out of ignorance or in a conscious attempt to mislead the public.

For the method to have any validity for suggesting causal relationships, the observed frequency of the suspect characteristic must be "corrected" for departures of the case population characteristics from the characteristics of the overall population to which one would like to apply the statistic.  This must be done for all characteristics that are correlated with the factor being examined and have causal relationships to the end effect shared by all the case subjects.  That is, all other causes must be known and accounted for.

This is another point at which medical researchers typically fail in applying the case-control method to research about crime and violence.  The problem is that those researchers don't have anywhere near an adequate idea of what factors they should correct for.  As a result, they tend to correct only for some demographic factors (e.g., race and age), as they would do in a study about disease.  The people who are most knowledgeable about guns, crime and violence (criminologists) decided long ago not to use the method because even they did not know enough to make the necessary corrections.

The main problem with the "case-control" method used by the medical community in their gun related studies is that it involves surveying members of "control" groups about their possession or their households' possession of guns or some categories of guns.  The inconvenient truth is that a very large portion of those who possess guns simply lie and say they don't possess any if asked by a stranger, especially any stranger associated with government.  This fact alone causes the doctors' studies to indicate erroneous things like, "a gun in the home increases the risk of suicide by a factor of 4.7."
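The denial effect described above is enough by itself to manufacture an elevated "risk."  A sketch with invented numbers: ownership is identical among cases and controls (a true odds ratio of exactly 1, meaning no association at all), but half the gun-owning controls deny ownership to the surveyor.

```python
# Invented numbers for illustration: suppose gun ownership is 40% among
# both the "cases" (e.g., households with a death, verified by records)
# and the "controls" (living households, asked in a survey).  The true
# odds ratio is therefore exactly 1 -- no association whatsoever.
true_rate = 0.40

def odds(p):
    return p / (1 - p)

# Now suppose half of the gun-owning controls deny ownership to the
# surveyor, while case ownership is verified and accurate.
deny_fraction = 0.5
reported_control_rate = true_rate * (1 - deny_fraction)   # 0.20

odds_ratio = odds(true_rate) / odds(reported_control_rate)
print(f"odds ratio produced by the denial effect alone: {odds_ratio:.2f}")
```

The denial effect alone turns "no association" into an odds ratio near 2.7, which a careless or dishonest author then reports as guns "increasing the risk" severalfold.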


Regression analysis is based on the idea that, if a number of different things each partially cause or prevent some result of interest, the relationships between all the things and the result can be represented by an algebraic equation the different parts of which can be determined given enough sets of data on the various things.  The equation would be like:

C0 + CxX + CyY + CzZ + etc. = RESULT

In this equation, the first term (C0) represents all of the RESULT that cannot be explained by summing terms consisting of variables like X, Y and Z, each multiplied by a constant.  The variables (X, Y, Z, etc.) are all the things that someone thinks might have some effect on the RESULT.  In a real-life study, such as those done by John Lott, Jr., the equation might be something like:

C0 + C1x(arrest rate) + C2x(percapita income) + C3x(population density) + C4x(unemployment rate) + C5x(police/citizen) + C6x(mo. right-to-carry in effect) + etc. = ROBBERY RATE

When the number of sets of data available for all the different terms in the equation except the constants ("C" values) is at least as great as the number of constants, it is possible to solve for those constants.  When this is done with statistical data, the solutions are estimates, and the resultant equation cannot be expected to be perfectly accurate for predicting results from additional data sets.  However, the solutions can be expected to yield results that approximate actual end results, and that tend to fall sometimes over actual values and sometimes under them rather than always being skewed one way.  If the number of data sets used is large, the method allows reliable determination of the likely range of each constant.
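A minimal sketch of solving for the constants by least squares, with fabricated data generated from a known equation (the variable names echo the example above; nothing here is Lott's actual data or method).  Because we fabricated the data, we can check that the method recovers the constants we put in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricate 200 "county-year" data sets from a KNOWN equation, then
# recover the constants by least squares.
n = 200
arrest_rate = rng.uniform(0.1, 0.9, n)
unemployment = rng.uniform(0.02, 0.12, n)
months_rtc = rng.uniform(0, 120, n)         # months right-to-carry in effect

true_C = [5.0, -2.0, 30.0, -0.01]           # C0, C1, C2, C3
robbery_rate = (true_C[0] + true_C[1] * arrest_rate
                + true_C[2] * unemployment + true_C[3] * months_rtc
                + rng.normal(0, 0.1, n))    # noise: real data never fit exactly

# One column of 1's for C0, then one column per variable
X = np.column_stack([np.ones(n), arrest_rate, unemployment, months_rtc])
C_hat, *_ = np.linalg.lstsq(X, robbery_rate, rcond=None)
print("estimated constants:", np.round(C_hat, 3))
```

The estimates land close to the true constants, and with more data sets the likely range around each one narrows, as the text says.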

Some of the factors included in actual analyses are likely to be logarithms, squares, square roots or other mathematical functions of the actual factors for which correlation is being tested.  This is done because the relationships between those factors and the end effect are not linear, so using the mathematical functions permits the analysis to be more accurate.

In the actual analyses by Lott, there were about 50 different factors that were treated as possibly causing or impeding a type of crime rate.  He and his associate included a term that had a specific value for each county in order to account for the extent to which something peculiar about a county accounted for crime rate for that county.  He did the same for states.  He also did the same for years in order to account for fluctuations in crime rates from one year to another nationwide (i.e., "crime cycles").  These are what he refers to as "dummy" variables in his book.
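What a "dummy" variable looks like in practice can be sketched in a few lines (three hypothetical counties, not Lott's actual data): one 0/1 column per county, so the regression can absorb anything peculiar to each county that never changes.

```python
import numpy as np

# Three hypothetical counties observed over two years: each observation
# gets a 1 in its own county's column and 0 elsewhere.
counties = ["A", "B", "C", "A", "B", "C"]
names = sorted(set(counties))
dummies = np.array([[1 if c == name else 0 for name in names]
                    for c in counties])
print(dummies)
```

The same construction, repeated for states and for years, produces the county, state and year "dummy" variables the book describes.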

The analyses by Lott and Mustard also simultaneously evaluated the factors for a large number of equations.  The equations included one for each of several different kinds of crime rates, plus equations for accidental gun death rate, accidental gun injury rate, suicide rate, etc.  In all this they had data on all those different results of interest separately for every county in the U.S., separately for each of 16 years (18 by the time Lott's book was published).  This is by far the most comprehensive group of analyses yet of the things that might increase or decrease different kinds of crime and violence.  Such analyses can only be performed using high capacity computers.  Once the data sets have been entered, the analysis can be repeated with various changes without a great deal of effort, and this is exactly what Lott and Mustard did in order to test the impact of adding or deleting various things.  In fact, if someone were to come up with nationwide county-level data on some factor they would like to have added, it would be rather easy to do so.