Are There Algorithms that Can Discover Causal Structure?

May 1, 1998

Report Number

514

Authors

Leo Breiman

Citation

Electronic Journal of Probability, Vol. 5 (2000) Paper no. 2, pages 1-18

Abstract

For nearly a century, investigators in the social and life sciences have used regression models to deduce cause-and-effect relationships from patterns of association. Path models and automated search procedures are more recent developments. However, these formal procedures tend to neglect the difficulties in establishing causal relations, and the mathematical complexities tend to obscure rather than clarify the assumptions on which the analysis is based. This paper focuses on statistical procedures that seem to convert association into causation.

Formal statistical inference is, by its nature, conditional. If maintained hypotheses A, B, C, ... hold, then H can be tested against the data. However, if A, B, C, ... remain in doubt, so must inferences about H. Careful scrutiny of maintained hypotheses should therefore be a critical part of empirical work, a principle honored more often in the breach than the observance.

Spirtes, Glymour, and Scheines (SGS) have developed algorithms for causal discovery. We have been quite critical of their work. Korb and Wallace, as well as SGS, have tried to answer the criticisms. This paper will continue the discussion. The responses may lead to progress in clarifying assumptions behind the methods, but there is little progress in demonstrating that the assumptions hold true for any real applications. The mathematical theory may be of some interest, but claims to have developed a rigorous engine for inferring causation from association are premature at best. The theorems have no implications for samples of any realistic size. Furthermore, examples used to illustrate the algorithms are diagnostic of failure rather than success. There remains a wide gap between association and causation.

PDF File

514.pdf

Postscript File

514.ps.Z