Bias (statistics)

From Wikipedia, the free encyclopedia
For other uses, see Bias (disambiguation).

In the field of statistics, bias is a systematic tendency in which the methods used to gather data and estimate a sample statistic present an inaccurate, skewed or distorted (biased) depiction of reality. Statistical bias exists in numerous stages of the data collection and analysis process, including: the source of the data, the methods used to collect the data, the estimator chosen, and the methods used to analyze the data.

Data analysts can take various measures at each stage of the process to reduce the impact of statistical bias in their work. Understanding the source of statistical bias can help to assess whether the observed results are close to actuality. Issues of statistical bias have been argued to be closely linked to issues of statistical validity.[1]

Statistical bias can have significant real-world implications as data is used to inform decision making across a wide variety of processes in society. Data is used to inform lawmaking, industry regulation, corporate marketing and distribution tactics, and institutional policies in organizations and workplaces. Therefore, there can be significant implications if statistical bias is not accounted for and controlled. For example, if a pharmaceutical company wishes to explore the effect of a medication on the common cold but the data sample only includes men, any conclusions made from that data will be biased towards how the medication affects men rather than people in general. That means the information would be incomplete and not useful for deciding if the medication is ready for release in the general public. In this scenario, the bias can be addressed by broadening the sample. This sampling error is only one of the ways in which data can be biased.

Bias can be differentiated from other statistical mistakes such as accuracy (instrument failure/inadequacy), lack of data, or mistakes in transcription (typos). Bias implies that the data selection may have been skewed by the collection criteria. Other forms of human-based bias emerge in data collection as well, such as response bias, in which participants give inaccurate responses to a question. Bias does not preclude the existence of any other mistakes. One may have a poorly designed sample, an inaccurate measurement device, and typos in recording data simultaneously. Ideally, all factors are controlled and accounted for.

It is also useful to recognize that the term “error” refers specifically to outcomes rather than processes (errors of rejection or acceptance of the hypothesis being tested), or to the phenomenon of random errors.[2] The terms flaw or mistake are recommended to differentiate procedural errors from these specifically defined outcome-based terms.

Bias of an estimator

Main article: Bias of an estimator

Statistical bias is a feature of a statistical technique or of its results whereby the expected value of the results differs from the true underlying quantitative parameter being estimated. The bias of an estimator of a parameter should not be confused with its degree of precision, as the degree of precision is a measure of the sampling error. The bias is defined as follows: let T be a statistic used to estimate a parameter θ, and let E(T) denote the expected value of T. Then,

bias(T, θ) = bias(T) = E(T) − θ

is called the bias of the statistic T (with respect to θ). If bias(T, θ) = 0, then T is said to be an unbiased estimator of θ; otherwise, it is said to be a biased estimator of θ.

The bias of a statistic T is always relative to the parameter θ it is used to estimate, but the parameter θ is often omitted when it is clear from the context what is being estimated.
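
As an illustration of this definition, the bias of an estimator can be approximated by simulation: generate many samples, apply the estimator to each, and compare the average estimate with the true parameter. The following minimal Python sketch is not drawn from the article; the normal population, sample size and simulation count are arbitrary example choices. It contrasts the sample variance computed with denominator n, which is biased, with the usual denominator n − 1, which is unbiased.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n, trials = 0.0, 2.0, 5, 200_000   # arbitrary example values
    true_var = sigma ** 2                         # the parameter theta being estimated

    samples = rng.normal(mu, sigma, size=(trials, n))
    var_n = samples.var(axis=1, ddof=0)           # divides by n       (biased)
    var_n_minus_1 = samples.var(axis=1, ddof=1)   # divides by n - 1   (unbiased)

    # bias(T, theta) = E(T) - theta, estimated by averaging over many simulated samples
    print("bias with denominator n:    ", var_n.mean() - true_var)          # approx -sigma^2 / n = -0.8
    print("bias with denominator n - 1:", var_n_minus_1.mean() - true_var)  # approx 0

With these example values the first estimator underestimates the variance by roughly σ²/n, while the second has approximately zero bias, matching the definition of bias(T, θ) above.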

Types


Statistical bias can arise at every stage of data analysis. The sources of bias below are grouped by the stage at which they occur.

Data selection


Selection bias involves some individuals being more likely to be selected for study than others, biasing the sample. This is also termed the selection effect, sampling bias or Berksonian bias.[3] A simple simulation of this effect is sketched after the list below.

  • Spectrum bias arises from evaluating diagnostic tests on biased patient samples, leading to an overestimate of the sensitivity and specificity of the test. For example, a high prevalence of disease in a study population increases positive predictive values, which biases the predicted values relative to the true ones.[4]
  • Observer selection bias occurs when the evidence presented has been pre-filtered by observers, a phenomenon related to the anthropic principle. The data collected are filtered not only by the design of the experiment, but also by the necessary precondition that there must be someone doing the study.[5] An example is impact events on the Earth in the past: an impact event may have caused the extinction of intelligent animals, or there may have been no intelligent animals at the time, so some impact events that did occur were never observed.[6]
  • Volunteer bias occurs when volunteers have intrinsically different characteristics from the target population of the study.[7] Research has shown that volunteers tend to come from families with higher socioeconomic status.[8] Furthermore, another study shows that women are more likely to volunteer for studies than men.[9]
  • Funding bias may lead to the selection of outcomes, test samples, or test procedures that favor a study's financial sponsor.[10]
  • Attrition bias arises due to a loss of participants, e.g., loss to follow-up during a study.[11]
  • Recall bias arises due to differences in the accuracy or completeness of participant recollections of past events; for example, patients may not recall exactly how many cigarettes they smoked last week, leading to over-estimation or under-estimation.
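
The effect of a value-dependent selection mechanism on a simple estimate can be made concrete by simulation. The sketch below is a minimal illustration, not taken from the article: it uses an invented skewed population and a selection rule whose inclusion probability grows with the value itself, as might happen with volunteer or convenience samples.

    import numpy as np

    rng = np.random.default_rng(1)
    population = rng.lognormal(mean=3.0, sigma=0.5, size=100_000)  # invented skewed population

    # Unbiased design: every unit has the same inclusion probability.
    random_sample = rng.choice(population, size=2_000, replace=False)

    # Biased design: inclusion probability proportional to the value itself.
    weights = population / population.sum()
    selected_sample = rng.choice(population, size=2_000, replace=False, p=weights)

    print("population mean:      ", population.mean())
    print("random-sample mean:   ", random_sample.mean())    # close to the population mean
    print("selected-sample mean: ", selected_sample.mean())  # systematically too high

The equal-probability sample estimates the population mean well, while the selected sample overstates it, which is the kind of distortion the entries above describe.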

Hypothesis testing

See also: Uniformly most powerful test

In the Neyman–Pearson framework, the goodness of a hypothesis test is determined by its type I and type II errors.[12] A Type I error, or false positive, occurs when the null hypothesis is correct but is rejected; the false positive rate is written as α. A Type II error, or false negative, occurs when the null hypothesis is incorrect but is accepted; the false negative rate is written as β.

For instance, suppose that speeding is defined as driving at an average speed above 85 km/h, and let the null hypothesis be "not speeding". If someone driving at an average speed of 70 km/h receives a ticket, the decision maker has committed a Type I error. Conversely, if someone driving at an average speed of 90 km/h does not receive a ticket, the decision maker has committed a Type II error.

Generally, a statistical test may decrease α, but possibly at the price of increasing β, and vice versa. For example, a test may be made very sensitive to true positives, but at the price of producing many false positives, and vice versa. Furthermore, whereas α depends only on the statistical test itself and the null hypothesis H0, β depends on the statistical test and an unknown alternative hypothesis H1. The Neyman–Pearson framework bypasses the difficulty of an unknown H1 by imposing a kind of uniformity: tests are good if and only if they work well against every H1.
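
The trade-off between α and β can be seen by moving the rejection threshold of a simple one-sided z-test. The following sketch is only an illustration with arbitrarily chosen means, sample size and thresholds, estimating both error rates by simulation.

    import numpy as np

    rng = np.random.default_rng(2)
    n, trials = 25, 100_000
    mu0, mu1, sigma = 0.0, 0.5, 1.0        # null mean, one arbitrary alternative, known sd

    def rejection_rate(true_mu, z_crit):
        """Fraction of simulated samples whose z-statistic exceeds the critical value."""
        xbar = rng.normal(true_mu, sigma / np.sqrt(n), size=trials)
        z = (xbar - mu0) / (sigma / np.sqrt(n))
        return np.mean(z > z_crit)

    for z_crit in (1.645, 2.326):                  # one-sided 5% and 1% normal critical values
        alpha = rejection_rate(mu0, z_crit)        # type I error rate
        beta = 1 - rejection_rate(mu1, z_crit)     # type II error rate at this alternative
        print(f"z_crit = {z_crit:.3f}   alpha ≈ {alpha:.3f}   beta ≈ {beta:.3f}")

Raising the critical value lowers α from about 0.05 to about 0.01, but β at this particular alternative rises from about 0.20 to about 0.43, illustrating the trade-off described above.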

Formally, for a test T, define the following:

  • α(H0, T): the false positive rate of T, i.e. the probability that T rejects the null hypothesis H0 when H0 is true;
  • β(H1, T): the false negative rate of T, i.e. the probability that T accepts H0 when the alternative hypothesis H1 is true.

Using the previous definitions, we have

Pr(reject H0 | H0, T) = α(H0, T),   Pr(reject H0 | H1, T) = 1 − β(H1, T),

where H1 is an unspecified alternative hypothesis.

We say that the test is unbiased (in the Neyman–Pearson sense) if and only if, for any alternative hypothesis H1,

Pr(reject H0 | H1, T) ≥ α(H0, T).

Note that the bound α(H0, T) on the right side cannot be raised any further, because this formalism allows H1 to be exactly the same as H0, in which case Pr(reject H0 | H1, T) = α(H0, T). In short, an unbiased test is a test that minimizes the maximal false negative rate over all alternative hypotheses.

Generally, one considers not a single test but an entire family of tests, indexed by a test statistic. Let T be a test statistic. For each significance level p ∈ [0, 1], the corresponding test is to check whether T > p. If so, the null hypothesis is rejected at significance level p; otherwise, it is accepted.

We say that the test statistic (or the family of tests) is unbiased (in the Neyman–Pearson sense) if and only if, for every significance level p, the corresponding test is unbiased.[12]: 6 [13]
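
Unbiasedness in this sense can be checked numerically for a concrete test. The sketch below is a minimal illustration, not from the article: a two-sided z-test for a normal mean, with arbitrarily chosen alternatives, whose simulated rejection probability under each alternative is at least the significance level.

    import numpy as np

    rng = np.random.default_rng(3)
    n, trials, sigma = 25, 100_000, 1.0
    z_crit = 1.96                     # two-sided 5% critical value of the standard normal

    def rejection_rate(true_mu):
        """Monte Carlo estimate of Pr(reject H0: mu = 0 | true mean = true_mu)."""
        xbar = rng.normal(true_mu, sigma / np.sqrt(n), size=trials)
        z = xbar / (sigma / np.sqrt(n))
        return np.mean(np.abs(z) > z_crit)

    # Under H0 the rejection rate is close to alpha = 0.05; under every alternative it is
    # at least alpha, which is the Neyman–Pearson notion of an unbiased test.
    for mu in (0.0, -0.6, -0.2, 0.2, 0.6):
        print(f"true mu = {mu:+.1f}   Pr(reject H0) ≈ {rejection_rate(mu):.3f}")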

Estimator selection


The bias of an estimator is the difference between an estimator's expected value and the true value of the parameter being estimated. Although an unbiased estimator is theoretically preferable to a biased estimator, in practice, biased estimators with small biases are frequently used. A biased estimator may be more useful for several reasons. First, an unbiased estimator may not exist without further assumptions. Second, sometimes an unbiased estimator is hard to compute. Third, a biased estimator may have a lower mean squared error.

  • For certain functions of the parameter of the Poisson distribution, a biased estimator is better than any unbiased estimator.[14][15] The value of the biased estimator is always positive and its mean squared error is smaller than that of the unbiased estimator, making the biased estimator more accurate (a simulation in the spirit of this counterexample is sketched after this list).
  • Omitted-variable bias is the bias that appears in estimates of parameters in regression analysis when the assumed specification omits an independent variable that should be in the model.
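
A classic example of this phenomenon, and the setting assumed in the sketch below (the article itself does not spell it out), is estimating e^(−2λ) from a single Poisson(λ) observation, where the only unbiased estimator is (−1)^X; the value λ = 1 is an arbitrary example choice.

    import numpy as np

    rng = np.random.default_rng(4)
    lam, trials = 1.0, 500_000            # example value of the Poisson mean
    target = np.exp(-2 * lam)             # quantity to estimate: e^(-2*lambda)

    x = rng.poisson(lam, size=trials)     # one observation per simulated experiment
    unbiased = (-1.0) ** x                # unbiased estimator; only takes the values +1 and -1
    biased = np.exp(-2.0 * x)             # always-positive biased alternative

    for name, est in (("unbiased (-1)^X", unbiased), ("biased e^(-2X) ", biased)):
        bias = est.mean() - target
        mse = np.mean((est - target) ** 2)
        print(f"{name}   bias ≈ {bias:+.3f}   MSE ≈ {mse:.3f}")

The unbiased estimator has essentially zero bias but a mean squared error near 1, while the always-positive biased estimator has a modest bias and a far smaller mean squared error, which is the point of the counterexample.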

Analysis methods

  • Detection bias occurs when a phenomenon is more likely to be observed for a particular set of study subjects. For instance, the syndemic involving obesity and diabetes may mean doctors are more likely to look for diabetes in obese patients than in thinner patients, leading to an inflation in diabetes among obese patients because of skewed detection efforts.
  • In educational measurement, bias is defined as "Systematic errors in test content, test administration, and/or scoring procedures that can cause some test takers to get either lower or higher scores than their true ability would merit."[16] The source of the bias is irrelevant to the trait the test is intended to measure.
  • Observer bias arises when the researcher subconsciously influences the experiment due to cognitive bias, where judgment may alter how an experiment is carried out or how results are recorded.

Interpretation


Reporting bias involves a skew in the availability of data, such that observations of a certain kind are more likely to be reported.

Addressing statistical bias


Depending on the type of bias present, researchers and analysts can take different steps to reduce bias in a data set. All types of bias mentioned above have corresponding measures that can be taken to reduce or eliminate their impact.

Bias should be accounted for at every step of the data collection process, beginning with clearly defined research parameters and consideration of the team who will be conducting the research.[2] Observer bias may be reduced by implementing a blind or double-blind technique. Avoidance of p-hacking is essential to the process of accurate data collection. One way to check for bias in results after the fact is to rerun analyses with different independent variables to observe whether a given phenomenon still occurs in the dependent variables.[17] Careful use of language in reporting can reduce misleading phrases, such as discussing a result as "approaching" statistical significance rather than actually achieving it.[2]
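
The danger of p-hacking can be made concrete with a small simulation. The sketch below is purely illustrative, using noise-only data and arbitrary sizes: screening many unrelated predictors and reporting only the smallest p-value produces "significant" findings in a large share of studies even though no real effect exists.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    n, n_predictors, studies = 50, 20, 2_000      # arbitrary example sizes

    false_positive_studies = 0
    for _ in range(studies):
        y = rng.normal(size=n)                    # outcome is pure noise
        X = rng.normal(size=(n, n_predictors))    # predictors unrelated to the outcome
        pvals = [stats.pearsonr(X[:, j], y)[1] for j in range(n_predictors)]
        false_positive_studies += min(pvals) < 0.05   # report only the "best" predictor

    # Each individual test has a 5% false positive rate, but cherry-picking the minimum
    # p-value across 20 unrelated predictors is "significant" in about 1 - 0.95**20 ≈ 64% of studies.
    print("share of noise-only studies with a 'significant' finding:",
          false_positive_studies / studies)

Pre-registering which variables will be analyzed, or correcting for the number of comparisons, removes this source of bias.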

References

  1. Cole, Nancy S. (October 1981). "Bias in testing". American Psychologist. 36 (10): 1067–1077. doi:10.1037/0003-066X.36.10.1067. ISSN 1935-990X.
  2. Popovic, Aleksandar; Huecker, Martin R. (June 23, 2023). "Study Bias". StatPearls. PMID 34662027.
  3. Rothman, Kenneth J.; Greenland, Sander; Lash, Timothy L. (2008). Modern Epidemiology. Lippincott Williams & Wilkins. pp. 134–137.
  4. Mulherin, Stephanie A.; Miller, William C. (2002-10-01). "Spectrum bias or spectrum effect? Subgroup variation in diagnostic test evaluation". Annals of Internal Medicine. 137 (7): 598–602. doi:10.7326/0003-4819-137-7-200210010-00011. ISSN 1539-3704. PMID 12353947. S2CID 35752032.
  5. Bostrom, Nick (2013-05-31). Anthropic Bias: Observation Selection Effects in Science and Philosophy. New York: Routledge. doi:10.4324/9780203953464. ISBN 978-0-203-95346-4.
  6. Ćirković, Milan M.; Sandberg, Anders; Bostrom, Nick (2010). "Anthropic Shadow: Observation Selection Effects and Human Extinction Risks". Risk Analysis. 30 (10): 1495–1506. Bibcode:2010RiskA..30.1495C. doi:10.1111/j.1539-6924.2010.01460.x. ISSN 1539-6924. PMID 20626690. S2CID 6485564.
  7. Tripepi, Giovanni; Jager, Kitty J.; Dekker, Friedo W.; Zoccali, Carmine (2010). "Selection Bias and Information Bias in Clinical Research". Nephron Clinical Practice. 115 (2): c94–c99. doi:10.1159/000312871. ISSN 1660-2110. PMID 20407272. S2CID 18856450.
  8. "Volunteer bias". Catalog of Bias. 2017-11-17. Retrieved 2021-12-18.
  9. Alex, Evans (2020). "Why Do Women Volunteer More Than Men?". Retrieved 2021-12-22.
  10. Krimsky, Sheldon (2013-07-01). "Do Financial Conflicts of Interest Bias Research?: An Inquiry into the "Funding Effect" Hypothesis". Science, Technology, & Human Values. 38 (4): 566–587. doi:10.1177/0162243912456271. ISSN 0162-2439. S2CID 42598982.
  11. Higgins, Julian P. T.; Green, Sally (March 2011). "8. Introduction to sources of bias in clinical trials". In Higgins, Julian P. T.; et al. (eds.). Cochrane Handbook for Systematic Reviews of Interventions (version 5.1). The Cochrane Collaboration.
  12. Neyman, Jerzy; Pearson, Egon S. (1936). "Contributions to the theory of testing statistical hypotheses". Statistical Research Memoirs. 1: 1–37.
  13. Casella, George; Berger, Roger L. (2002). Statistical Inference (2nd ed.). p. 387.
  14. Romano, Joseph P.; Siegel, A. F. (1986-06-01). Counterexamples in Probability and Statistics. CRC Press. pp. 194–196. ISBN 978-0-412-98901-8.
  15. Hardy, Michael (2003). "An Illuminating Counterexample". The American Mathematical Monthly. 110 (3): 234–238. doi:10.2307/3647938. ISSN 0002-9890. JSTOR 3647938.
  16. National Council on Measurement in Education (NCME). "NCME Assessment Glossary". Archived from the original on 2017-07-22.
  17. "5 Types of Statistical Biases to Avoid in Your Analyses". Business Insights Blog. 2017-06-13. Retrieved 2023-08-16.
