
Quartile

From Wikipedia, the free encyclopedia
Statistic which divides data into four same-sized parts for analysis

In statistics, quartiles are a type of quantile which divide the number of data points into four parts, or quarters, of more-or-less equal size. The data must be ordered from smallest to largest to compute quartiles; as such, quartiles are a form of order statistic. The three quartiles, resulting in four data divisions, are as follows:

  • The first quartile (Q1) is defined as the 25th percentile: the lowest 25% of the data lies below this point. It is also known as the lower quartile.
  • The second quartile (Q2) is the median of a data set; thus 50% of the data lies below this point.
  • The third quartile (Q3) is the 75th percentile: the lowest 75% of the data lies below this point. It is also known as the upper quartile.[1]

Along with the minimum and maximum of the data (which are also quartiles), the three quartiles described above provide a five-number summary of the data. This summary is important in statistics because it provides information about both the center and the spread of the data. Knowing the lower and upper quartiles provides information on how large the spread is and whether the dataset is skewed toward one side. Because quartiles divide the number of data points evenly, rather than the range of values, the distance between adjacent quartiles is generally not the same (i.e., usually (Q3 − Q2) ≠ (Q2 − Q1)). The interquartile range (IQR) is defined as the difference between the 75th and 25th percentiles, or Q3 − Q1. While the maximum and minimum also show the spread of the data, the upper and lower quartiles can provide more detailed information on the location of specific data points, the presence of outliers in the data, and the difference in spread between the middle 50% of the data and the outer data points.[2]
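The five-number summary and the IQR can be computed directly once a quartile convention is fixed. The sketch below is a minimal illustration that assumes Python 3.8+ and uses the standard library's `statistics.quantiles` with `method="inclusive"`; that choice is an assumption, and the other conventions described under Computing methods give somewhat different Q1 and Q3. The function name `five_number_summary` is hypothetical.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return ((minimum, Q1, Q2, Q3, maximum), IQR).

    Assumption: quartiles use the standard library's "inclusive" method;
    other quartile conventions give slightly different Q1 and Q3.
    """
    xs = sorted(data)
    q1, q2, q3 = quantiles(xs, n=4, method="inclusive")
    return (xs[0], q1, q2, q3, xs[-1]), q3 - q1


summary, iqr = five_number_summary([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49])
print(summary, iqr)  # (6, 25.5, 40.0, 42.5, 49) 17.0
```

For this data set (the one used in Example 1 below), Q2 − Q1 = 14.5 while Q3 − Q2 = 2.5, which illustrates why adjacent quartiles are usually not equally spaced.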

Definitions

Boxplot (with quartiles and an interquartile range) and a probability density function (pdf) of a normal N(0, σ²) population
Symbol | Names | Definition
Q1 | First quartile; lower quartile; 25th percentile | Splits off the lowest 25% of data from the highest 75%
Q2 | Second quartile; median; 50th percentile | Cuts data set in half
Q3 | Third quartile; upper quartile; 75th percentile | Splits off the highest 25% of data from the lowest 75%

Computing methods


Discrete distributions


For discrete distributions, there is no universal agreement on selecting the quartile values.[3]

Method 1

  1. Use the median to divide the ordered data set into two halves. The median becomes the second quartile.
    • If there is an odd number of data points in the original ordered data set, do not include the median (the central value in the ordered list) in either half.
    • If there is an even number of data points in the original ordered data set, split this data set exactly in half.
  2. The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.

This rule is employed by the TI-83 calculator boxplot and "1-Var Stats" functions.
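The steps of Method 1 can be sketched in Python as follows. This is a minimal illustration, assuming the input is a list of numbers with at least two entries; the function name `quartiles_method1` is hypothetical, and only the standard library's `statistics.median` is used.

```python
from statistics import median


def quartiles_method1(data):
    """Return (Q1, Q2, Q3) using Method 1 (median excluded from both halves)."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("Method 1 needs at least two data points")
    half = n // 2
    lower = xs[:half]  # values before the central position
    # For odd n, skip the central value; for even n, split exactly in half.
    upper = xs[half + 1:] if n % 2 == 1 else xs[half:]
    return median(lower), median(xs), median(upper)


print(quartiles_method1([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]))  # (15, 40, 43)
print(quartiles_method1([7, 15, 36, 39, 40, 41]))  # (15, 37.5, 40)
```

Both results agree with the Method 1 columns of the worked examples.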

Method 2

  1. Use the median to divide the ordered data set into two halves. The median becomes the second quartile.
    • If there is an odd number of data points in the original ordered data set, include the median (the central value in the ordered list) in both halves.
    • If there is an even number of data points in the original ordered data set, split this data set exactly in half.
  2. The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.

The values found by this method are also known as "Tukey's hinges";[4] see also midhinge.
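A minimal Python sketch of Method 2 (Tukey's hinges) differs from Method 1 only in how the halves are formed: for an odd count, the central value is shared by both halves. The function name `tukey_hinges` is hypothetical, and the sketch assumes a non-empty list of numbers.

```python
from statistics import median


def tukey_hinges(data):
    """Return (Q1, Q2, Q3) using Method 2 (median included in both halves)."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("Method 2 needs at least one data point")
    half = (n + 1) // 2  # odd n: the central value lands in both halves
    lower = xs[:half]
    upper = xs[n - half:]
    return median(lower), median(xs), median(upper)


print(tukey_hinges([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]))  # (25.5, 40, 42.5)
```

For an even count, `half` equals `n // 2`, so the data set is split exactly in half, as the second bullet requires.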

Method 3

  1. Use the median to divide the ordered data set into two halves. The median becomes the second quartile.
    • If there is an odd number of data points, go to the next step.
    • If there is an even number of data points, Method 3 starts off the same as Method 1 or Method 2 above, and you can choose whether or not to include the median as a new data point. If you include the median as a new data point, you now have an odd number of data points, so proceed to step 2 or 3 below. If you do not, continue with Method 1 or 2, whichever you started with.
  2. If there are (4n+1) data points, then the lower quartile is 25% of the nth data value plus 75% of the (n+1)th data value; the upper quartile is 75% of the (3n+1)th data point plus 25% of the (3n+2)th data point.
  3. If there are (4n+3) data points, then the lower quartile is 75% of the (n+1)th data value plus 25% of the (n+2)th data value; the upper quartile is 25% of the (3n+2)th data point plus 75% of the (3n+3)th data point.
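The weighted averages of Method 3 can be sketched in Python as below. This is an illustration under stated assumptions: the function name `quartiles_method3` and its `insert_median` flag are hypothetical, the variable `m` plays the role of n in the (4n+1)/(4n+3) rules, and for an even count without the median inserted the sketch continues as Method 1.

```python
from statistics import median


def quartiles_method3(data, insert_median=False):
    """Return (Q1, Q2, Q3) using Method 3's weighted averages."""
    xs = sorted(data)
    if len(xs) < 2:
        raise ValueError("Method 3 needs at least two data points")
    if len(xs) % 2 == 0:
        if not insert_median:
            # Even count, median not inserted: continue as in Method 1.
            half = len(xs) // 2
            return median(xs[:half]), median(xs), median(xs[half:])
        # Even count, median inserted as a new data point: count becomes odd.
        xs = sorted(xs + [median(xs)])

    def x(i):
        return xs[i - 1]  # one-based indexing, as in the text

    m, r = divmod(len(xs), 4)  # len(xs) = 4m + r with r in {1, 3}
    if r == 1:  # (4n+1) data points
        q1 = 0.25 * x(m) + 0.75 * x(m + 1)
        q3 = 0.75 * x(3 * m + 1) + 0.25 * x(3 * m + 2)
    else:  # (4n+3) data points
        q1 = 0.75 * x(m + 1) + 0.25 * x(m + 2)
        q3 = 0.25 * x(3 * m + 2) + 0.75 * x(3 * m + 3)
    return q1, median(xs), q3


print(quartiles_method3([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]))  # (20.25, 40, 42.75)
```

With 11 data points (4·2 + 3), the lower quartile is 0.75·15 + 0.25·36 = 20.25, matching the Method 3 column of Example 1.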

Method 4


If we have an ordered dataset $x_1, x_2, \ldots, x_n$, we can interpolate between data points to find the $p$th empirical quartile, treating $x_i$ as the $i/(n+1)$ quantile. If we denote the integer part of a number $a$ by $\lfloor a \rfloor$, then the empirical quantile function is given by

$$q(p/4) = x_k + \alpha\,(x_{k+1} - x_k),$$

where $k = \lfloor p(n+1)/4 \rfloor$ and $\alpha = p(n+1)/4 - \lfloor p(n+1)/4 \rfloor$.[1]

Here $x_k$ is the last data point at or below the quartile and $x_{k+1}$ is the first data point above it, while $\alpha$ measures where the quartile falls between $x_k$ and $x_{k+1}$. If $\alpha = 0$, the quartile falls exactly on $x_k$; if $\alpha = 0.5$, it falls exactly halfway between $x_k$ and $x_{k+1}$.

To find the first, second, and third quartiles of the dataset, we would evaluate $q(0.25)$, $q(0.5)$, and $q(0.75)$ respectively, that is, set $p = 1, 2, 3$.
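The formula for Method 4 translates almost line by line into Python. In this minimal sketch, `p` is the quartile number (1, 2 or 3) as in $q(p/4)$. The function name `quartile_method4` is hypothetical, and clamping positions that fall outside $1 \ldots n$ to the extreme data values is an assumption, since the formula leaves $x_0$ and $x_{n+1}$ undefined.

```python
import math


def quartile_method4(data, p):
    """Return q(p/4) = x_k + alpha * (x_{k+1} - x_k) with k = floor(p(n+1)/4)."""
    xs = sorted(data)
    n = len(xs)
    h = p * (n + 1) / 4  # one-based position of the quartile
    k = math.floor(h)
    alpha = h - k
    # Assumption: positions before x_1 or beyond x_n are clamped to the extremes.
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    return xs[k - 1] + alpha * (xs[k] - xs[k - 1])


example2 = [7, 15, 36, 39, 40, 41]
print([quartile_method4(example2, p) for p in (1, 2, 3)])  # [13.0, 37.5, 40.25]
```

For Example 2, $n = 6$ and $p = 1$ give $h = 1.75$, so $k = 1$ and $\alpha = 0.75$, and the first quartile is $7 + 0.75 \cdot (15 - 7) = 13$.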

Example 1


Ordered data set (with an odd number of data points): 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49.

The median (40) splits the data set into two halves with an equal number of data points.

Quartile | Method 1 | Method 2 | Method 3 | Method 4
Q1 | 15 | 25.5 | 20.25 | 15
Q2 | 40 | 40 | 40 | 40
Q3 | 43 | 42.5 | 42.75 | 43

Example 2


Ordered data set (with an even number of data points): 7, 15, 36, 39, 40, 41.

The two middle values (36 and 39) are averaged to obtain the median. Because there is an even number of data points, the first three methods all give the same results. (Method 3 is executed without adding the median as a new data point, so it continues as Method 1.)

Quartile | Method 1 | Method 2 | Method 3 | Method 4
Q1 | 15 | 15 | 15 | 13
Q2 | 37.5 | 37.5 | 37.5 | 37.5
Q3 | 40 | 40 | 40 | 40.25

Continuous probability distributions

Quartiles on a cumulative distribution function of a normal distribution

If we define a continuous probability distribution as $P(X)$, where $X$ is a real-valued random variable, its cumulative distribution function (CDF) is given by

$$F_X(x) = P(X \le x).$$ [1]

The CDF gives the probability that the random variable $X$ is less than or equal to the value $x$. Therefore, the first quartile is the value of $x$ for which $F_X(x) = 0.25$, the second quartile is the $x$ for which $F_X(x) = 0.5$, and the third quartile is the $x$ for which $F_X(x) = 0.75$.[5] These values of $x$ can be found with the quantile function $Q(p)$, where $p = 0.25$ for the first quartile, $p = 0.5$ for the second quartile, and $p = 0.75$ for the third quartile. The quantile function is the inverse of the cumulative distribution function when the cumulative distribution function is strictly monotonically increasing, because the one-to-one correspondence between the input and output of the cumulative distribution function then holds.
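As a sketch of evaluating $Q(0.25)$, $Q(0.5)$ and $Q(0.75)$, the Python code below uses the standard library's `statistics.NormalDist.inv_cdf` (Python 3.8+) for a normal distribution, and the closed-form inverse of an exponential CDF as a second illustration. The exponential case and the function names `normal_quartiles` and `exponential_quartiles` are assumptions added for illustration; they are not taken from the text above.

```python
import math
from statistics import NormalDist

PROBS = (0.25, 0.5, 0.75)  # p for Q1, Q2, Q3


def normal_quartiles(mu=0.0, sigma=1.0):
    """Quartiles of N(mu, sigma^2) via the standard library's inverse CDF."""
    dist = NormalDist(mu, sigma)
    return tuple(dist.inv_cdf(p) for p in PROBS)


def exponential_quartiles(rate):
    """Quartiles of an exponential distribution with the given rate.

    Its CDF F(x) = 1 - exp(-rate * x) is strictly increasing for x >= 0,
    so it can be inverted in closed form: Q(p) = -ln(1 - p) / rate.
    """
    return tuple(-math.log(1 - p) / rate for p in PROBS)


q1, q2, q3 = normal_quartiles()
print(round(q1, 4), round(q3, 4))  # -0.6745 0.6745
print([round(q, 4) for q in exponential_quartiles(1.0)])  # [0.2877, 0.6931, 1.3863]
```

Substituting each quartile back into the CDF recovers 0.25, 0.5 and 0.75, which is the defining property stated above.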

Outliers


There are methods by which to check for outliers in the discipline of statistics and statistical analysis. Outliers could result from a shift in the location (mean) or in the scale (variability) of the process of interest.[6] Outliers could also be evidence of a sample population that has a non-normal distribution or of a contaminated population data set. Consequently, as is the basic idea of descriptive statistics, when encountering an outlier, we have to explain this value by further analysis of its cause or origin. In cases of extreme observations, which are not an infrequent occurrence, the typical values must be analyzed. The interquartile range (IQR), defined as the difference between the upper and lower quartiles ($Q_3 - Q_1$), may be used to characterize the data when there may be extremities that skew the data; the interquartile range is a relatively robust statistic (also sometimes called "resistant") compared to the range and standard deviation. There is also a mathematical method to check for outliers by determining "fences", upper and lower limits beyond which data are flagged as potential outliers.

After determining the first (lower) and third (upper) quartiles ($Q_1$ and $Q_3$ respectively) and the interquartile range ($\mathrm{IQR} = Q_3 - Q_1$) as outlined above, the fences are calculated using the following formulas:

$$\text{Lower fence} = Q_1 - (1.5 \times \mathrm{IQR})$$
$$\text{Upper fence} = Q_3 + (1.5 \times \mathrm{IQR})$$
Boxplot Diagram with Outliers

The lower fence is the "lower limit" and the upper fence is the "upper limit" of the data, and any data lying outside these bounds can be considered an outlier. The fences provide a guideline for defining an outlier, although outliers may be defined in other ways. The fences define a "range" outside which an outlier exists; a way to picture this is as the boundary of a fence. It is common for the lower and upper fences, along with the outliers, to be represented by a boxplot. In the boxplot shown on the right, only the vertical positions correspond to the visualized data set; the horizontal width of the box is irrelevant. Outliers located outside the fences in a boxplot can be marked with any choice of symbol, such as an "x" or "o". The fences are sometimes loosely referred to as "whiskers", although in the usual convention the whiskers extend only to the most extreme data points lying within the fences; the entire plot is called a "box-and-whisker" plot.
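The fence formulas can be sketched in Python as follows. The function name `tukey_fences` and its multiplier parameter `k` (defaulting to the 1.5 used above) are hypothetical, and computing Q1 and Q3 with Method 1 is an assumption; any of the quartile methods described earlier could be substituted. The example uses the data set from Example 1 with a hypothetical extra value of 120 appended.

```python
from statistics import median


def tukey_fences(data, k=1.5):
    """Return (lower_fence, upper_fence, outliers) for the given data.

    Q1 and Q3 come from Method 1 (halves split around the median); this choice
    is an assumption, and another quartile method could be substituted.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    q1 = median(xs[:half])
    q3 = median(xs[half + 1:] if n % 2 == 1 else xs[half:])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in xs if v < lower or v > upper]
    return lower, upper, outliers


print(tukey_fences([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49, 120]))
# (-3.75, 74.25, [120])
```

Here Q1 = 25.5 and Q3 = 45, so IQR = 19.5 and the fences are 25.5 − 29.25 = −3.75 and 45 + 29.25 = 74.25; only the appended value lies outside them.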

When an outlier is spotted in the data set by calculating the interquartile range and boxplot features, it might be easy to mistakenly view it as evidence that the population is non-normal or that the sample is contaminated. However, this method should not take the place of a hypothesis test for determining the normality of the population. The significance of outliers varies with the sample size. If the sample is small, it is more probable to get interquartile ranges that are unrepresentatively small, leading to narrower fences; therefore, data are more likely to be marked as outliers.[7]

Computer software for quartiles

Environment | Function | Quartile method
Microsoft Excel | QUARTILE.EXC | Method 4
Microsoft Excel | QUARTILE.INC | Method 4 (with n − 1)
TI-8X series calculators | 1-Var Stats | Method 1
R | fivenum | Method 2
R | quantile (default, type 7) | Method 4 (with n − 1)
Python | numpy.percentile (default) | Method 4 (with n − 1)
Python | pandas.DataFrame.describe | Method 4 (with n − 1)

"Method 4 (with n − 1)" denotes linear interpolation between neighboring data points at the one-based position $1 + p(n-1)/4$ instead of $p(n+1)/4$, so that $p = 0$ and $p = 4$ return the minimum and maximum of the data.

Excel

The Excel function QUARTILE.INC(array, quart) returns the requested quartile of a given array of data using this "n − 1" interpolation. The QUARTILE function is a legacy function from Excel 2007 or earlier and gives the same output as QUARTILE.INC. In the function, array is the dataset of numbers being analyzed and quart is one of the following five values, depending on which quartile is being calculated.[8]

quart | Output QUARTILE value
0 | Minimum value
1 | Lower quartile (25th percentile)
2 | Median
3 | Upper quartile (75th percentile)
4 | Maximum value

MATLAB


To calculate quartiles in MATLAB, the function quantile(A, p) can be used, where A is the vector of data being analyzed and p is the cumulative probability (between 0 and 1) corresponding to the desired quartile, as listed below.[9]

p | Output quartile value
0 | Minimum value
0.25 | Lower quartile (25th percentile)
0.5 | Median
0.75 | Upper quartile (75th percentile)
1 | Maximum value
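For comparison with the environments above, Python's standard library (3.8+) provides `statistics.quantiles`. Its default `method="exclusive"` interpolates at positions based on n + 1, like Method 4, while `method="inclusive"` uses n − 1 (the convention labeled "Method 4 (with n−1)"). The sketch below, which assumes only the standard library, applies both to the data sets of Example 1 and Example 2.

```python
from statistics import quantiles

example1 = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
example2 = [7, 15, 36, 39, 40, 41]

for name, data in (("Example 1", example1), ("Example 2", example2)):
    # Default "exclusive" method: positions p(n+1)/4, i.e. Method 4.
    exclusive = quantiles(data, n=4)
    # "inclusive" method: one-based positions 1 + p(n-1)/4.
    inclusive = quantiles(data, n=4, method="inclusive")
    print(name, exclusive, inclusive)

# Example 1 [15.0, 40.0, 43.0] [25.5, 40.0, 42.5]
# Example 2 [13.0, 37.5, 40.25] [20.25, 37.5, 39.75]
```

The exclusive results reproduce the Method 4 column of both examples; the inclusive results show how much the choice between n + 1 and n − 1 can move Q1 and Q3 for small samples.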


References

  1. ^ a b c Dekking, Michel (2005). A modern introduction to probability and statistics: understanding why and how. London: Springer. pp. 236–238. ISBN 978-1-85233-896-1. OCLC 262680588.
  2. ^ Knoch, Jessica (February 23, 2018). "How are Quartiles Used in Statistics?". Magoosh. Archived from the original on December 10, 2019. Retrieved February 24, 2023.
  3. ^ Hyndman, Rob J; Fan, Yanan (November 1996). "Sample quantiles in statistical packages". American Statistician. 50 (4): 361–365. doi:10.2307/2684934. JSTOR 2684934.
  4. ^ Tukey, John Wilder (1977). Exploratory Data Analysis. Addison-Wesley Publishing Company. ISBN 978-0-201-07616-5.
  5. ^ "6. Distribution and Quantile Functions" (PDF). math.bme.hu.
  6. ^ Walfish, Steven (November 2006). "A Review of Statistical Outlier Method". Pharmaceutical Technology.
  7. ^ Dawson, Robert (July 1, 2011). "How Significant is a Boxplot Outlier?". Journal of Statistics Education. 19 (2). doi:10.1080/10691898.2011.11889610.
  8. ^ "How to use the Excel QUARTILE function | Exceljet". exceljet.net. Retrieved December 11, 2019.
  9. ^ "Quantiles of a data set – MATLAB quantile". www.mathworks.com. Retrieved December 11, 2019.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Quartile&oldid=1276898667"