German tank problem

From Wikipedia, the free encyclopedia
Problem in statistical estimation

During World War II, production of German tanks such as the Panther was accurately estimated by Allied intelligence using statistical methods.

In the statistical theory of estimation, the German tank problem consists of estimating the maximum of a discrete uniform distribution from sampling without replacement. In simple terms, suppose there exists an unknown number of items which are sequentially numbered from 1 to N. A random sample of these items is taken and their sequence numbers observed; the problem is to estimate N from these observed numbers.

The problem can be approached using either frequentist inference or Bayesian inference, leading to different results. Estimating the population maximum based on a single sample yields divergent results, whereas estimation based on multiple samples is a practical estimation question whose answer is simple (especially in the frequentist setting) but not obvious (especially in the Bayesian setting).

The problem is named after its historical application by Allied forces in World War II to the estimation of the monthly rate of German tank production from very limited data. This exploited the manufacturing practice of assigning and attaching ascending sequences of serial numbers to tank components (chassis, gearbox, engine, wheels), with some of the tanks eventually being captured in battle by Allied forces.

Suppositions


The adversary is presumed to have manufactured a series of tanks marked with consecutive whole numbers, beginning with serial number 1. Additionally, regardless of a tank's date of manufacture, history of service, or the serial number it bears, the distribution over the serial numbers that become revealed to analysis is uniform, up to the point in time when the analysis is conducted.

Example

Estimated population size N. The number of observations in the sample is k; the largest sample serial number is m. Frequentist analysis is shown with dotted lines; Bayesian analysis with solid lines showing the mean, with shading from the minimum possible value to the mean plus one standard deviation. The example shows that if four tanks are observed and the highest serial number is 60, frequentist analysis predicts 74, whereas Bayesian analysis predicts a mean of 88.5 and a standard deviation of 138.72 − 88.5 = 50.22, with a minimum of 60 tanks.

Assuming tanks are assigned sequential serial numbers starting with 1, suppose that four tanks are captured and that they have the serial numbers: 19, 40, 42 and 60.

A frequentist approach (using the minimum-variance unbiased estimator) predicts the total number of tanks produced will be:

N \approx 74

A Bayesian approach (using a uniform prior over the integers in [4, Ω] for any suitably large Ω) predicts that the median number of tanks produced will be very similar to the frequentist prediction:

N_\mathrm{med} \approx 73.9

whereas the Bayesian mean predicts that the number of tanks produced would be:

N_\mathrm{av} \approx 89

Let N equal the total number of tanks predicted to have been produced, m equal the highest serial number observed, and k equal the number of tanks captured.

The frequentist prediction is calculated as:

N \approx m + \frac{m}{k} - 1 = 74

The Bayesian median is calculated as:

N_\mathrm{med} \approx m + \frac{m\ln(2)}{k-1} = 73.9

The Bayesian mean is calculated as:

N_\mathrm{av} \approx (m-1)\frac{k-1}{k-2} = 89

These Bayesian quantities are derived from the Bayesian posterior distribution:

\Pr(N=n) = \begin{cases} 0 & \text{if } n < m, \\[4pt] \dfrac{k-1}{k}\,\dfrac{\binom{m-1}{k-1}}{\binom{n}{k}} & \text{if } n \ge m. \end{cases}

This probability mass function has a positive skewness, related to the fact that there are at least 60 tanks. Because of this skewness, the mean may not be the most meaningful estimate. The median in this example is 74.5, in close agreement with the frequentist formula. Using Stirling's approximation, the posterior may be approximated by an exponentially decaying function of n,

\Pr(N=n) \approx \begin{cases} 0 & \text{if } n < m \\[4pt] (k-1)\, m^{k-1} n^{-k} & \text{if } n \ge m, \end{cases}

which results in the following approximation for the median:

N_\mathrm{med} \approx m + \frac{m\ln(2)}{k-1}

and the following approximations for the mean and standard deviation:

\begin{aligned} N &\approx \mu \pm \sigma = 89 \pm 50, \\[5pt] \mu &= (m-1)\frac{k-1}{k-2}, \\[5pt] \sigma &= \sqrt{\frac{(k-1)(m-1)(m-k+1)}{(k-3)(k-2)^{2}}}. \end{aligned}
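These closed forms are easy to check numerically. The following Python sketch (an illustration, not part of the original analysis; the truncation bound N_MAX is an arbitrary stand-in for the prior limit Ω) computes the frequentist point estimate and the exact Bayesian posterior summaries for the sample 19, 40, 42, 60:

```python
from math import comb, sqrt

serials = [19, 40, 42, 60]
k = len(serials)            # number of tanks observed
m = max(serials)            # largest serial number observed

# Frequentist MVUE: m + m/k - 1
print(m + m / k - 1)        # 74.0

# Exact posterior Pr(N = n | m, k) for a uniform prior and k >= 2
def posterior(n: int) -> float:
    return (k - 1) / k * comb(m - 1, k - 1) / comb(n, k) if n >= m else 0.0

# The tail decays like n**-k, so truncating the sums at a large bound
# (standing in for the prior limit Omega) is adequate here.
N_MAX = 100_000
mean = sum(n * posterior(n) for n in range(m, N_MAX))           # ~88.5
var = sum(n * n * posterior(n) for n in range(m, N_MAX)) - mean ** 2
print(mean, sqrt(var))                                          # ~88.5, ~50.2

# Median: smallest n with cumulative posterior >= 1/2
cum = 0.0
for n in range(m, N_MAX):
    cum += posterior(n)
    if cum >= 0.5:
        print(n)            # 75 here; interpolating gives ~74.5
        break
```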

Historical example of the problem

Panther tanks are loaded for transport to frontline units, 1943.

During the course of the Second World War, the Western Allies made sustained efforts to determine the extent of German production and approached this in two major ways: conventional intelligence gathering and statistical estimation. In many cases, statistical analysis substantially improved on conventional intelligence. In some cases, conventional intelligence was used in conjunction with statistical methods, as was the case in estimation of Panther tank production just prior to D-Day.

The Allied command structure had thought the Panzer V (Panther) tanks seen in Italy, with their high-velocity, long-barreled 75 mm/L70 guns, were unusual heavy tanks and would only be seen in northern France in small numbers, much the same way as the Tiger I was seen in Tunisia. The US Army was confident that the Sherman tank would continue to perform well, as it had versus the Panzer III and Panzer IV tanks in North Africa and Sicily.[a] Shortly before D-Day, rumors indicated that large numbers of Panzer V tanks were being used.

To determine whether this was true, the Allies attempted to estimate the number of tanks being produced. To do this, they used the serial numbers on captured or destroyed tanks. The principal numbers used were gearbox numbers, as these fell in two unbroken sequences. Chassis and engine numbers were also used, though their use was more complicated. Various other components were used to cross-check the analysis. Similar analyses were done on wheels, which were observed to be sequentially numbered (i.e., 1, 2, 3, ..., N).[2][b][3][4]

The analysis of tank wheels yielded an estimate for the number of wheel molds that were in use. A discussion with British road wheel makers then estimated the number of wheels that could be produced from this many molds, which yielded the number of tanks that were being produced each month. Analysis of wheels from two tanks (32 road wheels each, 64 road wheels total) yielded an estimate of 270 tanks produced in February 1944, substantially more than had previously been suspected.[5]

German records after the war showed production for the month of February 1944 was 276.[6][c] The statistical approach proved to be far more accurate than conventional intelligence methods, and the phrase "German tank problem" became accepted as a descriptor for this type of statistical analysis.

Estimating production was not the only use of this serial-number analysis. It was also used to understand German production more generally, including number of factories, relative importance of factories, length of supply chain (based on lag between production and use), changes in production, and use of resources such as rubber.

Specific data


According to conventional Allied intelligence estimates, the Germans were producing around 1,400 tanks a month between June 1940 and September 1942. Applying the formula below to the serial numbers of captured tanks, the number was calculated to be 246 a month. After the war, captured German production figures from the ministry of Albert Speer showed the actual number to be 245.[3]

Estimates for some specific months are given as:[7]

Month          Statistical estimate    Intelligence estimate    German records
June 1940      169                     1,000                    122
June 1941      244                     1,550                    271
August 1942    327                     1,550                    342

Similar analyses

V-2 rocket production was accurately estimated by statistical methods.

Similar serial-number analysis was used for other military equipment during World War II, most successfully for the V-2 rocket.[8]

Factory markings on Soviet military equipment were analyzed during the Korean War, and by German intelligence during World War II.[9]

In the 1980s, some Americans were given access to the production line of Israel's Merkava tanks. The production numbers were classified, but the tanks had serial numbers, allowing estimation of production.[10]

The formula has been used in non-military contexts, for example to estimate the number of Commodore 64 computers built, where the result (12.5 million) matches the low-end estimates.[11]

Countermeasures


To confound serial-number analysis, serial numbers can be excluded, or usable auxiliary information reduced. Alternatively, serial numbers that resist cryptanalysis can be used, most effectively by randomly choosing numbers without replacement from a list that is much larger than the number of objects produced, or by producing random numbers and checking them against the list of already assigned numbers. Collisions are likely to occur unless the number of digits possible is more than twice the number of digits in the number of objects produced; this holds for serial numbers in any base (see Birthday problem).[d] For this, a cryptographically secure pseudorandom number generator may be used. All these methods require a lookup table (or breaking the cipher) to back out from serial number to production order, which complicates use of serial numbers: a range of serial numbers cannot be recalled, for instance, but each must be looked up individually, or a list generated.
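As a sketch of the random-assignment countermeasure just described (hypothetical parameters; no real manufacturer's scheme is implied), one can draw serials from a space whose digit count is more than twice that of the production run, rejecting the rare collisions:

```python
import secrets

def assign_serials(count: int, digits: int = 12) -> list[int]:
    """Assign `count` unpredictable serial numbers.

    With `digits` more than twice the number of digits in `count`,
    collisions are rare (birthday bound), so rejection is cheap.
    """
    space = 10 ** digits
    used: set[int] = set()
    while len(used) < count:
        candidate = secrets.randbelow(space)   # cryptographically strong
        if candidate not in used:              # check against already assigned
            used.add(candidate)
    return sorted(used)

serials = assign_serials(5_000)   # e.g. a 5,000-unit production run
```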

Alternatively, sequential serial numbers can be encrypted with a simple substitution cipher, which allows easy decoding, but is also easily broken by frequency analysis: even if starting from an arbitrary point, the plaintext has a pattern (namely, numbers are in sequence). One example is given in Ken Follett's novel Code to Zero, where the encryption of the Jupiter-C rocket serial numbers is given by:

HUNTSVILEX
1234567890

The code word here is Huntsville (with repeated letters omitted, then X appended) to get a 10-letter key.[12] Rocket number 13 was therefore "HN", and rocket number 24 was "UT".
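A minimal sketch of this digit-substitution scheme (the key is from the novel; the functions here are illustrative):

```python
KEY = "HUNTSVILEX"   # stands for the digits 1234567890

def encode(serial: int) -> str:
    # '1' -> 'H', '2' -> 'U', ..., '9' -> 'E', '0' -> 'X'
    return "".join(KEY[(int(d) - 1) % 10] for d in str(serial))

def decode(code: str) -> int:
    return int("".join(str((KEY.index(c) + 1) % 10) for c in code))

assert encode(13) == "HN" and encode(24) == "UT"
assert decode("UT") == 24
```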

Frequentist analysis


Minimum-variance unbiased estimator


For point estimation (estimating a single value for the total, N̂), the minimum-variance unbiased estimator (MVUE, or UMVU estimator) is given by:[e]

\widehat{N} = m\left(1 + k^{-1}\right) - 1,

where m is the largest serial number observed (sample maximum) and k is the number of tanks observed (sample size).[10][13] Note that once a serial number has been observed, it is no longer in the pool and will not be observed again.

This has a variance[10]

\operatorname{var}\left(\widehat{N}\right) = \frac{1}{k}\frac{(N-k)(N+1)}{k+2} \approx \frac{N^{2}}{k^{2}} \text{ for small samples } k \ll N,

so the standard deviation is approximately N/k, the expected size of the gap between sorted observations in the sample.
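A quick Monte Carlo check (a sketch; the population size N and sample size k are arbitrary choices) confirms both the unbiasedness of the estimator and the variance formula:

```python
import random
from statistics import mean, stdev

N, k, trials = 250, 4, 200_000
estimates = []
for _ in range(trials):
    sample = random.sample(range(1, N + 1), k)   # draws without replacement
    m = max(sample)
    estimates.append(m * (1 + 1 / k) - 1)        # MVUE: m(1 + 1/k) - 1

print(mean(estimates))   # ~250: the estimator is unbiased
print(stdev(estimates))  # ~50.7 = sqrt((N-k)(N+1) / (k(k+2)))
```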

The formula may be understood intuitively as the sample maximum plus the average gap between observations in the sample. The sample maximum is chosen as the initial estimator, because it is the maximum likelihood estimator;[f] the gap is added to compensate for the negative bias of the sample maximum as an estimator for the population maximum,[g] and the result is written as

\widehat{N} = m + \frac{m-k}{k} = m + mk^{-1} - 1 = m\left(1 + k^{-1}\right) - 1.

This can be visualized by imagining that the observations in the sample are evenly spaced throughout the range, with additional observations just outside the range at 0 and N + 1. If starting with an initial gap between 0 and the lowest observation in the sample (the sample minimum), the average gap between consecutive observations in the sample is (m − k)/k; the −k is because the observations themselves are not counted in computing the gap between observations.[h] A derivation of the expected value and the variance of the sample maximum is shown on the page of the discrete uniform distribution.

This philosophy is formalized and generalized in the method of maximum spacing estimation; a similar heuristic is used for plotting position in a Q–Q plot, plotting sample points at k/(n + 1), which spaces them evenly on the uniform distribution, with a gap at the end.

Confidence intervals


Instead of, or in addition to, point estimation, interval estimation can be carried out, such as confidence intervals. These are easily computed, based on the observation that the probability that k observations in the sample will fall in an interval covering p of the range (0 ≤ p ≤ 1) is p^k (assuming in this section that draws are with replacement, to simplify computations; if draws are without replacement, this overstates the likelihood, and intervals will be overly conservative).

Thus the sampling distribution of the quantile of the sample maximum is the graph x^{1/k} from 0 to 1: the p-th to q-th quantile of the sample maximum m are the interval [p^{1/k}N, q^{1/k}N]. Inverting this yields the corresponding confidence interval for the population maximum of [m/q^{1/k}, m/p^{1/k}].

For example, taking the symmetric 95% interval p = 2.5% and q = 97.5% for k = 5 yields 0.025^{1/5} ≈ 0.48, 0.975^{1/5} ≈ 0.995, so the confidence interval is approximately [1.005m, 2.08m]. The lower bound is very close to m, thus more informative is the asymmetric confidence interval from p = 5% to 100%; for k = 5 this yields 0.05^{1/5} ≈ 0.55 and the interval [m, 1.82m].

More generally, the (downward biased) 95% confidence interval is [m, m/0.05^{1/k}] = [m, m·20^{1/k}]. For a range of k values, with the UMVU point estimator (plus 1 for legibility) for reference, this yields:

k     Point estimate    Confidence interval
1     2m                [m, 20m]
2     1.5m              [m, 4.5m]
5     1.2m              [m, 1.82m]
10    1.1m              [m, 1.35m]
20    1.05m             [m, 1.16m]

Immediate observations are:

  • For small sample sizes, the confidence interval is very wide, reflecting great uncertainty in the estimate.
  • The range shrinks rapidly, reflecting the exponentially decaying probability thatall observations in the sample will be significantly below the maximum.
  • The confidence interval exhibits positive skew, as N can never be below the sample maximum, but can potentially be arbitrarily high above it.
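These interval bounds follow mechanically from [m, m·20^{1/k}]; a short sketch (illustrative only) reproduces the table above:

```python
# UMVU point estimate plus 1 (i.e., m(1 + 1/k)) and the downward-biased
# 95% interval [m, m * 20**(1/k)], both in units of m.
for k in (1, 2, 5, 10, 20):
    point = 1 + 1 / k
    upper = 20 ** (1 / k)
    print(f"k={k:2d}  point={point:.2f}m  CI=[m, {upper:.2f}m]")
```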

Note that m/k cannot be used naively (or rather (m + m/k − 1)/k) as an estimate of the standard error SE, as the standard error of an estimator is based on the population maximum (a parameter), and using an estimate to estimate the error in that very estimate is circular reasoning.

Bayesian analysis


The Bayesian approach to the German tank problem[14] is to consider the posterior probability (N = n ∣ M = m, K = k) that the number of enemy tanks N is n, when the number of observed tanks K is k, and the maximum observed serial number M is m. Note that this does not discard the relevant information contained in the samples, as we will see that the likelihood function can be constructed using only the number of samples and the maximum observed value.

The answer to this problem depends on the choice of prior for N. One can proceed using a proper prior over the positive integers, e.g., the Poisson or negative binomial distribution, for which a closed formula for the posterior mean and posterior variance can be obtained.[15] Below, we will instead adopt a bounded uniform prior.

For brevity, in what follows, (N = n ∣ M = m, K = k) is written (n ∣ m, k).

Conditional probability


The rule for conditional probability gives

(n \mid m,k)\,(m \mid k) = (m \mid n,k)\,(n \mid k) = (m,n \mid k)

Probability of M knowing N and K


The expression

(m \mid n,k) = (M=m \mid N=n, K=k)

is the conditional probability that the maximum serial number observed, M, is equal to m, when the number of enemy tanks, N, is known to be equal to n, and the number of enemy tanks observed, K, is known to be equal to k.

It is

(m \mid n,k) = \binom{m-1}{k-1} \binom{n}{k}^{-1} [k \le m][m \le n]

where \binom{n}{k} is a binomial coefficient and [k ≤ n] is an Iverson bracket.

The expression can be derived as follows: (m ∣ n, k) answers the question: "What is the probability of a specific serial number m being the highest number observed in a sample of k tanks, given there are n tanks in total?"

One can think of the sample of size k as the result of k individual draws without replacement. Assume m is observed on draw number d. The probability of this occurring is:

\underbrace{\frac{m-1}{n}\cdot\frac{m-2}{n-1}\cdot\frac{m-3}{n-2}\cdots\frac{m-d+1}{n-d+2}}_{d-1\text{ times}} \cdot \underbrace{\frac{1}{n-d+1}}_{\text{draw no. } d} \cdot \underbrace{\frac{m-d}{n-d}\cdot\frac{m-d-1}{n-d-1}\cdots\frac{m-d-(k-d-1)}{n-d-(k-d-1)}}_{k-d\text{ times}} = \frac{(n-k)!}{n!}\cdot\frac{(m-1)!}{(m-k)!}.

As can be seen from the right-hand side, this expression is independent of d and therefore the same for each d ≤ k. As m can be drawn on k different draws, the probability of any specific m being the largest one observed is k times the above probability:

(m \mid n,k) = k \cdot \frac{(n-k)!}{n!}\cdot\frac{(m-1)!}{(m-k)!} = \binom{m-1}{k-1}\binom{n}{k}^{-1}.

Seen differently, after k draws without replacement, one draw was for object m, and the other k − 1 were for objects in the range 1 to m − 1; there are \binom{m-1}{k-1} ways to achieve this. And there are \binom{n}{k} ways to draw k objects without replacement from a collection of n objects.
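For small n, the formula can be verified by exhaustive enumeration of all k-subsets (a sketch; n and k are arbitrary small values):

```python
from itertools import combinations
from math import comb

n, k = 10, 3
subsets = list(combinations(range(1, n + 1), k))
for m in range(k, n + 1):
    # Count subsets whose maximum is exactly m ...
    count = sum(1 for s in subsets if max(s) == m)
    # ... and compare with C(m-1, k-1): choices of the k-1 values below m.
    assert count == comb(m - 1, k - 1)
# Hence P(M = m | N = n, K = k) = C(m-1, k-1) / C(n, k).
```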

Probability of M knowing only K


The expression (m ∣ k) = (M = m ∣ K = k) is the probability that the maximum serial number is equal to m once k tanks have been observed but before the serial numbers have actually been observed.

The expression (m ∣ k) can be re-written in terms of the other quantities by marginalizing over all possible n.

\begin{aligned}(m \mid k) &= \sum_{n=0}^{\infty}(m,n \mid k) \\ &= \sum_{n=0}^{\infty}(m \mid n,k)\,(n \mid k)\end{aligned}

Prior probability of N knowing only K


We assume that k is fixed in advance, so that we do not have to consider any distribution over k. Thus, our prior can depend on k.

The expression

(n \mid k) = (N=n \mid K=k)

is the credibility that the total number of tanks, N, is equal to n when the number of tanks observed, K, is known to be k, but before the serial numbers have been observed. Assume that it is some discrete uniform distribution

(n \mid k) = (\Omega - k)^{-1}[k \le n][n < \Omega]

The upper limit Ω must be finite, because the function

f(n) = \lim_{\Omega \to \infty}(\Omega - k)^{-1}[k \le n][n < \Omega] = 0

is not a mass distribution function. Our result below will not depend on Ω.

Posterior probability of N knowing M and K


Provided that Ω > m, so that the prior is consistent with the observed data:

(n \mid m,k) = (m \mid n,k)\left(\sum_{n=m}^{\Omega-1}(m \mid n,k)\right)^{-1}[m \le n][n < \Omega]

As Ω → ∞, the summation approaches \sum_{n=m}^{\infty}(m \mid n,k) (which is finite if k ≥ 2). Thus, for suitably large Ω, we have

(n \mid m,k) \approx (m \mid n,k)\left(\sum_{n=m}^{\infty}(m \mid n,k)\right)^{-1}[m \le n]

For k ≥ 1, the mode of the distribution of the number of enemy tanks is m.

For k ≥ 2, the credibility that the number of enemy tanks is equal to n is

(N=n \mid m,k) = \frac{k-1}{k}\binom{m-1}{k-1}\binom{n}{k}^{-1}[m \le n]

The credibility that the number of enemy tanks, N, is greater than n is

(N>n \mid m,k) = \begin{cases}1 & \text{if } n < m \\[4pt] \dfrac{\binom{m-1}{k-1}}{\binom{n}{k-1}} & \text{if } n \ge m\end{cases}

Mean value and standard deviation


For k ≥ 3, N has the finite mean value:

(m-1)(k-1)(k-2)^{-1}

For k ≥ 4, N has the finite standard deviation:

(k-1)^{1/2}(k-2)^{-1}(k-3)^{-1/2}(m-1)^{1/2}(m+1-k)^{1/2}

These formulas are derived below.

Summation formula


The following binomial coefficient identity is used below for simplifying series relating to the German tank problem.

\sum_{n=m}^{\infty}\frac{1}{\binom{n}{k}} = \frac{k}{k-1}\,\frac{1}{\binom{m-1}{k-1}}

This sum formula is somewhat analogous to the integral formula

\int_{m}^{\infty}\frac{dn}{n^{k}} = \frac{1}{k-1}\,\frac{1}{m^{k-1}}

These formulas apply for k > 1.
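A numerical spot-check of the summation identity (truncating the infinite series at an arbitrary large bound):

```python
from math import comb

m, k = 60, 4
lhs = sum(1 / comb(n, k) for n in range(m, 10**5))   # truncated series
rhs = k / (k - 1) / comb(m - 1, k - 1)
assert abs(lhs - rhs) < 1e-12   # the neglected tail is vanishingly small
```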

One tank


Observing one tank randomly out of a population of n tanks gives the serial number m with probability 1/n for m ≤ n, and zero probability for m > n. Using Iverson bracket notation, this is written

(M=m \mid N=n, K=1) = (m \mid n) = \frac{[m \le n]}{n}

This is the conditional probability mass distribution function of m.

When considered a function of n for fixed m, this is a likelihood function:

\mathcal{L}(n) = \frac{[n \ge m]}{n}

The maximum likelihood estimate for the total number of tanks is N₀ = m, clearly a biased estimate, since the true number can be more than this, potentially many more, but cannot be fewer.

The marginal likelihood (i.e., marginalized over all models) is infinite, being a tail of the harmonic series:

\sum_{n}\mathcal{L}(n) = \sum_{n=m}^{\infty}\frac{1}{n} = \infty

but

\begin{aligned}\sum_{n}\mathcal{L}(n)[n<\Omega] &= \sum_{n=m}^{\Omega-1}\frac{1}{n} \\[5pt] &= H_{\Omega-1} - H_{m-1}\end{aligned}

where H_n is the harmonic number.

The credibility mass distribution function depends on the prior limit Ω:

(N=n \mid M=m, K=1) = (n \mid m) = \frac{[m \le n]}{n}\,\frac{[n<\Omega]}{H_{\Omega-1} - H_{m-1}}

The mean value ofN{\displaystyle N} is

\begin{aligned}\sum_{n}n\,(n \mid m) &= \sum_{n=m}^{\Omega-1}\frac{1}{H_{\Omega-1}-H_{m-1}} \\[5pt] &= \frac{\Omega-m}{H_{\Omega-1}-H_{m-1}} \\[5pt] &\approx \frac{\Omega-m}{\log\left(\frac{\Omega-1}{m-1}\right)}\end{aligned}
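Numerically, this mean grows without bound as the prior limit increases; a sketch with arbitrary illustrative values of Ω:

```python
# Posterior mean of N after observing a single tank with serial m = 60.
# It scales like (Omega - m) / log(Omega), so it diverges with Omega.
m = 60
for omega in (10**3, 10**4, 10**5):
    h = sum(1.0 / n for n in range(m, omega))   # H_{Omega-1} - H_{m-1}
    print(omega, round((omega - m) / h))        # grows with Omega
```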

Two tanks


If two tanks rather than one are observed, then the probability that the larger of the two observed serial numbers is equal to m is

(M=m \mid N=n, K=2) = (m \mid n) = [m \le n]\frac{m-1}{\binom{n}{2}}

When considered a function of n for fixed m, this is a likelihood function:

\mathcal{L}(n) = [n \ge m]\frac{m-1}{\binom{n}{2}}

The total likelihood is

\begin{aligned}\sum_{n}\mathcal{L}(n) &= (m-1)\sum_{n=m}^{\infty}\frac{1}{\binom{n}{2}} \\[4pt] &= (m-1)\cdot\frac{2}{2-1}\cdot\frac{1}{\binom{m-1}{2-1}} \\[4pt] &= 2\end{aligned}

and the credibility mass distribution function is

\begin{aligned}(N=n \mid M=m, K=2) = (n \mid m) &= \frac{\mathcal{L}(n)}{\sum_{n}\mathcal{L}(n)} \\[4pt] &= [n \ge m]\frac{m-1}{n(n-1)}\end{aligned}

The median \tilde{N} satisfies

\sum_{n}[n \ge \tilde{N}]\,(n \mid m) = \frac{1}{2}

so

\frac{m-1}{\tilde{N}-1} = \frac{1}{2}

and so the median is

\tilde{N} = 2m-1

but the mean value of N is infinite:

\mu = \sum_{n}n\,(n \mid m) = (m-1)\sum_{n=m}^{\infty}\frac{1}{n-1} = \infty

Many tanks


Credibility mass distribution function


The conditional probability that the largest of k observations taken from the serial numbers {1, ..., n} is equal to m is

(M=m \mid N=n, K=k \ge 2) = (m \mid n,k) = [m \le n]\frac{\binom{m-1}{k-1}}{\binom{n}{k}}

The likelihood function of n is the same expression:

\mathcal{L}(n) = [n \ge m]\frac{\binom{m-1}{k-1}}{\binom{n}{k}}

The total likelihood is finite for k ≥ 2:

\begin{aligned}\sum_{n}\mathcal{L}(n) &= \binom{m-1}{k-1}\sum_{n=m}^{\infty}\frac{1}{\binom{n}{k}} \\ &= \binom{m-1}{k-1}\cdot\frac{k}{k-1}\cdot\frac{1}{\binom{m-1}{k-1}} \\ &= \frac{k}{k-1}\end{aligned}

The credibility mass distribution function is

\begin{aligned}(N=n \mid M=m, K=k \ge 2) = (n \mid m,k) &= \frac{\mathcal{L}(n)}{\sum_{n}\mathcal{L}(n)} \\ &= [n \ge m]\frac{k-1}{k}\frac{\binom{m-1}{k-1}}{\binom{n}{k}} \\ &= [n \ge m]\frac{m-1}{n}\frac{\binom{m-2}{k-2}}{\binom{n-1}{k-1}} \\ &= [n \ge m]\frac{m-1}{n}\frac{m-2}{n-1}\frac{k-1}{k-2}\frac{\binom{m-3}{k-3}}{\binom{n-2}{k-2}}\end{aligned}

The complementary cumulative distribution function is the credibility that N > x:

\begin{aligned}(N>x \mid M=m, K=k) &= \begin{cases}1 & \text{if } x < m \\ \sum_{n=x+1}^{\infty}(n \mid m,k) & \text{if } x \ge m\end{cases} \\[4pt] &= [x<m] + [x \ge m]\sum_{n=x+1}^{\infty}\frac{k-1}{k}\frac{\binom{m-1}{k-1}}{\binom{n}{k}} \\[4pt] &= [x<m] + [x \ge m]\frac{k-1}{k}\binom{m-1}{k-1}\sum_{n=x+1}^{\infty}\frac{1}{\binom{n}{k}} \\[4pt] &= [x<m] + [x \ge m]\frac{k-1}{k}\binom{m-1}{k-1}\cdot\frac{k}{k-1}\frac{1}{\binom{x}{k-1}} \\[4pt] &= [x<m] + [x \ge m]\frac{\binom{m-1}{k-1}}{\binom{x}{k-1}}\end{aligned}

The cumulative distribution function is the credibility that N ≤ x:

\begin{aligned}(N \le x \mid M=m, K=k) &= 1 - (N>x \mid M=m, K=k) \\[4pt] &= [x \ge m]\left(1 - \frac{\binom{m-1}{k-1}}{\binom{x}{k-1}}\right)\end{aligned}
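The closed-form tail makes Bayesian credible bounds immediate; the sketch below (illustrative only) finds the 95% upper bound for the running example (m = 60, k = 4):

```python
from math import comb

m, k = 60, 4

def tail(x: int) -> float:
    # Pr(N > x | m, k) = C(m-1, k-1) / C(x, k-1) for x >= m
    return comb(m - 1, k - 1) / comb(x, k - 1)

x = m
while tail(x) > 0.05:   # smallest x with Pr(N <= x) >= 0.95
    x += 1
print(x)                # 159: a 95% upper credible bound on N
```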

Order of magnitude


The order of magnitude of the number of enemy tanks is

\begin{aligned}\mu &= \sum_{n}n\,(N=n \mid M=m, K=k) \\[4pt] &= \sum_{n}n[n \ge m]\frac{m-1}{n}\frac{\binom{m-2}{k-2}}{\binom{n-1}{k-1}} \\[4pt] &= (m-1)\binom{m-2}{k-2}\sum_{n=m}^{\infty}\frac{1}{\binom{n-1}{k-1}} \\[4pt] &= (m-1)\binom{m-2}{k-2}\cdot\frac{k-1}{k-2}\frac{1}{\binom{m-2}{k-2}} \\[4pt] &= (m-1)\frac{k-1}{k-2}\end{aligned}

Statistical uncertainty


The statistical uncertainty is the standard deviation σ, satisfying the equation

\sigma^{2} + \mu^{2} = \sum_{n}n^{2}\,(N=n \mid M=m, K=k)

So

\begin{aligned}\sigma^{2} + \mu^{2} - \mu &= \sum_{n}n(n-1)\,(N=n \mid M=m, K=k) \\[4pt] &= \sum_{n=m}^{\infty}n(n-1)\frac{m-1}{n}\frac{m-2}{n-1}\frac{k-1}{k-2}\frac{\binom{m-3}{k-3}}{\binom{n-2}{k-2}} \\[4pt] &= (m-1)(m-2)\frac{k-1}{k-2}\binom{m-3}{k-3}\sum_{n=m}^{\infty}\frac{1}{\binom{n-2}{k-2}} \\[4pt] &= (m-1)(m-2)\frac{k-1}{k-2}\binom{m-3}{k-3}\cdot\frac{k-2}{k-3}\frac{1}{\binom{m-3}{k-3}} \\[4pt] &= (m-1)(m-2)\frac{k-1}{k-3}\end{aligned}

and

\begin{aligned}\sigma &= \sqrt{(m-1)(m-2)\frac{k-1}{k-3} + \mu - \mu^{2}} \\[4pt] &= \sqrt{\frac{(k-1)(m-1)(m-k+1)}{(k-3)(k-2)^{2}}}\end{aligned}

The variance-to-mean ratio is simply

\frac{\sigma^{2}}{\mu} = \frac{m-k+1}{(k-3)(k-2)}


Notes

  a. An Armored Ground Forces policy statement of November 1943 concluded: "The recommendation of a limited proportion of tanks carrying a 90 mm gun is not concurred in for the following reasons: The M4 tank has been hailed widely as the best tank of the battlefield today. ... There appears to be no fear on the part of our forces of the German Mark VI (Tiger) tank. There can be no basis for the T26 tank other than the conception of a tank-vs.-tank duel – which is believed to be unsound and unnecessary."[1]
  b. The lower bound was unknown, but to simplify the discussion, this detail is generally omitted, taking the lower bound as known to be 1.
  c. Ruggles & Brodie is largely a practical analysis and summary, not a mathematical one – the estimation problem is only mentioned in footnote 3 on page 82, where they estimate the maximum as "sample maximum + average gap".
  d. As discussed in birthday attack, one can expect a collision after 1.25√H numbers, if choosing from H possible outputs. This square root corresponds to half the digits. For example, in any base, the square root of a number with 100 digits is approximately a number with 50 digits.
  e. In a continuous distribution, there is no −1 term.
  f. Given a particular set of observations, this set is most likely to occur if the population maximum is the sample maximum, not a higher value (it cannot be lower).
  g. The sample maximum is never more than the population maximum, but can be less; hence it is a biased estimator: it will tend to underestimate the population maximum.
  h. For example, the gap between 2 and 7 is (7 − 2) − 1 = 4, consisting of 3, 4, 5, and 6.

References

  1. AGF policy statement. Chief of Staff, AGF. November 1943. MHI.
  2. Ruggles & Brodie 1947, pp. 73–74.
  3. "Gavyn Davies does the maths – How a statistical formula won the war". The Guardian. 20 July 2006. Retrieved 6 July 2014.
  4. Matthews, Robert (23 May 1998). "Data sleuths go to war, sidebar in feature 'Hidden truths'". New Scientist. Archived from the original on 18 April 2001.
  5. Carruthers, Bob (1 March 2012). Panther V in Combat. Coda Books. pp. 94–. ISBN 978-1-908538-15-4.
  6. Ruggles & Brodie 1947, pp. 82–83.
  7. Ruggles & Brodie 1947, p. 89.
  8. Ruggles & Brodie 1947, pp. 90–91.
  9. Volz 2008.
  10. Johnson 1994.
  11. "How many Commodore 64 computers were really sold?". pagetable.com. 1 February 2011. Archived from the original on 6 March 2016. Retrieved 6 July 2014.
  12. "Rockets and Missiles". www.spaceline.org.
  13. Joyce, Smart. "German Tank Problem". Logan High School. Archived from the original on 24 April 2012. Retrieved 8 July 2014.
  14. Simon, Cory (2023). "A Bayesian Treatment of the German Tank Problem". The Mathematical Intelligencer. 46 (2): 117–127. arXiv:2301.00046. doi:10.1007/s00283-023-10274-6. PMC 11147940. PMID 38841650.
  15. Höhle, M.; Held, L. (2006). "Bayesian Estimation of the Size of a Population" (PDF). Technical Report SFB 386, No. 399, Department of Statistics, University of Munich. Retrieved 17 April 2016.

Works cited

Johnson, Roger (1994). "Estimating the Size of a Population". Teaching Statistics. 16 (2): 50–52.
Ruggles, Richard; Brodie, Henry (1947). "An Empirical Approach to Economic Intelligence in World War II". Journal of the American Statistical Association. 42 (237): 72–91.
Volz, Arthur G. (2008). "A Soviet Estimate of German Tank Production". The Journal of Slavic Military Studies. 21 (3): 588–590.