numpy.histogram(a, bins=10, range=None, normed=False, weights=None, density=None)

Compute the histogram of a set of data.
Parameters

a : array_like
    Input data. The histogram is computed over the flattened array.
bins : int or sequence of scalars or str, optional
    If an int, it defines the number of equal-width bins in the given
    range (10, by default). If a sequence, it defines the bin edges,
    including the rightmost edge, allowing for non-uniform bin widths.
    If a string from the list of estimation methods described in the
    Notes, the optimal bin width, and hence the number of bins, is
    computed from the data.
range : (float, float), optional
    The lower and upper range of the bins. If not provided, range is
    simply (a.min(), a.max()). Values outside the range are ignored.
normed : bool, optional
    Deprecated; use density instead. With unequal bin widths this
    keyword gives an incorrect result.
weights : array_like, optional
    An array of weights, of the same shape as a. Each value in a only
    contributes its associated weight towards the bin count (instead
    of 1).
density : bool, optional
    If True, the result is the value of the probability density
    function at the bin, normalized such that the integral over the
    range is 1. Note that the sum of the histogram values will not be
    equal to 1 unless bins of unity width are chosen; it is not a
    probability mass function.

Returns

hist : array
    The values of the histogram. See density and weights for a
    description of the possible semantics.
bin_edges : array of dtype float
    The bin edges (length(hist) + 1).
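As a quick illustration of the weights parameter (sample values chosen for illustration, not taken from the original docs): each sample contributes its weight to its bin instead of 1.

```python
import numpy as np

# Samples 1 and 1 fall in the bin [1, 2), so it accumulates 0.5 + 0.25;
# sample 2 falls in the last (closed) bin [2, 3] and contributes 1.0.
hist, edges = np.histogram([1, 2, 1], bins=[0, 1, 2, 3],
                           weights=[0.5, 1.0, 0.25])
print(hist)   # weighted counts per bin
print(edges)
```

With no weights the same call would return integer counts [0, 2, 1]; weights simply replace each sample's unit contribution.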
See also

histogramdd, bincount, searchsorted, digitize
Notes
All but the last (righthand-most) bin is half-open. In other words, if bins is:

    [1, 2, 3, 4]

then the first bin is [1, 2) (including 1, but excluding 2) and the second is [2, 3). The last bin, however, is [3, 4], which includes 4.
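The edge behaviour described above can be checked directly; this is a small sketch, not part of the original examples:

```python
import numpy as np

# 2 sits on an interior edge, so it lands in the half-open bin [2, 3);
# 4 sits on the final edge, so it lands in the closed last bin [3, 4].
hist_mid, _ = np.histogram([2], bins=[1, 2, 3, 4])
hist_end, _ = np.histogram([4], bins=[1, 2, 3, 4])
print(hist_mid)  # -> [0 1 0]
print(hist_end)  # -> [0 0 1]
```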
New in version 1.11.0.
The methods to estimate the optimal number of bins are well founded in literature, and are inspired by the choices R provides for histogram visualisation. Note that having the number of bins proportional to n^(1/3) is asymptotically optimal, which is why it appears in most estimators. These are simply plug-in methods that give good starting points for number of bins. In the equations below, h is the binwidth and n_h is the number of bins. All estimators that compute bin counts are recast to bin width using the ptp of the data. The final bin count is obtained from ``np.round(np.ceil(range / h))``.
'Auto' (maximum of the 'Sturges' and 'FD' estimators)

    A compromise to get a good value. For small datasets the Sturges
    value will usually be chosen, while larger datasets will usually
    default to FD. Avoids the overly conservative behaviour of FD and
    Sturges for small and large datasets respectively.

'FD' (Freedman Diaconis Estimator)

    h = 2 * IQR * n^(-1/3)

    The binwidth is proportional to the interquartile range (IQR) and
    inversely proportional to cube root of a.size. Can be too
    conservative for small datasets, but is quite good for large
    datasets. The IQR is very robust to outliers.

'Scott'

    h = sigma * (24 * sqrt(pi) / n)^(1/3)

    The binwidth is proportional to the standard deviation of the data
    and inversely proportional to cube root of x.size. Can be too
    conservative for small datasets, but is quite good for large
    datasets. The standard deviation is not very robust to outliers.
    Values are very similar to the Freedman-Diaconis estimator in the
    absence of outliers.

'Rice'

    n_h = 2 * n^(1/3)

    The number of bins is only proportional to cube root of a.size. It
    tends to overestimate the number of bins and it does not take into
    account data variability.

'Sturges'

    n_h = log2(n) + 1

    The number of bins is the base 2 log of a.size. This estimator
    assumes normality of data and is too conservative for larger,
    non-normal datasets. This is the default method in R's hist method.

'Doane'

    n_h = 1 + log2(n) + log2(1 + |g_1| / sigma_{g_1})
    g_1 = mean[((x - mu) / sigma)^3]
    sigma_{g_1} = sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))

    An improved version of Sturges' formula that produces better
    estimates for non-normal datasets. This estimator attempts to
    account for the skew of the data.

'Sqrt'

    n_h = sqrt(n)

    The simplest and fastest estimator. Only takes into account the
    data size.
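To make the recast from binwidth to bin count concrete, here is a sketch that reproduces the 'FD' bin count by hand, assuming the IQR is the gap between the 25th and 75th percentiles (as in NumPy's implementation):

```python
import numpy as np

rng = np.random.RandomState(0)
a = rng.normal(size=1000)

# Freedman-Diaconis binwidth: h = 2 * IQR * n^(-1/3)
iqr = np.percentile(a, 75) - np.percentile(a, 25)
h = 2 * iqr * a.size ** (-1.0 / 3.0)

# Recast the binwidth to a bin count using the ptp (max - min) of the data.
n_bins = int(np.ceil(a.ptp() / h))

hist, edges = np.histogram(a, bins='fd')
print(n_bins, len(edges) - 1)  # the two counts should agree
```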
Examples
>>> np.histogram([1, 2, 1], bins=[0, 1, 2, 3])
(array([0, 2, 1]), array([0, 1, 2, 3]))
>>> np.histogram(np.arange(4), bins=np.arange(5), density=True)
(array([ 0.25,  0.25,  0.25,  0.25]), array([0, 1, 2, 3, 4]))
>>> np.histogram([[1, 2, 1], [1, 0, 1]], bins=[0, 1, 2, 3])
(array([1, 4, 1]), array([0, 1, 2, 3]))
>>> a = np.arange(5)
>>> hist, bin_edges = np.histogram(a, density=True)
>>> hist
array([ 0.5,  0. ,  0.5,  0. ,  0. ,  0.5,  0. ,  0.5,  0. ,  0.5])
>>> hist.sum()
2.4999999999999996
>>> np.sum(hist * np.diff(bin_edges))
1.0
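The example above uses equal-width bins; with unequal bins the distinction matters more. The following sketch (bin edges chosen for illustration) shows that with density=True each value is normalized by its own bin's width, so the integral is 1 even though the plain sum is not:

```python
import numpy as np

# Unequal bins: [0, 1), [1, 3), [3, 4]. Counts are [0, 2, 2] out of n = 4,
# and density divides each count by n * bin_width.
hist, edges = np.histogram([1, 2, 3, 4], bins=[0, 1, 3, 4], density=True)
widths = np.diff(edges)
print(hist)                   # -> [ 0.    0.25  0.5 ]
print(np.sum(hist * widths))  # -> 1.0
```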
New in version 1.11.0.
Automated Bin Selection Methods example, using 2 peak random data with 2000 points:
>>> import matplotlib.pyplot as plt
>>> rng = np.random.RandomState(10)  # deterministic random data
>>> a = np.hstack((rng.normal(size=1000),
...                rng.normal(loc=5, scale=2, size=1000)))
>>> plt.hist(a, bins='auto')  # arguments are passed to np.histogram
>>> plt.title("Histogram with 'auto' bins")
>>> plt.show()
