Movatterモバイル変換


[0]ホーム

URL:


Skip to content
Search Gists
Sign in Sign up

Instantly share code, notes, and snippets.

@juanpabloaj
Last activeJune 18, 2018 13:02
    • Star(13)You must be signed in to star a gist
    • Fork(1)You must be signed in to fork a gist
    Save juanpabloaj/dffc6900f80abcfe8ce121a39cffa743 to your computer and use it in GitHub Desktop.
    Total of pip packages downloaded, separated by Python versions

    From June 26, 2016 (python 3.5.2 release) to Aug. 31, 2016.

    Python versions from 2.6 to 3.5

    downloads_by_versions

    Without 2.7

    downloads_by_versions_without_2.7

    ###Python packages downloads by major version

    Imgur

    -- https://bigquery.cloud.google.com/dataset/the-psf:pypi
    SELECT concat(
    date(timestamp),'_', REGEXP_EXTRACT(details.python, r'^([2-3]).[0-9].')
    )as date_python,count(details.python)as downloads
    FROM (TABLE_DATE_RANGE([the-psf:pypi.downloads],
    TIMESTAMP('2016-06-26'),
    TIMESTAMP('2016-08-31')))
    group by date_python
    -- https://bigquery.cloud.google.com/dataset/the-psf:pypi
    -- https://bigquery.cloud.google.com/table/the-psf:pypi.downloads20160903
    SELECT concat(
    date(timestamp),'_', REGEXP_EXTRACT(details.python, r'^([2-3].[0-9]).')
    )as date_python,count(details.python)as downloads
    FROM (TABLE_DATE_RANGE([the-psf:pypi.downloads],
    TIMESTAMP('2016-06-26'),
    TIMESTAMP('2016-08-31')))
    group by date_python
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    # To plot chart from csv generated by bigquery
    importpandasaspd
    importmatplotlib.pyplotasplt
    plt.figure()
    ts=pd.read_csv('download_python_version_by_day.csv')
    ts['date']=pd.to_datetime(ts['date'])
    df=ts.pivot(index='date',columns='python',values='downloads')
    #df.plot()
    #df[[2.6, 2.7, 3.1, 3.2, 3.3, 3.4, 3.5]].plot()
    df[[2.6,3.1,3.2,3.3,3.4,3.5]].plot()
    plt.show()
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    importpandasaspd
    importmatplotlib.pyplotasplt
    ts=pd.read_csv(
    'download_python_major_version_by_day.csv',parse_dates=True,
    )
    ts['date']=pd.to_datetime(ts['date'])
    df=ts.pivot(index='date',columns='python',values='downloads')
    ax=df[[2,3]].plot(logy=True,figsize=(12,9))
    ax.set_ylabel('log(downloads)')
    ax.set_title('Python packages downloads')
    plt.show()
    @PeridexisErrant
    Copy link

    Try a log scale on the y-axis, it's as simple asdf.plot(logy=True) (docs)

    @rhiever
    Copy link

    rhiever commentedSep 3, 2016
    edited
    Loading

    Here's a log scale version of the same data.

    python-pip-downloads

    @rhiever
    Copy link

    I also think it's important for this analysis to filter down topip installs:

    SELECT  CONCAT(DATE(timestamp),'_', REGEXP_EXTRACT(details.python, r'^([2-3].[0-9]).') )AS date_python,COUNT(details.python)AS downloadsFROM (TABLE_DATE_RANGE([the-psf:pypi.downloads],TIMESTAMP('2016-06-26'),TIMESTAMP('2016-08-31')))WHEREdetails.installer.nameLIKE'pip'GROUP BY  date_python

    although it doesn't have a huge impact on the results.

    @kootenpv
    Copy link

    kootenpv commentedSep 4, 2016
    edited
    Loading

    Also, if you just want a relative comparison (rather than absolute), it might also be better to addWHERE ... AND details.cpu IS NOT NULL; more chance to get "actual" people installs rather than bots that are just mirroring (which might have more chance to be 2.7)

    Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

    [8]ページ先頭

    ©2009-2025 Movatter.jp