uclnlp/pycodesuggest

Learning to Auto-Complete using RNN Language Models

This repository contains the code used in the paper "Learning Python Code Suggestion with a Sparse Pointer Network".

Prerequisites

Generating the Corpus

Step 1: Cloning the Repos

To recreate the corpus used in the paper, run:

python3 github-scraper/scraper.py --mode=recreate --outdir=<PATH-TO-OUTPUT-DIR> --dbfile=/FULL/PATH/TO/pycodesuggest/data/cloned_repos.dat --githubuser=<GITHUB USERNAME>

Where outdir is the path on your local machine where the repos will be cloned. Note that the dbfile path should be the full path on your machine. You may be prompted for your GitHub password.
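For example, assuming the repos should be cloned to /data/pycorpus, the repository was checked out to /home/user/pycodesuggest and your GitHub username is octocat (all hypothetical values), the invocation would look like:

python3 github-scraper/scraper.py --mode=recreate --outdir=/data/pycorpus --dbfile=/home/user/pycodesuggest/data/cloned_repos.dat --githubuser=octocat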


To obtain a fresh corpus based on a new search of GitHub, using the same criteria as the paper, run:

python3 github-scraper/scraper.py --mode=new --outdir=<PATH-TO-OUTPUT-DIR> --dbfile=cloned_repos.dat --githubuser=<GITHUB USERNAME>

Note that you may interrupt the process and continue where it left off later by providing the same dbfile.
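The resume behaviour can be pictured with the following minimal sketch. It is illustrative only, not the scraper's actual code: the one-repo-per-line dbfile format and the clone_repo helper are assumptions.

import os
import subprocess

def load_done(dbfile):
    # Repos already cloned, one URL per line (assumed record format).
    if not os.path.exists(dbfile):
        return set()
    with open(dbfile) as f:
        return {line.strip() for line in f if line.strip()}

def clone_repo(url, outdir):
    # Hypothetical helper standing in for the scraper's clone step.
    subprocess.run(["git", "clone", url], cwd=outdir, check=True)

def clone_all(repo_urls, dbfile, outdir):
    done = load_done(dbfile)
    for url in repo_urls:
        if url in done:
            continue  # already cloned before the interruption
        clone_repo(url, outdir)
        with open(dbfile, "a") as f:
            f.write(url + "\n")  # record progress so a rerun can resume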


There are a number of other parameters that allow you to create your own custom corpus, specifying, amongst others, the programming language or search term used to query GitHub. Run python3 github-scraper/scraper.py -h for more information.

Step 2 (optional): Remove unnecessary files

Linux/Mac OS: Run the following command in your output directory to remove non-Python files:

find . -type f ! -name "*.py" -delete
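On platforms without find, the same cleanup can be done with a short Python script; a minimal sketch:

import os

def remove_non_python(root):
    """Delete every regular file under root that does not end in .py."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                os.remove(os.path.join(dirpath, name))

remove_non_python("<PATH-TO-OUTPUT-DIR>")  # the output directory from step 1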

Step 3: Normalisation

Run the following command to normalise all files with a .py extension by providing the output directory of step 1 as the path. The normalised files will be written to a new directory with "normalised" appended to the path.

python3 github-scraper/normalisation.py --path=<PATH TO DOWNLOADED CORPUS>

Files which can't be parsed as valid Python 3 will be ignored. The list of successfully processed files is written to PATH/processed.txt, which also allows the normalisation to continue if interrupted.
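The parse check and the processed.txt bookkeeping described above can be approximated as follows. This is an illustrative sketch, not the code in normalisation.py; the actual identifier normalisation is elided:

import ast
import os

def normalise_corpus(path):
    log_path = os.path.join(path, "processed.txt")
    processed = set()
    if os.path.exists(log_path):
        with open(log_path) as f:
            processed = {line.strip() for line in f}

    for dirpath, _dirs, filenames in os.walk(path):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            filepath = os.path.join(dirpath, name)
            if filepath in processed:
                continue  # resume: skip files handled before an interruption
            try:
                with open(filepath, encoding="utf-8") as f:
                    source = f.read()
                ast.parse(source)  # skip files that are not valid Python 3
            except (SyntaxError, ValueError, UnicodeDecodeError):
                continue
            # ... normalise the file and write it to the "normalised" copy ...
            with open(log_path, "a") as f:
                f.write(filepath + "\n")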

Step 4: Split into train/dev/test

To use the same train/dev/test split as used in the paper, copy the files train_files.txt, valid_files.txt and test_files.txt from the data directory into the downloaded corpus and normalised corpus directories.


To generate a new split, run the following command, which generates the list of train files (train_files.txt), validation files (valid_files.txt) and test files (test_files.txt) in the ratio 0.5/0.2/0.3. Use the normalised path from the previous step; this ensures that the list of files is available in both the normalised and unnormalised data sets.

python3 github-scraper/processFiles.py --path=<PATH TO NORMALISED CORPUS>

Then copy the three generated lists to the original un-normalised path.
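A fresh split of this kind can be pictured with the sketch below. It is illustrative only, not the actual processFiles.py code; only the 0.5/0.2/0.3 ratio and the three output file names are taken from the description above:

import os
import random

def split_corpus(path, ratios=(0.5, 0.2, 0.3)):
    files = sorted(
        os.path.relpath(os.path.join(dirpath, name), path)
        for dirpath, _dirs, names in os.walk(path)
        for name in names if name.endswith(".py")
    )
    random.shuffle(files)
    n_train = int(len(files) * ratios[0])
    n_valid = int(len(files) * ratios[1])
    splits = {
        "train_files.txt": files[:n_train],
        "valid_files.txt": files[n_train:n_train + n_valid],
        "test_files.txt": files[n_train + n_valid:],
    }
    for filename, subset in splits.items():
        with open(os.path.join(path, filename), "w") as f:
            f.write("\n".join(subset) + "\n")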

Citing

If you make use of this code or the Python corpus, please cite:

@article{pycodesuggest,
  author    = {Avishkar Bhoopchand and
               Tim Rockt{\"{a}}schel and
               Earl Barr and
               Sebastian Riedel},
  title     = {Learning Python Code Suggestion with a Sparse Pointer Network},
  year      = {2016},
  url       = {http://arxiv.org/abs/1611.08307}
}
