Magdalena Price's Thesis Project for Masters of Engineering, MIT 2022
Algorithmic-Alignment-Lab/OpenCodingForMachineLearning
Open Coding for Machine Learning is an annotation interface that allows a single annotator to efficiently and effectively devise labels and descriptions for a large, unlabeled dataset.
This interface requires the installation of npm (macOS example, Ubuntu example), Flask, and Hugging Face Transformers.
Homebrew is recommended for macOS users.
To start, open your terminal and navigate into the OpenCodingForMachineLearning directory. The commands cd and ls may be helpful, as well as this guide.
When finished, the command ls should list the five main directories of this project (data, interface, results, server, and training), in addition to a few other files.
To make installing libraries and running the application smoother, we provide three shell scripts: setup.sh, opencoding.sh, and shutdown.sh. These files likely already have read permissions, but they also need execute permissions before you can run them. More information about permissions can be found here.
Run the following commands to add execute permissions to the .sh files:
$ chmod +x setup.sh
$ chmod +x opencoding.sh
$ chmod +x shutdown.sh
Next, install npm (macOS example, Ubuntu example). Follow the instructions in the links above, or use your preferred installation method for your machine.
Complete the remaining installations by running the following command in the terminal from within the OpenCodingForMachineLearning directory:
$ ./setup.sh
If you have an M1 chip, you may run into issues installing the necessary dependencies for the server part of the project; see the README.md file within server for troubleshooting guidelines.
If any issues arise with installations, you may also consult the Development Instructions in the README.md files of the training, server, and interface directories (in that order).
To run the application, type the following command in the terminal from within the OpenCodingForMachineLearning directory and press enter:
$ ./opencoding.sh
Then, navigate to http://localhost:3000/. Note that you may have to replace 'localhost' with your computer's IP address.
You should see the introduction page!
HappyDB is already available for annotation and label creation, along with a few other datasets. If you would like to upload your own dataset, please see Using Personal Data in the README.md file in the data directory.
When you're done using the application, close the http://localhost:3000/ tab and enter the following command in your terminal:
$ ./shutdown.sh
If you aren't able to type into your terminal, you may have to click on the terminal and press enter first.
If you accidentally close the terminal before shutting down the application, open a new terminal, navigate back to OpenCodingForMachineLearning, and run the shutdown command again.
This repository is split into five main sections: data, interface, results, server, and training. Each section has a README detailing its implementation further; a short summary of each is given here.
The data directory contains the csv files loaded by the annotation interface. Each csv file must follow the format below, or it cannot be processed:
ID,TEXT
0,DATASET_TITLE
1,TEXT_1
...,...
N,TEXT_N
By default, a cleaned version of HappyDB is provided.
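As a minimal sketch, assuming the layout above means an ID,TEXT header, then the dataset title stored as ID 0, then the entries numbered 1 to N, a dataset file could be produced with Python's standard csv module. The helpers dataset_rows and write_dataset are hypothetical and not part of the repository.

```python
import csv


def dataset_rows(title, texts):
    """Build rows in the assumed ID,TEXT layout.

    Hypothetical helper: the header comes first, the dataset title is
    stored as ID 0, and the entries are numbered 1..N in the order given.
    """
    rows = [["ID", "TEXT"], [0, title]]
    rows.extend([i, text] for i, text in enumerate(texts, start=1))
    return rows


def write_dataset(path, title, texts):
    """Write the rows to `path`; newline="" is what the csv module expects."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # csv.writer quotes any text that contains commas, quotes, or newlines.
        csv.writer(f).writerows(dataset_rows(title, texts))
```

For example, write_dataset("data/my_dataset.csv", "My Dataset", ["first entry", "second entry"]) would write a three-entry file (title plus two texts). Because entries often contain commas, the writer relies on standard csv quoting; this assumes the server's csv reader accepts quoted fields.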
To use your own data, add any csv file in the format specified above to the data folder. Note that each entry must have a unique ID.
Then, quit the application (./shutdown.sh) and re-run it (./opencoding.sh). On the introduction page, the dropdown should now include your DATASET_TITLE.
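Since a file that breaks these rules cannot be processed, it may help to check a custom csv before restarting the application. The sketch below assumes the rules are exactly those stated above: an ID,TEXT header, a first row with ID 0 holding the title, two fields per row, and unique IDs. validate_rows and validate_file are hypothetical helpers, not part of the repository, and the real loader in the server directory may be stricter or more lenient.

```python
import csv


def validate_rows(rows):
    """Return a list of problems found in rows parsed from a dataset csv.

    Assumed rules (taken from the format described above): the header is
    exactly ID,TEXT, the first data row has ID 0 and holds the dataset
    title, every row has exactly two fields, and no ID appears twice.
    """
    rows = list(rows)
    problems = []
    if not rows or rows[0] != ["ID", "TEXT"]:
        problems.append("header must be ID,TEXT")
    body = rows[1:]
    if not body or not body[0] or body[0][0].strip() != "0":
        problems.append("first data row must have ID 0 and the dataset title")
    seen = set()
    # Row numbers count the header as row 1.
    for row_no, row in enumerate(body, start=2):
        if len(row) != 2:
            problems.append(f"row {row_no}: expected 2 fields, got {len(row)}")
            continue
        row_id = row[0].strip()
        if row_id in seen:
            problems.append(f"row {row_no}: duplicate ID {row_id!r}")
        seen.add(row_id)
    return problems


def validate_file(path):
    """Parse a csv file and validate it; utf-8-sig strips a leading BOM."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return validate_rows(csv.reader(f))
```

An empty list from validate_file("data/my_dataset.csv") means the file passed these particular checks; it does not guarantee the application will accept it.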
The interface directory contains all code needed to load the webpage; it was bootstrapped with Create React App.
The results directory is where your final, labeled csv file will be saved.
The server directory contains all code for storing, loading, and generating data for the website. The local database is built using SQLite, and the Python backend communicates with the JavaScript frontend through Flask.
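The server's actual schema is not described in this summary, so the following is only a hypothetical sketch of how rows from a dataset csv could be kept in a local SQLite database using Python's built-in sqlite3 module. The table name texts, its columns, and the helpers load_dataset, load_csv_file, and get_text are assumptions for illustration; the Flask layer that would serve this data to the frontend is omitted.

```python
import csv
import sqlite3


def load_dataset(conn, rows):
    """Store dataset rows (header first) in a hypothetical `texts` table."""
    body = list(rows)[1:]  # skip the ID,TEXT header row
    with conn:  # commit on success, roll back if any insert fails
        conn.execute(
            "CREATE TABLE IF NOT EXISTS texts "
            "(id TEXT PRIMARY KEY, text TEXT NOT NULL)"
        )
        # PRIMARY KEY makes SQLite reject a repeated ID with IntegrityError.
        conn.executemany("INSERT INTO texts (id, text) VALUES (?, ?)", body)


def load_csv_file(conn, path):
    """Read a dataset csv from disk and load it into the database."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        load_dataset(conn, csv.reader(f))


def get_text(conn, text_id):
    """Return the text stored under `text_id`, or None if it is absent."""
    row = conn.execute("SELECT text FROM texts WHERE id = ?", (text_id,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    load_dataset(conn, [["ID", "TEXT"], ["0", "HappyDB"], ["1", "I went for a walk."]])
    print(get_text(conn, "1"))
```

Declaring id as the primary key lets the database itself enforce the unique-ID requirement on the csv files, instead of relying only on a separate check.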
The training directory contains all classification models and model training code.