OpenRefine | |
---|---|
Developer(s) | Freebase, then Google, now open-source community |
Initial release | November 10, 2010 |
Written in | Java[2] |
Platform | Microsoft Windows, Linux, macOS |
Available in | English, Italian, Chinese, Japanese, French, German |
License | BSD License |
Website | openrefine |
OpenRefine is an open-source desktop application for data cleanup and transformation to other formats, an activity commonly known as data wrangling.[3] It is similar to spreadsheet applications and can handle spreadsheet file formats such as CSV, but it behaves more like a database.
It operates on rows of data with cells under columns, much as relational database tables do. An OpenRefine project consists of one table, whose rows can be filtered using facets that define criteria (for example, showing only rows where a given column is not empty).
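The row-filtering idea can be sketched in Python. This is a hypothetical illustration, not OpenRefine's implementation: a table is assumed to be a list of dictionaries mapping column names to cell strings, each facet is modelled as a predicate over a row, and the names `not_empty` and `visible_rows` are invented here. Combining active facets with AND is likewise an assumption of this sketch.

```python
from typing import Callable

Row = dict[str, str]
Facet = Callable[[Row], bool]  # hypothetical model: a facet is a row predicate


def not_empty(column: str) -> Facet:
    """Facet keeping rows whose cell in `column` is not the empty string."""
    return lambda row: row.get(column, "") != ""


def visible_rows(rows: list[Row], facets: list[Facet]) -> list[Row]:
    """Rows that pass every active facet (combined with AND in this sketch)."""
    return [row for row in rows if all(facet(row) for facet in facets)]


table = [
    {"name": "Alice", "city": "Paris"},
    {"name": "Bob", "city": ""},
    {"name": "Carol", "city": "Rome"},
]
print([row["name"] for row in visible_rows(table, [not_empty("city")])])
```

With no facets active, `visible_rows` returns every row, so any later operation would apply to the whole table.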
Unlike in spreadsheets, most operations in OpenRefine are applied to all visible rows at once, for example, the transformation of all cells in all rows under one column,[4] or the creation of a new column based on existing data. Actions performed on a dataset are stored in the project and can be 'replayed' on other datasets. Formulas are not stored in cells; instead, they are used to transform the data, and each transformation is applied only once.[5] Formula expressions can be written in General Refine Expression Language (GREL),[6] in Jython (i.e., Python), and in Clojure.[7]
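The following Python sketch illustrates recording column operations and replaying them on another dataset. It is a simplified, hypothetical model: a plain Python function stands in for a GREL, Jython, or Clojure formula, `TransformColumn` and `replay` are invented names, and OpenRefine's own operation-history format is not modelled. As in the text, only the formula's result ends up in each cell.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, str]
Visible = Callable[[Row], bool]


@dataclass(frozen=True)
class TransformColumn:
    """A recorded operation: rewrite every visible cell in one column."""

    column: str
    expression: Callable[[str], str]  # stand-in for a GREL or Jython formula

    def apply(self, rows: list[Row], visible: Visible) -> None:
        for row in rows:
            if visible(row):
                # The formula is evaluated once; only its result is kept.
                row[self.column] = self.expression(row[self.column])


def replay(history: list[TransformColumn], rows: list[Row],
           visible: Visible = lambda row: True) -> list[Row]:
    """Re-apply a saved operation history, in order, to a dataset."""
    for op in history:
        op.apply(rows, visible)
    return rows


history = [TransformColumn("name", lambda v: v.strip().title())]
project_a = replay(history, [{"name": "  alice "}, {"name": "BOB"}])
project_b = replay(history, [{"name": "carol  "}])
print(project_a, project_b)
```

Because each operation records what to do to a column rather than a per-cell formula, the same `history` list applies unchanged to `project_b`.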
The program operates as a local web app: it starts a web server and opens the default browser to 127.0.0.1:3333.
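Because the user interface is served over HTTP, a script can check whether the local server is up before opening it in a browser. The sketch below assumes the default address from the text, `127.0.0.1:3333`; the helper names are hypothetical, and `server_is_up` only confirms that some process accepts TCP connections on that port, not that the process is OpenRefine.

```python
import socket
import webbrowser

# Default address from the text; a differently configured port is not detected.
HOST, PORT = "127.0.0.1", 3333


def server_is_up(host: str = HOST, port: int = PORT, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def open_ui(host: str = HOST, port: int = PORT) -> bool:
    """Open the local web UI in the default browser if the server is running."""
    if not server_is_up(host, port):
        return False
    return webbrowser.open(f"http://{host}:{port}/")
```

If nothing is listening on the port, `open_ui()` returns `False` without launching a browser.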
Import is supported from the following formats:[14]
If input data is in a non-standard text format, it can be imported as whole lines, without splitting into columns, and then columns extracted later with OpenRefine's tools. Archived and compressed files are supported (.zip, .tar.gz, .tgz, .tar.bz2, .gz, or .bz2), and Refine can download input files from a URL. To use web pages as input, it is possible to import a list of URLs and then invoke a URL fetch function.
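The line-based import workflow can be approximated in Python: read each line of a possibly compressed file as a single cell, then split it into columns afterwards. This is a hypothetical sketch, not OpenRefine's importer: it handles only `.gz` and `.bz2`, assumes UTF-8 text, and uses a regular expression with named groups as a stand-in for OpenRefine's column-splitting tools. The sample log lines are invented.

```python
import bz2
import gzip
import re
from pathlib import Path

# Decompressors for two of the single-file formats named above. Archives
# (.zip, .tar.gz, .tgz, .tar.bz2) are not unpacked here; that needs zipfile/tarfile.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def import_lines(path: str) -> list[str]:
    """Read a text file, decompressing by extension, one whole line per cell."""
    opener = OPENERS.get(Path(path).suffix, open)
    with opener(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def extract_columns(lines: list[str], pattern: str) -> list[dict[str, str]]:
    """Split whole-line cells into columns using a regex with named groups.

    The original line is kept in a "line" column; unmatched lines get no
    extra columns.
    """
    regex = re.compile(pattern)
    rows = []
    for line in lines:
        row = {"line": line}
        match = regex.match(line)
        if match:
            row.update(match.groupdict(default=""))
        rows.append(row)
    return rows


log = ["2024-01-05 ERROR disk full", "2024-01-06 INFO ok"]
rows = extract_columns(log, r"(?P<date>\S+) (?P<level>\S+) (?P<message>.*)")
print(rows[0])
```

Keeping the original `line` column alongside the extracted ones makes it easy to spot lines that the pattern failed to match.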
Export is supported in the following formats:[16]
Whole OpenRefine projects in native format can be exported as a .tar.gz archive.
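A `.tar.gz` archive of this kind can be created and inspected with Python's standard `tarfile` module. The sketch below is generic: `pack_project` and `list_archive` are hypothetical helpers, and nothing is assumed about the internal layout of an exported OpenRefine project.

```python
import tarfile
from pathlib import Path


def pack_project(directory: str, archive: str) -> None:
    """Write `directory` (recursively) into a gzip-compressed tar archive."""
    with tarfile.open(archive, "w:gz") as tar:
        # Store entries under the directory's own name rather than its full path.
        tar.add(directory, arcname=Path(directory).name)


def list_archive(archive: str) -> list[str]:
    """Return the member names stored in a .tar.gz archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()
```

Listing members before extracting is a cheap way to check what an archive contains without unpacking it.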
OpenRefine started life as Freebase Gridworks, developed by Metaweb, and has been available as open source since January 2010.[17] On 16 July 2010, Google acquired Metaweb,[18] the creators of Freebase, and on 10 November 2010 renamed Freebase Gridworks to Google Refine, releasing version 2.0.[19] On 2 October 2012, original author David Huynh announced that Google would soon stop actively supporting Google Refine.[20][21][22] Since then, the codebase has been in transition to an open-source project named OpenRefine.[23]