| | Preferred | Acceptable |
|---|
| A. Formats | - Platform-independent, character-based formats are preferred over native or binary formats as long as the data is complete and retains full detail and precision. Preferred formats include well-developed, widely adopted, de facto marketplace standards, e.g.
- Formats using well-known schemas with a publicly available validation tool
- Line-oriented, e.g. TSV, CSV, fixed-width
- Platform-independent open formats, e.g. .db, .db3, .sqlite, .sqlite3
- Any proprietary format that is a de facto standard for a profession or supported by multiple tools (e.g. Excel .xls or .xlsx, Shapefile)
- Character Encoding, in descending order of preference:
- UTF-8, UTF-16 (with BOM)
- US-ASCII or ISO 8859-1
- Other named encoding
| For data (in order of preference): - Non-proprietary, publicly documented formats endorsed as standards by a professional community or government agency, e.g. CDF, HDF
- Text-based data formats with available schema
For aggregation or transfer: - ZIP, RAR, tar, 7z with no encryption, password, or other protection mechanisms.
|
|---|
| B. Related Materials | Consult the appropriate sections of this document to identify the preferred formats for supplementary material | |
|---|
| C. Delivery Method, in order of preference | - Public download URLs
- Automated private download URLs with any necessary API keys or credentials
- Hard drive; CD-ROM; DVD-ROM
| |
|---|
| D. Metadata | - Deposits should include all applicable metadata, data dictionaries, XML schemas, and technical specifications as appropriate. Discipline-specific metadata standards should be used whenever possible.
- As supported by format:
- Title
- Creator
- Creation date
- Place of publication
- Publisher/producer/distributor
- Contact information
- A list of software used to produce, render or compress the data (if applicable)
- Character encoding
- Include if available/applicable:
- A list of software used to produce, render or compress the data
- Language of work
- Other relevant identifiers (e.g., DOI, LCCN, canonical URL, etc.)
- Subject descriptors
- Abstracts
- Key or reference to each data field
- Checksums
- Permanent version specifiers (e.g., date, version number, etc.)
- Grant number information
- Name and permalink or DOI to access the repository in which the dataset is shared or preserved
- Information about how the data was collected and any sampling or post-processing that has been applied
- Known copyright terms, especially for datasets which combine data from multiple sources
- For datasets serving as part of a database: proprietary database package and version
- For aggregate files: manifest or file list of payload content
| |
|---|
| E. Technological Measures | - Files must contain no measures (such as digital rights management [DRM] technologies or encryption) that control access to or prevent use of the digital work.
- Files in formats which support linking or embedding external resources (e.g. XML, JSON, Excel .xls or .xlsx) should be self-contained to remain useful in the event of external service changes.
- Files in formats which support executable code (e.g. Excel) do not contain executable code.
| Files in formats which support executable code do not depend on embedded programs for purposes other than display (e.g. search, filtering, etc.); the raw data is available without executing code. |
|---|
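
Several items in sections A and D (character encoding, checksums, and a manifest or file list of payload content for aggregate files) can be produced mechanically before deposit. The following is a minimal sketch, not a prescribed tool: the function names, the choice of SHA-256, and the manifest fields are assumptions made for illustration, since the table does not name a checksum algorithm or a manifest layout.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_utf8(path: Path) -> bool:
    """Return True if the whole file decodes as UTF-8 (the most preferred encoding)."""
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def build_manifest(root: Path) -> list[dict]:
    """List every file under root with relative path, size, checksum, and UTF-8 status.

    The field names are hypothetical; adapt them to the metadata standard in use.
    """
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
            "utf8": is_utf8(path),
        })
    return entries
```

Calling `build_manifest(Path("deposit"))` on a hypothetical `deposit` directory returns one entry per payload file. Writing those entries out as a TSV or CSV file keeps the manifest itself in a line-oriented, character-based format. The UTF-8 check is only meaningful for text files; binary formats such as SQLite databases will normally report `False`.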