TestFileCreate - A small Linux app that creates test files in a single directory or in a tree of directories. Files can be identical or individually filled with random printable characters or binary data. Number of printable characters can be selected from the ASCII set (max 95). File size is selectable or random. Has a resource calculator.
Jim-JMCD/TestFilesCreate
A small Linux app that creates test files filling them with random data in a single directory or a directory tree.
- Includes a calculator for creating data in directory trees. [ Option -C ]
- Creates anything from a single file to a directory tree of files. Minimum file size is 2 bytes.
- Files can be identical or individually filled with random content.
- File contents can either be printable or binary. All contents generated from /dev/urandom.
- Selecting the printable pool of characters determines the complexity of file contents.
- File sizes can be identical or randomly sized within a given range.
- Run either interactively or unaccompanied in batch mode.
See the Comparative benchmark testing of data compression and deduplication section on how TestFileCreate can be used as a standardised benchmark for comparing data storage reduction techniques.
TestFilesCreate is a portable Linux x64 executable created from the bash script TFile_Create (private GitHub repository) using shc.
It requires a bash environment to run. An executable created with the shc utility always requires bash on x64.
More: GitHub shc
TestFileCreate -C
User Inputs: Tree depth, width, number files per directory and file size
Output: Summary, tables of data trees of current and smaller trees. Tables contain data size and file numbers for each tree
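The totals the calculator reports can be approximated with simple arithmetic. The following is a minimal sketch, not the tool's own code: `tree_totals` is a hypothetical helper, and its formula (files placed in every directory including the top one, with each level `width` times larger than the level above) is an assumption inferred from the first example's summary, which reports 1550 files for `-d 3 -w 5 -n 50`.

```shell
# Hypothetical helper, not part of TestFilesCreate.
# Usage: tree_totals DEPTH WIDTH FILES_PER_DIR FILE_SIZE_BYTES
# Prints: "<directories holding files> <total files> <total bytes>"
tree_totals() {
    local depth=$1 width=$2 per_dir=$3 size=$4
    local dirs=0 level=1 i=1
    # Level 1 is the top directory; each deeper level holds width times more.
    while [ "$i" -le "$depth" ]; do
        dirs=$((dirs + level))
        level=$((level * width))
        i=$((i + 1))
    done
    local files=$((dirs * per_dir))
    echo "$dirs $files $((files * size))"
}

# Depth 3, width 5, 50 files per directory, 15 MiB per file.
tree_totals 3 5 50 $((15 * 1024 * 1024))
```

For these inputs the sketch counts 31 directories because it includes the top-level directory, whereas the tool's own summary lists 30 data directories. The byte total corresponds to the 22.71G shown in that summary.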
DEFAULTS
- File contents are binary. Use -P or -D to create compressible printable data.
- If all files are the same size, their contents will also be identical. Use the -r option to randomise file contents.
- A time-stamped output directory is created in the user's current directory. Alternatively, use another directory with the -o option.
- Interactive mode: a summary is shown and confirmation is required before proceeding. See examples.
Maximum permitted values: see the Limitations section.
OPTIONS :
Directory Layout (all mandatory)
For the following, n is a number; the minimum is 1.
- -n n Number of files in each directory.
- -d n Depth. How many directories deep.
- -w n Width. How many directories wide.
Create single directory: -d 1 (-w if set, will be ignored)
Create tree of directories:
- Depth min is -d 2
- Width min is -w 1 and is mandatory
File Size (a file size is mandatory)
Fixed File Size
- File sizes have to be designated by B, K, M or G. Minimum is 2B (2 bytes). Examples: 2KiB = 2K, 3MiB = 3M, 4GiB = 4G
- -f Fixed file size, default [usage: -f 2K]
NOTE: The default for fixed-size files of printable data is that ALL FILES ARE IDENTICAL. To override this, use:
- -r Random content is generated individually for every file.
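The size notation above maps to bytes using binary multiples (2K = 2KiB = 2048 bytes). As a minimal sketch of that mapping, assuming nothing about how TestFilesCreate parses sizes internally, the hypothetical helper `size_to_bytes` below converts a size string to a byte count:

```shell
# Hypothetical helper: convert a size such as 2B, 2K, 3M or 4G to bytes.
size_to_bytes() {
    local spec=$1
    local num=${spec%[BKMG]}      # strip the trailing unit letter, if any
    local unit=${spec#"$num"}     # whatever was stripped is the unit
    # The numeric part must be non-empty and contain digits only.
    case $num in
        '' | *[!0-9]*) echo "invalid size: $spec" >&2; return 1 ;;
    esac
    case $unit in
        B) echo "$num" ;;
        K) echo $((num * 1024)) ;;
        M) echo $((num * 1024 * 1024)) ;;
        G) echo $((num * 1024 * 1024 * 1024)) ;;
        *) echo "size must end in B, K, M or G: $spec" >&2; return 1 ;;
    esac
}

size_to_bytes 2K
size_to_bytes 3M
```

A size with no unit letter is rejected, mirroring the rule that a designator is required.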
Random File Size
- All files individually filled with random content.
- -s Smallest file size.
- -l Largest file size.
- If -s is omitted, the smallest size defaults to 2B (2 bytes) when the largest file size is < 1G, and to 1M (1MiB) when the largest file size is >= 1G.
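The default lower bound and a random pick within the range can be sketched as follows. Both function names are hypothetical, and using `shuf -i` to choose the size is an assumption: this README lists `shuf` as a requirement but does not say what it is used for.

```shell
# Hypothetical helper: default smallest size (bytes) for a given largest size.
default_smallest() {
    local one_gib=$((1024 * 1024 * 1024))
    if [ "$1" -ge "$one_gib" ]; then
        echo $((1024 * 1024))   # 1M when the largest size is >= 1G
    else
        echo 2                  # 2B otherwise
    fi
}

# Hypothetical helper: pick one random size in bytes, bounds inclusive.
# Assumes GNU shuf is installed.
random_size() {
    shuf -i "$1-$2" -n 1
}

largest=$((5 * 1024 * 1024))                       # e.g. -l 5M with -s omitted
random_size "$(default_smallest "$largest")" "$largest"
```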
File Contents (default is random binary)
-P n Where n is a number in the range 1 to 95. Selects the pool of printable characters from the ASCII set.
- n = 1 files only contain the uppercase 'A'
- n = 2 to 26 files only contain lowercase Latin alphabet characters
- n >26 files contain printable ASCII characters. Max n = 95
-D n Where n is a number in the range 1 to 10. Selects the pool of digit characters from the ASCII set.
- n = 1 files only contain zeros '0'
- n > 1 files contain digits. Max n = 10
-r Random content for fixed file sizes.
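One common way to produce content restricted to a character pool is to filter /dev/urandom with `tr` and cut the stream with `head -c`. The sketch below illustrates that technique for the digit pool; it is an assumption about the general approach, not the tool's actual implementation, and `digit_pool` and `random_chars` are hypothetical names. The pool for n = 5 matches the "01234" set shown in the second example's summary.

```shell
# Hypothetical helper: the first N digits, e.g. 5 -> 01234.
digit_pool() {
    printf '%s' 0123456789 | cut -c "1-$1"
}

# Hypothetical helper: write COUNT random characters drawn from POOL to stdout.
# tr -dc deletes every byte that is NOT in the pool; head -c stops after COUNT bytes.
random_chars() {
    LC_ALL=C tr -dc "$1" < /dev/urandom | head -c "$2"
}

# 64 random characters from the 5-digit pool, followed by a newline.
random_chars "$(digit_pool 5)" 64
echo
```

Passing the pool straight to `tr` is only safe for plain digits and letters. Pools containing characters that `tr` treats specially, such as `-`, `\` or `[`, would need escaping.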
INPUT, OUTPUT and LOGGING
- -b Batch/quiet run with no user checks. Default is interactive with user input.
- -o Output to an existing directory.
- Defaults to current working directory
- Creates a new time stamped directory for content (tfc_YYMMDD_hhmm_ss).
- No logging. In batch mode user has to redirect output to a file
- Progress indicated by time stamping every ten directories filled with files.
- The 'script' command can be used in interactive mode to record all activity.
LIMITATIONS
The run is aborted before any data is created if:
- The number of directories to be created exceeds 100 million
- The number of files to be created exceeds 100 million
- The 'shuf' command is not available
- The -c option is not available for the 'head' command
Runs on x64 Linux only. To-do list: AArch64/ARM64 version.
If the 'seq' command is not available, the character pool will not be displayed in the initial summary. The seq command is not required for file creation.
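Before a long batch run it can be useful to confirm these prerequisites by hand. The following is a minimal sketch with hypothetical helper names; it is not taken from the TestFilesCreate source and only mirrors the tool dependencies listed above.

```shell
# Hypothetical helper: succeed if the named command is on PATH.
require_cmd() {
    command -v "$1" > /dev/null 2>&1
}

# Hypothetical helper: succeed if this system's head accepts -c.
# Probed by requesting zero bytes from an empty input.
head_has_c() {
    head -c 0 /dev/null > /dev/null 2>&1
}

require_cmd shuf || echo "shuf is missing: data creation would be refused" >&2
head_has_c       || echo "head -c is unsupported: data creation would be refused" >&2
require_cmd seq  || echo "seq is missing: the character pool would not be displayed" >&2
```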
Binary data is generated from /dev/urandom, so it will not compress well. Storing or transmitting this binary data may render data deduplication and compression ineffective.
FILE CONTENT VALIDATION
Validate contents (all files):
od -N <bytes> -Ax -t x1z <file name>
- Where <bytes> is the number to check from beginning of file.
- Non-printable characters will appear as "dots"
Validate printable character distribution:
od -a <file name> | cut -b 9- | tr " " \\n | egrep -v "^$" | sort | uniq -c
OR
sed 's/\(.\)/\1\n/g' <file name> | sort | uniq -c
Output
- Column 1 : Character count
- Column 2 : Character being counted. NOTE: This column should only contain a single character. If it contains more than one character, the file contents are binary data.
Validate printable character pool count:
Example: confirms that a complexity of 17 given by -P 17 contains a pool of 17 different characters.
od -a <file name> | cut -b 9- | tr " " \\n | egrep -v "^$" | sort | uniq -c | wc -l
OR
sed 's/\(.\)/\1\n/g' <file name> | sort | uniq -c | wc -l
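As an alternative way to check the pool count, the file can be split into one byte per line with `fold` and the distinct values counted. This is a sketch under the assumption that the file holds only printable characters with no newlines; `pool_count` is a hypothetical helper name.

```shell
# Hypothetical helper: count the distinct characters in a printable file.
# fold -b -w 1 emits one byte per line; sort -u keeps one copy of each.
pool_count() {
    fold -b -w 1 "$1" | LC_ALL=C sort -u | wc -l | tr -d ' '
}

# Small self-made sample with a pool of three characters.
sample=$(mktemp)
printf 'AABBBC' > "$sample"
pool_count "$sample"    # prints 3
rm -f "$sample"
```

For a file created with -P 17, the expected result would be 17, provided the file is large enough for every character in the pool to appear at least once.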
EXAMPLES
TestFilesCreate -P 28 -d 3 -w 5 -f 15M -n 50
DIRECTORY TREE each directory contains 5 directories and 50 files
The tree is 3 levels deep
Output: /home/ted/test/tfc_240930-1759-37
All files with identical contents
Files created are all 15M
Storage used...... 22.71G (max potential)
File Contents..... Random selection from the 28 char set: !"#$%&'()*+,-./0123456789:;<
Total data directories........30
Total data files............1550
Do you want to proceed? (y/n)
TestFilesCreate -D 5 -d 1 -f 600K -n 1000 -r -o /home/ted/test
SINGLE DIRECTORY containing 1000 files
Output: /home/ted/test/tfc_240930-1802-53
Random data created individually for all files
Files created are all 600K
Storage used...... 585.94M (max potential)
File Contents..... Random selection from the 5 digit set: 01234
Total data directories.........1
Total data files............1000
Do you want to proceed? (y/n)
TestFileCreate can be used as a standardised benchmark for comparing data storage reduction techniques.
In these examples the Data Complexity is set by the -P option. A data complexity of 10 = -P 10 and a data complexity of 12 = -P 12
For more information on the creation of the charts see the TestFilesCreate datasheet.
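A basic comparison of this kind records the size of a file before and after compression. The sketch below uses gzip purely as an illustration; the choice of compressor and the helper name `gzip_sizes` are assumptions, not something this README prescribes.

```shell
# Hypothetical helper: print "<original bytes> <gzip-compressed bytes>" for a file.
gzip_sizes() {
    local orig comp
    orig=$(wc -c < "$1" | tr -d ' ')
    comp=$(gzip -c "$1" | wc -c | tr -d ' ')
    echo "$orig $comp"
}

# Self-made sample: 10000 copies of 'A', similar in spirit to -P 1 content.
sample=$(mktemp)
head -c 10000 /dev/zero | tr '\0' 'A' > "$sample"
gzip_sizes "$sample"
rm -f "$sample"
```

Running the same helper over files created with increasing -P values would show how the compressed size changes as the character pool grows.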