Soongja/basic-image-eda

A simple image dataset EDA tool (CLI / Code)

License

MIT license

basic-image-eda

A simple multiprocessing EDA tool that checks basic information about the images under a directory (images are found recursively). It was made to quickly inspect a dataset and to prevent mistakes when reading, resizing, and normalizing images as inputs to neural networks. It is useful when first joining an image competition or when training CNNs on images.

Notes:
- All images are converted to 3-channel (RGB) images. When images with different numbers of channels are mixed, some results can be misleading.
- uint8 and uint16 data types are supported. If different data types are mixed, an error occurs.
- Supported extensions: jpg, jpeg, jpe, png, tif, tiff, bmp, ppm, pbm, pgm, sr, ras, webp
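Recursive discovery filtered by extension can be sketched with `pathlib`. This is an illustration under assumptions, not the package's own code: `find_images` is a hypothetical helper, extension matching is assumed to be case-insensitive, and `extensions=None` stands in for the `-e` default of including every supported extension.

```python
from pathlib import Path

# The supported extensions listed in the notes above.
SUPPORTED_EXTENSIONS = {
    "jpg", "jpeg", "jpe", "png", "tif", "tiff", "bmp",
    "ppm", "pbm", "pgm", "sr", "ras", "webp",
}


def find_images(data_dir, extensions=None):
    """Hypothetical helper: recursively collect image paths under data_dir.

    extensions=None means every supported extension. Case-insensitive
    matching is an assumption of this sketch.
    """
    if extensions is None:
        wanted = SUPPORTED_EXTENSIONS
    else:
        wanted = {ext.lower().lstrip(".") for ext in extensions}
    # rglob("*") walks every subdirectory, so nested images are found too.
    return sorted(
        path for path in Path(data_dir).rglob("*")
        if path.is_file() and path.suffix.lower().lstrip(".") in wanted
    )
```

Calling `find_images("./data", ["png", "tiff"])` would mirror `-e png tiff`, returning only those files from `./data` and all of its subdirectories.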

Installation

pip install basic-image-eda

or, for the latest version:

pip install git+https://github.com/Soongja/basic-image-eda

Prerequisites:

- opencv-python
- numpy
- matplotlib
- scikit-image (skimage.io)
- tifffile
- tqdm
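After installing, one quick way to confirm that the prerequisites are importable is to look them up with `importlib.util.find_spec`. This is a hedged sketch, not part of the package: `missing_modules` is a hypothetical helper, and the import names are mapped from the list above by assumption (opencv-python is imported as `cv2`, scikit-image as `skimage`).

```python
import importlib.util

# Import names (not pip names) for the prerequisites listed above.
REQUIRED_MODULES = ("cv2", "numpy", "matplotlib", "skimage", "tifffile", "tqdm")


def missing_modules(modules=REQUIRED_MODULES):
    """Hypothetical helper: return the module names that cannot be found."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("all prerequisites found" if not missing else f"missing: {missing}")
```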

Usage (CLI/Code)

CLI

A simple one-line command:

basic-image-eda <data_dir>

basic-image-eda <data_dir> -e png tiff -t 12 --dimension_plot --channel_hist --nonzero --hw_division_factor 2.0 > eda.txt

Options:
  -e --extensions          target image extensions. If none, all supported extensions are included. (default=None)
  -t --threads             number of multiprocessing threads. If 0, the maximum number of threads is detected automatically. (default=0)
  -d --dimension_plot      show a dimension (height/width) scatter plot. (default=False)
  -c --channel_hist        show a channel-wise pixel value histogram. Takes longer. (default=False)
  -n --nonzero             calculate values only from non-zero pixels of the images. (default=False)
  -f --hw_division_factor  divide the height and width of the images by this factor to make pixel value calculation faster.
                           Height and width information is not changed and will be printed correctly. (default=1.0)
  -V --version             show version.
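The `--nonzero` and `--hw_division_factor` options, together with the `channel mean(0~1)` and `channel std(0~1)` rows in the results, can be illustrated with a NumPy sketch for a single H x W x 3 image. Several details are assumptions rather than documented behavior: `channel_stats` is a hypothetical name, values are scaled to 0~1 by the dtype maximum (255 for uint8, 65535 for uint16), downscaling uses nearest-neighbor index sampling, and a pixel counts as zero only when all three of its channels are zero. The tool itself aggregates over the whole dataset; this sketch covers one image.

```python
import numpy as np


def channel_stats(img, nonzero=False, hw_division_factor=1.0):
    """Hypothetical sketch: per-channel mean and std in 0~1 for one H x W x 3 image."""
    if img.dtype not in (np.uint8, np.uint16):
        raise TypeError(f"only uint8 and uint16 are handled, got {img.dtype}")
    scale = np.iinfo(img.dtype).max  # 255 for uint8, 65535 for uint16

    # Shrink height and width by the factor using nearest-neighbor sampling.
    # Only the pixel statistics use the smaller image; reported sizes would not.
    h, w = img.shape[:2]
    new_h = max(int(h / hw_division_factor), 1)
    new_w = max(int(w / hw_division_factor), 1)
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    small = img[rows][:, cols].astype(np.float64) / scale

    pixels = small.reshape(-1, small.shape[-1])
    if nonzero:
        # Assumption: drop a pixel only when every one of its channels is zero.
        pixels = pixels[np.any(pixels != 0, axis=1)]
    return pixels.mean(axis=0), pixels.std(axis=0)


if __name__ == "__main__":
    img = np.zeros((4, 4, 3), dtype=np.uint8)
    img[:2] = 255  # top half white, bottom half black
    print(channel_stats(img))                          # means 0.5, stds 0.5
    print(channel_stats(img, nonzero=True))            # means 1.0, stds 0.0
    print(channel_stats(img, hw_division_factor=2.0))  # means 0.5 from a 2 x 2 sample
```

The example shows why `--nonzero` changes the numbers so much on images with black backgrounds: excluding the zero pixels moves the mean from 0.5 to 1.0 here.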

Code

from basic_image_eda import BasicImageEDA

if __name__ == "__main__":  # required for multiprocessing
    data_dir = "./data"
    BasicImageEDA.explore(data_dir)

    # or, passing every option explicitly:

    extensions = ['png', 'jpg', 'jpeg']
    threads = 0
    dimension_plot = True
    channel_hist = True
    nonzero = False
    hw_division_factor = 1.0

    BasicImageEDA.explore(data_dir, extensions, threads, dimension_plot, channel_hist, nonzero, hw_division_factor)
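The `if __name__ == "__main__":` guard in the code example matters because, with the spawn start method (the default on Windows, and on macOS since Python 3.8), each worker process re-imports the main module, so unguarded top-level code would be executed again in every worker. The sketch below shows the general pattern. It is not the tool's implementation: `resolve_threads` is a hypothetical helper that mimics the documented `threads=0` behavior with `os.cpu_count()`, and `hw_ratio` is a toy per-image job.

```python
import os
from multiprocessing import Pool


def resolve_threads(threads=0):
    """Hypothetical helper: 0 (or less) means "use every available CPU"."""
    max_threads = os.cpu_count() or 1
    return max_threads if threads <= 0 else threads


def hw_ratio(hw):
    """Toy per-image job: height/width ratio of one (height, width) pair."""
    height, width = hw
    return height / width


if __name__ == "__main__":
    # Everything that starts worker processes stays under the guard.
    dims = [(1024, 1024), (500, 396), (85, 85)]
    threads = resolve_threads(0)
    print(f"Using {threads} threads. (max:{os.cpu_count()})")
    with Pool(processes=threads) as pool:
        print(pool.map(hw_ratio, dims))
```

Worker functions are defined at module top level so they can be pickled and sent to the pool under either the fork or the spawn start method.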

Results

Results on CelebA dataset (test set)

found 19962 images.
Using 12 threads. (max:12)
*--------------------------------------------------------------------------------------*
number of images                         |  19962
dtype                                    |  uint8
channels                                 |  [3]
extensions                               |  ['jpg']
min height                               |  85
max height                               |  5616
mean height                              |  591.8215108706543
median height                            |  500
min width                                |  85
max width                                |  5616
mean width                               |  490.2976655645727
median width                             |  396
mean height/width ratio                  |  1.207065732587525
median height/width ratio                |  1.2626262626262625
recommended input size(by mean)          |  [592 488] (h x w, multiples of 8)
recommended input size(by mean)          |  [592 496] (h x w, multiples of 16)
recommended input size(by mean)          |  [576 480] (h x w, multiples of 32)
channel mean(0~1)                        |  [0.4954518  0.42574266 0.39330518]
channel std(0~1)                         |  [0.3216056 0.3023355 0.3018837]
*--------------------------------------------------------------------------------------*

download site: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
paper: S. Yang, P. Luo, C. C. Loy, and X. Tang, "From Facial Parts Responses to Face Detection: A Deep Learning Approach", in IEEE International Conference on Computer Vision (ICCV), 2015
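The `recommended input size(by mean)` rows in the CelebA output are consistent with rounding the mean height and width to the nearest multiple of 8, 16, or 32, and the two ratio rows match mean height / mean width and median height / median width (500 / 396). The sketch below reproduces those numbers under that assumption; it is reconstructed from the printed output, not taken from the package source, and `recommended_size` is a hypothetical name.

```python
def recommended_size(mean_h, mean_w, multiple):
    """Hypothetical helper: round mean height/width to the nearest multiple."""
    h = int(round(mean_h / multiple)) * multiple
    w = int(round(mean_w / multiple)) * multiple
    return h, w


# CelebA test-set statistics copied from the output above.
mean_h, mean_w = 591.8215108706543, 490.2976655645727
median_h, median_w = 500, 396

for m in (8, 16, 32):
    print(m, recommended_size(mean_h, mean_w, m))
# 8 (592, 488)
# 16 (592, 496)
# 32 (576, 480)

print(mean_h / mean_w)      # about 1.20707, the "mean height/width ratio" row
print(median_h / median_w)  # about 1.26263, the "median height/width ratio" row
```

Rounding to a multiple of 32 moves the size furthest from the mean (576 x 480 versus roughly 592 x 490), which is the trade-off to weigh when a network needs inputs divisible by 32.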

Results on NIH Chest X-ray dataset (images_001.tar.gz)

found 4999 images.
Using 12 threads. (max:12)
*--------------------------------------------------------------------------------------*
number of images                         |  4999
dtype                                    |  uint8
channels                                 |  [1, 4]
extensions                               |  ['png']
min height                               |  1024
max height                               |  1024
mean height                              |  1024.0
median height                            |  1024
min width                                |  1024
max width                                |  1024
mean width                               |  1024.0
median width                             |  1024
mean height/width ratio                  |  1.0
median height/width ratio                |  1.0
recommended input size(by mean)          |  [1024 1024] (h x w, multiples of 8)
recommended input size(by mean)          |  [1024 1024] (h x w, multiples of 16)
recommended input size(by mean)          |  [1024 1024] (h x w, multiples of 32)
channel mean(0~1)                        |  [0.5172472 0.5172472 0.5172472]
channel std(0~1)                         |  [0.25274998 0.25274998 0.25274998]
*--------------------------------------------------------------------------------------*

data provider: NIH Clinical Center
download site: https://nihcc.app.box.com/v/ChestXray-NIHCC
paper: Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, Ronald Summers, "ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases", IEEE CVPR, pp. 3462-3471, 2017
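The NIH results illustrate the note about mixed channels: the images have 1 or 4 channels, yet all three channel means and stds are identical. That is consistent with grayscale content being copied into three identical channels during the 3-channel conversion. The sketch below shows one way such a conversion can work with NumPy; `to_three_channels` is hypothetical, and the package may convert differently (for example with OpenCV).

```python
import numpy as np


def to_three_channels(img):
    """Hypothetical sketch: coerce an image array to H x W x 3."""
    if img.ndim == 2:  # grayscale H x W: replicate into three channels
        return np.stack([img] * 3, axis=-1)
    channels = img.shape[2]
    if channels == 1:  # H x W x 1: replicate into three channels
        return np.repeat(img, 3, axis=2)
    if channels == 4:  # RGBA: drop the alpha channel
        return img[:, :, :3]
    if channels == 3:
        return img
    raise ValueError(f"unexpected channel count: {channels}")


if __name__ == "__main__":
    gray = np.array([[0, 128], [255, 64]], dtype=np.uint8)
    rgb = to_three_channels(gray)
    print(rgb.reshape(-1, 3).mean(axis=0))  # three identical per-channel means
```

Because replicated channels are identical, per-channel statistics cannot distinguish color from grayscale data, which is why mixed-channel datasets can produce misleading summaries.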

License

MIT License
