# remote-sensing-image-caption
This is a project on remote sensing images, covering two tasks:
- Classification of remote sensing image
- Remote sensing image caption
I'm using `torch 0.4`, `opencv-python`, `numpy`, and `matplotlib` with Python 3.6.
I used the NWPU-RESISC45 dataset; you can download it (http://www.escience.cn/people/JunweiHan/NWPU-RESISC45.html) to train your model. You can also use another dataset, but pay attention to how the data is loaded, since a different dataset may need a different loading routine.
All in all, check `dataset.py`.
We can choose between two pre-trained models, `resnet_v101` and `mobilenet_v2`.

```python
classifier = Classifier(model_name, class_number, True)
```
Then, I just add a fully connected layer and a softmax to classify.
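As a rough sketch of that idea (assuming torchvision backbones; the actual `Classifier` in `model.py` may differ in its details):

```python
import torch
import torch.nn as nn
from torchvision import models

class Classifier(nn.Module):
    """Pre-trained backbone + one fully connected layer + softmax."""
    def __init__(self, model_name, class_number, pretrained=True):
        super().__init__()
        if model_name == 'mobilenet':
            backbone = models.mobilenet_v2(pretrained=pretrained)
            self.features, feat_dim = backbone.features, 1280
        else:  # 'resnet'
            resnet = models.resnet101(pretrained=pretrained)
            self.features = nn.Sequential(*list(resnet.children())[:-2])
            feat_dim = 2048
        self.fc = nn.Linear(feat_dim, class_number)

    def forward(self, x):
        x = self.features(x)                       # (N, feat_dim, 7, 7) for 224x224 input
        x = x.mean(3).mean(2)                      # global average pooling
        return torch.softmax(self.fc(x), dim=1)    # class probabilities
```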
If you want to train your own model, just prepare the dataset and run `train.py`:

```bash
python train.py
```
By the way, you can modify `train.py` to set the pre-trained model, batch size, number of epochs, and learning rate, and to continue training from the model saved in the last run.
The trained model will be saved in `./models/train/`.
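For example, resuming from a saved checkpoint could look roughly like this (how `train.py` actually organizes this is an assumption):

```python
import torch

# NWPU-RESISC45 has 45 scene classes; the checkpoint path matches the save
# directory above, assuming the .pkl file stores a state dict.
classifier = Classifier('mobilenet', 45, True)
classifier.load_state_dict(torch.load('./models/train/classifier_50.pkl'))
```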
If you want to predict the class of a new remote sensing image, first modify `predict.py` to set the image name and load the model parameters; you can also enable the display of the result:

```python
image_name = 'test.jpg'
classifier = get_classifier('mobilenet', './models/train/classifier_50.pkl')
predict(classifier, image_name, True)
```
Then, just run `predict.py`:

```bash
python predict.py
```
For image captioning, I used the RSICD dataset (Lu X, Wang B, Zheng X, et al. "Exploring Models and Data for Remote Sensing Image Caption Generation." IEEE Transactions on Geoscience and Remote Sensing, 2017) to train my model; you can download it at https://github.com/201528014227051/RSICD_optimal.
Or, you can use your own dataset: as in the classification task, modify `data.py` and `dataloader.py` for your data.
I used the Show, Attend and Tell model; you can read the paper (Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention." arXiv preprint arXiv:1502.03044, 2015), or refer to https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning.
My model includes an encoder, a decoder, and an attention module.
Because our project runs on ARM, we must simplify the network, so our encoder is `mobilenet_v2`: we delete the fully connected classification layer and make `mobilenet_v2` output the 7*7 feature map of the image.
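A minimal sketch of such an encoder, assuming torchvision's `mobilenet_v2` (the repo may build it differently):

```python
import torch.nn as nn
from torchvision import models

class Encoder(nn.Module):
    """mobilenet_v2 without its classification head: image -> 7x7 feature map."""
    def __init__(self, pretrained=True):
        super().__init__()
        self.features = models.mobilenet_v2(pretrained=pretrained).features

    def forward(self, images):
        # For a 224x224 input the output shape is (N, 1280, 7, 7).
        return self.features(images)
```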
The attention part is composed of several fully connected layers; its input is the hidden state of the decoder, and its output is a tensor of size 1*49, which is reshaped to 7*7.
Then, we can compute a feature vector from the attention tensor (7*7) and the feature map (7*7), and this feature vector is the input of the decoder.
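A minimal sketch of this attention step (the hidden and feature dimensions are assumptions, and the repo may stack more fully connected layers):

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    """Decoder hidden state -> 49 attention weights -> weighted feature vector."""
    def __init__(self, hidden_dim=512, feature_dim=1280):
        super().__init__()
        self.fc = nn.Linear(hidden_dim, 49)   # one weight per 7x7 location

    def forward(self, hidden, feature_map):
        # hidden: (N, hidden_dim); feature_map: (N, feature_dim, 7, 7)
        alpha = torch.softmax(self.fc(hidden), dim=1)              # (N, 49)
        feats = feature_map.view(feature_map.size(0), feature_map.size(1), -1)
        context = (feats * alpha.unsqueeze(1)).sum(dim=2)          # (N, feature_dim)
        return context, alpha.view(-1, 7, 7)                       # vector + 7x7 map
```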
Attention image:
The decoder is based on an LSTM. Its input is the embedding of words in the dictionary (every word in the dictionary is a one-hot code and is transformed into a feature vector by an embedding layer); the hidden state is followed by a fully connected layer and a softmax, and the output is the probability of the next word.
By the way, the first input is a beginning-of-sentence signal, `<start>`, and the last output is an end-of-sentence signal, `.`.
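A sketch of one decoding step under this description (the embedding and hidden sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """LSTM decoder: word embedding + attended image feature -> next-word probabilities."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, feature_dim=1280):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # one-hot index -> dense vector
        self.lstm = nn.LSTMCell(embed_dim + feature_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def step(self, word_ids, context, h, c):
        x = torch.cat([self.embed(word_ids), context], dim=1)  # fuse word + image info
        h, c = self.lstm(x, (h, c))
        probs = torch.softmax(self.fc(h), dim=1)               # distribution over next word
        return probs, h, c
```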
I used an approach similar to SSD (https://arxiv.org/abs/1512.02325): in every iteration, I change the value of random pixels in the mini-batch, add random lighting noise, randomly swap image channels, and randomly adjust the contrast. Then, I randomly crop a part of each image in the mini-batch and randomly mirror the cropped image.
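A sketch of this augmentation pipeline with numpy (all probabilities and ranges here are illustrative, not the repo's exact values):

```python
import numpy as np

def augment(image):
    """SSD-style photometric + geometric augmentation (illustrative parameters)."""
    img = image.astype(np.float32)
    h, w, _ = img.shape
    # Change the value of random pixels and add lighting noise.
    mask = np.random.rand(h, w, 1) < 0.01
    img = img + mask * np.random.uniform(-30, 30, (h, w, 1))
    img = img + np.random.uniform(-10, 10, 3)        # per-channel lighting shift
    # Randomly swap channels and adjust contrast.
    if np.random.rand() < 0.5:
        img = img[:, :, np.random.permutation(3)]
    img = (img - img.mean()) * np.random.uniform(0.8, 1.2) + img.mean()
    # Randomly crop a part of the image, then randomly mirror it.
    ch, cw = int(h * 0.9), int(w * 0.9)
    y, x = np.random.randint(h - ch + 1), np.random.randint(w - cw + 1)
    img = img[y:y + ch, x:x + cw]
    if np.random.rand() < 0.5:
        img = img[:, ::-1]
    return np.clip(img, 0, 255).astype(np.uint8)
```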
If you want to train your own model, make sure you have the RSICD dataset; if your dataset differs from RSICD, modify `data.py` and `dataloader.py` for your data.
If you want to change the details of the model, modify `model.py` and `config.py`.
In `config.py`, you can also modify the learning rate, batch size, number of epochs, and so on.
Then, we can start training by running `train.py`; you can modify the function `train()` to decide whether to train from scratch or continue from the last saved parameters.
```bash
python train.py
```
I used beam search to find the best caption for an image, because beam search considers more possibilities.
Beam search (assume the dictionary is [a, b, c] and the beam size is 2):
Step 1: When generating the first word, choose the two words with the highest probability; the current sequences are `a` and `b`.
Step 2: When generating the second word, combine the current sequences `a` and `b` with every word in the dictionary to get six new sequences `aa`, `ab`, `ac`, `ba`, `bb`, `bc`, then keep the two with the highest probability as the current sequences, say `ab` and `bb`.
Step 3: Repeat this process until the terminator (`.`) is encountered. The final output is the two sequences with the highest probability.
I set the beam search parameter to 3; you can modify it in `eval.py`:

```python
def beam_search(data, decoder, encoder_output, parameter_B=3):
```
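In spirit, the function keeps the `parameter_B` most probable partial sequences at every step. Here is a simplified standalone sketch; the `step` callback standing in for one decoder forward pass is an assumption, not the repo's actual interface:

```python
import math

def beam_search(step, start_id, end_id, parameter_B=3, max_len=30):
    """Keep the parameter_B most probable partial sequences at each step.

    `step(seq)` is assumed to return a list of (word_id, probability)
    pairs for the next word given the sequence generated so far.
    """
    beams = [([start_id], 0.0)]                 # (sequence, log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for word_id, prob in step(seq):
                candidates.append((seq + [word_id], score + math.log(prob)))
        # Keep only the parameter_B highest-scoring candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:parameter_B]:
            (finished if seq[-1] == end_id else beams).append((seq, score))
        if not beams:                           # every beam hit the terminator
            break
    return max(finished + beams, key=lambda c: c[1])[0]
```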
If you want to caption a new image, modify `predict.py` to set the test image name and the paths of the model parameters:

```python
predict('test.jpg', ['./models/train/encoder_mobilenet_20000.pkl', './models/train/decoder_20000.pkl'])
```
Then, run `predict.py`, and you will see the image, the generated caption, and the attention distribution over the image for every word:

```bash
python predict.py
```
I use BLEU-4 (https://www.aclweb.org/anthology/P02-1040) to evaluate the quality of the generated sentences.
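A rough sketch of computing the score with nltk (the captions below are invented purely for illustration):

```python
from nltk.translate.bleu_score import corpus_bleu

# references: for each image, a list of tokenized ground-truth captions.
references = [[['a', 'plane', 'on', 'the', 'runway', '.'],
               ['an', 'airplane', 'parked', 'at', 'the', 'airport', '.']]]
# hypotheses: the tokenized caption generated for each image.
hypotheses = [['a', 'plane', 'parked', 'on', 'the', 'runway', '.']]

bleu4 = corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25))
print('BLEU-4: %.4f' % bleu4)
```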
My training BLEU: