A system that detects scene text on traffic signs through images and videos


nguyennpa412/scene-text-detection-for-driving-videos


Abstract
As automation takes over more aspects of daily life, demand is growing for automated systems that are both highly accurate and quick to respond. In transportation specifically, self-driving vehicles and automated traffic monitoring and analysis systems must read and understand the traffic context at a given moment to make informed decisions. My research, "Scene Text Detection for Driving Videos", aims to support automated transportation systems in capturing textual information from traffic signs.

System Pipeline

[Figure: Scene Text Detection for Driving Videos system pipeline]

Module 1: Detect and classify traffic signs

[Figure: PP-YOLOE architecture]

Module 2: Detect text box on traffic signs

[Figure: PP-OCRv3 (detection) architecture]
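
The two modules above can be chained roughly as follows. This is a minimal sketch, not the project's actual code: it assumes Module 2 runs on the cropped sign region rather than on the full frame, and the names `SignDetection`, `detect_signs`, `detect_text`, and `min_score` are hypothetical placeholders for the fine-tuned PP-YOLOE+ and PP-OCRv3 detectors.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class SignDetection:
    box: Box      # sign location in the full frame
    label: str    # traffic sign class name
    score: float  # detector confidence


def crop(frame, box: Box) -> List[list]:
    """Cut the sign region out of a frame stored as rows of pixels."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in frame[y1:y2]]


def to_frame_coords(text_box: Box, sign_box: Box) -> Box:
    """Shift a text box from crop coordinates back to frame coordinates."""
    ox, oy = sign_box[0], sign_box[1]
    x1, y1, x2, y2 = text_box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)


def run_pipeline(
    frame,
    detect_signs: Callable[[object], List[SignDetection]],  # stands in for Module 1
    detect_text: Callable[[object], List[Box]],             # stands in for Module 2
    min_score: float = 0.5,                                 # assumed threshold
) -> List[Tuple[SignDetection, List[Box]]]:
    """Run Module 1 on the frame, then Module 2 on each confident sign crop."""
    results = []
    for sign in detect_signs(frame):
        if sign.score < min_score:
            continue  # skip low-confidence signs
        patch = crop(frame, sign.box)
        boxes = [to_frame_coords(b, sign.box) for b in detect_text(patch)]
        results.append((sign, boxes))
    return results
```

Passing the detectors in as callables keeps the sketch independent of any particular inference API; for video, `run_pipeline` would simply be called once per frame.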

Data

| # | Dataset | Description | Detail | M1 Usage | M2 Usage |
|---|---------|-------------|--------|----------|----------|
| #1 | Vietnam Traffic Signs Dataset | Open source recorded traffic videos around Ho Chi Minh City | 40 videos (total length: 1h24m44s) | Fine-tuning + Testing | Fine-tuning + Testing |
| #2 | VinText | Largest Vietnamese Scene text dataset | 2,000 labeled images, ~56,000 text objects (~10,500 unique objects) | Testing | Fine-tuning + Testing |
| #3 | Zalo AI Challenge - Traffic Sign Detection Dataset | Zalo AI Challenge dataset for "Traffic Signs Detection" contest in 2020 with image data collected from Google Map Street View | ~8,000 traffic images with traffic sign labels | Testing | Testing |
| #4 | Extra | Self collected dataset around Ho Chi Minh City | 198 images, 393 traffic sign objects | Improved Fine-tuning + Testing | Testing |

Customized Vietnam Traffic Signs Dataset (Customized VTSD)

Because Dataset #1 was used in another project with a different output, we re-processed it to match our project's target:

  • Splitting and filtering images from raw videos
  • Using CVAT to label traffic signs and text
  • Label statistics:
    | # of | Count |
    |------|-------|
    | Images | 296 |
    | Traffic sign objects | 603 |
    | Traffic sign classes | 12 |
    | Word objects | 1,538 (274 unique words) |
    | Textline objects | 628 |
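
The frame-splitting step could look something like the sketch below. The README does not state the sampling interval or the filtering criteria, so the one-second default, the function names `frame_indices` and `extract_frames`, and the output file naming are all assumptions; filtering is left out entirely. It relies on OpenCV (`opencv-python`) for decoding.

```python
from pathlib import Path
from typing import List


def frame_indices(total_frames: int, fps: float, every_seconds: float) -> List[int]:
    """Return indices of frames spaced roughly `every_seconds` apart."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def extract_frames(video_path: str, out_dir: str, every_seconds: float = 1.0) -> int:
    """Save sampled frames as JPEG files and return how many were written."""
    import cv2  # imported here so frame_indices works without OpenCV installed

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"cannot open {video_path}")
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    written = 0
    for idx in frame_indices(total, fps, every_seconds):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the sampled frame
        ok, frame = cap.read()
        if not ok:
            break  # stop at the first frame that cannot be decoded
        cv2.imwrite(str(Path(out_dir) / f"frame_{idx:06d}.jpg"), frame)
        written += 1
    cap.release()
    return written
```

The saved images can then be uploaded to CVAT for labeling; any filtering (for example, discarding frames without visible signs) would happen between these two steps.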

[Figure: traffic sign classes and data distribution]

Fine-tune

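| Module | Model | Pre-trained dataset | Fine-tuned dataset | Performance | FPS |
|--------|-------|---------------------|--------------------|-------------|-----|
| #1 | PP-YOLOE+ | Objects365 | Customized VTSD | mAP: ~0.677 | ~18.3 |
| #2 | PP-OCRv3 (detection) | Baidu images + public datasets | Customized VTSD + VinText | H-mean: ~0.82 | ~29.5 |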
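
For readers unfamiliar with the Module 2 metric, H-mean is the harmonic mean of detection precision and recall. The sketch below illustrates the idea under simplifying assumptions: it uses axis-aligned boxes, greedy one-to-one matching, and an IoU threshold of 0.5, whereas text detection benchmarks typically match polygons. The helper `measure_fps` shows one plausible way to obtain a throughput figure; the README does not say how FPS was measured, and all function names here are hypothetical.

```python
import time
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def h_mean(preds: List[Box], truths: List[Box], thr: float = 0.5):
    """Return (precision, recall, H-mean) after greedy one-to-one matching."""
    unmatched = list(truths)
    hits = 0
    for p in preds:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= thr:
            unmatched.remove(best)  # each ground-truth box matches at most once
            hits += 1
    precision = hits / len(preds) if preds else 0.0
    recall = hits / len(truths) if truths else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def measure_fps(infer: Callable, frames: Sequence, warmup: int = 1) -> float:
    """Average frames per second of `infer` over `frames`, after a short warm-up."""
    for f in frames[:warmup]:
        infer(f)
    start = time.perf_counter()
    for f in frames:
        infer(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")
```

The mAP reported for Module 1 additionally averages precision over recall levels and classes, which this sketch does not cover.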
  • Improving M1 performance by combining Dataset #4 with Customized VTSD:
    • The total numbers of images and traffic sign objects increase by ~40%
    • mAP after improvement: ~0.69
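
Combining the two label sets might be done along these lines. This sketch assumes both datasets are stored as COCO-style annotation dictionaries (with `images`, `annotations`, and `categories` keys) and that classes are matched by name; the README does not state the annotation format, and `merge_coco` is a hypothetical helper.

```python
from typing import Dict


def merge_coco(base: Dict, extra: Dict) -> Dict:
    """Append `extra` to `base`, re-numbering IDs so they stay unique."""
    name_to_id = {c["name"]: c["id"] for c in base["categories"]}
    extra_cat_name = {c["id"]: c["name"] for c in extra["categories"]}

    next_img = max((im["id"] for im in base["images"]), default=0) + 1
    next_ann = max((an["id"] for an in base["annotations"]), default=0) + 1

    images = list(base["images"])
    annotations = list(base["annotations"])

    img_id_map = {}  # old image ID in `extra` -> new ID in the merged set
    for im in extra["images"]:
        img_id_map[im["id"]] = next_img
        images.append({**im, "id": next_img})
        next_img += 1

    for an in extra["annotations"]:
        name = extra_cat_name[an["category_id"]]
        if name not in name_to_id:
            raise ValueError(f"unknown class in extra set: {name}")
        annotations.append({
            **an,
            "id": next_ann,
            "image_id": img_id_map[an["image_id"]],
            "category_id": name_to_id[name],  # remap to the base class ID
        })
        next_ann += 1

    return {"images": images, "annotations": annotations,
            "categories": list(base["categories"])}
```

Matching classes by name rather than by numeric ID guards against the two labeling sessions having assigned different IDs to the same sign class.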

[Figure: improvement sample; the top and bottom images show results before and after the improvement, respectively]

Video output samples

[Figures: video output samples #1, #2, and #3]

Future work

  • Fine-tuning a scene text recognition module and integrating it into the system
  • Building an end-to-end model based on the Transformer architecture
  • Developing a web application for demonstration
