techwithashish1/aws-content-categorization-app

Overview

The AWS Content Categorization System is a serverless application that automatically analyzes multimedia content uploaded to Amazon S3 and categorizes it by age appropriateness. It uses AWS AI/ML services (Amazon Rekognition, Amazon Transcribe, and Amazon Comprehend) to assess the content and then organizes the files automatically.

Core Functionality

Automated Content Analysis: When users upload files (audio, image, or video) to an S3 bucket, the system automatically:

  • Detects the file type and routes the file to the appropriate analyzer
  • Performs AI-powered content analysis using Amazon Rekognition, Amazon Transcribe, and Amazon Comprehend
  • Assesses age-appropriateness across six categories (3yr, 3-7yr, 7-10yr, 10-13yr, 13-18yr, 18+)
  • Organizes files into categorized folders based on content suitability
  • Tags files with metadata and stores detailed analysis results

Architecture Highlights

Event-Driven Serverless Design: Built entirely on AWS serverless technologies including:

  • EventBridge for event detection and workflow triggering
  • Step Functions for orchestrating complex multi-step analysis workflows
  • Lambda Functions for scalable, cost-effective content processing
  • AI/ML Services (Rekognition, Transcribe, Comprehend) for intelligent content analysis
  • DynamoDB for storing analysis results and audit trails

Key Benefits

  • Automated Content Moderation: Reduces the need for manual content review
  • Age-Appropriate Content Filtering: Helps match content to its intended audience
  • Scalable Architecture: Handles varying workloads with automatic scaling
  • Cost-Effective: Pay-per-use serverless model with no infrastructure management
  • Comprehensive Audit Trail: Complete processing history and analysis metadata
  • Near-Real-Time Processing: Analysis starts automatically as soon as a file is uploaded

Use Cases

  • Educational Platforms: Categorizing learning materials by age groups
  • Content Management Systems: Organizing media libraries by audience suitability
  • Parental Control Systems: Filtering content for child-safe environments
  • Media Companies: Automated content rating and classification
  • Digital Libraries: Age-based content curation and recommendation systems

High-Level Architecture Flow

[Figure: high-level architecture flow diagram]

Detailed Process Flow

┌─────────────────┐     ┌──────────────────┐    ┌─────────────────────┐
│   User Upload   │     │   Amazon S3      │    │   Amazon EventBridge│
│                 │ --> │   Source Bucket  │ -->│   S3 Event Rule     │
│   (Manual)      │     │   uploads/       │    │                     │
└─────────────────┘     └──────────────────┘    └─────────────────────┘
                                                          │
                                                          ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    AWS Step Functions                               │
│                Content Categorization Workflow                      │
│                                                                     │
│  ┌─────────────────┐    ┌──────────────────────────────────────────┐│
│  │ Start Execution │──> │          File Type Detection             ││
│  └─────────────────┘    │                                          ││
│                         │  ┌─────────────────────────────────────┐ ││
│                         │  │     File Type Detector Lambda       │ ││
│                         │  │                                     │ ││
│                         │  │  • Analyze file extension           │ ││
│                         │  │  • Determine: audio/image/video     │ ││
│                         │  │  • Route to appropriate analyzer    │ ││
│                         │  └─────────────────────────────────────┘ ││
│                         └──────────────────────────────────────────┘│
│                                          │                          │
│                         ┌────────────────┼────────────────┐         │
│                         ▼                ▼                ▼         │
│  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐        │
│  │ Audio Analysis  │ │ Image Analysis  │ │ Video Analysis  │        │
│  │                 │ │                 │ │                 │        │
│  │ ┌─────────────┐ │ │ ┌─────────────┐ │ │ ┌─────────────┐ │        │
│  │ │Audio Lambda │ │ │ │Image Lambda │ │ │ │Video Lambda │ │        │
│  │ │             │ │ │ │             │ │ │ │             │ │        │
│  │ │• Duration   │ │ │ │• Rekognition│ │ │ │• Rekognition│ │        │
│  │ │• Complexity │ │ │ │• Safety     │ │ │ │• Scenes     │ │        │
│  │ │• Content    │ │ │ │• Labels     │ │ │ │• Action     │ │        │
│  │ │• Age Rating │ │ │ │• Age Rating │ │ │ │• Age Rating │ │        │
│  │ └─────────────┘ │ │ └─────────────┘ │ │ └─────────────┘ │        │
│  └─────────────────┘ └─────────────────┘ └─────────────────┘        │
│           │                   │                   │                 │
│           └───────────────────┼───────────────────┘                 │
│                               ▼                                     │
│  ┌─────────────────────────────────────────────────────────────────┐│
│  │                    Results Processing                           ││
│  │                                                                 ││
│  │  • Age Assessment (3yr, 3-7yr, 7-10yr, 10-13yr, 13-18yr, 18+)   ││
│  │  • Confidence Scoring                                           ││
│  │  • S3 Object Tagging                                            ││
│  │  • File Organization & Movement                                 ││
│  │  • DynamoDB Storage                                             ││
│  └─────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     Output Storage & Organization                   │
│                                                                     │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐  │
│  │   Amazon S3     │    │   Amazon S3     │    │  Amazon DynamoDB│  │
│  │  Tagged Objects │    │ Organized Files │    │ Analysis Results│  │
│  │                 │    │                 │    │                 │  │
│  │ • ContentType   │    │ categorized/    │    │ • object_key    │  │
│  │ • AgeCategory   │    │ ├── 3yr/        │    │ • analysis_time │  │
│  │ • SafetyScore   │    │ ├── 3-7yr/      │    │ • file_type     │  │
│  │ • Confidence    │    │ ├── 7-10yr/     │    │ • age_category  │  │
│  │ • Status        │    │ ├── 10-13yr/    │    │ • confidence    │  │
│  │                 │    │ ├── 13-18yr/    │    │ • analysis_data │  │
│  │                 │    │ └── 18+/        │    │                 │  │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

Detailed Component Architecture

1. Trigger Layer

┌─────────────────┐    ┌─────────────────┐    ┌───────────────────┐
│     User        │    │   Amazon S3     │    │ Amazon EventBridge│
│                 │    │                 │    │                   │
│ • Manual Upload │───>│ • Object Store  │───>│ • Event Detection │
│ • File Drop     │    │ • Versioning    │    │ • Rule Matching   │
│ • Batch Upload  │    │ • Event Config  │    │ • Workflow Trigger│
└─────────────────┘    └─────────────────┘    └───────────────────┘
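As a rough sketch of the trigger layer, the rule below matches S3 "Object Created" events delivered to EventBridge for keys under the uploads/ prefix, and the helper pulls the bucket and key out of such an event. The bucket name my-source-bucket and the helper name extract_s3_location are placeholders, and the sketch assumes EventBridge notifications are enabled on the source bucket; the repository's actual rule may differ.

```python
# Hypothetical bucket name; replace with the deployed source bucket.
SOURCE_BUCKET = "my-source-bucket"

# EventBridge event pattern: S3 "Object Created" events for keys under uploads/.
EVENT_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": [SOURCE_BUCKET]},
        "object": {"key": [{"prefix": "uploads/"}]},
    },
}


def extract_s3_location(event: dict) -> tuple[str, str]:
    """Return (bucket, key) from an S3 event delivered through EventBridge (sketch)."""
    detail = event["detail"]
    return detail["bucket"]["name"], detail["object"]["key"]
```

Filtering on the uploads/ prefix also keeps objects later written under categorized/ from starting a second workflow run.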

2. Orchestration Layer

┌─────────────────────────────────────────────────────────────────┐
│                    AWS Step Functions                           │
│                                                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────┐ │
│  │   Start     │─>│   Choice    │─>│   Parallel  │─>│   End   │ │
│  │ Execution   │  │   State     │  │  Execution  │  │  State  │ │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────┘ │
│                          │                                      │
│                          ▼                                      │
│                   ┌─────────────┐                               │
│                   │File Type    │                               │
│                   │Detection    │                               │
│                   │Logic        │                               │
│                   └─────────────┘                               │
└─────────────────────────────────────────────────────────────────┘

3. Processing Layer

┌─────────────────────────────────────────────────────────────────┐
│                     Lambda Functions                            │
│                                                                 │
│ ┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐    │
│ │ File Detector   │ │ Content Analyzers│ │ Common Layer    │    │
│ │                 │ │                  │ │                 │    │
│ │ • MIME Type     │ │ ┌─────────────┐  │ │ • Shared Utils  │    │
│ │ • Extension     │ │ │Audio Lambda │  │ │ • Constants     │    │
│ │ • Validation    │ │ │Image Lambda │  │ │ • DynamoDB Ops  │    │
│ │ • Routing       │ │ │Video Lambda │  │ │ • S3 Operations │    │
│ │                 │ │ └─────────────┘  │ │ • Age Logic     │    │
│ └─────────────────┘ └──────────────────┘ └─────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
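The File Detector's extension and MIME-type check might look like the following sketch, which uses Python's standard mimetypes module to guess a MIME type from the object key and maps its top-level type to one of the analyzer routes. The function name detect_file_type is hypothetical, and a production detector might also inspect the object's stored Content-Type or its leading bytes.

```python
import mimetypes

# Top-level MIME types that have a dedicated analyzer.
SUPPORTED_TYPES = {"audio", "image", "video"}


def detect_file_type(object_key: str) -> str:
    """Map an S3 object key to 'audio', 'image', 'video', or 'other' by extension (sketch)."""
    mime_type, _encoding = mimetypes.guess_type(object_key)
    if mime_type is None:
        return "other"  # no recognizable extension
    top_level = mime_type.split("/", 1)[0]
    return top_level if top_level in SUPPORTED_TYPES else "other"
```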

4. Analysis Layer

┌─────────────────────────────────────────────────────────────────┐
│                    AI/ML Services                               │
│                                                                 │
│ ┌─────────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │Amazon Rekognition   │ │Amazon Transcribe│ │Amazon Comprehend│ │
│ │                     │ │                 │ │                 │ │
│ │ • Label Detection   │ │ • Speech-to-Text│ │ • Sentiment     │ │
│ │ • Content Moderation│ │ • Language      │ │ • Entity Extract│ │
│ │ • Text Detection    │ │   Detection     │ │ • Language      │ │
│ │ • Face Detection    │ │ • Audio Analysis│ │   Detection     │ │
│ └─────────────────────┘ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
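For images, one plausible use of Rekognition content moderation is to call DetectModerationLabels on the S3 object and map the returned labels to a minimum age category. In the sketch below, the label-to-category table, the confidence threshold, and the "no labels means 3yr" default are all hypothetical (the project's real criteria live in image_constants.py, which is not shown), and moderation label names vary between Rekognition model versions. The client is passed in, so the logic can be exercised with a stub instead of boto3.client("rekognition").

```python
from typing import Any

# Hypothetical mapping from moderation label names to the youngest suitable category.
LABEL_TO_CATEGORY = {
    "Explicit Nudity": "18+",
    "Violence": "13-18yr",
    "Visually Disturbing": "13-18yr",
    "Alcohol": "13-18yr",
    "Rude Gestures": "10-13yr",
}
CATEGORY_ORDER = ["3yr", "3-7yr", "7-10yr", "10-13yr", "13-18yr", "18+"]


def analyze_image(rekognition: Any, bucket: str, key: str, min_confidence: float = 60.0) -> dict:
    """Return the most restrictive age category implied by moderation labels (sketch)."""
    response = rekognition.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    labels = response.get("ModerationLabels", [])
    category = "3yr"  # Assumption: no matching labels means the least restrictive category.
    for label in labels:
        # Check the label and its parent so second-level labels still match the table.
        for name in (label.get("Name"), label.get("ParentName")):
            mapped = LABEL_TO_CATEGORY.get(name)
            if mapped and CATEGORY_ORDER.index(mapped) > CATEGORY_ORDER.index(category):
                category = mapped
    return {"file_type": "image", "age_category": category, "labels": labels}
```

DetectModerationLabels accepts JPEG and PNG images; video moderation goes through the asynchronous StartContentModeration and GetContentModeration APIs instead.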

5. Storage Layer

┌─────────────────────────────────────────────────────────────────┐
│                      Data Storage                               │
│                                                                 │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐     │
│ │   Amazon S3     │ │  Amazon DynamoDB│ │ CloudWatch Logs │     │
│ │                 │ │                 │ │                 │     │
│ │ • Source Files  │ │ • Analysis      │ │ • Function Logs │     │
│ │ • Categorized   │ │   Results       │ │ • Error Logs    │     │
│ │   Folders       │ │ • Metadata      │ │ • Metrics       │     │
│ │ • Object Tags   │ │ • Audit Trail   │ │ • Monitoring    │     │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘     │
└─────────────────────────────────────────────────────────────────┘
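The object tags could be written with the S3 PutObjectTagging API. This sketch builds a tag set from an analysis result using the tag names shown in the process-flow diagram (ContentType, AgeCategory, SafetyScore, Confidence, Status); the shape of the result dict and the helper names are assumptions. S3 allows at most 10 tags per object and tag values must be strings, so numbers are formatted before upload.

```python
from typing import Any


def build_tag_set(result: dict[str, Any]) -> list[dict[str, str]]:
    """Convert an analysis result (hypothetical shape) into an S3 TagSet."""
    tags = {
        "ContentType": result["file_type"],
        "AgeCategory": result["age_category"],
        "SafetyScore": f"{result.get('safety_score', 0):.2f}",
        "Confidence": f"{result.get('confidence', 0):.2f}",
        "Status": result.get("processing_status", "completed"),
    }
    # S3 tag values must be strings; at most 10 tags are allowed per object.
    return [{"Key": k, "Value": str(v)} for k, v in tags.items()]


def tag_object(s3: Any, bucket: str, key: str, result: dict[str, Any]) -> None:
    """Replace the object's existing tag set with the analysis tags."""
    s3.put_object_tagging(Bucket=bucket, Key=key, Tagging={"TagSet": build_tag_set(result)})
```

Because PutObjectTagging replaces the whole tag set, any tags that must survive would need to be read first and merged in.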

Data Flow Sequence

1. User Upload        │ 2. S3 Event         │ 3. EventBridge      │ 4. Step Function
   ┌─────────┐        │   ┌─────────┐       │   ┌─────────┐       │   ┌─────────┐
   │  File   │───────>│   │ Object  │──────>│   │  Rule   │──────>│   │Workflow │
   │ Upload  │        │   │ Created │       │   │ Trigger │       │   │ Start   │
   └─────────┘        │   └─────────┘       │   └─────────┘       │   └─────────┘
                      │                     │                     │
5. File Type Check    │ 6. Content Analysis │ 7. Age Assessment   │ 8. File Organization
   ┌─────────┐        │   ┌─────────┐       │   ┌─────────┐       │   ┌─────────┐
   │ Detect  │───────>│   │Analyze  │──────>│   │Category │──────>│   │ Move &  │
   │ Format  │        │   │Content  │       │   │ Assign  │       │   │  Tag    │
   └─────────┘        │   └─────────┘       │   └─────────┘       │   └─────────┘
                      │                     │                     │
9. Storage            │ 10. Monitoring      │ 11. Completion      │
   ┌─────────┐        │   ┌──────────┐      │   ┌─────────┐       │
   │DynamoDB │        │   │CloudWatch│      │   │Workflow │       │
   │ Store   │        │   │  Logs    │      │   │  End    │       │
   └─────────┘        │   └──────────┘      │   └─────────┘       │

Security & Permissions Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        IAM Roles & Policies                     │
│                                                                 │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐     │
│ │EventBridge Role │ │Step Function    │ │ Lambda Execution│     │
│ │                 │ │ Execution Role  │ │ Roles           │     │
│ │ • Start         │ │                 │ │                 │     │
│ │   Execution     │ │ • Invoke Lambda │ │ • S3 Read/Write │     │
│ │ • Step Function │ │ • CloudWatch    │ │ • DynamoDB R/W  │     │
│ │   Access        │ │   Logging       │ │ • Rekognition   │     │
│ │                 │ │ • Error Handling│ │ • Transcribe    │     │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘     │
└─────────────────────────────────────────────────────────────────┘
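As an illustration of the Lambda execution role only, a scoped policy document might look like the one below, generated from Python so the bucket and table names live in one place. The resource names are placeholders and the action list is an assumption inferred from the diagram (S3 read/write and tagging, DynamoDB read/write, Rekognition, Transcribe), with Comprehend added because the analysis layer uses it. For simplicity the AI service actions are granted on "*". The EventBridge and Step Functions roles are not shown.

```python
import json

# Placeholder names; substitute the values from the deployed stack.
BUCKET = "my-source-bucket"
TABLE_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/ContentAnalysisResults"


def lambda_policy(bucket: str = BUCKET, table_arn: str = TABLE_ARN) -> dict:
    """Return an IAM policy document for the analyzer Lambdas (sketch, assumed actions)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ObjectAccess",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject",
                           "s3:GetObjectTagging", "s3:PutObjectTagging"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "ResultsTable",
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem", "dynamodb:GetItem", "dynamodb:Query"],
                "Resource": table_arn,
            },
            {
                "Sid": "AiServices",
                "Effect": "Allow",
                "Action": ["rekognition:DetectModerationLabels", "rekognition:DetectLabels",
                           "rekognition:StartContentModeration", "rekognition:GetContentModeration",
                           "transcribe:StartTranscriptionJob", "transcribe:GetTranscriptionJob",
                           "comprehend:DetectSentiment"],
                "Resource": "*",
            },
        ],
    }


# JSON text suitable for attaching as an inline or managed policy.
policy_json = json.dumps(lambda_policy(), indent=2)
```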

This architecture provides:

  • Scalability: Serverless components scale automatically
  • Reliability: Built-in error handling and retry mechanisms
  • Security: Fine-grained IAM permissions
  • Monitoring: Comprehensive logging and metrics
  • Cost-Efficiency: Pay-per-use pricing model
  • Maintainability: Modular design with shared components

Project Repo Structure:

lambdas/
├── common/
│   ├── __init__.py
│   ├── utils.py                    # Core utilities
│   ├── image_constants.py          # Image criteria
│   ├── audio_constants.py          # Audio criteria
│   ├── video_constants.py          # Video criteria
│   └── requirements.txt
├── audio-analyzer/
│   └── lambda_function.py
├── image-analyzer/
│   └── lambda_function.py
└── video-analyzer/
    └── lambda_function.py

File Organization:

After analysis, files are automatically organized into folders by age category:

  • categorized/3yr/ - Safe content for toddlers
  • categorized/3-7yr/ - Educational content for young children
  • categorized/7-10yr/ - Adventure and learning content
  • categorized/10-13yr/ - Complex themes for pre-teens
  • categorized/13-18yr/ - Teen-appropriate content
  • categorized/18+/ - Adult content

Original uploaded files are deleted after successful categorization.
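S3 has no rename operation, so "moving" a file into its category folder means copying it to the new key and then deleting the original. A minimal sketch, assuming the destination key keeps the original file name under categorized/<age_category>/ in the same bucket; the helper names categorized_key and move_to_category are hypothetical.

```python
import posixpath
from typing import Any

AGE_CATEGORIES = ("3yr", "3-7yr", "7-10yr", "10-13yr", "13-18yr", "18+")


def categorized_key(source_key: str, age_category: str) -> str:
    """Build the destination key, e.g. uploads/a.jpg -> categorized/3yr/a.jpg."""
    if age_category not in AGE_CATEGORIES:
        raise ValueError(f"unknown age category: {age_category}")
    return f"categorized/{age_category}/{posixpath.basename(source_key)}"


def move_to_category(s3: Any, bucket: str, source_key: str, age_category: str) -> str:
    """Copy the object into its category folder, then delete the original upload."""
    dest_key = categorized_key(source_key, age_category)
    s3.copy_object(Bucket=bucket, Key=dest_key,
                   CopySource={"Bucket": bucket, "Key": source_key})
    # Delete only after the copy call has returned without raising.
    s3.delete_object(Bucket=bucket, Key=source_key)
    return dest_key
```

A single copy_object request handles objects up to 5 GB; larger videos would need a multipart copy, for example boto3's managed copy method on the S3 client.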

Key Components Created:

  1. Lambda Functions:
    • file-type-detector: Identifies whether the uploaded file is audio, video, image, or other
    • audio-analyzer: Analyzes audio content for age appropriateness
    • image-analyzer: Uses Amazon Rekognition for visual content analysis
    • video-analyzer: Combines audio/visual analysis for comprehensive assessment
  2. Step Functions State Machine: Orchestrates the analysis workflow with error handling
  3. DynamoDB Table: Stores analysis results and metadata for each processed file
  4. EventBridge Rule: Triggers the workflow on S3 file uploads
  5. S3 Bucket: Holds source file uploads and the organized, categorized content
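For the audio path, one way to combine the services named above is to transcribe the file with Amazon Transcribe and then score the transcript with Amazon Comprehend plus a word list. The sketch covers only the scoring half, since Transcribe jobs run asynchronously and the transcript has to be fetched after the job completes. The flagged-word list, the sentiment rule, and the resulting categories are hypothetical stand-ins for the criteria in audio_constants.py; comprehend is any object with a boto3-compatible detect_sentiment method.

```python
import re
from typing import Any

CATEGORY_ORDER = ["3yr", "3-7yr", "7-10yr", "10-13yr", "13-18yr", "18+"]
# Hypothetical word list; the project's real criteria live in audio_constants.py.
FLAGGED_WORDS = {"scary": "7-10yr", "fight": "10-13yr", "gun": "13-18yr", "kill": "13-18yr"}
MAX_COMPREHEND_BYTES = 5000  # DetectSentiment accepts at most 5,000 bytes of UTF-8 text


def assess_transcript(comprehend: Any, transcript: str, language_code: str = "en") -> dict:
    """Pick the most restrictive category implied by flagged words and sentiment (sketch)."""
    category = "3yr"
    for word in re.findall(r"[a-z']+", transcript.lower()):
        flagged = FLAGGED_WORDS.get(word)
        if flagged and CATEGORY_ORDER.index(flagged) > CATEGORY_ORDER.index(category):
            category = flagged

    sentiment = "NEUTRAL"  # Assumed default when there is no speech to analyze.
    if transcript.strip():  # Comprehend rejects empty text.
        text = transcript.encode("utf-8")[:MAX_COMPREHEND_BYTES].decode("utf-8", errors="ignore")
        response = comprehend.detect_sentiment(Text=text, LanguageCode=language_code)
        sentiment = response["Sentiment"]
        # Hypothetical rule: strongly negative speech is not placed below 7-10yr.
        if (sentiment == "NEGATIVE" and response["SentimentScore"]["Negative"] > 0.8
                and CATEGORY_ORDER.index(category) < CATEGORY_ORDER.index("7-10yr")):
            category = "7-10yr"
    return {"file_type": "audio", "age_category": category, "sentiment": sentiment}
```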

Age Categories Supported:

  • 3yr: Simple, safe content for toddlers
  • 3-7yr: Educational content for young children
  • 7-10yr: Adventure and learning content
  • 10-13yr: Complex themes for pre-teens
  • 13-18yr: Teen-appropriate content
  • 18+: Adult content
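The six categories form an ordered scale, which makes combining assessments simple. The video-analyzer is described as combining audio and visual analysis; a natural rule, assumed here rather than taken from the repository, is that the most restrictive category wins and the combined confidence is the lowest of the inputs. The function names and the assessment dict shape are hypothetical.

```python
# Ordered from least to most restrictive, matching the list above.
AGE_CATEGORIES = ("3yr", "3-7yr", "7-10yr", "10-13yr", "13-18yr", "18+")


def most_restrictive(*categories: str) -> str:
    """Return the most restrictive label; raises ValueError for unknown labels."""
    return max(categories, key=AGE_CATEGORIES.index)


def combine_assessments(visual: dict, audio: dict) -> dict:
    """Merge two assessments (assumed shape: age_category plus confidence in 0..100)."""
    return {
        "age_category": most_restrictive(visual["age_category"], audio["age_category"]),
        # Conservative choice: the result is only as certain as the weakest input.
        "confidence": min(visual["confidence"], audio["confidence"]),
        "sources": {"visual": visual, "audio": audio},
    }
```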

Data Storage:

Analysis results are stored in the DynamoDB table ContentAnalysisResults with the following structure:

  • Partition Key: object_key (S3 file path)
  • Sort Key: analysis_timestamp (ISO 8601 timestamp)
  • Attributes:
    • bucket_name: Source S3 bucket
    • file_type: Detected file type (audio/image/video)
    • analysis_type: Type of analysis performed
    • processing_status: completed/failed
    • age_category: Recommended age category (3yr, 3-7yr, etc.)
    • confidence: Analysis confidence score
    • analysis_results: Detailed analysis data
    • age_assessment: Complete age assessment details
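A record matching the structure above might be assembled as follows. The boto3 DynamoDB resource API rejects Python float values, so the sketch converts them to Decimal recursively. The helper names, the analysis_type value format, and the input shapes are hypothetical; the item would be written with table.put_item(Item=item), where table is boto3.resource("dynamodb").Table("ContentAnalysisResults").

```python
from datetime import datetime, timezone
from decimal import Decimal
from typing import Any


def to_dynamo(value: Any) -> Any:
    """Recursively convert floats to Decimal, which the boto3 resource API requires."""
    if isinstance(value, float):
        return Decimal(str(value))  # str() avoids binary floating-point artifacts
    if isinstance(value, dict):
        return {k: to_dynamo(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_dynamo(v) for v in value]
    return value


def build_item(bucket: str, key: str, file_type: str, assessment: dict,
               analysis: dict, status: str = "completed") -> dict:
    """Assemble one ContentAnalysisResults item (sketch)."""
    return to_dynamo({
        "object_key": key,                                             # partition key
        "analysis_timestamp": datetime.now(timezone.utc).isoformat(),  # sort key
        "bucket_name": bucket,
        "file_type": file_type,
        "analysis_type": f"{file_type}_analysis",  # hypothetical value format
        "processing_status": status,
        "age_category": assessment["age_category"],
        "confidence": assessment["confidence"],
        "analysis_results": analysis,
        "age_assessment": assessment,
    })
```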

Deployment

  • Build the project:

      sam build
  • Deploy the stack:

      sam deploy --guided

Requirements

  • Python 3.11+
  • AWS SAM CLI
  • AWS Account with appropriate permissions
  • AWS CLI configured with valid credentials

Troubleshooting

If you encounter dependency resolution errors:

  • The Lambda functions use the boto3 SDK bundled with the AWS Lambda Python runtime
  • No additional Python packages are required for basic functionality
  • For custom dependencies, update the respective requirements.txt files
