techwithashish1/aws-content-categorization-app

Repository contents:

extras/
lambdas/
sample-test/
step-functions/
README.md
deploy.ps1
deploy.sh
requirements.txt
template.yaml


AWS Content Categorization System

Overview

The AWS Content Categorization System is a serverless application that analyzes audio, image, and video files uploaded to Amazon S3 and categorizes them by age-appropriateness. It uses AWS AI/ML services (Amazon Rekognition, Amazon Transcribe, and Amazon Comprehend) to assess each file, then tags it and moves it into a folder for its age category.

Core Functionality

Automated Content Analysis: When users upload files (audio, image, or video) to an S3 bucket, the system automatically:

Detects the file type and routes it to the appropriate analyzer
Performs AI-powered content analysis using Amazon Rekognition, Amazon Transcribe, and Amazon Comprehend
Assigns one of six age categories (3yr, 3-7yr, 7-10yr, 10-13yr, 13-18yr, 18+)
Moves each file into a categorized folder based on its content suitability
Tags files with metadata and stores detailed analysis results in DynamoDB

Architecture Highlights

Event-Driven Serverless Design: Built entirely on AWS serverless services, including:

EventBridge for event detection and workflow triggering
Step Functions for orchestrating complex multi-step analysis workflows
Lambda Functions for scalable, cost-effective content processing
AI/ML Services (Rekognition, Transcribe, Comprehend) for intelligent content analysis
DynamoDB for storing analysis results and audit trails

Key Benefits

Automated Content Moderation: Reduces the need for manual content review
Age-Appropriate Content Filtering: Helps match content to its target audience
Scalable Architecture: Handles varying workloads with automatic scaling
Cost-Effective: Pay-per-use serverless model with no servers to manage
Comprehensive Audit Trail: Processing history and analysis metadata stored for every file
Event-Driven Processing: Analysis starts automatically as soon as a file is uploaded

Use Cases

Educational Platforms: Categorizing learning materials by age groups
Content Management Systems: Organizing media libraries by audience suitability
Parental Control Systems: Filtering content for child-safe environments
Media Companies: Automated content rating and classification
Digital Libraries: Age-based content curation and recommendation systems

High-Level Architecture Flow

Detailed Process Flow

┌─────────────────┐     ┌──────────────────┐    ┌─────────────────────┐
│   User Upload   │     │   Amazon S3      │    │   Amazon EventBridge│
│                 │ --> │   Source Bucket  │ -->│   S3 Event Rule     │
│   (Manual)      │     │   uploads/       │    │                     │
└─────────────────┘     └──────────────────┘    └─────────────────────┘
                                                          │
                                                          ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    AWS Step Functions                               │
│                Content Categorization Workflow                      │
│                                                                     │
│  ┌─────────────────┐    ┌──────────────────────────────────────────┐│
│  │ Start Execution │──> │          File Type Detection             ││
│  └─────────────────┘    │                                          ││
│                         │  ┌─────────────────────────────────────┐ ││
│                         │  │     File Type Detector Lambda       │ ││
│                         │  │                                     │ ││
│                         │  │  • Analyze file extension           │ ││
│                         │  │  • Determine: audio/image/video     │ ││
│                         │  │  • Route to appropriate analyzer    │ ││
│                         │  └─────────────────────────────────────┘ ││
│                         └──────────────────────────────────────────┘│
│                                          │                          │
│                         ┌────────────────┼────────────────┐         │
│                         ▼                ▼                ▼         │
│  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐        │
│  │ Audio Analysis  │ │ Image Analysis  │ │ Video Analysis  │        │
│  │                 │ │                 │ │                 │        │
│  │ ┌─────────────┐ │ │ ┌─────────────┐ │ │ ┌─────────────┐ │        │
│  │ │Audio Lambda │ │ │ │Image Lambda │ │ │ │Video Lambda │ │        │
│  │ │             │ │ │ │             │ │ │ │             │ │        │
│  │ │• Duration   │ │ │ │• Rekognition│ │ │ │• Rekognition│ │        │
│  │ │• Complexity │ │ │ │• Safety     │ │ │ │• Scenes     │ │        │
│  │ │• Content    │ │ │ │• Labels     │ │ │ │• Action     │ │        │
│  │ │• Age Rating │ │ │ │• Age Rating │ │ │ │• Age Rating │ │        │
│  │ └─────────────┘ │ │ └─────────────┘ │ │ └─────────────┘ │        │
│  └─────────────────┘ └─────────────────┘ └─────────────────┘        │
│           │                   │                   │                 │
│           └───────────────────┼───────────────────┘                 │
│                               ▼                                     │
│  ┌─────────────────────────────────────────────────────────────────┐│
│  │                    Results Processing                           ││
│  │                                                                 ││
│  │  • Age Assessment (3yr, 3-7yr, 7-10yr, 10-13yr, 13-18yr, 18+)   ││
│  │  • Confidence Scoring                                           ││
│  │  • S3 Object Tagging                                            ││
│  │  • File Organization & Movement                                 ││
│  │  • DynamoDB Storage                                             ││
│  └─────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     Output Storage & Organization                   │
│                                                                     │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐  │
│  │   Amazon S3     │    │   Amazon S3     │    │  Amazon DynamoDB│  │
│  │  Tagged Objects │    │ Organized Files │    │ Analysis Results│  │
│  │                 │    │                 │    │                 │  │
│  │ • ContentType   │    │ categorized/    │    │ • object_key    │  │
│  │ • AgeCategory   │    │ ├── 3yr/        │    │ • analysis_time │  │
│  │ • SafetyScore   │    │ ├── 3-7yr/      │    │ • file_type     │  │
│  │ • Confidence    │    │ ├── 7-10yr/     │    │ • age_category  │  │
│  │ • Status        │    │ ├── 10-13yr/    │    │ • confidence    │  │
│  │                 │    │ ├── 13-18yr/    │    │ • analysis_data │  │
│  │                 │    │ └── 18+/        │    │                 │  │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

Detailed Component Architecture

1. Trigger Layer

┌─────────────────┐    ┌─────────────────┐    ┌───────────────────┐
│     User        │    │   Amazon S3     │    │ Amazon EventBridge│
│                 │    │                 │    │                   │
│ • Manual Upload │───>│ • Object Store  │───>│ • Event Detection │
│ • File Drop     │    │ • Versioning    │    │ • Rule Matching   │
│ • Batch Upload  │    │ • Event Config  │    │ • Workflow Trigger│
└─────────────────┘    └─────────────────┘    └───────────────────┘
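As a rough sketch of this layer, the rule can use the standard S3-to-EventBridge "Object Created" event pattern. It assumes EventBridge notifications are enabled on the bucket and that uploads arrive under the uploads/ prefix shown in the process-flow diagram; the bucket name and the to_workflow_input helper are hypothetical, not taken from template.yaml.

```python
import json

# Placeholder name; the deployed stack defines the real bucket (assumption).
SOURCE_BUCKET = "my-content-bucket"

# Matches S3 "Object Created" events for keys under uploads/ in SOURCE_BUCKET.
# S3 emits these events only when EventBridge notifications are enabled on the bucket.
EVENT_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": [SOURCE_BUCKET]},
        "object": {"key": [{"prefix": "uploads/"}]},
    },
}


def to_workflow_input(event: dict) -> dict:
    """Hypothetical helper: keep only the fields the workflow needs."""
    detail = event["detail"]
    return {
        "bucket": detail["bucket"]["name"],
        "key": detail["object"]["key"],
        "size": detail["object"].get("size"),
    }


if __name__ == "__main__":
    # The pattern as JSON, e.g. for `aws events put-rule --event-pattern`.
    print(json.dumps(EVENT_PATTERN, indent=2))
    sample_event = {
        "source": "aws.s3",
        "detail-type": "Object Created",
        "detail": {
            "bucket": {"name": SOURCE_BUCKET},
            "object": {"key": "uploads/lesson.mp3", "size": 1024},
        },
    }
    print(to_workflow_input(sample_event))
```

Filtering on the prefix in the rule keeps files written to categorized/ from re-triggering the workflow when they are moved.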

2. Orchestration Layer

┌─────────────────────────────────────────────────────────────────┐
│                    AWS Step Functions                           │
│                                                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────┐ │
│  │   Start     │─>│   Choice    │─>│   Parallel  │─>│   End   │ │
│  │ Execution   │  │   State     │  │  Execution  │  │  State  │ │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────┘ │
│                          │                                      │
│                          ▼                                      │
│                   ┌─────────────┐                               │
│                   │File Type    │                               │
│                   │Detection    │                               │
│                   │Logic        │                               │
│                   └─────────────┘                               │
└─────────────────────────────────────────────────────────────────┘
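The Choice state can be written in Amazon States Language. The hypothetical definition below is built as a Python dict so it can be serialized with json.dumps, and it covers only the routing step. The state names, the $.file_type field, the retry settings, and the ARNs are assumptions, not the contents of the repository's step-functions folder.

```python
import json
from typing import Dict, Optional


def lambda_task(function_arn: str, next_state: Optional[str] = None) -> dict:
    """Task state that invokes a Lambda function, with a simple retry policy."""
    state = {
        "Type": "Task",
        "Resource": function_arn,
        "Retry": [{
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,
            "MaxAttempts": 2,
            "BackoffRate": 2.0,
        }],
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def build_definition(detector_arn: str, analyzer_arns: Dict[str, str]) -> dict:
    """Hypothetical state machine: detect the file type, then route to one analyzer.

    Assumes the detector returns its input with an added "file_type" field.
    """
    states = {
        "DetectFileType": lambda_task(detector_arn, "RouteByFileType"),
        "RouteByFileType": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.file_type", "StringEquals": ftype,
                 "Next": f"Analyze{ftype.capitalize()}"}
                for ftype in analyzer_arns
            ],
            # Anything that is not a known type ends the execution as failed.
            "Default": "UnsupportedFileType",
        },
        "UnsupportedFileType": {
            "Type": "Fail",
            "Error": "UnsupportedFileType",
            "Cause": "File type is not audio, image, or video",
        },
    }
    for ftype, arn in analyzer_arns.items():
        states[f"Analyze{ftype.capitalize()}"] = lambda_task(arn)
    return {"Comment": "Content categorization (sketch)",
            "StartAt": "DetectFileType", "States": states}


if __name__ == "__main__":
    prefix = "arn:aws:lambda:us-east-1:123456789012:function:"  # placeholder ARNs
    definition = build_definition(
        prefix + "file-type-detector",
        {t: prefix + f"{t}-analyzer" for t in ("audio", "image", "video")},
    )
    print(json.dumps(definition, indent=2))
```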

3. Processing Layer

┌─────────────────────────────────────────────────────────────────┐
│                     Lambda Functions                            │
│                                                                 │
│ ┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐    │
│ │ File Detector   │ │ Content Analyzers│ │ Common Layer    │    │
│ │                 │ │                  │ │                 │    │
│ │ • MIME Type     │ │ ┌─────────────┐  │ │ • Shared Utils  │    │
│ │ • Extension     │ │ │Audio Lambda │  │ │ • Constants     │    │
│ │ • Validation    │ │ │Image Lambda │  │ │ • DynamoDB Ops  │    │
│ │ • Routing       │ │ │Video Lambda │  │ │ • S3 Operations │    │
│ │                 │ │ └─────────────┘  │ │ • Age Logic     │    │
│ └─────────────────┘ └──────────────────┘ └─────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
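A minimal sketch of the File Detector's extension and MIME checks, using only the standard library. The extension lists are illustrative assumptions; the repository's detector may recognize a different set.

```python
import mimetypes
import os

# Hypothetical extension lists; the real detector's criteria may differ.
EXTENSIONS = {
    "audio": {".mp3", ".wav", ".flac", ".m4a", ".ogg"},
    "image": {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"},
    "video": {".mp4", ".mov", ".avi", ".mkv", ".webm"},
}


def detect_file_type(key: str) -> str:
    """Classify an S3 key as "audio", "image", "video", or "other"."""
    ext = os.path.splitext(key)[1].lower()
    for file_type, exts in EXTENSIONS.items():
        if ext in exts:
            return file_type
    # Fall back to the MIME type guessed from the name (e.g. "audio/x-aiff").
    mime, _ = mimetypes.guess_type(key)
    if mime:
        major = mime.split("/", 1)[0]
        if major in EXTENSIONS:
            return major
    return "other"


if __name__ == "__main__":
    for key in ("uploads/Song.MP3", "uploads/photo.jpeg", "uploads/notes.txt"):
        print(key, "->", detect_file_type(key))
```

Only the object key is inspected here, so a mislabeled file would be routed by its name; a stricter check could read the object's stored Content-Type or its leading bytes.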

4. Analysis Layer

┌─────────────────────────────────────────────────────────────────┐
│                    AI/ML Services                               │
│                                                                 │
│ ┌─────────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │Amazon Rekognition   │ │Amazon Transcribe│ │Amazon Comprehend│ │
│ │                     │ │                 │ │                 │ │
│ │ • Label Detection   │ │ • Speech-to-Text│ │ • Sentiment     │ │
│ │ • Content Moderation│ │ • Language      │ │ • Entity Extract│ │
│ │ • Text Detection    │ │   Detection     │ │ • Language      │ │
│ │ • Face Detection    │ │ • Audio Analysis│ │   Detection     │ │
│ └─────────────────────┘ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
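For images, Rekognition's DetectModerationLabels returns labels that carry a Name, a ParentName, and a Confidence. The sketch below turns them into the strictest age category they imply. The LABEL_MIN_AGE table and the 60% threshold are hypothetical policy choices, label names vary with the moderation taxonomy version, and the rekognition argument is assumed to be a boto3 client (boto3.client("rekognition")).

```python
from typing import Iterable, Optional

AGE_CATEGORIES = ["3yr", "3-7yr", "7-10yr", "10-13yr", "13-18yr", "18+"]

# Hypothetical mapping from a moderation label (or its parent) to the youngest
# category that label still allows. Check names against your own responses.
LABEL_MIN_AGE = {
    "Explicit Nudity": "18+",
    "Explicit": "18+",
    "Violence": "13-18yr",
    "Visually Disturbing": "13-18yr",
    "Drugs": "13-18yr",
    "Hate Symbols": "13-18yr",
    "Alcohol": "13-18yr",
    "Gambling": "13-18yr",
    "Suggestive": "13-18yr",
    "Rude Gestures": "10-13yr",
}


def category_from_moderation(labels: Iterable[dict],
                             min_confidence: float = 60.0) -> Optional[str]:
    """Return the strictest age category implied by the labels, or None."""
    strictest = None
    for label in labels:
        if label.get("Confidence", 0.0) < min_confidence:
            continue
        # Child labels name their parent in ParentName; top-level ones leave it empty.
        group = label.get("ParentName") or label.get("Name", "")
        category = LABEL_MIN_AGE.get(group)
        if category and (strictest is None or
                         AGE_CATEGORIES.index(category) > AGE_CATEGORIES.index(strictest)):
            strictest = category
    return strictest


def moderate_s3_image(rekognition, bucket: str, key: str) -> Optional[str]:
    """Run DetectModerationLabels on an S3 image and map the result."""
    response = rekognition.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=50,
    )
    return category_from_moderation(response["ModerationLabels"])
```

Returning None when nothing fires leaves the final decision to other signals, such as DetectLabels results, rather than treating "no moderation labels" as proof that an image suits toddlers.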

5. Storage Layer

┌─────────────────────────────────────────────────────────────────┐
│                      Data Storage                               │
│                                                                 │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐     │
│ │   Amazon S3     │ │  Amazon DynamoDB│ │ CloudWatch Logs │     │
│ │                 │ │                 │ │                 │     │
│ │ • Source Files  │ │ • Analysis      │ │ • Function Logs │     │
│ │ • Categorized   │ │   Results       │ │ • Error Logs    │     │
│ │   Folders       │ │ • Metadata      │ │ • Metrics       │     │
│ │ • Object Tags   │ │ • Audit Trail   │ │ • Monitoring    │     │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘     │
└─────────────────────────────────────────────────────────────────┘

Data Flow Sequence

1. User Upload        │ 2. S3 Event         │ 3. EventBridge      │ 4. Step Function
   ┌─────────┐        │   ┌─────────┐       │   ┌─────────┐       │   ┌─────────┐
   │  File   │───────>│   │ Object  │──────>│   │  Rule   │──────>│   │Workflow │
   │ Upload  │        │   │ Created │       │   │ Trigger │       │   │ Start   │
   └─────────┘        │   └─────────┘       │   └─────────┘       │   └─────────┘
                      │                     │                     │
5. File Type Check    │ 6. Content Analysis │ 7. Age Assessment   │ 8. File Organization
   ┌─────────┐        │   ┌─────────┐       │   ┌─────────┐       │   ┌─────────┐
   │ Detect  │───────>│   │Analyze  │──────>│   │Category │──────>│   │ Move &  │
   │ Format  │        │   │Content  │       │   │ Assign  │       │   │  Tag    │
   └─────────┘        │   └─────────┘       │   └─────────┘       │   └─────────┘
                      │                     │                     │
9. Storage            │ 10. Monitoring      │ 11. Completion      │
   ┌─────────┐        │   ┌──────────┐      │   ┌─────────┐       │
   │DynamoDB │        │   │CloudWatch│      │   │Workflow │       │
   │ Store   │        │   │  Logs    │      │   │  End    │       │
   └─────────┘        │   └──────────┘      │   └─────────┘       │

Security & Permissions Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        IAM Roles & Policies                     │
│                                                                 │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐     │
│ │EventBridge Role │ │Step Function    │ │ Lambda Execution│     │
│ │                 │ │ Execution Role  │ │ Roles           │     │
│ │ • Start         │ │                 │ │                 │     │
│ │   Execution     │ │ • Invoke Lambda │ │ • S3 Read/Write │     │
│ │ • Step Function │ │ • CloudWatch    │ │ • DynamoDB R/W  │     │
│ │   Access        │ │   Logging       │ │ • Rekognition   │     │
│ │                 │ │ • Error Handling│ │ • Transcribe    │     │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘     │
└─────────────────────────────────────────────────────────────────┘

This architecture provides:

Scalability: Serverless components scale automatically
Reliability: Built-in error handling and retry mechanisms
Security: Fine-grained IAM permissions
Monitoring: Comprehensive logging and metrics
Cost-Efficiency: Pay-per-use pricing model
Maintainability: Modular design with shared components
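The fine-grained permissions for the Lambda execution roles could be expressed as a policy document like the one below. It is a hypothetical least-privilege starting point: the bucket name and table ARN are placeholders, and Comprehend is included because the analyzers are described as using it, even though the IAM diagram lists only S3, DynamoDB, Rekognition, and Transcribe.

```python
import json


def lambda_execution_policy(bucket: str, table_arn: str) -> dict:
    """Hypothetical least-privilege policy for the analyzer Lambda functions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # read uploads, write categorized copies with tags, delete originals
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject",
                           "s3:GetObjectTagging", "s3:PutObjectTagging"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {   # write analysis results to a single table
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem"],
                "Resource": table_arn,
            },
            {   # analysis API actions, granted on all resources for brevity
                "Effect": "Allow",
                "Action": ["rekognition:DetectLabels",
                           "rekognition:DetectModerationLabels",
                           "transcribe:StartTranscriptionJob",
                           "transcribe:GetTranscriptionJob",
                           "comprehend:DetectSentiment"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    table = "arn:aws:dynamodb:us-east-1:123456789012:table/ContentAnalysisResults"
    print(json.dumps(lambda_execution_policy("my-content-bucket", table), indent=2))
```

In a SAM template, policy templates such as S3CrudPolicy and DynamoDBCrudPolicy cover similar ground with less typing, at the cost of broader actions than this sketch grants.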

Project Repo Structure:

lambdas/
├── common/
│   ├── __init__.py
│   ├── utils.py                    # Core utilities
│   ├── image_constants.py          # Image criteria
│   ├── audio_constants.py          # Audio criteria
│   ├── video_constants.py          # Video criteria
│   └── requirements.txt
├── audio-analyzer/
│   └── lambda_function.py
├── image-analyzer/
│   └── lambda_function.py
└── video-analyzer/
    └── lambda_function.py

File Organization:

After analysis, files are automatically organized into folders by age category:

categorized/3yr/ - Safe content for toddlers
categorized/3-7yr/ - Educational content for young children
categorized/7-10yr/ - Adventure and learning content
categorized/10-13yr/ - Complex themes for pre-teens
categorized/13-18yr/ - Teen-appropriate content
categorized/18+/ - Adult content

Original uploaded files are deleted after successful categorization.
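Moving a file in S3 means copying it and then deleting the original. A minimal sketch, assuming the s3 argument is a boto3 S3 client and the file is moved within the same bucket; the helper names are illustrative, and the AgeCategory and ContentType tag keys are taken from the output-storage diagram. copy_object handles objects up to 5 GB, so large videos would need a multipart copy such as the client's managed copy method.

```python
import posixpath
from urllib.parse import urlencode

AGE_CATEGORIES = {"3yr", "3-7yr", "7-10yr", "10-13yr", "13-18yr", "18+"}


def categorized_key(source_key: str, age_category: str) -> str:
    """Map uploads/<name> to categorized/<age_category>/<name>."""
    if age_category not in AGE_CATEGORIES:
        raise ValueError(f"unknown age category: {age_category}")
    return f"categorized/{age_category}/{posixpath.basename(source_key)}"


def move_and_tag(s3, bucket: str, source_key: str, age_category: str, tags: dict) -> str:
    """Copy the object into its category folder with tags, then delete the original."""
    dest_key = categorized_key(source_key, age_category)
    s3.copy_object(
        Bucket=bucket,
        Key=dest_key,
        CopySource={"Bucket": bucket, "Key": source_key},
        # Tagging is a URL-encoded query string; REPLACE sets these tags
        # instead of copying the source object's tags.
        Tagging=urlencode({"AgeCategory": age_category, **tags}),
        TaggingDirective="REPLACE",
    )
    # Reached only if copy_object did not raise, so the original is never lost.
    s3.delete_object(Bucket=bucket, Key=source_key)
    return dest_key
```

Keys are flattened to their file name, so two uploads with the same name in the same category would overwrite each other; adding a unique suffix avoids that.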

Key Components Created:

Lambda Functions:
- file-type-detector: Identifies whether the uploaded file is audio, video, image, or other
- audio-analyzer: Analyzes audio content for age-appropriateness
- image-analyzer: Uses Amazon Rekognition for visual content analysis
- video-analyzer: Combines audio and visual analysis for a comprehensive assessment
Step Functions State Machine: Orchestrates the analysis workflow with error handling
DynamoDB Table: Stores analysis results and metadata for each processed file
EventBridge Rule: Triggers the workflow on S3 file uploads
S3 Bucket: Holds source uploads and the organized, categorized content
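For the audio-analyzer, a transcript-based check could look like the sketch below. Transcribe jobs are asynchronous, so start_transcription only starts the job; polling get_transcription_job and downloading the transcript are left out. The transcribe and comprehend arguments are assumed to be boto3 clients, and the job-naming scheme and helper names are hypothetical.

```python
import re
import uuid
from typing import Optional

# DetectSentiment accepts at most 5 KB of UTF-8 text per call.
COMPREHEND_MAX_BYTES = 5000


def transcription_job_name(key: str, suffix: Optional[str] = None) -> str:
    """Build a unique job name from the key using only 0-9a-zA-Z._- characters."""
    base = re.sub(r"[^0-9A-Za-z._-]", "-", key)[:150]
    return f"{base}-{suffix or uuid.uuid4().hex}"


def start_transcription(transcribe, bucket: str, key: str) -> str:
    """Start an asynchronous Transcribe job for s3://bucket/key; return its name."""
    name = transcription_job_name(key)
    transcribe.start_transcription_job(
        TranscriptionJobName=name,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        IdentifyLanguage=True,  # let Transcribe detect the spoken language
    )
    return name


def truncate_utf8(text: str, max_bytes: int = COMPREHEND_MAX_BYTES) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


def transcript_sentiment(comprehend, transcript: str, language_code: str = "en") -> str:
    """Return POSITIVE, NEGATIVE, NEUTRAL, or MIXED for the transcript."""
    response = comprehend.detect_sentiment(
        Text=truncate_utf8(transcript), LanguageCode=language_code
    )
    return response["Sentiment"]
```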

Age Categories Supported:

3yr: Simple, safe content for toddlers
3-7yr: Educational content for young children
7-10yr: Adventure and learning content
10-13yr: Complex themes for pre-teens
13-18yr: Teen-appropriate content
18+: Adult content
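The categories form an ordered scale, so an analyzer that sees more than one signal (the video-analyzer combines audio and visual analysis) needs a rule for merging them. A minimal sketch, assuming a "strictest category wins" rule; the repository's actual age logic is not shown here, and combine_assessments is a hypothetical name.

```python
from typing import List, Tuple

# Youngest to oldest, as listed in this README.
AGE_CATEGORIES = ["3yr", "3-7yr", "7-10yr", "10-13yr", "13-18yr", "18+"]


def combine_assessments(assessments: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Merge (age_category, confidence) pairs from several analyzers.

    Keeps the most restrictive category; among equally strict results,
    keeps the one with the higher confidence.
    """
    if not assessments:
        raise ValueError("need at least one assessment")
    for category, _ in assessments:
        if category not in AGE_CATEGORIES:
            raise ValueError(f"unknown age category: {category}")
    return max(assessments, key=lambda item: (AGE_CATEGORIES.index(item[0]), item[1]))


if __name__ == "__main__":
    visual = ("3-7yr", 0.9)
    audio = ("10-13yr", 0.7)
    print(combine_assessments([visual, audio]))
```

Taking the strictest result errs toward caution: a cartoon with mature dialogue lands in the older category even if the visuals alone look suitable for young children.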

Data Storage:

Analysis results are stored in the DynamoDB table ContentAnalysisResults with the following structure:

Partition Key: object_key (S3 object key)
Sort Key: analysis_timestamp (ISO 8601 timestamp)
Attributes:
- bucket_name: Source S3 bucket
- file_type: Detected file type (audio/image/video)
- analysis_type: Type of analysis performed
- processing_status: completed/failed
- age_category: Recommended age category (3yr, 3-7yr, etc.)
- confidence: Analysis confidence score
- analysis_results: Detailed analysis data
- age_assessment: Complete age assessment details
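Writing one item in this shape with the boto3 DynamoDB resource could look like the sketch below. The resource API rejects Python float values, so numbers are converted to Decimal. The analysis_type value and the helper names are hypothetical, and table is assumed to be boto3.resource("dynamodb").Table("ContentAnalysisResults").

```python
import json
from datetime import datetime, timezone
from decimal import Decimal


def to_dynamo(value):
    """Round-trip through JSON so every nested float becomes a Decimal."""
    return json.loads(json.dumps(value), parse_float=Decimal)


def build_item(bucket: str, key: str, file_type: str, age_category: str,
               confidence: float, analysis_results: dict, age_assessment: dict,
               status: str = "completed") -> dict:
    """Assemble an item with the attributes listed above."""
    return {
        "object_key": key,                                             # partition key
        "analysis_timestamp": datetime.now(timezone.utc).isoformat(),  # sort key
        "bucket_name": bucket,
        "file_type": file_type,
        "analysis_type": f"{file_type}_analysis",  # hypothetical naming scheme
        "processing_status": status,
        "age_category": age_category,
        "confidence": Decimal(str(confidence)),
        "analysis_results": to_dynamo(analysis_results),
        "age_assessment": to_dynamo(age_assessment),
    }


def save_result(table, item: dict) -> None:
    """Write the item; `table` is a boto3 DynamoDB Table resource."""
    table.put_item(Item=item)
```

Because the sort key is a timestamp, re-analyzing the same object adds a new item instead of overwriting the old one, which is what gives the table its audit-trail character.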

Deployment

Build the project:
```
  sam build
```
Deploy the stack:
```
  sam deploy --guided
```

Requirements

Python 3.11+
AWS SAM CLI
AWS Account with appropriate permissions
AWS CLI configured with valid credentials

Troubleshooting

If you encounter dependency resolution errors:

The Lambda functions use the boto3 version bundled with the AWS Lambda Python runtime
No additional Python packages are required for basic functionality
For custom dependencies, add them to the relevant requirements.txt file
