AWS Content Categorization System

The AWS Content Categorization System is an intelligent, serverless application that automatically analyzes and categorizes multimedia content uploaded to Amazon S3 based on age-appropriateness. The system leverages AWS AI/ML services to provide comprehensive content assessment and automated file organization.
Automated Content Analysis: When users upload files (audio, image, or video) to an S3 bucket, the system automatically:

- Detects the file type and routes the file to the appropriate analyzer
- Performs AI-powered content analysis using Amazon Rekognition, Amazon Transcribe, and Amazon Comprehend
- Assesses age-appropriateness across six categories (3yr, 3-7yr, 7-10yr, 10-13yr, 13-18yr, 18+)
- Organizes files into categorized folders based on content suitability
- Tags files with metadata and stores detailed analysis results

Event-Driven Serverless Design: Built entirely on AWS serverless technologies including:

- EventBridge for event detection and workflow triggering
- Step Functions for orchestrating complex multi-step analysis workflows
- Lambda Functions for scalable, cost-effective content processing
- AI/ML Services (Rekognition, Transcribe, Comprehend) for intelligent content analysis
- DynamoDB for storing analysis results and audit trails

Benefits:

- Automated Content Moderation: Eliminates manual content review processes
- Age-Appropriate Content Filtering: Ensures content is suitable for target audiences
- Scalable Architecture: Handles varying workloads with automatic scaling
- Cost-Effective: Pay-per-use serverless model with no infrastructure management
- Comprehensive Audit Trail: Complete processing history and analysis metadata
- Real-Time Processing: Content analysis starts as soon as a file is uploaded

Use cases:

- Educational Platforms: Categorizing learning materials by age groups
- Content Management Systems: Organizing media libraries by audience suitability
- Parental Control Systems: Filtering content for child-safe environments
- Media Companies: Automated content rating and classification
- Digital Libraries: Age-based content curation and recommendation systems

High-Level Architecture Flow
Detailed Level Process Flow

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────────┐
│   User Upload   │     │    Amazon S3     │     │ Amazon EventBridge  │
│                 │ --> │  Source Bucket   │ --> │    S3 Event Rule    │
│    (Manual)     │     │     uploads/     │     │                     │
└─────────────────┘     └──────────────────┘     └─────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                          AWS Step Functions                          │
│                   Content Categorization Workflow                    │
│                                                                      │
│ ┌─────────────────┐    ┌──────────────────────────────────────────┐  │
│ │ Start Execution │──> │           File Type Detection            │  │
│ └─────────────────┘    │                                          │  │
│                        │ ┌──────────────────────────────────────┐ │  │
│                        │ │      File Type Detector Lambda       │ │  │
│                        │ │                                      │ │  │
│                        │ │ • Analyze file extension             │ │  │
│                        │ │ • Determine: audio/image/video       │ │  │
│                        │ │ • Route to appropriate analyzer      │ │  │
│                        │ └──────────────────────────────────────┘ │  │
│                        └──────────────────────────────────────────┘  │
│                                  │                                   │
│           ┌──────────────────────┼──────────────────────┐            │
│           ▼                      ▼                      ▼            │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐   │
│  │ Audio Analysis  │    │ Image Analysis  │    │ Video Analysis  │   │
│  │                 │    │                 │    │                 │   │
│  │ ┌─────────────┐ │    │ ┌─────────────┐ │    │ ┌─────────────┐ │   │
│  │ │Audio Lambda │ │    │ │Image Lambda │ │    │ │Video Lambda │ │   │
│  │ │             │ │    │ │             │ │    │ │             │ │   │
│  │ │• Duration   │ │    │ │• Rekognition│ │    │ │• Rekognition│ │   │
│  │ │• Complexity │ │    │ │• Safety     │ │    │ │• Scenes     │ │   │
│  │ │• Content    │ │    │ │• Labels     │ │    │ │• Action     │ │   │
│  │ │• Age Rating │ │    │ │• Age Rating │ │    │ │• Age Rating │ │   │
│  │ └─────────────┘ │    │ └─────────────┘ │    │ └─────────────┘ │   │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘   │
│           │                      │                      │            │
│           └──────────────────────┼──────────────────────┘            │
│                                  ▼                                   │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │                        Results Processing                        │ │
│ │                                                                  │ │
│ │ • Age Assessment (3yr, 3-7yr, 7-10yr, 10-13yr, 13-18yr, 18+)     │ │
│ │ • Confidence Scoring                                             │ │
│ │ • S3 Object Tagging                                              │ │
│ │ • File Organization & Movement                                   │ │
│ │ • DynamoDB Storage                                               │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                    Output Storage & Organization                     │
│                                                                      │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐   │
│  │    Amazon S3    │    │    Amazon S3    │    │ Amazon DynamoDB │   │
│  │ Tagged Objects  │    │ Organized Files │    │ Analysis Results│   │
│  │                 │    │                 │    │                 │   │
│  │ • ContentType   │    │ categorized/    │    │ • object_key    │   │
│  │ • AgeCategory   │    │ ├── 3yr/        │    │ • analysis_time │   │
│  │ • SafetyScore   │    │ ├── 3-7yr/      │    │ • file_type     │   │
│  │ • Confidence    │    │ ├── 7-10yr/     │    │ • age_category  │   │
│  │ • Status        │    │ ├── 10-13yr/    │    │ • confidence    │   │
│  │                 │    │ ├── 13-18yr/    │    │ • analysis_data │   │
│  │                 │    │ └── 18+/        │    │                 │   │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘   │
└──────────────────────────────────────────────────────────────────────┘
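The File Type Detector step in the flow above (analyze the extension, decide audio/image/video, route) could look roughly like the sketch below. The extension sets and the function name `detect_file_type` are assumptions for illustration; the project's actual lists are not shown in this document.

```python
from pathlib import PurePosixPath

# Hypothetical extension sets; the real detector's lists are not given here.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}


def detect_file_type(object_key: str) -> str:
    """Return 'audio', 'image', 'video', or 'other' for an S3 object key."""
    # S3 keys use forward slashes, so PurePosixPath parses them on any OS.
    suffix = PurePosixPath(object_key).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    # Anything unrecognized is reported as 'other' so the workflow can skip it.
    return "other"
```

Lower-casing the suffix means `uploads/song.MP3` and `uploads/song.mp3` are routed the same way.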
Detailed Component Architecture

┌─────────────────┐    ┌─────────────────┐    ┌───────────────────┐
│      User       │    │    Amazon S3    │    │Amazon EventBridge │
│                 │    │                 │    │                   │
│ • Manual Upload │───>│ • Object Store  │───>│ • Event Detection │
│ • File Drop     │    │ • Versioning    │    │ • Rule Matching   │
│ • Batch Upload  │    │ • Event Config  │    │ • Workflow Trigger│
└─────────────────┘    └─────────────────┘    └───────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                       AWS Step Functions                        │
│                                                                 │
│ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────┐  │
│ │    Start    │─>│   Choice    │─>│  Parallel   │─>│   End   │  │
│ │  Execution  │  │    State    │  │  Execution  │  │  State  │  │
│ └─────────────┘  └─────────────┘  └─────────────┘  └─────────┘  │
│                         │                                       │
│                         ▼                                       │
│                  ┌─────────────┐                                │
│                  │  File Type  │                                │
│                  │  Detection  │                                │
│                  │    Logic    │                                │
│                  └─────────────┘                                │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                        Lambda Functions                         │
│                                                                 │
│ ┌─────────────────┐  ┌──────────────────┐  ┌─────────────────┐  │
│ │  File Detector  │  │ Content Analyzers│  │  Common Layer   │  │
│ │                 │  │                  │  │                 │  │
│ │ • MIME Type     │  │ ┌─────────────┐  │  │ • Shared Utils  │  │
│ │ • Extension     │  │ │Audio Lambda │  │  │ • Constants     │  │
│ │ • Validation    │  │ │Image Lambda │  │  │ • DynamoDB Ops  │  │
│ │ • Routing       │  │ │Video Lambda │  │  │ • S3 Operations │  │
│ │                 │  │ └─────────────┘  │  │ • Age Logic     │  │
│ └─────────────────┘  └──────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                         AI/ML Services                          │
│                                                                 │
│ ┌─────────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Amazon Rekognition  │ │Amazon Transcribe│ │Amazon Comprehend│ │
│ │                     │ │                 │ │                 │ │
│ │ • Label Detection   │ │ • Speech-to-Text│ │ • Sentiment     │ │
│ │ • Content Moderation│ │ • Language      │ │ • Entity Extract│ │
│ │ • Text Detection    │ │   Detection     │ │ • Language      │ │
│ │ • Face Detection    │ │ • Audio Analysis│ │   Detection     │ │
│ └─────────────────────┘ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                          Data Storage                           │
│                                                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │    Amazon S3    │  │ Amazon DynamoDB │  │ CloudWatch Logs │  │
│  │                 │  │                 │  │                 │  │
│  │ • Source Files  │  │ • Analysis      │  │ • Function Logs │  │
│  │ • Categorized   │  │   Results       │  │ • Error Logs    │  │
│  │   Folders       │  │ • Metadata      │  │ • Metrics       │  │
│  │ • Object Tags   │  │ • Audit Trail   │  │ • Monitoring    │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
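To make the Choice State in the Step Functions box more concrete, here is a minimal sketch of how such a state might be expressed in Amazon States Language, built as a Python dict. The input field `$.file_type` and the state names (`AudioAnalysis`, `ImageAnalysis`, `VideoAnalysis`, `UnsupportedFileType`) are assumptions, not names taken from the project. The small `next_state` helper only imitates the routing locally and is not part of any AWS SDK.

```python
import json

FILE_TYPES = ("audio", "image", "video")


def build_file_type_choice_state() -> dict:
    """Build a Choice state that routes on the detected file type."""
    return {
        "Type": "Choice",
        "Choices": [
            {
                "Variable": "$.file_type",  # assumed output field of the detector
                "StringEquals": file_type,
                "Next": f"{file_type.capitalize()}Analysis",  # hypothetical state names
            }
            for file_type in FILE_TYPES
        ],
        # Files that are neither audio, image, nor video fall through here.
        "Default": "UnsupportedFileType",
    }


def next_state(choice_state: dict, state_input: dict) -> str:
    """Local imitation of Choice routing; handles only top-level '$.name' paths."""
    for rule in choice_state["Choices"]:
        field = rule["Variable"][2:]  # strip the leading '$.'
        if state_input.get(field) == rule["StringEquals"]:
            return rule["Next"]
    return choice_state["Default"]


if __name__ == "__main__":
    print(json.dumps(build_file_type_choice_state(), indent=2))
```

Keeping a `Default` branch means an unexpected file type ends the workflow in a known state instead of failing the execution.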
1. User Upload      │ 2. S3 Event        │ 3. EventBridge     │ 4. Step Function
   ┌─────────┐      │    ┌─────────┐     │    ┌─────────┐     │    ┌─────────┐
   │  File   │─────>│    │ Object  │────>│    │  Rule   │────>│    │Workflow │
   │ Upload  │      │    │ Created │     │    │ Trigger │     │    │  Start  │
   └─────────┘      │    └─────────┘     │    └─────────┘     │    └─────────┘
                    │                    │                    │
5. File Type Check  │ 6. Content Analysis│ 7. Age Assessment  │ 8. File Organization
   ┌─────────┐      │    ┌─────────┐     │    ┌─────────┐     │    ┌─────────┐
   │ Detect  │─────>│    │ Analyze │────>│    │Category │────>│    │ Move &  │
   │ Format  │      │    │ Content │     │    │ Assign  │     │    │   Tag   │
   └─────────┘      │    └─────────┘     │    └─────────┘     │    └─────────┘
                    │                    │                    │
9. Storage          │ 10. Monitoring     │ 11. Completion     │
   ┌─────────┐      │    ┌──────────┐    │    ┌─────────┐     │
   │DynamoDB │      │    │CloudWatch│    │    │Workflow │     │
   │  Store  │      │    │   Logs   │    │    │   End   │     │
   └─────────┘      │    └──────────┘    │    └─────────┘     │
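Steps 2 to 4 of the flow above hinge on an EventBridge rule matching the S3 "Object Created" event and handing the bucket and key to the workflow. The sketch below shows one plausible event pattern and a helper that pulls the object location out of the event. The `uploads/` prefix comes from the process flow diagram; the function names and the example bucket name are assumptions.

```python
def build_event_pattern(bucket_name: str, prefix: str = "uploads/") -> dict:
    """Event pattern matching new objects under a key prefix in one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket_name]},
            # Prefix matching limits the rule to the upload folder.
            "object": {"key": [{"prefix": prefix}]},
        },
    }


def extract_s3_location(event: dict) -> tuple[str, str]:
    """Return (bucket, key) from an S3 'Object Created' EventBridge event."""
    detail = event["detail"]
    return detail["bucket"]["name"], detail["object"]["key"]


if __name__ == "__main__":
    sample_event = {
        "source": "aws.s3",
        "detail-type": "Object Created",
        "detail": {
            "bucket": {"name": "example-content-bucket"},  # hypothetical bucket
            "object": {"key": "uploads/cartoon.mp4"},
        },
    }
    print(extract_s3_location(sample_event))
```

Restricting the pattern to the upload prefix matters in this design: files written into `categorized/` in the same bucket would otherwise trigger the workflow again. S3 only emits these events when EventBridge notifications are enabled on the bucket.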
Security & Permissions Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      IAM Roles & Policies                       │
│                                                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │EventBridge Role │  │  Step Function  │  │ Lambda Execution│  │
│  │                 │  │ Execution Role  │  │      Roles      │  │
│  │ • Start         │  │                 │  │                 │  │
│  │   Execution     │  │ • Invoke Lambda │  │ • S3 Read/Write │  │
│  │ • Step Function │  │ • CloudWatch    │  │ • DynamoDB R/W  │  │
│  │   Access        │  │   Logging       │  │ • Rekognition   │  │
│  │                 │  │ • Error Handling│  │ • Transcribe    │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
This architecture provides:

- Scalability: Serverless components scale automatically
- Reliability: Built-in error handling and retry mechanisms
- Security: Fine-grained IAM permissions
- Monitoring: Comprehensive logging and metrics
- Cost-Efficiency: Pay-per-use pricing model
- Maintainability: Modular design with shared components

lambdas/
├── common/
│   ├── __init__.py
│   ├── utils.py              # Core utilities
│   ├── image_constants.py    # Image criteria
│   ├── audio_constants.py    # Audio criteria
│   ├── video_constants.py    # Video criteria
│   └── requirements.txt
├── audio-analyzer/
│   └── lambda_function.py
├── image-analyzer/
│   └── lambda_function.py
└── video-analyzer/
    └── lambda_function.py
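The contents of `image_constants.py` and the shared age logic are not shown in this document, so the following is only a guess at their general shape: a criteria table plus a function that turns a Rekognition `DetectModerationLabels` response into one of the six age categories. The label-to-category mapping, the threshold, and the name `assess_image` are all hypothetical. Only the response fields (`ModerationLabels`, `Name`, `ParentName`, `Confidence`) follow the Rekognition API.

```python
# Ordered from youngest audience to adult-only.
AGE_CATEGORIES = ["3yr", "3-7yr", "7-10yr", "10-13yr", "13-18yr", "18+"]

# Hypothetical criteria: minimum category implied by a moderation label.
MIN_CATEGORY_BY_LABEL = {
    "Violence": "13-18yr",
    "Explicit Nudity": "18+",
}
CONFIDENCE_THRESHOLD = 60.0  # assumed cut-off, in percent


def assess_image(moderation_response: dict) -> str:
    """Pick the most restrictive category implied by confident labels."""
    result_index = 0  # start at the youngest category when nothing is flagged
    for label in moderation_response.get("ModerationLabels", []):
        if label.get("Confidence", 0.0) < CONFIDENCE_THRESHOLD:
            continue  # ignore low-confidence detections
        # A label can match directly or through its parent category.
        for name in (label.get("Name"), label.get("ParentName")):
            category = MIN_CATEGORY_BY_LABEL.get(name)
            if category is not None:
                result_index = max(result_index, AGE_CATEGORIES.index(category))
    return AGE_CATEGORIES[result_index]
```

Taking the maximum over all matching labels keeps the rule conservative: a single confident adult-only label outweighs any number of milder ones.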
After analysis, files are automatically organized into folders by age category:

- categorized/3yr/ - Safe content for toddlers
- categorized/3-7yr/ - Educational content for young children
- categorized/7-10yr/ - Adventure and learning content
- categorized/10-13yr/ - Complex themes for pre-teens
- categorized/13-18yr/ - Teen-appropriate content
- categorized/18+/ - Adult content

Original uploaded files are deleted after successful categorization.
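The folder layout above implies a mapping from an uploaded key and its age category to a destination key. A minimal sketch follows, assuming the original filename is kept and any subfolders under the upload prefix are dropped (the document does not say). The tag keys come from the process flow diagram; the helper names are hypothetical.

```python
from pathlib import PurePosixPath

AGE_CATEGORIES = ("3yr", "3-7yr", "7-10yr", "10-13yr", "13-18yr", "18+")


def destination_key(source_key: str, age_category: str) -> str:
    """Map e.g. 'uploads/a.png' + '3-7yr' to 'categorized/3-7yr/a.png'."""
    if age_category not in AGE_CATEGORIES:
        raise ValueError(f"unknown age category: {age_category!r}")
    filename = PurePosixPath(source_key).name  # assumption: keep only the filename
    return f"categorized/{age_category}/{filename}"


def build_tagging(content_type: str, age_category: str, confidence: float) -> dict:
    """Build a tag set in the shape S3 object tagging expects."""
    return {
        "TagSet": [
            {"Key": "ContentType", "Value": content_type},
            {"Key": "AgeCategory", "Value": age_category},
            {"Key": "Confidence", "Value": f"{confidence:.2f}"},
        ]
    }
```

S3 has no rename operation, so a move is a copy to the destination key followed by a delete of the source. Deleting only after the copy succeeds matches the statement that originals are removed after successful categorization.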
Lambda Functions:

- file-type-detector: Identifies whether the uploaded file is audio, video, image, or other
- audio-analyzer: Analyzes audio content for age appropriateness
- image-analyzer: Uses Amazon Rekognition for visual content analysis
- video-analyzer: Combines audio/visual analysis for comprehensive assessment

Step Function: Orchestrates the analysis workflow with error handling

DynamoDB Table: Stores analysis results and metadata for each processed file

EventBridge Rule: Triggers the workflow on S3 file uploads

S3 Bucket: Holds source file uploads and the organized, categorized content

Age Categories Supported:

- 3yr: Simple, safe content for toddlers
- 3-7yr: Educational content for young children
- 7-10yr: Adventure and learning content
- 10-13yr: Complex themes for pre-teens
- 13-18yr: Teen-appropriate content
- 18+: Adult content

Analysis results are stored in the DynamoDB table ContentAnalysisResults with the following structure:

- Partition Key: object_key (S3 file path)
- Sort Key: analysis_timestamp (ISO format timestamp)
- Attributes:
  - bucket_name: Source S3 bucket
  - file_type: Detected file type (audio/image/video)
  - analysis_type: Type of analysis performed
  - processing_status: completed/failed
  - age_category: Recommended age category (3yr, 3-7yr, etc.)
  - confidence: Analysis confidence score
  - analysis_results: Detailed analysis data
  - age_assessment: Complete age assessment details

Build the project:
Deploy the stack:
Prerequisites:

- Python 3.11+
- AWS SAM CLI
- AWS Account with appropriate permissions
- AWS CLI configured with valid credentials

If you encounter dependency resolution errors:

- The Lambda functions use the built-in boto3 from the AWS Lambda runtime
- No additional Python packages are required for basic functionality
- For custom dependencies, update the respective requirements.txt files