Multimodal Format
A multimodal content format that describes images, videos, audio, and interactive media so that AI systems can interpret them. It carries metadata, AI analysis results, cross-modal relationships, and accessibility information for each media asset.
Images
Comprehensive image metadata with AI analysis, variants, attribution, and accessibility support
Videos
Rich video content with quality variants, subtitles, chapters, and semantic analysis
Audio
Audio content with transcription, speaker identification, and sentiment analysis
Interactive Visuals
3D models, VR/AR content, panoramas, and interactive media experiences
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Standardized Multimodal Content Format",
  "description": "Reusable schema component for multimodal content across all entity types",
  "type": "object",
  "aimlVersion": "2.0.1",
  "schemaVersion": "2.0.1",
  "properties": {
    "visualContent": {
      "type": "object",
      "description": "Visual content and metadata",
      "properties": {
        "images": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "imageId": { "type": "string" },
              "url": { "type": "string", "format": "uri" },
              "alt": { "type": "string" },
              "width": { "type": "integer" },
              "height": { "type": "integer" },
              "assetRole": {
                "type": "string",
                "enum": ["primary", "secondary", "detail", "gallery", "background", "logo", "icon", "banner", "thumbnail"]
              }
            }
          }
        },
        "videos": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "videoId": { "type": "string" },
              "url": { "type": "string", "format": "uri" },
              "title": { "type": "string" },
              "duration": { "type": "number" },
              "thumbnail": { "type": "string", "format": "uri" }
            }
          }
        }
      }
    },
    "audioContent": {
      "type": "object",
      "description": "Audio content and metadata",
      "properties": {
        "audioClips": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "audioId": { "type": "string" },
              "url": { "type": "string", "format": "uri" },
              "title": { "type": "string" },
              "duration": { "type": "number" },
              "transcript": { "type": "string" }
            }
          }
        }
      }
    }
  }
}
Note: The multimodal format defines the standard structure, but entity schemas embed these media properties directly rather than pulling them in through $ref references.
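As a rough illustration of how a consumer might check an image entry against the structure above, the sketch below hand-rolls a few checks with the Python standard library. It is not a full JSON Schema validator, and the helper name `check_image` and its error messages are assumptions made for this example. Only the field names and the `assetRole` values are taken from the schema.

```python
# Allowed assetRole values, copied from the schema's enum.
ASSET_ROLES = {
    "primary", "secondary", "detail", "gallery", "background",
    "logo", "icon", "banner", "thumbnail",
}

# Expected Python types for the image properties declared in the schema.
IMAGE_FIELD_TYPES = {
    "imageId": str,
    "url": str,
    "alt": str,
    "width": int,
    "height": int,
    "assetRole": str,
}


def check_image(image: dict) -> list[str]:
    """Return a list of problems found in one image entry (hypothetical helper)."""
    problems = []
    for field, expected in IMAGE_FIELD_TYPES.items():
        if field not in image:
            continue  # the schema shown marks no property as required
        value = image[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{field}: expected {expected.__name__}")
    role = image.get("assetRole")
    if isinstance(role, str) and role not in ASSET_ROLES:
        problems.append(f"assetRole: unknown value {role!r}")
    return problems


# A width given as a string and a role outside the enum are both reported.
print(check_image({"imageId": "product-hero", "width": "1200", "assetRole": "hero"}))
```

For production use, a complete JSON Schema validator would be the safer choice, since it also covers `format`, nested arrays, and the rest of the draft 2020-12 vocabulary.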
{
  "visualContent": {
    "images": [
      {
        "imageId": "product-hero",
        "url": "https://techbazaar.com/assets/product-hero.jpg",
        "alt": "TechBazaar marketplace main interface showing product categories",
        "width": 1200,
        "height": 800,
        "assetRole": "primary",
        "semanticDescription": "Modern e-commerce interface with clean navigation and featured products"
      }
    ],
    "videos": [
      {
        "videoId": "platform-demo",
        "url": "https://techbazaar.com/assets/demo.mp4",
        "title": "TechBazaar Platform Overview",
        "duration": 120,
        "thumbnail": "https://techbazaar.com/assets/demo-thumb.jpg"
      }
    ]
  }
}
AI-Powered Analysis
- Object detection with confidence scores
- Dominant color extraction
- Image type classification
- Style property analysis
- Sentiment scoring
- Text content extraction
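The schema for analysis results is not shown on this page, so the following is only a sketch of how object detections with confidence scores might be consumed. The field names `detectedObjects`, `label`, and `confidence`, the default threshold, and the sample values are all hypothetical and are not taken from the format.

```python
def confident_labels(analysis: dict, threshold: float = 0.8) -> list[str]:
    """Return labels of detected objects at or above the threshold, highest first."""
    # "detectedObjects", "label" and "confidence" are assumed field names.
    detections = analysis.get("detectedObjects", [])
    kept = [d for d in detections if d.get("confidence", 0.0) >= threshold]
    kept.sort(key=lambda d: d["confidence"], reverse=True)
    return [d["label"] for d in kept]


# Illustrative data only; not output from any real analysis service.
sample_analysis = {
    "detectedObjects": [
        {"label": "laptop", "confidence": 0.97},
        {"label": "plant", "confidence": 0.41},
        {"label": "phone", "confidence": 0.88},
    ]
}
print(confident_labels(sample_analysis))
```

Filtering on a threshold keeps low-confidence detections out of downstream descriptions. A suitable cutoff depends on the detector and is a tuning decision rather than part of the format.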
Cross-Modal Relationships
- Media relationship mapping
- Time synchronization
- Spatial coordinate mapping
- Semantic element connections
- Translation equivalents
- Alternative representations
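Time synchronization between media could, for instance, mean looking up which transcript segments are active at a given playback position of a video. The sketch below assumes a hypothetical list of cues with `start`, `end`, and `text` keys measured in seconds; the format's actual relationship fields are not shown on this page.

```python
def cues_at(cues: list[dict], t: float) -> list[str]:
    """Return the text of every cue whose [start, end) interval contains t seconds."""
    return [c["text"] for c in cues if c["start"] <= t < c["end"]]


# Hypothetical transcript segments linked to a video timeline.
transcript_cues = [
    {"start": 0.0, "end": 4.5, "text": "Welcome to the platform overview."},
    {"start": 4.5, "end": 9.0, "text": "Here is the product catalog."},
]
print(cues_at(transcript_cues, 5.0))
```

Using half-open intervals means a timestamp that falls exactly on a boundary belongs to the later cue only, which avoids showing two segments at once.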
Accessibility Features
- Alternative text generation
- Extended descriptions
- Audio descriptions
- Closed captions
- Sign language support
- WCAG compliance tracking
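One simple accessibility check that the published schema already supports is auditing the `alt` field of each image. The sketch below walks `visualContent.images` and reports entries whose alternative text is missing or blank. The function name and the sample document are assumptions for illustration, and a passing result does not by itself establish WCAG conformance.

```python
def images_missing_alt(document: dict) -> list[str]:
    """Return the imageId of every image whose alt text is absent or blank."""
    images = document.get("visualContent", {}).get("images", [])
    missing = []
    for img in images:
        alt = img.get("alt")
        if not isinstance(alt, str) or not alt.strip():
            missing.append(img.get("imageId", "<no imageId>"))
    return missing


# Illustrative document: the second image has whitespace-only alt text.
doc = {
    "visualContent": {
        "images": [
            {"imageId": "product-hero", "alt": "Marketplace main interface"},
            {"imageId": "footer-logo", "alt": "  "},
        ]
    }
}
print(images_missing_alt(doc))
```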
Media Management
- Quality variant handling
- Attribution tracking
- License management
- Version control
- Display prioritization
- Device optimization
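Quality variant handling and device optimization might be combined by choosing, for a given display width, the smallest variant that is still wide enough. The sketch below assumes a hypothetical `variants` list whose entries carry `width` and `url`; neither that structure nor the placeholder URLs come from the format.

```python
def pick_variant(variants: list[dict], target_width: int) -> dict:
    """Pick the smallest variant at least target_width wide, else the widest one."""
    if not variants:
        raise ValueError("no variants available")
    ordered = sorted(variants, key=lambda v: v["width"])
    for variant in ordered:
        if variant["width"] >= target_width:
            return variant
    # Nothing is wide enough, so fall back to the largest available variant.
    return ordered[-1]


# Hypothetical variants of one image; URLs are placeholders.
hero_variants = [
    {"width": 1200, "url": "https://example.com/hero-1200.jpg"},
    {"width": 400, "url": "https://example.com/hero-400.jpg"},
    {"width": 800, "url": "https://example.com/hero-800.jpg"},
]
print(pick_variant(hero_variants, 600)["url"])
```

Choosing the smallest sufficient variant limits transfer size on small screens, while the fallback keeps large displays from receiving nothing at all.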
Primary Assets
Supporting Assets
Functional Assets
Cross-Modal Relationships
Link related content across different media types with time and spatial synchronization.
AI Content Analysis
Automated content understanding with object detection, sentiment analysis, and style classification.
Accessibility Integration
Complete accessibility support with WCAG compliance tracking and alternative content formats.
The multimodal format component gives AI systems a consistent way to interpret visual, audio, and interactive content, with metadata and analysis results attached to each asset.