Call-Emma.ai offers flexible configuration options that support either cost-optimized or speed-optimized processing.
For operations focused on minimizing the deployment costs of generative AI features, we recommend one of two deployment types:
- The first option is more expensive but offers the security of keeping all data processing within a dedicated cloud instance.
- The second option is faster and cheaper but relies on external AI processing via APIs from providers such as Groq.com and AWS Bedrock.
For operations focused on receiving results as quickly as possible, we recommend external processing of both the audio transcription and the AI inference.
- Speed-Optimized Deployments: This is a cost-effective option and the quickest way to return results to the system. However, it relies on external processing of audio via Speech-to-Text (S2T) APIs and of AI via Large Language Model (LLM) APIs.
Summary of Cost Options
| Instance Type | LLM on Instance | Max Number of Agents Supported per Instance | Cost / Hour of Audio | Monthly Costs per Agent |
|---|---|---|---|---|
| g4dn.2xlarge | Yes | 36 agents | $0.063 | $10.00 /month |
| g5.xlarge | Yes | 75 agents | $0.040 | $6.40 /month |
| t4g.2xlarge | External AI Groq.com Llama 3.1 8B | 240 agents | $0.02 - $0.03 | $0.97 /month |
| t4g.2xlarge | External AI Groq.com Llama 3.3 70B | 240 agents | $0.03 - $0.04 | $5.27 /month |
Cost for System That Generates Results in Less Than 10 Minutes
| Instance Type | LLM on Instance | Max Number of Agents Supported per Instance | Cost / Hour of Audio | Monthly Costs per Agent |
|---|---|---|---|---|
| t4g.2xlarge | External AI Groq.com for audio and LLM | >1000 agents | $0.04 to $0.08 | $10 to $16 /month |
Cost Optimized Deployments
LLM Processing on Cloud Instance Deployment
The first table outlines the cost-optimized deployment option, where all data processing is contained within a single cloud instance. This configuration is priced between $0.040 and $0.063 per hour of audio processed, translating to approximately $6.40 to $10.00 per agent per month, assuming a standard workload of 40 hours per week over four weeks.
These estimates are based on on-demand pricing for Nvidia G4 or G5 series instances on AWS. Using reserved instances or spot pricing can further reduce costs. Note that these figures are based on running 8 AI prompts on a Llama 8B model. The more AI analysis prompts executed, the higher the operational costs.
| AWS Instance Options | Option 1 | Option 2 |
|---|---|---|
| Note: Speech Transcription and Gen AI Processing Done on Instance | | |
| Requires External LLM | No | No |
| AWS Instance Type | g4dn.2xlarge | g5.xlarge |
| Specs | 8 vCPU, 32 GiB RAM - NVIDIA T4 | 4 vCPU, 16 GiB RAM - NVIDIA A10G |
| Performance | | |
| Hours of Conversation Processed per hour of Cloud Instance Time | 12 Hours | 25 Hours |
| On Demand AWS Price/hour | $0.75 | $1.00 |
| Cost per Hour of Audio Processed | $0.063 | $0.040 |
| Max Performance Per Server Instance | | |
| Number of Hours of Conversation that can be Processed per 24 hour day | 288 Hours | 600 Hours |
| Supported Number of Call Center Agents for Results Ready Next Day (Assumed 8 Hour shifts) | 36 Agents | 75 Agents |
| Cost Ratios | Assuming 36 Agents | Assuming 75 Agents |
| Cost per Agent per Day (one 8 hour shift 100% on phone) | $0.50 /day | $0.32 /day |
| Cost per Agent per Month (5 day work weeks) | $10.00 /month | $6.40 /month |
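The per-agent figures above follow from simple arithmetic on the instance price and throughput. The sketch below is illustrative only (the function name and defaults are ours, not part of the product); the inputs are the on-demand prices and throughput figures quoted in the table.

```python
# Illustrative sketch of the on-instance cost math in the table above.
# Instance prices and throughput figures are taken from the document;
# the 8-hour shift and 20 workdays/month are the document's assumptions.

def on_instance_costs(price_per_hour, audio_hours_per_instance_hour,
                      shift_hours=8, workdays_per_month=20):
    """Derive per-audio-hour and per-agent costs for one instance."""
    cost_per_audio_hour = price_per_hour / audio_hours_per_instance_hour
    daily_capacity = audio_hours_per_instance_hour * 24   # audio hours/day
    agents_supported = int(daily_capacity // shift_hours) # next-day results
    cost_per_agent_day = price_per_hour * 24 / agents_supported
    cost_per_agent_month = cost_per_agent_day * workdays_per_month
    return (cost_per_audio_hour, agents_supported,
            cost_per_agent_day, cost_per_agent_month)

# Option 1: g4dn.2xlarge at $0.75/hr, 12 audio hours per instance hour
print(on_instance_costs(0.75, 12))   # ≈ (0.0625, 36, 0.50, 10.00)
# Option 2: g5.xlarge at $1.00/hr, 25 audio hours per instance hour
print(on_instance_costs(1.00, 25))   # ≈ (0.04, 75, 0.32, 6.40)
```

Note that the per-agent cost falls as throughput rises, since the fixed instance price is spread across more supported agents.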
LLM Processing Externally via an API with S2T on the Instance
The second table outlines the cost-optimized deployment option in which speech-to-text is processed on the instance while LLM inference is performed externally via an API. This configuration is priced between $0.006 and $0.033 per hour of audio processed, translating to approximately $1.00 to $5.30 per agent per month, assuming a standard workload of 40 hours per week over four weeks.
These estimates are based on on-demand pricing for external AI inference at Groq.com and on-instance Speech-to-Text processing. Using reserved instances or spot pricing can further reduce the instance portion of the cost. Note that these figures assume 8 AI prompts executed externally; the more AI analysis prompts executed, the higher the operational costs.
| AWS Instance Options | Option 3 | Option 4 |
|---|---|---|
| Note: Speech transcription performed on Instance, AI inference performed via external API | | |
| Requires External LLM | Yes | Yes |
| AWS Instance Type | t4g.2xlarge | t4g.2xlarge |
| Specs | 8 vCPUs, 32 GiB RAM - NO GPU | 8 vCPUs, 32 GiB RAM - NO GPU |
| Performance | | |
| Hours of Conversation Processed per hour of Cloud Instance Time | 80 Hours | 80 Hours |
| On Demand AWS Price/hour | $0.27 | $0.27 |
| LLM Performance | | |
| Assumed token usage per hour of conversation: 30K tokens in, 15K tokens out, used in 8 LLM prompts. | Groq.com Llama 3.1 8B | Groq.com Llama 3.3 70B |
| Cost In ($/M tokens) | $0.05 | $0.59 |
| Cost Out ($/M tokens) | $0.08 | $0.79 |
| External API Cost / Hour of Call | $0.0027 | $0.0296 |
| Cost per Hour of Conversation Processed | $0.0061 | $0.0329 |
| Number of Hours of Conversation that can be Processed per 24 hour day | 1920 Hours | 1920 Hours |
| Supported Number of Call Center Agents for Results Ready Next Day (Assumed 8 Hour shifts) | 240 Agents | 240 Agents |
| Cost Ratios | Assuming 240 Agents | Assuming 240 Agents |
| Cost per Agent per Day (one 8 hour shift 100% on phone) | $0.049 /day | $0.263 /day |
| Cost per Agent per Month (5 day work weeks) | $0.97 /month | $5.27 /month |
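The external-API costs in the table above are driven by token volume and per-million-token prices. The sketch below shows that arithmetic (function and constant names are ours; token volumes and Groq.com prices are the document's assumptions):

```python
# Illustrative sketch of the external-LLM cost math in the table above.
# Assumes 30K input / 15K output tokens per hour of conversation,
# spread across 8 LLM prompts, as stated in the document.

def api_cost_per_audio_hour(price_in, price_out,
                            tokens_in=30_000, tokens_out=15_000):
    """LLM API cost for one hour of conversation (prices in $/M tokens)."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

# t4g.2xlarge at $0.27/hr processing 80 audio hours per instance hour
INSTANCE_COST_PER_AUDIO_HOUR = 0.27 / 80

llama_8b = api_cost_per_audio_hour(0.05, 0.08)    # ≈ $0.0027
llama_70b = api_cost_per_audio_hour(0.59, 0.79)   # ≈ $0.0296
print(round(llama_8b + INSTANCE_COST_PER_AUDIO_HOUR, 4))    # ≈ 0.0061
print(round(llama_70b + INSTANCE_COST_PER_AUDIO_HOUR, 4))   # ≈ 0.0329
```

Because API cost scales linearly with token volume, adding more analysis prompts raises the per-hour cost proportionally.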
Speed-Optimized Deployments
For speed-optimized deployments, where receiving results quickly is paramount and cost is a secondary consideration, the most effective approach is to parallelize the processing of heavy workloads through external APIs. This allows the Call-Emma.ai system to efficiently manage its job workflow.
In this configuration, speech-to-text (S2T) processing is handled via API by a service such as Groq.com, utilizing the Whisper Large v3 Turbo model. With a speed factor of 216x, it can transcribe one hour of audio in approximately 17 seconds at a cost of $0.04 per hour, based on pricing as of May 22, 2025. Following transcription, AI inference is also performed through Groq.com, using models such as LLaMA 3.1 8B or LLaMA 3.3 70B. The expected processing time after a call is completed is typically a few minutes. For example, results for a 20-minute call may be available in less than five minutes.
This configuration is estimated to cost between $0.04 and $0.08 per hour of audio processed with the Llama 3.1 8B model, and between $0.05 and $0.10 per hour with the Llama 3.3 70B model, translating to approximately $6 to $16 per agent per month, assuming a standard workload of 40 hours per week over four weeks.
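The latency and monthly-cost figures above can be reproduced from the quoted 216x speed factor and per-hour pricing. A minimal sketch, assuming the document's 40-hours-per-week workload (the function name is ours):

```python
# Illustrative back-of-envelope for the speed-optimized path, using the
# figures quoted above (216x Whisper Large v3 Turbo speed factor).

SPEED_FACTOR = 216  # real-time multiple for transcription via Groq.com
transcribe_secs_per_audio_hour = 3600 / SPEED_FACTOR
print(round(transcribe_secs_per_audio_hour, 1))   # ≈ 16.7 seconds

def monthly_cost_per_agent(cost_per_audio_hour, hours_per_week=40, weeks=4):
    """Monthly cost per agent at the quoted 40 h/week workload."""
    return cost_per_audio_hour * hours_per_week * weeks

print(monthly_cost_per_agent(0.04))   # low end,  ≈ $6.40/month
print(monthly_cost_per_agent(0.10))   # high end, ≈ $16.00/month
```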