Call-Emma.ai: CCS AI Solution
Costs to Operate

Call-Emma.ai is a cloud-hosted or self-hosted, enterprise-grade call center software add-on designed to process call recordings, generate AI-powered insights and analysis, and deliver the results directly into your CCS platform via an API. The solution is scalable in the cloud, highly configurable, multi-tenant ready, and built for usage-based billing.
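
As a rough illustration of the delivery step only, the sketch below posts a finished call analysis into a CCS platform. The endpoint URL, authentication scheme, and payload fields are hypothetical assumptions for illustration, not the actual Call-Emma.ai API.

```python
# Hypothetical sketch of delivering AI analysis results to a CCS platform.
# The webhook URL, bearer-token auth, and payload fields are assumptions for
# illustration only; the real Call-Emma.ai integration may differ.
import json
import urllib.request

def deliver_insights(ccs_webhook_url: str, api_token: str, call_id: str, insights: dict) -> int:
    payload = {
        "call_id": call_id,    # identifier of the recorded call in the CCS
        "insights": insights,  # e.g. summary, sentiment, QA scores
    }
    request = urllib.request.Request(
        ccs_webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status  # a 2xx status indicates the CCS accepted the results
```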

API · AI Insights · CCA · Audio Recording · Containerized · Enterprise Grade · Cloud Agnostic · Scalable · Multi-Tenant · Configurable · High Volume · Easy to Integrate

We are accepting applications for preferred licensing terms for Call-Emma AI Call Center Solution. Please contact Sales.

Call-Emma.ai offers flexible configuration options that support either cost-optimized processing or high-speed processing.

For operations focused on minimizing deployment costs for generative AI features, we recommend two deployment options:

  1. LLM processing on the cloud instance: the more expensive of the two, but it offers the security of keeping all data processing within a dedicated cloud instance.
  2. LLM processing externally via an API: faster and lower cost, but it relies on external AI processing via APIs from providers such as Groq.com and AWS Bedrock.

For operations focused on receiving results as quickly as possible, we recommend external processing of both the audio and the AI inference:

  Speed-Optimized Deployments: a cost-effective option and the quickest way to return results to the system. It relies on external processing of audio via Speech-to-Text (S2T) APIs and of AI analysis via Large Language Model (LLM) APIs.

Summary of Cost Options

| Instance Type | LLM on Instance | Max Agents Supported per Instance | Cost / Hour of Audio | Monthly Cost per Agent |
| --- | --- | --- | --- | --- |
| g4dn.2xlarge | Yes | 36 agents | $0.063 | $10.00 /month |
| g5.xlarge | Yes | 75 agents | $0.040 | $6.40 /month |
| t4g.2xlarge | External AI: Groq.com Llama 3.1 8B | 240 agents | $0.006 | $0.97 /month |
| t4g.2xlarge | External AI: Groq.com Llama 3.3 70B | 240 agents | $0.033 | $5.27 /month |

Cost for a System That Generates Results in Less Than 10 Minutes

| Instance Type | LLM on Instance | Max Agents Supported per Instance | Cost / Hour of Audio | Monthly Cost per Agent |
| --- | --- | --- | --- | --- |
| t4g.2xlarge | External AI: Groq.com for audio and LLM | >1,000 agents | $0.04 to $0.10 | $6 to $16 /month |

Cost-Optimized Deployments

LLM Processing on the Cloud Instance

The first table outlines the cost-optimized deployment option in which all data processing is contained within a single cloud instance. This configuration costs between $0.040 and $0.063 per hour of audio processed, translating to approximately $6.40 to $10.00 per agent per month, assuming a standard workload of 40 hours per week over four weeks.

These estimates are based on on-demand pricing for Nvidia G4 or G5 series instances on AWS. Using reserved instances or spot pricing can further reduce costs. Note that these figures are based on running 8 AI prompts on a Llama 8B model. The more AI analysis prompts executed, the higher the operational costs.
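
The per-agent figures in the table below follow directly from the instance price, throughput, and the stated shift assumptions (8-hour shifts, 5-day weeks, 4 weeks per month). A minimal sketch of that arithmetic, using only values quoted in this document:

```python
# Sketch of the cost arithmetic behind the on-instance options, using the
# figures quoted in this document (8-hour shifts, 20 workdays per month).
def per_agent_costs(instance_price_per_hour: float, audio_hours_per_instance_hour: float,
                    shift_hours: float = 8, workdays_per_month: int = 20) -> dict:
    cost_per_audio_hour = instance_price_per_hour / audio_hours_per_instance_hour
    cost_per_agent_day = cost_per_audio_hour * shift_hours          # agent 100% on phone
    cost_per_agent_month = cost_per_agent_day * workdays_per_month
    agents_per_instance = audio_hours_per_instance_hour * 24 / shift_hours  # results ready next day
    return {
        "cost_per_audio_hour": round(cost_per_audio_hour, 4),
        "cost_per_agent_day": round(cost_per_agent_day, 2),
        "cost_per_agent_month": round(cost_per_agent_month, 2),
        "max_agents_per_instance": int(agents_per_instance),
    }

print(per_agent_costs(0.75, 12))   # Option 1, g4dn.2xlarge: $0.0625/audio hour, $0.50/day, $10.00/month, 36 agents
print(per_agent_costs(1.00, 25))   # Option 2, g5.xlarge:    $0.04/audio hour, $0.32/day, $6.40/month, 75 agents
```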

Note: Speech transcription and generative AI processing are performed on the instance.

| AWS Instance Options | Option 1 | Option 2 |
| --- | --- | --- |
| Requires External LLM | No | No |
| AWS Instance Type | g4dn.2xlarge | g5.xlarge |
| Specs | 8 vCPUs, 32 GiB RAM, NVIDIA T4 | 4 vCPUs, 16 GiB RAM, NVIDIA A10G |
| Performance | | |
| Hours of Conversation Processed per Hour of Instance Time | 12 hours | 25 hours |
| On-Demand AWS Price per Hour | $0.75 | $1.00 |
| Cost per Hour of Audio Processed | $0.063 | $0.040 |
| Max Performance per Server Instance | | |
| Hours of Conversation Processed per 24-Hour Day | 288 hours | 600 hours |
| Supported Call Center Agents for Results Ready Next Day (assumed 8-hour shifts) | 36 agents | 75 agents |
| Cost Ratios | Assuming 36 Agents | Assuming 75 Agents |
| Cost per Agent per Day (one 8-hour shift, 100% on phone) | $0.50 /day | $0.32 /day |
| Cost per Agent per Month (5-day work weeks) | $10.00 /month | $6.40 /month |

LLM Processing Externally via an API with S2T on the Instance

The second table outlines the cost-optimized deployment option in which speech-to-text is processed on the instance and LLM inference is performed externally via an API. This configuration costs between $0.006 and $0.033 per hour of audio processed, translating to approximately $1.00 to $5.30 per agent per month, assuming a standard workload of 40 hours per week over four weeks.

These estimates are based on on-demand pricing for external LLM inference at Groq.com and speech-to-text processing on the cloud instance. Using reserved instances or spot pricing can further reduce costs. Note that these figures are based on running 8 AI prompts externally; the more AI analysis prompts executed, the higher the operational costs.
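
The external API figures in the table below come from straightforward token arithmetic. A minimal sketch using the token volumes and Groq.com prices quoted in this section (these are the document's stated assumptions, not live pricing):

```python
# Sketch of the external-LLM cost arithmetic for Options 3 and 4, using the
# values quoted below: 30K tokens in / 15K tokens out per hour of conversation
# across 8 prompts, and a t4g.2xlarge at $0.27/hr processing ~80 audio hours
# per instance hour for on-instance speech-to-text.
def external_llm_cost_per_audio_hour(tokens_in: int, tokens_out: int,
                                     price_in_per_m: float, price_out_per_m: float) -> float:
    return tokens_in / 1e6 * price_in_per_m + tokens_out / 1e6 * price_out_per_m

INSTANCE_COST_PER_AUDIO_HOUR = 0.27 / 80   # on-instance speech-to-text share: ~$0.0034

llama_8b = external_llm_cost_per_audio_hour(30_000, 15_000, 0.05, 0.08)   # ~$0.0027
llama_70b = external_llm_cost_per_audio_hour(30_000, 15_000, 0.59, 0.79)  # ~$0.0296

for name, api_cost in [("Llama 3.1 8B", llama_8b), ("Llama 3.3 70B", llama_70b)]:
    total = api_cost + INSTANCE_COST_PER_AUDIO_HOUR
    monthly = total * 8 * 20   # 8-hour shifts, 20 workdays per month
    print(f"{name}: ${total:.4f}/audio hour, ${monthly:.2f}/agent/month")
# Prints roughly $0.0061 and $0.97 for the 8B model, $0.0329 and $5.27 for the 70B model.
```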

Note: Speech transcription is performed on the instance; AI inference is performed via an external API.

| AWS Instance Options | Option 3 | Option 4 |
| --- | --- | --- |
| Requires External LLM | Yes | Yes |
| AWS Instance Type | t4g.2xlarge | t4g.2xlarge |
| Specs | 8 vCPUs, 32 GiB RAM, no GPU | 8 vCPUs, 32 GiB RAM, no GPU |
| Performance | | |
| Hours of Conversation Processed per Hour of Instance Time | 80 hours | 80 hours |
| On-Demand AWS Price per Hour | $0.27 | $0.27 |
| LLM Performance (assumed token usage per hour of conversation: 30K tokens in, 15K tokens out, across 8 LLM prompts) | Groq.com Llama 3.1 8B | Groq.com Llama 3.3 70B |
| Cost In ($/M tokens) | $0.05 | $0.59 |
| Cost Out ($/M tokens) | $0.08 | $0.79 |
| External API Cost per Hour of Call | $0.0027 | $0.0296 |
| Cost per Hour of Conversation Processed | $0.0061 | $0.0329 |
| Hours of Conversation Processed per 24-Hour Day | 1,920 hours | 1,920 hours |
| Supported Call Center Agents for Results Ready Next Day (assumed 8-hour shifts) | 240 agents | 240 agents |
| Cost Ratios | Assuming 240 Agents | Assuming 240 Agents |
| Cost per Agent per Day (one 8-hour shift, 100% on phone) | $0.049 /day | $0.263 /day |
| Cost per Agent per Month (5-day work weeks) | $0.97 /month | $5.27 /month |

Speed-Optimized Deployments

For speed-optimized deployments, where receiving results quickly is paramount and cost is optimized within that constraint, the most effective approach is to offload the heavy workloads to external APIs and process them in parallel, allowing the Call-Emma.ai system to manage the job workflow efficiently.

In this configuration, speech-to-text (S2T) processing is handled via API by a service such as Groq.com, using the Whisper Large v3 Turbo model. With a speed factor of 216x, it can transcribe one hour of audio in approximately 17 seconds at a cost of $0.04 per hour of audio, based on pricing as of May 22, 2025. Following transcription, AI inference is also performed through Groq.com, using models such as Llama 3.1 8B or Llama 3.3 70B. The expected processing time after a call is completed is typically a few minutes; for example, results for a 20-minute call may be available in less than five minutes.
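
A rough sketch of that two-step pipeline is shown below. It assumes Groq's Python client and the model identifiers current at the time of writing (`whisper-large-v3-turbo`, `llama-3.3-70b-versatile`); the analysis prompt and single-prompt structure are illustrative assumptions, not the Call-Emma.ai implementation, which runs multiple prompts and posts results back to the CCS.

```python
# Illustrative sketch of the speed-optimized pipeline: transcribe a call
# recording via Groq's Whisper API, then run one analysis prompt against a
# Groq-hosted Llama model. Model names and the prompt are assumptions.
from groq import Groq  # pip install groq; reads GROQ_API_KEY from the environment

client = Groq()

def analyze_call(recording_path: str) -> str:
    # Step 1: speech-to-text (~17 seconds per hour of audio at 216x real time)
    with open(recording_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            file=(recording_path, audio_file.read()),
            model="whisper-large-v3-turbo",
        )

    # Step 2: LLM inference over the transcript (one of several analysis prompts)
    completion = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system", "content": "You analyze call center conversations."},
            {"role": "user", "content": f"Summarize this call and rate agent performance:\n\n{transcript.text}"},
        ],
    )
    return completion.choices[0].message.content
```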

This configuration is estimated to cost between $0.04 and $0.08 per hour of audio processed with the Llama 3.1 8B model, and between $0.05 and $0.10 per hour with the Llama 3.3 70B model, translating to approximately $6 to $16 per agent per month, assuming a standard workload of 40 hours per week over four weeks.
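
For reference, the floor of that range can be reconstructed from the per-component prices already quoted (Whisper at $0.04 per audio hour plus the external LLM token costs used earlier); the upper end presumably covers additional prompts, orchestration, and overhead. A minimal sketch under those assumptions:

```python
# Sketch of the lower bound of the speed-optimized cost estimate, combining the
# quoted Groq Whisper price ($0.04 per audio hour) with the per-hour LLM API
# costs computed earlier. Extra prompts and orchestration overhead push real
# costs toward the upper end of the quoted range.
S2T_COST_PER_AUDIO_HOUR = 0.04   # Whisper Large v3 Turbo via Groq.com

def speed_optimized_floor(llm_cost_per_audio_hour: float) -> tuple[float, float]:
    hourly = S2T_COST_PER_AUDIO_HOUR + llm_cost_per_audio_hour
    monthly = hourly * 8 * 20    # 8-hour shifts, 20 workdays per month
    return hourly, monthly

print(speed_optimized_floor(0.0027))   # Llama 3.1 8B:  ~$0.043/audio hour, ~$6.83/agent/month
print(speed_optimized_floor(0.0296))   # Llama 3.3 70B: ~$0.070/audio hour, ~$11.14/agent/month
```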