LLM Connection Management
Overview
LLM Connections let you configure and manage multiple AI model endpoints across providers. Each connection defines how AI Guard communicates with a specific Large Language Model.
Supported Providers
AI Guard Developer Portal supports connections to:
Major Providers
- OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5
- Azure OpenAI: Azure-hosted OpenAI models
- GitHub Models: GitHub's AI model marketplace
- Anthropic: Claude models
- Google: Gemini models
- Custom Endpoints: Any OpenAI-compatible API
Creating LLM Connections
OpenAI Configuration
- Navigate to Settings > LLM Connections
- Click "Add New Connection"
- Fill in details:
LLM Name: OpenAI GPT-4 Turbo
Provider: OpenAI
Endpoint URL: https://api.openai.com/v1/chat/completions
API Key: sk-proj-xxxxxxxxxxxxx
Model Name: gpt-4-turbo
Set as Default: ☐
Status: Active
Where to Find:
- API Key: https://platform.openai.com/api-keys
- Model Names: https://platform.openai.com/docs/models
Available Models:
- gpt-4-turbo - Latest GPT-4 (128K context)
- gpt-4 - Standard GPT-4 (8K context)
- gpt-3.5-turbo - Fast, cost-effective
- Click "Test Connection"
- Click "Save"
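As a sanity check outside the portal, the same configuration can be exercised from a few lines of Python. This sketch only assembles the request that a connection test would send (the key is the placeholder from the form above, not a real credential); sending it over the network is left to your HTTP client of choice.

```python
import json

def build_chat_request(endpoint: str, api_key: str, model: str):
    """Assemble the pieces of a minimal chat-completion request,
    similar to what the "Test Connection" button sends."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # OpenAI-style bearer auth
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,  # keep the probe cheap
    }).encode()
    return endpoint, headers, body

url, headers, body = build_chat_request(
    "https://api.openai.com/v1/chat/completions",
    "sk-proj-xxxxxxxxxxxxx",  # placeholder key from the form above
    "gpt-4-turbo",
)
```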
Azure OpenAI Configuration
LLM Name: Azure GPT-4
Provider: Azure OpenAI
Endpoint URL: https://YOUR-RESOURCE.openai.azure.com/
API Key: your-azure-api-key
Model Name: YOUR-DEPLOYMENT-NAME
Set as Default: ☐
Status: Active
Important Azure Notes:
- Use your deployment name, not model name
- Endpoint must include your resource name
- Get API key from Azure Portal > Your Resource > Keys
Example:
Resource: mycompany-openai
Deployment: gpt-4-deployment
Endpoint: https://mycompany-openai.openai.azure.com/
Model Name: gpt-4-deployment
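The Azure example above can be sketched as a URL builder, which makes the "deployment name, not model name" rule concrete. The api-version default here is an assumption; use a version your resource supports.

```python
def azure_chat_url(resource: str, deployment: str,
                   api_version: str = "2024-02-01") -> str:
    """Build the full Azure OpenAI chat-completions URL. The deployment
    name (not the model name) goes in the path; the api-version default
    is an assumption -- use a version your resource supports."""
    return (f"https://{resource}.openai.azure.com/openai/deployments/"
            f"{deployment}/chat/completions?api-version={api_version}")

# Azure authenticates with an "api-key" header instead of a Bearer token:
headers = {"api-key": "your-azure-api-key", "Content-Type": "application/json"}

full_url = azure_chat_url("mycompany-openai", "gpt-4-deployment")
```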
GitHub Models Configuration
LLM Name: GitHub GPT-4o
Provider: GitHub Models
Endpoint URL: https://models.inference.ai.azure.com/chat/completions
API Key: your-github-token
Model Name: gpt-4o
Set as Default: ☐
Status: Active
Getting GitHub Token:
- Go to GitHub Settings > Developer settings
- Personal access tokens > Tokens (classic)
- Generate new token
- Select the minimal scopes needed
- Copy token
Available Models via GitHub:
- gpt-4o - Latest GPT-4
- gpt-4o-mini - Lightweight version
- claude-3.5-sonnet - Anthropic Claude
- llama-3.1-70b - Meta Llama
Anthropic Claude Configuration
LLM Name: Claude 3.5 Sonnet
Provider: Anthropic
Endpoint URL: https://api.anthropic.com/v1/messages
API Key: sk-ant-xxxxxxxxxxxxx
Model Name: claude-3-5-sonnet-20241022
Set as Default: ☐
Status: Active
Get API Key:
- Console: https://console.anthropic.com/
- Account Settings > API Keys
Models:
- claude-3-5-sonnet-20241022 - Latest, most capable
- claude-3-opus-20240229 - Most intelligent
- claude-3-sonnet-20240229 - Balanced
- claude-3-haiku-20240307 - Fastest
Google Gemini Configuration
LLM Name: Gemini Pro
Provider: Google
Endpoint URL: https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent
API Key: your-google-api-key
Model Name: gemini-pro
Set as Default: ☐
Status: Active
Get API Key:
- Google AI Studio: https://makersuite.google.com/app/apikey
Models:
- gemini-pro - Text generation
- gemini-pro-vision - Multimodal (text + images)
Custom/Self-Hosted Models
LLM Name: Self-Hosted Llama
Provider: Custom
Endpoint URL: https://your-server.com/v1/chat/completions
API Key: your-custom-api-key (if required)
Model Name: llama-3.1-70b
Set as Default: ☐
Status: Active
Requirements:
- Must support OpenAI-compatible chat completion API
- POST endpoint accepting messages array
- Returns standard completion format
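The requirements above boil down to the OpenAI chat-completion wire format. The values in this sketch are illustrative; what matters is the shape, which is why the same client code works against vLLM, Ollama, and the other frameworks listed below.

```python
# Request shape a conforming endpoint must accept:
request = {
    "model": "llama-3.1-70b",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Standard completion format it must return (values illustrative):
response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hi there!"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 8, "completion_tokens": 3, "total_tokens": 11},
}

# Clients read the reply the same way regardless of backend:
reply = response["choices"][0]["message"]["content"]
```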
Compatible Frameworks:
- vLLM
- Ollama (with OpenAI compatibility)
- LiteLLM
- LocalAI
- Text Generation WebUI (OpenAI mode)
Managing Connections
Viewing Connections
LLM Connections List Shows:
- Connection name
- Provider type
- Model name
- Status (Active/Inactive)
- Default indicator
- Last tested
- Associated API keys count
Testing Connections
Why Test?
- Verify endpoint is reachable
- Confirm API key is valid
- Check model availability
- Validate configuration
How to Test:
- Edit LLM connection
- Click "Test Connection" button
- System sends sample request
- View results:
- ✓ Success: Connection works
- ✗ Error: See error message
Common Test Errors:
- Invalid API key
- Incorrect endpoint URL
- Model not found
- Network/firewall issues
- Rate limit exceeded
Editing Connections
- Click "Edit" button
- Modify any field
- Click "Test Connection"
- Click "Save"
Can Modify:
- ✓ Name/description
- ✓ Endpoint URL
- ✓ API key
- ✓ Model name
- ✓ Default status
- ✓ Active status
Cannot Modify:
- ✗ Provider type (create new instead)
- ✗ User ID
- ✗ Creation date
Setting Default Connection
Default Connection:
- Used when API key doesn't specify LLM
- Fallback for compatibility
- One per user
To Set:
- Edit connection
- Check "Set as Default"
- Save
- Previous default automatically unset
Deactivating Connections
Temporarily Disable:
- Edit connection
- Set Status to "Inactive"
- Save
Effects:
- API keys using this connection will fail
- Can reactivate anytime
- Configuration preserved
Use Cases:
- Model temporarily unavailable
- Cost control
- Testing alternatives
- Maintenance window
Deleting Connections
⚠️ Warning: Check Dependencies First
- View connection details
- Check "API Keys Using This Connection"
- If any exist, update them first
- Click "Delete"
- Confirm deletion
Effects:
- Connection removed permanently
- API keys using it will fail
- Cannot be undone
Advanced Configuration
Model Selection Guide
Choose Based On:
Task Complexity:
- Simple (FAQ, classification): GPT-3.5, Claude Haiku
- Medium (writing, analysis): GPT-4 Turbo, Claude Sonnet
- Complex (reasoning, coding): GPT-4, Claude Opus
Cost Optimization:
- Cheapest: GPT-3.5-turbo ($0.0005/1K tokens)
- Balanced: GPT-4 Turbo ($0.01/1K tokens)
- Premium: GPT-4 ($0.03/1K tokens)
Speed Requirements:
- Fastest: GPT-3.5, Claude Haiku
- Medium: GPT-4 Turbo, Claude Sonnet
- Slower: GPT-4, Claude Opus
Context Length:
- 4K tokens: GPT-3.5
- 8K tokens: GPT-4
- 128K tokens: GPT-4 Turbo, Claude models
- 200K tokens: Claude 3 Opus
Multi-Model Strategy
Best Practice: Different Models for Different Tasks
Example Setup:
Connection 1: GPT-3.5 Turbo
- Use for: Simple Q&A, classification
- API Keys: public-chatbot-key
- Cost: Low
Connection 2: GPT-4 Turbo
- Use for: Complex analysis, code generation
- API Keys: code-assistant-key, internal-tool-key
- Cost: Medium
Connection 3: Claude 3.5 Sonnet
- Use for: Long documents, creative writing
- API Keys: document-qa-key, writing-assistant-key
- Cost: Medium
Connection 4: Azure GPT-4
- Use for: Enterprise compliance requirements
- API Keys: hipaa-compliant-key
- Cost: Medium (with Azure credits)
Load Balancing
Multiple Connections to Same Model:
OpenAI GPT-4 - Key 1 (primary)
OpenAI GPT-4 - Key 2 (backup)
Azure GPT-4 (failover)
Benefits:
- Distribute rate limits
- Redundancy
- Avoid quota exhaustion
- Higher availability
Implementation:
- Create API keys for each connection
- Application logic selects key
- Automatic failover on error
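The application-side selection logic can be as small as a loop over the connection pool. This is a sketch, not AI Guard functionality: `send` stands in for your own request function, and in production you would catch only transient errors (rate limits, 5xx) rather than every exception.

```python
def call_with_failover(connections, send):
    """Try each connection in order until one succeeds. `connections`
    is a list of (name, api_key) pairs; `send` is your request
    function (both hypothetical -- adapt to your client code)."""
    errors = []
    for name, key in connections:
        try:
            return send(name, key)
        except Exception as exc:  # in practice, catch only HTTP/rate-limit errors
            errors.append((name, exc))
    raise RuntimeError(f"all connections failed: {errors}")

pool = [
    ("OpenAI GPT-4 - Key 1", "key-1"),  # primary
    ("OpenAI GPT-4 - Key 2", "key-2"),  # backup
    ("Azure GPT-4", "key-3"),           # failover
]
```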
Cost Management
Track Costs Per Connection:
- Settings > LLM Connections > [Connection]
- View "Cost Analytics" tab
- See:
- Total spend
- Tokens consumed
- Cost per API key
- Daily/weekly/monthly trends
Set Budgets:
Connection: GPT-4 Turbo
Monthly Budget: $500
Alert at: 80% ($400)
Auto-disable at: 100% ($500)
Cost Optimization Tips:
- Use GPT-3.5 for simple tasks
- Set max_tokens limits
- Implement caching
- Use streaming for UX (same cost)
- Monitor and optimize prompts
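The caching tip can be sketched with a small response cache keyed on model and prompt. `complete` is a stand-in for your real request function; only cache deterministic calls (e.g. temperature 0), since cached answers never change.

```python
import hashlib

_cache: dict = {}

def cached_completion(prompt: str, model: str, complete) -> str:
    """Serve repeated identical prompts from a local cache instead of
    paying for a new completion. `complete` is your request function
    (hypothetical); cache only deterministic calls."""
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = complete(prompt, model)
    return _cache[key]
```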
Provider-Specific Features
OpenAI Features
Function Calling:
{
"functions": [
{
"name": "get_weather",
"description": "Get current weather",
"parameters": {...}
}
]
}
Vision (GPT-4 Vision):
{
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "..."}}
]
}
]
}
Azure OpenAI Benefits
- Enterprise Support: SLA guarantees
- Data Residency: Choose region
- Private Networking: VNet integration
- Compliance: SOC 2, HIPAA, etc.
- Cost Control: Azure credits, budgets
Anthropic Claude Features
Extended Context:
- 200K token context (Claude Opus)
- Better for long documents
- Reduced hallucination
System Messages:
{
"system": "You are a helpful assistant...",
"messages": [...]
}
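If you call the Messages API directly rather than through AI Guard, note that the request envelope differs from OpenAI's in three ways, sketched below: the key goes in an `x-api-key` header, an `anthropic-version` header is required, and `max_tokens` is mandatory.

```python
import json

def anthropic_request(api_key: str, system: str, messages: list):
    """Build headers and body for the Anthropic Messages API. Unlike
    OpenAI, Claude takes the key in an "x-api-key" header, needs an
    "anthropic-version" header, and requires "max_tokens"."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    body = json.dumps({
        "model": "claude-3-5-sonnet-20241022",
        "system": system,  # system prompt is top-level, not a message
        "messages": messages,
        "max_tokens": 1024,
    }).encode()
    return headers, body

hdrs, body = anthropic_request(
    "sk-ant-xxxxxxxxxxxxx",  # placeholder key
    "You are a helpful assistant...",
    [{"role": "user", "content": "Hello"}],
)
```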
Security Best Practices
API Key Security
✓ DO:
- Store LLM API keys in environment variables
- Use secrets management (AWS Secrets Manager)
- Rotate keys every 90 days
- Separate keys per environment
- Monitor for unusual usage
✗ DON'T:
- Commit to version control
- Share between team members
- Use same key everywhere
- Store in plain text
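Reading keys from environment variables is the simplest way to keep them out of source control. The variable-naming convention in this sketch is an assumption, not an AI Guard requirement; a secrets manager would replace the `os.environ` lookup.

```python
import os

def load_llm_key(provider: str) -> str:
    """Read the provider key from an environment variable instead of
    source code. The naming convention here (PROVIDER_API_KEY) is an
    assumption, not an AI Guard requirement."""
    var = f"{provider.upper()}_API_KEY"  # e.g. OPENAI_API_KEY
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it or pull it from a secrets manager")
    return key
```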
Network Security
Recommended:
- Use HTTPS endpoints only
- Implement IP whitelisting (if provider supports)
- Monitor for unauthorized access
- Set up VPN/private networking (Azure)
Access Control
Within AI Guard:
- Only create connections you need
- Don't share user accounts
- Deactivate unused connections
- Regular audit of connections
Troubleshooting
Connection Test Failures
Error: "Connection timeout"
- Check endpoint URL is correct
- Verify network/firewall settings
- Test from different network
- Check provider status page
Error: "Invalid API key"
- Verify key is active
- Check for typos/spaces
- Regenerate key if needed
- Confirm key has correct permissions
Error: "Model not found"
- Verify model name spelling
- Check model availability in region
- For Azure: use deployment name, not model name
- Review provider documentation
Error: "Rate limit exceeded"
- Slow down test requests
- Check provider quota
- Upgrade provider plan if needed
- Try again in a few minutes
API Request Failures
Requests failing intermittently:
- Check provider status/uptime
- Review rate limits
- Monitor error patterns
- Implement retry logic
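Retry logic for intermittent failures typically means exponential backoff with jitter. This is a sketch: `request` stands in for your call into the LLM connection, and in practice you would retry only transient errors (429, 5xx), not every exception.

```python
import random
import time

def with_retries(request, attempts: int = 3, base_delay: float = 1.0):
    """Retry a failing request with exponential backoff plus jitter.
    `request` is your call into the LLM connection (hypothetical)."""
    for attempt in range(attempts):
        try:
            return request()
        except Exception:  # in practice, retry only transient errors (429, 5xx)
            if attempt == attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus up to 100ms of jitter
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```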
High latency:
- Choose closer geographic region
- Use faster model variant
- Reduce max_tokens
- Consider caching
Unexpected costs:
- Review token usage logs
- Check for runaway requests
- Set token limits per request
- Implement usage alerts
Migration Between Providers
Switching Providers
Steps:
- Create new LLM connection (new provider)
- Test thoroughly
- Update API keys to use new connection
- Monitor for issues
- Deactivate old connection
- Delete after validation period
Considerations:
- Different prompt formats
- Model behavior differences
- Cost changes
- Feature compatibility
- Rate limit differences
Model Upgrades
When OpenAI releases GPT-5:
- Create new connection: "GPT-5"
- Test with subset of traffic
- Compare quality/cost/speed
- Gradually migrate API keys
- Keep old connection as fallback
Best Practices Summary
Connection Management
- ✓ Use descriptive names
- ✓ Test after creation/modification
- ✓ Set one default connection
- ✓ Document each connection's purpose
- ✓ Regular testing (weekly)
Cost Optimization
- ✓ Use appropriate model for task
- ✓ Set monthly budgets
- ✓ Monitor spending daily
- ✓ Implement caching
- ✓ Optimize prompt length
Reliability
- ✓ Have backup connections
- ✓ Monitor provider status
- ✓ Implement retry logic
- ✓ Set appropriate timeouts
- ✓ Log all failures
Security
- ✓ Secure API key storage
- ✓ Regular key rotation
- ✓ Monitor unusual activity
- ✓ Use HTTPS endpoints
- ✓ Separate prod/dev keys
Next Steps
- Configure API Keys with your LLM connections
- Learn about Guardrails for each connection
- Set up Cost Monitoring and alerts
- Explore Advanced Features per provider
- Read Integration Best Practices