LLM Connection Management

Overview

LLM Connections allow you to configure and manage multiple AI model endpoints from various providers. Each connection defines how AI Guard communicates with different Large Language Models.

Supported Providers

AI Guard Developer Portal supports connections to:

Major Providers

  • OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5
  • Azure OpenAI: Azure-hosted OpenAI models
  • GitHub Models: GitHub's AI model marketplace
  • Anthropic: Claude models
  • Google: Gemini models
  • Custom Endpoints: Any OpenAI-compatible API

Creating LLM Connections

OpenAI Configuration

  1. Navigate to Settings > LLM Connections
  2. Click "Add New Connection"
  3. Fill in details:
LLM Name: OpenAI GPT-4 Turbo
Provider: OpenAI
Endpoint URL: https://api.openai.com/v1/chat/completions
API Key: sk-proj-xxxxxxxxxxxxx
Model Name: gpt-4-turbo
Set as Default: ☐
Status: Active

Where to Find:

  • API Key: https://platform.openai.com/api-keys
  • Model Names: https://platform.openai.com/docs/models

Available Models:

  • gpt-4-turbo - Latest GPT-4 (128K context)
  • gpt-4 - Standard GPT-4 (8K context)
  • gpt-3.5-turbo - Fast, cost-effective
  4. Click "Test Connection"
  5. Click "Save"
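
To sanity-check credentials outside the portal, you can reproduce the kind of request the connection test makes. A minimal sketch in Python (assumes the requests package and the key in an OPENAI_API_KEY environment variable; illustrative, not the portal's internal test):

import os
import requests

# Send a one-token chat completion to confirm the endpoint, key, and model all work.
resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4-turbo",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,
    },
    timeout=30,
)
resp.raise_for_status()  # 401 = bad key, 404 = bad model/URL, 429 = rate limit
print(resp.json()["choices"][0]["message"]["content"])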

Azure OpenAI Configuration

LLM Name: Azure GPT-4
Provider: Azure OpenAI  
Endpoint URL: https://YOUR-RESOURCE.openai.azure.com/
API Key: your-azure-api-key
Model Name: YOUR-DEPLOYMENT-NAME
Set as Default: ☐
Status: Active

Important Azure Notes:

  • Use your deployment name, not model name
  • Endpoint must include your resource name
  • Get API key from Azure Portal > Your Resource > Keys

Example:

Resource: mycompany-openai
Deployment: gpt-4-deployment
Endpoint: https://mycompany-openai.openai.azure.com/
Model Name: gpt-4-deployment
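
The same example as a raw request: Azure routes by deployment name in the URL path and authenticates with an api-key header instead of a Bearer token. A minimal sketch (assumes the requests package, the key in AZURE_OPENAI_KEY, and the 2024-02-01 API version; confirm which versions your resource supports):

import os
import requests

resource = "mycompany-openai"
deployment = "gpt-4-deployment"

# Azure embeds the deployment in the path and requires an api-version query parameter.
url = (
    f"https://{resource}.openai.azure.com/openai/deployments/"
    f"{deployment}/chat/completions?api-version=2024-02-01"
)
resp = requests.post(
    url,
    headers={"api-key": os.environ["AZURE_OPENAI_KEY"]},
    json={"messages": [{"role": "user", "content": "ping"}], "max_tokens": 1},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])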

GitHub Models Configuration

LLM Name: GitHub GPT-4o
Provider: GitHub Models
Endpoint URL: https://models.inference.ai.azure.com/chat/completions
API Key: your-github-token
Model Name: gpt-4o
Set as Default: ☐  
Status: Active

Getting GitHub Token:

  1. Go to GitHub Settings > Developer settings
  2. Personal access tokens > Tokens (classic)
  3. Generate new token
  4. Select only the scopes you need
  5. Copy token

Available Models via GitHub:

  • gpt-4o - Latest GPT-4
  • gpt-4o-mini - Lightweight version
  • claude-3.5-sonnet - Anthropic Claude
  • llama-3.1-70b - Meta Llama
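
The GitHub Models endpoint is OpenAI-compatible, so a quick token check looks nearly identical to the OpenAI example, with the personal access token as the Bearer credential. A minimal sketch (assumes requests and the token in GITHUB_TOKEN):

import os
import requests

# Same chat-completions shape as OpenAI; only the endpoint and credential differ.
resp = requests.post(
    "https://models.inference.ai.azure.com/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])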

Anthropic Claude Configuration

LLM Name: Claude 3.5 Sonnet
Provider: Anthropic
Endpoint URL: https://api.anthropic.com/v1/messages
API Key: sk-ant-xxxxxxxxxxxxx
Model Name: claude-3-5-sonnet-20241022
Set as Default: ☐
Status: Active

Get API Key:

  • Console: https://console.anthropic.com/
  • Account Settings > API Keys

Models:

  • claude-3-5-sonnet-20241022 - Latest, most capable
  • claude-3-opus-20240229 - Previous flagship model
  • claude-3-sonnet-20240229 - Balanced
  • claude-3-haiku-20240307 - Fastest
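
Anthropic's Messages API differs from the OpenAI format: the key goes in an x-api-key header, an anthropic-version header is required, and max_tokens is mandatory. A minimal sketch (assumes requests and the key in ANTHROPIC_API_KEY):

import os
import requests

resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",  # required version header
        "content-type": "application/json",
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 16,  # mandatory for this API
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=30,
)
resp.raise_for_status()
# Responses return a list of content blocks rather than "choices".
print(resp.json()["content"][0]["text"])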

Google Gemini Configuration

LLM Name: Gemini Pro
Provider: Google
Endpoint URL: https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent
API Key: your-google-api-key
Model Name: gemini-pro
Set as Default: ☐
Status: Active

Get API Key:

  • Google AI Studio: https://makersuite.google.com/app/apikey

Models:

  • gemini-pro - Text generation
  • gemini-pro-vision - Multimodal (text + images)
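
Gemini differs from both formats above: the API key is passed as a key query parameter and the body uses contents/parts instead of a messages array. A minimal sketch (assumes requests and the key in GOOGLE_API_KEY):

import os
import requests

# The model is part of the path; the key rides along as a query parameter.
url = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"gemini-pro:generateContent?key={os.environ['GOOGLE_API_KEY']}"
)
resp = requests.post(
    url,
    json={"contents": [{"parts": [{"text": "ping"}]}]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])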

Custom/Self-Hosted Models

LLM Name: Self-Hosted Llama
Provider: Custom
Endpoint URL: https://your-server.com/v1/chat/completions
API Key: your-custom-api-key (if required)
Model Name: llama-3.1-70b
Set as Default: ☐
Status: Active

Requirements:

  • Must support OpenAI-compatible chat completion API
  • POST endpoint accepting messages array
  • Returns standard completion format

Compatible Frameworks:

  • vLLM
  • Ollama (with OpenAI compatibility)
  • LiteLLM
  • LocalAI
  • Text Generation WebUI (OpenAI mode)
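
Because these frameworks expose the same chat-completions contract, a self-hosted endpoint can be validated before registering it. A sketch against a local Ollama instance (assumes Ollama's OpenAI-compatible server on its default port 11434; substitute your own host and model tag):

import requests

# Ollama ignores the Authorization header, but OpenAI-style clients still send one.
resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    headers={"Authorization": "Bearer unused"},
    json={
        "model": "llama3.1:70b",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])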

Managing Connections

Viewing Connections

LLM Connections List Shows:

  • Connection name
  • Provider type
  • Model name
  • Status (Active/Inactive)
  • Default indicator
  • Last tested
  • Associated API keys count

Testing Connections

Why Test?

  • Verify endpoint is reachable
  • Confirm API key is valid
  • Check model availability
  • Validate configuration

How to Test:

  1. Edit LLM connection
  2. Click "Test Connection" button
  3. System sends sample request
  4. View results:
    • ✓ Success: Connection works
    • ✗ Error: See error message

Common Test Errors:

  • Invalid API key
  • Incorrect endpoint URL
  • Model not found
  • Network/firewall issues
  • Rate limit exceeded

Editing Connections

  1. Click "Edit" button
  2. Modify any field
  3. Click "Test Connection"
  4. Click "Save"

Can Modify:

  • ✓ Name/description
  • ✓ Endpoint URL
  • ✓ API key
  • ✓ Model name
  • ✓ Default status
  • ✓ Active status

Cannot Modify:

  • ✗ Provider type (create new instead)
  • ✗ User ID
  • ✗ Creation date

Setting Default Connection

Default Connection:

  • Used when API key doesn't specify LLM
  • Fallback for compatibility
  • One per user

To Set:

  1. Edit connection
  2. Check "Set as Default"
  3. Save
  4. Previous default automatically unset

Deactivating Connections

Temporarily Disable:

  1. Edit connection
  2. Set Status to "Inactive"
  3. Save

Effects:

  • API keys using this connection will fail
  • Can reactivate anytime
  • Configuration preserved

Use Cases:

  • Model temporarily unavailable
  • Cost control
  • Testing alternatives
  • Maintenance window

Deleting Connections

⚠️ Warning: Check Dependencies First

  1. View connection details
  2. Check "API Keys Using This Connection"
  3. If any exist, update them first
  4. Click "Delete"
  5. Confirm deletion

Effects:

  • Connection removed permanently
  • API keys using it will fail
  • Cannot be undone

Advanced Configuration

Model Selection Guide

Choose Based On:

Task Complexity:

  • Simple (FAQ, classification): GPT-3.5, Claude Haiku
  • Medium (writing, analysis): GPT-4 Turbo, Claude Sonnet
  • Complex (reasoning, coding): GPT-4, Claude Opus

Cost Optimization (approximate input-token pricing; check current provider rates):

  • Cheapest: GPT-3.5 Turbo (~$0.0005/1K tokens)
  • Balanced: GPT-4 Turbo (~$0.01/1K tokens)
  • Premium: GPT-4 (~$0.03/1K tokens)

Speed Requirements:

  • Fastest: GPT-3.5, Claude Haiku
  • Medium: GPT-4 Turbo, Claude Sonnet
  • Slower: GPT-4, Claude Opus

Context Length:

  • 4K-16K tokens: GPT-3.5 Turbo (varies by version)
  • 8K tokens: GPT-4
  • 128K tokens: GPT-4 Turbo
  • 200K tokens: Claude 3 models

Multi-Model Strategy

Best Practice: Different Models for Different Tasks

Example Setup:

Connection 1: GPT-3.5 Turbo
- Use for: Simple Q&A, classification
- API Keys: public-chatbot-key
- Cost: Low

Connection 2: GPT-4 Turbo  
- Use for: Complex analysis, code generation
- API Keys: code-assistant-key, internal-tool-key
- Cost: Medium

Connection 3: Claude 3.5 Sonnet
- Use for: Long documents, creative writing
- API Keys: document-qa-key, writing-assistant-key
- Cost: Medium

Connection 4: Azure GPT-4
- Use for: Enterprise compliance requirements
- API Keys: hipaa-compliant-key
- Cost: Medium (with Azure credits)

Load Balancing

Multiple Connections to Same Model:

OpenAI GPT-4 - Key 1 (primary)
OpenAI GPT-4 - Key 2 (backup)
Azure GPT-4 (failover)

Benefits:

  • Distribute rate limits
  • Redundancy
  • Avoid quota exhaustion
  • Higher availability

Implementation:

  • Create API keys for each connection
  • Application logic selects key
  • Automatic failover on error (see the sketch below)
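
A minimal failover pattern in application code: try each connection's API key in order and fall through on transient errors (sketch only; the key values and the AI Guard URL are hypothetical placeholders):

import requests

KEYS = ["primary-key", "backup-key", "azure-failover-key"]  # hypothetical key values

def call_llm(api_key: str, prompt: str) -> str:
    # Send one request through AI Guard with the given key (URL is a placeholder).
    resp = requests.post(
        "https://your-ai-guard-host/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"messages": [{"role": "user", "content": prompt}]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def call_with_failover(prompt: str) -> str:
    last_error = None
    for key in KEYS:
        try:
            return call_llm(key, prompt)
        except requests.RequestException as err:
            last_error = err  # rate limits, timeouts, 5xx: try the next connection
    raise RuntimeError("All connections failed") from last_error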

Cost Management

Track Costs Per Connection:

  1. Settings > LLM Connections > [Connection]
  2. View "Cost Analytics" tab
  3. See:
    • Total spend
    • Tokens consumed
    • Cost per API key
    • Daily/weekly/monthly trends

Set Budgets:

Connection: GPT-4 Turbo
Monthly Budget: $500
Alert at: 80% ($400)
Auto-disable at: 100% ($500)

Cost Optimization Tips:

  • Use GPT-3.5 for simple tasks
  • Set max_tokens limits
  • Implement caching for repeated prompts (see the sketch below)
  • Use streaming for UX (same cost)
  • Monitor and optimize prompts
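
Caching is the cheapest win on this list when prompts repeat. A minimal in-process sketch keyed on a hash of the prompt (query_llm is a hypothetical stand-in for whichever connection you call; production setups would typically use a shared store such as Redis with a TTL):

import hashlib
from typing import Callable

_cache: dict[str, str] = {}

def cached_query(prompt: str, query_llm: Callable[[str], str]) -> str:
    # Return the cached response when the exact same prompt was seen before.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = query_llm(prompt)  # query_llm: your LLM call (hypothetical)
    return _cache[key]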

Provider-Specific Features

OpenAI Features

Function Calling (shown with the legacy functions parameter; newer API versions use tools). Example with a minimal parameters schema filled in for illustration:

{
  "functions": [
    {
      "name": "get_weather",
      "description": "Get current weather",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"]
      }
    }
  ]
}

Vision (GPT-4 Vision):

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "..."}}
      ]
    }
  ]
}

Azure OpenAI Benefits

  • Enterprise Support: SLA guarantees
  • Data Residency: Choose region
  • Private Networking: VNet integration
  • Compliance: SOC 2, HIPAA, etc.
  • Cost Control: Azure credits, budgets

Anthropic Claude Features

Extended Context:

  • 200K token context across the Claude 3 family
  • Better for long documents
  • Reduced hallucination

System Messages:

{
  "system": "You are a helpful assistant...",
  "messages": [...]
}

Security Best Practices

API Key Security

✓ DO:

  • Store LLM API keys in environment variables
  • Use secrets management (e.g., AWS Secrets Manager, HashiCorp Vault)
  • Rotate keys every 90 days
  • Separate keys per environment
  • Monitor for unusual usage

✗ DON'T:

  • Commit to version control
  • Share between team members
  • Use same key everywhere
  • Store in plain text

Network Security

Recommended:

  • Use HTTPS endpoints only
  • Implement IP whitelisting (if provider supports)
  • Monitor for unauthorized access
  • Set up VPN/private networking (Azure)

Access Control

Within AI Guard:

  • Only create connections you need
  • Don't share user accounts
  • Deactivate unused connections
  • Regular audit of connections

Troubleshooting

Connection Test Failures

Error: "Connection timeout"

  • Check endpoint URL is correct
  • Verify network/firewall settings
  • Test from different network
  • Check provider status page

Error: "Invalid API key"

  • Verify key is active
  • Check for typos/spaces
  • Regenerate key if needed
  • Confirm key has correct permissions

Error: "Model not found"

  • Verify model name spelling
  • Check model availability in region
  • For Azure: use deployment name, not model name
  • Review provider documentation

Error: "Rate limit exceeded"

  • Slow down test requests
  • Check provider quota
  • Upgrade provider plan if needed
  • Try again in a few minutes

API Request Failures

Requests failing intermittently:

  • Check provider status/uptime
  • Review rate limits
  • Monitor error patterns
  • Implement retry logic with exponential backoff (see the sketch below)
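
A retry wrapper with exponential backoff covers most transient failures. A minimal sketch (send_request is a hypothetical stand-in for your actual call):

import time
import requests

def with_retries(send_request, attempts: int = 4):
    # Call send_request(), retrying transient HTTP errors with exponential backoff.
    for attempt in range(attempts):
        try:
            return send_request()
        except requests.RequestException:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s between attempts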

High latency:

  • Choose closer geographic region
  • Use faster model variant
  • Reduce max_tokens
  • Consider caching

Unexpected costs:

  • Review token usage logs
  • Check for runaway requests
  • Set token limits per request
  • Implement usage alerts

Migration Between Providers

Switching Providers

Steps:

  1. Create new LLM connection (new provider)
  2. Test thoroughly
  3. Update API keys to use new connection
  4. Monitor for issues
  5. Deactivate old connection
  6. Delete after validation period

Considerations:

  • Different prompt formats
  • Model behavior differences
  • Cost changes
  • Feature compatibility
  • Rate limit differences

Model Upgrades

When a provider releases a major new model (for example, a future GPT-5):

  1. Create new connection: "GPT-5"
  2. Test with a subset of traffic (see the sketch below)
  3. Compare quality/cost/speed
  4. Gradually migrate API keys
  5. Keep old connection as fallback
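
Step 2 can be as simple as a weighted coin flip in front of the two connections' API keys. A minimal sketch (key names are hypothetical; raise CANARY_FRACTION as quality, cost, and speed metrics hold up):

import random

CANARY_FRACTION = 0.05  # share of traffic routed to the new model

def pick_connection_key() -> str:
    if random.random() < CANARY_FRACTION:
        return "gpt-5-key"       # hypothetical new connection's API key
    return "gpt-4-turbo-key"     # current production connection's API key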

Best Practices Summary

Connection Management

  • ✓ Use descriptive names
  • ✓ Test after creation/modification
  • ✓ Set one default connection
  • ✓ Document each connection's purpose
  • ✓ Regular testing (weekly)

Cost Optimization

  • ✓ Use appropriate model for task
  • ✓ Set monthly budgets
  • ✓ Monitor spending daily
  • ✓ Implement caching
  • ✓ Optimize prompt length

Reliability

  • ✓ Have backup connections
  • ✓ Monitor provider status
  • ✓ Implement retry logic
  • ✓ Set appropriate timeouts
  • ✓ Log all failures

Security

  • ✓ Secure API key storage
  • ✓ Regular key rotation
  • ✓ Monitor unusual activity
  • ✓ Use HTTPS endpoints
  • ✓ Separate prod/dev keys

Next Steps

  • Configure API Keys with your LLM connections
  • Learn about Guardrails for each connection
  • Set up Cost Monitoring and alerts
  • Explore Advanced Features per provider
  • Read Integration Best Practices