LLM Connection Management

Overview

LLM Connections allow you to configure and manage multiple AI model endpoints from various providers. Each connection defines how AI Guard communicates with different Large Language Models.

Supported Providers

AI Guard Developer Portal supports connections to:

Major Providers

  • OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5
  • Azure OpenAI: Azure-hosted OpenAI models
  • GitHub Models: GitHub's AI model marketplace
  • Anthropic: Claude models
  • Google: Gemini models
  • Custom Endpoints: Any OpenAI-compatible API

Creating LLM Connections

OpenAI Configuration

  1. Navigate to Settings > LLM Connections
  2. Click "Add New Connection"
  3. Fill in details:
LLM Name: OpenAI GPT-4 Turbo
Provider: OpenAI
Endpoint URL: https://api.openai.com/v1/chat/completions
API Key: sk-proj-xxxxxxxxxxxxx
Model Name: gpt-4-turbo
Set as Default: ☐
Status: Active

Where to Find:

  • API Key: https://platform.openai.com/api-keys
  • Model Names: https://platform.openai.com/docs/models

Available Models:

  • gpt-4-turbo - Latest GPT-4 (128K context)
  • gpt-4 - Standard GPT-4 (8K context)
  • gpt-3.5-turbo - Fast, cost-effective
  4. Click "Test Connection"
  5. Click "Save"
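
To sanity-check credentials outside the portal, you can reproduce the kind of request the connection test makes. A minimal sketch in Python (assumes the requests package and the key in an OPENAI_API_KEY environment variable; illustrative, not the portal's internal test):

import os
import requests

# Send a one-token chat completion to confirm the endpoint, key, and model all work.
resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4-turbo",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,
    },
    timeout=30,
)
resp.raise_for_status()  # 401 = bad key, 404 = bad model/URL, 429 = rate limit
print(resp.json()["choices"][0]["message"]["content"])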

Azure OpenAI Configuration

LLM Name: Azure GPT-4
Provider: Azure OpenAI  
Endpoint URL: https://YOUR-RESOURCE.openai.azure.com/
API Key: your-azure-api-key
Model Name: YOUR-DEPLOYMENT-NAME
Set as Default: ☐
Status: Active

Important Azure Notes:

  • Use your deployment name, not model name
  • Endpoint must include your resource name
  • Get API key from Azure Portal > Your Resource > Keys

Example:

Resource: mycompany-openai
Deployment: gpt-4-deployment
Endpoint: https://mycompany-openai.openai.azure.com/
Model Name: gpt-4-deployment
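
The same example as a raw request: Azure routes by deployment name in the URL path and authenticates with an api-key header instead of a Bearer token. A minimal sketch (assumes the requests package, the key in AZURE_OPENAI_KEY, and the 2024-02-01 API version; confirm which versions your resource supports):

import os
import requests

resource = "mycompany-openai"
deployment = "gpt-4-deployment"

# Azure embeds the deployment in the path and requires an api-version query parameter.
url = (
    f"https://{resource}.openai.azure.com/openai/deployments/"
    f"{deployment}/chat/completions?api-version=2024-02-01"
)
resp = requests.post(
    url,
    headers={"api-key": os.environ["AZURE_OPENAI_KEY"]},
    json={"messages": [{"role": "user", "content": "ping"}], "max_tokens": 1},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])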

GitHub Models Configuration

LLM Name: GitHub GPT-4o
Provider: GitHub Models
Endpoint URL: https://models.inference.ai.azure.com/chat/completions
API Key: your-github-token
Model Name: gpt-4o
Set as Default: ☐  
Status: Active

Getting GitHub Token:

  1. Go to GitHub Settings > Developer settings
  2. Personal access tokens > Tokens (classic)
  3. Generate new token
  4. Select only the scopes you need
  5. Copy token

Available Models via GitHub:

  • gpt-4o - Latest GPT-4
  • gpt-4o-mini - Lightweight version
  • claude-3.5-sonnet - Anthropic Claude
  • llama-3.1-70b - Meta Llama
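
The GitHub Models endpoint is OpenAI-compatible, so a quick token check looks nearly identical to the OpenAI example, with the personal access token as the Bearer credential. A minimal sketch (assumes requests and the token in GITHUB_TOKEN):

import os
import requests

# Same chat-completions shape as OpenAI; only the endpoint and credential differ.
resp = requests.post(
    "https://models.inference.ai.azure.com/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])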

Anthropic Claude Configuration

LLM Name: Claude 3.5 Sonnet
Provider: Anthropic
Endpoint URL: https://api.anthropic.com/v1/messages
API Key: sk-ant-xxxxxxxxxxxxx
Model Name: claude-3-5-sonnet-20241022
Set as Default: ☐
Status: Active

Get API Key:

  • Console: https://console.anthropic.com/
  • Account Settings > API Keys

Models:

  • claude-3-5-sonnet-20241022 - Latest, most capable
  • claude-3-opus-20240229 - Previous flagship model
  • claude-3-sonnet-20240229 - Balanced
  • claude-3-haiku-20240307 - Fastest
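
Anthropic's Messages API differs from the OpenAI format: the key goes in an x-api-key header, an anthropic-version header is required, and max_tokens is mandatory. A minimal sketch (assumes requests and the key in ANTHROPIC_API_KEY):

import os
import requests

resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",  # required version header
        "content-type": "application/json",
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 16,  # mandatory for this API
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=30,
)
resp.raise_for_status()
# Responses return a list of content blocks rather than "choices".
print(resp.json()["content"][0]["text"])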

Google Gemini Configuration

LLM Name: Gemini Pro
Provider: Google
Endpoint URL: https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent
API Key: your-google-api-key
Model Name: gemini-pro
Set as Default: ☐
Status: Active

Get API Key:

  • Google AI Studio: https://makersuite.google.com/app/apikey

Models:

  • gemini-pro - Text generation
  • gemini-pro-vision - Multimodal (text + images)
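
Gemini differs from both formats above: the API key is passed as a key query parameter and the body uses contents/parts instead of a messages array. A minimal sketch (assumes requests and the key in GOOGLE_API_KEY):

import os
import requests

# The model is part of the path; the key rides along as a query parameter.
url = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"gemini-pro:generateContent?key={os.environ['GOOGLE_API_KEY']}"
)
resp = requests.post(
    url,
    json={"contents": [{"parts": [{"text": "ping"}]}]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])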

Custom/Self-Hosted Models

LLM Name: Self-Hosted Llama
Provider: Custom
Endpoint URL: https://your-server.com/v1/chat/completions
API Key: your-custom-api-key (if required)
Model Name: llama-3.1-70b
Set as Default: ☐
Status: Active

Requirements:

  • Must support OpenAI-compatible chat completion API
  • POST endpoint accepting messages array
  • Returns standard completion format

Compatible Frameworks:

  • vLLM
  • Ollama (with OpenAI compatibility)
  • LiteLLM
  • LocalAI
  • Text Generation WebUI (OpenAI mode)
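
Because these frameworks expose the same chat-completions contract, a self-hosted endpoint can be validated before registering it. A sketch against a local Ollama instance (assumes Ollama's OpenAI-compatible server on its default port 11434; substitute your own host and model tag):

import requests

# Ollama ignores the Authorization header, but OpenAI-style clients still send one.
resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    headers={"Authorization": "Bearer unused"},
    json={
        "model": "llama3.1:70b",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])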

Managing Connections

Viewing Connections

LLM Connections List Shows:

  • Connection name
  • Provider type
  • Model name
  • Status (Active/Inactive)
  • Default indicator
  • Last tested
  • Associated API keys count

Testing Connections

Why Test?

  • Verify endpoint is reachable
  • Confirm API key is valid
  • Check model availability
  • Validate configuration

How to Test:

  1. Edit LLM connection
  2. Click "Test Connection" button
  3. System sends sample request
  4. View results:
    • ✓ Success: Connection works
    • ✗ Error: See error message

Common Test Errors:

  • Invalid API key
  • Incorrect endpoint URL
  • Model not found
  • Network/firewall issues
  • Rate limit exceeded

Editing Connections

  1. Click "Edit" button
  2. Modify any field
  3. Click "Test Connection"
  4. Click "Save"

Can Modify:

  • ✓ Name/description
  • ✓ Endpoint URL
  • ✓ API key
  • ✓ Model name
  • ✓ Default status
  • ✓ Active status

Cannot Modify:

  • ✗ Provider type (create new instead)
  • ✗ User ID
  • ✗ Creation date

Setting Default Connection

Default Connection:

  • Used when API key doesn't specify LLM
  • Fallback for compatibility
  • One per user

To Set:

  1. Edit connection
  2. Check "Set as Default"
  3. Save
  4. Previous default automatically unset

Deactivating Connections

Temporarily Disable:

  1. Edit connection
  2. Set Status to "Inactive"
  3. Save

Effects:

  • API keys using this connection will fail
  • Can reactivate anytime
  • Configuration preserved

Use Cases:

  • Model temporarily unavailable
  • Cost control
  • Testing alternatives
  • Maintenance window

Deleting Connections

⚠️ Warning: Check Dependencies First

  1. View connection details
  2. Check "API Keys Using This Connection"
  3. If any exist, update them first
  4. Click "Delete"
  5. Confirm deletion

Effects:

  • Connection removed permanently
  • API keys using it will fail
  • Cannot be undone

Advanced Configuration

Model Selection Guide

Choose Based On:

Task Complexity:

  • Simple (FAQ, classification): GPT-3.5, Claude Haiku
  • Medium (writing, analysis): GPT-4 Turbo, Claude Sonnet
  • Complex (reasoning, coding): GPT-4, Claude Opus

Cost Optimization (approximate input-token pricing; check current provider rates):

  • Cheapest: GPT-3.5 Turbo (~$0.0005/1K tokens)
  • Balanced: GPT-4 Turbo (~$0.01/1K tokens)
  • Premium: GPT-4 (~$0.03/1K tokens)

Speed Requirements:

  • Fastest: GPT-3.5, Claude Haiku
  • Medium: GPT-4 Turbo, Claude Sonnet
  • Slower: GPT-4, Claude Opus

Context Length:

  • 4K-16K tokens: GPT-3.5 Turbo (varies by version)
  • 8K tokens: GPT-4
  • 128K tokens: GPT-4 Turbo
  • 200K tokens: Claude 3 models

Multi-Model Strategy

Best Practice: Different Models for Different Tasks

Example Setup:

Connection 1: GPT-3.5 Turbo
- Use for: Simple Q&A, classification
- API Keys: public-chatbot-key
- Cost: Low

Connection 2: GPT-4 Turbo  
- Use for: Complex analysis, code generation
- API Keys: code-assistant-key, internal-tool-key
- Cost: Medium

Connection 3: Claude 3.5 Sonnet
- Use for: Long documents, creative writing
- API Keys: document-qa-key, writing-assistant-key
- Cost: Medium

Connection 4: Azure GPT-4
- Use for: Enterprise compliance requirements
- API Keys: hipaa-compliant-key
- Cost: Medium (with Azure credits)

Load Balancing

Multiple Connections to Same Model:

OpenAI GPT-4 - Key 1 (primary)
OpenAI GPT-4 - Key 2 (backup)
Azure GPT-4 (failover)

Benefits:

  • Distribute rate limits
  • Redundancy
  • Avoid quota exhaustion
  • Higher availability

Implementation:

  • Create API keys for each connection
  • Application logic selects key
  • Automatic failover on error (see the sketch below)
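
A minimal failover pattern in application code: try each connection's API key in order and fall through on transient errors (sketch only; the key values and the AI Guard URL are hypothetical placeholders):

import requests

KEYS = ["primary-key", "backup-key", "azure-failover-key"]  # hypothetical key values

def call_llm(api_key: str, prompt: str) -> str:
    # Send one request through AI Guard with the given key (URL is a placeholder).
    resp = requests.post(
        "https://your-ai-guard-host/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"messages": [{"role": "user", "content": prompt}]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def call_with_failover(prompt: str) -> str:
    last_error = None
    for key in KEYS:
        try:
            return call_llm(key, prompt)
        except requests.RequestException as err:
            last_error = err  # rate limits, timeouts, 5xx: try the next connection
    raise RuntimeError("All connections failed") from last_error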

Cost Management

Track Costs Per Connection:

  1. Settings > LLM Connections > [Connection]
  2. View "Cost Analytics" tab
  3. See:
    • Total spend
    • Tokens consumed
    • Cost per API key
    • Daily/weekly/monthly trends

Set Budgets:

Connection: GPT-4 Turbo
Monthly Budget: $500
Alert at: 80% ($400)
Auto-disable at: 100% ($500)

Cost Optimization Tips:

  • Use GPT-3.5 for simple tasks
  • Set max_tokens limits
  • Implement caching for repeated prompts (see the sketch below)
  • Use streaming for UX (same cost)
  • Monitor and optimize prompts
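
Caching is the cheapest win on this list when prompts repeat. A minimal in-process sketch keyed on a hash of the prompt (query_llm is a hypothetical stand-in for whichever connection you call; production setups would typically use a shared store such as Redis with a TTL):

import hashlib
from typing import Callable

_cache: dict[str, str] = {}

def cached_query(prompt: str, query_llm: Callable[[str], str]) -> str:
    # Return the cached response when the exact same prompt was seen before.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = query_llm(prompt)  # query_llm: your LLM call (hypothetical)
    return _cache[key]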

Provider-Specific Features

OpenAI Features

Function Calling (shown with the legacy functions parameter; newer API versions use tools). Example with a minimal parameters schema filled in for illustration:

{
  "functions": [
    {
      "name": "get_weather",
      "description": "Get current weather",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"]
      }
    }
  ]
}

Vision (GPT-4 Vision):

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "..."}}
      ]
    }
  ]
}

Azure OpenAI Benefits

  • Enterprise Support: SLA guarantees
  • Data Residency: Choose region
  • Private Networking: VNet integration
  • Compliance: SOC 2, HIPAA, etc.
  • Cost Control: Azure credits, budgets

Anthropic Claude Features

Extended Context:

  • 200K token context across the Claude 3 family
  • Better for long documents
  • Reduced hallucination

System Messages:

{
  "system": "You are a helpful assistant...",
  "messages": [...]
}

Security Best Practices

API Key Security

✓ DO:

  • Store LLM API keys in environment variables
  • Use secrets management (e.g., AWS Secrets Manager, HashiCorp Vault)
  • Rotate keys every 90 days
  • Separate keys per environment
  • Monitor for unusual usage

✗ DON'T:

  • Commit to version control
  • Share between team members
  • Use same key everywhere
  • Store in plain text

Network Security

Recommended:

  • Use HTTPS endpoints only
  • Implement IP whitelisting (if provider supports)
  • Monitor for unauthorized access
  • Set up VPN/private networking (Azure)

Access Control

Within AI Guard:

  • Only create connections you need
  • Don't share user accounts
  • Deactivate unused connections
  • Regular audit of connections

Troubleshooting

Connection Test Failures

Error: "Connection timeout"

  • Check endpoint URL is correct
  • Verify network/firewall settings
  • Test from different network
  • Check provider status page

Error: "Invalid API key"

  • Verify key is active
  • Check for typos/spaces
  • Regenerate key if needed
  • Confirm key has correct permissions

Error: "Model not found"

  • Verify model name spelling
  • Check model availability in region
  • For Azure: use deployment name, not model name
  • Review provider documentation

Error: "Rate limit exceeded"

  • Slow down test requests
  • Check provider quota
  • Upgrade provider plan if needed
  • Try again in a few minutes

API Request Failures

Requests failing intermittently:

  • Check provider status/uptime
  • Review rate limits
  • Monitor error patterns
  • Implement retry logic with exponential backoff (see the sketch below)
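
A retry wrapper with exponential backoff covers most transient failures. A minimal sketch (send_request is a hypothetical stand-in for your actual call):

import time
import requests

def with_retries(send_request, attempts: int = 4):
    # Call send_request(), retrying transient HTTP errors with exponential backoff.
    for attempt in range(attempts):
        try:
            return send_request()
        except requests.RequestException:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s between attempts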

High latency:

  • Choose closer geographic region
  • Use faster model variant
  • Reduce max_tokens
  • Consider caching

Unexpected costs:

  • Review token usage logs
  • Check for runaway requests
  • Set token limits per request
  • Implement usage alerts

Migration Between Providers

Switching Providers

Steps:

  1. Create new LLM connection (new provider)
  2. Test thoroughly
  3. Update API keys to use new connection
  4. Monitor for issues
  5. Deactivate old connection
  6. Delete after validation period

Considerations:

  • Different prompt formats
  • Model behavior differences
  • Cost changes
  • Feature compatibility
  • Rate limit differences

Model Upgrades

When a provider releases a major new model (for example, a future GPT-5):

  1. Create new connection: "GPT-5"
  2. Test with a subset of traffic (see the sketch below)
  3. Compare quality/cost/speed
  4. Gradually migrate API keys
  5. Keep old connection as fallback
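
Step 2 can be as simple as a weighted coin flip in front of the two connections' API keys. A minimal sketch (key names are hypothetical; raise CANARY_FRACTION as quality, cost, and speed metrics hold up):

import random

CANARY_FRACTION = 0.05  # share of traffic routed to the new model

def pick_connection_key() -> str:
    if random.random() < CANARY_FRACTION:
        return "gpt-5-key"       # hypothetical new connection's API key
    return "gpt-4-turbo-key"     # current production connection's API key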

Best Practices Summary

Connection Management

  • ✓ Use descriptive names
  • ✓ Test after creation/modification
  • ✓ Set one default connection
  • ✓ Document each connection's purpose
  • ✓ Regular testing (weekly)

Cost Optimization

  • ✓ Use appropriate model for task
  • ✓ Set monthly budgets
  • ✓ Monitor spending daily
  • ✓ Implement caching
  • ✓ Optimize prompt length

Reliability

  • ✓ Have backup connections
  • ✓ Monitor provider status
  • ✓ Implement retry logic
  • ✓ Set appropriate timeouts
  • ✓ Log all failures

Security

  • ✓ Secure API key storage
  • ✓ Regular key rotation
  • ✓ Monitor unusual activity
  • ✓ Use HTTPS endpoints
  • ✓ Separate prod/dev keys

Next Steps

  • Configure API Keys with your LLM connections
  • Learn about Guardrails for each connection
  • Set up Cost Monitoring and alerts
  • Explore Advanced Features per provider
  • Read Integration Best Practices