Performance
TagCache is designed for maximum performance, combining a multi-shard architecture, optimized data structures, and efficient protocols. This guide covers benchmarking, optimization techniques, and performance tuning.
Performance Overview
TagCache delivers exceptional performance through:
- Multi-shard Design: DashMap with hash-based sharding for lock-free operations
- Memory-efficient Storage: Optimized data structures with minimal overhead
- Dual Protocols: TCP for ultra-low latency, HTTP for web applications
- Atomic Operations: Lock-free counters and conditional operations
- Tag Management: Efficient tag indexing with minimal memory overhead
Benchmark Results
Hardware Specifications
Tests performed on:
- CPU: Intel Core i7-12700K (12 cores, 20 threads)
- RAM: 32GB DDR4-3200
- Storage: NVMe SSD
- Network: Localhost (no network latency)
- OS: Ubuntu 22.04 LTS
TCP Protocol Performance
Pure Throughput Tests
| Operation | Ops/Second | Notes |
|---|---|---|
| SET | 1,200,000 | Small values (100 bytes) |
| GET | 1,500,000 | Cache hit rate: 100% |
| DELETE | 800,000 | Single key operations |
| INCREMENT | 1,100,000 | Atomic counters |
| GET (batch) | 2,000,000 | 10 keys per request |
Configuration: 256 shards, 16 client connections, pipelined requests
Latency Distribution (microseconds)
| Operation | P50 | P95 | P99 | P99.9 |
|---|---|---|---|---|
| SET | 45 | 120 | 280 | 850 |
| GET | 35 | 95 | 220 | 650 |
| DELETE | 40 | 110 | 250 | 750 |
| INCREMENT | 42 | 115 | 270 | 800 |
Configuration: Single-threaded client, synchronous requests
Mixed Workload (80% GET, 20% SET)
| Metric | Value |
|---|---|
| Total Ops/Second | 1,350,000 |
| Average Latency | 38μs |
| Cache Hit Rate | 95% |
| Memory Usage | 2.1GB |
| CPU Usage | 45% |
Test Duration: 10 minutes, 32 concurrent clients
HTTP API Performance
HTTP Throughput
| Operation | Ops/Second | Notes |
|---|---|---|
| POST /api/set | 450,000 | JSON payloads |
| GET /api/get/{key} | 520,000 | Direct key access |
| POST /api/increment | 400,000 | Atomic operations |
| GET /api/get/tag/{tag} | 180,000 | Tag-based retrieval |
Configuration: HTTP/1.1 with keep-alive, 64 concurrent connections
HTTP Latency (milliseconds)
| Operation | P50 | P95 | P99 | P99.9 |
|---|---|---|---|---|
| POST /api/set | 0.8 | 2.1 | 4.5 | 12.0 |
| GET /api/get/{key} | 0.6 | 1.8 | 3.8 | 10.5 |
| POST /api/increment | 0.9 | 2.3 | 4.8 | 13.2 |
| GET /api/stats | 1.2 | 3.0 | 6.2 | 15.8 |
Real-world Web Application
Simulating typical web app cache patterns:
| Metric | Value |
|---|---|
| Requests/Second | 285,000 |
| Session Lookups | 180,000/sec |
| Cache Updates | 45,000/sec |
| Tag Invalidations | 8,500/sec |
| Average Response | 1.2ms |
| 99th Percentile | 5.8ms |
Pattern: 65% GET, 25% SET, 10% tag operations
Benchmarking Tools
TagCache includes built-in benchmarking tools for performance testing.
TCP Benchmark Tool
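If you want an external measurement alongside the built-in tool, the sketch below times raw TCP round trips from Python. The address, port, and the command string are placeholders (assumptions), not TagCache's documented wire format; substitute the actual TCP commands from the protocol reference.

```python
# Minimal TCP round-trip benchmark sketch.
# Assumptions: the address/port and the request line are placeholders,
# not TagCache's real TCP protocol.
import socket
import time

HOST, PORT = "127.0.0.1", 1984        # placeholder address/port
REQUEST = b"GET bench:key:1\n"        # placeholder command format
N = 100_000

sock = socket.create_connection((HOST, PORT))
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # avoid Nagle delays

start = time.perf_counter()
for _ in range(N):
    sock.sendall(REQUEST)
    sock.recv(4096)                   # assumes one response fits in one read
elapsed = time.perf_counter() - start

print(f"{N / elapsed:,.0f} ops/sec, avg {elapsed / N * 1e6:.1f} µs/op")
sock.close()
```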
HTTP Benchmark
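For the HTTP API, a small multi-threaded client against the documented `GET /api/get/{key}` endpoint gives a rough throughput number. This sketch assumes the server is reachable at `127.0.0.1:8080` (a placeholder, adjust to your deployment), uses the third-party `requests` library, and omits authentication; add auth headers if your deployment requires them.

```python
# Rough HTTP GET throughput test against GET /api/get/{key}.
# Assumptions: base URL is a placeholder; `requests` must be installed.
import time
from concurrent.futures import ThreadPoolExecutor
import requests

BASE = "http://127.0.0.1:8080"         # placeholder, adjust to your server
KEY = "bench:key:1"
REQUESTS_PER_WORKER = 5_000
WORKERS = 16

def worker() -> int:
    session = requests.Session()        # keep-alive: one connection per worker
    ok = 0
    for _ in range(REQUESTS_PER_WORKER):
        r = session.get(f"{BASE}/api/get/{KEY}")
        ok += r.status_code == 200
    return ok

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    hits = sum(pool.map(lambda _: worker(), range(WORKERS)))
elapsed = time.perf_counter() - start

total = WORKERS * REQUESTS_PER_WORKER
print(f"{total / elapsed:,.0f} req/sec, {hits}/{total} OK")
```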
Load Testing Script
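A load test should mirror your real mix rather than a single operation. The sketch below drives the 80% GET / 20% SET mix used in the mixed-workload table above; the `/api/set` JSON field names (`key`, `value`) are assumptions, so check the API reference for the exact payload shape.

```python
# Mixed-workload load test sketch: 80% GET, 20% SET over the HTTP API.
# Assumptions: base URL and the /api/set payload field names are placeholders.
import random
import time
import requests

BASE = "http://127.0.0.1:8080"   # placeholder
DURATION_S = 60
session = requests.Session()

ops = 0
deadline = time.monotonic() + DURATION_S
while time.monotonic() < deadline:
    key = f"load:user:{random.randrange(10_000)}"
    if random.random() < 0.8:                          # 80% reads
        session.get(f"{BASE}/api/get/{key}")
    else:                                              # 20% writes
        session.post(f"{BASE}/api/set",
                     json={"key": key, "value": "x" * 100})  # assumed payload shape
    ops += 1

print(f"{ops / DURATION_S:,.0f} ops/sec on one connection; run several copies for real load")
```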
Performance Tuning
Server Configuration
Optimal Shard Count
Guidelines:
- More shards = better concurrency, slightly higher memory overhead
- Fewer shards = less memory overhead, potential contention
- Sweet spot: 32-64 shards for most workloads
Memory Management
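Memory usage is driven mostly by entry count, value size, and whether TTLs are set (unbounded growth usually means missing TTLs, as noted in the troubleshooting table below). A back-of-the-envelope sizing estimate, assuming roughly 100 bytes of per-entry overhead for the key, tag index, and expiry bookkeeping (an assumed figure, not a measured constant):

```python
# Rough capacity estimate: does the working set fit in RAM?
# The 100-byte per-entry overhead is an assumption, not a measured value.
entries = 10_000_000
avg_value_bytes = 100
per_entry_overhead = 100          # key, tag index, expiry bookkeeping (assumed)

total_gb = entries * (avg_value_bytes + per_entry_overhead) / 1024**3
print(f"~{total_gb:.1f} GB for {entries:,} entries")   # ~1.9 GB here
```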
Connection Pooling
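Server-side connection limits depend on your deployment and are not shown here. What you control from the application is the size of the client-side pool; the sketch below bounds it with `requests`' `HTTPAdapter` so connection counts stay predictable under load (64 matches the HTTP benchmark configuration above).

```python
# Client-side connection pool sketch using requests' HTTPAdapter.
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
adapter = HTTPAdapter(pool_connections=4, pool_maxsize=64)  # cap at 64 sockets
session.mount("http://", adapter)

# All requests made through `session` now share the bounded pool.
resp = session.get("http://127.0.0.1:8080/api/stats")       # placeholder address
print(resp.status_code)
```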
Client Optimization
Connection Reuse
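Opening a new connection per request dominates latency at these speeds. Reuse one keep-alive connection (for example a `requests.Session`) instead; a rough comparison sketch, with the base URL as a placeholder:

```python
# Compare per-request connections vs. one reused keep-alive connection.
import time
import requests

BASE = "http://127.0.0.1:8080"   # placeholder
N = 1_000

start = time.perf_counter()
for _ in range(N):
    requests.get(f"{BASE}/api/get/demo")          # new connection every time
no_reuse = time.perf_counter() - start

session = requests.Session()                      # keep-alive connection
start = time.perf_counter()
for _ in range(N):
    session.get(f"{BASE}/api/get/demo")
reuse = time.perf_counter() - start

print(f"no reuse: {no_reuse:.2f}s, with reuse: {reuse:.2f}s")
```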
Batch Operations
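The TCP protocol supports batched GETs (10 keys per request in the throughput table above). Over HTTP, you can approximate the same effect client-side by issuing a group of lookups concurrently over pooled connections rather than one at a time, as in this sketch:

```python
# Client-side batching sketch: fetch a group of keys concurrently over HTTP.
from concurrent.futures import ThreadPoolExecutor
import requests

BASE = "http://127.0.0.1:8080"   # placeholder
session = requests.Session()

def get_many(keys: list[str]) -> dict[str, requests.Response]:
    with ThreadPoolExecutor(max_workers=10) as pool:
        responses = pool.map(lambda k: session.get(f"{BASE}/api/get/{k}"), keys)
        return dict(zip(keys, responses))

batch = [f"user:{i}:profile" for i in range(10)]
results = get_many(batch)
print({k: r.status_code for k, r in results.items()})
```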
TCP for High Performance
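When latency matters most, keep a single long-lived TCP connection per worker and send requests over it instead of reconnecting. The sketch below reuses the placeholder wire format from the benchmark section above; substitute TagCache's actual TCP commands.

```python
# Persistent TCP connection sketch.
# Assumptions: address, port, and command framing are placeholders.
import socket

class TagCacheTCP:
    def __init__(self, host: str = "127.0.0.1", port: int = 1984):
        self.sock = socket.create_connection((host, port))
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    def request(self, line: str) -> bytes:
        self.sock.sendall(line.encode() + b"\n")   # placeholder framing
        return self.sock.recv(4096)

    def close(self) -> None:
        self.sock.close()

client = TagCacheTCP()
for i in range(1_000):
    client.request(f"GET user:{i}:profile")        # placeholder command
client.close()
```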
Memory Optimization
Efficient Data Structures
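Value size is the other half of memory efficiency: cache compact encodings rather than verbose ones. A small illustration of the difference using standard-library JSON (the field names are illustrative only):

```python
# Compact vs. pretty JSON encodings of the same cached value.
import json

profile = {"id": 42, "name": "Ada", "roles": ["admin", "editor"], "active": True}

pretty = json.dumps(profile, indent=2)
compact = json.dumps(profile, separators=(",", ":"))

print(len(pretty.encode()), "bytes pretty,", len(compact.encode()), "bytes compact")
```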
Tag Management
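Tags let you read or invalidate related entries as a group instead of tracking individual keys. Reading by tag uses the documented `GET /api/get/tag/{tag}` endpoint; how tags are attached on write depends on the `/api/set` payload, so the `tags` field below is an assumption to verify against the API reference.

```python
# Tag usage sketch: write entries with a shared tag, then read the group back.
# Assumptions: base URL and the `tags` field in the /api/set payload.
import requests

BASE = "http://127.0.0.1:8080"   # placeholder
session = requests.Session()

for i in range(3):
    session.post(f"{BASE}/api/set", json={
        "key": f"product:{i}",
        "value": f"product-{i}-data",
        "tags": ["catalog"],            # assumed field name
    })

group = session.get(f"{BASE}/api/get/tag/catalog")   # documented endpoint
print(group.status_code, group.text[:200])
```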
Monitoring and Profiling
Real-time Performance Monitoring
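The `GET /api/stats` endpoint exposes server metrics; polling it on an interval gives a simple live view. The field names inside the response are not listed in this guide, so the sketch just prints the raw JSON.

```python
# Poll GET /api/stats on an interval and print the raw metrics.
import time
import requests

BASE = "http://127.0.0.1:8080"   # placeholder
session = requests.Session()

while True:
    stats = session.get(f"{BASE}/api/stats").json()
    print(time.strftime("%H:%M:%S"), stats)
    time.sleep(5)
```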
Performance Profiling
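Client-observed latency percentiles (as in the tables above) are usually more informative than averages. A sketch that samples a few thousand GETs and reports P50/P95/P99:

```python
# Latency percentile sketch for GET /api/get/{key}, as seen from the client.
import time
import requests

BASE = "http://127.0.0.1:8080"   # placeholder
session = requests.Session()
samples = []

for _ in range(5_000):
    start = time.perf_counter()
    session.get(f"{BASE}/api/get/bench:key:1")
    samples.append((time.perf_counter() - start) * 1_000)   # milliseconds

samples.sort()
for label, q in (("P50", 0.50), ("P95", 0.95), ("P99", 0.99)):
    print(label, f"{samples[int(q * (len(samples) - 1))]:.2f} ms")
```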
Performance Best Practices
1. Choose the Right Protocol
- TCP Protocol: Ultra-low latency, high throughput applications
- HTTP API: Web applications, easier integration, better debugging
2. Optimize Key Design
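Short, predictable, hierarchical keys keep lookups cheap and make tagging and invalidation patterns obvious. One possible naming convention (a suggestion, not a TagCache requirement):

```python
# One possible key naming convention: <entity>:<id>:<facet>.
def user_profile_key(user_id: int) -> str:
    return f"user:{user_id}:profile"

def session_key(session_id: str) -> str:
    return f"session:{session_id}"

print(user_profile_key(42))    # user:42:profile
print(session_key("abc123"))   # session:abc123
```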
3. Use Tags Strategically
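Tag reads and invalidations pay off only if tags stay coarse: a handful of group-level tags per entry, not one tag per attribute. A small illustration of the distinction (names are illustrative):

```python
# Coarse, group-level tags (good) vs. one tag per attribute (wasteful).
good_tags = ["user:42", "sessions"]                        # invalidate one user, or all sessions
too_fine = ["user:42", "browser:firefox", "locale:en",     # rarely invalidated together,
            "theme:dark", "plan:free"]                     # but each entry bloats the tag index
```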
4. Set Appropriate TTLs
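TTLs bound both memory and staleness, so set them per data class rather than using one global value. How the TTL is passed to `/api/set` depends on the payload format; the `ttl_ms` field below is an assumption to verify against the API reference.

```python
# TTL policy sketch: different lifetimes per data class.
# Assumption: /api/set accepts a TTL field (shown here as `ttl_ms`).
import requests

BASE = "http://127.0.0.1:8080"   # placeholder
TTLS_MS = {
    "session": 30 * 60 * 1000,         # 30 minutes
    "product": 6 * 60 * 60 * 1000,     # 6 hours
    "config": 24 * 60 * 60 * 1000,     # 1 day
}

def cache_set(kind: str, key: str, value: str) -> None:
    requests.post(f"{BASE}/api/set", json={
        "key": key, "value": value, "ttl_ms": TTLS_MS[kind],   # assumed field name
    })

cache_set("session", "session:abc123", "serialized-session-data")
```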
5. Monitor and Alert
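Alert on the symptoms listed in the troubleshooting table below, hit rate and memory in particular. This sketch polls `/api/stats` and flags a low hit rate; the `hits`/`misses` field names are assumptions about the stats payload.

```python
# Alert sketch: warn when the observed hit rate drops below a threshold.
# Assumption: the /api/stats payload exposes hit/miss counters (names may differ).
import time
import requests

BASE = "http://127.0.0.1:8080"   # placeholder
THRESHOLD = 0.90

while True:
    stats = requests.get(f"{BASE}/api/stats").json()
    hits, misses = stats.get("hits", 0), stats.get("misses", 0)   # assumed fields
    total = hits + misses
    if total and hits / total < THRESHOLD:
        print(f"ALERT: hit rate {hits / total:.1%} below {THRESHOLD:.0%}")
    time.sleep(30)
```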
Troubleshooting Performance Issues
Common Performance Problems
| Symptom | Likely Cause | Solution |
|---|---|---|
| High latency | Too few shards | Increase num_shards |
| Memory issues | No TTL set | Set appropriate TTLs |
| Low throughput | Single-threaded client | Use connection pooling |
| Cache misses | Keys expiring too fast | Increase TTL values |
| High CPU usage | Too many small operations | Batch operations when possible |
Diagnostic Commands
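As a quick first check from the client side, confirm the server answers on `/api/stats` and time one GET round trip (base URL is a placeholder):

```python
# Quick diagnostics: is the server up, and what does one round trip cost?
import time
import requests

BASE = "http://127.0.0.1:8080"   # placeholder

stats = requests.get(f"{BASE}/api/stats")
print("stats endpoint:", stats.status_code)

start = time.perf_counter()
requests.get(f"{BASE}/api/get/healthcheck:key")
print(f"single GET round trip: {(time.perf_counter() - start) * 1000:.2f} ms")
```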