Commvault Backup System Automated Maintenance – Setup & Best Practices
What Is Commvault Automated Backup Maintenance?
Automated maintenance in Commvault helps administrators keep backup environments healthy without manual intervention.
By scheduling automated tasks, organizations can:
- Clean expired backup jobs
- Optimize database performance
- Verify backup integrity
- Manage storage efficiently
- Improve overall backup reliability
Automation is especially important in large environments where hundreds of backup jobs run daily.
This article shares the development process of a Python-based inspection tool using the Commvault REST API. It supports one-click generation of HTML inspection reports with visual charts, suitable for various online and offline deployment scenarios.
If you work in data protection, you have probably run into these frustrations:
- With dozens or even hundreds of clients in the backup system, manual daily status checks are time-consuming and exhausting.
- Clients stay offline for long periods without anyone noticing in time.
- Failure reasons are scattered across different places, with no unified analysis.
- RPO (Recovery Point Objective) compliance is hard to assess quantitatively.
- Producing an inspection report for management review means collating large amounts of data by hand.
The rest of this article walks through how I built an automated inspection tool for Commvault backup systems, from requirements to a real inspection run.
Requirements Analysis
Core Inspection Metrics
- Job health
- Client status
- Protection status
- RPO compliance
- Failure analysis
Deployment Scenario Versatility
- Online environments: third-party dependencies can be installed.
- Offline environments: only the Python standard library is available, with no network and no pip.
- MCP integration: the tool is used as an MCP tool for Claude Code.
Technical Selection
API Call Strategy
Commvault provides a RESTful API. The tool relies on it for authentication and for retrieving the client and job lists.
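As one example of calling the API, authentication might look like the sketch below. It assumes the `POST /Login` endpoint takes a JSON body with a Base64-encoded password and returns a `token` field, which should be checked against the REST API documentation for your Commvault version. The helper names `build_login_request` and `extract_token` are hypothetical and not part of the original tool.

```python
import base64
import json
import urllib.request


def build_login_request(base_url, username, password):
    """Build (but do not send) a login request for the REST API.

    Assumption: the endpoint is POST {base_url}/Login and the password
    is sent Base64-encoded inside a JSON body.
    """
    payload = {
        "username": username,
        "password": base64.b64encode(password.encode("utf-8")).decode("ascii"),
    }
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/Login",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )
    request.add_header("Content-Type", "application/json")
    request.add_header("Accept", "application/json")
    return request


def extract_token(login_response):
    """Pull the auth token out of a decoded login response.

    Assumption: the token is returned under the "token" key.
    """
    token = login_response.get("token")
    if not token:
        raise ValueError("Login response did not contain a token")
    return token
```

The returned token would then be sent in the `Authtoken` header on later requests.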
Dependency Strategy
Three versions were designed for different scenarios:
- `health_check_html.py`
- `health_check_portable.py`
- `health_check_pro.py`
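Whether a third-party library is usable can also be detected at runtime, which is handy when one script has to run in both online and offline environments. The following is a minimal sketch of that idea, not the approach the three scripts necessarily take; `pick_http_backend` is a hypothetical helper.

```python
import importlib.util


def pick_http_backend(preferred="requests", fallback="urllib"):
    """Return the preferred module name if it is installed, else the fallback.

    find_spec() only checks whether the module can be located; it does not
    import it, so the check is cheap and has no side effects.
    """
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback
```

On an offline host without `requests`, `pick_http_backend()` returns `"urllib"`, and the caller can branch to a standard-library code path.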
No-Dependency Implementation Tips
Using Python's standard library `urllib` instead of `requests`:
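    import json
    import ssl
    import urllib.parse
    import urllib.request

    class CommvaultClient:
        # Constructor added so the snippet runs; argument names are assumptions
        def __init__(self, base_url, access_token):
            self.base_url = base_url.rstrip('/')
            self.access_token = access_token

        def get(self, endpoint, params=None):
            """Send a GET request and return the decoded JSON body."""
            url = f"{self.base_url}/{endpoint.lstrip('/')}"
            if params:
                url += "?" + urllib.parse.urlencode(params)
            request = urllib.request.Request(url)
            request.add_header('Accept', 'application/json')
            request.add_header('Authtoken', self.access_token)
            # Handle self-signed certificates. This disables certificate
            # verification, so use it only on trusted networks.
            context = ssl._create_unverified_context()
            with urllib.request.urlopen(request, context=context) as response:
                return json.loads(response.read().decode('utf-8'))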
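The client above turns off certificate verification to cope with self-signed certificates. Where the server's CA certificate can be exported, a stricter option is to trust that certificate explicitly. The sketch below uses only public `ssl` APIs; `build_ssl_context` and its parameters are hypothetical names.

```python
import ssl


def build_ssl_context(ca_file=None, verify=True):
    """Create an SSL context for urllib.request.urlopen(..., context=...).

    verify=True  : verify the server certificate, optionally against ca_file
                   (a PEM file containing the self-signed or internal CA cert).
    verify=False : skip verification entirely (equivalent in effect to an
                   unverified context; only for trusted networks).
    """
    if not verify:
        context = ssl.create_default_context()
        # check_hostname must be disabled before verify_mode can be CERT_NONE
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        return context
    return ssl.create_default_context(cafile=ca_file)
```

Passing the result as `context=` to `urllib.request.urlopen` keeps the rest of the client unchanged.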
Core Function Implementation
Client Status Detection
Problem: The Commvault API job list does not directly return the online/offline status of a client.
Solution: Use backup activity as a proxy metric.
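    def get_client_status(jobs, clients, lookback_days=7):
        """Split clients into (online, offline) using backup activity as a proxy.

        `jobs` is assumed to contain only jobs from the last `lookback_days` days;
        the window is applied when the job list is fetched, not here.
        """
        # Clients with at least one job in the window are considered online
        clients_with_activity = set()
        for job in jobs:
            client_name = job.get('jobSummary', {}).get('subclient', {}).get('clientName')
            if client_name:
                clients_with_activity.add(client_name)
        # Classify every known client
        online = [c for c in clients if c in clients_with_activity]
        offline = [c for c in clients if c not in clients_with_activity]
        return online, offline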
Note: The report includes a disclaimer, with X set to the lookback window: "* Online status is determined based on backup activity within the last X days."
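The status check only makes sense if the job list is limited to the lookback window. If the API query does not already restrict it, the window can be applied on the client side. This is a minimal sketch that assumes each job carries a Unix-epoch start time under `jobSummary.jobStartTime`; that field name is an assumption and should be verified against a real response. `filter_recent_jobs` is a hypothetical helper.

```python
import time


def filter_recent_jobs(jobs, lookback_days=7, now=None):
    """Keep only jobs that started within the last `lookback_days` days.

    Assumption: job["jobSummary"]["jobStartTime"] is a Unix timestamp in
    seconds. Jobs without that field are dropped rather than guessed at.
    """
    now = time.time() if now is None else now
    cutoff = now - lookback_days * 86400  # 86400 seconds per day
    recent = []
    for job in jobs:
        started = job.get("jobSummary", {}).get("jobStartTime")
        if started is not None and started >= cutoff:
            recent.append(job)
    return recent
```

The `now` parameter exists so the cutoff can be pinned in tests or when re-running a report for a past date.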
RPO Compliance Analysis
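One common way to quantify RPO compliance is to compare the age of each client's last successful backup with a target window. The sketch below takes that approach and is not necessarily how the tool computes it. The 24-hour default, the `last_success` mapping (client name to Unix timestamp), and the `check_rpo` name are all assumptions made for illustration.

```python
import time


def check_rpo(clients, last_success, rpo_hours=24, now=None):
    """Return (violations, compliance_rate_percent).

    A client violates the RPO if it has no successful backup on record or
    if its most recent successful backup is older than `rpo_hours`.
    """
    now = time.time() if now is None else now
    limit_seconds = rpo_hours * 3600
    violations = []
    for client in clients:
        last = last_success.get(client)
        if last is None or now - last > limit_seconds:
            violations.append(client)
    compliant = len(clients) - len(violations)
    rate = 100.0 * compliant / len(clients) if clients else 0.0
    return violations, rate
```

Clients with no successful backup at all are counted as violations, so they cannot silently raise the compliance rate.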
Failure Reason Extraction
Failure information returned by the API is located in the `pendingReason` field:
    # Assumes each job record has been flattened to top-level keys
    failed_jobs = []
    for job in jobs:
        if job['status'] == 'Failed':
            # pendingReason contains the detailed error message, which may include HTML tags
            error = job.get('pendingReason') or 'Unknown'
            # Turn HTML line breaks into a visible separator
            error = error.replace('<br>', ' | ').replace('<br/>', ' | ')
            failed_jobs.append({
                'job_id': job['jobId'],
                'client': job['client'],
                'error': error[:500]  # Limit length
            })
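Chained `replace()` calls only cover the exact tag spellings they list. If the error text contains other markup, a slightly more general cleanup can help. The following sketch uses a simple regular expression, which is adequate for short message fragments but is not a full HTML parser; `clean_error_text` is a hypothetical helper.

```python
import html
import re

_LINE_BREAK = re.compile(r"<br\s*/?>", re.IGNORECASE)
_ANY_TAG = re.compile(r"<[^>]+>")


def clean_error_text(raw, max_len=500):
    """Convert an HTML-flavoured error message to a single plain-text line."""
    if not raw:
        return "Unknown"
    text = _LINE_BREAK.sub(" | ", raw)        # keep line breaks visible
    text = _ANY_TAG.sub("", text)             # drop any remaining tags
    text = html.unescape(text)                # decode entities such as &amp;
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return text[:max_len]
```

Line breaks are converted before the generic tag removal so that the ` | ` separators survive.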
Error Pattern Recognition
By analyzing error messages, common issues can be automatically identified:
    def analyze_error_patterns(failed_jobs):
        """Count failed jobs by error category (case-insensitive keyword match)."""
        offline_errors = sum(1 for j in failed_jobs
                             if any(k in j['error'].lower() for k in
                                    ['unreachable', 'offline', 'cannot connect']))
        timeout_errors = sum(1 for j in failed_jobs
                             if 'timeout' in j['error'].lower())
        storage_errors = sum(1 for j in failed_jobs
                             if any(k in j['error'].lower() for k in
                                    ['storage', 'media agent', 'library']))
        return {
            'offline': offline_errors,
            'timeout': timeout_errors,
            'storage': storage_errors
        }
Practical Application Case
Typical Inspection Results
Below is an actual inspection result:
=== Health Check Summary ===
Success Rate: 8.7%
Total Clients: 63
Offline Clients: 57
Unprotected Clients: 58
RPO Violations: 62
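A summary in this shape could be assembled from the individual metrics roughly as follows. This is a sketch, not the tool's actual reporting code. It assumes a job counts as successful only when its status string is exactly `"Completed"`, and `build_summary` is a hypothetical helper.

```python
def build_summary(job_statuses, total_clients, offline, unprotected, rpo_violations):
    """Format headline metrics as plain-text summary lines."""
    total_jobs = len(job_statuses)
    # Assumption: only the exact status "Completed" counts as a success
    succeeded = sum(1 for status in job_statuses if status == "Completed")
    success_rate = 100.0 * succeeded / total_jobs if total_jobs else 0.0
    return "\n".join([
        "=== Health Check Summary ===",
        f"Success Rate: {success_rate:.1f}%",
        f"Total Clients: {total_clients}",
        f"Offline Clients: {offline}",
        f"Unprotected Clients: {unprotected}",
        f"RPO Violations: {rpo_violations}",
    ])
```

Guarding against an empty job list avoids a division by zero on a freshly installed or fully idle system.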
Failure Reason Analysis:
- Most failures are due to clients being continuously offline for more than 10,080 minutes (7 days).
- Commvault automatically terminates backup jobs for clients that have been offline for an extended period.
Recommended Actions:
- Check client power status.
- Confirm network connectivity.
- Clean up unnecessary zombie clients.