Showing posts with label Backup Automation. Show all posts
Commvault Backup System Automated Maintenance – Setup & Best Practices

What Is Commvault Automated Backup Maintenance?

Automated maintenance in Commvault helps administrators keep backup environments healthy without manual intervention.

By scheduling automated tasks, organizations can:

  • Clean expired backup jobs

  • Optimize database performance

  • Verify backup integrity

  • Manage storage efficiently

  • Improve overall backup reliability

Automation is especially important in large environments where hundreds of backup jobs run daily.

This article shares the development process of a Python-based inspection tool using the Commvault REST API. It supports one-click generation of HTML inspection reports with visual charts, suitable for various online and offline deployment scenarios.


As a practitioner in the data protection field, have you ever faced these frustrations:


- Having dozens or even hundreds of clients in your backup system makes manual daily status checks time-consuming and exhausting.

- Certain clients are offline for long periods without being noticed in time.

- Job failure reasons are scattered across different places, lacking a unified analysis.

- Difficulty in quantitatively assessing RPO (Recovery Point Objective) compliance.

- Needing to organize massive amounts of data just to generate an inspection report for management review.


This article will share my entire process of developing an automated inspection tool for Commvault backup systems.


Requirements Analysis


Core Inspection Metrics

| Type | Details | Threshold reference |
|---|---|---|
| Job health | Success rate, number of failed jobs | Industry standard: >95% |
| Client status | Online/offline count | Offline rate < 10% |
| Protection status | Unprotected clients | Should be 0 or explicitly stated |
| RPO compliance | Time since last backup | Set according to business requirements |
| Failure analysis | Specific reason for job failure | Used for root cause analysis |
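The reference thresholds in the table can be checked mechanically once the raw counts are known. The following is a minimal sketch, not the tool's actual code: the function name `evaluate_thresholds` and its parameters are hypothetical, and the defaults simply mirror the >95% and <10% figures above.

```python
def evaluate_thresholds(total_jobs, failed_jobs, total_clients, offline_clients,
                        unprotected_clients, min_success_rate=95.0,
                        max_offline_rate=10.0):
    """Compare raw counts against the reference thresholds (hypothetical helper)."""
    # Guard against empty environments to avoid division by zero
    success_rate = 100.0 * (total_jobs - failed_jobs) / total_jobs if total_jobs else 0.0
    offline_rate = 100.0 * offline_clients / total_clients if total_clients else 0.0
    return {
        'success_rate': round(success_rate, 1),
        'offline_rate': round(offline_rate, 1),
        'job_health_ok': success_rate > min_success_rate,
        'client_status_ok': offline_rate < max_offline_rate,
        'protection_ok': unprotected_clients == 0,
    }
```

The RPO threshold is left out on purpose, since the table says it should be set according to business requirements.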

Deployment Scenario Versatility

  • Online environments: third-party dependencies can be installed.
  • Offline environments: only the Python standard library is available, with no network and no pip.
  • MCP integration: used as an MCP tool for Claude Code.


Technical Selection

API Call Strategy

Commvault provides a RESTful API, with primary endpoints including:



Dependency Strategy

Three versions were designed for different scenarios:

| Version | Dependencies | Applicable scenarios |
|---|---|---|
| health_check_html.py | None | Offline and production environments |
| health_check_portable.py | requests | Development environment, networked environment |
| health_check_pro.py | Commvault MCP Server | Claude Code, Openclaw integration |
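The tool described here ships as separate scripts per scenario. As an alternative technique (not the author's layout), a single script can detect at import time whether `requests` is installed and fall back to `urllib` otherwise. The sketch below assumes hypothetical helper names `http_backend` and `fetch_json`.

```python
import json
import urllib.request

try:
    import requests  # third-party; usually absent in offline environments
    HAVE_REQUESTS = True
except ImportError:
    requests = None
    HAVE_REQUESTS = False


def http_backend():
    """Report which HTTP backend this process will use."""
    return 'requests' if HAVE_REQUESTS else 'urllib'


def fetch_json(url, headers=None, timeout=30):
    """GET a URL and decode the JSON body with whichever backend is available."""
    headers = headers or {}
    if HAVE_REQUESTS:
        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()
        return response.json()
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))
```

The trade-off is one file to maintain instead of several, at the cost of two code paths that both need testing.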

No-Dependency Implementation Tips

Using Python's standard library `urllib` instead of `requests`:

 
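import json
import ssl
import urllib.parse
import urllib.request


class CommvaultClient:
    def __init__(self, base_url, access_token):
        # Constructor shown for completeness; parameter names are assumptions
        self.base_url = base_url.rstrip('/')
        self.access_token = access_token

    def get(self, endpoint, params=None):
        url = f"{self.base_url}/{endpoint.lstrip('/')}"
        if params:
            url += "?" + urllib.parse.urlencode(params)

        request = urllib.request.Request(url)
        request.add_header('Accept', 'application/json')
        request.add_header('Authtoken', self.access_token)

        # Handle self-signed certificates.
        # Note: this disables certificate verification entirely.
        context = ssl._create_unverified_context()
        with urllib.request.urlopen(request, context=context) as response:
            return json.loads(response.read().decode('utf-8'))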
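`ssl._create_unverified_context()` is a private helper in the standard library. A roughly equivalent context can be built from public `ssl` calls, which also makes it easy to keep verification on when a CA bundle for the self-signed certificate is available. This is a sketch under that assumption; `build_ssl_context` and its parameters are hypothetical names.

```python
import ssl


def build_ssl_context(verify=True, ca_file=None):
    """Return an SSLContext; verify=False behaves like an unverified context."""
    # ca_file may point to a PEM bundle containing the server's self-signed CA
    context = ssl.create_default_context(cafile=ca_file)
    if not verify:
        # Order matters: check_hostname must be off before CERT_NONE is set
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

The returned object can be passed as the `context` argument of `urllib.request.urlopen`.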


Core Function Implementation

Client Status Detection

Problem: The Commvault API job list does not directly return the online/offline status of a client.

Solution: Use backup activity as a proxy metric.

 
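def get_client_status(jobs, clients, lookback_days=7):
    """Split clients into online/offline using backup activity as a proxy.

    `jobs` is expected to already cover the last `lookback_days` days;
    the parameter is not used for filtering inside this function.
    """
    # Get clients with backup activity (considered online)
    clients_with_activity = set()
    for job in jobs:
        client_name = job.get('jobSummary', {}).get('subclient', {}).get('clientName')
        if client_name:
            clients_with_activity.add(client_name)

    # Clients with no activity in the window are treated as offline
    online = [c for c in clients if c in clients_with_activity]
    offline = [c for c in clients if c not in clients_with_activity]

    return online, offline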


Note: A disclaimer is added to the report: "* Online status is determined based on backup activity within the last X days."
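For the "last X days" disclaimer to hold, the job list has to be limited to the lookback window before it reaches `get_client_status`. A minimal sketch of that filtering step follows. It assumes each job carries a Unix timestamp under `jobSummary.jobStartTime`; that field name and the helper `filter_recent_jobs` are assumptions for illustration.

```python
from datetime import datetime, timedelta


def filter_recent_jobs(jobs, lookback_days=7, now=None):
    """Keep only jobs that started within the last `lookback_days` days."""
    now = now or datetime.now()
    cutoff = (now - timedelta(days=lookback_days)).timestamp()
    recent = []
    for job in jobs:
        # 'jobStartTime' (Unix seconds) is an assumed field name
        started = job.get('jobSummary', {}).get('jobStartTime')
        if started is not None and started >= cutoff:
            recent.append(job)
    return recent
```

Passing `now` explicitly keeps the function deterministic, which helps when testing report output.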


RPO Compliance Analysis


from datetime import datetime


def analyze_rpo(jobs, clients, threshold_hours=24):
    last_backup = {}  # {client_name: timestamp}

    # Find the last successful backup time for each client.
    for job in jobs:
        if job['status'] == 'Completed':
            client = job['client']
            last_backup[client] = max(last_backup.get(client, 0), job['time'])

    # Check for violations
    now = datetime.now().timestamp()
    violations = []
    for client in clients:
        if client not in last_backup:
            violations.append({'client': client, 'hours': None, 'status': 'Never'})
        else:
            hours = (now - last_backup[client]) / 3600
            if hours > threshold_hours:
                violations.append({'client': client, 'hours': hours})

    return violations
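The violation list can be reduced to a single compliance percentage for the report. A small sketch, assuming the violation dictionaries have the `client` key produced by `analyze_rpo`; the helper name `rpo_compliance_rate` is hypothetical.

```python
def rpo_compliance_rate(clients, violations):
    """Percentage of clients that have no RPO violation."""
    if not clients:
        return 0.0
    violating = {v['client'] for v in violations}
    compliant = sum(1 for c in clients if c not in violating)
    return round(100.0 * compliant / len(clients), 1)
```

Clients that have never been backed up count as non-compliant here, since they appear in the violation list with `hours` set to `None`.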

Failure Reason Extraction


Failure information returned by the API is located in the `pendingReason` field:

 
failed_jobs = []
for job in jobs:
    if job['status'] == 'Failed':
        # pendingReason contains detailed error messages, which may include HTML tags
        error = job.get('pendingReason', 'Unknown')

        # Flatten HTML line breaks and newlines into a single-line separator
        error = error.replace('<br>', ' | ').replace('\n', ' | ')
        failed_jobs.append({
            'job_id': job['jobId'],
            'client': job['client'],
            'error': error[:500]  # Limit length
        })
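Because `pendingReason` may contain HTML, a slightly more thorough cleanup can strip remaining tags and decode entities using only the standard library. This is a sketch rather than the tool's actual code; `clean_reason` is a hypothetical helper, and the regular expressions are a simplification that assumes reasonably well-formed markup.

```python
import html
import re

_BREAK_RE = re.compile(r'<br\s*/?>|\r?\n', re.IGNORECASE)
_TAG_RE = re.compile(r'<[^>]+>')


def clean_reason(raw, max_len=500):
    """Turn an HTML-flavoured failure reason into one line of plain text."""
    if not raw:
        return 'Unknown'
    text = _BREAK_RE.sub(' | ', raw)    # line breaks become separators
    text = _TAG_RE.sub('', text)        # drop any remaining tags
    text = html.unescape(text)          # decode entities such as &amp;
    text = re.sub(r'\s+', ' ', text)    # collapse runs of whitespace
    return text.strip(' |')[:max_len]   # trim stray separators, limit length
```

Regex-based tag stripping is adequate for short status messages but is not a general HTML parser.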


Error Pattern Recognition

By analyzing error messages, common issues can be automatically identified:

 
def analyze_error_patterns(failed_jobs):
    # Compare against lowercased messages so matching is case-insensitive
    errors = [j['error'].lower() for j in failed_jobs]

    offline_errors = sum(1 for e in errors
                         if any(k in e for k in
                                ['unreachable', 'offline', 'cannot connect']))
    timeout_errors = sum(1 for e in errors
                         if 'timeout' in e)
    storage_errors = sum(1 for e in errors
                         if any(k in e for k in
                                ['storage', 'media agent', 'library']))

    return {
        'offline': offline_errors,
        'timeout': timeout_errors,
        'storage': storage_errors
    }


Practical Application Case


Typical Inspection Results

Below is an actual inspection result:

 
=== Health Check Summary ===
  Success Rate: 8.7%
  Total Clients: 63
  Offline Clients: 57
  Unprotected Clients: 58
  RPO Violations: 62
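A summary like this can be turned into a standalone HTML page with nothing but the standard library, which fits the offline scenario. The sketch below is illustrative only: `render_summary_html`, the `'Success Rate (%)'` key, and the inline CSS bar standing in for a chart are all assumptions, not the tool's actual report code.

```python
import html


def render_summary_html(summary, title='Health Check Summary'):
    """Render a dict of summary metrics as a minimal self-contained HTML page."""
    rows = []
    for label, value in summary.items():
        # Escape everything that comes from data to keep the page well-formed
        rows.append(f'<tr><th>{html.escape(str(label))}</th>'
                    f'<td>{html.escape(str(value))}</td></tr>')
    table_rows = ''.join(rows)

    # Clamp the success rate to 0..100 and draw it as a simple CSS bar
    rate = max(0.0, min(100.0, float(summary.get('Success Rate (%)', 0))))
    bar = ('<div style="background:#ddd;width:300px">'
           f'<div style="background:#4caf50;height:16px;width:{rate:.1f}%"></div>'
           '</div>')

    safe_title = html.escape(title)
    return ('<!DOCTYPE html><html><head><meta charset="utf-8">'
            f'<title>{safe_title}</title></head><body>'
            f'<h1>{safe_title}</h1>{bar}'
            f'<table>{table_rows}</table></body></html>')
```

Writing the returned string to a `.html` file gives a report that opens in any browser without network access.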


Failure Reason Analysis:

  • Most failures are due to clients being continuously offline for more than 10,080 minutes (7 days).
  • Commvault automatically terminates backup jobs for clients that have been offline for an extended period.

Recommended Actions:

  • Check client power status.
  • Confirm network connectivity.
  • Clean up unnecessary zombie clients.

Summary

Through the development of this tool, we have achieved:

✅ One-click generation of visual inspection reports
✅ Support for offline environment deployment (no third-party dependencies)
✅ Intelligent error analysis and recommendations
✅ Multiple configuration methods to flexibly adapt to different scenarios
✅ Out-of-the-box Python scripts

Target Audience:

  • Commvault backup administrators
  • Data protection engineers
  • IT operations personnel who need to regularly report backup status

References:

Understanding Veeam Intelligence Functions – Smart Backup, Threat Detection & Automated Recovery

Introduction

Modern businesses need more than just backups—they need intelligent systems that can detect threats, reduce risks, automate protection, and accelerate recovery.
This is why Veeam Intelligence Functions have become a core part of the Veeam platform, especially with Veeam Backup & Replication v12/v13, where AI-powered features help organizations protect their data against ransomware, malware, and human error.

Veeam Intelligence, as the AI-powered assistant within the Veeam product family, is revolutionizing how we work. It’s not only built into Veeam Backup & Replication but also integrated into other Veeam products such as Veeam ONE, delivering intelligent support across the entire data protection ecosystem.


This article will focus on Veeam Intelligence’s applications within Veeam Backup & Replication; in future discussions, we’ll explore its unique value in other products like Veeam ONE.

screenshot of Veeam Intelligence


Core Capabilities: Your 24/7 Expert Team

Veeam Intelligence is not merely a Q&A tool—it’s a full team of experts. Within Veeam Backup & Replication, whether you need an architect, support engineer, security advisor, or development engineer, it can assume the corresponding professional role. In other products like Veeam ONE, it demonstrates different expertise, providing intelligent support for monitoring, reporting, and analytics.


🏗️ Architect Role: Intelligent Advisor for System Design

When facing complex environment planning, Veeam Intelligence analyzes your VM count, business type, and RTO/RPO requirements to deliver comprehensive architectural design proposals. It not only helps predict storage growth and recommend optimal scaling timing but also identifies potential single points of failure and suggests redundancy solutions. Most importantly, it finds the most cost-effective hardware investment plan while meeting your business needs.


🔧 Support Engineer Role: Troubleshooting Powerhouse

When production issues arise, Veeam Intelligence rapidly analyzes error logs to pinpoint root causes. It doesn’t just check related configuration settings and uncover potential linked issues—it also provides clear, step-by-step troubleshooting guides to help you trace symptoms back to their true origins. Even better, it offers preventive measures to avoid recurrence.


🛡️ Security Advisor Role: Professional Guidance for Data Protection

When facing data security threats and compliance requirements, Veeam Intelligence acts like a dedicated security consultant, offering comprehensive protection recommendations. It not only analyzes current environment risks and suggests appropriate Malware Detection configurations but also provides defense strategies based on the latest threat intelligence. Notably, it delivers targeted security configuration guidance aligned with Veeam’s latest security feature updates from v12 to v13, ensuring your data protection framework consistently meets the latest security standards and compliance mandates.


💻 Development Engineer Role: Coding Partner for Automation

When you need to develop automation scripts or system integrations, Veeam Intelligence automatically generates PowerShell and Python script templates, provides REST API call examples, and delivers complete technical integration plans. This dramatically lowers the barrier to automation development, enabling tasks that once took weeks to be completed in just days.


Latest Highlight: With enhanced foundation models and a visible reasoning process, users can now see the analytical logic behind each role's answers, which makes it easier to judge whether a recommendation is accurate and actionable.


New Features of Veeam Intelligence in Veeam Backup & Replication

Recent updates have significantly expanded Veeam Intelligence's capabilities within Veeam Backup & Replication. While similar features exist in other products like Veeam ONE, this article focuses specifically on VBR scenarios:


🎯 Fully Natural Language Conversations with Voice Input/Output Support

Imagine solving problems as easily as chatting with a colleague: “My backup job failed last night—error code 2934 affected my finance database backup. What should I do?” Veeam Intelligence fully understands your problem description and delivers precise solutions.


Even better, it supports voice input and output. Picture yourself sipping coffee in the morning, saying to your computer: “Give me a report on last night’s backups,” and the AI assistant instantly delivers a detailed summary. This natural interaction makes daily operations smoother and more enjoyable.


🎯 Thinking Mode Support

Veeam Intelligence follows mainstream AI trends by introducing Visible Thinking Process functionality—a now-standard feature in conversational AI. Veeam brings this convenience to the data protection field.


In Thinking Mode, the AI assistant reveals its full analytical process: from understanding the core problem, to querying relevant knowledge bases, to reasoning toward a conclusion. This transparent workflow lets you not only know “what” but also “why.”


This design helps users better understand the AI’s decision logic and enables them to ask follow-up questions about the reasoning process, creating truly meaningful human-AI dialogue experiences.


🎯 Basic and Advanced Modes

Veeam Intelligence offers two distinct working modes, striking a balance between usability and data privacy:


Basic Mode: Operates entirely on Veeam’s public knowledge base without sending your specific environment data to any external services. While it cannot access real-time data from your current VBR server, it’s sufficient for learning Veeam concepts, understanding best practices, or consulting configuration methods.


Advanced Mode: This more powerful mode queries your VBR server directly. It transmits relevant data from your backup server to Veeam's AI model in the cloud, which analyzes that data and provides recommendations tailored to your environment.