This primer focuses on Python functionalities that solve DevOps problems: infrastructure automation, CI/CD scripting, API interaction, and log parsing. It assumes you understand basic programming concepts and focuses on practical application.

What you’ll learn: How to replace brittle Bash scripts with maintainable Python, interact with APIs and cloud providers, generate configs dynamically, and build CLI tools your team will actually use.


1. Environment Variables and Secrets

Before anything else: never hardcode credentials. Environment variables are the standard way to pass secrets and configuration into scripts.

import os
 
# Get a required variable (raises KeyError if missing)
api_key = os.environ["API_KEY"]
 
# Get with a default fallback
env = os.getenv("ENVIRONMENT", "dev")
 
# Check if running in CI
if os.getenv("CI"):
    print("Running in CI pipeline")

For AWS specifically, use ~/.aws/credentials or set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The SDK picks these up automatically—no code changes needed.


2. System Interaction: The “Better Bash”

DevOps engineers use Python to replace complex Bash scripts because Python offers better error handling and readability.

The subprocess Module

Stop using os.system(). The subprocess module is the standard for running shell commands.

import subprocess
 
# Run a command and capture output
try:
    result = subprocess.run(
        ["ls", "-l", "/var/log"],
        capture_output=True,
        text=True,
        check=True,  # Raises CalledProcessError if return code != 0
    )
    print("STDOUT:", result.stdout)
except subprocess.CalledProcessError as e:
    print(f"Command failed with error: {e.stderr}")

Avoid shell=True
Using shell=True opens you to shell injection attacks. If you're tempted to use it because your command "doesn't work otherwise," you probably need to split your command into a proper list: ["ls", "-l"] not "ls -l".
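If you start from a command string, the standard library can do the splitting for you. A small sketch using shlex:

```python
import shlex

# shlex.split parses a shell-style string into the argument list
# subprocess.run expects, keeping quoted phrases as single tokens
cmd = shlex.split('grep -i "error code" /var/log/app.log')
print(cmd)  # ['grep', '-i', 'error code', '/var/log/app.log']
```

This avoids hand-splitting mistakes around quoted arguments while still keeping shell=False.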

pathlib: Modern File Paths

String manipulation for file paths is error-prone. Use pathlib.

from pathlib import Path
 
# Create a path object
log_dir = Path("/var/log/myapp")
 
# Check if exists, create if not
if not log_dir.exists():
    log_dir.mkdir(parents=True, exist_ok=True)
 
# Iterate over files
for file in log_dir.glob("*.log"):
    print(f"Found log: {file.name}")
 
# Combine paths safely (works on any OS)
config_file = Path.home() / ".config" / "myapp" / "settings.yaml"

Related: shutil provides high-level file operations like copying directory trees and creating archives.
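A quick sketch of those shutil operations, run against a throwaway temp directory so it is safe to execute anywhere:

```python
import shutil
import tempfile
from pathlib import Path

# Work under a temp dir so this sketch touches nothing real
base = Path(tempfile.mkdtemp())
src = base / "app"
src.mkdir()
(src / "config.yaml").write_text("replicas: 3\n")

# Copy an entire directory tree (destination must not exist yet)
shutil.copytree(src, base / "app_backup")

# Create a .tar.gz archive of the directory; returns the archive path
archive = shutil.make_archive(str(base / "app"), "gztar", root_dir=src)
print(archive)
```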


3. Data Serialization

Much of day-to-day DevOps work is moving data between formats: JSON, YAML, and assorted config files.

JSON (Built-in)

Used for API payloads and cloud policies.

import json
 
data = {"instance_id": "i-12345", "status": "running"}
 
# Dict to JSON string
json_str = json.dumps(data, indent=2)
 
# JSON string to Dict
parsed_data = json.loads(json_str)
 
# Reading from a file
with open("config.json", "r") as f:
    config = json.load(f)
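
Writing works the same way in reverse; json.dump is the file-based counterpart of json.load (temp path used here just for the sketch):

```python
import json
import tempfile
from pathlib import Path

data = {"instance_id": "i-12345", "status": "running"}

# Write a dict to a file as pretty-printed JSON
path = Path(tempfile.mkdtemp()) / "config.json"
with open(path, "w") as f:
    json.dump(data, f, indent=2)
```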

YAML

Used for Kubernetes, Ansible, and CI/CD configs.

pip install pyyaml
import yaml
 
# Reading YAML
with open("deployment.yaml", "r") as f:
    k8s_config = yaml.safe_load(f)
 
print(k8s_config["spec"]["replicas"])
 
# Writing YAML
k8s_config["spec"]["replicas"] = 5
with open("deployment_updated.yaml", "w") as f:
    yaml.dump(k8s_config, f)

Always use safe_load()
Never use yaml.load() without an explicit Loader—it can execute arbitrary code embedded in the YAML. safe_load() is the secure default.


4. Building CLIs

If you’re writing a script for your team, don’t make them edit the code to change variables. Use argparse to create flags.

import argparse
 
def main():
    parser = argparse.ArgumentParser(description="Deploy Service Tool")
 
    # Positional argument
    parser.add_argument("service", help="Name of the service to deploy")
 
    # Optional flag
    parser.add_argument(
        "--env",
        choices=["dev", "staging", "prod"],
        default="dev",
        help="Target environment",
    )
 
    # Boolean flag
    parser.add_argument(
        "--force", action="store_true", help="Skip safety checks"
    )
 
    args = parser.parse_args()
 
    print(f"Deploying {args.service} to {args.env}...")
    if args.force:
        print("Force flag detected. Skipping checks.")
 
 
if __name__ == "__main__":
    main()

Usage:

python deploy.py my-api --env prod --force

Alternative: click
For more complex CLIs, consider the click library. It uses decorators and is more intuitive for nested commands and interactive prompts.
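As a rough sketch, the argparse tool above might look like this in click (hypothetical deploy command, assuming click is installed):

```python
# pip install click
import click

@click.command()
@click.argument("service")
@click.option("--env", type=click.Choice(["dev", "staging", "prod"]),
              default="dev", help="Target environment")
@click.option("--force", is_flag=True, help="Skip safety checks")
def deploy(service, env, force):
    """Deploy SERVICE to the target environment."""
    click.echo(f"Deploying {service} to {env}...")
    if force:
        click.echo("Force flag detected. Skipping checks.")
```

click also generates the --help output for you, and click.testing.CliRunner makes the command easy to unit test.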


5. Talking to Infrastructure: APIs

You will constantly interact with REST APIs (GitHub, Jenkins, Jira, cloud providers). The requests library is the standard.

pip install requests
import requests
import os
 
token = os.environ["GITHUB_TOKEN"]
headers = {"Authorization": f"token {token}"}
 
# GET request
response = requests.get(
    "https://api.github.com/user/repos",
    headers=headers,
    timeout=30,  # Always set a timeout
)
 
if response.status_code == 200:
    repos = response.json()
    print(f"Found {len(repos)} repositories.")
else:
    print(f"Error: {response.status_code} - {response.text}")
 
# POST request
payload = {"name": "new-devops-tool", "private": True}
r = requests.post(
    "https://api.github.com/user/repos",
    json=payload,
    headers=headers,
    timeout=30,
)

Handling Failures

Real-world API calls fail. Handle network errors and implement retries.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
 
def create_session_with_retries():
    session = requests.Session()
    retries = Retry(
        total=3,
        backoff_factor=1,  # Wait 1s, 2s, 4s between retries
        status_forcelist=[500, 502, 503, 504],
    )
    session.mount("https://", HTTPAdapter(max_retries=retries))
    return session
 
session = create_session_with_retries()
 
try:
    response = session.get("https://api.example.com/data", timeout=30)
    response.raise_for_status()  # Raises HTTPError for 4xx/5xx
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

6. Cloud Automation: AWS Boto3

If you use AWS, boto3 is unavoidable. It allows you to control AWS resources programmatically.

pip install boto3
import boto3
 
# Initialize a client (uses credentials from env or ~/.aws/credentials)
s3 = boto3.client("s3")
 
# List buckets
response = s3.list_buckets()
for bucket in response["Buckets"]:
    print(f"Bucket Name: {bucket['Name']}")

EC2 Example: Stop Dev Instances

import boto3
 
ec2 = boto3.resource("ec2", region_name="us-east-1")
filters = [{"Name": "tag:Environment", "Values": ["Dev"]}]
 
instances = ec2.instances.filter(Filters=filters)
for instance in instances:
    print(f"Stopping {instance.id}")
    instance.stop()

7. Templating with Jinja2

Generate config files dynamically (Nginx configs, Terraform files) based on variables. String concatenation is messy; Jinja2 is the solution. It’s also what Ansible uses internally.

pip install jinja2
from jinja2 import Template
 
nginx_template = """
server {
    listen {{ port }};
    server_name {{ domain }};
 
    location / {
        proxy_pass http://{{ upstream }};
    }
}
"""
 
t = Template(nginx_template)
conf = t.render(port=8080, domain="api.internal.local", upstream="127.0.0.1:3000")
 
print(conf)

For larger projects, load templates from files:

from jinja2 import Environment, FileSystemLoader
 
env = Environment(loader=FileSystemLoader("templates/"))
template = env.get_template("nginx.conf.j2")
output = template.render(services=my_services_list)

8. Error Handling Patterns

DevOps scripts run unattended. They need to fail gracefully and provide useful output.

import sys
import logging
 
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
 
 
def deploy_service(name: str, env: str) -> bool:
    """Deploy a service. Returns True on success."""
    try:
        logger.info(f"Deploying {name} to {env}")
        # ... deployment logic ...
        return True
 
    except PermissionError:
        logger.error(f"Permission denied deploying {name}")
        return False
 
    except Exception as e:
        logger.exception(f"Unexpected error deploying {name}: {e}")
        return False
 
 
def main():
    success = deploy_service("my-api", "prod")
    sys.exit(0 if success else 1)  # Exit codes matter for CI/CD
 
 
if __name__ == "__main__":
    main()

Key points:

  • Use logging instead of print() for timestamps and log levels
  • Return meaningful exit codes (0 = success, non-zero = failure)
  • Use logger.exception() to include stack traces
  • Type hints make scripts maintainable as they grow

9. Quick Reference

Library      Use Case
os / sys     Environment variables, exit codes
pathlib      File path manipulation
shutil       Copying directories, creating archives
subprocess   Running shell commands
re           Parsing logs with regex
datetime     Timestamps, uptime calculations
json         API payloads, configs (built-in)
yaml         Kubernetes, Ansible configs
requests     HTTP/REST API calls
boto3        AWS automation
paramiko     SSH connections
jinja2       Config file templating
click        Building CLIs (alternative to argparse)
pytest       Testing infrastructure code
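Two of the built-ins above are worth a combined illustration: a minimal log-parsing sketch with re and datetime (the log line format is made up for the example):

```python
import re
from datetime import datetime

# Hypothetical log line, just for illustration
line = "2024-05-01 12:34:56 ERROR payment-service: upstream timeout after 30s"

# Named groups keep the extraction readable
pattern = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) (?P<service>[\w-]+): (?P<message>.*)"
)

m = pattern.match(line)
if m:
    # Parse the timestamp so you can filter, sort, or compute deltas
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    print(ts.isoformat(), m.group("level"), m.group("message"))
```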

10. Best Practices

Always use virtual environments. Never install into system Python.

python3 -m venv venv
source venv/bin/activate
pip install requests boto3

Add a shebang for executable scripts on Linux:

#!/usr/bin/env python3

Use type hints to keep growing scripts maintainable:

def restart_service(service_name: str, timeout: int = 30) -> bool:
    ...

Pin your dependencies in requirements.txt:

requests==2.31.0
boto3==1.28.0
pyyaml==6.0.1
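
One common way to produce that pinned file is to capture the exact versions from your active virtual environment:

```shell
# Record exact versions of everything installed in the active venv
pip freeze > requirements.txt

# Later, reproduce the same environment elsewhere:
# pip install -r requirements.txt
```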