info@stepindia.net

AI-Powered DevOps Engineer Program + MLOps + AIOps

Duration: 4 Months

About DevOps & Cloud Computing

The AI-Powered DevOps Engineer Program with MLOps and AIOps is a comprehensive, industry-focused course designed to take learners from beginner to advanced level. It begins with core fundamentals such as Linux, networking, and shell scripting, then gradually introduces cloud computing using Amazon Web Services. The course covers modern tools and technologies like Docker and Kubernetes for containerization and orchestration, along with automation tools like Ansible and Terraform. It also includes CI/CD pipeline creation using Jenkins, code quality analysis with SonarQube, and version control using Git, making it a complete DevOps learning experience.

In addition to traditional DevOps, this course stands out by integrating advanced concepts like MLOps and AIOps. You will learn how machine learning models are developed, deployed, and managed in production environments, and how AI can be used to monitor systems, detect anomalies, and predict failures. These modules help bridge the gap between DevOps and AI, giving you a strong understanding of how modern intelligent systems work in real-world scenarios. The course is highly practical, with hands-on labs and projects that simulate real industry use cases.

You should choose this course because it prepares you for one of the most in-demand careers in today’s tech industry. Instead of focusing on just one skill, it provides a complete ecosystem of development, deployment, automation, and monitoring with AI integration. With real-world projects like deploying ML models and building smart monitoring systems, you gain practical experience that companies value. By the end of the program, you will be job-ready with the skills required to manage production environments, automate workflows, and implement intelligent monitoring solutions, making you a future-ready DevOps engineer.

Course Curriculum:

  • Linux Architecture (Linux Flavors and Kernel)
  • Understanding core principles of the Linux OS
  • Installing Ubuntu 20.04 or 22.04 LTS on AWS Cloud or WSL [Ubuntu 20.04 Server]
  • SSH and Password based authentication
  • Basic commands to handle Linux OS like cat, ls, date, free, top etc.
  • Core fundamentals for Root Filesystem (/) like /root, /proc, /lib, /bin etc.
  • Accessing Server and Managing networking (IP addresses and Classes etc.)
  • Basic commands for getting started, such as ssh, ls, ip, cp, mv, mkdir, apt etc.
  • Package management on Ubuntu
  • Installing packages on Ubuntu and understanding the default files and folders to manage for Nginx/Apache, MySQL, and WordPress applications
  • Linux file editors like nano, vi, vim
  • Linux File permissions and ownership management with chown and chmod commands
  • System and Network troubleshooting commands
  • Disk Management: partitions and LVM (Logical Volume Manager)
  • What are commands and shell scripts?
  • Types of shells supported in Linux OS
  • Difference between sh and bash
  • Permissions and execution
  • Variables and Arrays
  • Conditions and Loops
  • IO Redirection
  • Shell Functions
  • Exit codes
  • Important commands for scripting and daily operations like grep, awk, sed, find etc.
  • Signals and signal numbers
  • IP address types like IPv4 and IPv6
  • What is IPv4?
  • Classes
  • IP address
  • Public and Private IP addresses
  • Subnet
  • Broadcast
  • CIDR
  • Gateway IP address and NAT Gateway
  • Routers and Switches
  • DNS
  • Hostname
  • How to calculate IP address using simple formulas?
  • How to assign IP addresses and manage networking on Linux machines (VMs or bare-metal servers)
  • Network troubleshooting tools like ping, netstat, nmap, traceroute etc.
  • What is Cloud Computing?
  • What is IaaS, PaaS, and SaaS?
  • Why AWS Cloud?
  • AWS account creation and access
  • Secure AWS account without root access
  • Understanding Compute, Monitoring, Authentication & Authorization services, and their dashboards
  • IAM (Identity and Access Management)
  • What is Authentication and Authorization?
  • MFA and IAM administrative access
  • Users
  • Groups
  • Policies (Inline and JSON)
  • Roles
  • Account settings
  • STS (Security Token Service)
  • Differences between Bare-metal and Virtual Machines (Cloud and On-Premises)
  • Amazon Machine Image
  • Instance types (Flavors)
  • Security Group
  • Disk
  • Key pairs
  • How to connect to EC2 Instances
  • Instance purchasing options like On-Demand, Reserved, Spot, Dedicated Instance and Dedicated Host
  • Launch Template
  • Image creation from EC2 Instance
  • What is VPC?
  • Network, Subnet, Route Table, Subnet Association
  • Internet gateway and NAT gateway
  • Elastic IP
  • VPC Peering
  • Transit Gateway
  • ACL
  • Egress Only Internet gateway
  • Endpoints and Carrier gateway
  • Endpoint Services
  • Site-to-site VPN
  • What is Object, Block and Filesystem Storage?
  • S3 buckets
  • EBS (Elastic Block Store)
  • EFS (Elastic File System)
  • AWS Storage Gateway
  • AWS Transfer Family
  • Databases

  • Understanding Relational and Non-Relational Databases
  • Databases provided by AWS RDS
  • What is a Database Engine?
  • Microsoft SQL Server
  • How to connect databases from different remote locations?
  • Backup and restore functionalities
  • What is a load balancer?
  • Difference between Layer 7 and Layer 4 load balancing?
  • Types of load balancers supported in AWS
  • CLB (Classic Load Balancer)
  • ALB (Application Load Balancer)
  • NLB (Network Load Balancer)
  • GWLB (Gateway Load Balancer)
  • Target Groups
  • AWS Auto Scaling
  • What is autoscaling?
  • How does it help scale EC2 instances horizontally?
  • Creating Launch Configuration
  • How to create and manage Auto Scaling groups?
  • Notifications
  • AWS CloudWatch
  • What is CloudWatch?
  • Alarms
  • Logs
  • Events
  • Metrics
  • Insights
  • Application Monitoring
  • AWS CloudTrail for auditing AWS account
  • AWS CloudFormation Templates
  • Container and Orchestration services
  • ECS (Elastic Container Service) and Fargate
  • ECR (Elastic Container Registry)
  • EKS (Elastic Kubernetes Service) and Fargate
  • Accessing EKS Cluster
  • Private and Public EKS Cluster
  • VPC network
  • IAM roles and policies for EKS
  • Node Group
  • External DNS for AWS Route 53 for EKS Ingress hosts
  • What is DNS?
  • What are DNS domain zones?
  • DNS troubleshooting tools
  • Private and Public zones
  • Internal and External Domains in AWS Route 53
  • AWS Notification services
  • Simple Email Service
  • Simple Notification Service
  • Simple Queue Service
  • AWS Certificate Manager
  • AWS Systems Manager
  • AWS Backup
  • AWS ElastiCache
  • What is a version control system?
  • Difference between CVCS and DVCS?
  • Why Git?
  • Git workflow
  • Git commands (add, init, push, pull, commit, etc.)
  • Git logs
  • Git repositories like GitHub, AWS CodeCommit
  • Branching strategies
  • Tagging code
  • Reset vs Rebase
  • How to fix merge conflicts in git?
  • Integrate CI with Jenkins and GitHub
  • What is Configuration Management?
  • Open-source tools for Configuration Management
  • Why Ansible and Ansible architecture
  • Ansible setup and configuration
  • Ad hoc commands
  • Understanding hosts and ansible.cfg files
  • YAML Fundamentals
  • Variables and Facts
  • group_vars and host_vars etc.
  • Modules
  • Roles
  • Playbooks
  • What are containers and images?
  • Applications on Virtual Machines
  • Applications on Containers
  • What are container images?
  • Why must we run applications on Containers?
  • Docker and its use cases
  • Docker Fundamentals
  • Docker public images from Docker Hub
  • Docker private images
  • Docker Containers
  • Docker Networking
  • Host
  • Bridge
  • None
  • Overlay
  • Docker Storage
  • Volume mount
  • Bind mount
  • tmpfs mount
  • Docker Backup and Restore
  • Data backup
  • Images
  • Containers
  • Docker Registry
  • Local Registry
  • Custom Images build using Dockerfile
  • Scan tools like Trivy, Clair etc. to identify vulnerabilities in images
  • Docker Compose
  • Kubernetes Architecture
  • Kubernetes control plane and compute plane components
  • Kubernetes setup on local system using kubeadm
  • How to connect to a k8s cluster using the KUBECONFIG file
  • Kubernetes authentication and authorization
  • Kubernetes workloads
  • Pods
  • Replication Controller
  • ReplicaSet
  • Deployment
  • DaemonSet
  • StatefulSet
  • Jobs and CronJobs
  • Kubernetes Storage
  • Volumes
  • PV, PVC and StorageClass
  • CSI
  • Kubernetes ConfigMap and Secrets
  • Services
  • ClusterIP
  • NodePort
  • LoadBalancer
  • ExternalIP and ExternalName
  • Ingress Controller
  • Network Policy
  • CIS benchmarking tools like kube-bench etc.
  • What is CI/CD?
  • Jenkins components
  • Installing and Configuring Jenkins on AWS EC2 or VirtualBox
  • Jenkins authentication and authorization
  • Plugins
  • Global system settings
  • Jenkins Jobs
  • Freestyle Project
  • Multi-branch project
  • Folder
  • Pipeline Project
  • What is Static code analysis?
  • Tools available for static code analysis
  • SonarQube architecture and components
  • Sonar scanner for Terraform configuration analysis
  • Integrate SonarQube with Jenkins CI/CD for static code analysis
  • Analyzing report generated in Sonar Dashboard
  • What is Infrastructure as Code?
  • Open-source tools for IaC
  • Why Terraform?
  • Terraform setup and configuration to communicate with the AWS Cloud provider
  • Terraform workflow
  • Terraform configuration language
  • Terraform top-level blocks
  • Terraform resources
  • Terraform variables
  • Terraform functions
  • Terraform values
  • Terraform data sources
  • Terraform backends
  • Terraform workspaces
  • Terraform modules public and custom
  • Terraform expressions
  • Terraform disaster recovery
  • Trivy for image vulnerability scan as part of Docker image build
  • Build CI/CD workflow to build, scan, and push images to ECR
  • Implement CIS benchmark check on Docker Engine machine using Docker security
  • Understanding the checks, their status codes, which to ignore and which must be fixed
  • Integrating CI/CD workflow to run checks on Kubernetes cluster using kube-bench tool
  • KubeLinter for analyzing Kubernetes manifest files (YAML) and Helm charts
  • Objective
  • Give students understanding of:
  • How AI models go to production
  • How DevOps + ML work together
  • Introduction:
  • What is Machine Learning (ML)?
  • Difference: DevOps vs MLOps
  • Why MLOps is needed?
  • Data Collection
  • Data Preprocessing
  • Model Training
  • Model Evaluation
  • Model Deployment
  • Model Monitoring
  • Python basics (concept level)
  • ML model concepts (classification/regression)
  • API-based model serving
  • Model as API
  • Containerizing ML model using Docker
  • Deploy on Kubernetes
  • Host on Amazon Web Services
  • Use simple pre-trained ML model (no deep coding)
  • Convert model into API (Flask/FastAPI)
  • Dockerize the model
  • Deploy on Kubernetes
  • Access via public IP
  • Objective: Show how AI helps in:
  • Monitoring
  • Predicting failures
  • Reducing downtime
  • Introduction
  • What is AIOps?
  • Difference: Monitoring vs AIOps
  • Log analysis
  • Metrics analysis
  • Anomaly detection
  • Root cause analysis
  • Predictive alerts
  • Prometheus
  • Grafana
  • AI-based tools overview (Datadog, Dynatrace)
  • CPU spike detection
  • Auto scaling
  • Alert prediction
  • Project: Smart Monitoring System
  • Deploy application on Kubernetes
  • Monitor using Prometheus + Grafana
  • Simulate load (CPU spike)
  • Analyze metrics
  • Detect anomaly manually (basic AI concept explanation)
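
To give a flavor of the "detect anomaly manually" step in the Smart Monitoring System project, here is a minimal sketch of one basic approach: flagging CPU readings whose z-score (distance from the mean, measured in standard deviations) is unusually high. The function name `find_cpu_anomalies`, the threshold of 2.0, and the sample readings are illustrative assumptions, not part of the course material. In the actual project the readings would come from Prometheus rather than a hard-coded list.

```python
from statistics import mean, stdev


def find_cpu_anomalies(samples, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold.

    The threshold of 2.0 is an illustrative assumption; tune it for real data.
    """
    if len(samples) < 2:
        return []  # a standard deviation needs at least two samples
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [
        (i, value)
        for i, value in enumerate(samples)
        if abs(value - mu) / sigma > threshold
    ]


# Hypothetical CPU usage readings in percent, with one simulated spike.
cpu_percent = [22, 25, 24, 23, 26, 24, 95, 25, 23, 24]
print(find_cpu_anomalies(cpu_percent))  # prints [(6, 95)]
```

A z-score check like this is deliberately simple. It works for a single obvious spike but can miss anomalies when several spikes inflate the standard deviation, which is one reason AIOps tools use more robust statistical and machine learning methods.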

Center Features: