Tech Mahindra · Cloud Infrastructure Engineer · Jul 2025–Present

Open to SRE & Cloud Opportunities

Hi, I'm Sugandha Vashishtha

Cloud & Infrastructure Engineer — specializing in AWS, Azure, SRE, and enterprise reliability at scale.

Noida, India · 🏆 AWS Certified · Tech Mahindra

Download Resume

💼

9+ Yrs Exp

☁️

4+ Yrs Cloud

🏆

AWS Certified

01 About Me

Where Operations meet Reliability

I'm Sugandha Vashishtha, a Cloud and Site Reliability Engineer with 9+ years of experience in IT, including 4+ years specializing in cloud operations and SRE across Amazon Web Services, Microsoft Azure, and Microsoft Nebula on-premises environments.

At Tech Mahindra, I've served as a Designated Response Individual (DRI) on agile SRE teams, owning service availability, incident response, and SLA/SLO accountability for production workloads at Microsoft scale. I've monitored and triaged alerts using Dynatrace, Jarvis, Hawkeye, and Microsoft ICM, and led root cause analyses for production incidents.

I believe the best SREs come from ops — because they've felt the 3 AM pages, traced the packet drops, and learned that reliability is a feature, not an afterthought.

Master of Computer Applications (MCA) — Pursuing

Manipal University Jaipur · 2026 – 2028

Bachelor of Computer Applications (BCA)

Manipal University Jaipur · 2022 – 2025

Current Focus

Cloud infrastructure engineering, Azure Bastion architecture, hybrid cloud security, and expanding toward CKA and Terraform certification

AWS & Azure Cloud Operations

SRE · Incident Management · DRI

Dynatrace, Jarvis & Hawkeye Observability

On-Prem · Hybrid Cloud · Azure Bastion

sugandha@cloud:~$

$ whoami

sugandha-vashishtha

$ cat /etc/current-role

Cloud Infrastructure Engineer @ Tech Mahindra

$ cat /etc/current-focus

Azure Bastion ▸ AWS ▸ SRE ▸ Observability ▸ CKA

$ echo $AVAILABILITY

open_to_opportunities=true

02 Skills

Technical Arsenal

Skills organized by depth of hands-on experience — from daily production use to active learning

Expertise

Core competencies — used daily in production

Microsoft Azure

AWS

Incident Management

SRE / DRI

Dynatrace

Microsoft ICM

Azure Bastion

Linux Administration

Windows Server

KQL (Kusto)

SQL

SLA / SLO Management

Proficient

Hands-on project & work experience

Terraform

Kubernetes

GitHub Actions

Prometheus & Grafana

Docker

SAST / DAST

Vulnerability Remediation

AWS EC2 & IAM

AWS Session Manager

Hawkeye & Jarvis

Shell Scripting

Python (Fundamentals)

CI/CD Pipelines

Git & GitHub

GitHub Copilot

TCP/IP & Networking

Learning & Building

Actively studying and expanding

Snowflake

CKA (Kubernetes Admin)

HashiCorp Terraform Associate

On-Premises Infrastructure

Enterprise datacenter & bare-metal operations — Microsoft Nebula environments

Microsoft Nebula

Cloudman

Fabric Manager

RAID Configuration

Bare Metal Ops

WDS

DHCP Scoping

Hardware Management

03 Experience

Work History

9+ years building reliability — from network operations to cloud infrastructure and SRE at scale

Cloud Infrastructure Engineer

Tech Mahindra

📁 AT&T Bastion — Microsoft Azure & AWS

Jul 2025 – Present

Responsibilities

Architected and administered Azure Bastion hybrid cloud infrastructure for AT&T enterprise workloads — eliminated public RDP/SSH exposure across all managed VMs and enforced zero-trust access patterns
Diagnosed and resolved complex Azure-to-on-premises connectivity failures spanning Bastion, Load Balancers, Virtual Machines, and HTTP proxy; identified 3 recurring session-drop patterns and authored runbooks that cut resolution time from 40+ minutes to under 10
Managed Azure subscriptions, resource groups, virtual networks, IAM policies, and shared image galleries across Windows and Linux fleets; enforced configuration standards using automated health checks
Maintained SLA compliance through continuous metrics monitoring, log analysis, and proactive operational health checks across production cloud environments; used GitHub Copilot to accelerate code-level diagnostics

Azure Bastion · AWS · IAM · Load Balancers · Virtual Networks · Linux · GitHub Copilot

DevSecOps Engineer — Application OS Remediation

Tech Mahindra

📁 Tracfone–Verizon — AWS Security & Vulnerability Remediation

Aug 2024 – Jun 2025

Responsibilities

Performed SAST and DAST scans across 126 AWS-hosted applications to identify security vulnerabilities; tracked 300+ findings through full remediation lifecycle in alignment with compliance requirements
Conducted systematic vulnerability assessment and remediation of OS-level risks across AWS EC2 instances; prioritised HIGH and CRITICAL CVEs for immediate patching
Executed OS patching across 40+ EC2 instances aligned with Change Management protocols; ran pre- and post-patching validation checkpoints achieving zero downtime across all change windows
Accessed AWS EC2 instances via AWS Session Manager for patching and upgrades; managed server snapshots and rollback strategy for risk mitigation and business continuity

Key Outcomes

Assessed 126 applications through SAST/DAST in a single engagement cycle — identified and tracked 300+ HIGH/CRITICAL vulnerabilities to remediation closure
Achieved zero-downtime patching across 40+ EC2 instances through structured pre/post-validation checkpoints

SAST · DAST · AWS EC2 · Session Manager · OS Patching · Vulnerability Remediation · Change Management · DevSecOps

Site Reliability Engineer (DRI)

Tech Mahindra

📁 Microsoft Nebula Score Cloud

Jul 2022 – Jul 2024

Responsibilities

Served as Designated Response Individual (DRI) on an agile SRE team — owned service availability, service health, incident response, and SLA/SLO accountability
Monitored services using Dynatrace and Microsoft ICM; participated in daily spike-management calls, triaged alerts, and led/contributed to root cause analyses (RCA) for production incidents
Monitored and supported production applications using Hawkeye and Jarvis; improved alert detection time through proactive monitoring and alert tuning
Managed Cloudman billing, OS patching, OS upgrades, infrastructure monitoring, and Fabric Manager operations; used KQL extensively to query telemetry and investigate production issues
Oversaw Microsoft Nebula architecture: offline nodes, storage tier services, hardware failures, host upgrades, reimaging, RAID configuration, WDS, fabric creation, and DHCP scoping

Key Outcomes

Maintained 99.9% SLA accountability as DRI across 8+ production services running on Microsoft Nebula infrastructure
Reduced mean time to detect (MTTD) by ~30% through proactive Dynatrace alert tuning and ICM incident triage optimisation
Coordinated with SETO, Private Lab Networks, and Microsoft Lab Services for hardware and network operations — supporting 2,000+ bare-metal nodes

SRE · DRI · Dynatrace · Microsoft ICM · Hawkeye · Jarvis · KQL · Nebula · Cloudman
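
To put the 99.9% SLA figure in concrete terms, the short sketch below converts an availability target into a downtime budget. The function name and the 30-day window are assumptions chosen for illustration; they are not details of the Nebula engagement.

```python
def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the downtime budget, in minutes, for an availability target.

    slo_percent is the availability target (for example 99.9).
    window_days is the measurement window; 30 days is an assumed default.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    total_minutes = window_days * 24 * 60
    # The budget is the share of the window that may be unavailable.
    return total_minutes * (1 - slo_percent / 100)


if __name__ == "__main__":
    budget = allowed_downtime_minutes(99.9)
    print(f"{budget:.1f} minutes of downtime per 30 days")
```

Under these assumptions a 99.9% target leaves roughly 43 minutes of downtime per 30-day window, which is why detection time matters so much for a DRI rotation.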

Network Support Engineer

Tech Mahindra

📁 Netgear — US, UK, Australia, Canada

Dec 2021 – May 2022

Responsibilities

Resolved enterprise network connectivity incidents across four countries; configured wireless controllers, routers, and switches; performed systematic routing and switching troubleshooting

Networking · Routing · Switching · Wireless · TCP/IP

Technical Support & Customer Service

Earlier Experience

📁 Xavient Digital (TELUS) · HI3 Technologies (HP) · Convergys (AT&T)

May 2016 – Mar 2021

Responsibilities

Tier 2 technical support, root cause analysis, escalation management, and enterprise customer service across TELUS (Canada), HP India, and AT&T (USA) accounts

Technical Support · Root Cause Analysis · Networking · Enterprise Clients

04 Certifications

Credentials & Learning Path

AWS & Red Hat certified — actively building toward Kubernetes and Terraform credentials

Completed

3 certifications

🟠

✅ Completed

AWS Certified Solutions Architect – Associate

Amazon Web Services

Issued May 2026

Designing resilient, cost-optimized, and high-performing architectures on AWS

Verified

🟠

✅ Completed

AWS Certified Cloud Practitioner

Amazon Web Services

Issued Jun 2025 · Valid through Jun 2028

Foundational AWS cloud concepts, services, security, and billing

Verified

🎩

✅ Completed

Red Hat System Administration I (RH124) — Ver. 9.3

Red Hat

Issued via Credly

Linux system administration fundamentals on Red Hat Enterprise Linux 9 — users, storage, networking, and services

Verified

In Progress

Actively studying

☸️

🔄 In Progress

Certified Kubernetes Administrator (CKA)

CNCF / Linux Foundation

Administering Kubernetes clusters in production environments

Studying now

🌍

🔄 In Progress

HashiCorp Certified: Terraform Associate

HashiCorp

Infrastructure as Code with Terraform for multi-cloud deployments

Studying now

05 Projects

Things I've Built

Hands-on projects demonstrating cloud architecture, DevOps practices, and infrastructure automation

🏗️

AWS Three-Tier Architecture

Problem

Need a scalable, fault-tolerant web application infrastructure that handles traffic spikes without manual intervention.

Solution

Production-grade three-tier architecture on AWS using EC2, RDS Multi-AZ, and ALB. Infrastructure provisioned entirely via Terraform with VPC segmentation, security groups, NAT Gateways, and CloudWatch monitoring.

Outcome

ASG auto-scales from 2→6 EC2 instances under load. Multi-AZ RDS failover completes in under 60 seconds. Provisioning reduced from hours of manual work to a Terraform apply of under 8 minutes.

★ Multi-AZ ★ Auto-scaling ★ IaC

AWS · Terraform · VPC · EC2 · RDS · ALB · CloudWatch
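
The 2→6 scaling range can be illustrated with a simplified proportional scaling rule. This is a sketch only: the project description does not state which scaling policy the ASG uses, so the rule, the function name, and the 50% CPU target below are all assumptions, and real Auto Scaling policies add cooldowns and warm-up handling that are omitted here.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int = 2, max_size: int = 6) -> int:
    """Scale the instance count in proportion to metric/target.

    min_size and max_size mirror the 2 and 6 instance bounds from the
    project outcome; the proportional rule itself is an assumption.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current * metric / target)
    # Clamp the result to the group's configured bounds.
    return max(min_size, min(max_size, raw))


if __name__ == "__main__":
    # Two instances at 90% average CPU against an assumed 50% target.
    print(desired_capacity(current=2, metric=90.0, target=50.0))
```

Clamping to the group bounds is what keeps a traffic spike from scaling past six instances, and keeps a quiet period from dropping below two.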

📊

Kubernetes Monitoring Stack

Problem

Kubernetes clusters running without visibility into pod health, node resource usage, or SLO breach alerting.

Solution

Full observability stack with Prometheus, Grafana, and Alertmanager. Pre-built dashboards for cluster health, pod metrics, and custom SLO tracking. Deployed via Helm charts with GitOps-ready configuration.

Outcome

MTTD reduced from ~15 minutes (manual log review) to under 2 minutes via Alertmanager firing before user impact. Dashboard covers 20+ Kubernetes metrics across pods, nodes, and SLOs.

★ SLO Tracking ★ Helm ★ GitOps

Kubernetes · Prometheus · Grafana · Helm · Alertmanager
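
One common way to express SLO breach alerting is a burn rate: the observed error ratio divided by the error budget the SLO allows. The sketch below shows that arithmetic. The function names and the threshold parameter are hypothetical, and the repository may define its alert rules differently.

```python
def burn_rate(error_ratio: float, slo: float) -> float:
    """Return how fast the error budget is being consumed.

    error_ratio is failed requests / total requests over some window.
    slo is the target success ratio, for example 0.999.
    A value of 1.0 means the budget lasts exactly the SLO window.
    """
    budget = 1 - slo
    if budget <= 0:
        raise ValueError("slo must be below 1.0 to leave an error budget")
    return error_ratio / budget


def should_alert(error_ratio: float, slo: float, threshold: float) -> bool:
    """Fire when the burn rate exceeds a chosen (hypothetical) threshold."""
    return burn_rate(error_ratio, slo) > threshold


if __name__ == "__main__":
    # 0.5% errors against a 99.9% SLO burns budget about five times too fast.
    print(round(burn_rate(0.005, 0.999), 2))
```

Alerting on burn rate rather than raw error counts ties the page to user-facing impact, which is the idea behind firing before a breach rather than after it.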

🌍

Terraform Infrastructure Automation

Problem

Manual infrastructure provisioning across dev/staging/prod environments leads to configuration drift and inconsistent deployments.

Solution

Modular Terraform codebase for multi-environment AWS infrastructure using remote state management with S3/DynamoDB locking, workspaces, and reusable modules for VPC, EKS, RDS, and IAM. Integrated with GitHub Actions.

Outcome

Eliminated configuration drift across 3 environments (dev/staging/prod). Infra provisioning time cut from 4+ hours of manual clicks to a single `terraform apply` run in under 12 minutes.

★ Multi-env ★ Remote State ★ CI/CD

Terraform · AWS · GitHub Actions · S3 · EKS · IAM

🔗

Azure Hybrid Connectivity

Problem

On-premises workloads need secure, reliable connectivity to Azure virtual networks without exposing public endpoints.

Solution

Site-to-site VPN between on-premises network and Azure VNet using Azure VPN Gateway, local network gateway configuration, BGP routing, and NSG rules. Documented with step-by-step runbooks and tested failover procedures.

Outcome

Hybrid tunnel established with sub-100ms latency between on-prem and Azure VNet. BGP failover tested at under 30 seconds. Runbooks cut incident resolution time from 45+ minutes to under 10 minutes.

★ BGP Routing ★ Failover ★ Hybrid

Azure · VPN Gateway · BGP · NSG · Hybrid Cloud

⚡

DevOps CI/CD Pipeline

Problem

Containerized Node.js app deployed manually — no automated testing, no security scanning, and no rollback capability.

Solution

End-to-end CI/CD pipeline using GitHub Actions, Docker, and Kubernetes. Stages: linting → unit tests → Docker build/push to ECR → Trivy security scanning → automated deployment to EKS with rollback capability.

Outcome

End-to-end pipeline runs in under 6 minutes (lint → test → build → scan → deploy). Trivy blocks HIGH/CRITICAL CVEs before production. Rollback completes in under 2 minutes on failure — zero bad deployments reach users.

★ Security Scan ★ Auto-rollback ★ EKS

GitHub Actions · Docker · Kubernetes · ECR · Trivy · EKS
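
The scan stage's gate on HIGH/CRITICAL findings can be sketched as a small check over a scanner report. The example assumes a Trivy-style JSON layout (`Results` entries containing `Vulnerabilities` with `VulnerabilityID` and `Severity` fields); the sample data and identifiers are made up, and the actual pipeline may rely on the scanner's own exit code instead.

```python
import json

BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(report: dict) -> list[str]:
    """Return the IDs of findings that should stop a deployment."""
    ids = []
    # Either key may be missing or null when nothing was found.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING_SEVERITIES:
                ids.append(vuln.get("VulnerabilityID", "unknown"))
    return ids


# Hypothetical report with placeholder identifiers, for illustration only.
SAMPLE_REPORT = json.loads("""
{
  "Results": [
    {"Target": "app-image", "Vulnerabilities": [
      {"VulnerabilityID": "CVE-0000-0001", "Severity": "CRITICAL"},
      {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"}
    ]},
    {"Target": "package-lock.json", "Vulnerabilities": null}
  ]
}
""")

if __name__ == "__main__":
    found = blocking_findings(SAMPLE_REPORT)
    # A non-empty list would make the pipeline stage fail.
    print("blocked" if found else "clear", found)
```

Keeping the gate as a pure function over the report makes it easy to unit test separately from the pipeline that calls it.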

View All Projects on GitHub

06 GitHub

Activity & Contributions

Building in public — infrastructure code, automation scripts, and DevOps tooling

Primary Languages

HCL · YAML

Security Scans

Trivy + SAST

Active Projects

IaC · K8s · CI/CD

📌 Featured Repositories

aws-three-tier-arch

Production-grade three-tier architecture on AWS with Terraform — VPC, EC2, RDS Multi-AZ, ALB, CloudWatch

HCL

k8s-monitoring-stack

Prometheus + Grafana + Alertmanager observability stack for Kubernetes clusters with SLO tracking

YAML

terraform-infra-automation

Modular Terraform codebase for multi-environment AWS infrastructure with remote state and GitHub Actions CI/CD

HCL

devops-cicd-pipeline

End-to-end CI/CD pipeline with GitHub Actions, Docker, Trivy security scanning, and EKS deployment with rollback

YAML

View GitHub Profile

07 Writing

Technical Writing

Articles on Cloud, DevOps, AWS, and infrastructure — published on LinkedIn and Medium

Latest · LinkedIn · Jun 2026

GitHub vs GitLab: What Most Developers Get Wrong!!

DevSecOps

Jan 2026

08 Resume

My Resume

A full picture of my experience, skills, and education

Sugandha Vashishtha

Cloud & Site Reliability Engineer

📍 Noida, India · 📧 buildwithsugandha@gmail.com · 🔗 linkedin.com/in/sugandha-vashishtha

Key Skills

Cloud & Site Reliability Engineering (SRE)
AWS & Azure Cloud Operations
Incident Management · DRI · SLA/SLO
Azure Bastion & Hybrid Cloud
SAST / DAST Vulnerability Remediation
Dynatrace · Jarvis · Hawkeye · ICM
Linux & Windows Server Administration
Shell Scripting · Python · KQL

Ready to download

9+ years of experience across cloud operations, SRE, AWS, Azure, incident management, and enterprise infrastructure. Available as a PDF for immediate download.

Download Resume (PDF)

Sugandha_Vashishtha_Resume.pdf

Last updated · June 2026

09 Contact

Let's Connect

Open to Infrastructure, Cloud, DevOps, and SRE opportunities at enterprise technology organizations. Drop me a message — I respond within 24 hours.

Prefer a quick call? 📅

Schedule a 15-minute intro call — no commitment, just a conversation about how I can help.

Schedule a Call

buildwithsugandha@gmail.com

linkedin.com/in/sugandha-vashishtha

GitHub

github.com/buildwithsugandha

Location

Noida, India

Resume

Download PDF

Schedule a Call

Book a 15-min intro

Available for opportunities

Actively seeking Cloud Infrastructure, SRE, and DevOps roles at enterprise technology organizations. Remote and Noida/India-based positions welcome.

Technical Writing

GitHub vs GitLab: What Most Developers Get Wrong!!

DevOps vs DevSecOps: Why Speed Alone Is No Longer Enough

Streamlining Server Regression Analysis with Windows Deployment Kit

Speech to Text using AWS Transcribe, S3, and Lambda

Mastering API Fundamentals with POSTMAN and GraphQL

Understanding Web Servers: The Heart of the Internet

AWS Global Accelerator — Improving Latency and Design for Failure

The Benefits of AWS Global Accelerator — Case Study: EC2 Linux GUI

Empowering AI: Unleashing the Potential of ChatGPT Prompt Engineering
