Lead AI Cloud Infrastructure Architect

  • E-Commerce Product
  • Hyderabad
  • Full Time

Location: Hyderabad, India
Salary: ₹25 – ₹30 LPA

Role Summary:

We are building a next-generation, AI-powered cloud infrastructure platform to compete with the world’s largest cloud providers.
We’re looking for a visionary and highly skilled Lead AI Cloud Infrastructure Architect to design, prototype, and scale the platform from the ground up.
You will be responsible for designing the core services, infrastructure automation, AI integration, networking, security, and developer tooling that will power a globally distributed cloud.

Key Responsibilities:

  • Design and implement the core architecture of an AI-driven cloud platform, including compute, storage, networking, container orchestration, and serverless functions.
  • Lead the design of multi-tenant, highly available, scalable, and secure infrastructure, including data centers and edge computing.
  • Integrate AI and machine learning to optimize resource allocation, auto-scaling, predictive maintenance, and cost management.
  • Build APIs and control planes for internal and external services (similar to AWS IAM, EC2, and Lambda).
  • Define and implement CI/CD pipelines, infrastructure-as-code, and automated monitoring and logging systems.
  • Collaborate with the team to choose and implement container orchestration (e.g., Kubernetes) and virtualization strategies.
  • Implement network security, data encryption, identity and access management, and compliance frameworks.
  • Lead performance testing, capacity planning, and disaster recovery design.
  • Explore and integrate emerging technologies: edge AI, GPU acceleration, FPGAs, and serverless computing.
  • Provide technical leadership, mentoring, and documentation.

Tools & Technologies (suggested stack):

  • Cloud orchestration & provisioning: Kubernetes, Terraform, Ansible, Pulumi
  • Programming languages: Go, Python, Rust, Java
  • AI/ML frameworks: TensorFlow, PyTorch, Ray, custom in-house models
  • Infrastructure: Docker, containerd, KVM / Hyper-V, Ceph, MinIO, Istio
  • CI/CD: Jenkins, GitHub Actions, Argo CD
  • Observability: Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana)
  • Networking & Security: Envoy, Open Policy Agent, HashiCorp Vault
  • Storage: S3-compatible object storage, block and file storage, distributed file systems
  • Databases: PostgreSQL, Cassandra, Redis, ScyllaDB
  • Serverless / Functions: Knative, OpenFaaS, custom runtimes

Qualifications:

  • 7+ years of experience in cloud infrastructure, distributed systems, or large-scale platform engineering.
  • Deep understanding of compute, storage, networking, and security at scale.
  • Experience designing high availability, disaster recovery, and fault tolerance for multi-region systems.
  • Hands-on expertise with containerization, orchestration (Kubernetes), and infrastructure-as-code.
  • Familiarity with AI/ML systems integration and AI-driven automation.
  • Strong programming skills (Go, Rust, or Python preferred).
  • Experience with compliance frameworks, identity and access management (IAM), and encryption standards.
  • Excellent leadership, architecture documentation, and communication skills.

Why this role is unique:

  • Build a cloud platform from scratch, not just maintain existing systems.
  • Leverage AI to create self-optimizing, autonomous infrastructure.
  • Shape the technical and product vision at an early stage.