Articul8 AI

Senior Site Reliability Engineer (SRE) - (Brazil)

Posted 17 Days Ago
Remote
2 Locations
Senior level
The role involves ensuring system reliability for a GenAI SaaS, automating infrastructure, implementing monitoring solutions, and leading incident response efforts.
The summary above was generated by AI
About Us

Articul8 AI is at the forefront of Generative AI innovation, delivering cutting-edge SaaS products that transform how businesses operate. Our platform empowers organizations to leverage the power of artificial intelligence in a reliable, scalable, and secure environment.

Position Overview

We are seeking an experienced Site Reliability Engineer (SRE) to join our team and help ensure the reliability, performance, and scalability of our GenAI SaaS platform. As an SRE, you will bridge the gap between development and operations, implementing automation and best practices to maintain our service reliability objectives while supporting rapid innovation.

Key Responsibilities
  • Architect and maintain scalable, highly available infrastructure for our GenAI platform.

  • Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance.

  • Automate deployment, scaling, and management of our cloud-native infrastructure, reducing toil and improving efficiency.

  • Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver outstanding service quality.

  • Participate in on-call rotations and provide rapid response to production incidents, minimizing downtime and user impact.

  • Collaborate closely with development teams to build reliable, scalable, and efficient systems for complex AI workloads.

  • Lead incident response efforts, conduct thorough post-mortems, and champion continuous improvement initiatives.

  • Optimize infrastructure for performance, scalability, and cost-effectiveness—especially for high-demand AI workloads.

  • Implement and enforce security best practices across all systems and environments.

  • Create and maintain comprehensive documentation, including runbooks and knowledge base articles, to foster a culture of shared knowledge.

Qualifications

Required
  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience

  • 5+ years of experience in DevOps, SRE, or similar roles

  • Strong experience with cloud platforms (AWS, GCP, or Azure)

  • Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.)

  • Hands-on experience with infrastructure as code tools (Terraform, CloudFormation, etc.)

  • Solid background in containerization technologies (Docker, Kubernetes)

  • Proven experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.)

  • Strong understanding of CI/CD pipelines and automation

  • Exceptional problem-solving skills and the ability to troubleshoot complex systems

Preferred
  • Experience supporting AI/ML systems in production

  • Knowledge of GPU infrastructure management and optimization

  • Familiarity with distributed systems and high-performance computing

  • Experience with database systems (SQL and NoSQL)

  • Certifications in cloud platforms (AWS, GCP, Azure)

  • Experience with chaos engineering and resilience testing

  • Knowledge of security best practices and compliance requirements

Ready to shape the future of resilient software systems? Apply now and help drive the reliability of tomorrow’s AI at Articul8 AI!

Top Skills

AWS
Azure
Bash
CloudFormation
Docker
ELK Stack
GCP
Go
Grafana
Kubernetes
Prometheus
Python
Terraform
