DevOps AWS - SRE

  • Buenos Aires, Argentina
  • Full-Time
  • On-Site

Job Description:

Job Specification: SRE (Site Reliability Engineer)

Location: Buenos Aires

Team: Sales & Research Production Management

Work Pattern: M-F; Hybrid (Belgrano)

Role Overview

We are seeking experienced Site Reliability Engineers (SREs) to support the transition and operational uplift of a key application being handed off to our team. This application underpins critical workflows for our Global Markets business, and your work will directly impact the stability, reliability, and efficiency of our client-facing technology.

You'll be joining a high-performing production management team responsible for ensuring operational excellence across the platforms that power our Sales and Research professionals. This is a hands-on role with a clear mandate: take ownership of the application, drive improvements in supportability and observability, and ensure a seamless handover to our production management team.

Key Responsibilities

1. BAU Support & Ownership

o Provide day-to-day, business-as-usual (BAU) support for the application's processes and workflows, ensuring stability, availability, and swift response to end-user issues.

o Act as the primary point of contact for all production support matters related to the application.

2. Application Maturity Assessment

o Evaluate the current state of the application with respect to supportability, reliability, and observability.

o Identify gaps and areas for improvement, documenting findings and recommendations.

3. Observability Integration & Enhancement

o Remediate observability gaps by integrating the application's processes with our standard monitoring and alerting tools, including Dynatrace, Splunk, Grafana, and Geneos.

o Ensure robust monitoring coverage and actionable alerting for all critical workflows.

4. Operational Toil Reduction & Automation

o Identify and remediate sources of operational toil and manual intervention.

o Build automation solutions or integrate with existing in-house platforms to streamline support activities and improve operational efficiency.

5. Documentation & Handover

o Develop comprehensive support documentation and runbooks, ensuring all procedures, troubleshooting steps, and escalation paths are clearly captured.

o Prepare and execute a structured handover to the permanent production management team at the end of the engagement.

Technical Environment

Hosting: AWS and internal cloud platforms

Orchestration: Astronomer (Airflow jobs)

Monitoring & Observability: Dynatrace, Splunk, Grafana, Geneos

Required Skills & Experience

SRE & Production Support: Proven experience in SRE, production management, or application support roles within large-scale, mission-critical environments.

Cloud Platforms: Hands-on expertise with AWS and internal cloud platforms.

Programming: Proficiency in at least one programming language, such as Python or Java (Spring Boot), with the ability to script, automate, and troubleshoot application workflows.

CI/CD Tools: Experience with continuous integration and continuous delivery tools, such as Jules, Jenkins, GitLab, or Terraform, to support automated build, deployment, and infrastructure management.

Observability Tooling: Strong background in observability practices such as white-box and black-box monitoring, service level objective (SLO) alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.

Workflow Orchestration: Experience with workflow orchestration tools, ideally Astronomer/Airflow.

Containers & Orchestration: Familiarity with container technologies and orchestration platforms such as ECS and Kubernetes, including deployment and operational best practices.

Automation & Toil Reduction: Demonstrated ability to automate operational tasks and reduce manual toil, preferably using in-house or open-source solutions.

Documentation: Excellent documentation skills and experience creating runbooks for production support teams.

Communication: Strong communication and stakeholder management skills, with a collaborative and proactive approach.