Job Title: Senior DevOps Engineer
Category: Software Development
Total Positions: 2
Job Location: Lahore
Gender: Female
Minimum Education: Bachelor's
Career Level: Experienced Professional
Minimum Experience: 5 Years
Salary Range: PKR 200,000 to 250,000 per Month
Apply By: Jun 6, 2025
Job Description:
You will play a critical role in setting observability standards and driving automation within our engineering teams. Your responsibilities will include managing and configuring the Datadog observability platform using Infrastructure-as-Code (IaC) practices.
This is a hands-on role focused on ensuring end-to-end visibility into Java applications, Kubernetes workloads, and containerized infrastructure, with a focus on scaling observability efficiently and cost-effectively.
You will collaborate closely with Site Reliability Engineering (SRE), DevOps, and Software Engineering teams to standardize monitoring and logging practices, contributing to the development of scalable and reliable observability solutions.
Key Responsibilities:
- Define and implement observability standards for Java applications, Kubernetes workloads, and cloud infrastructure.
- Configure and manage Datadog using Terraform and IaC best practices.
- Lead the adoption of structured JSON logging, distributed tracing, and custom metrics across Java and Python services.
- Optimize Datadog usage with cost governance strategies, log filtering, sampling, and automated reporting.
- Collaborate with Java developers and platform engineers to standardize instrumentation and alerting.
- Troubleshoot issues related to logs, metrics, and traces, ensuring proper instrumentation and data flow into Datadog.
- Participate in incident response activities, providing insights for actionable alerting, root cause analysis (RCA), and reliability improvements.
- Serve as the main point of contact for Datadog-related inquiries and internal support.
- Continuously audit and improve monitor configurations, reducing false positives and improving alert quality.
- Maintain clear documentation on Datadog usage, standards, integrations, and IaC workflows.
- Evaluate and suggest improvements to the observability stack, including new Datadog features and OpenTelemetry adoption.
- Mentor engineers and develop training programs on Datadog, observability-as-code, and log pipeline architecture.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, Mathematics, Physics, or a related technical field.
- 5+ years of experience in DevOps, Site Reliability Engineering, or related roles, with a strong focus on observability and Infrastructure-as-Code.
- Extensive experience managing and scaling Datadog programmatically through Terraform, APIs, and CI/CD workflows.
- Deep knowledge of Datadog features including APM, logs, metrics, tracing, dashboards, and audit trails.
- Experience integrating Datadog observability into CI/CD pipelines (e.g., GitLab CI, AWS CodePipeline, GitHub Actions).
- Strong understanding of AWS services and best practices for monitoring Kubernetes infrastructure.
- A background in Java application development is preferred.
Why Join Us?
- Join a dynamic team where you'll shape the future of observability and make a real impact on system reliability and performance.
- You'll work with cutting-edge technologies, lead key initiatives, and mentor engineers.
- We offer a competitive salary of up to USD 1,000 per month.
- Enjoy a collaborative, growth-focused environment where your contributions truly matter.
Company Information
Company Name: Staff Link