
Job Description

Position: Site Reliability Engineer
Location: Pulchowk

We are looking for a Mid-Level Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our production systems. The ideal candidate will play a key role in building observability, improving system uptime, managing incidents, and driving automation across infrastructure and application environments. This role requires strong collaboration with engineering and operations teams to maintain high availability and operational excellence in a fast-paced, high-transaction environment.

Qualifications and Experience

Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field
2–4 years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles
Experience working in production support, incident management, or cloud-based environments
Exposure to high-availability and distributed system environments (fintech or high-transaction systems preferred)

Key Responsibilities

Implement and maintain monitoring, logging, alerting, and observability solutions across systems and services
Monitor infrastructure, applications, APIs, databases, and system performance to ensure optimal availability
Configure proactive alerts, dashboards, and centralized observability platforms
Respond to production incidents and coordinate resolution, escalation, and service restoration activities
Prepare incident reports, root cause analyses (RCAs), impact analyses, and preventive action plans
Maintain postmortems, runbooks, SOPs, and operational documentation
Collaborate with engineering teams on deployment validation, change management, and operational readiness
Improve system reliability, scalability, and performance through continuous analysis and optimization
Identify operational risks and bottlenecks, and propose automation and infrastructure improvements
Support SLI/SLO definition, error budget tracking, and reliability governance
Participate in disaster recovery, business continuity planning, and incident response processes
Provide clear communication and updates during incidents and operational events

Required Skills

Strong experience in Linux/Unix system administration
Hands-on experience with monitoring and observability tools (Grafana, Prometheus, ELK, AppDynamics)
Knowledge of logging, metrics, tracing, and alerting systems
Proficiency in scripting (Bash or other shell scripting, Python)
Understanding of APIs, microservices, networking, and distributed systems
Experience with CI/CD pipelines and DevOps practices
Knowledge of Docker, Kubernetes, and containerized environments
Familiarity with cloud infrastructure and high availability concepts

Benefits of Working at eSewa

A stellar opportunity to work with a rising company
An amazing and passionate young team and a beautiful office space
The trust of the biggest FinTech company
A one-of-a-kind company culture and growth opportunities to accelerate your career progression

How to apply?

We are always keen to meet energetic and talented professionals who would like to join our team. Submit your application through our careers portal to apply for the post.

Application Deadline: May 31, 2026
To apply for this job, please visit career.f1soft.com.

This job posting expired on 2026-05-31.