Information Technology_USA - USA_Engineer

Real Soft, Inc.
Location: Jacksonville, FL, USA
Published: 6/29/2026
Full time
Local to Naperville ONLY!

Job Summary:
We are looking for a highly experienced Senior Site Reliability Engineer (SRE) / Application Reliability Engineer with AWS knowledge and 10+ years of expertise in incident management, system reliability, and enterprise application support. The role focuses on ensuring high availability, operational stability, and continuous improvement of critical financial and ERP systems in a 24×7 environment.

The ideal candidate will have strong hands-on experience in monitoring, troubleshooting, root cause analysis, and supporting cloud-based and on-prem enterprise platforms.

Key Responsibilities:
Reliability Engineering & Operations
Ensure high availability and reliability of enterprise applications in a 24×7 production environment.
Monitor applications, batch jobs, and workflows to maintain operational continuity.
Incident, Problem & Change Management
Lead and manage major incidents (P1/P2) and drive resolution to minimize business impact.
Perform root cause analysis (RCA) and implement preventive measures.
Ensure adherence to SLA/SLO and ITIL-based incident, problem, and change management processes.
Monitoring & Observability
Design and maintain monitoring dashboards.
Implement proactive alerting and improve system observability.
Troubleshooting & Support
Diagnose and resolve application and data-related issues using SQL queries and log analysis.
Provide backend validation and technical support across distributed environments.
Release & Deployment Support
Support release deployments, change validation, and post-deployment activities.
Participate in disaster recovery testing and release readiness validation.
Collaboration & Documentation
Collaborate with infrastructure, DBA, and development teams to resolve technical issues.
Create and maintain operational documentation, runbooks, and knowledge base articles.

Required Skills & Qualifications:
Core Skills
Site Reliability Engineering (SRE) and Application Support
Incident & Problem Management
Root Cause Analysis (RCA)
SLA / SLO Compliance
Batch Monitoring & Scheduling
ITIL Framework

Technical Skills
CI/CD Tools: GitHub
Cloud Platforms: AWS (EC2, S3, VPC)
Databases: Oracle, SQL Server
Languages: SQL, SQR, Basic Java
Ticketing Tools: ServiceNow, Jira
Operating Systems: UNIX, Linux, Windows

Experience:
10+ years of experience in Application Support / Reliability Engineering roles.
Strong experience in BFSI or enterprise application environments.
Proven track record in managing production support operations and high-severity incidents.