Senior Analyst, Site Reliability Engineering

📁 Technology
Requisition #: 2400024V

Saks Cloud Services is looking for a Senior Analyst to join the Site Reliability Engineering (SRE) team. The ideal candidate is outgoing, obsessed with customer service, and has strong analytical and communication skills. This candidate should also strive for continuous improvement, be enthusiastic about new ideas, and enjoy opportunities to “think outside the box.”

 

Position: Sr. Analyst SRE



What this position is all about

The successful candidate will primarily identify and analyze technical problems in systems and applications across all supported divisions. They will work closely with cross-functional IT teams to troubleshoot and resolve application-related issues, and play a key role in implementing new solutions that improve the efficiency and effectiveness of the team and organization. The ideal candidate should have a strong technical background and communicate effectively with technical and non-technical stakeholders.

 

Role description:

       5+ years of experience working within DevOps or SRE teams.

       3+ years of experience with a cloud platform (preferably AWS or Azure).

       Ability to program (structured and OO) in one or more high-level languages, such as JavaScript, Java, Python, and Bash.

       Participate in on-call rotations (PagerDuty/Opsgenie) and respond to incidents outside of regular hours.

       Run the production environment by monitoring availability and taking a holistic view of system health

       Take part in building and implementing services that help IT and support teams do their jobs better.

       Improve reliability, quality, and time-to-market of our suite of software solutions

       Measure and optimize system performance to push our capabilities forward, get ahead of customer needs, and innovate to improve continually.

       Validate NFRs/SLx against production logs or business analytics.

       Conduct proofs of concept to showcase the benefits of recommendations.

       Instrument the target environment to capture relevant monitoring metrics for analysis.

       Contribute to training SREs in core concepts and build a knowledge repository by adding point-of-view documents and blogs.

       Document the engineering strategy and analysis reports.

       Document every action so your findings turn into repeatable actions, and then into automation.

       Hands-on experience with distributed version control systems such as Git, AWS CodeCommit, or equivalent.

       Must have experience with Docker, Kubernetes, Terraform, and Ansible.

       Know your way around Linux and the Unix Shell.

       Experience or familiarity with the ELK stack.

       Balance feature development speed and reliability with well-defined service level objectives

       Monitor systems and telemetry of Salesforce Commerce Cloud and Salesforce Service Cloud for operational health in terms of site stability, reliability, and performance.

       Prioritize and develop automated administrative and operational tasks to continuously improve site stability, capacity, reliability, and performance.

       Provide active incident response support, investigate major problems, and ensure the timely and effective return to normal operations of the Digital Commerce and CRM platforms during major incidents.

       Provide periodic on-call support based on established 24/7/365 support schedules.

       Collaborate with Digital Development and QA teams to ensure that Production environments are deployment-ready in line with Change Management processes and the Digital release schedules.

       Support Development teams in the provision and configuration of lower environments, including CI/CD pipeline support.

       Support incident management and problem management efforts with root cause analysis to effectively identify and resolve issues related to platform reliability, stability, and performance through the careful analysis of telemetry data and system logs.

       Collaborate with Engineering and Project teams to perform production readiness assessments and ensure that proper controls and processes are in place.

       Support / execute production change management requests on behalf of the Digital Engineering teams.

       Evaluate and propose tools and techniques to improve operational activities.

Key Qualifications:

       5+ years of related work experience, preferably in SRE or DevOps-related fields.

       Understand customer business processes & transactions

       Understand application architecture and design, and analyze non-functional requirements and SLIs/SLOs.

       Independently troubleshoot performance, scalability, capacity, resilience & reliability issues & correlate to application code & configurations.

       Participate in code, design, and architecture reviews and ensure application reliability goals are met.

       Strong troubleshooting, analytical, and problem-solving skills

       Strong verbal and written communication skills.

       Experience in the administration and support of Digital Retail Platforms, e.g. Salesforce CC, Shopify, Magento, IBM WebSphere Commerce, etc. is an asset.

       Experience with monitoring, logging, and telemetry tools like New Relic, mPulse, Splunk, Nagios, SolarWinds, Prometheus, AWS CloudWatch, Datadog, etc.

       Experience with cloud infrastructure administration (e.g., AWS, GCP, Azure).

       Basic understanding of networking, Content Delivery Networks (CDNs, e.g. Akamai, Cloudflare), and SaaS solutions.

       Hands-on experience with scripting languages (PowerShell, Python, Ruby, AWK, sed, Shell, etc.) and with maintaining automation frameworks that run health checks and provide self-healing capabilities for the platforms.

       Experience with automation and tools such as (but not limited to) GitHub Actions, Chef, Terraform, Ansible, etc.

       Experience with web development technologies (e.g., JavaScript, Node.js, React, HTML, XML, CSS, REST).

       Experience with ticketing and collaboration tools (e.g., JSM, Jira Work Management, ServiceNow).

       3+ years of SRE experience working on telemetry, observability, self-healing solutions, and platform automation.



Your Life and Career at Saks Cloud Services

       Be part of a world-class team; work adventurously; think and act like an owner-operator!

       Exposure to rewarding career advancement opportunities, from retail to supply chain, to digital or corporate.

       A culture that promotes a healthy, fulfilling work/life balance.

       Benefits package for all eligible full-time employees (including medical, vision, and dental).

       An amazing employee discount.
