The Software Engineering community is responsible for creating world-class software solutions to provide our customers with the best possible retail experience.
Our Engineers work within a clear framework of accountability, ensuring substantial personal responsibility and promoting autonomy.
Our platform strategy delivers cloud-based infrastructure across all our digital touchpoints, supporting a platform written in-house.
The role of the Operations Engineering Manager is to manage the team of Site Reliability and DevOps Engineers, maintaining the highest standards of operation, uptime and performance for the Arcadia Digital Platforms whilst embedding the practices of automation, speed and performance into our software engineering teams and delivery processes.
KEY TASKS & RESPONSIBILITIES
The Operations Engineering Squad’s primary tasks are divided into three areas:
• Site Reliability Engineering – Flawless customer experience
o Own and operate the cloud infrastructure for our in-house built e-commerce platform, exploiting real-time telemetry to prevent operational issues that lead to poor customer experiences.
o Create feedback loops to continuously eradicate errors from the platform.
o Own and operate platform alerting, 3rd line incident response and post-mortems.
o Plan capacity and deliver performance improvements.
o Design and test for failures to ensure the platform is resilient.
o Ensure that sensitive customer and payment data is safe and secure as it transits the platform.
• Cloud Infrastructure Support – Operational Excellence
o Support and maintain our wider digital services cloud infrastructure across in-house and 3rd party applications such as Customer Care, Order Management, CDN and CMS.
o Drive an automation-first culture, seeking to optimise and accelerate at every opportunity.
o Ensure the highest levels of uptime, performance and security are continuously maintained.
• DevOps – Focus on Delivery Speed
o Own and operate the delivery pipeline, enabling rapid delivery by continuously pushing forward with our CI/CD transformation.
o Use release automation to assist the software engineers within their ecosystem.
o Own environment creation and configuration, utilising infrastructure as code wherever possible.
The Operations Engineering Manager supports this by:
• Implementing the Operations Engineering strategy for the Customer Domain, constructing and prioritising non-functional requirements and tasks.
• Creating and improving standards and build and deploy processes, and contributing to a healthy SDLC.
• Ensuring technical solution designs are secure/scalable/maintainable/supportable.
• Working in an agile, cross-functional team, taking responsibility for the squad's deliverables and quality.
• Resolving and removing blockers, brokering conversations with other squads/QAs to progress tasks.
• Line managing the Operations Engineering Team, measuring performance, defining platform and team KPIs and ensuring continual improvement.
• Being hands-on and able to contribute to the technical debate.
• Coaching and mentoring the wider engineering team to help drive a DevOps culture.
• Maintaining a keen eye on new technologies/innovation in the industry and leveraging these to benefit the organisation.
• Demonstrable experience of leading DevOps and Site Reliability teams to deliver strong technical and commercial results.
• Experience of driving change within an organisation, pushing through resistance and successfully adopting new ways of working
• Successful experience of implementing continuous integration/delivery
• Attention to detail to ensure code management, code workflow, security and performance analysis standards are adhered to
• Excellent experience of building a practice that adopts data-driven continuous improvement: taking metrics, analysing data and building technical pipelines of improvement tasks
• Strong leadership and communication skills, chairing stand-ups with both internal and external teams, leading squad members to deliver, seeking commercial approvals to implement strategy
• Highly creative, enthusiastic, conscientious, detail-oriented self-starter
• You understand that prevention is better than cure
• You keep up to date with the industry, and continuously improve on and blog about areas of technology interest.
This is a leadership role that requires strong relationships with many areas of the engineering community. Solid technical grounding and experience in the following are therefore required:
• Cloud compute – AWS Elastic Beanstalk, EC2, ElastiCache Redis, DynamoDB
• Code and Containers – GitHub, Docker Hub
• Logging – New Relic, ELK stack, CloudWatch
• Networks – Akamai CDN and WAF, CloudFront, Route 53
• Automation and configuration – Jenkins, Terraform, Ansible
• Some exposure to a broad and diverse range of web technologies such as Node.js and ReactJS.