Accela is the industry pioneer in government licensing, permitting, service request, and inspection solutions, with more than 20 years of experience. We offer cloud-based Civic Applications and a robust, scalable solutions platform informed by industry best practices. In short, Accela helps governments innovate so they can improve the business and citizen experience, promote community development, and create an environment where citizens and businesses thrive.
At Accela, employees enjoy a culture that emphasizes performance, productivity and collaboration. You can’t help but feel empowered and motivated when you work with like-minded individuals who are passionate about contributing to a market-leading, high-growth software organization with proven technology.
In this role:
Accela provides cutting-edge technology for government agencies to engage and serve their citizens. The cornerstone of our technology is the Civic Cloud Platform, which allows an entire ecosystem of development and business partners to create an endless number of solutions to serve the government and the public. Site Reliability Engineering (SRE), pioneered by Google, is an alternative approach to the traditional split of IT Operations and Product Development teams. SRE is driven by Software Engineers running Cloud Operations. Our SRE mission is to protect and improve Accela’s Civic Platform Services, with an emphasis on availability, latency, performance, and capacity.
As with traditional Operations groups, SRE keeps important systems up and running despite all sources of disruption. As a member of the SRE team, you will have the opportunity to tackle the complex problems of scale and availability while using your expertise in coding, algorithms, complexity analysis and system design.
Impact you will make in the role (Responsibilities):
- Perform deep troubleshooting and Root Cause Analysis during Production Incidents, and drive follow-on Corrective Actions.
- Develop and implement software to improve Accela's software system availability, scalability, latency, and efficiency. Scale solutions to the business need.
- Act as a key driver of problem solving for critical services and build automation to prevent recurrence. Drive response automation for all non-critical service conditions to increase productivity through decreased operational load.
- Ensure customer Availability, Performance and Security through participation in a balanced on-call rotation.
- Conduct operations support, including executing software releases and production data updates, and use system expertise to answer user questions about system function.
- Ensure alignment with our infrastructure provisioning orchestration (Terraform) and Build Automation efforts (Ansible).
- Function as Release Captain and/or Execution Lead for deployment of releases and for migration cutovers as needed.
- Design, implement, support and scale server architecture on Microsoft Azure to provide fast and reliable services to a rapidly growing customer base.
- Build out and maintain suitable tracking dashboards and metrics.
- Play a key role in the Production Incident Management process, and the follow-on Root Cause Analysis, Postmortem, and Problem Management processes. Improve organizational maturity for these processes.
Expertise you will bring in (Skills and Qualifications):
- 8+ years software engineering and/or production systems engineering experience in a SaaS environment
- Experience working in a Microsoft environment
- Experience working in a Linux environment
- Experience with software version control
- Expertise in designing, analyzing and troubleshooting complex systems
- Familiarity with distributed systems
- Demonstrated systematic approach to problem solving
- Experience using Azure (CLI/API) as the primary cloud platform
- Deep knowledge of operating system internals and System Administration functions
- Experience with Infrastructure as Code tools – Terraform in particular
- Experience with Configuration Management tools such as Ansible, Salt, Chef, Puppet – Ansible in particular. Experience with ARM templates in Azure a plus.
- Knowledge of scripting languages such as Bash, Python, Ruby or Go
- Mastery of production monitoring tools, trending, streaming metrics and logging tools
- Deep troubleshooting and Root Cause Analysis skills
- Expert “full stack” troubleshooting during production incidents
- Ability to drive organization maturity in the Incident, Problem and Change Management processes, through leading by example
- PowerShell experience a plus
Accela is a place where everyone can grow. So however you identify and whatever background you bring with you, please apply if this is a role that would make you excited to come into work every day.
Benefits and Perks:
Beyond a stellar work environment, great people, and inspiring, innovative work, we offer some great benefits and perks:
- Competitive salaries
- Medical, dental and vision coverage for you and your family, along with other wellness and disability plans
- 12 paid holidays and a competitive paid time off policy
- Catered lunches, fully stocked kitchens, walking trails and nearby access to restaurants, food trucks and farmers’ markets in some of our locations
All qualified applicants will receive consideration for employment without regard to race, sex, color, religion, national origin, protected veteran status, disability, gender identity, or sexual orientation.