Overview
This Service Level Agreement (SLA) is an agreement between the University of Oregon Information Services (IS) and users of the data archive service. Under this SLA, IS agrees to provide a system suitable for the storage and disaster-recovery backup of large amounts of archival data for the purposes outlined in this document and in the corresponding Memorandum of Understanding (MOU). This SLA documents agreed-upon systems and services, performance and reliability targets, and escalation procedures.
Information
Definition
The Data Archive Service is defined as the system or set of systems sufficient to provide access to centralized, minimal-performance storage infrastructure and associated disaster-recovery backup services capable of meeting the requirements of this SLA and of the corresponding MOU.
Purpose
The purpose of this SLA is to establish the level of service and the responsibilities of parties involved in the Archive Service project.
Scope
This document covers normal business use of the Archive Service. Normal business use is defined as robust data protection that meets the following needs:
Timeline
- The service will be available for use and supported for a term of no less than one year
- Detailed service duration considerations are provided for in the corresponding MOU
Storage Protocol
- Storage will be presented via CIFS and NFS
Data Validation
- Data verification and validation are performed using cyclic redundancy checks (CRC) on the network. The CRC is computed on the client and checked after network transfer.
- If there is a mismatch, the data packets are retransmitted.
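The check-and-retransmit behavior described above can be sketched in Python as follows. This is only an illustration: the SLA does not state which CRC variant or network layer performs the check, so the use of CRC-32 via `zlib.crc32`, the function names, the `channel` callable, and the retry limit are all assumptions made for the example.

```python
import zlib

def make_frame(payload: bytes) -> tuple:
    """Client side: pair the payload with a CRC-32 computed over it."""
    return payload, zlib.crc32(payload)

def verify_frame(payload: bytes, expected_crc: int) -> bool:
    """Receiver side: recompute the CRC-32 and compare with the sent value."""
    return zlib.crc32(payload) == expected_crc

def transfer(payload: bytes, channel, max_attempts: int = 3) -> bytes:
    """Send payload through `channel`, retransmitting on CRC mismatch.

    `channel` is a hypothetical callable that returns the bytes as they
    arrived on the far side; `max_attempts` is an assumed retry limit.
    """
    _, crc = make_frame(payload)
    for _ in range(max_attempts):
        received = channel(payload)
        if verify_frame(received, crc):
            return received
    raise IOError("CRC mismatch persisted after retransmission")
```

A CRC detects accidental corruption in transit; it is not a substitute for the encryption requirement stated elsewhere in this SLA.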
Data Protection and Recovery Expectations
- In the event of a disaster at the University of Oregon campus causing any loss of production data, the recovery point objective (RPO, the maximum period of time over which data could be lost) will be 30 days.
- In the event of a disaster at the University of Oregon campus causing any loss of production data, the recovery time objective (RTO, or the maximum period of time allowed to restore service) will be unlimited.
- At least one full backup copy of production data will be retained at any given time after at least one full backup has been created.
- All off-site data must be encrypted in transit and at rest.
- At least one complete set of data must be held at a secure, off-site storage facility (outside the city limits of Eugene, OR and Springfield, OR) at any given time after the initial full backup has completed and has been moved fully off-site.
- IS will be responsible for the management of the data sets including:
- Labeling and inventory management
- Monitoring of technology health and secure destruction of damaged or otherwise unusable technology
- Sending data sets to chosen off-site facility and requesting return of data sets
- There are no speed requirements for this service
- Active user workloads (e.g., database or virtual machine workloads, high numbers of concurrent users, etc.) are not a consideration for the Archive Service. Failure to meet the targets of this SLA caused directly or indirectly by such workloads shall not be deemed to constitute SLA non-compliance.
- IS reserves the right to utilize unused or underutilized components of the Archive Service infrastructure for uses unrelated to the Archive Service, provided such use does not cause the Archive Service to fall into non-compliance with this SLA.
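The 30-day recovery point objective stated above lends itself to a simple automated check: compare the completion time of the newest full off-site backup with the current time. The sketch below is illustrative only. The function name and the idea of running it from a monitoring job are assumptions, not part of this SLA.

```python
from datetime import datetime, timedelta, timezone

RPO = timedelta(days=30)  # recovery point objective stated in this SLA

def rpo_satisfied(last_backup: datetime, now: datetime) -> bool:
    """Return True if the newest complete backup is no older than the RPO."""
    return now - last_backup <= RPO

# Example: a backup completed on 1 March, checked on 31 March (exactly 30 days).
now = datetime(2024, 3, 31, tzinfo=timezone.utc)
print(rpo_satisfied(datetime(2024, 3, 1, tzinfo=timezone.utc), now))
```

Because the recovery time objective is unlimited, no corresponding check on restore duration is sketched here.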
Service Responsibilities
This table defines responsible parties for each Archive Service component:
Service Component | Responsible Party
End-User Support | Customer (If applicable in MOU)
Data Management | Customer (If applicable in MOU)
Storage Client Server Administration | Customer (If applicable in MOU)
Backup Administration | IS
Storage Administration | IS
Network Administration | IS
Hardware Maintenance | IS
Duties to be performed by the Responsible Party for the corresponding Service Component include:
- Hardware, OS, and software installation, configuration, patching, maintenance, administration, troubleshooting, and training
- License code management
- Backup Permissions management
- Log monitoring and analysis
Additionally, IS will perform the following duties for the systems listing IS as the responsible party:
- Incident response in accordance with IS Systems on-call procedures related to disruptions of service including but not limited to disk failures, data corruption, controller failures, network failures, and other hardware or software failures
- Performance monitoring of existing hardware with timely communication to Customer about storage and system performance problems and bottlenecks with suggested resolution paths
- Management of Service Component capacity and utilization
- Managing and processing renewal of all support agreements covering hardware or software related to the Archive Service defined in the SLA
Additionally, Customer will perform the following duties for the systems listing Customer as the responsible party:
- Giving timely notice to IS of substantial backup need increases
- IS will be in control of, and responsible for, technical design of Service Components listing IS as the Responsible Party where not otherwise specified by these documents.
- Technical design is defined as the conceptual design and implementation of the Archive Service, including the following components:
- Use of virtual vs physical components
- Location of physical components
- Support and maintenance contracts
- Networking
- Software
- Hardware
- Technical design changes to such infrastructure will be preceded by notification to Customer
Service Expectations
Service Availability
- 24x7x365 minus scheduled maintenance windows
- Service will maintain an uptime of 99% excluding maintenance window downtime
- The Archive Service will not be resilient to Computing Center site failure
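To make the 99% uptime target concrete, the following sketch estimates how many hours of unscheduled downtime per year remain within the target once scheduled maintenance is excluded. It assumes a 365-day year, 52 weeks, and that both weekly two-hour general maintenance windows listed under Maintenance Windows are fully used; emergency maintenance is not modeled. The constant and function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24          # 8760 hours, assuming a non-leap year
WEEKLY_MAINTENANCE_HOURS = 2 + 2   # Wednesday and Thursday windows, 2 h each
UPTIME_TARGET = 0.99               # uptime target stated in this SLA

def allowed_downtime_hours(weeks: int = 52) -> float:
    """Hours of unscheduled downtime per year that still meet the target."""
    scheduled = weeks * WEEKLY_MAINTENANCE_HOURS   # excluded from the target
    in_scope = HOURS_PER_YEAR - scheduled          # hours the target applies to
    return in_scope * (1 - UPTIME_TARGET)

print(round(allowed_downtime_hours(), 2))
```

Under these assumptions, 208 hours fall inside maintenance windows, leaving 8552 in-scope hours and roughly 85.5 hours of allowable unscheduled downtime per year.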
Technical Response Availability
- 8:00 a.m. - 5:00 p.m. / Monday - Friday
Emergency Response Availability
- According to Systems on-call policy
Monitoring and Alerting
Storage Space
- Alerts describing events and incidents relating to storage space will be sent to IS Enterprise Systems.
Backup Alerts
- Alerts describing events and incidents relating to backup operations will be sent to IS Systems team.
Hardware Alerts
- Alerts describing events and incidents relating to the operation of the service hardware will be sent to IS Systems team.
Maintenance Windows
General Maintenance
- Storage
- Wednesdays, 5:00 - 7:00 a.m.
- Backup infrastructure
- Thursdays, 5:00 - 7:00 a.m.
Emergency Maintenance
- Scheduled and communicated as needed to customers
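The general maintenance windows above can be encoded so that monitoring or reporting tools can tell whether an outage falls inside scheduled maintenance. The sketch below is a minimal illustration: it assumes timestamps are naive local campus times, treats the end of each window as exclusive, and does not model emergency maintenance, which is scheduled as needed. The `WINDOWS` table and function name are hypothetical.

```python
from datetime import datetime, time

# General maintenance windows from this SLA, keyed by weekday (Monday = 0).
WINDOWS = {
    2: (time(5, 0), time(7, 0)),  # Wednesday: storage
    3: (time(5, 0), time(7, 0)),  # Thursday: backup infrastructure
}

def in_maintenance_window(ts: datetime) -> bool:
    """Return True if ts falls inside a general maintenance window."""
    window = WINDOWS.get(ts.weekday())
    if window is None:
        return False
    start, end = window
    return start <= ts.time() < end
```

For example, `in_maintenance_window(datetime(2024, 1, 3, 6, 0))` evaluates a Wednesday at 6:00 a.m., which lies inside the storage window.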
Support
Tier 1 - Campus Unit Delegated Administrator
- Role: Data Management Support
- Support Hours: 9:00 a.m. - 5:00 p.m. / Monday - Friday
- Response Time: 1 Business Day
Tier 2 - Information Services
- Contact: Systems Team
- Support Hours: 8:00 a.m. - 5:00 p.m. / Monday - Friday
- Response Time: 1 Business Day
Emergency Escalation
- Role: System Infrastructure Support
- Contact: Systems Hotline (541) 346-1758
- Support Hours: 8:00 a.m. - 5:00 p.m. / Monday - Friday
- Response Time at all other times: Best Effort
Business Contacts
University of Oregon Information Services Systems Team