Tape Data Recovery Copy SLA

Tags: dr, tape-dr

Overview

This Service Level Agreement (SLA) is an agreement between the University of Oregon Information Services (IS) and users of the data archive service. Under this SLA, IS agrees to provide a system suitable for the storage and disaster-recovery backup of large amounts of archival data for the purposes outlined in this document and in the corresponding Memorandum of Understanding (MOU). This SLA documents agreed-upon systems and services, performance and reliability targets, and escalation procedures.

Information

Definition

The Data Archive Service is defined as the system or set of systems sufficient to provide access to centralized, minimal performance storage infrastructure and associated disaster-recovery backup services capable of meeting the requirements of this SLA and of the corresponding MOU.  

Purpose

The purpose of this SLA is to establish the level of service and the responsibilities of parties involved in the Archive Service project.

Scope

The scope of this document is intended to cover normal business use of the Archive Service. Normal business use is defined as robust data protection that meets the following needs:

Timeline

  • The service will be available for use and supported for a term of no less than one year
  • Detailed service duration considerations are provided for in the corresponding MOU

Storage Protocol

  • Storage will be presented via CIFS and NFS

Data Validation

  • Data verification and validations are done using cyclic redundancy checks (CRC) on the network. The CRC is created on the client and checked after network transfer.
  • If there is a mismatch, the data packets are retransmitted.
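
The check-and-retransmit behavior described above can be illustrated with a minimal sketch. This is not the service's actual implementation: the SLA does not name the CRC variant or the layer at which it runs, so the use of CRC-32 from Python's `zlib` module, the `channel` callable, and the function names `crc_of` and `send_with_retry` are all assumptions made for illustration.

```python
import zlib
from typing import Callable, Tuple


def crc_of(payload: bytes) -> int:
    """Return the CRC-32 of a payload as an unsigned 32-bit integer."""
    return zlib.crc32(payload) & 0xFFFFFFFF


def send_with_retry(
    payload: bytes,
    channel: Callable[[bytes], bytes],
    max_attempts: int = 3,
) -> Tuple[bytes, int]:
    """Send a payload and retransmit whenever the CRC does not match.

    `channel` is a hypothetical stand-in for the network: it takes the
    bytes sent and returns the bytes that arrived on the other side.
    """
    sent_crc = crc_of(payload)  # CRC created on the client before transfer
    for attempt in range(1, max_attempts + 1):
        received = channel(payload)
        if crc_of(received) == sent_crc:  # CRC checked after transfer
            return received, attempt
        # Mismatch: fall through and retransmit on the next iteration.
    raise IOError(f"CRC mismatch persisted after {max_attempts} attempts")
```

A caller would pass its own transport function as `channel`; the second element of the returned tuple reports how many transmissions were needed.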

Data Protection and Recovery Expectations

  • In the event of a disaster at the University of Oregon campus causing any loss of production data, the recovery point objective (RPO, or the maximum period of most recent data that could be lost) will be 30 days.
  • In the event of a disaster at the University of Oregon campus causing any loss of production data, the recovery time objective (RTO, or the maximum period of time allowed to restore service) will be unlimited. 
  • At least one full backup copy of production data will be retained at any given time after at least one full backup has been created.  
  • All off-site data must be encrypted in transit and at rest. 
  • At least one complete set of data must be held at a secure, off-site storage facility (outside the city limits of Eugene, OR and Springfield, OR) at any given time after the initial full backup has completed and has been moved fully off-site.
  • IS will be responsible for the management of the data sets including: 
    • Labeling and inventory management 
    • Monitoring of technology health and secure destruction of damaged or otherwise unusable technology 
    • Sending data sets to chosen off-site facility and requesting return of data sets 
  • There are no speed requirements for this service  
  • Active user workloads (e.g., database or virtual machine workloads, high numbers of concurrent users, etc.) are not a consideration for the Archive Service. Failure to meet SLA targets caused directly or indirectly by such workloads shall not be deemed to constitute SLA non-compliance.
  • IS reserves the right to utilize unused or underutilized components of the Archive Service infrastructure for uses unrelated to the Archive Service, provided such use does not cause the Archive Service to fall into non-compliance with this SLA.  
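
As an illustration of the 30-day recovery point objective stated above, the following sketch checks whether the most recent off-site backup is recent enough to satisfy the RPO at a given moment. The function name `rpo_met` and the idea of tracking a single "last completed backup" timestamp are assumptions for this example; the SLA does not describe how compliance is measured.

```python
from datetime import datetime, timedelta

# RPO taken from this SLA: at most 30 days of data may be lost.
RPO = timedelta(days=30)


def rpo_met(last_backup: datetime, disaster_time: datetime,
            rpo: timedelta = RPO) -> bool:
    """Return True if data written since the last backup spans no more than the RPO."""
    if disaster_time < last_backup:
        raise ValueError("disaster_time must not precede last_backup")
    return disaster_time - last_backup <= rpo
```

For example, a backup completed 23 days before an incident would satisfy the objective, while one completed 63 days before would not.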

Service Responsibilities

This table defines responsible parties for each Archive Service component: 

Service Component                    | Responsible Party
End-User Support                     | Customer (If applicable in MOU)
Data Management                      | Customer (If applicable in MOU)
Storage Client Server Administration | Customer (If applicable in MOU)
Backup Administration                | IS
Storage Administration               | IS
Network Administration               | IS
Hardware Maintenance                 | IS

Duties to be performed by the Responsible Party for the corresponding Service Component include:

  • Hardware, OS, and software installation, configuration, patching, maintenance, administration, troubleshooting, and training
  • License code management
  • Backup Permissions management
  • Log monitoring and analysis 

Additionally, IS will perform the following duties for the systems listing IS as the responsible party: 

  • Incident response in accordance with IS Systems on-call procedures related to disruptions of service including but not limited to disk failures, data corruption, controller failures, network failures, and other hardware or software failures
  • Performance monitoring of existing hardware with timely communication to Customer about storage and system performance problems and bottlenecks with suggested resolution paths 
  • Management of Service Component capacity and utilization
  • Managing and processing renewal of all support agreements covering hardware or software related to the Archive Service defined in the SLA
  • Additionally, Customer will perform the following duties for the systems listing Customer as the Responsible Party:
    • Giving timely notice to IS of substantial backup need increases
  • IS will be in control of, and responsible for, technical design of Service Components listing IS as the Responsible Party where not otherwise specified by these documents. 
    • Technical design is defined as conceptual design and implementation of the Archive Service, including the following components of the Archive Service:
      • Use of virtual vs physical components
      • Location of physical components
      • Support and maintenance contracts
      • Networking
      • Software
      • Hardware
    • Technical design changes to such infrastructure will be preceded by notification to Customer 

Service Expectations

Service Availability

  • 24x7x365 minus scheduled maintenance windows 
  • Service will maintain an uptime of 99% excluding maintenance window downtime 
  • The Archive Service will not be resilient to Computing Center site failure
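
The 99% uptime target, measured with maintenance windows excluded, translates into a concrete downtime budget. The sketch below shows one way to do that arithmetic; the function names and the choice to measure in hours are assumptions for illustration, and the SLA does not define the measurement period.

```python
def allowed_downtime_hours(period_hours: float, maintenance_hours: float,
                           uptime_target: float = 0.99) -> float:
    """Unplanned downtime permitted in a period, excluding maintenance windows."""
    in_scope = period_hours - maintenance_hours
    if in_scope <= 0:
        raise ValueError("maintenance must be shorter than the period")
    return in_scope * (1 - uptime_target)


def meets_uptime_target(period_hours: float, maintenance_hours: float,
                        unplanned_outage_hours: float,
                        uptime_target: float = 0.99) -> bool:
    """Compare measured uptime (maintenance excluded) against the target."""
    in_scope = period_hours - maintenance_hours
    if in_scope <= 0:
        raise ValueError("maintenance must be shorter than the period")
    uptime = (in_scope - unplanned_outage_hours) / in_scope
    return uptime >= uptime_target
```

As a worked example under assumed figures, a 30-day period (720 hours) with 16 hours of scheduled maintenance leaves 704 in-scope hours, so `allowed_downtime_hours(720, 16)` gives a budget of roughly 7.04 hours of unplanned downtime.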

Technical Response Availability 

  • 8:00 a.m. - 5:00 p.m. / Monday – Friday 

Emergency Response Availability 

  • According to Systems on-call policy

Monitoring and Alerting

Storage Space 

  • Alerts describing events and incidents relating to storage space will be sent to IS Enterprise Systems. 

Backup Alerts 

  • Alerts describing events and incidents relating to backup operations will be sent to the IS Systems team.

Hardware Alerts 

  • Alerts describing events and incidents related to the operation of the service hardware will be sent to the IS Systems team.

Maintenance Windows

General Maintenance 

  • Storage 
    • Wednesdays, 5:00 - 7:00 a.m. 
  • Backup infrastructure 
    • Thursdays, 5:00 - 7:00 a.m. 
  • Emergency Maintenance
    • Scheduled as needed and communicated to customers
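
For monitoring or reporting scripts, the general maintenance windows listed above can be encoded so that outages inside a window are not counted against availability. This is a minimal sketch: the names `WINDOWS` and `components_in_maintenance` are hypothetical, timestamps are assumed to be naive datetimes in local campus time, and emergency maintenance is not modeled because it is scheduled ad hoc.

```python
from datetime import datetime, time
from typing import List

# (weekday, start, end) per component; weekday follows datetime.weekday(),
# where Monday is 0, so Wednesday is 2 and Thursday is 3.
WINDOWS = {
    "storage": (2, time(5, 0), time(7, 0)),  # Wednesdays, 5:00 - 7:00 a.m.
    "backup": (3, time(5, 0), time(7, 0)),   # Thursdays, 5:00 - 7:00 a.m.
}


def components_in_maintenance(moment: datetime) -> List[str]:
    """Return the components whose general maintenance window covers `moment`."""
    return [
        name
        for name, (weekday, start, end) in WINDOWS.items()
        if moment.weekday() == weekday and start <= moment.time() < end
    ]
```

The end of each window is treated as exclusive, so 7:00 a.m. exactly is considered outside the window; that boundary choice is also an assumption.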

Support

Tier 1 - Campus Unit Delegated Administrator
  • Role: Data Management Support
  • Support Hours: 9:00 a.m. - 5:00 p.m. / Monday - Friday
  • Response Time: 1 Business Day

Tier 2 - Information Services
  • Contact: Systems Team
    • TDNext
  • Support Hours: 8:00 a.m. - 5:00 p.m. / Monday - Friday
  • Response Time: 1 Business Day
  • Emergency Escalation
    • Role: System Infrastructure Support
    • Contact: Systems Hotline (541) 346-1758
    • Support Hours: 8:00 a.m. - 5:00 p.m. / Monday - Friday
      • Response Time: 1 Hour
    • All other times
      • Response Time: Best Effort

Business Contacts

University of Oregon Information Services Systems Team

Details

Article ID: 70372
Created: Thu 1/17/19 12:23 PM
Modified: Wed 5/3/23 1:43 PM