Tape Data Recovery Copy SLA


Overview

This Service Level Agreement (SLA) is an agreement between the University of Oregon Information Services (IS) and users of the data archive service. Under this SLA, IS agrees to provide a system suitable for the storage and disaster-recovery backup of large amounts of archival data for the purposes outlined in this document and in the corresponding Memorandum of Understanding (MOU). This SLA documents agreed-upon systems and services, performance and reliability objectives, and escalation procedures.

Information

Definition

The Data Archive Service is defined as the system or set of systems sufficient to provide access to centralized, minimal-performance storage infrastructure and associated disaster-recovery backup services capable of meeting the requirements of this SLA and of the corresponding MOU.

Purpose

The purpose of this SLA is to establish the level of service and the responsibilities of parties involved in the Archive Service project.

Scope

This document covers normal business use of the Archive Service. Normal business use is defined as robust data protection that meets the following needs:

Timeline

The service will be available for use and supported for a term of no less than one year
Detailed service duration considerations are provided for in the corresponding MOU

Storage Protocol

Storage will be presented via CIFS and NFS

Data Validation

Data verification and validation are performed using cyclic redundancy checks (CRC) on the network. The CRC is computed on the client and checked after network transfer.
If there is a mismatch, the data packets are retransmitted. More info on CRC can be found here.
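
The check described above can be illustrated with a minimal Python sketch. This SLA does not state which CRC variant is used, and in practice the check is performed by the network stack rather than by application code, so the use of CRC-32 from `zlib` and the helper names `crc32_of` and `transfer_ok` are assumptions made only for illustration.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """Return the CRC-32 of data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def transfer_ok(received: bytes, sender_crc: int) -> bool:
    """Recompute the CRC on the receiving side and compare it with the sender's value."""
    return crc32_of(received) == sender_crc


# Hypothetical payload: the client computes the CRC before sending.
original = b"archive block 0001"
sent_crc = crc32_of(original)

# Simulate a payload damaged in transit (the last byte differs).
corrupted = b"archive block 0002"

if not transfer_ok(corrupted, sent_crc):
    print("CRC mismatch: packet would be retransmitted")
```

A mismatch only signals that the received data differs from what was sent; the retransmission itself is handled by the transport.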

Data Protection and Recovery Expectations

In the event of a disaster at the University of Oregon campus causing any loss of production data, the recovery point objective (RPO, or the maximum period of time over which data could be lost) will be 30 days.
In the event of a disaster at the University of Oregon campus causing any loss of production data, the recovery time objective (RTO, or the maximum period of time allowed to restore service) will be unlimited.
At least one full backup copy of production data will be retained at any given time after at least one full backup has been created.
All off-site data must be encrypted in transit and at rest.
At least one complete set of data must be held at a secure, off-site storage facility (outside the city limits of Eugene, OR and Springfield, OR) at any given time after the initial full backup has completed and has been moved fully off-site.
IS will be responsible for the management of the data sets including:
- Labeling and inventory management
- Monitoring of technology health and secure destruction of damaged or otherwise unusable technology
- Sending data sets to chosen off-site facility and requesting return of data sets
There are no speed requirements for this service
Active user workloads (e.g., database or virtual machine workloads, high numbers of concurrent users, etc.) are not a consideration for the Archive Service. Failure to meet the targets in this SLA caused directly or indirectly by such workloads shall not be deemed to constitute SLA non-compliance.
IS reserves the right to utilize unused or underutilized components of the Archive Service infrastructure for uses unrelated to the Archive Service, provided such use does not cause the Archive Service to fall into non-compliance with this SLA.
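
As an illustration of the 30-day recovery point objective stated above, the following Python sketch checks whether the most recent off-site backup is still within the RPO. The function name `rpo_met` and the example dates are hypothetical; how backup completion times are actually recorded is not specified in this SLA.

```python
from datetime import datetime, timedelta

# RPO from this SLA: at most 30 days of data may be lost.
RPO = timedelta(days=30)


def rpo_met(last_offsite_backup: datetime, now: datetime) -> bool:
    """Return True if the newest off-site backup is no older than the RPO."""
    return now - last_offsite_backup <= RPO


# Hypothetical example: backup completed on 1 January, checked on 20 January.
last_backup = datetime(2024, 1, 1)
check_time = datetime(2024, 1, 20)
print("RPO met" if rpo_met(last_backup, check_time) else "RPO exceeded")
```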

Service Responsibilities

This table defines responsible parties for each Archive Service component:

Service Components	Responsible Party
End-User Support	Customer (If applicable in MOU)
Data Management	Customer (If applicable in MOU)
Storage Client Server Administration	Customer (If applicable in MOU)
Backup Administration	IS
Storage Administration	IS
Network Administration	IS
Hardware Maintenance	IS

Duties to be performed by the Responsible Party for the corresponding Service Component include:

Hardware, OS, and software installation, configuration, patching, maintenance, administration, troubleshooting, and training
License code management
Backup Permissions management
Log monitoring and analysis

Additionally, IS will perform the following duties for the systems listing IS as the responsible party:

Incident response in accordance with IS Systems on-call procedures related to disruptions of service including but not limited to disk failures, data corruption, controller failures, network failures, and other hardware or software failures
Performance monitoring of existing hardware with timely communication to Customer about storage and system performance problems and bottlenecks with suggested resolution paths
Management of Service Component capacity and utilization
Managing and processing renewal of all support agreements covering hardware or software related to the Archive Service defined in the SLA

Additionally, Customer will perform the following duties for the systems listing Customer as the responsible party:

Giving timely notice to IS of substantial backup need increases

IS will be in control of, and responsible for, technical design of Service Components listing IS as the Responsible Party where not otherwise specified by these documents.
- Technical design is defined as conceptual design and implementation of the Archive Service, including the following components of the Archive Service:
  - Use of virtual vs physical components
  - Location of physical components
  - Support and maintenance contracts
  - Networking
  - Software
  - Hardware
- Technical design changes to such infrastructure will be preceded by notification to Customer

Service Expectations

Service Availability

24x7x365 minus scheduled maintenance windows
Service will maintain an uptime of 99% excluding maintenance window downtime
The Archive Service will not be resilient to Computing Center site failure
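
To make the 99% uptime target concrete, the sketch below estimates how much unplanned downtime fits within it once maintenance windows are excluded. The measurement period is not defined in this SLA, so the 30-day period and the 16 hours of maintenance (four Wednesday and four Thursday windows of two hours each) are assumptions, and `allowed_downtime_hours` is a hypothetical helper.

```python
def allowed_downtime_hours(period_hours: float,
                           maintenance_hours: float,
                           target_percent: float = 99.0) -> float:
    """Hours of unplanned downtime permitted by the uptime target.

    Maintenance windows are excluded from the measured period first,
    as the SLA excludes maintenance window downtime.
    """
    in_scope = period_hours - maintenance_hours
    return in_scope * (100.0 - target_percent) / 100.0


# Assumed 30-day period with 16 hours of scheduled maintenance.
budget = allowed_downtime_hours(period_hours=30 * 24, maintenance_hours=16)
print(f"Unplanned downtime budget: {budget:.2f} hours")
```

Under these assumptions the budget works out to roughly seven hours per 30-day period.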

Technical Response Availability

8:00 a.m. - 5:00 p.m. / Monday – Friday

Emergency Response Availability

According to Systems on-call policy

Monitoring and Alerting

Storage Space

Alerts describing events and incidents relating to storage space will be sent to IS Enterprise Systems.

Backup Alerts

Alerts describing events and incidents relating to backup operations will be sent to the IS Systems team.

Hardware Alerts

Alerts describing events and incidents related to the operation of the service hardware will be sent to the IS Systems team.

Maintenance Windows

General Maintenance

Storage
- Wednesdays, 5:00 - 7:00 a.m.
Backup infrastructure
- Thursdays, 5:00 - 7:00 a.m.

Emergency Maintenance

- Scheduled and communicated to customers as needed
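
The general maintenance windows listed above can be expressed as a small lookup, for example to decide whether an outage falls inside scheduled maintenance. This is a sketch only: it assumes times are local campus time and that the end of each window is exclusive, neither of which this SLA states, and `active_window` is a hypothetical helper.

```python
from datetime import datetime, time
from typing import Optional

# General maintenance windows from this SLA.
# Weekday numbers follow datetime.weekday(): Monday is 0, so Wednesday is 2.
WINDOWS = {
    "storage": (2, time(5, 0), time(7, 0)),                # Wednesdays, 5:00 - 7:00 a.m.
    "backup infrastructure": (3, time(5, 0), time(7, 0)),  # Thursdays, 5:00 - 7:00 a.m.
}


def active_window(moment: datetime) -> Optional[str]:
    """Return the name of the maintenance window containing moment, or None."""
    for name, (weekday, start, end) in WINDOWS.items():
        if moment.weekday() == weekday and start <= moment.time() < end:
            return name
    return None


# 3 January 2024 was a Wednesday.
print(active_window(datetime(2024, 1, 3, 6, 0)))
```

Emergency maintenance is scheduled as needed and is therefore not represented in this lookup.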

Support

Tier 1 - Campus Unit Delegated Administrator

Role: Data Management Support
Support Hours: 9:00 a.m. – 5:00 p.m. / Monday - Friday
Response Time: 1 Business Day

Tier 2 - Information Services

Contact: Systems Team
- TDNext
Support Hours: 8:00 a.m. - 5:00 p.m. / Monday - Friday
Response Time: 1 Business Day
Emergency Escalation
- Role: System Infrastructure Support
- Contact: Systems Hotline (541) 346-1758
- Support Hours: 8:00 a.m. - 5:00 p.m. / Monday - Friday
- Response Time: 1 Hour
- All other times response time: Best Effort

Business Contacts

University of Oregon Information Services Systems Team
