CT320 Storage

CT 320: Network and System Administration

Colorado State University

Computer Science Department

Original slides from Dr. James Walden at Northern Kentucky University.

Topics

Disk interfaces
Disk components
Performance
Reliability
RAID
Adding a disk
Logical volumes
Filesystems

Disk Interfaces

SCSI
- Standard interface for servers
IDE, EIDE
- Historical interface for PCs
SATA
- Serial ATA standard on PCs
Fibre Channel
- High bandwidth, SCSI or ATA
USB
- Fast enough for slow devices on PCs

SCSI

Small Computer Systems Interface
- Pronounced “scuzzy”
- Fast, reliable, expensive
A bus, not a simple PC to device interface
- Each device has a target # ranging 0–7 or 0–15.
- Devices can communicate directly without CPU
Many versions
- Original: SCSI-1 (1979) — 5MB/s
- Current: SCSI-3 (2003) — 640MB/s
Serial Attached SCSI (SAS)
- Up to 128 devices
- Up to 750 MB/s full duplex

IDE

Integrated Drive Electronics / AT attachment
- Slower, less reliable, cheap
- Only allows 2 devices per interface
- ATAPI standard added removable devices
Many versions
- Original: IDE / ATA (1984) — 16.7 MB/s
- Current: Ultra-ATA/167 (2010) — 167MB/s
Serial ATA
- One device per port (point-to-point link)
- Original: SATA Revision 1.0 — 150 MB/s
- Current: SATA Revision 3.0 — 600 MB/s

SATA vs. SCSI

SCSI offers better performance/scale
- Faster bus
- Faster hard drives (up to 15,000rpm)
- Lower CPU usage
- Better handling of multiple requests
SATA often best for workstations
Convergence
- SATA2 and SAS converging on a single standard

Hard Drive Components

Actuator
- Moves arm across disk to read/write data.
- Arm has multiple read/write heads (often 2 per platter).
Spindle Motor
- Spins platters ~7200 rpm
- Speed determines rotational delay

Hard Drive Components

Platters
- Rigid substrate material
- Thin magnetic material coating stores data
- Coating type determines density: 1.34 Tbit/in² in 2015
Cache
- hard disk: 8–256MB cache, SSD: 4GB cache
- Reliability: write-through vs. write-back
  - Write-through: write to both cache & disk
  - Write-back (alias write-behind): write to cache, posting to disk later, as needed
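
The difference between the two write policies can be sketched with a toy model. This is an illustration only, not how any real drive firmware works: the `Disk` and `Cache` classes and the block number 7 are hypothetical names invented for the example.

```python
class Disk:
    """Stand-in for slow persistent storage (hypothetical; just a dict)."""
    def __init__(self):
        self.blocks = {}
        self.writes = 0              # number of physical writes performed

    def write(self, block, data):
        self.blocks[block] = data
        self.writes += 1


class Cache:
    """Toy cache in front of a Disk, in write-through or write-back mode."""
    def __init__(self, disk, write_back=False):
        self.disk = disk
        self.write_back = write_back
        self.data = {}
        self.dirty = set()           # blocks cached but not yet on disk

    def write(self, block, data):
        self.data[block] = data
        if self.write_back:
            self.dirty.add(block)            # defer the disk write
        else:
            self.disk.write(block, data)     # write-through: disk updated now

    def flush(self):
        """Post all dirty blocks to disk (only matters in write-back mode)."""
        for block in sorted(self.dirty):
            self.disk.write(block, self.data[block])
        self.dirty.clear()


wt = Cache(Disk())
wb = Cache(Disk(), write_back=True)
for value in ("a", "b", "c"):        # rewrite the same block three times
    wt.write(7, value)
    wb.write(7, value)

print("write-through:", wt.disk.writes, wt.disk.blocks)
print("write-back before flush:", wb.disk.writes, wb.disk.blocks)
wb.flush()
print("write-back after flush:", wb.disk.writes, wb.disk.blocks)
```

In this model, write-back saves disk writes, but anything still dirty at power loss is gone, which is the reliability trade-off the slide refers to.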

Disk Information: hdparm

    # hdparm -i /dev/sda1

    /dev/sda1:

     Model=Hitachi HTS543216L9A300, FwRev=FB2OC40C, SerialNo=081107FB2232LCHTGKLA
     Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
     RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
     BuffType=DualPortCache, BuffSize=7114kB, MaxMultSect=16, MultSect=16
     CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=312581808
     IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
     PIO modes:  pio0 pio1 pio2 pio3 pio4
     DMA modes:  mdma0 mdma1 mdma2
     UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
     AdvancedPM=yes: mode=0x80 (128) WriteCache=enabled
     Drive conforms to: unknown:  ATA/ATAPI-2,3,4,5,6,7

     * signifies the current active mode

Disk Performance

Seek Time
- Time to move head to desired track (~9 ms)
Rotational Delay
- Time until head is over desired block (one rotation takes 8.3 ms at 7200 rpm; half that on average)
Latency
- Seek Time + Rotational Delay
Throughput
- Data transfer rate (~1Gb/s)
- Affected by both rotational speed & data density
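
The latency arithmetic can be checked with a short calculation. This is a rough sketch: the 9 ms seek time comes from the slide, the list of spindle speeds is illustrative, and applying the same seek time to every speed is a simplifying assumption.

```python
def rotation_time_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60_000 / rpm


def average_latency_ms(seek_ms, rpm):
    """Seek time plus average rotational delay (half a rotation)."""
    return seek_ms + rotation_time_ms(rpm) / 2


SEEK_MS = 9  # figure from the slide; assumed the same for every speed
for rpm in (5400, 7200, 15000):
    full = rotation_time_ms(rpm)
    print(f"{rpm:>6} rpm: rotation {full:.2f} ms, "
          f"avg rotational delay {full / 2:.2f} ms, "
          f"avg latency {average_latency_ms(SEEK_MS, rpm):.2f} ms")
```

At 7200 rpm this gives about 8.33 ms per rotation and roughly 13.2 ms average latency, so seek time and rotation contribute comparable amounts.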

Latency vs. Throughput

Which is more important?
- Depends on the type of application
Sequential access (Throughput)
- Multimedia on a single workstation
Random access (Latency)
- Content on most web servers
How to improve performance
- Faster disks
- Disk Caching
- More spindles (disks)
- More disk controllers

Disk Performance: hdparm

    # hdparm -tT /dev/sda1

    /dev/sda1:
     Timing cached reads:   1256 MB in  2.00 seconds = 627.87 MB/sec
     Timing buffered disk reads: 172 MB in  3.02 seconds =  57.00 MB/sec

Reliability

MTBF
- Mean Time Between Failure (>100,000 hours)
Real failure curves
- Early phase: high failure rate from defects
- Constant failure rate phase: MTBF valid
- Wearout phase: high failure rate from wear
Failures more likely on traumatic events
- Power on/off
Systems often wear out before MTBF
- However, disk drives still crash!
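
An MTBF figure is easier to interpret as a yearly failure probability. The sketch below assumes the constant-failure-rate phase (an exponential lifetime model), which is an assumption of the example and not a claim about any particular drive. Only the 100,000-hour value comes from the slide; the other MTBF values are illustrative.

```python
import math

HOURS_PER_YEAR = 24 * 365


def annual_failure_rate(mtbf_hours):
    """Probability that a drive fails within one year of continuous use,
    assuming a constant failure rate of 1/MTBF (exponential model)."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


for mtbf in (100_000, 500_000, 1_000_000):
    print(f"MTBF {mtbf:>9,} h -> annual failure rate "
          f"{annual_failure_rate(mtbf):.2%}")
```

Under this model, a 100,000-hour MTBF still means roughly 8% of such drives fail each year, so a large installation should expect regular replacements.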

RAID

Redundant Array of Inexpensive Disks
Redundant Array of Independent Disks
Can be implemented in hardware or software.
Hardware RAID controllers:
- Support caching
- Higher capacity
- Higher reliability
- Better throughput

RAID Levels

RAID 0: Striped evenly for performance
- MTBF = (average MTBF)/# disks
JBOD: Concatenated for capacity
- Or does it mean no RAIDing at all?
- Only data on bad disk is lost, no performance effect
RAID 1: Mirrored for reliability
- Every write goes to each disk of set
- Seek time halved as reads split between disks
RAID 0 + 1: Striped + mirrored
RAID 5: Striped with parity
- Block striping, not disk striping
- Can lose one disk of set without losing data.
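
Two of the points above can be made concrete in a few lines: the RAID 0 MTBF formula, and how XOR parity lets RAID 5 rebuild a lost block. This is a simplified sketch, not a RAID implementation. The block contents and the four-disk, 100,000-hour figures are made up for illustration, and a real array also rotates parity across disks.

```python
from functools import reduce


def raid0_mtbf(disk_mtbf_hours, n_disks):
    """MTBF of a stripe set: any single disk failure loses the array."""
    return disk_mtbf_hours / n_disks


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


print("RAID 0 MTBF:", raid0_mtbf(100_000, 4), "hours")

# One stripe: three data blocks plus one parity block.
data = [b"disk", b"ABCD", b"1234"]
parity = xor_blocks(data)

# Simulate losing the second disk: XOR the survivors with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print("rebuilt block:", rebuilt)
```

Because XOR is its own inverse, any single missing block equals the XOR of all remaining blocks in the stripe. Losing two blocks leaves too little information, which is why RAID 5 tolerates only one failed disk.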

Redundancy

Adding a Disk

Install new hardware
- Verify disk recognized by BIOS.
Find the device name:
- cat /proc/partitions
- df -h
Partition: fdisk /dev/sdb or gparted /dev/sdb
Create filesystem: mkfs -v -t ext4 /dev/sdb1
Add to /etc/fstab: /dev/sdb1 /proj ext4 defaults 0 2
Mount all disks: mount -a

When don’t you need a filesystem?

Swap space

    mkswap -v /dev/sdb1

Server applications (use their own)
- Oracle (OCFS)
- VMWare Server (VMFS)

Logical Volumes

What are logical volumes?
- Appear to user as a physical volume.
- But can span multiple partitions and/or disks.
Why logical volumes?
- Aggregate disks for performance/reliability.
- Grow and shrink logical volumes on the fly.
- Move logical volumes between physical devices.
- Replace volumes without interrupting service.

LVM

Logical Volume Manager

LVM Components

Logical Volume Group (LVG)
- Set of physical volumes (partitions or disks).
- May be divided into logical volumes (LVs).
LVs made up of fixed-size logical extents
- Each LE is 4MB by default.
- Physical extents are the same size.

Mapping Modes

Linear Mapping
- LVs assigned to contiguous areas of PV space.
Striped Mapping
- LEs interleaved across PVs to improve performance.
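
The two mapping modes can be sketched as functions from a logical extent number to a (physical volume, physical extent) pair. This follows the slide's simplified model of interleaving whole extents and is not meant to mirror LVM's actual on-disk layout. The function names and the two 4-extent PVs are hypothetical.

```python
def linear_map(le, pv_sizes):
    """Linear mapping: fill each PV completely before using the next."""
    for pv, size in enumerate(pv_sizes):
        if le < size:
            return pv, le
        le -= size
    raise ValueError("logical extent beyond end of volume group")


def striped_map(le, n_pvs):
    """Striped mapping: consecutive LEs rotate across the PVs."""
    return le % n_pvs, le // n_pvs


pv_sizes = [4, 4]   # two hypothetical PVs with 4 extents each
print("LE  linear(pv, pe)  striped(pv, pe)")
for le in range(6):
    print(f"{le:>2}  {linear_map(le, pv_sizes)}          "
          f"{striped_map(le, len(pv_sizes))}")
```

With linear mapping the first four extents all land on the first PV, so a sequential read uses one spindle at a time. With striping, neighbouring extents sit on different PVs and can be read in parallel.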

Setting up an LVG and LV

Initialize physical volumes

    # pvcreate /dev/hda1
    # pvcreate /dev/hdb1

Initialize a volume group

    vgcreate nku_proj /dev/hda1 /dev/hdb1

Use vgextend to add more PVs later.

Create logical volumes

    lvcreate -n nku1 --size 100G nku_proj

Create filesystem

    mkfs -v -t ext3 /dev/nku_proj/nku1

Extending an LV

Set absolute size

    lvextend -L120G /dev/nku_proj/nku1

Or set relative size

    lvextend -L+20G /dev/nku_proj/nku1

Expand the filesystem without unmounting

    resize2fs /dev/nku_proj/nku1

Check size

    df -h

Swap

Can use swapfile instead of swap partition

    dd if=/dev/zero of=/swapfile bs=1024k count=512
    chmod 600 /swapfile
    mkswap /swapfile

Enable swap

    swapon /swapfile
    swapon /dev/sda2

Disable swap

    swapoff /swapfile
    swapoff /dev/sda2

Check swap resource usage

    cat /proc/swaps

Filesystems

ext2
- Old Linux non-fragmenting fast filesystem
- Can be converted to ext3 by adding a journal:

    tune2fs -j /dev/sda1

ext3
- Journaling “eliminates” need for fsck
ext4
- Current common Linux filesystem
- Big files (16TB)
- Extents (range of contiguous physical blocks)
- 34-bit seconds + nanosecond timestamps
  - 2038 ≫ y2k

2038

    $ perl -wle 'print 0x7fffffff'
    2147483647
    $ date -d 'january 1, 1970 + 2147483647 seconds'
    Tue Jan 19 03:14:07 MST 2038

You probably don’t remember January 1st, 2000, but you sure will remember January 19, 2038.
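
The same rollover can be reproduced in UTC, together with the effect of the two extra seconds bits that ext4 adds. This is a sketch under two assumptions: `wrap32` is a hypothetical helper that imitates a signed 32-bit counter, and the ext4 calculation assumes the extra bits extend the signed 32-bit range upward.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch(seconds):
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


def wrap32(seconds):
    """Interpret a count as a signed 32-bit integer (hypothetical helper)."""
    return (seconds + 2**31) % 2**32 - 2**31


max32 = 2**31 - 1
print("last 32-bit second:", from_epoch(max32))
print("one second later:  ", from_epoch(wrap32(max32 + 1)))

# Assumption: two extra bits add three more 2**32-second spans on top.
max34 = max32 + 3 * 2**32
print("last 34-bit second:", from_epoch(max34))
```

The wrapped counter lands back in 1901, which is what makes the overflow disruptive, while the extended range in this model pushes the limit out to the year 2446.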

Other Filesystems

tmpfs, ramfs: all in memory
vfat, ntfs: Windows
exFAT: flash drive (spreads out the work)
hfs: Mac OS
procfs: /proc
cramfs, squashfs: Read-only compressed file systems
ISO9660: CD-ROM & DVD-ROM disks
UDF: CD-RW & DVD-RW

Mounting

To use a filesystem

    # mount /dev/sda1 /mnt
    # df -h /mnt

Automatic mounting
- Add entry to /etc/fstab
Unmount
- umount /dev/sda1
- Cannot unmount a volume in use.

fstab

    # /etc/fstab: static file system information.
    #
    # <file system> <mount point> <type> <options> <dump> <pass>

    UUID=77f85028-b4c1-4439-be1c-5a3ba7f59dd1  /       ext3  defaults 0 1
    LABEL=windows                              /win    vfat  user,rw  0 0
    /dev/hdc8                                  /home   ext3  defaults 0 2
    /dev/hdc7                                  none    swap  sw       0 0
    proc                                       /proc   proc  defaults 0 0

/etc/fstab first field

The first field of an /etc/fstab entry can be:

An actual device, e.g., /dev/sda1
- Fails if you add another disk and the order changes.
UUID=whatever
- Robust, but cryptic. Obtain via blkid.
LABEL=whatever
- Robust and self-documenting. Obtain via blkid; set via mkfs.type, e2label, fatlabel, etc.
The field is ignored for pseudo-filesystems such as proc or tmpfs.
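
To make the six-field layout concrete, here is a minimal parser sketch. It is not the parser that mount itself uses. `FstabEntry`, `parse_fstab_line`, and `spec_kind` are names invented for this example, and treating a missing dump or pass field as 0 is an assumption of the sketch.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    spec: str          # first field: device, UUID=..., LABEL=..., or pseudo-fs
    mount_point: str
    fstype: str
    options: list
    dump: int
    passno: int


def parse_fstab_line(line):
    """Return an FstabEntry, or None for blank and comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) < 4:
        raise ValueError(f"too few fields: {line!r}")
    spec, mount_point, fstype, options = fields[:4]
    dump = int(fields[4]) if len(fields) > 4 else 0     # assumed default
    passno = int(fields[5]) if len(fields) > 5 else 0   # assumed default
    return FstabEntry(spec, mount_point, fstype, options.split(","),
                      dump, passno)


def spec_kind(spec):
    """Classify the first field the way the slide does."""
    if spec.startswith("UUID="):
        return "uuid"
    if spec.startswith("LABEL="):
        return "label"
    if spec.startswith("/dev/"):
        return "device"
    return "other"


sample = [
    "# <file system> <mount point> <type> <options> <dump> <pass>",
    "LABEL=windows  /win   vfat  user,rw   0 0",
    "/dev/hdc8      /home  ext3  defaults  0 2",
    "proc           /proc  proc  defaults  0 0",
]
for line in sample:
    entry = parse_fstab_line(line)
    if entry is not None:
        print(spec_kind(entry.spec), entry)
```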

fsck: check + repair fs

Filesystem corruption sources
- Power failure
- System crash
Types of corruption
- Unreferenced inodes.
- Bad superblocks.
- Unused data blocks not recorded in block maps.
- Data blocks listed as free that are used in files.
fsck can fix these and more
- Asks user to make more complex decisions.
- Stores unfixable files in lost+found
  - And where is this lost+found, precisely?

Lots of filesystem flavors

    $ cd /sbin

    $ ls -F mkfs.*
    mkfs.cramfs*  mkfs.ext3*  mkfs.fat*    mkfs.msdos@  mkfs.vfat@
    mkfs.ext2*    mkfs.ext4*  mkfs.minix*  mkfs.ntfs@   mkfs.xfs*

    $ ls -F fsck.*
    fsck.cramfs*  fsck.ext3*  fsck.fat*    fsck.msdos@  fsck.vfat@
    fsck.ext2*    fsck.ext4*  fsck.minix*  fsck.ntfs@   fsck.xfs*

    $ ls -F mount.*
    mount.cifs*   mount.glusterfs*   mount.nfs4@     mount.ntfs-fuse@
    mount.fuse*   mount.lowntfs-3g@  mount.ntfs@     mount.smb3@
    mount.fuse3*  mount.nfs*         mount.ntfs-3g@

References

Aeleen Frisch, Essential System Administration, 3rd edition, O’Reilly, 2002.
Charles M. Kozierok, “Reference Guide—Hard Disk Drives,” http://www.pcguide.com/ref/hdd/, 2005.
A.J. Lewis, LVM HOWTO, https://www.tldp.org/HOWTO/LVM-HOWTO/index.html, 2005.
H. Mauelson and M. O’Keefe, “The Linux Logical Volume Manager,” Red Hat Magazine, https://www.redhat.com/magazine/009jul05/features/lvm2/, July 2005.
Octane, “SCSI Technology Primer,” http://arstechnica.com/paedia/s/scsi-1.html, 2002.
RedHat, RHEL4 System Administration Guide, https://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/sysadmin-guide/, 2005.

CT320: Network and System Administration

Fall 2019

Storage
