Smartmontools

From The World according to Vissie

Let's check that your drive is not reporting issues

Try using smartmontools, specifically smartctl --scan. This should show you how to talk to the drives: it lists each detected device along with the -d device type to use.

sudo apt-get install smartmontools
smartctl --scan 

This gives me a good idea of what to try to get SMART data from my Toshiba drive. So I try the device and -d type it reports, along with the -a flag of course. Lo and behold, it works.
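As a rough sketch of that step, the scan output can be turned into ready-to-run commands. The helper name scan_to_commands is made up for this example, and it assumes the usual scan line layout of "device -d type # comment":

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl --scan` output on stdin and print
# one `smartctl -a` command per detected device.
scan_to_commands() {
    # A scan line looks like: /dev/sda -d scsi # /dev/sda, SCSI device
    # $1 is the device node, $3 is the device type that follows -d.
    awk '$2 == "-d" { print "smartctl -a -d " $3 " " $1 }'
}

# Usage (run as root):
#   smartctl --scan | scan_to_commands
```

The function only prints the commands, so you can review them before running anything against a drive.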

To check that the hard disk supports SMART and that SMART is enabled, use the following command (in this example for the hard disk /dev/sdc):

sudo smartctl -i /dev/sdc

Example Output:

smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.5.0-39-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4 Serial ATA
Device Model:     WDC WD5003ABYX-01WERA1
Serial Number:    WD-WMAYP5453158
LU WWN Device Id: 5 0014ee 00385d526
Firmware Version: 01.01S02
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Sep  2 14:06:57 2013 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

The last two lines are the most important as these indicate whether SMART support is available and enabled.
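If you want to check those two lines from a script, something like the following might do. The function name smart_state and its output format are my own invention; it only assumes the "SMART support is:" lines look like the example output above:

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -i` output on stdin and report
# whether SMART is available and enabled.
smart_state() {
    awk '
        /^SMART support is:/ {
            # The 4th field is Available/Unavailable on one line
            # and Enabled/Disabled on the other.
            if ($4 == "Available") available = "yes"
            if ($4 == "Enabled")   enabled = "yes"
        }
        END {
            printf "available=%s enabled=%s\n",
                (available ? available : "no"),
                (enabled ? enabled : "no")
        }'
}

# Usage (run as root):
#   smartctl -i /dev/sdc | smart_state
```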

I also managed to get the S.M.A.R.T. data for all my USB drives, using the SAT device type (-d sat):

sudo smartctl /dev/sdg -d sat -a

A nice guide on scheduling tests:

https://forums.freenas.org/index.php?threads/scrub-and-smart-testing-schedules.20108/

Sample scripts:

https://forums.freenas.org/index.php?threads/set-up-smart-reporting-via-email.6211/
https://forums.freenas.org/index.php?threads/scripts-to-report-smart-zpool-and-ups-status-hdd-cpu-t%C2%B0-hdd-identification-and-backup-the-config.27365/ 

When to react to errors

Basically, if the raw value of any of these attributes

187 Reported_Uncorrect
197 Current_Pending_Sector
198 Offline_Uncorrectable

becomes higher than 0, I would not use that drive again.
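That rule of thumb could be checked automatically along these lines. This is only a sketch: failing_attrs is a made-up name, and it assumes the standard ATA attribute table from smartctl -A, where the ID is the first column and the raw value the tenth:

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print every
# one of attributes 187, 197 and 198 whose raw value is above 0.
failing_attrs() {
    # Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
    #          WHEN_FAILED RAW_VALUE
    awk '($1 == 187 || $1 == 197 || $1 == 198) && $10 + 0 > 0 {
        print $2 "=" $10
    }'
}

# Usage (run as root); no output means none of the three has fired:
#   smartctl -A /dev/sdc | failing_attrs
```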


My scripts

I found some wonderful scripts to use here:

https://forums.freenas.org/index.php?threads/scripts-to-report-smart-zpool-and-ups-status-hdd-cpu-t%C2%B0-hdd-identification-and-backup-the-config.27365/

I had to edit them a bit to suit my needs. Here is my script:

vim /somewhere/smart_report.sh
#!/bin/sh
 
### Parameters ###
logfile="/tmp/smart_report.tmp"
email="U@gmail.com"
subject="SMART Status Report for Vissie"
drives="sda sdb sdc sdd sde sdf sdg"
#drives="sdc"
tempWarn=40
tempCrit=45
sectorsCrit=10
testAgeWarn=1
warnSymbol="?"
critSymbol="!"
 
### Set email headers ###
(
    echo "To: ${email}"
    echo "Subject: ${subject}"
    echo "Content-Type: text/html"
    echo "MIME-Version: 1.0"
    printf '\r\n\n'  # blank line ends the headers; "echo -e" is not portable under /bin/sh
) > "$logfile"
 
### Set email body ###
echo "<pre style=\"font-size:14px\">" >> "$logfile"
 
###### summary ######
(
    echo ""
    echo "########## SMART status report summary for all drives ##########"
    echo ""
    echo "+------+-----------------+----+-----+-----+-----+-------+-------+--------+------+------+------+-------+----+"
    echo "|Device|Serial           |Temp|Power|Start|Spin |ReAlloc|Current|Offline |UDMA  |Seek  |High  |Command|Last|"
    echo "|      |                 |    |On   |Stop |Retry|Sectors|Pending|Uncorrec|CRC   |Errors|Fly   |Timeout|Test|"
    echo "|      |                 |    |Hours|Count|Count|       |Sectors|Sectors |Errors|      |Writes|Count  |Age |"
    echo "+------+-----------------+----+-----+-----+-----+-------+-------+--------+------+------+------+-------+----+"
) >> "$logfile"
for drive in $drives
do
    (
        smartctl -A -i -v 7,hex48 /dev/"$drive" -d sat| \
        awk -v device="$drive" -v tempWarn="$tempWarn" -v tempCrit="$tempCrit" -v sectorsCrit="$sectorsCrit" \
        -v testAgeWarn="$testAgeWarn" -v warnSymbol="$warnSymbol" -v critSymbol="$critSymbol" \
        -v lastTestHours="$(smartctl -l selftest /dev/"$drive" | grep "# 1" | awk '{print $9}')" '\
       /Serial Number:/{serial=$3} \
       /Temperature_Celsius/{temp=$10} \
       /Power_On_Hours/{onHours=$10} \
       /Start_Stop_Count/{startStop=$10} \
       /Spin_Retry_Count/{spinRetry=$10} \
       /Reallocated_Sector/{reAlloc=$10} \
       /Current_Pending_Sector/{pending=$10} \
       /Offline_Uncorrectable/{offlineUnc=$10} \
       /UDMA_CRC_Error_Count/{crcErrors=$10} \
       /Seek_Error_Rate/{seekErrors=("0x" substr($10,3,4));totalSeeks=("0x" substr($10,7))} \
       /High_Fly_Writes/{hiFlyWr=$10} \
       /Command_Timeout/{cmdTimeout=$10} \
       END {
           testAge=sprintf("%.0f", (onHours - lastTestHours) / 24);
           if (temp > tempCrit || reAlloc > sectorsCrit || pending > sectorsCrit || offlineUnc > sectorsCrit)
               device=device " " critSymbol;
           else if (temp > tempWarn || reAlloc > 0 || pending > 0 || offlineUnc > 0 || testAge > testAgeWarn)
               device=device " " warnSymbol;
           if (match(onHours, "h")) onHours= substr(onHours, 0, index(onHours, "h"));
           seekErrors=sprintf("%d", seekErrors);
           totalSeeks=sprintf("%d", totalSeeks);
           if (totalSeeks == "0") {
               seekErrors="N/A";
               totalSeeks="N/A";
           }
           if (hiFlyWr == "") hiFlyWr="N/A";
           if (cmdTimeout == "") cmdTimeout="N/A";
           printf "|%-6s|%-17s| %s |%5s|%5s|%5s|%7s|%7s|%8s|%6s|%6s|%6s|%7s|%4s|\n",
           device, serial, temp, onHours, startStop, spinRetry, reAlloc, pending, offlineUnc, \
           crcErrors, seekErrors, hiFlyWr, cmdTimeout, testAge;
       }'
    ) >> "$logfile"
done
(
    echo "+------+-----------------+----+-----+-----+-----+-------+-------+--------+------+------+------+-------+----+"
    echo ""
    echo ""
) >> "$logfile"
 
###### for each drive ######
for drive in $drives
do
    brand="$(smartctl -i /dev/"$drive" | grep "Model Family" | awk '{print $3, $4, $5}')"
    serial="$(smartctl -i /dev/"$drive" | grep "Serial Number" | awk '{print $3}')"
    (
        echo ""
        echo "########## SMART status report for ${drive} drive (${brand}: ${serial}) ##########"
        smartctl -H -A -l error /dev/"$drive"
        smartctl -l selftest /dev/"$drive" | grep "# 1 \|Num" | cut -c6-
        echo ""
        echo ""
    ) >> "$logfile"
done
#sed -i '' -e '/smartctl 6.3/d' "$logfile"
#sed -i '' -e '/Copyright/d' "$logfile"
#sed -i '' -e '/=== START OF READ/d' "$logfile"
#sed -i '' -e '/SMART Attributes Data/d' "$logfile"
#sed -i '' -e '/Vendor Specific SMART/d' "$logfile"
#sed -i '' -e '/SMART Error Log Version/d' "$logfile"
echo "</pre>" >> "$logfile"
 
### Send report ###
#sendmail -t < "$logfile"
#rm "$logfile"
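To try out the warning and critical thresholds without touching a real drive, the decision from the END block can be sketched as a standalone function. The name classify and the "ok" result are my additions (the script itself simply leaves the device name unmarked in that case); the thresholds are the ones set in the parameters above, and all arguments are assumed to be plain integers:

```shell
#!/bin/sh
# Same thresholds as in smart_report.sh
tempWarn=40
tempCrit=45
sectorsCrit=10
testAgeWarn=1

# Hypothetical helper mirroring the awk END block.
# Usage: classify TEMP REALLOC PENDING OFFLINE_UNC TEST_AGE_DAYS
classify() {
    if [ "$1" -gt "$tempCrit" ] || [ "$2" -gt "$sectorsCrit" ] ||
       [ "$3" -gt "$sectorsCrit" ] || [ "$4" -gt "$sectorsCrit" ]; then
        echo "!"     # critical
    elif [ "$1" -gt "$tempWarn" ] || [ "$2" -gt 0 ] || [ "$3" -gt 0 ] ||
         [ "$4" -gt 0 ] || [ "$5" -gt "$testAgeWarn" ]; then
        echo "?"     # warning
    else
        echo "ok"    # assumption: the original prints no symbol here
    fi
}

# Example: a 41 degree drive with no bad sectors and a fresh self-test
#   classify 41 0 0 0 0
```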