20 Şubat 2018 Salı

EMC DD / import sfs_dump data into Excel

Generate and capture the sfs_dump data:

In this section you will need to know how to connect to your Data Domain via a ssh client, and how to configure its ability to log any data which appears on the screen. This log file is not written on the Data Domain, it will be written to the system from which you are running the ssh client (perhaps your laptop). You should ensure there is sufficient space for the log file (which could be around 200 MB for systems with 1 million files).

1.     Login to the DDR from a ssh client
2.     Type system show serialno
3.     Type priv set se
4.     Authenticate using the serial number from Step 2. If your system uses retention-lock compliance then you will need to authenticate using a user with the "security" role first.
5.     Switch on logging within your ssh client, ensure "all session output" is enabled, the output file is named logically and will be placed in a suitable directory where you can find it. For PuTTY, this is under the Session -> Logging tab of the configuration screen. You may need to check the manual if you are using a different ssh client.
6.     Type se sfs_dump -c (you can specify a path if you want to limit to a particular dir or mtree, eg se sfs_dump -c /data/col1/mymtree.)
7.     When the command has completed, switch off logging. If Excel is not installed on the same machine as the log file just created then you will need to copy it across.

Import into Excel:
In this section the previously created log file will be opened inside Excel, and the data imported into a spreadsheet. Excel's "Text Import Wizard" will be configured to split the data into individual columns. Any redundant information above the actual data will be removed.

8.     Start Excel
9.     Open the file created by your ssh client's logging, using File -> Open. You may need to set the dialog to show "All files", depending on the suffix of the log file you created.
10.   Excel will start the "Text Import Wizard"
11.   Choose "Delimited" to describe the data
12.   Scroll through the preview, looking for a line which starts with "namemtimefileidsizeseg_bytes" and note the row number. This line is the header for the sfs_dump data and is the first line we wish to import. Set the "Start import at row" value to this number.
13.   Click "Next"
14.   Ensure the "Tab" delimiter box is ticked and the others are unticked, check the preview to see filenames in the first column, and numbers in the remaining columns. Note: You may need to scroll down a little as the preview starts at the top of the file, rather than the header line set in Step 12. If the preview does not show filenames in the first column, and numbers in the remaining columns then you should click "Back" and follow Steps 11-14 again. If it still doesn't appear correct then you will need to restart from Step 1 - paying particular attention to ensuring that "all session output" is enabled. This is essential to allow the tab characters used in the sfs_dump output to be saved in the log file.
15.   Click "Next"
16.   Click "Finish"
17.   At this point, Excel should contain data in columns A to I. If data is only present in column A, but stretches across the screen, then close Excel and retry from Step 8. If you still see the same result, re-read Step 14 and then restart from Step 1.

Add extra calculated fields:
In this section, extra fields will be added to show various compression values, and to convert the timestamp data into a human readable value. We will also remove any redundant information below the data. When you first enter the formulae in the following steps, the cell will display "#VALUE!" - this is normal and expected.

Firstly, the timestamp conversion.

18.   Copy the following formula and paste into cell J1: =(((B1/1000000000+(-6*3600))/86400)+25569)

a)     Change the -6 in the -6*3600 section to the offset between your timezone and UTC, ie -6 for Central, -5 for Eastern, -8 for Pacific. For timezones East of UTC (ie mainland Europe, India) the offset will be positive (ie 1 for Central Europe). If your timezone has a half hour difference, make sure you include .5 in the value you use.

b)     Secondly, the compression information.  

19.   Copy the following formula and paste into cell K1: =IF(H1=0,0,D1/H1)
20.   Copy the following formula and paste into cell L1: =IF(I1=0,0,H1/I1)
21.   Copy the following formula and paste into cell M1: =K1*L1
22.   Select cell J1, right click, then "format cell", choose your preferred format from the Date selection.
23.   Select cells K1,L1,M1, right click, then "format cell", set to Number with one decimal place

Next, we are going to copy the formulae we have just added, to every row of data, using some useful shortcuts.

24.   Select cells J1,K1,L1,M1, right click, select copy
25.   Scroll down to the bottom of the data using the vertical scrollbar on the right (which is quicker than using the arrow keys.)  Look for the last row which contains data in columns A through to I. Take a note of the row number. There should be a blank row below this one, followed by some summary information.  
26.   Select the cells in columns J,K,L,M in the row for which you noted its row number, then scroll back to the top of the sheet (drag the vertical slider to save time), hold down shift and select cells J2,K2,L2,M2. Now press enter so that the values are pasted all the way down. Now we are going to remove any redundant rows below the data.
27.   Return to the bottom of the data again, to the row number recorded in step 25.
28.   Select any entire row underneath this row which contains data, and delete it.

Add remaining headers and filtering functionality:
In this section, we will name the calculated columns we added, and enable Excel's powerful filtering and sorting ability.

29.   Go to cell J1, change this to read "Last written date"
30.   Change cell K1 to read "Global compression"
31.   Change cell L1 to read "Local compression"
32.   Change cell M1 to read "Overall compression"
33.   Double click on the divider between the headers for column A and B to resize column A to the widest filename.
34.   Select cell A1.  Select "Format as Table" from the "Home" ribbon menu. Click on your preferred style - any from the 2nd line under the "Medium" heading have been found to be easy to read. Ensure "My table has headers" is ticked, then click "OK" in the "Format as Table" dialog.
35.   "Save as", ensuring the type is set to "Excel Workbook (*.xlsx) 

Information on the columns:

·         A contains filename
·         D is the original size of the file
·         H is the size after global compression
·         I is the size after global and local compression
·         J contains the datestamp for when the file was last written..
·         K contains the global compression or dedupe
·         L contains the local compression
·         M contains the overall compression for the file

All filesizes are in bytes.

You can now use Excel to filter/sort as required to report on your data.
·         For example, to show only files within the MTree "/data/col1/backup" which are over 30 days old:
·         Click on the down arrow in the "Name" header, select "Text Filters", then "Begins With" and type "/data/col1/backup/" into the box. The trailing slash is important.  Click "OK"
·         Click on the down arrow in the "Last Written Date" Header", select "Date Filters", then "Before".  Use the date picker on the right of the dialog to select a date 30 days ago. Click "OK"
·         The status bar at the bottom will show how many rows match this selection.
Notes:
This procedure relies on the Data Domain's sfs_dump command. The information returned from this command includes a timestamp of the file (in a non-human readable format) and internal information related to deduplication. This information is only stored when then the file is first written to the Data Domain, hence it tends to become less accurate as other data is written to/deleted from the Data Domain.

Please note this particular procedure uses the "-c" option of sfs_dump, which outputs in columns which are separated by tabs. It is essential that the tab characters present in the output are saved in the logging file, otherwise Excel will not be able to separate the fields correctly. There are other output formats for sfs_dump, and EMC Support may have requested these in other situations.

The instructions given assume a reasonable knowledge of Excel, but may be translated into other spreadsheet software.
Modern versions of Excel allow a large number of rows (bear in mind that each file on your system will require a row in the spreadsheet.)
If your system has too many files for your version of Excel you can try running the procedure once for each Mtree, so you will have multiple spreadsheets for the system.


How to import sfs_dump data into Excel?

Windows Server IIS üzerinden hizmet veren bir FTP servisine erişmek istediğimizde;  Internet Explorer, FileZilla vb. uygulamalar ile erişim ...