Online File Server Archiving with PowerShell

Our fileserver was running low on space and still growing at about 80GB/month. Rather than throw more storage at it, which would only buy us another few months before we had to do it again, I figured I'd look for something that would archive unused data to secondary storage, without rendering it unreachable.

tl;dr

There are a lot of options to do this, but they're all expensive, require extra hardware beyond the storage required, require additional software, or essentially offline the data. What I want is something that's going to be: essentially transparent to users; free as in beer; free as in speech; built-in; and easy to maintain.

Users should ideally see no difference between primary and archival storage. We don't want to create additional work for the sysadmins by forcing them to bring offlined files back online whenever someone needs them, nor do we want to train users on how to get at files that have been offlined.

We're a non-profit, so spending money we don't have to is a bad thing. I know this is true in the for-profit world as well, but a for-profit company of our size would have people who make half of our annual operating budget. Free as in beer is always best.

Despite our being primarily a Microsoft shop, I am a big proponent of FOSS. Where possible, and where it's not specialized to the point of unportability, I try to post all the code we use in house online and write a thing about it. Keeping that in mind, we need to install some publicly available PowerShell modules to do a few things. Namely, we need this module to be able to create symlinks, and the PSCX extensions to be able to hash the files we're modifying.
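
For what it's worth, if you happen to be on a newer PowerShell than the script below targets, both jobs have built-in equivalents: Get-FileHash arrived with PowerShell 4.0, and New-Item -ItemType SymbolicLink with 5.0. The following is only a minimal sketch under that assumption, not the approach the script takes. The helper name Test-SameHash and the throwaway temp files are made up for illustration.

```powershell
# Hypothetical helper (not part of the original script): compares two files
# by SHA256 using the built-in Get-FileHash cmdlet (PowerShell 4.0+).
function Test-SameHash {
    param([string]$PathA, [string]$PathB)
    $a = Get-FileHash -LiteralPath $PathA -Algorithm SHA256
    $b = Get-FileHash -LiteralPath $PathB -Algorithm SHA256
    return $a.Hash -eq $b.Hash
}

# Demo with throwaway files in the temp directory (made-up names).
$work = Join-Path ([System.IO.Path]::GetTempPath()) ("hashdemo_" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $work | Out-Null
$src = Join-Path $work 'source.txt'
$dst = Join-Path $work 'archived.txt'
Set-Content -LiteralPath $src -Value 'some old data'
Copy-Item -LiteralPath $src -Destination $dst

# True when the copy is byte-for-byte identical to the source.
$same = Test-SameHash $src $dst

# The link step itself is left commented out because creating symlinks on
# Windows normally requires an elevated session (PowerShell 5.0+ syntax):
#   Remove-Item -LiteralPath $src
#   New-Item -ItemType SymbolicLink -Path $src -Target $dst

# Clean up the demo files.
Remove-Item -LiteralPath $work -Recurse -Force
```

The script below sticks with PSCX and the symlink module because it only requires PowerShell 2.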

We want to stick with something built in to Windows because Windows' legacy support means we'll likely be able to keep this implementation for a long time without too many revisions. Additionally, we don't have a big team of specialized IT people here, so hiring talent for one specialty is no good.

Easy to maintain is relative, but what I mean is that we're not going to need to outsource support for it. I'd like something that doesn't need significant changes every time a server changes or is moved, or we change where the archives sit, which files we're trying to archive, whatever. Ideally it'll also be write once, run wherever, so it should be flexible enough to use on every local file server across the domain.

Running this script alone will allow people who are logged on locally to the file server to access the links, but by default Windows won't follow remote to remote (r2r) symlinks, so you'll get an error when trying to access any of the newly archived files from a client computer. To get around this, tell Windows to follow the links by running this at an elevated command prompt:

fsutil behavior set SymlinkEvaluation r2r:1

You can then check that it worked by running:

fsutil behavior query SymlinkEvaluation
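
If you'd rather check this from a script than by eye, the query output can be parsed. This is a rough sketch that assumes the English-language output, where each line reads something like "Remote to remote symbolic links are enabled."; localized builds of Windows will print something different. The function name Test-R2REnabled is made up here, and the sample text stands in for the live command so the sketch runs anywhere.

```powershell
# Hypothetical helper: returns $true if the given fsutil output reports
# remote-to-remote symlink evaluation as enabled.
# Assumes English output lines such as:
#   "Remote to remote symbolic links are disabled."
function Test-R2REnabled {
    param([string[]]$QueryOutput)
    foreach ($line in $QueryOutput) {
        if ($line -match '^\s*Remote to remote symbolic links are (enabled|disabled)') {
            return $Matches[1] -eq 'enabled'
        }
    }
    # Line not found: report "not enabled" rather than guessing.
    return $false
}

# On a real machine you would feed it the live output:
#   Test-R2REnabled (fsutil behavior query SymlinkEvaluation)
# Sample text (assumed format) used here instead.
$sample = @(
    'Local to local symbolic links are enabled.',
    'Local to remote symbolic links are enabled.',
    'Remote to local symbolic links are disabled.',
    'Remote to remote symbolic links are enabled.'
)
$r2r = Test-R2REnabled $sample
```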

You can set this on a per-computer basis if needed, or you can push it out via GPO: go to Computer Configuration > Policies > Administrative Templates > System > File System > Selectively allow the evaluation of a symbolic link, then set the Remote to Remote option to Enabled.

The original script is below.

One that will be updated at some point can be found here

#requires -version 2
<#
.SYNOPSIS
	Archives files that have not been modified in 4 or more years: copies them to secondary storage, then replaces each original with a symlink
.DESCRIPTION
	Scans a particular directory, and all subdirectories for files that haven't been modified in 4 years. Any detected 
	files are moved to a secondary storage location, and a symlink is created to the new file location.
	
.NOTES
	Version:		1.0
	Author:			Robbie Crash
	Written: 		2014-02-13
	Version Notes:	Initial script.
	
	REQUIRES: 		This script requires that you set SymlinkEvaluation to allow remote traversal. You can do this
					by opening an admin command prompt and running the following command:
						fsutil behavior set SymlinkEvaluation r2r:1
					You can verify this worked by running:
						fsutil behavior query SymlinkEvaluation
					This behaviour can also be set via GPO:
						Computer Configuration > Policies > Administrative Templates > System > File System >
							Selectively allow the evaluation of a symbolic link
						Then set Remote to Remote to Enabled.						
	
.LINK
	https://robbiecrash.me/scriptz/ArchiveAndLink.ps1
#>

param(
    [string]$Dir = "",
    [string]$ArchiveDrive = ""
    )

if ($ArchiveDrive -eq ""){
    $hostname = hostname
    $ArchiveDrive = "\\Archives\"+$hostname+"\"+$Dir[0]+"\"
    }

import-module PSCX
import-module new-symlink

$FileList = @()

$SourceDrive = $dir[0] + ":\"

$date = Get-Date -Format yyyy-MM-dd
$ErrLog = "C:\ErrorLog $date.txt"
$DelLog = "C:\DelLog $date.txt"
$PathWarning = "C:\_PROBLEMS DETECTED.txt" 

function BuildLists($dir){
    $FileList = @()
    $DirList = (dir $dir -recurse)
    foreach ($item in $DirList){
        # 1460 days is roughly 4 years (4 x 365).
        if ((get-date).Subtract($item.LastWriteTime).Days -gt 1460) {
            $FileList += $item
            }
        else {write-verbose "$item is modified recently"}
        }
    return $FileList
    }

function CheckPathLength($file){
    if ($File.FullName.Length -ge 220){
        copy $PathWarning $File.DirectoryName}
    }

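function ArchiveFile($SourceFile){
    # Mirror the source path under the archive share.
    $DestFile = ($SourceFile.fullname.replace($SourceDrive, $ArchiveDrive))
    $DestDir = ($SourceFile.DirectoryName.replace($SourceDrive, $ArchiveDrive))
    # Only create the directory if it's missing, and append (2>>) so that
    # earlier entries in the error log aren't overwritten.
    if (-not (Test-Path $DestDir)){
        mkdir -Path $DestDir 2>>$ErrLog
        }
    copy $SourceFile.FullName $DestFile
    }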

function HashCheckFile($SourceFile){
    $DestFile = $SourceFile.FullName.replace($SourceDrive, $ArchiveDrive)
    $SourceHash = get-hash($SourceFile.fullname)
    $DestHash = get-hash("$DestFile")
    return $SourceHash.HashString -eq $DestHash.HashString
    }

function DeleteFile($File){
    del $file.fullname 
    }

function LinkFile($SourceFile){
    $SourceFilePath = $SourceFile.FullName
    $DestFile = ($SourceFilePath.replace($SourceDrive, $ArchiveDrive))
    # Append (>>) rather than overwrite, so the log keeps output from earlier files.
    New-Symlink -path $DestFile $SourceFile.fullname -file >>$ErrLog
    }

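function ReplicateFile($File){
    # Skip directories; only files are archived. PSIsContainer also catches
    # directories that carry extra attributes (ReadOnly, Hidden, etc.).
    if ($File.PSIsContainer){return}
    ArchiveFile $File
    # Only delete and link once the archived copy hashes identically.
    if (HashCheckFile $File){
        DeleteFile $File
        LinkFile $File
        }
    }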

function Archive($FileList){
    foreach ($File in $FileList){
        CheckPathLength($file)
        ReplicateFile($File)
        }
    }

function RunArchiving($dir){
    # @() guarantees an array, even when zero or one file is returned.
    $FileList = @(BuildLists $dir)
    Archive $FileList
    }

RunArchiving($dir)
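
To make the path handling in the script concrete: it swaps the source drive root for the archive share root, so the folder structure under the share mirrors the original drive. Here is a standalone sketch of that mapping. The server name FS01, the Finance paths, and the function name Get-ArchivePath are all made up for illustration; only the \\Archives\<hostname>\<drive letter>\ layout comes from the script.

```powershell
# Hypothetical helper mirroring the script's mapping: replace the source drive
# root (e.g. "D:\") with the archive root (e.g. "\\Archives\FS01\D\").
function Get-ArchivePath {
    param([string]$FullName, [string]$SourceDrive, [string]$ArchiveDrive)
    # String.Replace is case-sensitive, so a drive letter with different
    # casing in $FullName would be left untouched.
    return $FullName.Replace($SourceDrive, $ArchiveDrive)
}

$Dir          = 'D:\Shares\Finance'      # made-up directory to archive
$HostName     = 'FS01'                   # made-up file server name
$DriveLetter  = $Dir.Substring(0, 1)     # "D"
$SourceDrive  = $DriveLetter + ':\'      # "D:\"
$ArchiveDrive = '\\Archives\' + $HostName + '\' + $DriveLetter + '\'

$mapped = Get-ArchivePath 'D:\Shares\Finance\2009\budget.xls' $SourceDrive $ArchiveDrive
# $mapped is now \\Archives\FS01\D\Shares\Finance\2009\budget.xls
```

Because the default archive root is built from the hostname and drive letter, the same script can run unchanged on each file server without the archives colliding.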