Monday, May 21, 2012

Enterprise Wide PST Import - Using RoboCopy

This is Part 8 in a series of posts about my experience tackling the migration of PST files.
The first post in the series is here.
The next post in this series is here.

Our users are world-wide, and even though we are concentrating on our Headquarters, many other users are seeing the benefit of importing their PST files, especially those who travel a lot. Nearly every one of these heavy travelers carries an external hard drive with their PST files. Now they can get those messages via OWA and discard the extra weight of the external hard drive.

We started getting many requests for Archive mailboxes from all over the world. The challenge is copying the PST file to our staging area so it can be processed by the server.

Alas, Copy-Item seems to be really slow when working over a WAN link, and many of our pipes can get saturated. I started experimenting with Robocopy, and while I am not entirely happy with the results, it's better than Copy-Item.

I created a small standalone script (RoboCopy-Item) that does a very simple copy of a file. I am experimenting with settings like /IPG:300, /Z, etc. -- trying to find the best overall throughput for our environment: fast, but not choking the WAN. I am still experimenting there.
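A minimal sketch of what such a wrapper might look like. The switch values shown are starting points to tune, not the settings we settled on, and the parameter names are assumptions:

```powershell
# RoboCopy-Item.ps1 -- hypothetical sketch of a simple robocopy wrapper.
# /Z = restartable mode; /IPG = inter-packet gap in ms, to throttle WAN use;
# /R and /W = retry count and wait between retries; /NP = no progress output.
param (
    [Parameter(Mandatory=$true)][string]$SourceFile,
    [Parameter(Mandatory=$true)][string]$DestDir,
    [int]$IPG = 300
)

$srcDir = Split-Path $SourceFile -Parent
$file   = Split-Path $SourceFile -Leaf

# robocopy takes the source dir, the dest dir, then the file name
robocopy $srcDir $DestDir $file /Z /IPG:$IPG /R:3 /W:30 /NP

# robocopy exit codes 0-7 are success variants; 8 and above mean failure
if ($LASTEXITCODE -ge 8) { Write-Warning "RoboCopy failed for $SourceFile" }
```

Raising /IPG slows the copy but leaves more of the pipe for everyone else, which is the trade-off being tuned here.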

In the Add-PSTImportQueue function we now look to see if a user is in HQ or outside somewhere. If they are outside HQ, we mark the job with a status of RoboCopy.

Then we use Robocopy-PSTImportQueue, which looks at the import queue and starts a background job for each job with a RoboCopy status. I do this one user, one location at a time. We don't want 50 jobs running for 30 users in one location, so I keep it at 1 user per location at any given time.

Then we use Get-RoboCopyJob <jobnumber>, which just gets the first 10 lines and the last 20 lines of a job's output, so you can quickly see the status of the job.
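A sketch of what that peek might look like, assuming the RoboCopy jobs were started with Start-Job so their robocopy output accumulates in the job:

```powershell
# Get-RoboCopyJob -- hypothetical sketch: peek at a background job's output.
# The first 10 lines show the robocopy header (source, dest, options);
# the last 20 lines show the most recent progress.
Function Get-RoboCopyJob ($JobNumber) {
    # -Keep leaves the output in the job so you can peek again later
    $output = Receive-Job -Id $JobNumber -Keep
    $output | Select-Object -First 10
    '...'
    $output | Select-Object -Last 20
}
```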

We toyed with the idea of incorporating this into the overall script so we could just run it, but too many things can go wrong: the PowerShell window with all the jobs running can get closed, or the server rebooted, etc.

We want to find a better way to do this.

Still, bigger files -- over 1.5 GB -- are taking forever. We are using BITS to control the amount of traffic allowed, so we have to go modify the Registry and restart the BITS service. But copies from computers with that setting fill up the pipe to that location.

I'll update as we search for the better way.



Introduction: The Beginnings
Part 1: Script Requirements
Part 2: Add-PSTImportQueue
Part 3: Process-PSTImportQueue
Part 4: Some Tools we added
Part 5: Set-PSTImportQueue
Part 6: About PST Capture
Part 7: More PST Import Tools
Part 8: Using RoboCopy
Part 9: Morning Status Report
Part 10: Using BITS Transfer
Part 11: Get the script / Set up
Part 12: The Functions





Tuesday, May 8, 2012

More PST Import Utils - Get-ImportStatus & Lock-File

This is Part 7 in a series of posts about my experience tackling the migration of PST files.
The first post in the series is here.
The next post in this series is here.

Quicker Stats
We found the MailboxImport queue needs a little tender care from time to time. We would always run this command to get a quick understanding of what was going on in the queue:

Get-MailboxImportRequest | Get-MailboxImportRequestStatistics

Sometimes we needed :

Get-MailboxImportRequest -Status Failed | Resume-MailboxImportRequest

I really got tired of typing all that out all the time, so I created a short function for myself.

Function Get-ImportStatus {
    #---------------------------------------------------------
    # A helper function to display mailbox import info.
    # Option to show only a subset -- by batch name --
    #     sometimes the list is just long.
    # Option to restart failed jobs --
    #     sometimes jobs failed because the service crashed on a bad PST file,
    #     or too many jobs for one mailbox.
    # Option to restart suspended jobs --
    #     the script can suspend jobs it thinks may be causing issues.
    # Option to suspend all jobs --
    #     a single PST can crash the MB replication service and all jobs
    #     start over. You can't tell exactly which is the culprit, so
    #     suspend all jobs and investigate.

    # Left the -Confirm off on purpose.

    param (
        $Batch = $null,
        [switch]$RestartFailed,
        [switch]$RestartSuspended,
        [switch]$SuspendAll
    )
    If ($Batch) {
        Get-MailboxImportRequest -BatchName $Batch |
        Get-MailboxImportRequestStatistics
    }
    ElseIf ($RestartFailed.IsPresent) {
        Get-MailboxImportRequest -Status Failed |
        Resume-MailboxImportRequest
    }
    ElseIf ($RestartSuspended.IsPresent) {
        Get-MailboxImportRequest -Status Suspended |
        Resume-MailboxImportRequest
    }
    ElseIf ($SuspendAll.IsPresent) {
        Get-MailboxImportRequest |
        Suspend-MailboxImportRequest
    }
    Else {
        Get-MailboxImportRequest | Sort Name |
        Get-MailboxImportRequestStatistics
    }
}

Stepping all over each other
You know how it is: everyone gets busy, stops checking with others about what's going on, and crap happens. The way we were handling the queue files became an issue. If two people ran the script, the last one to write was the winner. I noticed this when I tried to schedule a task and the queue was trashed. (Lots of manual fixing up there.) And there were a few other small disasters, too.

I remembered an old way to make sure one process did not step on the other: create a "lock" file when you start your work and then delete the "lock" file when you're done.
This isn't exactly elegant, but it works just fine. We're just doing this with a zero-length file.

Simply check for the existence of the file (Test-PSTIQLock) and, if false, lock the file (Lock-PSTIQ), process the queue, and then remove the lock (Unlock-PSTIQ).
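A sketch of the trio, assuming the lock lives next to the queue file (the path here is made up):

```powershell
# Hypothetical sketch of the lock-file functions; the $LockFile path is
# an assumption -- in practice it would sit next to the queue file.
$LockFile = '\\server\PSTStaging\PSTImportQueue.lock'

Function Test-PSTIQLock { Test-Path $LockFile }

Function Lock-PSTIQ {
    # a zero-length file marks the queue as in use
    New-Item -Path $LockFile -ItemType File -Force | Out-Null
}

Function Unlock-PSTIQ { Remove-Item $LockFile -ErrorAction SilentlyContinue }

# Typical use: skip the run entirely if someone else holds the lock
If (-not (Test-PSTIQLock)) {
    Lock-PSTIQ
    try     { <# process the queue here #> }
    finally { Unlock-PSTIQ }   # release even if processing throws
}
```

The try/finally keeps a crashed run from leaving a stale lock behind, which is the classic failure mode of this pattern.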







Tuesday, April 10, 2012

Enterprise Wide PST Import -- PST Capture

This is Part 6 in a series of posts about my experience tackling the migration of PST files.
The first post in the series is here.
The next post in the series is here.

When the Exchange Team at Microsoft posted about PST Capture in July of 2011, I was very excited. That post was really the catalyst that got us thinking we could really import PSTs. We started putting our infrastructure together to handle all the PSTs floating out in the wild. The future looked bright.

And we waited...

During our waiting period we had our own crisis or two that propelled us into the PST import business. By the time PST Capture was officially out in Jan of 2012, we were already fully functional with our scripts and we saw no compelling reason to change.

Still, I saw value. We have many users in the field who don't have Home Shares at Headquarters and were saving their PST files locally. We needed to get those PSTs as well, eventually. So I downloaded the PST Capture tool and did some testing.

I was sad to find out that to do discovery on a PC, you had to have an agent installed. We still have that same issue with delivering an EXE to all the PCs in the world, so that was out. I tried it on a few PCs, thinking we could do this from time to time, but I could not get it to work. I didn't go deep into troubleshooting; I just assumed it was a firewall issue. That would be a nightmare to get opened!

But I did try using the UNC file path, and that was working -- until I found out it did not work on some clients. It seemed to fail on exactly the clients we needed it to work on, like Office 2007.

There is also no reporting feature with PST Capture. True, we could still use the same reports we use now, but we would lose a few statistics, like how many files were processed and skipped.

We stopped testing.

So if you're wondering why we went to all the trouble to write this family of scripts to import PSTs, instead of using PST Capture, now you know the rest of the story.



Tuesday, March 27, 2012

Enterprise Wide PST Import -- Set-PSTImportQueue

This is Part 5 in a series of posts about my experience tackling the migration of PST files.
The first post in the series is here.
The next post in the series is here.

As we progressed with importing PST files and processing more and more users, we found we were having to repair PST files or mark them to be skipped manually. Some users were saying: please wait, don't process me yet, wait two days.
This was becoming more of a pain than anything, so I sat down and decided to do something about it.

Right about this same time, I had decided we had too many little scripts scattered everywhere, and I wanted them all in one easily accessible place. All of these scripts shared some functions, so I created a Module and started to migrate all the scripts there. Well, it's not a real Module, but more like a repository for all the functions and scripts we used with the PST migrations.
Then I added two new functions: Set-PSTImportQueue and Remove-PSTImportQueue
(I've only used remove once, when a user decided to not have their PST imported.)

Set-PSTImportQueue is just a way for me to change settings on an Import Job without loading up Excel and making mistakes. Here are the options:

  • -DisplayName -- The person we are working on. This allows you to work on all the jobs associated with this name. You use it with the other options to change settings, like -JobStatus, etc.
  • -JobName -- Isolate this update to a particular job.
  • -JobStatus -- Change the JobStatus -- reset back to New, etc. Sometimes it's useful to change this status to something the script doesn't recognize, just to skip a job or set of jobs.
  • -IP -- This is the computer name. Sometimes you may not have known it during the add; this is just a way to add the computer name to the jobs. We use this entry later when moving PST files to the local PC.
  • -OrgUNCName -- There are cases where the user moved the file; rather than doing a new discovery, just change the location.
  • -MRServer -- In our 2-AD-site world, having the wrong MR server setting can make the jobs just sit in the queue. We have dedicated CAS servers for this process, one in each site. If the Archive database is in Site 1 and an MRServer in Site 2 gets chosen by mistake, or the database is moved, you need to reset the job status and change this to an MRServer in Site 1.
  • -SkipReason -- A place to log why a PST file was skipped: "Age, Size, Backup, Sharepoint List, Corrupt, Missing, etc." This shows up on the Final Report.
  • -ClientVer -- A place to log the client version, mainly for records and reports.
  • -ClientVerOK -- This is true or false. By default Process-PSTImportQueue will not process jobs that have ClientVerOK set to false; it just skips them. Sometimes setting this to false on all jobs for a user lets you skip that user for now.
  • -ProcessFileOff -- As it sounds, changes ProcessJob to $false -- skipped jobs are set to false. You might want to set the SkipReason at the same time.
  • -ProcessFileOn -- As it sounds, changes ProcessJob to $true.
  • -CompleteQueueFile -- The PSTCompleteQueue file needs maintenance from time to time, so this setting allows you to work on that file. The default is the PSTImportQueue.
Remove-PSTImportQueue takes just two options:
  • -DisplayName -- Removes all jobs with this user's name.
  • -JobName -- Removes this particular job.
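A few illustrative calls assembled from the option list above. The job names and the UNC path are made up for the example:

```powershell
# Reset every job for a user back to New
Set-PSTImportQueue -DisplayName 'Thompson, Daniel E.' -JobStatus New

# Mark one job as skipped and record why (job name is hypothetical)
Set-PSTImportQueue -JobName 'ThompsonDE-03' -ProcessFileOff -SkipReason 'Corrupt'

# The user moved the file: point the job at the new location
# instead of re-running discovery (path is hypothetical)
Set-PSTImportQueue -JobName 'ThompsonDE-01' -OrgUNCName '\\server\home\dthompson\old-mail.pst'

# Remove everything for a user who opted out
Remove-PSTImportQueue -DisplayName 'Thompson, Daniel E.'
```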

Tuesday, March 13, 2012

Enterprise Wide PST Import -- Some Tools we added

This is Part 4 in a series of posts about my experience tackling the migration of PST files.
The first post in the series is here.
The next post in the series is here.


Tools To Help Do the Job 


When this project started we were mandated to import over 23,000 PST files into mailboxes and get the PST files off the home shares. The important part here is "get the space cleared up off the home shares."

So each week we are given a list of names of people moved to the Windows 7 OS and Office 2010. The list can have 20 names or 120 names. So we process them. In many cases there can be upwards of 50 PST files per person. One user had over 500. (I know! I had to double check.)

So the queue can be fat, with over 1,500 items at times. We found that sometimes the processing of the queue was very slow, mostly due to one person having a lot of files, most of them big.


Optimized For Speed

We needed to optimize the queue. We talked about just sorting by the size of the file, then decided that would make some people with only one file wait too long to be finished. We wanted to finish as many people as possible in the night, so all the people with just one PST file needed to be moved to the front of the line.

Pipe the queue to group by DisplayName, sort by count, then recreate the queue from the smallest to the largest. Quick and easy. This usually gets 80-90% of the users done overnight. We start about 7 PM.
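That reorder is a short pipeline. A sketch, assuming the queue is a CSV with a DisplayName column (the path is made up):

```powershell
# Reorder the queue so users with the fewest PST files go first.
$QueuePath = '\\server\PSTStaging\PSTImportQueue.csv'   # hypothetical path

Import-Csv $QueuePath |
    Group-Object DisplayName |      # one group per user
    Sort-Object Count |             # fewest files first
    ForEach-Object { $_.Group } |   # flatten groups back into job entries
    Export-Csv $QueuePath -NoTypeInformation
```

Single-file users finish on the first pass, so most people are done long before the 500-file outliers.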

Move the PST files off.

We really struggled to get a good handle on this. People are pretty freaky about their PST files. In our final report, we ask them to disconnect the PSTs from their client and then move them locally. But a large majority can't do that without help, or don't care. Many never read the message at all.

We had to figure out how to get the PST files off the Home shares and do this without freaking out the user.

Finally we agreed on a place to put the files: the user's "My Documents" directory on their C drive. It's not the most perfect place, but it is somewhat secure -- at least secure enough from the common user. We had to know the PC name and what OS it was, just to find the correct place quickly. We also needed to know that the user is in our HQ building; a small percentage of users with Archive mailboxes are outside our HQ office, and we wanted to skip those.

We agreed the PST files needed to be disconnected for 30 days. This was long enough, everyone thought, for the user to forget they had them. If a PST file has a LastWrite timestamp of "right now," that PST is most likely open and connected to Outlook.

So:
If you have PST files on your home share,
And you are in the HQ office,
And those PST files' last write timestamps are at least 30 days old,
We move the files for you to ..\Documents\Outlook Files\
If there is a file there with that name already, we just add a random number to the name and keep going.
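The rules above can be sketched in a few lines. The variables ($HomeShare, $ComputerName, $UserName) are assumptions standing in for values discovered earlier:

```powershell
# Sketch of the move rule: PSTs untouched for 30 days move to the
# user's local Documents\Outlook Files, renaming on name collisions.
$cutoff  = (Get-Date).AddDays(-30)
$destDir = "\\$ComputerName\c`$\Users\$UserName\Documents\Outlook Files"

Get-ChildItem $HomeShare -Recurse -Filter *.pst |
    Where-Object { $_.LastWriteTime -lt $cutoff } |   # 30 days = not in use
    ForEach-Object {
        $dest = Join-Path $destDir $_.Name
        if (Test-Path $dest) {
            # name collision: tack a random number onto the file name
            $dest = Join-Path $destDir ("{0}-{1}.pst" -f $_.BaseName, (Get-Random))
        }
        Move-Item $_.FullName $dest
    }
```

The HQ check would gate this whole block; it is omitted here since it depends on how location is tracked in the queue.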

Grading Our Progress

Since moving the PSTs off the Home shares is the real goal, we started to keep track of all the PST files, with creation dates and removal dates.

As of this date, we had over 23,000 files to move, and we moved about 6,000. Not too bad. Not fantastic, but good.  I'll post some info on our reports later on.


But there is more work to do...

I am still constantly opening the PST Import Queue in a spreadsheet, modifying it, and saving it again. I am human and make many mistakes. I must figure out how to do that more easily...
Next time: "Set-PSTImportQueue"



Tuesday, February 28, 2012

Enterprise Wide PST Import -- Process-PSTImportQueue

This is Part 3 in a series of posts about my experience tackling the migration of PST files.
The first post in the series is here.
The next post in the series is here.


With PST file entries added to the queue, all that's left is to process them.


As we imported PST files for our pilot group using this script, we learned a few things, of course! People really do come in two types when filing messages: Filers and Pilers. But more importantly, everyone saw their PST file as a "Folder" with messages in those folders. Many did not realize it was a file.

Initially we were going to be smart and import all the PSTs into the root of the Archive Mailbox, and thus get some illusion of single-instance storage by not allowing duplicates. But our test cases hated the "lumped all together into one big mess" result this created. They wanted the extra "Folder" that the PST created.

So we adapted by importing each PST file as its own folder, named with the PST file name without the trailing ".pst" -- they loved it. Since this was all about user acceptance and embracing the new technology, the "folder per PST file" became the default setting.

So onto processing.

When the script opens the queue file, it just loops through each entry and decides what work needs to be done for that entry. The script does what it can and then moves on. So a PST file will be copied on one pass, then imported on the next, removed afterward, and so on...
So you must run the Process script over and over. But you must also allow time for the Mailbox Replication queues to do their job. I do this with a do..while loop...
(After lots of trial and error, we found that 15 minutes is plenty long enough to wait and gets the imports done the quickest.)

do { Process-PSTImportQueue ; "`n" ; get-date -f 'yyyyMMdd HHmm' ; "`n" ; sleep -seconds 900} while ($true)

I added the timestamp in the middle so that when I wake up in the morning and check on progress, I can tell if the script has hung or whatever.

The importing of a PST file has several states:
  • New -- This PST file entry was just added to the queue and no work has been done. When the script works on a job in this state, it tries to copy the file to the staging area. If the copy fails it just logs the error and moves on to the next entry in the queue.
  • Copied -- The PST file was successfully copied, so start the import into the mailbox.
  • InQueue -- The PST is being processed by the mailbox replication service; check on status and see if it is complete, failed, or still processing.
  • InBITS -- In this state the file is being copied using BITS Transfer.
  • Suspended -- Some PST files can be corrupted. They can make the Mailbox Replication service restart. When that happens, all the mailbox imports will restart and revert to zero percent complete. A job will be suspended if we find that on the last run it was at 30% and now it's at 5% (meaning it restarted). It's also possible for an admin to suspend the mailbox import jobs, so we allow for that... the jobs stay suspended until an admin restarts them. A corrupt PST file might cause all the active jobs to "look bad by association" -- the admin can restart the jobs one by one until he finds the hidden culprit and deals with it. For example, he might repair the PST file, or remove it from the queue by setting the job to skip this file with a reason of "Corrupt".
  • Failed -- In this case the Mailbox Replication service deemed the job failed. The script looks at the reason why it failed. There is one particular case where we know the file is corrupt and can't be processed; the PST file is marked to be skipped. There is another particular case where we know the job can be restarted, so we do that. In every other case, we simply ignore the entry this time and let the admin find it and deal with it.
  • Imported -- The PST file was imported into the mailbox without issue.
  • CleanedUp -- After the PST file is imported, we save a record of the job (Get-MailboxImportRequest | Get-MailboxImportRequestStatistics) into a CSV file named the same as the job. We also remove the PST file from the staging area.
  • Notified -- This is where we check to see if all jobs for this user have been "CleanedUp," meaning it's OK to send the Final Report to the user. All jobs where ProcessFile is True have to be CleanedUp before Notified comes into play. A separate file, "Notified.txt," is just a list of people that have already been notified. This ensures the user only gets one message.
  • AllComplete -- The user has been notified, all PSTs are cleared, and the record files of the processed jobs are moved to a permanent location. The job is moved to the Complete Queue.
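The state list above boils down to a dispatch loop. A sketch -- the state names are from the list, but every handler function here is hypothetical:

```powershell
# Sketch of the per-entry dispatch inside Process-PSTImportQueue.
# Each pass advances a job at most one state; the next run picks it up again.
foreach ($job in $Queue) {
    switch ($job.JobStatus) {
        'New'       { Copy-PSTToStaging $job }        # -> Copied, or log and move on
        'Copied'    { Start-PSTImport $job }          # -> InQueue
        'InQueue'   { Update-PSTImportStatus $job }   # -> Imported / Failed / Suspended
        'Imported'  { Save-JobRecord $job             # CSV of the request statistics
                      Remove-StagedPST $job }         # -> CleanedUp
        'CleanedUp' { Send-FinalReportIfDone $job }   # -> Notified (once per user)
        'Notified'  { Move-ToCompleteQueue $job }     # -> AllComplete
        default     { }   # Suspended, unknown, or skipped: left for the admin
    }
}
```

Because each pass only moves a job one step, repeated runs of the loop (the do..while shown earlier) are what drive every job to AllComplete.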


The options for Process-PSTImportQueue

  • -Displayname -- when you want to just process a subset of jobs for a particular user
  • -Status -- to see the status of all the jobs
  • -StatusDetail -- used with -Displayname to see status detail of each job for Displayname
  • -IgnoreClientVer -- normally a job with ClientVerOK set to $false (meaning the Outlook client is NOT the correct version to see the archive mailbox) will be skipped. To process these jobs, use this switch.


Next we added some tools we needed ...



Tuesday, February 7, 2012

Enterprise Wide PST Import - Add-PSTImportQueue

This is Part 2 in a series of posts about my experience tackling the migration of PST files.
The first post in the series is here.
The next post in this series is here.

Before we could start writing the script, we needed a share where the Exchange services had access, big enough to hold the active queue of PST files. We would be copying the PST files here and then importing them. Luckily, we had a server with 1 TB of space, so we put the share there and run the scripts from there.

As I started putting this script together, I realized we needed a queue. One entry in the queue is one PST file; that way a user could have many PST files or a single PST file.

But before anything, first we had to add them to the queue -- Add-PSTImportQueue.ps1

We also need to know information about the user.
The first thing I wanted to know was: what version of Outlook is the user running? The RPC logs have that information -- IP address and Outlook version. You just need to look for a "connect" and then, inside that, find the user. When I ran this, it located the user just fine, with their Outlook version and their IP. But it took a long time, and when doing 150 users it would go slowww... The part that was killing it the most was parsing the logs. So I wrote a separate function that just gets all the "connect" entries for Outlook 2007 and Outlook 2010 and saves them to a file, once per day. Finding these users' connects is much faster since the filtering has already been done.
Later I realized the version I find in the log can be incorrect if the user has just been upgraded recently and I haven't saved those logs yet. Another problem was people giving me the wrong names, which led to importing PSTs for someone who didn't have the right version of Outlook.

So now we get the connect logs, find the IP, and resolve the IP to a computer name using DNS. Then we go find the Outlook.exe file and get its version. More accurate.
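That lookup chain is short. A sketch, assuming an admin share is reachable and Outlook 2010 in the default 32-bit Office14 path (both assumptions; an Office 2007 client would live under Office12):

```powershell
# Sketch: IP -> computer name -> Outlook.exe file version.
$IP = '10.0.0.42'   # hypothetical IP gleaned from the connect logs

# Reverse-resolve the IP via DNS
$computer = [System.Net.Dns]::GetHostEntry($IP).HostName

# Read the file version straight off the remote admin share
$outlook = "\\$computer\c`$\Program Files\Microsoft Office\Office14\Outlook.exe"
if (Test-Path $outlook) {
    (Get-Item $outlook).VersionInfo.FileVersion   # e.g. a 14.0.x build for 2010
}
```

Reading the binary itself sidesteps both stale logs and recently upgraded clients.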

Many trial and errors later - this is what evolved...

Example
                Add-PSTImportQueue 'Thompson, Daniel E.'

Pipe a text file of names to be processed:
                gc users.txt | %{Add-PSTImportQueue $_}

After we make a backup of the current queue file, the script verifies the user's mailbox. Using the mailbox information, we glean some connection information from the CAS server logs. Afterwards we know the version of Outlook the user last connected to the Exchange server with, and from what IP. Since IPs can change, we translate that to a ComputerName, then check for the OS.
We'll need all these things as we progress through the steps of importing.

Next we determine the Home Share for the user and search the entire share for *.PST. We log each entry, noting the size and the last-write property. If the file has not been written to in 2 years, we still log the PST file in the queue, but we mark it as skipped. We look at the size in the same way: a size of 256 KB is an empty shell of a PST and contains no data, so it is also marked as skipped. Sharepoint.PST and files with the words "BACKUP" or "BACK UP" anywhere in the full pathname are also skipped.
All the file entries are added to the end of the queue.
Lastly, we send an email notification to the person being searched and show the results. This is the "Requested PST Import: Initial Report for <User's DisplayName>" email.
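The skip rules above can be sketched as a simple filter pass. The thresholds come from the text; $HomeShare and the SkipReason labels echo the -SkipReason values mentioned later in the series:

```powershell
# Sketch of the discovery-time skip rules; every entry is still queued,
# $skip just records why a file will not be imported.
foreach ($pst in Get-ChildItem $HomeShare -Recurse -Filter *.pst) {
    $skip = $null
    if ($pst.LastWriteTime -lt (Get-Date).AddYears(-2)) {
        $skip = 'Age'                      # untouched for 2+ years
    }
    elseif ($pst.Length -le 256KB) {
        $skip = 'Size'                     # empty shell, no data
    }
    elseif ($pst.FullName -match 'Sharepoint\.pst$|BACKUP|BACK UP') {
        $skip = 'Backup'                   # -match is case-insensitive
    }
    # ...add the entry to the queue with $skip as its SkipReason...
}
```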

There are options when running Add-PSTImportQueue
  • -DisplayName <user name> -- who we are searching, you can use anything that translates to a mailbox
  • -Client <14.0.6109.5000, etc.> -- allows you to specify the client version and override what is found
  • -IP <IP> -- allows you to override the IP found (sometimes this info is not available)
  • -SourceDir <unc pathname> -- allows you to override the HomeShare -- used when you know the directory where the PST files reside, for example on a computer and not a home share.
  • -SearchDays <number> -- search the CAS logs this many days - 14 is default
  • -SearchPC <computername> -- Search the computer for PST files, starting in the user's directory and recursing through all the remaining directories -- this can take a while.
  • -IgnoreClientVer -- This switch is used when you don't care about the client version at all and just want to get the PST files in the queue. This is not preferred, because it causes issues later, but sometimes it is the only way -- computer not available, user doesn't have a home share, etc.
  • -NoNotify -- This switch allows you to discover PST files for a user and not send a notification.

After you Add a PST to the queue, you have to process it ...
That's part 3 ...

