Tuesday, February 28, 2012

Enterprise Wide PST Import -- Process-PSTImportQueue

This is Part 3 in a series of posts about my experience tackling the migration of PST files.
The first post in the series is here.
The next post in the series is here.


With PST file entries added to the queue, all that's left is to process them.


As we imported PST files for our pilot group using this script, we learned a few things, of course! People really do come in two types when filing messages: Filers and Pilers. But more importantly, everyone saw their PST file as a "Folder" with messages inside it. Many did not realize it was a file.

Initially we were going to be smart and import all the PSTs into the root of the Archive Mailbox, and thus get some illusion of single instance storage by not allowing duplicates. But our test users hated the "lumped up all together into one big mess" this created. They wanted that extra "Folder" that the PST created.

So we adapted by importing each PST file into its own folder, named after the PST file without the trailing ".pst" -- They loved it. Since this was all about user acceptance and them embracing the new technology, the "folder per PST file" became the default setting.

So onto processing.

When the script opens the queue file, it just loops through each entry and decides what work needs to be done for that entry. The script does what it can, and then moves on. So a PST file will be copied on one pass, imported on the next, removed afterward, and so on...
So you must run the Process script over and over. But you must also allow time for the Mailbox Replication queues to do their job. I do this with a do .. while loop...
(After lots of trial and error, we found that waiting 15 minutes between passes gets the imports done the quickest.)

do { Process-PSTImportQueue ; "`n" ; get-date -f 'yyyyMMdd HHmm' ; "`n" ; sleep -seconds 900} while ($true)

I added the time stamp in the middle so after I wake up in the morning and check on progress, I can tell if the script has hung or whatever.

The importing of a PST file has several states:
  • New -- This PST file entry was just added to the queue and no work has been done. When the script works on a job in this state, it tries to copy the file to the staging area. If the copy fails it just logs the error and moves on to the next entry in the queue.
  • Copied -- The PST file was successfully copied, so start the import into the mailbox.
  • InQueue -- The PST is being processed by the Mailbox Replication service. Check on its status and see if it is complete, failed, or still processing.
  • InBITS -- In this state the file is being copied using BITS Transfer.
  • Suspended -- Some PST files can be corrupted, and they can make the Mailbox Replication service restart. When that happens, all the mailbox imports restart and revert to zero percent complete. A job will be suspended if we find that on the last run it was at 30% and now it's at 5% (meaning it restarted). It's also possible for an admin to suspend the mailbox import jobs, so we allow for that: the jobs stay suspended until an admin restarts them. A corrupt PST file might cause all the active jobs to "look bad by association" -- the admin can restart the jobs one by one until he finds the hidden culprit and deals with it. For example, he might repair the PST file, or remove it from the queue by setting the job to skip this file with a reason of "Corrupt".
  • Failed -- In this case the Mailbox Replication service deemed this job as failed. The script looks at the reason why it failed. There is one particular case where we know the file is corrupt and can't be processed, so the PST file is marked to be skipped. There is another particular case where we know the job can be restarted, so we do that. In every other case, we simply ignore this entry this time and let the admin find it and deal with it.
  • Imported -- The PST file was imported into the mailbox without issue.
  • CleanedUp - After the PST file is imported we save a record of the job (get-mailboximportrequest | get-mailboximportrequeststatistics) into a CSV file named the same as the job. Also we remove the PST file from the staging area.
  • Notified -- This is where we check whether all jobs for this user have been "CleanedUp", meaning it's OK to send the Final Report to the user. Every job where "process file" is True has to be CleanedUp before Notified comes into play. A separate file, "Notified.txt", is used and is just a list of people who have already been notified. This ensures the user only gets one message.
  • AllComplete -- The user has been notified, all PST files have been cleared, and the record files of the processed jobs have been moved to a permanent location. This job is moved to the Complete Queue.
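
The states above behave like a small state machine. As a rough sketch only (this is not the actual script), the per-entry dispatch and the "did it restart?" check behind Suspended could look something like the following. The function names Test-ImportRestarted and Get-NextAction are hypothetical, and the mapping from each state to its next action is one reading of the list above.

```powershell
# Hypothetical helper: an import that reports a lower percentage than on the
# previous pass has most likely restarted (for example 30% last run, 5% now).
function Test-ImportRestarted {
    param(
        [int]$LastPercent,     # percent complete recorded on the previous pass
        [int]$CurrentPercent   # percent complete reported on this pass
    )
    return ($CurrentPercent -lt $LastPercent)
}

# Hypothetical dispatcher: maps the state of a queue entry to the work
# attempted on this pass. The descriptions are illustrative text only.
function Get-NextAction {
    param([string]$State)
    switch ($State) {
        'New'         { 'Copy the PST file to the staging area' }
        'InBITS'      { 'Check whether the BITS transfer has finished' }
        'Copied'      { 'Start the import into the mailbox' }
        'InQueue'     { 'Check the import status: complete, failed, or still processing' }
        'Suspended'   { 'Leave alone until an admin restarts the job' }
        'Failed'      { 'Inspect the failure reason: skip, restart, or leave for the admin' }
        'Imported'    { 'Save the job record to CSV and remove the staged PST file' }
        'CleanedUp'   { 'Check whether all jobs for the user are cleaned up, then notify' }
        'Notified'    { 'Move the job records and the entry to the Complete Queue' }
        'AllComplete' { 'Nothing left to do' }
        default       { "Unknown state: $State" }
    }
}
```

Because each pass only does the next piece of work for an entry, running the dispatcher repeatedly walks every PST file through the states one step at a time.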


The options for Process-PSTImportQueue

  • -Displayname -- when you want to just process a subset of jobs for a particular user
  • -Status -- to see the status of all the jobs
  • -StatusDetail -- used with -Displayname to see status detail of each job for Displayname
  • -IgnoreClientVer -- normally a job with ClientVerOK set to $false (meaning the Outlook client is NOT the correct version to see the archive mailbox) will be skipped. To process these jobs, use this switch.


Next we added some tools we needed ...


Introduction: The Beginings
Part 1: Script Requirements
Part 2: Add-PSTImportQueue
Part 3: Process-PSTImportQueue
Part 4: Some Tools we added
Part 5: Set-PSTImportQueue
Part 6: About PST Capture
Part 7: More PST Import Tools
Part 8: Using RoboCopy
Part 9: Morning Status Report
Part 10: Using BITS Transfer
Part 11: Get the script / Set up
Part 12: The Functions

Tuesday, February 7, 2012

Enterprise Wide PST Import - Add-PSTImportQueue

This is Part 2 in a series of posts about my experience tackling the migration of PST files.
The first post in the series is here.
The next post in this series is here.

Before we could start writing the script, we needed a share that Exchange Services could access and that was big enough to hold the active queue of PST files. We would be copying the PST files here and then importing them. Luckily, we had a server with 1TB of space, so we put the share there and ran the scripts from there.

As I started putting this script together, I realized we needed a queue. One entry in the queue was one PST file, that way a user could have many PST files or a single PST file.

But before anything, first we had to add them to the queue -- Add-PSTImportQueue.ps1

We also need to know some information about the user.
The first thing I wanted to know was -- what version of Outlook is the user running? The RPC logs have that information: IP address and Outlook version. You just need to look for a "connect" entry, then find the user inside it. When I ran this it located the user just fine, along with their Outlook version and their IP. But it took a long time. And when doing 150 users it would go slowwww... The part that was killing it the most was parsing the logs. So I wrote a separate function that just gets all the "connect" entries for Outlook 2007 and Outlook 2010 and saves them to a file, once per day. Finding a user's connects is now much faster since the filtering has already been done.
Later I realized the version I find in the log can be incorrect if the user has just been upgraded recently and I haven't saved those logs yet. Another problem was people giving me the wrong names, so we ended up importing PSTs for someone who didn't have the right version of Outlook.

So now we get the connect logs and try to find the IP and change the IP to a computer name using DNS. Then go find the Outlook.exe file and get the version. More accurate.
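
As a minimal sketch of that lookup (not the code from the actual script), the two steps could be written as below. The function names Resolve-ComputerName and Get-ExeVersion are hypothetical, and the IP address and Outlook install path in the commented example are placeholders, since the real path depends on the Office version and the OS.

```powershell
# Hypothetical helper: resolve an IP address to a host name through DNS.
function Resolve-ComputerName {
    param([string]$IP)
    try {
        return [System.Net.Dns]::GetHostEntry($IP).HostName
    } catch {
        return $null   # no reverse record, or DNS could not be reached
    }
}

# Hypothetical helper: read the file version of an executable,
# or return $null when the file cannot be reached.
function Get-ExeVersion {
    param([string]$Path)
    if (-not (Test-Path -LiteralPath $Path)) { return $null }
    return (Get-Item -LiteralPath $Path).VersionInfo.FileVersion
}

# Example with placeholder values (assumed IP and assumed Office14 path):
# $computer = Resolve-ComputerName '10.0.0.25'
# if ($computer) {
#     Get-ExeVersion "\\$computer\c$\Program Files\Microsoft Office\Office14\OUTLOOK.EXE"
# }
```

Returning $null instead of throwing lets the caller fall back to the version found in the logs when the computer is off or the admin share is not reachable.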

Many trials and errors later - this is what evolved...

Example
                Add-PSTImportQueue 'Thompson, Daniel E.'

Pipe a text file of names to be processed:
                gc users.txt | %{Add-PSTImportQueue $_}

After we make a backup of the current queue file, the script verifies the user's mailbox. Using the mailbox information, we glean some connection information from the CAS server logs. Afterwards we know which version of Outlook the user last used to connect to the Exchange Server, and from what IP. Since IPs can change, we translate that to a ComputerName, then check for the OS.
We'll need all these things as we progress through the steps of importing.

Next we determine the Home Share for a user and search the entire share for *.PST. We log each entry noting the size and the last write property. If the file has not been written to in 2 years we will still log this PST file in the queue, but we mark it as skipped. We look at the size in the same way. A size of 256K is an empty shell of a PST and contains no data. It is also marked as skipped. Sharepoint.PST and files with the words "BACKUP" or "BACK UP" in the entire full pathname are also skipped.
All the file entries are added to the end of the Queue.
Lastly we send an email notification to the Person being searched and show the results. This is the "Requested PST Import: Initial Report for <User's DisplayName>" email.
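
The skip rules described above (not written to in 2 years, an empty 256K shell, Sharepoint.PST, "BACKUP" or "BACK UP" in the path) can be sketched as one small function. This is only an illustration under a few assumptions: the function name Get-PSTSkipReason and the reason strings are hypothetical, "256K" is treated as "256KB or smaller", and the name and path matching is case-insensitive.

```powershell
# Hypothetical helper: returns a skip reason for a PST file,
# or $null when the file should be processed.
function Get-PSTSkipReason {
    param(
        [string]$FullName,           # full path of the PST file
        [long]$Length,               # file size in bytes
        [datetime]$LastWriteTime,    # last write time of the file
        [datetime]$Now = (Get-Date)  # injectable "today", handy for testing
    )
    # Not written to in the last 2 years
    if ($LastWriteTime -lt $Now.AddYears(-2)) { return 'NotWrittenIn2Years' }
    # Assumption: 256KB or smaller is an empty shell with no data
    if ($Length -le 256KB) { return 'EmptyPST' }
    # File named Sharepoint.pst (-match is case-insensitive by default)
    if ($FullName -match '(^|[\\/])Sharepoint\.pst$') { return 'SharepointPST' }
    # "BACKUP" or "BACK UP" anywhere in the full path name
    if ($FullName -match 'BACK ?UP') { return 'Backup' }
    return $null
}
```

The parameter names line up with the properties that Get-ChildItem returns for a file (FullName, Length, LastWriteTime), so each file found on the share could be checked and logged with its reason while still being added to the queue.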

There are options when running Add-PSTImportQueue
  • -DisplayName <user name> -- who we are searching, you can use anything that translates to a mailbox
  • -Client <14.0.6109.5000, etc.> -- allows you to specify the client version and override what is found
  • -IP <IP> -- allows you to override the IP found (sometimes this info is not available)
  • -SourceDir <unc pathname> -- allows you to override the HomeShare -- used when you know the directory where the PST files reside. For example: on a computer and not a home share.
  • -SearchDays <number> -- search the CAS logs this many days - 14 is default
  • -SearchPC <computername> -- search the computer for PST files, starting in the user's directory and recursing through all the remaining directories -- this can take a while.
  • -IgnoreClientVer -- This switch is used when you don't care about the Client Version at all and you just want to get the PST files in the queue. This is not preferred, because it causes issues later, but sometimes it is the only way: the computer is not available and the user doesn't have a home share, etc.
  • -NoNotify -- This switch allows you to discover PST files for a user and not send a notification.

After you Add a PST to the queue, you have to process it ...
That's part 3 ...

