Tuesday, February 28, 2012

Enterprise Wide PST Import -- Process-PSTImportQueue

This is Part 3 in a series of posts about my experience tackling the migration of PST files.
The first post in the series is here.
The next post in the series is here.


With PST file entries added to the queue, all that's left is to process them.


As we imported PST files for our pilot group using this script, we learned a few things, of course! People really do come in two types when filing messages: filers and pilers. But more importantly, everyone saw their PST file as a "Folder" with messages in it. Many did not realize it was a file.

Initially we were going to be smart and import all the PSTs into the root of the Archive Mailbox, getting some illusion of single-instance storage by not allowing duplicates. But our test cases hated the "all lumped together into one big mess" this created. They wanted the extra "Folder" that the PST provided.

So we adapted by importing each PST file into its own folder, named after the PST file without the trailing ".pst" -- they loved it. Since this was all about user acceptance and getting users to embrace the new technology, the "folder per PST file" became the default setting.
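In Exchange 2010 SP1, this "folder per PST file" behavior maps naturally onto the -TargetRootFolder parameter of New-MailboxImportRequest. A minimal sketch -- the mailbox name and staging path are placeholders:

```powershell
# Import a PST into the user's archive mailbox, landing its contents
# under a folder named after the PST file (without the ".pst" extension).
$pstPath = '\\server\pststaging\Thompson\Projects2009.pst'   # placeholder path
$folder  = [System.IO.Path]::GetFileNameWithoutExtension($pstPath)  # "Projects2009"

New-MailboxImportRequest -Mailbox 'Thompson, Daniel E.' `
    -FilePath $pstPath `
    -IsArchive `
    -TargetRootFolder $folder
```

Leaving off -TargetRootFolder gives the original "everything in the root" behavior our testers rejected.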

So onto processing.

When the script opens the queue file, it loops through each entry and decides what work needs to be done for that entry. The script does what it can, then moves on. So a PST file will be copied on one pass, imported on the next, removed afterward, and so on...
So you must run the Process script over and over. But you must also allow time for the Mailbox Replication queues to do their job. I do this with a do .. while loop...
(After lots of trial and error, we found that waiting 15 minutes between passes gets the imports done the quickest.)

do { Process-PSTImportQueue ; "`n" ; get-date -f 'yyyyMMdd HHmm' ; "`n" ; sleep -seconds 900} while ($true)

I added the timestamp in the middle so that when I wake up in the morning and check on progress, I can tell whether the script has hung.

The importing of a PST file has several states:
  • New -- This PST file entry was just added to the queue and no work has been done. When the script works on a job in this state, it tries to copy the file to the staging area. If the copy fails it just logs the error and moves on to the next entry in the queue.
  • Copied -- The PST file was successfully copied, so start the import into the mailbox.
  • InQueue -- The PST is being processed by the Mailbox Replication Service; check the status to see if it is complete, failed, or still processing.
  • InBITS -- In this state the file is being copied using BITS Transfer.
  • Suspended -- Some PST files can be corrupted. They can make the Mailbox Replication Service restart, and when that happens, all the mailbox imports restart and revert to zero percent complete. A job is suspended if we find that on the last run it was at 30% and now it's at 5% (meaning it restarted). It's also possible for an admin to suspend the mailbox import jobs, so we allow for that ... the jobs stay suspended until an admin restarts them. A corrupt PST file might make all the active jobs "look bad by association" -- the admin can restart the jobs one by one until he finds the hidden culprit and deals with it. For example, he might repair the PST file, or remove it from the queue by setting the job to skip the file with a reason of "Corrupt".
  • Failed -- The Mailbox Replication Service deemed this job failed. The script looks at the reason why. In one particular case we know the file is corrupt and can't be processed, so the PST file is marked to be skipped. In another particular case we know the job can be restarted, so we do that. In every other case we simply ignore the entry this time and let the admin find it and deal with it.
  • Imported -- The PST file was imported into the mailbox without issue.
  • CleanedUp -- After the PST file is imported, we save a record of the job (get-mailboximportrequest | get-mailboximportrequeststatistics) into a CSV file named the same as the job. We also remove the PST file from the staging area.
  • Notified -- This is where we check whether all jobs for this user have been CleanedUp, meaning it's OK to send the Final Report to the user. Every job whose Process File flag is true has to be CleanedUp before Notified comes into play. A separate file, "Notified.txt," is just a list of people who have already been notified. This ensures the user only gets one message.
  • AllComplete -- The user has been notified, all PSTs are cleared, and the record files of the processed jobs are moved to a permanent location. The job is moved to the Complete Queue.
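The per-entry dispatch described above can be sketched as a switch over the entry's state. This is illustrative only -- the field names ($entry.Status and friends) are my assumptions, and the real script carries far more error handling:

```powershell
# One pass over the queue: do what can be done for each entry, then move on.
# $queue is assumed to be a collection of entries with a Status field.
foreach ($entry in $queue) {
    switch ($entry.Status) {
        'New'      { # try to copy the PST to the staging area;
                     # on failure, log the error and leave the state as-is
                     Copy-Item $entry.SourcePath $entry.StagePath
                     $entry.Status = 'Copied' }
        'Copied'   { # kick off the import into the archive mailbox
                     New-MailboxImportRequest -Mailbox $entry.Mailbox `
                         -FilePath $entry.StagePath -IsArchive `
                         -TargetRootFolder $entry.FolderName
                     $entry.Status = 'InQueue' }
        'InQueue'  { # poll the Mailbox Replication Service for this job
                     $r = Get-MailboxImportRequest -Name $entry.JobName |
                          Get-MailboxImportRequestStatistics
                     if ($r.Status -eq 'Completed') { $entry.Status = 'Imported' } }
        'Imported' { # save the job record, then remove the staged PST
                     Get-MailboxImportRequest -Name $entry.JobName |
                         Get-MailboxImportRequestStatistics |
                         Export-Csv "$($entry.JobName).csv" -NoTypeInformation
                     Remove-Item $entry.StagePath
                     $entry.Status = 'CleanedUp' }
        # InBITS / Suspended / Failed / Notified / AllComplete handled similarly...
    }
}
```

The key design point is that no state blocks the loop: each pass advances whatever it can and leaves the rest for the next run.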


The options for Process-PSTImportQueue

  • -Displayname -- when you want to process only the jobs for a particular user
  • -Status -- to see the status of all the jobs
  • -StatusDetail -- used with -Displayname to see status detail of each job for Displayname
  • -IgnoreClientVer -- normally a job with ClientVerOK set to $false (meaning the Outlook client is NOT the correct version to see the archive mailbox) will be skipped. To process these jobs, use this switch.
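Those switches might be declared along these lines -- a hedged sketch, since the post doesn't show the actual parameter handling:

```powershell
function Process-PSTImportQueue {
    param(
        [string]$Displayname,     # limit the pass to one user's jobs
        [switch]$Status,          # show the status of all jobs
        [switch]$StatusDetail,    # with -Displayname: per-job status detail
        [switch]$IgnoreClientVer  # process jobs even when ClientVerOK is $false
    )

    # ... queue processing as described above ...
    # if (-not $IgnoreClientVer) { skip entries whose Outlook client
    #                              can't see the archive mailbox }
}
```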


Next we added some tools we needed ...


Introduction: The Beginnings
Part 1: Script Requirements
Part 2: Add-PSTImportQueue
Part 3: Process-PSTImportQueue
Part 4: Some Tools we added
Part 5: Set-PSTImportQueue
Part 6: About PST Capture
Part 7: More PST Import Tools
Part 8: Using RoboCopy
Part 9: Morning Status Report
Part 10: Using BITS Transfer
Part 11: Get the script / Set up
Part 12: The Functions

Tuesday, February 7, 2012

Enterprise Wide PST Import - Add-PSTImportQueue

This is Part 2 in a series of posts about my experience tackling the migration of PST files.
The first post in the series is here.
The next post in this series is here.

Before we could start writing the script, we needed a share that Exchange Services had access to and that was big enough to hold the active queue of PST files. We would be copying the PST files here and then importing them. Luckily, we had a server with 1TB of space, so we put the share there and ran the scripts from there.

As I started putting this script together, I realized we needed a queue. One entry in the queue was one PST file; that way a user could have many PST files or just a single one.

But before anything, first we had to add them to the queue -- Add-PSTImportQueue.ps1

We also need to know some information about the user.
The first thing I wanted to know was: what version of Outlook is the user running? The RPC logs have that information -- IP address and Outlook version; just look for a "connect" entry, then find the user inside it. When I ran this, it located the user just fine, with their Outlook version and their IP. But it took a long time, and when doing 150 users it would go slowwww... The part that was killing it most was parsing the logs. So I wrote a separate function that grabs all the "connect" entries for Outlook 2007 and Outlook 2010 and saves them to a file, once per day. Finding a user's connects is much faster since the filtering has already been done.
Later I realized the version I find in the log can be incorrect if the user was upgraded recently and I haven't saved those logs yet. Another problem was people giving me the wrong names, so we imported PSTs for someone who didn't have the right version of Outlook.

So now we get the connect logs, find the IP, and resolve the IP to a computer name using DNS. Then we go find the Outlook.exe file on that computer and get its version. More accurate.
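The IP-to-computer-to-Outlook-version lookup can be sketched like this. The admin-share path to Outlook.exe is an assumption -- it varies with Office version and bitness:

```powershell
# Resolve the client IP from the RPC logs to a hostname, then read the
# file version of Outlook.exe directly off the machine's admin share.
$ip       = '10.0.0.25'                                    # from the CAS RPC logs
$computer = [System.Net.Dns]::GetHostEntry($ip).HostName   # reverse DNS lookup

# Path is a guess -- Office 14 (Outlook 2010), default 32-bit install:
$exe = "\\$computer\c`$\Program Files\Microsoft Office\Office14\OUTLOOK.EXE"
if (Test-Path $exe) {
    $version = (Get-Item $exe).VersionInfo.FileVersion
}
```

Reading the version from the EXE sidesteps both stale logs and recently upgraded clients.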

Many trials and errors later -- this is what evolved...

Example
                Add-PSTImportQueue 'Thompson, Daniel E.'

Pipe a text file of names to be processed:
                gc users.txt | %{Add-PSTImportQueue $_}

After we make a backup of the current queue file, the script verifies the user's mailbox. Using the mailbox information, we glean some connection information from the CAS server logs. Afterwards we know the version of Outlook the user last used to connect to the Exchange Server, and from what IP. Since IPs can change, we translate that to a ComputerName, then check for the OS.
We'll need all these things as we progress through the steps of importing.

Next we determine the Home Share for the user and search the entire share for *.PST. We log each entry, noting the size and the last-write time. If a file has not been written to in 2 years, we still log it in the queue, but we mark it as skipped. We look at the size the same way: a 256K file is an empty shell of a PST and contains no data, so it is also marked as skipped. Sharepoint.PST and files with the words "BACKUP" or "BACK UP" anywhere in the full pathname are skipped too.
All the file entries are added to the end of the Queue.
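The discovery-and-skip logic reads roughly like this sketch (thresholds are the ones from the post; the queue-entry field names are my invention):

```powershell
# Find every PST under the user's home share and decide skip/import.
$homeShare = '\\fileserver\home\jsmith'   # placeholder path
$cutoff    = (Get-Date).AddYears(-2)

Get-ChildItem $homeShare -Recurse -Filter *.pst | ForEach-Object {
    $reason = $null
    if     ($_.Length -le 256KB)                 { $reason = 'Empty' }      # empty shell
    elseif ($_.LastWriteTime -lt $cutoff)        { $reason = 'Dormant' }    # untouched 2+ years
    elseif ($_.FullName -match 'backup|back up') { $reason = 'Backup' }     # -match is case-insensitive
    elseif ($_.Name -eq 'Sharepoint.pst')        { $reason = 'Sharepoint' }

    # Every file is logged in the queue; skipped ones carry a reason.
    New-Object PSObject -Property @{
        Path      = $_.FullName
        SizeMB    = [math]::Round($_.Length / 1MB, 1)
        LastWrite = $_.LastWriteTime
        Skipped   = [bool]$reason
        Reason    = $reason
    }
}
```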
Lastly, we send an email notification to the person being searched, showing the results. This is the "Requested PST Import: Initial Report for <User's DisplayName>" email.

There are options when running Add-PSTImportQueue
  • -DisplayName <user name> -- who we are searching; you can use anything that translates to a mailbox
  • -Client <14.0.6109.5000, etc.> -- allows you to specify the client version and override what is found
  • -IP <IP> -- allows you to override the IP found (sometimes this info is not available)
  • -SourceDir <unc pathname> -- allows you to override the HomeShare -- used when you know the directory where the PST files reside, for example on a computer rather than a home share
  • -SearchDays <number> -- search the CAS logs this many days back -- 14 is the default
  • -SearchPC <computername> -- search the computer for PST files, starting in the user's directory and recursing through all the remaining directories -- this can take a while
  • -IgnoreClientVer -- this switch is used when you don't care about the Client Version at all and you just want to get the PST files into the queue. This is not preferred, because it causes issues later, but sometimes it's the only way: the computer is not available, the user doesn't have a home share, etc.
  • -NoNotify -- This switch allows you to discover PST files for a user and not send a notification.

After you Add a PST to the queue, you have to process it ...
That's part 3 ...



Tuesday, January 24, 2012

Enterprise Wide PST Import - Script Requirements

This is Part 1 in a series of posts about my experience tackling the migration of PST files.
The first post in the series is here.
The next post in this series is here.


In order to import all the PST files from users' home drives, we needed some guidelines. And let's not forget about managers -- they always want to feel they are adding value ;)

Users need to know what's going on.
These guys are very wary of anyone touching their PST files. "This is very important data and I may need it someday." So we wanted to be upfront and completely honest.
They get an 'Initial Report' showing how many files were found on their Home share and the state of those files. All files are imported with these exceptions:
  • A file of 256K in size is just an empty shell, so we don't waste time trying to import it.
  • A file that has not been touched in 2 years is considered dormant and is skipped as well.
  • If a file has the word "backup" or "back up" in its name, we assume it's a backup.

Next, once the import is complete, they get a 'Final Report' showing what files were processed and the result for each. As you might expect, some files are corrupt and can't be imported. The user is also given a very concise list of instructions for disconnecting the existing PST files from Outlook.

Six days after the Final Report, we send the first of the nudge reports. We beg them to disconnect those PST files. We send another after 13 days and another after 24 days. All the while, we get copies of these emails, so we know who these people are and can open tickets so they can be assisted.

Even with all these notices, we noticed that people don't have time to read them, and they generally ignored them. At least until the GPO kicked in.

After 12 days, users are added to a GPO that disables PST growth.
We did not want to scare people by taking away their PST files. We even say in the first email to them, "We Don't Delete Anything." But if we give them an Archive Mailbox, we can't let them continue adding messages to PST files. So we compromised: we allow them to open and review older PST files, but we do not allow them to add any new messages there. We tell the users they can keep their PST files forever if they want. But we all know that will change in some future meeting when The Powers That Be say to kill them all.
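The "disable PST growth" behavior corresponds to Outlook's PSTDisableGrow policy. In our case a GPO delivers it, but the registry equivalent for Outlook 2010 (Office 14.0) looks roughly like this -- the exact key is worth verifying against your Office version:

```powershell
# Equivalent of the GPO setting, applied directly.
# PSTDisableGrow = 1 lets users read existing PSTs but blocks adding new items.
$key = 'HKCU:\Software\Policies\Microsoft\Office\14.0\Outlook\PST'

New-Item -Path $key -Force | Out-Null       # -Force creates missing parent keys
Set-ItemProperty -Path $key -Name 'PSTDisableGrow' -Type DWord -Value 1
```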


Reclaim Space on Home Shares.
Since this all started with file server farms filling up, there was also the dire need to remove the PST files from these home shares.
Certainly a dilemma, since we're planning not to scare the users with the "D" word (delete).
So it was decided to put the users' PSTs on their local drives. This solves the need to clean off the Home Shares and the need to not delete the files, preserving the users' warm and fuzzy feeling.







Tuesday, January 17, 2012

Enterprise Wide PST Import - The Beginnings

When Exchange 2010 SP1 came out, the Email Team decided the secondary mailbox, or Archive Mailbox, was not really archiving, but it was something we could use.

The Problems:
The server farm housing our Home Shares was always full, and the Server Team was constantly struggling to get 1G back here and 1G back there. Then, after all their hard work, before you knew it, the file servers were full again.
The Server Team had added 6T of space in the last 2 years. It was beginning to look like they could never keep pace. And, of course, we were reminded over and over that Microsoft does not support PST files on a share in this way.
During that same time, we were testing a product that could do PST file ingestion, but it required something to be installed on the user's PC, and our delivery system was not working so well. (Let's not even start talking about the red tape!) So for about a year we stayed stuck in this quagmire.
At one point, The Powers That Be came to the Email Team and wanted us "To import these PST files into our archiving software 'right away' and how long would it take to complete?"
If we were going to decide how long it would take to ingest all the PST files, we needed some information to go on. I wrote a quick report in PowerShell, taking all the people in AD with mailboxes (enabled and disabled) and scanning their home directories for PST files.
To our surprise, there were 12T used by PSTs: over 21,000 files created by just a little over 3,000 people. One person had over 500 PSTs. Wow! How could one person have that much time to create that many?
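That quick report can be reproduced with a sketch like this. Using the AD HomeDirectory attribute as the source of the home path is my assumption, as is the ActiveDirectory module:

```powershell
Import-Module ActiveDirectory

# Every AD user with a home directory set; total up their PST files.
$users = Get-ADUser -Filter 'HomeDirectory -like "*"' -Properties HomeDirectory

$psts = foreach ($u in $users) {
    Get-ChildItem $u.HomeDirectory -Recurse -Filter *.pst `
        -ErrorAction SilentlyContinue     # skip shares we can't reach
}

$totalTB = ($psts | Measure-Object Length -Sum).Sum / 1TB
'{0} PST files, {1:N1} TB' -f $psts.Count, $totalTB
```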
We scratched our heads over how to get these imported with the software we had already paid for, which didn't work so well. That software also depended on an EXE being delivered to the PC, and that process didn't work so well either. So I told The Powers That Be that ingesting all the PST files would take 2 years. I guessed.

Along comes 2010 SP1 and Archive Mailbox
Since we had been fighting the evil demon of PST files for about a year, without anyone really doing anything or getting anywhere, the Email Team started making plans to import all these PST files into Archive Mailboxes and bypass our Archiving Software completely. We were thinking it was time for new Archiving Software, and there was no sense spending time on software we would not be using. So using the Archiving Software for PST ingestion went out the window.

We could use PowerShell to import directly into our Archive Mailbox without touching the PC at all.

We did not want the current mailbox databases to get bloated with new data, so we decided to create new mailbox databases and exclude them from automatic mailbox creation. We added 500G drives to each of our 8 DAG member servers and set up mailbox databases. We ended up with 32 drives, equaling 16T. We also wanted a failover copy of each database, so we really only had 8T. We hoped that with de-duplication, and a lot of PST files being "backups of backups," we could fit the 12T into the 8T. We found a pilot group and started importing PST files. We had about 45 users imported when ...
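Carving out import-only databases and keeping them out of automatic mailbox placement looks roughly like this sketch; all server, path, and database names are placeholders:

```powershell
# Create a dedicated database for archive imports and exclude it from
# automatic (load-balanced) mailbox provisioning.
New-MailboxDatabase -Name 'ArchiveDB01' -Server 'DAGNODE1' `
    -EdbFilePath 'E:\ArchiveDB01\ArchiveDB01.edb' `
    -LogFolderPath 'E:\ArchiveDB01\Logs'

Set-MailboxDatabase 'ArchiveDB01' -IsExcludedFromProvisioning $true

# Add one failover copy on another DAG member.
Add-MailboxDatabaseCopy -Identity 'ArchiveDB01' -MailboxServer 'DAGNODE2'
```

With IsExcludedFromProvisioning set, new mailboxes never land here automatically; archive mailboxes are placed on these databases explicitly.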

Disaster Happens
Luckily, it wasn't anything to do with Exchange. It was the Home Shares. It turned out the push of Office 2010, many open PST files, and no telling what else used up all the resources on the server farm housing the Home Shares, and it went down. There were terrified people running up and down the halls, but mainly everyone wanted to know how to get to their PST files! "You have to fix this and NOW!"
You think I'm joking. We had to say we were not on the Server Team, and we just had to wait. More than likely nothing would be damaged and everyone would be just fine.

Those 40 people we had already migrated were very happy their PSTs (now folders in the Archive Mailbox) were safe and sound.

So how do we convince all the other Users to switch over to Archive Mailbox?
We use the Home Share Disaster to scare the hell out of them!

When I started working on the scripts, we were doing our pilot group one by one, manually, using a snippet here and there. But now we were talking about potentially 100 users at a time. We had to come up with a better script.


This is the introduction to a series of posts I've made about my experiences using PowerShell to import PST files.

Part 1: Script Requirements
Part 2: Add-PSTImportQueue
Part 3: Process-PSTImportQueue
Part 4: Some Tools we added
Part 5: Set-PSTImportQueue
Part 6: About PST Capture
Part 7: More PST Import Tools
Part 8: Using RoboCopy
Part 9: Morning Status Report
Part 10: Using BITS Transfer
Part 11: Get the script / Set up
Part 12: The Functions









Monday, January 16, 2012

About Me and this Blog

My name is Dan Thompson. I am one of several Email Administrators for a large retail chain.
During our work day, we solve many problems small and large - at least to us anyway. I want to journal some of those problems and solutions here. Mainly because I am the worst at documenting my work and I want a place to jot down an idea to work toward, a snippet of PowerShell code I can reuse later, or document some process before I forget all the reasons I wanted to do something some certain way.

But what drives me most is trying to "Find the solution to that problem we had 1 month ago. How did we fix that again?"-- or -- "Where was that link we found that had that thing we loved?"
Yes, I know about SharePoint.  I wanted something outside work.

Our environment is MS Exchange 2010 SP1 and Exchange 2007 in co-existence. We have about 65 servers in various DAGs on Dell hardware blades.

These posts will mainly be PowerShell related and how I can do my work faster and better using PowerShell with Microsoft Exchange.