Monday, January 7, 2013

Mailbox Database FailOver Dance: Mail queues backing up

We had an issue where the cluster service failed on a DAG member, so the cluster started to complain about membership, and so on and so on...

The database tried to come up in several places, then it finally settled on DAG member #2.

All of this happened within about 30 seconds, which is a long time in my book for an Exchange mailbox database failover. About 30 minutes later we started to be alerted about mail queues backing up. There were over 200 messages waiting to go to one particular database.

I had never seen this before and wondered if there was an AD replication issue. The error clearly pointed out that the database location and the user's location did not agree!


get-queue Site2HUB\Submission | get-message | fl

<snip>
LastError         : 432 4.2.0 STOREDRV.Deliver.Exception:WrongServerException.MapiExceptionMailboxInTransit; Failed to
                    process message due to a transient exception with message The user and the mailbox are in different
                     Active Directory sites.
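
With a couple of hundred messages in the queue, it can help to list only the ones carrying this error, along with their recipients. The following is a rough sketch rather than what I ran at the time; it assumes the same Site2HUB\Submission queue and simply matches on the WrongServerException text in LastError.

```powershell
# Sketch only: Site2HUB\Submission is the queue from this post; adjust for your hub server.
# -IncludeRecipientInfo populates the Recipients property on each message.
Get-Queue Site2HUB\Submission |
    Get-Message -IncludeRecipientInfo |
    Where-Object { $_.LastError -like '*WrongServerException*' } |
    Format-List Identity, FromAddress, Recipients, LastError
```

Any recipient from that list is a good candidate for the mailbox lookups further down.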


So off I went to see the administrator in charge of AD. I wanted them to check replication and maybe even reboot the server, even if they didn't find anything, because by golly, I was right!

To their credit, they refused to reboot the server. And I went back to the drawing board.

In an effort to prove I was right, I actually proved I was wrong.

I found a stuck message, then grabbed a recipient and searched for their database and where it was mounted:

get-mailbox <username> -DomainController Site1-dc
Name                      Alias                ServerName       ProhibitSendQuota
----                      -----                ----------       -----------------
<blah>                    <blah>               MBX02           39.06 MB 

get-mailbox <username> -DomainController Site2-dc
Name                      Alias                ServerName       ProhibitSendQuota
----                      -----                ----------       -----------------
<blah>                    <blah>               MBX02           39.06 MB 

Both AD sites showed the same server, MBX02, for the user.
But the database was mounted in a different place.

[PS] C:\Windows\system32>Get-MailboxDatabase <blah's DB>

Name                           Server          Recovery        ReplicationType
----                           ------          --------        ---------------
<blah's DB>                    MBX05          False           Remote
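
To see the mismatch more directly, the -Status switch on Get-MailboxDatabase reports where the database is actually mounted, and Get-MailboxDatabaseCopyStatus shows the state of each copy in the DAG. This is a sketch I did not run during the incident; "<database name>" is a placeholder for the affected database.

```powershell
# Placeholder: substitute the name of the affected mailbox database.
$db = "<database name>"

# -Status fills in run-time properties such as Mounted and MountedOnServer.
Get-MailboxDatabase $db -Status |
    Format-List Name, Server, MountedOnServer, Mounted

# One row per DAG copy, with its health and queue lengths.
Get-MailboxDatabaseCopyStatus $db |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength -AutoSize
```

If the server shown here differs from the ServerName that Get-Mailbox returns for the user, you are looking at the same disagreement the queue error describes.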

I moved the database to MBX02 to make it agree with AD and all messages cleared.
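
For reference, activating a DAG copy on a specific server is normally done with Move-ActiveMailboxDatabase. Treat the lines below as a sketch under that assumption; "<database name>" is a placeholder, and MBX02 is the server AD listed for the user in my case.

```powershell
# Placeholder database name; MBX02 is the server both domain controllers reported.
Move-ActiveMailboxDatabase "<database name>" -ActivateOnServer MBX02

# Then watch the submission queue drain.
Get-Queue Site2HUB\Submission |
    Format-List Identity, Status, MessageCount, LastError
```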

My guess is I could have moved it to any of the other 3 servers and it would have worked.

So I was right that this was an AD issue; I was just wrong about it being a replication one ;)

The problem was the way Exchange wrote those AD entries during all the bouncing around at 10:00 AM. Moving the database made Exchange rewrite them.


1 comment:

  1. Thanks for your post. Had a similar issue this morning and this pointed me in the right direction. Same error in the queues but couldn't pinpoint the message that was stuck. The mailbox database that was affected was in a DAG group so switched over the server to the DAG copy and bingo - mail started routing again!
