Monday, June 4, 2012

Ongoing Indexing Issues on DAG02

See the update to this problem here.

We've been having issues with indexing on one DAG, and we thought we had it fixed when we went to Service Pack 1, Rollup 6. This indexing issue doesn't happen very often, so each time I have to learn everything all over again. Putting all my notes here gives me one place to refresh my view of history.

Before Rollup6 we would see these issues...
People would call and say "I click search and it just spins ... nothing is ever returned."

Do a "Get-MailboxDatabaseCopyStatus DB01" and it reports all is well.
Odd that the Get-MailboxDatabaseCopyStatus would say the Content Index is fine when it obviously isn't. You could failover to another database copy and have the same issue. Reports good, but search doesn't return anything.
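
For reference, a minimal sketch of that check that pulls out just the columns that matter here. It assumes the ContentIndexState property on the copy status object; DB01 is the example database used throughout these notes.

```powershell
# List every copy of DB01 with its replication status and what the
# content index claims about itself.
Get-MailboxDatabaseCopyStatus -Identity DB01 |
    Format-Table Name, Status, ContentIndexState -AutoSize
```

In the pre-Rollup6 failure, this is the output that looked healthy even though searches returned nothing.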

(During this, we also found that some clients would search and return results, but when you click on one of the items, you'd get an error message: "Could not display item." This turned out to be a client issue and an update to a newer version cleared that up. But it sent us on a wild goose chase for a bit.)

To fix the "index-don't-work-but-reports-good" issue, we would log onto the server where the passive database copy lived, then:
  1. Suspend the database copy
  2. Stop both indexing services: "Microsoft Search (Exchange)" Service and "Microsoft Exchange Search Indexer" Service
  3. Remove the catalog for that database
  4. Start both Indexing services
  5. Resume the database copy
(This link shows a better way to reset these indexes. Although it's essentially the same thing, it's done via a script already written for you.
http://blogs.msdn.com/b/pepeedu/archive/2010/12/14/exchange-2010-database-copy-on-server-has-content-index-catalog-files-in-the-following-state-failed.aspx)
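
The five manual steps can be sketched in the Exchange Management Shell roughly as follows. This is an illustration only, not the script from the link: the function name Reset-CopyIndex and the CatalogPath parameter are made up for this sketch, and you have to supply the catalog folder for your own database copy. Run it on the server that hosts the passive copy.

```powershell
# Hypothetical helper; the name and parameters are assumptions for this sketch.
function Reset-CopyIndex {
    param(
        [Parameter(Mandatory = $true)] [string] $Copy,        # e.g. 'DB01\MBX02'
        [Parameter(Mandatory = $true)] [string] $CatalogPath  # catalog folder for that copy (you supply this)
    )

    # 1. Suspend the database copy
    Suspend-MailboxDatabaseCopy -Identity $Copy -Confirm:$false

    # 2. Stop both indexing services (display names as shown in the Services console)
    Stop-Service -DisplayName 'Microsoft Exchange Search Indexer' -Force
    Stop-Service -DisplayName 'Microsoft Search (Exchange)' -Force

    # 3. Remove the catalog for that database
    Remove-Item -Path $CatalogPath -Recurse -Force

    # 4. Start both indexing services
    Start-Service -DisplayName 'Microsoft Search (Exchange)'
    Start-Service -DisplayName 'Microsoft Exchange Search Indexer'

    # 5. Resume the database copy
    Resume-MailboxDatabaseCopy -Identity $Copy
}
```

Keep in mind that stopping the services pauses indexing for every database on that server, not just the one being reset.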

This generated a "Crawl" of the database to rebuild the index from scratch. After the crawl is complete, you can activate that copy of the database and then update the Content Index on the now passive database copy using:

Update-MailboxDatabaseCopy DB01\MBX02 -CatalogOnly -Force

I still need to know when this was failing, and not by a customer calling and saying "I ain't getting nuthin."

Test-ExchangeSearch to the rescue -- at least this was something real. The test adds a message to the System mailbox and then checks to see if it is available through the Search Indexer. An actual honest to goodness test!

Get-MailboxDatabase -Server MBX02 | Test-ExchangeSearch

This command gets us the results we want. They tell us if the Indexing is crap (Result = -1) or how long it takes to retrieve the results. Sometimes this will return a MAPI Error, but you run the test again and it returns a value just fine. Maybe by testing it you woke it up?
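
To avoid being fooled by the one-off MAPI error, the test can be wrapped so that each database gets one retry before it is flagged. This is a rough sketch, not something we ran in production: it assumes the test result exposes a ResultFound property, and the 30-second pause is an arbitrary choice.

```powershell
# Collect the databases on MBX02 whose search test fails twice in a row.
$suspect = foreach ($db in Get-MailboxDatabase -Server MBX02) {
    $result = Test-ExchangeSearch -MailboxDatabase $db.Name -ErrorAction SilentlyContinue

    # A single failure (including the occasional MAPI error) gets one retry.
    if (-not $result -or -not $result.ResultFound) {
        Start-Sleep -Seconds 30
        $result = Test-ExchangeSearch -MailboxDatabase $db.Name -ErrorAction SilentlyContinue
    }

    # Still failing after the retry: emit the database name.
    if (-not $result -or -not $result.ResultFound) { $db.Name }
}

if ($suspect) {
    Write-Warning ("Search test failed on: " + ($suspect -join ', '))
}
```

Run on a schedule, something like this would flag a dead index before a customer calls.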

After Rollup6
We never found out why those indexes were getting corrupt and why only on that DAG. We have 7 other DAGs that don't have that issue. But after Rollup6, we never saw those indexing issues again.
No, not those issues. We found new ones. But not so terrible ones. At least the Get-MailboxDatabaseCopyStatus reports correctly now.

And in this latest episode we started seeing a Content Indexing error on the active copy of the database. Update-MailboxDatabaseCopy -CatalogOnly doesn't work on the Active copy of a database. And since we are under strict orders to not change anything without Change Management documentation and waiting 14 days for approval, we could not move the database to fix this content index.

I tried to restart the Microsoft Search (Exchange) Service, which tries to restart the Microsoft Exchange Search Indexer Service and the Indexer service hung. I could restart the indexing duo on any other servers in the DAG. It was very clear this database and this server were in a deadlock battle with no relief in sight. In fact this proved it:

sl $exscripts
.\TroubleShoot-CI.ps1 -Database DB01

So I tried to restart the Microsoft Search (Exchange) Service again, and the Microsoft Exchange Search Indexer hung while stopping again. I loaded up Task Manager and killed the Exchange Search process, which restarted immediately (I could tell because the Process ID changed). It finally dawned on me that the Microsoft Search (Exchange) Service had never actually restarted, so I found that process and killed it.

Everything started to work again. So the way to clear the deadlock, at least in this case, was to kill the Microsoft Search (Exchange) Service process.
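
I did this by hand in Task Manager, but the same thing can be sketched in PowerShell by asking WMI which process sits behind the service. Treat this as an assumption-laden sketch rather than what was actually run; it matches the service by the display name used above.

```powershell
# Look up the process behind the service by its display name.
$svc = Get-WmiObject -Class Win32_Service -Filter "DisplayName = 'Microsoft Search (Exchange)'"

# ProcessId is 0 when the service is not running; only kill a real process.
if ($svc -and $svc.ProcessId -gt 0) {
    Stop-Process -Id $svc.ProcessId -Force
}

# Check what state both indexing services ended up in.
Get-Service -DisplayName 'Microsoft Search (Exchange)', 'Microsoft Exchange Search Indexer'
```

Killing a service process is a last resort; it only made sense here because a normal restart hung every time.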



This Explains the Troubleshooters:
http://blogs.technet.com/b/exchange/archive/2011/01/18/3411844.aspx

Resetting the Content Indexer to force a new crawl:
http://blogs.msdn.com/b/pepeedu/archive/2010/12/14/exchange-2010-database-copy-on-server-has-content-index-catalog-files-in-the-following-state-failed.aspx