NOTE: This post – drafted, composed, written, and published by me – originally appeared on https://blogs.technet.microsoft.com/johnbai and is potentially (c) Microsoft.
With the release of Exchange 2013, there are some changes relevant to eDiscovery, whether for In-Place Holds or for Litigation Queries that export results to the Discovery Mailbox. Most notably, eDiscovery/Exchange Search no longer supports AQS; it uses KQL instead. KQL is supported in the SearchQuery parameter (the Keywords box in the Exchange Admin Center). Outlook, however, still uses AQS.
Using KQL, we can perform searches that benefit eDiscovery and save time, money, and resources, without having to bring in a third party to process the data for us.
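To make the SearchQuery syntax concrete, here is a minimal PowerShell sketch of composing a KQL string. New-KqlQuery is a hypothetical helper, not an Exchange cmdlet; the property names attachment and subject are assumed from the KQL properties Exchange eDiscovery supports, and the file name and keywords are placeholders.

```powershell
# Hypothetical helper (not an Exchange cmdlet): joins property restrictions
# and free-text keywords with AND, so an item must satisfy every term.
function New-KqlQuery {
    param(
        [System.Collections.IDictionary] $Properties = [ordered]@{},
        [string[]] $Keywords = @()
    )
    $terms = @()
    foreach ($name in $Properties.Keys) {
        $value = [string]$Properties[$name]
        # Wrap values containing whitespace in quotes so KQL treats them as a phrase.
        if ($value -match '\s') { $value = '"' + $value + '"' }
        # ${name} is needed; "$name:" would be parsed as a scoped variable.
        $terms += "${name}:$value"
    }
    foreach ($word in $Keywords) {
        if ($word -match '\s') { $word = '"' + $word + '"' }
        $terms += $word
    }
    return ($terms -join ' AND ')
}

$query = New-KqlQuery -Properties ([ordered]@{ attachment = 'Report.docx'; subject = 'Quarterly review' }) -Keywords 'budget'
$query   # attachment:Report.docx AND subject:"Quarterly review" AND budget
```

The resulting string could then be passed to the SearchQuery parameter, for example New-MailboxSearch -Name "Case 1" -SourceMailboxes Administrator -SearchQuery $query (an untested illustration).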
For example, if I query for messages that have only a Word document as an attachment, I get the two messages I expect to find.
If I perform the same query but, this time, also specify a subject or keyword I’m after, the messages are excluded because the query's conditions are no longer all met.
If I perform a third query using words that appear in the document's content (but not in its file name), these messages are returned as well.
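The three results above follow from every condition having to match, with keywords also checked against attachment text. The sketch below is a toy model of that behavior only, not how Exchange Search evaluates KQL; Test-ToyKqlMatch and the sample message are hypothetical.

```powershell
# Toy model only: this is NOT how Exchange Search evaluates KQL internally.
# An item matches when it has the required attachment (if one is given) AND
# every keyword appears in its subject, body, or attachment text.
function Test-ToyKqlMatch {
    param(
        [Parameter(Mandatory)] [psobject] $Item,
        [string] $AttachmentName,
        [string[]] $Keywords = @()
    )
    $names = @($Item.Attachments | ForEach-Object { $_.Name })
    # Required attachment missing: the whole query fails.
    if ($AttachmentName -and ($names -notcontains $AttachmentName)) { return $false }

    # Text that keywords are matched against, including attachment contents.
    $searchable = @($Item.Subject, $Item.Body) + @($Item.Attachments | ForEach-Object { $_.Text })
    foreach ($word in $Keywords) {
        $pattern = '\b' + [regex]::Escape($word) + '\b'
        # -match on an array returns the matching elements; none means no hit.
        if (-not ($searchable -match $pattern)) { return $false }
    }
    return $true
}

$item = [pscustomobject]@{
    Subject     = 'Status update'
    Body        = 'See attached.'
    Attachments = @([pscustomobject]@{ Name = 'Report.docx'; Text = 'Quarterly budget figures' })
}
Test-ToyKqlMatch -Item $item -AttachmentName 'Report.docx'                      # True
Test-ToyKqlMatch -Item $item -AttachmentName 'Report.docx' -Keywords 'invoice'  # False
Test-ToyKqlMatch -Item $item -AttachmentName 'Report.docx' -Keywords 'budget'   # True
```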
The number of mailboxes that can be searched at once is limited to 5,000*. If a search includes more mailboxes than that, it returns the following error: An unknown error occurred on the search server. Please contact your administrator for assistance. The message from the search server is ‘The search exceeded the maximum number of mailboxes that can be searched at a time. Please try searching less than 5000 mailboxes.’.
*In on-premises Exchange 2013, you can change the maximum number of mailboxes that can be searched by using the Set-ThrottlingPolicy cmdlet with the DiscoveryMaxMailboxes parameter, but raising the limit may have a negative impact on performance.
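An alternative to raising DiscoveryMaxMailboxes, sketched here as an assumption rather than a documented procedure, is to split the mailbox list into batches of at most 5,000 and run one search per batch. Split-IntoBatches is a hypothetical helper, and the contoso.com addresses are placeholders.

```powershell
# Hypothetical helper: splits a list of mailboxes into batches that stay
# within the default 5,000-mailbox limit, so each batch can be searched
# separately instead of raising DiscoveryMaxMailboxes.
function Split-IntoBatches {
    param(
        [string[]] $Items,
        [ValidateRange(1, 2147483647)] [int] $BatchSize = 5000
    )
    for ($i = 0; $i -lt $Items.Count; $i += $BatchSize) {
        $last = [Math]::Min($i + $BatchSize, $Items.Count) - 1
        # The unary comma emits each batch as one array instead of unrolling it.
        ,($Items[$i..$last])
    }
}

$mailboxes = 1..12001 | ForEach-Object { "user$_@contoso.com" }
$batches = @(Split-IntoBatches -Items $mailboxes)
$batches.Count                                          # 3
($batches | ForEach-Object { $_.Count }) -join ', '     # 5000, 5000, 2001
```

Each batch could then be supplied to a separate New-MailboxSearch run through its SourceMailboxes parameter; this keeps the default throttling in place at the cost of managing several searches.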
Because Exchange now uses the FAST Search index, we can query for which documents haven’t been indexed and why. For example, to find items where the document parser encountered a processing error, I would run the following command in the Exchange Management Shell:
Get-FailedContentIndexDocuments Administrator -ErrorCode 7 | FT -AutoSize
DocID Database Mailbox Subject Description
----- -------- ------- ------- -----------
3462 LAB-NAEX15-01 Store 002 Administrator Binaries Test The document parser encountered a processing error.
3464 LAB-NAEX15-01 Store 002 Administrator FW: Binaries Test The document parser encountered a processing error.
Using this, I can see precisely what caused each document not to be indexed:
$errorSevens = Get-FailedContentIndexDocuments Administrator -ErrorCode 7
301002 Error parsing document ‘exchange://localhost/Attachment/34eb02b4-3bc6-4163-a40d-2587faa9e0db/135d5536-d180-4198-9ba8-574b53df8206/e08d777e-e710-4407-a53d-1f57a4a58d79/a654efa1-bb87-426a-aaca-9866be733ccd/438086667654.0/System.Data.dll’. Document has an undetectable format and will not be parsed.
301002 Error parsing document ‘exchange://localhost/Attachment/34eb02b4-3bc6-4163-a40d-2587faa9e0db/135d5536-d180-4198-9ba8-574b53df8206/e08d777e-e710-4407-a53d-1f57a4a58d79/a654efa1-bb87-426a-aaca-9866be733ccd/438086667654.1/mscorlib.dll’. Document has an undetectable format and will not be parsed.
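When many items fail, it can help to pull the file name and reason out of error text like the lines above. This is a minimal sketch assuming the text follows exactly the "<code> Error parsing document '<uri>'. <reason>" shape shown here (with straight or curly quotes); ConvertFrom-ParseError is hypothetical, and the sample paths are shortened.

```powershell
# Hypothetical parser for error text shaped like:
#   <code> Error parsing document '<uri>'. <reason>
# Returns the numeric code, the file name (last URI segment), and the reason.
function ConvertFrom-ParseError {
    param([string] $Text)
    # Accepts straight quotes or curly quotes (U+2018/U+2019) around the URI.
    $pattern = '(\d+) Error parsing document [''\u2018]([^''\u2019]+)[''\u2019]\. (.+?\.)(?=\s+\d+ Error parsing document|\s*$)'
    foreach ($m in [regex]::Matches($Text, $pattern)) {
        $uri = $m.Groups[2].Value
        [pscustomobject]@{
            Code     = [int]$m.Groups[1].Value
            FileName = $uri.Substring($uri.LastIndexOf('/') + 1)
            Reason   = $m.Groups[3].Value
        }
    }
}

# Shortened sample of the text above.
$detail = "301002 Error parsing document 'exchange://localhost/Attachment/a/b/438086667654.0/System.Data.dll'. Document has an undetectable format and will not be parsed. 301002 Error parsing document 'exchange://localhost/Attachment/a/b/438086667654.1/mscorlib.dll'. Document has an undetectable format and will not be parsed."
ConvertFrom-ParseError -Text $detail | Select-Object Code, FileName   # FileName: System.Data.dll, then mscorlib.dll
```

Which property of the objects in $errorSevens carries this text isn't shown above; $errorSevens | Format-List * will reveal it before piping it in.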
In this case, the documents are binaries I attached to the email while testing an unrelated issue. FAST Search can’t extract content from compiled binaries, so it is safe to assume these files aren’t needed for my eDiscovery purposes. See the Exchange 2013 documentation for the list of file formats that Exchange Search can index.
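The judgment call above (binaries can be set aside) can be encoded as a simple triage check. This sketch is an assumption, not an Exchange rule: Test-IgnorableFailure is hypothetical, and the extension list is a placeholder you would replace with your own review criteria.

```powershell
# Hypothetical triage helper: flags failures whose attachment is a compiled
# binary. The extension list is an assumption for this example, not an
# Exchange-defined list; adjust it to your own review criteria.
$BinaryExtensions = @('.dll', '.exe', '.sys')

function Test-IgnorableFailure {
    param(
        [string] $FileName,
        [string[]] $IgnoredExtensions = $BinaryExtensions
    )
    $ext = [System.IO.Path]::GetExtension($FileName)
    # -contains compares case-insensitively, so '.DLL' also matches.
    return ($IgnoredExtensions -contains $ext)
}

Test-IgnorableFailure 'System.Data.dll'   # True
Test-IgnorableFailure 'Contract.docx'     # False
```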
The error code enumerations are as follows:
0 – No problems.
1 – An error has occurred.
2 – A timeout has occurred.
3 – The message was not processed in a timely manner.
4 – The mailbox was offline.
5 – The attachment limit was reached.
6 – The item is only partially indexed.
7 – The document parser encountered a processing error.
8 – The document annotations aren’t valid.
9 – The document is suspected of being unable to be processed.
10 – The document processing failed due to a Rights Management error.
11 – The Store Session is not available.
12 – The mailbox is quarantined.
13 – The mailbox is locked.
14 – The operation is not supported.
15 – Search can’t sign in to the mailbox.
16 – Body conversion failed.
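To translate these codes in scripts, the list above can be stored as a lookup table. This is a minimal sketch; $FailedIndexErrorCodes and Get-FailedIndexErrorText are hypothetical names, and the descriptions are copied from the list.

```powershell
# Error codes and descriptions from the list above, keyed by number.
$FailedIndexErrorCodes = @{
    0  = "No problems."
    1  = "An error has occurred."
    2  = "A timeout has occurred."
    3  = "The message was not processed in a timely manner."
    4  = "The mailbox was offline."
    5  = "The attachment limit was reached."
    6  = "The item is only partially indexed."
    7  = "The document parser encountered a processing error."
    8  = "The document annotations aren't valid."
    9  = "The document is suspected of being unable to be processed."
    10 = "The document processing failed due to a Rights Management error."
    11 = "The Store Session is not available."
    12 = "The mailbox is quarantined."
    13 = "The mailbox is locked."
    14 = "The operation is not supported."
    15 = "Search can't sign in to the mailbox."
    16 = "Body conversion failed."
}

# Hypothetical helper: returns the description for a code, or a fallback.
function Get-FailedIndexErrorText {
    param([Parameter(Mandatory)] [int] $Code)
    if ($FailedIndexErrorCodes.ContainsKey($Code)) { return $FailedIndexErrorCodes[$Code] }
    return "Unknown error code $Code."
}

Get-FailedIndexErrorText 7    # The document parser encountered a processing error.
```

This could be used to label the results of Get-FailedContentIndexDocuments runs made with different -ErrorCode values.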