How to Create a FAST Search for SharePoint Test Document Using XMLMapper

FAST Search Server 2010 for SharePoint (FAST Search for SharePoint) is an alternative to the out-of-the-box SharePoint Server search (the built-in enterprise search solution in SharePoint Server 2010). If you want to understand the differences between SharePoint Server search and FAST Search for SharePoint, see How FAST Search for SharePoint fits into SharePoint 2010.

One of its useful capabilities is a flexible item processing architecture, which includes XML processing.

Using synthetic items (documents) for testing

The simplest way to test a search index is to crawl a set of items from your SharePoint site. If you want to test specific capabilities of your search deployment, however, it can be useful to have a set of synthetic items with well-defined content and metadata, because you control every property of the indexed items. For testing ranking features in particular, small synthetic items are often simpler to work with.

Using XML is a convenient way to create synthetic items. 
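As a minimal sketch (not part of FAST Search for SharePoint), the following Python script builds synthetic items in the XML format used later in this article. The helper names make_item and make_ranking_set are hypothetical. make_ranking_set varies only how often one term occurs in the body (apart from the title and Int1, which number the items), which is the kind of controlled difference that is useful when testing ranking.

```python
import xml.etree.ElementTree as ET


def make_item(title, body, size, date, tags, ints=(), texts=()):
    """Build one synthetic item (hypothetical helper) as an XML string.

    Element names match the XPaths in this article's XMLMapper.xml.
    """
    doc = ET.Element("Document")
    ET.SubElement(doc, "Title").text = title
    ET.SubElement(doc, "Date").text = date  # e.g. "2011-01-01T08:00:00Z"
    ET.SubElement(doc, "Size").text = str(size)
    ET.SubElement(doc, "Body").text = body
    tags_elem = ET.SubElement(doc, "Tags")
    for tag in tags:
        ET.SubElement(tags_elem, "Tag").text = tag
    # Int1..Int4 and Text1..Text4: only as many elements as values supplied
    for i, value in enumerate(list(ints)[:4], start=1):
        ET.SubElement(doc, f"Int{i}").text = str(value)
    for i, value in enumerate(list(texts)[:4], start=1):
        ET.SubElement(doc, f"Text{i}").text = value
    return ET.tostring(doc, encoding="unicode")


def make_ranking_set(term, counts, length=10):
    """One item per count: the body repeats `term` count times and is padded
    with a filler word to the same number of words."""
    items = []
    for n, count in enumerate(counts, start=1):
        if not 0 <= count <= length:
            raise ValueError("each count must be between 0 and length")
        body = " ".join([term] * count + ["filler"] * (length - count))
        # Fixed size and date so that these properties do not differ between items
        items.append(make_item(f"Document {n}", body, 128, "2011-01-01T08:00:00Z",
                               [term], ints=[n], texts=[term]))
    return items
```

To produce input for docpush, write each returned string to its own file, for example doc1.xml, doc2.xml and so on.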

Prepare the item processing pipeline

You can use the XMLMapper custom processing stage to process XML input documents. In this example, we create a configuration that handles synthetic items with the following crawled properties (and the corresponding managed property mappings):

  • mytitle: Title of the item. Mapped to Title.
  • mybody: Main body text of the item. Mapped to body.
  • mysize: Size of the document. Overrides the detected size of the item. Mapped to Size.
  • mydate: The update date of the item. Mapped to Write.
  • mytags: Contains some metadata tags for the item. Mapped to Tags.
  • myint1 - myint4: Four general purpose integer properties. Mapped to managed properties with the same name.
  • mytext1 - mytext4: Four general purpose text properties. Mapped to managed properties with the same name. 
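The property plan above can also be kept as data and turned into the mapping section of XMLMapper.xml, which reduces typing errors when you add more properties. The following Python sketch is a hypothetical helper, not a FAST tool. It assumes the type codes that appear in this article's XMLMapper.xml: 31 for strings, 3 for integers and 64 for dates (these correspond to the OLE variant types VT_LPWSTR, VT_I4 and VT_FILETIME).

```python
import uuid
import xml.etree.ElementTree as ET

# Variant type codes as used in this article's XMLMapper.xml
VT_STRING, VT_INT, VT_DATE = 31, 3, 64

# (crawled property, XPath in the input XML, variant type), as listed above
PROPERTIES = (
    [("mytitle", "//Title", VT_STRING),
     ("mybody", "//Body", VT_STRING),
     ("mysize", "//Size", VT_INT),
     ("mydate", "//Date", VT_DATE),
     ("mytags", "//Tags", VT_STRING)]
    + [(f"myint{i}", f"//Int{i}", VT_INT) for i in range(1, 5)]
    + [(f"mytext{i}", f"//Text{i}", VT_STRING) for i in range(1, 5)]
)


def build_xmlmapper_config(propset_guid, properties=PROPERTIES, default_type=VT_STRING):
    """Return XMLMapper.xml text; a type attribute is written only when it
    differs from the default <type>."""
    guid = str(uuid.UUID(str(propset_guid)))  # raises ValueError if not a GUID
    root = ET.Element("XMLPropertiesCreator")
    ET.SubElement(root, "propset").text = guid
    ET.SubElement(root, "type").text = str(default_type)
    mappings = ET.SubElement(root, "XMLMappings")
    for attr, path, vtype in properties:
        mapping = ET.SubElement(mappings, "Mapping", attr=attr, path=path)
        if vtype != default_type:
            mapping.set("type", str(vtype))
    return ET.tostring(root, encoding="unicode")
```

Pass the GUID printed when you create the crawled property category, for example print(build_xmlmapper_config("<your GUID>")), and save the output as XMLMapper.xml.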

The following steps configure the item processing pipeline. For more information, see Customizing item processing (MSDN).
Unless otherwise specified, run the commands in a PowerShell window on the FAST Search for SharePoint administration server.

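  1. Create a new crawled property category for the new crawled properties, and map all text crawled properties in the category to the default full-text index:
      # Generate a GUID that identifies the property set (propset) of the new category
      $guid = [guid]::NewGuid()
      # Create the category and map its text crawled properties to the full-text index
      $cat = New-FASTSearchMetadataCategory -Name xmlprops -Propset $guid
      Set-FASTSearchMetadataCategory -Category $cat -MapToContents 1
      # Print the GUID so that you can copy it into XMLMapper.xml
      $guid

      Note the GUID value. You must enter this unique value as the propset in the XMLMapper configuration file.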

  2. Update the following three item processing configuration files. You must update these files on all servers that run item processing in your deployment (<document-processor> in deployment.xml).
    1. Edit C:\FASTSearch\etc\config_data\DocumentProcessor\formatdetector\user_converter_rules.xml to override the default document type detection for .xml documents. For details, see Format detection and item parsing.
    2. Edit C:\FASTSearch\etc\config_data\DocumentProcessor\optionalprocessing.xml to activate the XMLMapper stage and the FFDDumper stage (for pipeline debugging). Change the following two elements to have active="yes":

      <processor name="XMLMapper" active="yes" />
      <processor name="FFDDumper" active="yes" />

      Note: Only activate FFDDumper in a test deployment, not when you crawl a large number of documents. For more information about the FFDDumper output format, see How To Identify Crawled Properties and Their Values.

    3. Create C:\FASTSearch\etc\config_data\DocumentProcessor\XMLMapper.xml. This file configures the mapping from XML elements to crawled properties. For the mappings listed above, use this configuration:

      <XMLPropertiesCreator>
         <propset><GUID as created above></propset>
         <type>31</type>
         <XMLMappings>
            <Mapping attr="mytitle" path="//Title"/>
            <Mapping attr="mybody" path="//Body"/> 
            <Mapping attr="mysize" path="//Size" type="3"/> 
            <Mapping attr="mydate" path="//Date" type="64"/> 
            <Mapping attr="mytags" path="//Tags"/> 
            <Mapping attr="myint1" path="//Int1" type="3"/> 
            <Mapping attr="myint2" path="//Int2" type="3"/> 
            <Mapping attr="myint3" path="//Int3" type="3"/> 
            <Mapping attr="myint4" path="//Int4" type="3"/> 
            <Mapping attr="mytext1" path="//Text1"/> 
            <Mapping attr="mytext2" path="//Text2"/> 
            <Mapping attr="mytext3" path="//Text3"/> 
            <Mapping attr="mytext4" path="//Text4"/>
         </XMLMappings>
      </XMLPropertiesCreator>

      For more information, see XML mapper schema (MSDN).

  3. Apply the item processing configuration changes by running the following command on any FAST Search for SharePoint server:
      psctrl reset
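Before running psctrl reset, a quick read-only check of the edited files can catch common mistakes: a processor that is still inactive, or a propset that does not match the category GUID (leaving the <GUID as created above> placeholder in place makes the file malformed, so parsing fails with ParseError). The Python sketch below is a hypothetical helper. It assumes that optionalprocessing.xml contains processor elements with name and active attributes, as shown above, and it only accepts the type codes used in this article.

```python
import uuid
import xml.etree.ElementTree as ET

KNOWN_TYPES = {"31", "3", "64"}  # the variant type codes used in this article


def check_optional_processing(xml_text, required=("XMLMapper", "FFDDumper")):
    """Return the required processors that are not set to active="yes"."""
    root = ET.fromstring(xml_text)
    active = {p.get("name"): p.get("active") for p in root.iter("processor")}
    return [name for name in required if active.get(name) != "yes"]


def check_xmlmapper(xml_text, expected_guid):
    """Return a list of problems found in an XMLMapper.xml document."""
    problems = []
    expected = uuid.UUID(str(expected_guid))
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    propset = (root.findtext("propset") or "").strip()
    try:
        actual = uuid.UUID(propset)
    except ValueError:
        problems.append(f"propset {propset!r} is not a valid GUID")
    else:
        if actual != expected:
            problems.append(f"propset {propset} does not match the category GUID")
    seen = set()
    for mapping in root.iter("Mapping"):
        attr, vtype = mapping.get("attr"), mapping.get("type")
        if attr in seen:
            problems.append(f"duplicate mapping for {attr}")
        seen.add(attr)
        if vtype is not None and vtype not in KNOWN_TYPES:
            problems.append(f"{attr}: unexpected type {vtype}")
    return problems
```

Pass each file's raw bytes, for example open(path, "rb").read(), so that the parser honors the file's own encoding declaration. An empty list means that no problems were found.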

Create and submit a test item

    1. Create the test document. This simple XML document serves as the initial test document. The commands below assume that you save the file as C:\XMLMapper\doc1.xml:

      <Document>
        <Title>Document 1</Title>
        <Date>2011-01-01T08:00:00Z</Date>
        <Size>128</Size>
        <Body>This is the first test document. alpha bravo charlie delta echo foxtrot golf hotel.</Body>
        <Tags>
          <Tag>alpha</Tag>
          <Tag>bravo</Tag>
          <Tag>charlie</Tag>
        </Tags>
        <Int1>1</Int1>
        <Int2>2</Int2>
        <Int3>3</Int3>
        <Int4>4</Int4>
        <Text1>alpha</Text1>
        <Text2>alpha bravo</Text2>
        <Text3>alpha bravo charlie</Text3>
        <Text4>alpha bravo charlie delta</Text4>
      </Document>

    2. Submit the document using 'docpush':

      docpush -c sp -u file:/// C:\XMLMapper\doc1.xml

      This command submits the XML document to the pipeline. When you have submitted this first document, the crawled properties are automatically created.

    3. Verify that the document was processed by inspecting the processing log (from FFDDumper) in C:\FASTSearch\data\ffd\. For more details, see How To Identify Crawled Properties and Their Values.
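To see roughly which values the mappings pick up before you submit a document with docpush, you can evaluate the XMLMapper paths against the test document locally. The following Python sketch is an approximation, not XMLMapper itself: ElementTree supports only a subset of XPath (so //Title is evaluated as .//Title), the date format is assumed to be the one used in doc1.xml, and flattening nested elements such as Tags/Tag into space-separated text is an assumption about how XMLMapper handles them. The function name preview is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# (crawled property, XPath, type code) as in the XMLMapper.xml above
MAPPINGS = (
    [("mytitle", "//Title", 31), ("mybody", "//Body", 31), ("mysize", "//Size", 3),
     ("mydate", "//Date", 64), ("mytags", "//Tags", 31)]
    + [(f"myint{i}", f"//Int{i}", 3) for i in range(1, 5)]
    + [(f"mytext{i}", f"//Text{i}", 31) for i in range(1, 5)]
)


def preview(xml_text, mappings=MAPPINGS):
    """Return {crawled property: value} for the elements found in xml_text."""
    root = ET.fromstring(xml_text)
    values = {}
    for attr, path, vtype in mappings:
        elem = root.find("." + path)  # "//Title" becomes ".//Title"
        if elem is None:
            continue  # element missing: no value for this crawled property
        # Assumption: nested elements are flattened to space-separated text
        text = " ".join(t.strip() for t in elem.itertext() if t.strip())
        if vtype == 3:
            values[attr] = int(text)
        elif vtype == 64:
            # Assumed format, as in doc1.xml: 2011-01-01T08:00:00Z
            parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
            values[attr] = parsed.replace(tzinfo=timezone.utc)
        else:
            values[attr] = text
    return values
```

For example, print(preview(open(r"C:\XMLMapper\doc1.xml", "rb").read())), and compare the result with the FFDDumper log.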
       

Create managed properties and crawled property mappings

Submitting the initial test document created the necessary crawled properties. Next, create the custom managed properties and map the crawled properties to managed properties.

    1. Map the new crawled properties to the existing managed properties:

      $mp = Get-FASTSearchMetadataManagedProperty -Name body
      $cp = Get-FASTSearchMetadataCrawledProperty -Name mybody
      New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

      $mp = Get-FASTSearchMetadataManagedProperty -Name Title
      $cp = Get-FASTSearchMetadataCrawledProperty -Name mytitle
      New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

      $mp = Get-FASTSearchMetadataManagedProperty -Name Size
      $cp = Get-FASTSearchMetadataCrawledProperty -Name mysize
      New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

      $mp = Get-FASTSearchMetadataManagedProperty -Name Write
      $cp = Get-FASTSearchMetadataCrawledProperty -Name mydate
      New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp


      Note that mapping an XML element to body is only suitable for testing. Such a mapping bypasses some of the standard item processing of the body, such as property extraction. If you want to test such features, it is better to submit a plain text file using docpush.

    2. The managed properties Write, Size and Title already have crawled property mappings, so they also receive default values for the synthetic documents. To override the default mappings, you must move the new mappings to the top of each property's list of crawled property mappings. You can do this in PowerShell, but it is more convenient in the GUI:
      1. On the Query SSA server, go to Central Administration --> Manage service applications.
      2. Select the name of your Query SSA.
      3. Select FAST Search Administration --> Managed properties.
      4. For each of the three properties:
        1. Search for the property name and click the property.
        2. In Mappings to Crawled Properties, move the crawled property you created to the top of the list and click OK.
    3. Create the new managed properties and their mappings. The managed properties 'myint1' and 'mytext1' also have the following features enabled:
      • Sorting enabled
      • Query refinement enabled

      $mp = New-FASTSearchMetadataManagedProperty -Name mytext1 -Type 1
      $cp = Get-FASTSearchMetadataCrawledProperty -Name mytext1
      New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp
      Set-FASTSearchMetadataManagedProperty -ManagedProperty $mp -SortableType 1
      Set-FASTSearchMetadataManagedProperty -ManagedProperty $mp -RefinementEnabled 1

      $mp = New-FASTSearchMetadataManagedProperty -Name mytext2 -Type 1
      $cp = Get-FASTSearchMetadataCrawledProperty -Name mytext2
      New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

      $mp = New-FASTSearchMetadataManagedProperty -Name mytext3 -Type 1
      $cp = Get-FASTSearchMetadataCrawledProperty -Name mytext3
      New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

      $mp = New-FASTSearchMetadataManagedProperty -Name mytext4 -Type 1
      $cp = Get-FASTSearchMetadataCrawledProperty -Name mytext4
      New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

      $mp = New-FASTSearchMetadataManagedProperty -Name myint1 -Type 2
      $cp = Get-FASTSearchMetadataCrawledProperty -Name myint1
      New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp
      Set-FASTSearchMetadataManagedProperty -ManagedProperty $mp -SortableType 1
      Set-FASTSearchMetadataManagedProperty -ManagedProperty $mp -RefinementEnabled 1

      $mp = New-FASTSearchMetadataManagedProperty -Name myint2 -Type 2
      $cp = Get-FASTSearchMetadataCrawledProperty -Name myint2
      New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

      $mp = New-FASTSearchMetadataManagedProperty -Name myint3 -Type 2
      $cp = Get-FASTSearchMetadataCrawledProperty -Name myint3
      New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

      $mp = New-FASTSearchMetadataManagedProperty -Name myint4 -Type 2
      $cp = Get-FASTSearchMetadataCrawledProperty -Name myint4
      New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

    4. Re-submit the test document using 'docpush' to populate the managed properties you have mapped.
    5. Run a test query in one of the following ways:
      1. Query the internal query interface on the query processing server (<query> role in deployment.xml) using the URL http://localhost:13280/ (assuming the default base port).
        Search for 'xmlmapper'. The response is an internal XML query result that lists all managed properties returned in query results.
      2. Use the simple PowerShell script described in this blog post: http://blogs.msdn.com/b/knutbran/archive/2011/04/01/some-hints-on-testing-custom-managed-properties-and-queries.aspx
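The step that moves the new crawled properties to the top of the mapping list matters because the order decides which value a managed property receives. The following Python sketch models that behavior in a simplified way, assuming that the managed property does not merge values from several crawled properties and instead takes the first non-empty value in mapping order. It is an illustration, not the FAST implementation; the crawled property name default_title is hypothetical and stands in for whatever default mapping Title already has.

```python
def resolve(mapping_order, crawled_values):
    """Simplified model: return the value of the first crawled property in
    mapping_order that has a non-empty value for the item, else None."""
    for name in mapping_order:
        value = crawled_values.get(name)
        if value not in (None, ""):
            return value
    return None


if __name__ == "__main__":
    # default_title is a hypothetical stand-in for an existing default mapping
    item = {"default_title": "doc1.xml", "mytitle": "Document 1"}
    print(resolve(["default_title", "mytitle"], item))  # prints doc1.xml
    print(resolve(["mytitle", "default_title"], item))  # prints Document 1
```

In this model, moving mytitle to the top is what makes the synthetic title win over the default value.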

Comments
    • Hi,

       Great article, and very helpful. Have you tried a scenario where you index one single XML document that contains multiple document nodes? The ESP Filetraverser parses these out when we give it an XPath, which we do not have here. Thoughts?

      Thanks.

    • Unfortunately, you cannot submit multiple items in one XML file, as you can in FAST ESP.
