Getting all documents from a FAST collection

The post Introduction to FAST Search API shows how to use the FAST API. FAST stores documents in collections, and we usually run searches against a collection. Sometimes, though, we want every document in a collection. This article explains how to get them.

A single FAST response contains at most 4020 documents. Instead of asking for everything at once, we page through the results 100 documents at a time.
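To make the paging arithmetic concrete, here is a small sketch that lists the OFFSET and HITS values requested for a collection of a given size. It assumes the total is already known (in the Search method it comes from DocCount on the first response), and PagePlanner.Pages is a hypothetical helper, not part of the FAST API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the FAST API: lists the OFFSET/HITS
// pairs a paging loop requests for totalDocuments documents.
public static class PagePlanner
{
    public static List<(int Offset, int Hits)> Pages(int totalDocuments, int pageSize)
    {
        if (pageSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(pageSize));

        var pages = new List<(int Offset, int Hits)>();
        for (int offset = 0; offset < totalDocuments; offset += pageSize)
        {
            // Full pages, except the last one, which asks only for what is left.
            pages.Add((offset, Math.Min(pageSize, totalDocuments - offset)));
        }
        return pages;
    }
}
```

For 250 documents and a page size of 100 this yields (0, 100), (100, 100) and (200, 50).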

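using System;
using System.Collections.Specialized;
// Also requires the FAST ESP .NET search API namespaces that define
// HttpSearchFactory, Query, BaseParameter, IQueryResult and IDocumentSummary.

// FastServerUrl holds the "host:port" of the FAST QR server.
private static void SearchVideos()
{
    Search(FastServerUrl, @"meta.collection:""videos""", "Videos");
}

private static void Search(string url, string query, string docType)
{
    var factory = new HttpSearchFactory(new NameValueCollection 
    { { "Com.FastSearch.Esp.Search.Http.QRServers", url } });
    var engine = factory.GetSearchEngine(new Uri("http://" + url));

    const int pageSize = 100;
    const int maxAttempts = 3;
    int index = 0;           // offset of the current page
    int totalDocuments = 0;  // set from the first response
    int pageFailures = 0;    // consecutive failures reading the current page

    var queryObject = new Query(query);
    IQueryResult result = null;
    IDocumentSummary document = null;

    int hits = pageSize;
    do
    {
        // Ask for the page starting at 'index'. Until the first response
        // arrives the total is unknown, so request a full page; after that,
        // request only what is left on the last page.
        queryObject.SetParameter(BaseParameter.OFFSET, index);
        hits = totalDocuments == 0
            ? pageSize
            : Math.Min(pageSize, totalDocuments - index);
        queryObject.SetParameter(BaseParameter.HITS, hits);

        // A large page can time out, so retry the search a few times.
        // Search returns 4020 records maximum.
        Exception lastError = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            try
            {
                result = engine.Search(queryObject);
                lastError = null;
                break;
            }
            catch (Exception ex)
            {
                lastError = ex;
            }
        }
        if (lastError != null)
            throw new InvalidOperationException(
                "Search failed at offset " + index, lastError);

        if (totalDocuments == 0)
        {
            totalDocuments = result.DocCount;
            Console.WriteLine("Documents in {0}: {1}", docType, totalDocuments);
        }

        var documents = result.Documents();
        int curIndex = 0;
        bool failCurrent = false;
        while (documents.MoveNext() && curIndex < result.Hits)
        {
            try
            {
                document = (IDocumentSummary)documents.Current;
            }
            catch (Exception)
            {
                // Current can throw even when MoveNext returned true.
                failCurrent = true;
                break;
            }
            string name = document.GetSummaryField("name").StringValue;
            string country = document.GetSummaryField("country").StringValue;
            // ... process name and country here ...
            curIndex++;
        }

        if (!failCurrent)
        {
            // Page read completely: move on to the next one.
            index += pageSize;
            pageFailures = 0;
        }
        else if (++pageFailures >= maxAttempts)
        {
            // Give up instead of re-requesting the same page forever.
            throw new InvalidOperationException(
                "Could not read documents at offset " + index);
        }
    } while ((index < totalDocuments) && (result.Hits == hits));
}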

A query that retrieves many documents at once (here, 100 per page) may time out. If that happens, the code retries the Search call, making up to three attempts.
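The retry logic can be pulled out into a small reusable helper. This is a sketch, not part of the FAST API: Retry.WithRetries is a hypothetical name, and like the search loop it retries on any exception, because I am not certain which exception type the API throws on a timeout.

```csharp
using System;

// Hypothetical helper, not part of the FAST API: runs an operation up to
// maxAttempts times and lets the last exception propagate if all fail.
public static class Retry
{
    public static T WithRetries<T>(Func<T> operation, int maxAttempts)
    {
        if (operation == null)
            throw new ArgumentNullException(nameof(operation));
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Swallow the failure and try again. On the last attempt the
                // filter is false, so the exception propagates to the caller.
            }
        }
    }
}
```

In the search loop the call would look like `result = Retry.WithRetries(() => engine.Search(queryObject), 3);`. Using an exception filter keeps the original stack trace of the final failure instead of wrapping or rethrowing it.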

The result from FAST is a collection of IDocumentSummary objects, each of which holds a set of summary fields. We read the name and country fields with GetSummaryField.

While iterating through the documents, reading Current can throw an exception even though MoveNext returned true. I believe this is a bug in the API. The code catches the exception and requests the same page again instead of advancing the offset.
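The page-reading step can also be factored into a helper that stops at the first failing Current and tells the caller whether the whole page was read. This is a sketch assuming Documents() returns a non-generic IEnumerator, as the cast to IDocumentSummary in the search loop suggests; PageReader.TryReadPage is a hypothetical name.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper, not part of the FAST API: reads up to maxItems
// entries from a non-generic enumerator. Returns false (and a null page)
// if reading or casting Current throws, so the caller can re-request
// the same page instead of advancing the offset.
public static class PageReader
{
    public static bool TryReadPage<T>(IEnumerator items, int maxItems, out List<T> page)
    {
        page = new List<T>();
        while (page.Count < maxItems && items.MoveNext())
        {
            try
            {
                page.Add((T)items.Current);
            }
            catch (Exception)
            {
                // Current (or the cast) threw even though MoveNext returned true.
                page = null;
                return false;
            }
        }
        return true;
    }
}
```

Returning the page only on success means a half-read page is never processed twice when it is requested again. Checking the count before calling MoveNext also avoids advancing the enumerator past the last requested document.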

The HttpSearchEngine class also exposes a FetchDocuments method. The call succeeds, but I could not retrieve the document collection from its result. This may be another bug in the API; in any case, FetchDocuments is not documented in the FAST Search API.
