Getting all collections stored in FAST

FAST is a search engine for enterprise search. In FAST, a View specifies how content is structured: a View contains collections, each collection contains documents, and the structure of each document is defined by its fields.
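This hierarchy can be pictured with a few plain C# types. The sketch below is a hypothetical conceptual model, not the FAST API: the types View, Collection and Document are illustrative only, and fields are simplified to string key/value pairs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types: a conceptual model of FAST content, not the FAST client API.
// A document is described by its fields (simplified here to name -> value strings).
record Document(IReadOnlyDictionary<string, string> Fields);

// A collection is a named set of documents.
record Collection(string Name, IReadOnlyList<Document> Documents);

// A view groups one or more collections.
record View(string Name, IReadOnlyList<Collection> Collections);

static class ContentModel
{
    // Total number of documents reachable through a view:
    // the sum of the document counts of its collections.
    public static int CountDocuments(View view) =>
        view.Collections.Sum(c => c.Documents.Count);
}
```

In this model, counting the documents of a view is just a walk over its collections, which mirrors what the code further down does against a live engine.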

Besides specifying the content, a View has parameters that fine-tune search. For example, settings such as lemmatisation and spell-check affect the results a search returns.

The code below loops through the views, gets the collections for each view, and prints the number of documents stored in each collection.

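using System;
using System.Collections.Specialized;
// Also reference the FAST ESP .NET search API, which defines HttpSearchFactory
// and the view/result types (namespaces depend on your ESP version).

static void DumpCollections(string url)
{
    // url is given without a scheme (for example host:port); the factory is
    // configured with it and the engine is created from the http:// URI.
    var factory = new HttpSearchFactory(new NameValueCollection
    {
        { "Com.FastSearch.Esp.Search.Http.QRServers", url }
    });
    var engine = factory.GetSearchEngine(new Uri("http://" + url));

    // GetViewList returns an untyped list of view names.
    foreach (var viewName in engine.GetViewList())
    {
        var view = engine.GetView((string)viewName);
        Console.WriteLine("View: {0}", (string)viewName);

        // The content specification lists the collections behind this view.
        var spec = view.ContentSpecification;
        foreach (var collection in spec.Collections)
        {
            // FQL query that matches every document in the collection.
            string query = string.Format(@"meta.collection:""{0}""",
                (string)collection);
            var result = engine.Search(query);
            Console.WriteLine("Collection {0} has {1} documents",
                (string)collection, result.DocCount);
        }
        Console.WriteLine("==================");
    }
    Console.WriteLine("------------------------");
    Console.WriteLine("------------------------");
}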

To create a search engine, use the HttpSearchFactory class: its GetSearchEngine method accepts the server URL and returns the search engine. The HttpSearchEngine class provides GetViewList, which returns the list of view names, and GetView, which returns the view with a given name. Each view exposes its content specification, an IContentSpecification, through the ContentSpecification property. The interface's Collections property is the list of collection names. To get the number of documents in a collection, we run a search against that collection.
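The traversal itself (views, then collections, then one count per collection) does not depend on FAST. The sketch below assumes a hypothetical ICollectionSource interface standing in for the engine: GetViewNames plays the role of GetViewList, GetCollectionNames the role of the view's ContentSpecification.Collections, and CountDocuments the role of running the collection query and reading DocCount. InMemorySource is likewise a hypothetical stand-in, not part of any FAST library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over the search engine, so the traversal can be
// shown without the FAST client library.
interface ICollectionSource
{
    IEnumerable<string> GetViewNames();
    IEnumerable<string> GetCollectionNames(string viewName);
    int CountDocuments(string collectionName);
}

static class CollectionDumper
{
    // Walks views -> collections and records the document count of each
    // collection, keyed as "view/collection".
    public static Dictionary<string, int> CountPerCollection(ICollectionSource source)
    {
        var counts = new Dictionary<string, int>();
        foreach (var view in source.GetViewNames())
            foreach (var collection in source.GetCollectionNames(view))
                counts[view + "/" + collection] = source.CountDocuments(collection);
        return counts;
    }
}

// In-memory stand-in: view name -> collection names, collection name -> count.
class InMemorySource : ICollectionSource
{
    private readonly Dictionary<string, string[]> _views;
    private readonly Dictionary<string, int> _docCounts;

    public InMemorySource(Dictionary<string, string[]> views,
                          Dictionary<string, int> docCounts)
    {
        _views = views;
        _docCounts = docCounts;
    }

    public IEnumerable<string> GetViewNames() => _views.Keys;
    public IEnumerable<string> GetCollectionNames(string viewName) => _views[viewName];
    public int CountDocuments(string collectionName) => _docCounts[collectionName];
}
```

Keeping the loop separate from the client library lets the counting logic be exercised with an in-memory source, without a FAST server; a FAST-backed implementation of the interface would wrap the calls used in DumpCollections.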

FAST uses FQL (FAST Query Language), a proprietary query syntax understood by the search engine. The query restricts the search to a single collection by name: the FQL that matches all documents in a collection is meta.collection:"collection-name". The DocCount property of the search result object then gives the number of documents stored in that collection.
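Building the query string in one helper keeps the FQL syntax in a single place. This is a minimal sketch, assuming collection names never contain a double quote: rather than guess at an FQL escaping rule, the hypothetical Fql.CollectionQuery helper rejects such names.

```csharp
using System;

static class Fql
{
    // Builds the FQL term that matches every document in one collection,
    // in the same form used by DumpCollections: meta.collection:"name".
    // Assumption: names containing a double quote are rejected instead of
    // escaped, because no FQL escaping rule is assumed here.
    public static string CollectionQuery(string collectionName)
    {
        if (string.IsNullOrEmpty(collectionName))
            throw new ArgumentException("Collection name is required.", nameof(collectionName));
        if (collectionName.Contains("\""))
            throw new ArgumentException("Quotes in collection names are not handled.", nameof(collectionName));
        return $"meta.collection:\"{collectionName}\"";
    }
}
```

For a collection named news, the helper returns meta.collection:"news", the same string the string.Format call in DumpCollections produces.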
