Examine in Umbraco 8: Indexing, searching and extending Lucene in Umbraco 8
Onsite (or internal) search is often a poor relation to sexier features of a site, such as the primary navigation or rich integrations with external systems such as a CRM. Yet on average 30% of users will do an onsite search, and doing so is a strong predictor of likelihood to convert (see Moz).
Onsite search is challenging, though. Internal search engines can never rival Google for quality and accuracy of results, because they can't rely on the many ranking factors that Google uses.
However, the Lucene search engine which forms part of the core of Umbraco provides a rich and very fast search API. Because much of the indexing scaffolding is already built into the core, a fast and feature-rich onsite search can be achieved in Umbraco 8 within a few hours of development.
Umbraco’s built-in search engine API, Examine, uses Lucene.Net to provide fast free-text search of content in the CMS. How developers configure and use the API has changed substantially in Umbraco 8. In an article written for and published in Skrift this month, we look at how to build a simple free-text search that can cope with hidden, protected and multilingual content, as well as content from third-party, external sources.
Search has come a long way in Umbraco since I started using Umbraco back in 2008. Way back then we were writing sites using XSLT templates and WebForms if we were smart, and XSLTSearch by Doug Robar was the go-to search engine for Umbraco sites built using Umbraco 3 and 4.
But in 2009 Shannon Deminick started working on a new search API which wraps the popular Lucene.Net search engine (itself a .NET port of the Java Lucene library). This was then incorporated into the core of Umbraco in 2010. Lucene.Net is an extremely fast search engine that provides full-text search, multi-index searching, and fuzzy searching.
Examine and Lucene are often used in Umbraco sites to provide free-text onsite search on the front end of websites, and they are also built into the backend of Umbraco to enable Editors to easily find and edit content. But beyond these simple use cases, Lucene can be used to provide related content searches (think the “people who bought this also bought these items” features on Amazon) and can also provide very fast access to content in code, rather than traversing content via the Umbraco IPublishedContent API or ModelsBuilder.
Because Examine is in the core of Umbraco, a lot of the scaffolding that would be needed to get up and running has been done for you, with built-in indexes of Published Content used for external search, as well as indexes of unpublished content, Members and Users. This reduces the effort needed to index and search Umbraco Content, and Examine can be extended to also index and search external content.
Examine provides a fluent API which wraps the Lucene query language. In earlier versions of Umbraco, getting up and running with Examine was simplified further by the EZSearch package, which provided a Partial View Macro that could be dropped into a site to give configurable free-text search of Umbraco content.
In Umbraco 8 most of the APIs for Examine have changed substantially, like lots of other APIs in the core of the CMS, and sadly there are currently no plans to upgrade EZSearch. This means that getting site search up and running can be challenging if your MVC coding skills aren't strong. However, all the functionality remains. In the full Skrift article we cover the key changes needed to implement a basic free-text search that respects hidden and protected content, as well as multilingual content and content stored in third-party systems.
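To give a flavour of the raw Lucene query language that Examine's fluent API wraps, here is a minimal, language-neutral sketch in Python (not Umbraco code, and not the approach taken in the Skrift article). It builds a classic Lucene query string for a fuzzy free-text search that skips hidden pages. The field names `nodeName`, `bodyText` and `umbracoNaviHide`, the value `1` for "hidden", and the helper names are all assumptions for illustration; check what your own index actually stores.

```python
import re

# Characters with special meaning in the classic Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\|&])')


def escape_term(term: str) -> str:
    """Backslash-escape Lucene query syntax characters in a user-supplied term."""
    return _LUCENE_SPECIAL.sub(r'\\\1', term)


def build_query(text, fields, fuzziness=0.7, hide_field="umbracoNaviHide"):
    """Build a Lucene query string (hypothetical helper).

    Every word must fuzzily match at least one of the given fields,
    and documents flagged as hidden are excluded.
    """
    terms = text.split()
    if not terms:
        return ""
    clauses = []
    for term in terms:
        # Lowercased defensively: fuzzy terms may not pass through the analyzer.
        escaped = escape_term(term.lower())
        per_field = " ".join(f"{field}:{escaped}~{fuzziness}" for field in fields)
        # '+' marks the clause as required.
        clauses.append(f"+({per_field})")
    # '-' marks the clause as prohibited; assumes hidden pages index this field as 1.
    clauses.append(f"-{hide_field}:1")
    return " ".join(clauses)


print(build_query("Umbraco search", ["nodeName", "bodyText"]))
```

The `+` and `-` prefixes mark required and prohibited clauses, and the `~0.7` suffix requests a fuzzy match. Escaping the user's input first keeps characters such as `:` or `*` from being interpreted as query syntax.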