Geo-search: what you want, where you want.

Technology

Overview
SearchArea is a commercial geo-search engine with the following features:

Full-text search features
The Apache Software Foundation's Lucene search library is at the core of SearchArea. It is a fully featured search engine with fast indexing and state-of-the-art search algorithms. Unlike the search facilities found in many relational databases, the engine offers excellent relevance ranking and a powerful query language.

For the average user who wants to perform basic queries by typing in keywords, the search engine returns the most relevant results first, using statistical natural language processing techniques to identify the most significant terms in the query and the underlying documents. For power users who want more control over queries, the query syntax supports boolean, phrase, fuzzy, wildcard and fielded queries as well as term and document boosting.

The "significant terms" feature is an extension to the Lucene functionality and automatically identifies key words or phrases found in search results. With a single mouse click these terms can be added to or removed from the query to help refine the concept and improve the quality of search results.

A "highlighter" feature summarises and highlights the most relevant parts of a document when query results are displayed.

The core engine is modular in its design and can easily be reconfigured to use different relevance scoring algorithms and text processors, e.g. a different stemmer.

Spatial search features
Relational databases such as Oracle implement the OpenGIS.org specifications in order to support spatial queries. SearchArea is the first search engine to follow the OpenGIS standard for use in pure search technology.

Geography is defined internally using co-ordinates in order to provide the best support for queries. A spatial query tests the relationship between a shape representing the searcher's area of interest and shapes held in the index representing the location of documents' subject matter.
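
To illustrate the kind of relationship test a spatial query performs, the sketch below checks whether a document's point location falls inside a polygon representing the searcher's area of interest. It is a plain Python illustration of the well-known ray-casting technique, not SearchArea's implementation; the function name `point_in_polygon` and the sample shapes are assumptions made for this example.

```python
def point_in_polygon(point, ring):
    """Ray-casting test: return True if `point` lies inside the polygon `ring`.

    `ring` is a list of (x, y) vertices. Points lying exactly on the
    boundary are not treated specially in this sketch.
    """
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's y value can be crossed
        # by a horizontal ray cast from the point towards +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing toggles the state
    return inside


# Hypothetical data: a square area of interest and two document locations.
area_of_interest = [(0, 0), (10, 0), (10, 10), (0, 10)]
documents = {"doc-a": (5, 5), "doc-b": (15, 5)}

hits = [doc_id for doc_id, location in documents.items()
        if point_in_polygon(location, area_of_interest)]
```

Only `doc-a` ends up in `hits` here. The OpenGIS Simple Features specification defines further relationships, such as intersects and within, which a full engine would need to support for every shape type.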
Documents stored in a SearchArea index can represent any kind of shape defined in OpenGIS.org's Simple Features Specification (e.g. points, lines, polygons, multipoints and multipolygons). The shapes are expressed in a standard plain-text format called "WKT" (Well-Known Text), and queries that have a spatial element use the same WKT format to define the area of interest. The WKT format is powerful enough to express many location types, e.g. single retail locations (using points), business franchises (using multipoints), geographical studies (using polygons) and queries for restaurants situated within 5 miles of the next 10 motorway exits (using multipolygons). The SearchArea Developer's Guide offers further details on spatial queries.

Defining location for end users
End users do not want to type in co-ordinates to perform queries, so SearchArea applications are typically deployed with a user interface that makes it easy to define a location, such as a map or a field for entering the name of a location.
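
As one illustration of how a user interface can spare the user from typing co-ordinates, the sketch below turns a rectangle selected on a map, or a single clicked point, into WKT text that could serve as the area of interest in a query. The helper names `bbox_to_wkt` and `point_to_wkt` and the sample co-ordinates are assumptions made for this example; they are not part of the SearchArea API.

```python
def point_to_wkt(x, y):
    """Format a single location as a WKT point (x first, then y)."""
    return f"POINT ({x} {y})"


def bbox_to_wkt(min_x, min_y, max_x, max_y):
    """Format a map rectangle as a WKT polygon.

    WKT requires the ring to be closed, so the first corner is
    repeated as the last one.
    """
    corners = [
        (min_x, min_y),
        (max_x, min_y),
        (max_x, max_y),
        (min_x, max_y),
        (min_x, min_y),
    ]
    coords = ", ".join(f"{x} {y}" for x, y in corners)
    return f"POLYGON (({coords}))"


# Hypothetical rectangle dragged out on a map by the user.
area_of_interest = bbox_to_wkt(-0.5, 51.3, 0.3, 51.7)
```

The resulting string, `POLYGON ((-0.5 51.3, 0.3 51.3, 0.3 51.7, -0.5 51.7, -0.5 51.3))`, is the kind of text the spatial part of a query would carry.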

Defining location for documents
Not all documents that need to be placed into a SearchArea index come conveniently pre-populated with coordinate information in the WKT format. Applications built on the SearchArea technology therefore typically involve some form of document parsing to recognize location information such as postcodes or telephone dialling codes.
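
One pattern that such parsing might recognize is a postcode. The sketch below assumes UK-style postcodes, uses a deliberately simplified regular expression, and looks matches up in a tiny hypothetical gazetteer to produce WKT points. The pattern, the `GAZETTEER` table and its approximate co-ordinates, and the function name `locate` are all assumptions made for this example rather than anything SearchArea provides.

```python
import re

# Simplified UK postcode pattern (an assumption; real postcode rules
# are stricter than this).
POSTCODE_PATTERN = re.compile(r"\b[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}\b")

# Hypothetical gazetteer mapping postcodes to (x, y) co-ordinates.
# The values are rough and for illustration only.
GAZETTEER = {
    "SW1A 1AA": (-0.14, 51.50),
}


def locate(text):
    """Return a WKT point for every known postcode found in `text`."""
    shapes = []
    for match in POSTCODE_PATTERN.finditer(text.upper()):
        # Normalise to the "outward inward" form used as the lookup key.
        compact = match.group(0).replace(" ", "")
        key = compact[:-3] + " " + compact[-3:]
        if key in GAZETTEER:
            x, y = GAZETTEER[key]
            shapes.append(f"POINT ({x} {y})")
    return shapes


shapes = locate("Visit us at 10 Example Street, London SW1A 1AA.")
```

Each returned string could then be stored alongside the document text when the document is added to the index.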

Architecture overview
SearchArea is a pure Java solution running on the Java 1.4 platform. The engine is composed of a number of components that can be deployed as a highly scalable distributed solution running on many machines, or configured to operate as a library that runs in-process in a small-scale application.

Distributed large scale deployment
In large scale deployments brokers and indexes can be distributed across multiple machines
in order to provide load-balancing and fail-over. Index servers can be both partitioned and
replicated to avoid the issues of trying to fit large data volumes on just one machine or trying
to service large volumes of search requests. The broker automatically manages the merging
of query results from multiple partitions, load balancing requests across replicated index
servers and fail-over in the event of failure. Java's Remote Method Invocation protocol is
used for communication between servers, and indexes can register with more than one broker
in order to avoid a single point of failure.
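
The merging step performed by the broker can be pictured with a short sketch. It assumes each index partition returns its hits already sorted by descending relevance score and that scores are comparable across partitions; both are assumptions made for this example, as are the function name `merge_partition_results` and the sample hits. It does not show SearchArea's actual broker code, load balancing or fail-over.

```python
import heapq
from itertools import islice


def merge_partition_results(partition_results, limit):
    """Merge per-partition hit lists into one globally ranked list.

    Each input list holds (score, doc_id) tuples sorted by descending
    score. heapq.merge combines them lazily, so only `limit` hits are
    actually pulled from the merged stream.
    """
    merged = heapq.merge(*partition_results,
                         key=lambda hit: hit[0],
                         reverse=True)
    return list(islice(merged, limit))


# Hypothetical results from two index partitions.
partition_a = [(0.92, "doc-17"), (0.40, "doc-03")]
partition_b = [(0.88, "doc-52"), (0.75, "doc-41")]

top_hits = merge_partition_results([partition_a, partition_b], limit=3)
```

Here `top_hits` contains `doc-17`, `doc-52` and `doc-41`, in that order, regardless of which partition each hit came from.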

Small scale deployment
Not all applications need the overhead of a brokered architecture, and the SearchArea engine is designed to be embeddable so that the broker and index components can run within a single Java Virtual Machine without any need for remote method calls. Should the need arise, applications can easily migrate to a distributed architecture without having to redesign the application code.

Typical application configuration
The SearchArea engine is often deployed in a web-based environment, although this is not a requirement. Each deployment tends to have its own requirements for a user interface that allows users to pick a location, and its own source of location-based data that needs to be searched.
The user interface typically needs to let the end user define a location using a map or by entering the name of a location. The data to be searched also varies in format between applications, from structured data such as XML or a database containing precise coordinate information to unstructured data such as web pages, where location information has to be extracted by identifying key patterns such as postcodes or telephone dialling codes in the text. A batch task is required to add this content to the index using the index APIs. We can tailor existing example solutions to help with these application-specific tasks in order to get a solution up and running quickly.