Apache Solr and Lucene Books, Presentations and Benchmarks, Importing Big SQL data into Solr

Apache Solr is a distributed search engine based on the open source Lucene search library. Solr versions 4 and 5 improve on Lucene by distributing the index and search operations over a cluster of servers, and by integrating with ZooKeeper and HDFS to provide redundant indexes and updates of indexed documents.

Lucene, and thus Solr, generates an inverted index that provides high-speed search, a small index size and rapid indexing of the source corpus.

An inverted index design contains three basic elements:

1) A dictionary of unique terms,

2) A pointer file that stores the locations of the unique terms within the indexed documents, and

3) A list of words that should not be indexed (stop words).

Unlike other index designs, an inverted index does not contain the indexed data, only its location within the source document. This allows inverted indexes to be small and fast.
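The three elements above can be sketched in a few lines of Python. This is only a toy illustration, not how Lucene lays out its index on disk: the names `STOP_WORDS`, `tokenize`, `build_inverted_index` and `search` are made up for this example, and the "pointer file" is modeled as an in-memory dictionary of token positions.

```python
import re
from collections import defaultdict

# Element 3: words that should not be indexed (a tiny, hypothetical list).
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents, stop_words=STOP_WORDS):
    """Map each unique term to {doc_id: [token positions]}.

    The keys form the term dictionary (element 1); the values play the
    role of the pointer file (element 2). The document text itself is
    not stored in the index.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(tokenize(text)):
            if term in stop_words:
                continue
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


def search(index, query, stop_words=STOP_WORDS):
    """Return sorted ids of documents that contain every query term."""
    terms = [t for t in tokenize(query) if t not in stop_words]
    if not terms:
        return []
    postings = [set(index.get(t, {})) for t in terms]
    return sorted(set.intersection(*postings))


documents = {
    1: "Solr is a distributed search engine",
    2: "Lucene is the search library behind Solr",
    3: "HDFS stores the index files",
}
index = build_inverted_index(documents)
print(search(index, "search Solr"))  # ids of documents containing both terms
print(index["solr"])                 # positions of "solr" in each document
```

Note that the index holds only terms and positions. To show the matching text, a caller would still have to go back to `documents`.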

Modified inverted index designs may allow selected source data to be cached in memory or file structures to avoid accessing source data to produce summary search results.
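A minimal sketch of that modification, assuming a hypothetical `StoredFieldIndex` class that keeps one selected field (a title) next to the postings. Summary results can then be built without reading the source documents. Solr offers a comparable idea through stored fields, but the code below is a toy model and does not use Solr's API.

```python
import re
from collections import defaultdict


class StoredFieldIndex:
    """Toy inverted index that also caches one selected field per document."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.stored = {}                  # doc_id -> cached title

    def add(self, doc_id, body, title):
        # Index the body terms; the body itself is not kept.
        for term in re.findall(r"[a-z0-9]+", body.lower()):
            self.postings[term].add(doc_id)
        # Cache only the selected field for use in result summaries.
        self.stored[doc_id] = title

    def search(self, term):
        """Return (doc_id, title) pairs without touching the source data."""
        doc_ids = sorted(self.postings.get(term.lower(), set()))
        return [(doc_id, self.stored[doc_id]) for doc_id in doc_ids]


idx = StoredFieldIndex()
idx.add(1, "Solr distributes the index over a cluster", title="Solr overview")
idx.add(2, "Lucene builds an inverted index", title="Lucene basics")
print(idx.search("index"))
```

The trade-off is index size: every cached field makes the index larger, so only the fields needed for result summaries are worth storing.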

Integrating Solr with HDFS provides high-speed indexing and search of data, and allows indexes to be stored in HDFS for protection. Index protection is also possible using Solr Cloud with standard Linux file systems.

The following is my collection of books, presentations and benchmarks on Solr and Lucene.

Recommended Books

Solr High Performance (using Solr 4 and Solr Cloud)

Solr Search Patterns (Multi-Language, using Solr Cloud and Sharding)

Solr Essentials

Related Material

Lucene Presentation (Stewart Tate, 4/18/2005)

Lucene Benchmarks (Apache 4/2005)

Lucene In Action

Training Videos

A recent presentation on Solr 4 by the author of Solr. It’s very interesting that he views Solr 4 as a NoSQL database rather than as a high-speed, distributed Lucene index created from source data or a database to improve search performance of the indexed data. With the introduction of Solr Cloud in Solr 4, updates of indexed items (called documents in Solr) were added, which may place Solr 4 ahead of HBase in performance and function.

Admin and configuration
Changing the index
Query the index
Understanding the document schema
Additional Solr Configuration
A Solr use case
Advanced Solr concepts

Importing data from Big SQL into Solr
