SOLR Performance Benchmarks – Single vs. Multi-core Index Shards

May 5, 2009

Single vs. multi-core sharded index. Which one is the right one? There is not a whole lot of information out there, especially when it comes to hard numbers and comparisons. There are a couple reasons for this. The first one that comes to mind is the multi-core functionality offered by Apache SOLR is very nascent. It was recently introduced with the latest SOLR v1.3 and hasn’t had much time to be adopted by the SOLR community. Second, the results are dependent on your schema, index size, query types and user load. These factors can account for varying performance results. As evidenced by the following benchmarks, a multi-core SOLR index has the potential to speed up the performance of your application or cut throughput and scalability by approximately the inverse number of cores.

i.e. For n cores the maximum throughput is roughly 1/n vs. a single index.

With multi-core sharded indexes the underlying assumption is that search performance improves by splitting your index into smaller chunks. These smaller shards are then faster and more efficient to search and index. However, you never get anything for free, the performance increase comes at a cost of higher CPU utilization. By breaking the index into multiple smaller pieces it makes searching and indexing on that smaller subset of the index faster, but you’ll need to search each core individually for every query. Where as a single index runs one slightly slower query, a multi-core sharded query runs n queries in parallel and then combines the results.

There is one problem which still needs to be worked out with the multi-core sharded index. There is no distributed IDF (inverse document frequency). This is to say, if your documents are not spread evenly across all shards then you risk a result set that is improperly ordered based on your sorts, query boosts, etc. This happens with a distributed multi-core index because the scoring of the documents takes place within each individual core before the results are combined and the query returned.

Ideally, a multi-core index is great if you need to increase the performance of your queries and can afford to sacrifice some scalability and throughput to see it through.

Below are some charts of benchmarks that I have compiled on the CitySquares SOLR index. The specifications of the machine and indexes are as follows:

Testing machine – Dell r900:

4x Quad Core Intel(R) Xeon(R) CPU E7340 @ 2.40GHz (16 physical cores)
24GB RAM
3x 15k RPM drives in RAID 0
Gig-Ethernet on a local LAN

Index Stats:

14.5 Million Documents
13 GB total size
56 fields (indexed and/or stored w/ various amounts of processing)
Fully optimized index

Benchmarks:

Used Apache Bench for testing purposes from another machine on the same LAN over Gig-E.

#!/bin/bash
echo "" > solr_results.log
for C in 2 4 8 16 32 64 128 256 512
do
N=$(($C*1000))
echo "ab -n$N -c$C" >> solr_results.log
ab -n$N -c$C 'http://solr:8080/solr/select?q=&qf=&fq=:&start=0&rows=20' >> solr_results.log
done

For the trends in red the lower the number the better.
For the trends in blue the higher the number the better.

Single index with no caching enabled

Single index with filterCache enabled

We can see here in the above graph that there were no results from the 512 concurrency test. This is because there was a deadlock in the Apache Tomcat server. The max number of connections was set to 512 with an overflow of 100. This is the cause of all the cases where there are no results for the 512 test case. Ironically the Single core without the cache managed to finish but the test with fieldCache on failed.

Multicore Index (2 Cores) with no caching enabled

Multicore Index (2 Cores) with filterCaching enabled

The higher the better in the following chart.

Requests per second across all benchmarks

The lower the better in the following charts.

Time per request across all benchmarks

The above graph shows the only test to finish successfully with 512 concurrent connections was the single index with caching disabled.

Time per request across all benchmarks (truncated view)

This graph is the same as the one before without the last two concurrency levels so you can see whats going on at the beginning of the benchmark. Its still hard to see but the multi-core sharded indexes are a bit lower that the single indexes. Its clear however at the higher concurrencies that the single indexes beat the multi-core ones hands down.

Ive attached a spreadsheet with actual numbers from the benchmarks since some of the charts are hard to read.

So there it is, take it as you will. There are definitely benefits to moving from a single index to a distributed multi-core sharded index. However, whether it works for your dataset and application is up in the air. After these benchmarks we decided that the multi-core index that had served us well on Amazon’s EC2 no longer worked well for us on our new managed hosting. We are currently running a single index at CitySquares.

Posted by Justin Leider
Filed in solr, Web Technology
Tags: apache, Performance, scalability, Search, Shards, solr, Throughput

2 Comments »

2 Responses to “SOLR Performance Benchmarks – Single vs. Multi-core Index Shards”

Shalin Shekhar Mangar Says:

May 16, 2009 at 6:49 pm
Thanks for the nice writeup Justin. I would like to highlight a few points related to multicore and distributed search support.

Multicore support was originally contributed for a use-case where users wanted to have multiple indexes on the same Solr instance where each core was dedicated to a different language. One of the assumptions was that one will use multi-core if the data did not need to be joined.

Distributed Search was built for the use-case where the index is just too big for one box and therefore needs to be split among different boxes but the data needs to be queried in its entirety. As you noted, one of the key assumptions was that if documents are assigned randomly to shards, IDF can be ignored in a distributed search.

It is the HTTP based design which enables one to use both these features together but that might not be the optimal design. If one can use a single index, it is better to go with that and avoid the overhead of distributed search. Also, in some cases, it is better to have multiple solr instances on the same box rather than have multiple cores to avoid GC overhead on JVM heaps.

Reply
Steve Pettigrew Says:

June 4, 2009 at 2:52 pm
Well done. As you said there is not a whole lot of information out there and comparisons are difficult to be made. We now have some (lot) more with the above study.

Just one question about the Dell r900 specs. Is it really 24GB RAM???

Reply

Justin Leider’s Web Think Tank