Johns Hopkins Blacklight Implementation

Sorting large result sets frequently times out

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: Milestone 3
  • Fix Version/s: Milestone 3
  • Component/s: None
  • Description:
    Sorting large result sets frequently times out the application, probably because Solr takes too long to respond. Solr needs to be investigated, and our index, our Solr settings, and possibly memory allocation need to be tuned so that sorting completes in a reasonable time.

Activity

Jonathan Rochkind added a comment - 26/Jul/10 4:08 PM
I suspect this was just because our "warming queries" did NOT include sorts. I will modify our warming queries to include all of our in-use fields and sorts, so the indexes will be properly warmed.
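One way to keep the warming queries in step with the sorts the application actually uses is to generate them from a single list. The following is only a sketch: the query string and sort field names are hypothetical (the ticket does not list the real ones), and it assumes the warming queries are declared as `<lst>` entries under a `QuerySenderListener` in solrconfig.xml.

```python
from itertools import product
from xml.sax.saxutils import escape

# Hypothetical values: the ticket does not list the real queries or sort fields.
BASE_QUERIES = ["*:*"]
SORTS = ["score desc", "title_sort asc", "pub_date_sort desc"]


def warming_params(queries, sorts):
    """Return one parameter dict per (query, sort) combination."""
    return [{"q": q, "sort": s, "rows": "10"} for q, s in product(queries, sorts)]


def to_solrconfig_lst(params):
    """Render one parameter dict as a <lst> entry for a warming-query listener."""
    items = "".join(
        '<str name="{}">{}</str>'.format(escape(name), escape(value))
        for name, value in params.items()
    )
    return "<lst>{}</lst>".format(items)


if __name__ == "__main__":
    # Print one entry per sort, ready to paste into the listener's query array.
    for params in warming_params(BASE_QUERIES, SORTS):
        print(to_solrconfig_lst(params))
```

Adding a sort to the application then means adding one line to `SORTS` and regenerating the entries, rather than editing the warming section by hand.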
Jonathan Rochkind added a comment - 26/Jul/10 4:59 PM
Okay, I fixed the 'warming' queries to match our actual queries, including alternates for each of our sorts.

This seems to result in fast re-sorts. My hypothesis is that only the FIRST sort was causing timeouts from the app, as caches had to be populated.

This definitely increases Solr startup time, as Solr isn't available until the warming queries have been run. Solr startup time on our production 3 million doc index is now 5-10 minutes. Our eventual master/slave replication plan for indexing should make this less of a problem, I think. At any rate, we may need to figure out how to tune caches and warming better in the future.

But for now this seems like the solution. Certainly further tuning is called for in the future.
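The "only the first sort is slow" hypothesis could be checked by timing the same sorted request several times against a freshly started Solr. This is a rough sketch, not what was actually run for this ticket: the base URL and the sort field are assumptions, and `fetch` is any callable that performs the HTTP request.

```python
import time
from urllib.parse import urlencode

# Assumed location of the Solr select handler; the ticket does not give the real one.
SOLR_SELECT = "http://localhost:8983/solr/select"


def sort_url(sort, q="*:*", rows=10):
    """Build a select URL that asks Solr for results in the given sort order."""
    return SOLR_SELECT + "?" + urlencode({"q": q, "sort": sort, "rows": rows})


def time_repeated(fetch, url, repeats=3):
    """Call fetch(url) `repeats` times and return the elapsed seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch(url)
        timings.append(time.perf_counter() - start)
    return timings


def first_is_slowest(timings):
    """True when the first call took at least as long as every later call."""
    return len(timings) > 1 and timings[0] == max(timings)
```

For a real run, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read()`. If the first timing is far larger than the rest when warming is disabled, and roughly equal to them when it is enabled, that would be consistent with the cache-population explanation.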
Jonathan Rochkind added a comment - 26/Jul/10 6:40 PM
Hmm, although the warmers that take a long time end up making commits take a long time and/or run out of memory.

I ended up having to delete the demo index and start over; everything I tried to do hit a memory problem.

Leaving the warming queries there for now, but re-opening this ticket; it needs more analysis. May need more RAM.
Jonathan Rochkind added a comment - 23/Sep/10 12:05 PM
The sorting issue seems to no longer be a problem. Indeed the warming queries seem to have solved it.

We do still have some issues with mass indexes, and things may need to be tweaked, but that's really another issue. The warming queries are the right way to go, I think, and appear to solve the sort-takes-forever issue.

Dates

  • Created: 01/Jul/10 12:21 PM
  • Updated: 23/Sep/10 12:05 PM
  • Resolved: 23/Sep/10 12:05 PM