Blacklight Plugin

'semantic' mappings for solr stored fields

Details

  • Type: New Feature
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 2.4
  • Fix Version/s: 2.6
  • Component/s: None
  • Description:
    A proposed patch to provide a way to configure 'semantic' mappings for solr stored fields, from a semi-controlled vocabulary. For instance, to configure that a :title value is provided from the stored field "title_display".

    To explain why this is useful, consider CODEBASE-77, about RSS feeds. As of this writing, our RSS-generating template assumes that title and author can be found in certain hard-coded solr stored field names. This is obviously inappropriate; it needs to be configurable.

    We _could_ provide configuration aimed at the "presentation" of RSS: rss_title_field = "title_display" or something. That would be in line with the existing presentation-aimed configuration we have for stored fields: we have configuration for the labels to use for certain fields (config[:show_fields][:labels]), and we also have config for which field to put in a certain spot on a certain screen (config[:show][:html_title], config[:show][:heading], config[:index][:show_link]).

    However, this threatens to become a voluminous list of presentation-level config parameters -- that in many cases are configured to the same thing -- for example, in the out-of-the-box defaults, config[:show][:html_title] == config[:show][:heading] == config[:index][:show_link] == "title_display".

    It occurs to me that semantic-level configuration could provide a more efficient, easier to configure, reusable and extensible framework for this. If there were a way to configure which stored field is :title, this could be used by the RSS generation, by hypothetical Atom generation, by hypothetical OAI-PMH generation, and possibly even as the default for things like config[:show][:html_title]. Presentation-level config could still be provided for certain views when necessary, but in many cases shared semantic-level config could suffice, keeping things simple.

    Proposal in code:

    First you would describe your semantic mappings, something like:

      SolrDocument.field_semantics = {
        :title => "title_display",
        :author => "author_display",
        :publisher => "publisher_display",
        :language => "language_facet"
      }

    Note that the 'semantics' are just arbitrary symbols, so this leaves flexibility for extensions to define their own that you could still house there.

    Now, SolrDocument has a method that will actually take that config and produce a hash where the values are actual values from the document:

    someDocument.to_semantic_values => {
       :title => "Title of the book",
       :author => "Author of the book"
    }

    Etc. Client code never actually looks at the config values; only SolrDocument itself looks there. Client code (like an RSS view) calls someDocument.to_semantic_values[:title], for instance.

    This means that Solr Document Extensions can override the to_semantic_values method to provide custom additions or modifications, if needed.
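As a rough illustration of the proposal, here is a minimal sketch of how the mapping, the to_semantic_values method, and an extension could fit together. It is not the code on the branch. The SolrDocument class below is a simplified stand-in that just wraps a hash of stored fields, and the YearSemantics module, the :year semantic, and the "pub_date" field are hypothetical names invented for the example. It also assumes an extension is a plain module mixed into a document instance.

```ruby
# Simplified stand-in for Blacklight's SolrDocument (an assumption for
# illustration; the real class wraps a Solr response and differs in detail).
class SolrDocument
  class << self
    # Semantic symbol => name of the Solr stored field that supplies it.
    attr_writer :field_semantics

    def field_semantics
      @field_semantics ||= {}
    end
  end

  def initialize(stored_fields)
    @stored_fields = stored_fields
  end

  # Resolve each configured semantic to this document's actual value,
  # skipping semantics whose stored field is absent from the document.
  def to_semantic_values
    self.class.field_semantics.each_with_object({}) do |(semantic, field), values|
      values[semantic] = @stored_fields[field] if @stored_fields.key?(field)
    end
  end
end

# Hypothetical extension: adds a :year semantic on top of the configured ones.
# "pub_date" is a made-up stored field name.
module YearSemantics
  def to_semantic_values
    values = super
    values[:year] = @stored_fields["pub_date"] if @stored_fields.key?("pub_date")
    values
  end
end

SolrDocument.field_semantics = {
  :title => "title_display",
  :author => "author_display",
  :publisher => "publisher_display"
}

some_document = SolrDocument.new(
  "title_display" => "Title of the book",
  "author_display" => "Author of the book",
  "pub_date" => "2010"
)

# Client code (such as an RSS view) only ever asks for semantic values.
plain_values = some_document.to_semantic_values

# Mixing in the extension changes what the same call returns.
some_document.extend(YearSemantics)
extended_values = some_document.to_semantic_values
```

In this sketch the document has no "publisher_display" field, so :publisher is simply left out of the result rather than mapped to nil. Whether the real method should omit or nil-fill missing values is a design choice the proposal leaves open.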

    Code in progress is/will be in: http://github.com/jrochkind/blacklight/tree/semantic_fields

Issue Links

Activity

Jessie Keck added a comment - 20/May/10 3:58 PM
I think that conceptually this is a really great idea. We have to have some sort of way to define these sorts of things for RSS/ATOM and naming convention alone won't work.
I look forward to seeing it as the code progresses.
Jonathan Rochkind added a comment - 20/May/10 4:25 PM
Thanks much for the (positive, but thanks either way) feedback Jessie.

The code is actually pretty much done, pretty simple, few lines of code really.

http://github.com/jrochkind/blacklight/tree/semantic_fields

If you 'diff' it against master, you'll see it's not very much code.

What has not been done yet is any change to the existing config that COULD be superseded by the new semantic_fields.

For instance, right now there is config:

  config[:show] = {
    :html_title => "title_display",
    :heading => "title_display",
    :display_type => "format"
  }

Should I change the code to not require config[:show][:html_title], but
just use semantic_field[:title] instead? Or perhaps do both: if you
don't WANT to set a special thing, it'll just use the semantic_field,
but if you actually do set config[:show][:html_title] it'll override that.

If we know we have consensus on the general semantic_fields idea, then
maybe we can see what people would prefer with that existing config. I
don't have strong feelings, but generally tend to be for simplifying the
minimal required config, which is part of the idea behind semantic_fields.
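
The "do both" option could look something like the sketch below. This is only an illustration of the fallback order, not code from the branch: field_for is a hypothetical helper, and "title_full_display" is a made-up field name used to show an override.

```ruby
# Hypothetical helper, not part of Blacklight: prefer an explicit
# presentation-level setting, otherwise fall back to the semantic mapping.
def field_for(presentation_config, field_semantics, view, slot, semantic)
  explicit = (presentation_config[view] || {})[slot]
  explicit || field_semantics[semantic]
end

field_semantics = { :title => "title_display" }

# No :html_title configured, so the semantic :title mapping is used.
default_field = field_for({ :show => {} }, field_semantics, :show, :html_title, :title)

# An explicit :html_title overrides the semantic mapping.
override_field = field_for(
  { :show => { :html_title => "title_full_display" } },
  field_semantics, :show, :html_title, :title
)
```

With this ordering, a site that sets nothing at the presentation level gets the semantic field everywhere, and existing config[:show] settings keep working unchanged.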
Jonathan Rochkind added a comment - 24/May/10 12:44 PM
Committed.

I have NOT changed any existing config that theoretically could be simplified using the semantic_fields stuff though. I'm happy to do the coding, if I get some feedback on what the 'right' thing to do is, otherwise I'm happy to leave it as it is (for now).

People

Dates

  • Created: 18/May/10 4:24 PM
  • Updated: 24/May/10 12:44 PM
  • Resolved: 24/May/10 12:44 PM