Sunday, December 7, 2014

Four possible Web Scale Discovery Future Scenarios

I was recently invited to give a talk at the Swedish Summon User Group meeting, where I presented on possible scenarios for the future of web scale discovery.

Web scale discovery in libraries goes back to roughly 2009, so most academic libraries have by now had 2-3 years of grappling with the concept. In hype cycle terms, many if not most have moved past the pains of the trough of disillusionment and are moving up the slope of enlightenment, or are even at the plateau of productivity, as pointed out by Dave Pattern.

For example, nobody today believes that web scale discovery in its current form can replace subject databases totally, an idea that was mooted during the peak of inflated expectations.


http://en.wikipedia.org/wiki/Hype_cycle#mediaviewer/File:Gartner_Hype_Cycle.svg


Still, one wonders what lies in store for web scale discovery in academic libraries. What will things look like in 5, 10 or even 15 years' time?


How well is web scale discovery doing?

To forecast possible futures, we need to know where we are now. Generally, I think many of us (where "us" = the odd mix of technologists, including ILS librarians, who have worked on implementing discovery services, though we are of course biased) think it is doing decently well based on any one of the following criteria:

a) Usage of web scale discovery typically increases year on year for most institutions (e.g. see the University of Michigan's experience)

b) Statistics on OpenURL origins show a substantial share coming from discovery services (this varies by institution, but a typical share would be at least 40% for most destinations)

c) Studies are appearing that show, when controlling for various factors, that institutions with web scale discovery systems such as Summon or Primo do seem to see increased usage per FTE versus institutions without any such system.





This bright picture seems to contradict the sense we get (via surveys etc) that more and more users are favoring Google, Google Scholar, Mendeley etc, and that the university library has a slowly declining role as a gateway for discovery.


The Ithaka S+R US Faculty Survey 2012 above shows a slight improvement in 2012, though the trend is generally downward.

Lorcan Dempsey notes, “They have observed a declining role for the "gateway" function of the library, partly because it is at the wrong level or scale. Google and other network level services have become important gateways.”

So what is happening here? Is there a shift to other non-library sources of discovery? Or isn't there?


Lots of unknowns

One way to reconcile this contradiction is to note that undergraduates are typically by far the largest group, and we know undergraduates love web scale discovery while other groups, particularly faculty, do not. So it is no surprise that usage goes up when web scale discovery is first implemented, even while the most sophisticated users may be favoring something else.

A typical year-on-year increase in web scale discovery use is good news. Still, I suspect part of it is inertia; my experience is that it takes a while for any major change to filter down to users. So if you implemented web scale discovery in, say, 2012, you will likely continue to see gains until 2015 (assuming a 4-year course), when the freshmen who were the first batch to start using web scale discovery become final-year students. Prior to that, the more senior students will likely ignore it.

It still surprises me that even today many students do not realise our default search tab handles articles as well as books, though part of that could be a UX issue.

It's also unclear to me what a, say, 10% year-on-year increase in usage of your discovery system really means, since we don't know how much the "demand for discovery" went up. Say that between 2014 and 2015 the number of "discovery incidents" where one could conceivably have used your discovery service doubled, but usage of web scale discovery went up only 10%: your share of discovery would have nearly halved. Doesn't sound so good, right?

Lastly, web scale discovery systems also work beautifully as known item article search tools (simply cut and paste the article title), so the increased usage seen in (a), the frequent use as OpenURL origins in (b), and even the increased usage per FTE compared to peer control institutions in (c) could simply be due to increased ease in finding known articles, as opposed to real help with discovery.

In short, there is a lot of uncertainty here. It's not easy for libraries to know if Web Scale discovery has helped shift the balance of discovery from outside the library back to the library in both the short and long term.

I suspect the answer is probably not. Indirect evidence from my institution on use of the proxy bookmarklet (which represents usage primarily when the user does not start from the library homepage) and on link resolver usage from Google Scholar suggests that even though the implementation of Summon was successful, usage of both tools continued to rise. In fact, in the year following the implementation of Summon, usage rose even higher than before. If this trend continues.....



Four Discovery Scenarios

In the long run, I suspect the ultimate fate of web scale discovery will fall into one of these four broad categories:


I Discovery Dominant - Web Scale Discovery continues to grow and becomes the prime source of discovery, displacing Google, Google Scholar and other external sources (Unlikely)

II Discovery Deferred - Web Scale Discovery continues to be used alongside other non-library tools. Most often it will be used as a secondary source, after looking at other places first (Possible)

III Discovery Diminished - In this scenario, web scale discovery services have been displaced in their discovery role and are used for known item search only. Kind of like a glorified catalogue, except it includes article and conference titles etc (Perhaps)

IV Discovery Decommissioned - This is the most extreme scenario, where the whole system is removed and doesn't even play a role in known item searching. (Unlikely)



Discovery use matrix

After reading all the various arguments about the position of search in libraries, I was initially confused. But let's try to create a framework here.

Let's consider 2 dimensions of use here.

Firstly, are users using the web scale discovery for known item search or for discovery?

Secondly, is it their primary go-to tool, or is it secondary?

Below is a hypothetical usage breakdown for one library's discovery search.


                   Known item search    Subject search
First stop         50%                  10%
Secondary stop     30%                  40%
Total              80%                  50%

We could split this up further into content types, say search of books vs articles but let's keep it simple.

In this hypothetical example, say users have a "discovery need" 100,000 times a year (look at the 2nd column).

10% of the time, they go straight to your discovery service and start typing keywords.

40% of the time, they do some preliminary searching elsewhere, e.g. Wikipedia, Google, Google Scholar, but eventually end up doing subject searches in the library discovery service for whatever reason.

50% of the time, they totally ignore our tools and use something else, such as Google Scholar or Mendeley, to search.

Note: if one cares about library-supplied tools, one would also need to take into account subject abstracting and indexing databases provided by the library, such as Scopus, but I will ignore this for now.

Similarly, say users have a need to find a known item 100,000 times a year (look at the 1st column).

50% of the time, they go straight to your discovery service and start searching for the known item by entering article or book titles.

30% of the time, they do some preliminary searching elsewhere, e.g. Wikipedia, Google, Google Scholar, but eventually end up doing a known item search in the library discovery service.

20% of the time, they totally ignore your discovery service for known item search. It could be that they found the item using another tool, either library-related (the traditional catalogue, the browse A-Z list) or non-library-related (Google, Google Scholar), or that they used link resolvers from Google Scholar or LibX, or that they simply gave up. (A small sketch of the arithmetic follows.)
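To make the arithmetic concrete, here is a minimal sketch in Python using only the hypothetical percentages from the table above (none of these figures are real data):

```python
# Hypothetical discovery-use matrix from the table above.
# Rows: where the discovery service falls in the user's workflow.
# Columns: type of need. All figures are illustrative percentages.
matrix = {
    "known item": {"first stop": 50, "secondary stop": 30},
    "subject":    {"first stop": 10, "secondary stop": 40},
}

needs_per_year = 100_000  # hypothetical number of needs of each type

for need_type, row in matrix.items():
    used_pct = sum(row.values())      # e.g. 50 + 30 = 80 for known item
    missed_pct = 100 - used_pct       # needs served entirely elsewhere
    uses = needs_per_year * used_pct // 100
    print(f"{need_type}: used {used_pct}% of the time ({uses:,} uses), "
          f"missed {missed_pct}% of the time")
```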


Discovery is not just keyword based search

One thing to note is that when there is a desire for discovery, search based tools like web scale discovery or Google Scholar are not the only options.

There are recommender sources and systems (both human and machine) as well as discovery based on citation methods. Here, I am assuming discovery means keyword-based discovery. In other words, when there is a need to enter keywords, how many times do people use the library discovery service vs other systems?

It is possible to envision a future where a powerful Google Now-type recommender system becomes so dominant that keyword-based discovery becomes obsolete, but let's put aside this possibility.



Scenario I : Discovery Dominant 


                   Known item search    Subject search
First stop         Variable             High
Secondary stop     Variable             Moderate
Total              Variable             Very high

This would be the ideal scenario. Our discovery tools become the dominant tool for discovery, as the first stop.

How popular such tools are for known item search would be less critical, though I suspect it's easier to optimize for one than for both.


Unfortunately, I think this scenario is unlikely. It implies at least one of the following:

a) All other non-library discovery sources dry up.

Google giving up on Google Scholar would be one example.

While Google Scholar is doing well now after 10 years, its closure has always been a possibility. Still, it is hard to believe this will happen to all the competitors.

b) We create tools so compelling that they make all other sources pale in comparison.

What would such compelling differences be?

I suspect the following would NOT be them:

a) Gesture/motion-based inputs - e.g. Kinect / Leap Motion

b) Virtual or augmented reality outputs - e.g. Oculus

While such features might eventually be included in future discovery tools, they don't give such tools any competitive advantage. They would be the equivalent of, say, mobile responsive design features.

The following might give us a fighting chance:

Personally tuned discovery layers - Championed by Ken Varnum; see his two recent presentations Personally-Tuned Discovery and Library Discovery: From Ponds to Oceans to Streams. The idea, as I understand it, is that libraries can create specially tuned features, particularly scopes, that appeal specifically to their communities. So in the context of my institution, we could create filters and features specially designed to work well with research on Southeast Asia (a toy sketch below). A more generic global system is unlikely to be able to tune itself to such an extent.
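As a toy illustration of what such tuning might look like (everything here is hypothetical: the field names, the boost factor and the term list are invented for the example, not taken from any real discovery API):

```python
# Hypothetical sketch of a locally tuned scope: boost records whose subject
# headings match terms important to the local community (here, Southeast
# Asia research) when re-ranking results from a generic discovery engine.
LOCAL_BOOST_TERMS = {"southeast asia", "asean", "singapore", "mekong delta"}

def tuned_score(record: dict) -> float:
    base = record.get("relevance", 0.0)   # score from the generic engine
    subjects = {s.lower() for s in record.get("subjects", [])}
    return base * (1.5 if subjects & LOCAL_BOOST_TERMS else 1.0)

def rerank(records: list[dict]) -> list[dict]:
    # sorted() is stable, so locally relevant records float upward while
    # everything else keeps its original relative order.
    return sorted(records, key=tuned_score, reverse=True)
```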

Improved semantic search - No doubt Google etc are working on this. However, libraries and particularly publishers like EBSCO have tons of expert-assigned subject headings in specialised thesauri. Could a cross-walked "mega thesaurus" be leveraged to improve relevancy? Do note, I have practically no idea how linked data will come into this.


Scenario II : Discovery Deferred


                   Known item search    Subject search
First stop         Low to moderate      Low
Secondary stop     Moderate to high     Moderate
Total              Moderate             Moderate

This is the scenario I think is closest to the current situation. Our discovery tools are seldom the first stop in the discovery process, but many users do use them in combination with Google, Google Scholar etc. So overall use is moderate.

Use for known item searching is generally low to moderate. While library links in Google Scholar, link resolvers in Mendeley and similar systems mean users can check for availability directly after discovery, there are still sufficient numbers of users who do enter article or book titles into discovery services.

One can encourage use of the discovery service as a secondary discovery source by various means. See for example 6 ways to use Web Scale Discovery tools without visiting library sites, which provides some ideas, chief among them the use of LibX. If you have LibX with Summon you can even pull off something interesting with the API (a sketch of the idea below).
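For instance, a LibX-style context-menu action could send whatever text the user has highlighted straight into the discovery search. A minimal sketch of the idea follows; the URL and parameter below are placeholders, not the actual Summon API, which requires authenticated, signed requests (see the vendor documentation):

```python
# Sketch only: hand highlighted text to the discovery layer as a query.
# The base URL and "q" parameter are PLACEHOLDERS for illustration.
import urllib.parse

def discovery_search_url(selected_text: str,
                         base="https://yourlibrary.summon.serialssolutions.com/search"):
    return base + "?" + urllib.parse.urlencode({"q": selected_text})

print(discovery_search_url("economic history of Southeast Asia"))
```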




For driving subject searches/discovery from Wikipedia, I've blogged about John Mark Ockerbloom's Forward to Libraries service, which takes the title of the Wikipedia article you are on (among other things) and does the same search in your discovery service (sketched below).
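Conceptually the service does something like the following; this sketch is my own illustration, with a made-up discovery URL pattern rather than Forward to Libraries' real routing:

```python
# Sketch: map the Wikipedia article a user is reading to the equivalent
# search in a library discovery service. URL pattern is a placeholder.
import urllib.parse

def wikipedia_to_discovery(wiki_url: str,
                           discovery_base="https://yourlibrary.example.edu/discovery/search"):
    # e.g. https://en.wikipedia.org/wiki/Open_access -> "Open access"
    title = urllib.parse.unquote(wiki_url.rsplit("/", 1)[-1]).replace("_", " ")
    return discovery_base + "?" + urllib.parse.urlencode({"q": title})

print(wikipedia_to_discovery("https://en.wikipedia.org/wiki/Open_access"))
```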









Scenario III : Discovery Diminished


                   Known item search    Subject search
First stop         Moderate for books   Extremely low
Secondary stop     Moderate for books   Extremely low
Total              Moderate for books   Extremely low


This is basically the situation prior to web scale discovery in 2009. Federated search wasn't successful; most users went to the library catalogue to do known item searches for books and, to a lesser extent, subject searches for books, but not articles.

So with web scale discovery we are happily past this scenario and in at least Scenario II, where at least some discovery happens in our tools, right?

Is there any chance we could fall back to Scenario III?

I think Utrecht University believes so; it was one of the first, if not the first, to talk about giving up discovery to focus on delivery or fulfillment.






I've blogged about this in the past. Essentially the idea here is that Google etc have won the discovery battle already and there is no point trying to compete with them.

Libraries should focus on supporting fulfillment. In other words, discovery occurs elsewhere, say on Google or Google Scholar, and we provide the way to check for and obtain the full text.

Google Scholar, of course, has the well-known library links program.



A newer idea is the collaboration WorldCat is working on with Wikipedia that allows linking from references in Wikipedia to full text or library catalogues.

 http://en.wikipedia.org/wiki/Wikipedia:OCLC/Search


The title of the talk is A library without a catalogue. Interestingly, while the speaker has decided they won't bother to implement a web scale discovery system and will give up their federated search, they are not sure they can give up the normal traditional catalogue (stated at the end).

I suspect that while one can give up web scale discovery, giving up a catalogue for known item search is harder. Sure, one can contribute holdings to WorldCat, which will allow the holdings to be shown in various places including Google Books, but can this truly work for everything?



I somehow doubt any library would go without a search facility for known item search, or a comprehensive search of what is available to the institution, since it wouldn't be a lot more work to provide one, especially after the effort spent on populating the ILS or knowledge base.

Under this scenario, libraries would maintain something similar to classic catalogues, optimised for known item searching of books, DVDs etc. The difference is that they may also include an article index, but unlike the discovery services available between 2007 and 2014, they give up the pretense of serving discovery at all.

Some libraries may dispense with this totally if most major outside discovery tools develop good linkages with link resolvers, catalogue APIs etc. But as I said, this is unlikely.

By focusing on known item search of books and articles, the relevancy problem becomes much easier to solve than when trying to balance discovery and known item search needs. A bento-type box search might even make more sense.



Scenario IV : Discovery Decommissioned

                   Known item search    Subject search
First stop         Low to zero          Zero
Secondary stop     Low to zero          Zero
Total              Low to zero          Zero

This is the most unlikely scenario. In this scenario, use of library discovery tools, for both known item and subject search is utterly destroyed!

This most unlikely scenario was mentioned in The day library discovery died - 2035. In this scenario, open access has won out completely, with open access being the norm for both books and articles.

How academic libraries may change when Open Access becomes the norm details the implications.

After summarizing the argument in the above scenario about losing the war in discovery and focusing on delivery, I proposed that the rise of open access

"has the potential to disrupt even the delivery or fulfilment role. In an open access world, when most articles or perhaps even books (open access models for books exist, as well as 'all you can eat' subscription services like Scribd, Oyster, Amazon Prime) can be gotten for free, academic libraries' role in both discovery and fulfillment will be greatly diminished."

In such a world, libraries would no longer need to maintain long lists of holdings covering both books and traditional catalogue items as well as an article index.

As everything, or nearly everything, is open access, discovery and delivery would be coupled together: where you do your discovery is where the articles or books are delivered. There would still be some portion unavailable immediately (special collections, older books, articles not yet digitized), but in time this would shrink.

Even in such a world, some argue there may be a role for libraries in aiding discovery by providing better curated collections, metadata etc, based on the personally tuned discovery argument above. It is of course unclear whether that will be enough.


Conclusion

This is an extremely speculative piece of course, but I had to get it off my chest.

If you found it thought-provoking, or at least entertaining, do comment or share.





Saturday, October 25, 2014

How should academic library websites change in an open access world?

In How academic libraries may change when Open Access becomes the norm, I argued that as open access takes hold, academic libraries will increasingly focus on expertise-based services like bibliometrics, open access publishing, GIS services, research data management and more.

The question is this: with this change in focus, how should academic libraries reflect these new priorities on their library websites? The website is, after all, the digital front door of the library, and it sends a message about what the library is about.

A typical academic library website in the 2010s

Academic library websites of the 2000s-2010s are all focused on search, typically with a single search box or a tabbed search box (whether horizontal or vertical). Most sites also have links to databases (the number varies), plus links to common and popular (read: undergraduate) services like printing, booking of study rooms, opening hours etc. Lastly, some devote space to news and events the library wants to promote, typically in the form of a rotating banner.

I think it is fair to say this describes 90% of academic library websites, and most, despite their efforts, are bursting at the seams with content.

What experts think are academic library best practices in 2013/2014


Emily Singley, currently with Harvard Library, reviews academic library sites she likes, and the #1 site in her 2013 reviews is Stanford University's, which looks like this:



Noted UX expert Aaron Schmidt was recently asked about library websites worth looking at.

One academic library he went for was MIT Libraries.



Both sites had fairly major redesigns within the last year or two and sport what can be considered the current state of the art in academic library design. A partial hint comes from the fact that they sport dynamic opening hours widgets that state which facilities are open today, a relatively new idea compared to the static lists of opening hours on most sites now.

They devote roughly the same amount of space to the search box (though one uses a simple one-box approach vs a tabbed approach), opening hours and a news box, with the major difference being that MIT devotes a lot of screen space to displaying their research guides and experts.

This difference hints at a possible challenge.

As academic libraries in the 2010s add new expertise-based services, the question we face is how to incorporate them into the library website. How do you incorporate your shiny new scholarly communication team, or GIS service, on your library front page? Or should you even try?

Given how political changing websites can be, the easiest option is to just give up, tuck the service under some hidden drop-down menu or secondary-level link, and hope your liaisons can make such services known.

Stanford, for example, tucks GIS under a drop-down in "Computing - equipment & services".



Besides, an argument can be made that such services are very niche (even in fairly research-intensive universities like my own, undergraduates and graduate students far outnumber research and faculty staff), so one shouldn't spend much screen space on them.

Then again, even the best liaisons can't reach everyone. And increasingly, as more library effort is spent on such services, it seems perverse that they have no, or only minimal, presence on library websites. This slightly parallels the older development where budgets for e-resources increased while the manpower devoted to managing them was restricted to just one e-resources librarian.

To anticipate the whole argument: I believe that while a complete redesign will eventually be necessary to showcase the new focus and provide browsing affordances, a short-term solution increasingly adopted by leading academic libraries is the bento-style search box, or what Lorcan Dempsey calls full library discovery.

This has several advantages, one of which is that it avoids a sudden shift and hence the need for political battles. Users are also conditioned to search, and a search box that lets you find articles, books, AND expertise/services would scale and transition smoothly as libraries slowly (or quickly, as the case may be) move away from the discovery/search/purchaser business towards the expertise business.


The age old debate on library websites

It's perhaps only a slight exaggeration to say that designing the library front page (henceforth, the library homepage) is a highly political affair, but leaving that aside, is there an "objective" way to do it?

The difficulty lies in the fact that library websites mix totally different classes of content. Traditionally the debate has been between adding features to find "stuff" users need for their assignments or research (e.g. library databases, ebooks etc) and using the screen space to market services/events/expertise.

The difficulty is so great that a recent 2014 article asked: Do we need two library landing pages?

The argument against Discovery on websites in 2009

But let's go back to 2009 first, when Steven Bell in The Library Web Site of the Future argued that library websites should shift away from what we these days call "discovery".

He argues that academic library websites of the time pretty much failed to connect users to the full range of content libraries offer, as most users just used them to link to their favourite databases anyway. So libraries had already failed at that task despite a screen full of links, and should give up trying to play that role.

He writes

"Rather than attempting to mimic search engines academic librarians should aim to differentiate their Web sites. They should devote the most eye-catching space to information that promotes the people who work at the library, the services they provide and the community activities that anchor the library’s place as the social, cultural and intellectual center of campus. That shifts the focus from content to service and from information to people."

He then asks: what do we do with all the expensive databases libraries buy, if we don't put links to them on the library website?

Bell clearly anticipates some of the ideas now in common play in 2014. He talks about how users "invent their own backdoor routes to the content" and how we should be embedding library services so "we'll be where you are" (e.g. Library Links in Google Scholar), or as another meme goes, "discovery happens elsewhere".

He does, however, propose a rather unusual idea: that the LibGuides librarians create for specialised courses could serve as a substitute for listing databases on library websites. This drew a response from Catherine Pellegrino - The Library Web Site of the Future: thanks but no thanks.

The main arguments seem to be 1) the idea of providing LibGuides for every need can't scale, and 2) following Steven's idea would lead to the now famous xkcd comic about the difference between what users want to find on a university homepage and what the marketing people want to sell.


Has the argument changed in 2014?

I would argue that Steven Bell's case for devoting less space to content for "finding stuff" is even stronger in 2014 than in 2009, for the following reasons.

Firstly, back in 2009, web scale discovery was just being launched at the first pioneer libraries. Whatever the shortcomings of modern web scale discovery services, they definitely do a better job of exposing the full range of content a library has than 50 links to databases.

While we know that users no longer start their research from the library homepage, they do eventually return to it. Part of the reason is to check for known content, and here our web scale discovery tools are light years ahead of the federated searches of 2009.

Because of that, academic libraries today can get away with devoting less space to search features and yet be more effective.

Secondly, I have controversially speculated that as open access takes hold, libraries' roles in traditional discovery and even fulfilment/delivery for users will diminish.

As of 2014, whether libraries should bow out of the discovery role is being heavily discussed; for example, see Does Discovery Still Happen in the Library? Roles and Strategies for a Shifting Reality. One of the interesting points made there, I think, is that current web scale discovery services are failing to match the Googles of the world not because of the size of their indexes but because they lack "deep personalization".

Jill O'Neill asks, "Should libraries abandon investment in formal discovery services (of whatever ilk) and leave the job to the somewhat mysterious algorithmic mercies of Google, Amazon...." Like Roger Schonfeld, she asks, “Can libraries step forward to play a greater role in current awareness? Should they do so?”

For other responses, see the follow-up post by Roger Schonfeld.

This debate is up in the air, though Barbara Fister points out that even in the pre-web days, when it came to discovery, academic libraries were "supplementary, not the only way discovery happened".

For now, I am going to be pessimistic and say no: libraries should eventually (when, I do not know) prepare to walk away gracefully.

This is not an argument I make happily. However, it does make the tussle over the academic library website much easier, because it takes one more thing off the agenda.

That said, personalised recommenders strike me as a totally different class of service from simple search boxes or links to databases. For example, a JournalTOCs-type system enhanced with Google Now-level smarts, and perhaps mediated by knowledgeable librarians, strikes me as fundamentally different and something users would want, assuming we can do it better than Google etc. But those don't require a search box....


The library website of the future will focus more on services and showcasing expertise

The typical academic library of 2014 offers a lot more services, or puts a lot more headcount towards them, than 10 years ago, and this will accelerate as the library loses its role in discovery and fulfillment.

Most typical academic libraries (granted, of at least a certain research intensity) will at the very least be working on promoting open access and populating institutional repositories, perhaps also managing APC funds. Most will also be assisting with bibliometrics, benchmarking etc.

Depending on how progressive the academic library is, it might also be supporting other parts of the research life cycle:
  • GIS services
  • Research data management services
  • Grant management - playing a role in CRIS/IR integration 
  • Supporting innovative teaching environments like MOOCs

As the library shifts effort to such initiatives, the academic library website of the future will have to devote correspondingly more screen space to these services. Slowly, there will be a shift towards using more and more space for such services and less for search features.

Could we see a day when search boxes become de-emphasized, or even disappear totally and exist merely as links, as they do at a rare few libraries?



How much space are academic libraries devoting to search vs marketing now?

A Use of Space: The Unintended Messages of Academic Library Web Sites, a study of 50 academic libraries and how they devote screen space to various categories of resources and services, is probably worth looking at if you want to know what academic libraries are doing now.

Somewhat to my surprise, the average space devoted to marketing/PR is pretty high at 28.2% (excluding white space etc), ranking first.

The multisearch box comes in 2nd, using on average 10.2% of used space.

The paper points out that variance is high; the share of space used for marketing/PR ranges from 4.6% to 67.9%, with the paper heavily criticizing the 67.9% instance.

According to the paper, one of the most important findings is:

"Occupying an average of 28.2 percent of the used space, promotion/PR is clearly important for all sites. This indicates libraries are trying to engage their users with more than just text-based communications. The message seen in this content is libraries know they have value to provide and are working proactively to connect users to these essential resources and services rather than waiting for users to locate them on their own."

Not bad at all, isn't it? Academic libraries are trying to market... That said, Steven Bell in The Library Web Site of the Future would argue:

"It’s not that academic library Web sites completely ignore marketing. It’s just done badly. News about the library’s programs, events or new resources are often crammed into a corner of the page, are limited to small bits of text or are relegated somewhere out of the F-zone, the area, according to usability experts, to which most web users’ eyes naturally gravitate. Those prime real estate areas are instead dedicated to lists of links to catalogs, database lists and things with names that mean little to anyone other than a librarian."

So his problem is that such marketing is not prominent enough.


Marketing - a dynamic short term effect?

I think most marketing/PR-type content consists of rotating banner ads, marketing whatever is the "flavour of the month". Are they effective in pulling in eyeballs and clicks? My own experience at my institution is that while they don't beat mass email marketing, yes, they are effective (contingent on positioning on your website).

That said, I am not a big fan of such dynamic content, at least for long-term efforts. Say your library just did a big push for the "shiny project or service of the year" and you splash it all over the big rotating banner. All well and good for a quick boost in publicity, but not something to rely on in the long term for a core, essential part of your library offerings.

If academic libraries are going to move toward expertise-based services where librarians are your greatest resource, I would advocate a permanent, prominent spot on the library homepage to advertise your library's expertise and services; what MIT Libraries does for their libguides and experts is a start.


Are bento style boxes or full library discovery the (short term) answer?

One must be careful, though, to avoid the problem pointed out by the famous xkcd webcomic.





                                                                http://xkcd.com/773/


A redesign that goes too far, too fast, before users are ready could backfire. The open access tipping point isn't easy to predict, so a redesign that moves away from discovery is difficult to time.

Tons of user testing needed here.....

A possible solution is the bento-style idea, which I've written about at least half a dozen times.

Here again is the famous NCSU example that started it all.




Notice that this form of bento style is distinct from the following type of search results display.




Lorcan Dempsey called it the difference between full library discovery and full collection discovery.

The Villanova University Library example, which is typical of many universities using VuFind etc, is still focused on showing only standard results from discovery systems, i.e. books and articles.

I have argued that this form of presentation is designed to help with relevancy issues. It is particularly useful for users who are only interested in typical catalogue results such as books or multimedia material, which may be buried in a more conventional "blended" single-list result.

It also helps cater to more sophisticated users who dislike the all-in-one blended approach and prefer results segregated by format.

However, this still doesn't go far enough. The way ahead, I agree, is "full library discovery", not just "full collection discovery".

The next-generation website, and search, will have a bento search that includes various sources: not just collection results, but also services, people and expertise.

This can be accomplished most simply by pulling in results from a general website search. More refined approaches would pull in anything from the following (a minimal aggregation sketch follows the list):


  • Libguides
  • FAQs
  • Some smart matching of Librarian profiles (See Mlibrary)
  • Best bets - whatever is popular or uniquely searched by your users
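
As a minimal sketch of how such an aggregation could work (every endpoint below is a hypothetical placeholder; a real implementation would call the discovery API, the LibGuides API etc with their own authentication and response formats):

```python
# Sketch of a "full library discovery" bento aggregator. All URLs are
# placeholders; each source is assumed to return JSON for a "q" query.
import concurrent.futures
import json
import urllib.parse
import urllib.request

SOURCES = {
    "articles_books": "https://yourlibrary.example.edu/api/discovery",
    "libguides":      "https://yourlibrary.example.edu/api/guides",
    "faqs":           "https://yourlibrary.example.edu/api/faqs",
    "librarians":     "https://yourlibrary.example.edu/api/experts",
}

def fetch(name, base, query):
    url = base + "?" + urllib.parse.urlencode({"q": query, "limit": 3})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return name, json.load(resp)  # one bento box per source

def bento_search(query):
    # Query all sources in parallel so the slowest box doesn't block the rest.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(fetch, n, b, query) for n, b in SOURCES.items()]
        return dict(f.result() for f in concurrent.futures.as_completed(futures))
```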


Why is this important? Simply put, this allows the academic library to showcase not just the collection but also expertise and services.

So while the Stanford library does not make it easy to find GIS services and expertise directly by browsing, a search will surface them easily.




It is also a low-risk strategy compared to revamping the whole library website.

I have not done a recent survey of academic libraries that use bento-style results in their search, but it appears most popular with the biggest and most research-intensive libraries.

MIT Libraries does not have it yet, but their environment scan report prior to implementing EBSCO Discovery notes

"With the exception of BU, the sites that did offer a “Search All” option on their homepage, typically pointed users to a homegrown “composite library search” application that unified search results from multiple sources in a tabular user interface (see Stanford, NCSU, Dartmouth, Columbia, Virginia); with the exception of Stanford, these sites utilized the discovery service as a data source (via API) to populate an articles/e-books section of the composite library search results" and plans for one in the future.

Yale University is also implementing a quicksearch supposedly inspired by the Columbia example, though the current version includes only collection-based boxes.

What are the drawbacks of a bento style box?

The most obvious drawback is that users are used to Google and Google Scholar-style results.

Here's one way to put it.

"At present, we are not using bento. This is because our community is familiar with Ebsco's blended interface, in which the results are presented as a list, ranked by relevancy based on the available metadata."

From here to discovery

Some argue Google does some degree of segregation, though I would argue not to the level that MLibrary or NCSU (another interesting example) take it.

Recent work by Karen Mills of the Open University, What do students want from library discovery tools?, suggests that students are confused by bento-style results. This is probably due to confusion over labels and terminology.

I suspect the effectiveness of bento-style boxes is affected by the number of boxes and the type of user. Freshmen with almost no academic experience will probably dislike bento style. At the other extreme, an experienced graduate student or faculty member with a good mental model of research would appreciate a bento-style display, especially if they are predisposed to use the non-collection-based services (e.g. consulting librarians) that such a search would surface.

As such, I would expect the most research-intensive university libraries, with their higher share of such advanced users, to push for bento-style searches, while community colleges might go for a blended Google-style results display. This is what we see, of course, though it could also be driven by the fact that bento-style displays require more resources to implement, since the major web scale discovery services currently do not provide bento-style results out of the box.

A personalised approach that lets users pick either display is of course possible.


Conclusion

Let me end by quoting Bell again.
"Academic libraries must promote their human side. The library portal experience should emphasize the value of and invite stronger relationships with faculty and students. That means going beyond offering a commodity that, by and large, the user community can well access without the Web site. The next generation academic library Web site must leverage what academic librarians can do to help faculty and students improve their productivity and achieve success."

With the rise of open access, the above has never been more true.

What would such a website look like, when the academic library is freed from the responsibility or duty of discovery? In the long term, I believe it should be radically different, but that is for the future, say 2020-ish. For instance, how would a "people/expert first" search look? Would it simply be a CRIS system like VIVO?

Thursday, October 2, 2014

From Confusion to Expertise - an experimental post on Medium


Trying something new this time. I have started to post on Medium. Read the post "From Confusion to Expertise" there.

Some brief impressions

  • The interface is indeed as clean and well designed as I have heard, allowing writers to knock out simple yet professional looking posts.

  • One of the selling points of Medium, the ability to easily submit to "collections", is gone. In the past you could submit your stories to any collection without a prior invite, and you could submit one story to more than one collection. This meant you could follow collections instead of individual posters.


Some collections on libraries on Medium 





I have 262 followers on Medium. I think they found me mostly via Twitter.


  • An interesting feature is the ability to add comments at the paragraph level; I wonder if it will encourage more comments.

  • I've always wondered how much of my long rambling posts gets read. Medium has stats on this.



As you can see, without any extra marketing via my own Twitter or Facebook networks, Medium posts don't seem to draw many views after the first day.


Conclusion

All in all, while Medium is interesting, I wonder if it does anything unique enough to be worth abandoning existing platforms like Blogger or even Tumblr.

This holds whether one is thinking of it for institutional libraries or for academics using blogging to spread their work. I haven't gone out of my way to look for academics doing so, but I noticed Adeline Koh, Associate Professor of Postcolonial Literature and Director of DH@Stockton at Richard Stockton College, doing so and creating an interesting collection on "Chinese Privilege in Singapore" (Singapore is 70% ethnic Chinese).


A small plug for Internet Librarian International 14



I don't do this often (or at all!) but would like to mention that the Internet Librarian International 2014 will be on at the end of October.

I was at the 2012 event (read about my experiences here) and I can sincerely say it remains my favourite conference so far. While I have been to bigger conferences, e.g. ALA Annual, ILI seems to be my type of conference.

As I blogged back then "I suspect if you like most of the things I blog about here, the conference would be a natural fit for you." I have gone to a few more conferences since then, and my thoughts on this haven't changed.

Every year since then, I have received brochures for ILI, and the talks and speakers on display are always excellent, with a good blend of speakers from England, Australia, the US, the Scandinavian countries etc. This year is no different, with star speakers such as Jan Holmquist (on gamification) attending.

It's a pity I can't go to ILI more often, but if you do have the funds, do consider ILI 2014.

Disclosure: I am listed on the advisory board of ILI 2014, though I am embarrassed to admit I haven't done much advising.


Wednesday, August 20, 2014

How academic libraries may change when Open Access becomes the norm

Like many academic library bloggers, I occasionally fancy myself as a "trend spotter" and am prone to attempts at predicting the future.

The trend I am increasingly convinced is going to have a great impact on how academic libraries function is the rise of open access. As open access takes hold and eventually becomes the norm in the next 10-15 years, it will disrupt many aspects of academic library operations, and libraries will need to rethink the value they add to their universities.

The events of the past year have convinced me that the momentum for open access is nearly unstoppable, and the tipping point for open access has occurred or will occur soon.

To be fair, this is a pretty easy call to make. Richard Poynder, an independent journalist who has covered open access for over a decade and is as close to an independent observer on such matters as you can get (he claims not to be an open access advocate, though I find his views quite librarian-friendly), says that open access is inevitable; the only question is how it will occur.

I find myself identifying with him, as unlike some librarians, I don't consider myself a really big open access advocate. The belief that open access will take hold fills me with neither sheer joy nor unhappiness.

That said, I know enough to talk about it to most ordinary researchers in a general way, after reading blog posts, articles and books on the topic. I freely admit the squabbles between open access advocates over the exact definition of open access, the best way to provide/reach it etc often threaten to confuse me.

What I think I do have is some knowledge of some aspects of academic libraries. Some (but not all) open access advocates claim that the goals of the open access movement are more about access than affordability; it isn't really about solving the serials crisis (which may or may not happen depending on the route taken), or even about libraries or librarians. As I identify as a librarian, I love to think about what it means for academic libraries when open access becomes the norm.

This post is going to assume that sometime during my professional career, in the next 10-25 years, 50%-80% or more of the annual output of new papers will be open access in some form. Whether this will be mostly via Gold OA or Green OA I do not know. Some models I have read also predict additional disruptions to the scholarly communication system, e.g. post-peer-review models, that are not strictly necessary for open access.

I am not going to argue why I think open access is inevitable, though I think policy changes by governments are the most obvious reason; feel free to leave comments if you disagree.

What I want to explore in this blog post is its impact on academic libraries. 


1. Libraries' roles in traditional discovery and even fulfilment/delivery for users will diminish

We've known for a long while that almost no student begins their research from the library homepage, and this is likely to hold even for the researchers of the future, as younger PhD students are showing a preference for non-library "web scale" tools like Google Scholar.

The same report showing that no one began their search from the library homepage did show that, in the end, 56% of users used library materials via cross-referencing of information sources (e.g. library links in Google Scholar). So in the end the library did play a part in their research, though more in a fulfillment role and less in a discovery role.

This has prompted some to argue that academic libraries of the future should "think the unthinkable" and focus on delivery of full text and books, giving up the discovery role. This is far from the majority view, with dissenters saying such a move is defeatist, objecting that it is risky to rely on for-profit entities like Google for such an important role, or arguing that libraries can provide personally tuned discovery layers that serve their communities better than search tools operating at the network level like Google Scholar or Mendeley.

But the rise of open access has the potential to disrupt even the delivery or fulfilment role. In an open access world, when most articles or perhaps even books (open access models for books exist, as well as "all you can eat" subscription services like Scribd, Oyster, Amazon Prime) can be gotten for free, academic libraries' role in both discovery and fulfillment will be greatly diminished.

What proportion of articles is free online now? I've seen estimates varying from 24% of articles (all years) in Google Scholar and Microsoft Academic Search to as high as 48% for papers published in 2011.

Assuming the higher estimates for newer articles are true (though I doubt it), we may already be at or near the 50% tipping point for the annual output of new articles.

As it is, we already know from the Ithaka S+R US Faculty Survey 2012 that when faculty can't get access through our library collection, they search for free access online (80%). This option is even more popular than ILL or document delivery, and it is going to become increasingly effective as open access becomes the norm.




That explains why a tool like Lazy Scholar, a Chrome extension that automatically scans every web page you are on to identify articles mentioned and provides a link to the PDF if a free version is available via Google Scholar, seems to be so popular.

You can expect tools like Lazy Scholar to become increasingly effective as the tide for Open Access turns.

Conversely, as argued in The day library discovery died - 2035, web scale discovery services from libraries are likely to become even more irrelevant.

Lorcan Dempsey has been writing for years about how researchers prefer gateways at the "network level" as opposed to the institutional level, but institutional discovery services have always had the advantage of showing all the journal articles you have immediate access to, and nothing else, and this can be helpful.

But in a world where the vast majority of journal articles are open access, we don't need institutional discovery services to make such distinctions.

Unless academic libraries can provide distinct reasons why their search services are better than what the likes of Google Scholar, Mendeley web search etc can offer, e.g. personally tuned discovery layers, I can't see why we will need such institutional-level discovery layers.


Collection development and electronic resource management are also going to be very different.

At extreme levels of open access, say 75%, one wonders whether there will be much of a team in the library working on the traditional duties of subscriptions and electronic resource management (the parts relating to managing link resolvers, knowledge bases etc).

Services relating to document delivery may diminish in importance as well.


2. Libraries might make a greater focus on Special Collections and move into publishing/hosting journals

So does this mean the technical services portion of academic libraries will be less important?

Not necessarily.

Most obviously, if the green route to open access takes off, perhaps along the lines of the "Immediate-Deposit/Optional-Access" model, more and more resources will be channeled towards the management of institutional repositories.

Beyond simply serving as repositories, some libraries are also experimenting with "layered journals", such as what University College London is doing. Essentially this involves libraries moving into the publishing business by turning institutional repositories into publishing platforms. For example, UCL Press is now a department within the institution's Library Services. Using the open source Open Journal Systems (OJS) with the institutional repository as the storage system, the library publishes open access journals. Many open access journals are also published via Digital Commons.

Whether academic libraries have the skills, knowledge and incentive to play such a role and retake the scholarly communication system is a big question.

Beyond hosting open access journals, academic libraries will also probably put greater focus on their special collections.



As argued by Lorcan Dempsey, libraries will have to focus their energies on items with high uniqueness (held in few collections), in other words special collections. In the future, the prestige of an academic library will lie not in how many journal articles or books it can provide to its community, but in how much unique content it makes available to the world.

Under such a model, academic libraries would perhaps resemble museums, carefully curating and preserving rare artifacts.

Similarly, in Can't Buy Us Love, Rick Anderson proposes that academic libraries shift from what he calls "commodity documents" (common things you can purchase on the marketplace, e.g. published journal articles and books) towards "non-commodity documents" (rare unique material, grey literature etc).

He proposes we "devote a greater percentage of budget and staff time than we hitherto have to the management and dissemination of those rare and unique documents that each of us owns, that no one but the holder can make available to the world, that have the potential greatly to enrich the world of scholarship, and that can be made available outside of the commercial marketplace without damage to any participant in the scholarly communication system."

There are certain subtleties in the proposal that I suspect I miss, but I would argue that in a world where journal articles are available for free and are already efficiently discoverable by Google etc, we would be forced to follow Rick's proposal and focus on special collections, which will involve what Lorcan Dempsey calls the "inside out" challenge. This involves digitization/OCR, text transcription, creating metadata, and making our special collections discoverable.





3. Libraries will have a greater focus on value-add expertise services such as information literacy, data management services, GIS etc to replace the diminishing "buyer" role







The Ithaka S+R US Faculty Survey 2012 shows that of all the roles academic libraries play, the buyer role is by far the most important. Interestingly, 2012 is the first year since 2003 in which there was a fall in this area, though it remains by far the most important.

This fall could be insignificant, or it could point to the fact that increasingly more content became available free online between 2009 and 2012.

What is not in doubt is that if open access rises to become the norm, the role of the buyer by the library will definitely diminish. 

Somewhat discouragingly, the other non-collection-based roles, such as facilitating teaching and research activities, also fell between 2009 and 2012. But the survey notes this could be due to a smaller proportion of humanities faculty taking the survey, so it might not be a trend.

I am going to state the obvious but perhaps unpleasant truth: if faculty view the buyer role as paramount, open access is going to make it tricky to demonstrate the value of the library, as it diminishes the value faculty want from us (at least for now).

It is hence critical for the survival of academic libraries in the coming years to provide value to faculty that goes beyond purely buying material.

Librarians should double down on providing expert assistance to faculty across the research cycle, whether in research data management services, GIS services, bibliometrics, or assisting in teaching activities, aka information literacy.

Open access also creates roles for librarians as guides to the new scholarly communication landscape, helping clarify open access issues and terms to faculty who will need to adjust to the new publishing options. The greater the disruption to the landscape, the more librarians will be needed to guide and advise on, say, changes in the evaluation of research impact (post peer review, altmetrics etc). Some will be given shiny new titles like "open access librarian", but most academic librarians who do outreach work will need to do this work as well. But will such roles be only short-term, lasting just while the issues are novel?

Of course, some academic librarians reading this will protest that their institution is already doing most of this, as opposed to purely collection-centric roles; indeed this varies from library to library. I worry, though, that the perception of academic libraries as buyers is going to be hard to shake.


4. Budgets of libraries might shrink

This is quite speculative, but how will library budgets be affected by open access? Looking at the ARL Library Investment Index, roughly 30%-50% of ARL library expenditure is on materials (the majority on journals). How much of this will still be under the control of the library when open access reigns?

If savings do accrue from a revamped open access system, how much of these savings will be channeled to the academic library, or will they simply disappear from the budget?

Of course, there is no certainty that much savings will accrue in the open access world. Some open access advocates, such as Stevan Harnad, fear that an overly hasty and premature focus on the Gold route without what he calls a "leveraged transition" (achieving close to 100% self-archiving first, thus forcing published versions of PDFs to compete with author post-prints and driving down APCs) might simply produce an open access environment in which publishers recapture the profits they formerly made from subscription journals, only this time via APCs (article processing charges).

Some models of Gold open access also simply push the bill to funders and governments, and depending on the model, academic libraries may or may not be involved in managing funds for APCs.

I am not enough of a specialist to weigh in on these matters, though Harnad's view seems to make sense to me.

In a sense, a smaller total library budget due to losing the materials expenditure doesn't quite matter as long as other things remain constant, but would there be a reduction in the prestige of academic libraries?

More worryingly, on a very pessimistic view, if academic libraries are not prepared for the transition and do not make a strong enough case for the value of their operations to replace the buyer role, staff cutbacks might occur.



5. Modernising Referencing practices

This is more an intriguing proposal than a prediction, from Academic citation practices need to be modernized - References should lead to full texts wherever possible.

The article makes the now fairly standard observation that legacy referencing practices are broken because they do not take into account the shift towards a digital online environment (why shouldn't we simply link to a DOI, for example?) as well as changes in the scholarly communication system.

There are a lot of fascinating ideas in there, but I find the most interesting one relates to open access.

"With open access spreading now we can all do better, far better, if we follow one dominant principle. Referencing should connect readers as far as possible to open access sources, and scholars should in all cases and in every possible way treat the open access versions of texts as the primary source."

He suggests that if the published version of an article exists behind a paywall and a preprint or postprint exists online, the reference should link to the freely available version.

Here's the order of preference he suggests for referencing articles, depending on where they are available:

  1. Open Access Journal 
  2. Hybrid Journal 
  3. University institutional repository
  4. Other "widely used" open access sites - he mentions ResearchGate and Academia.edu. Subject repositories like SSRN would fit here too.

Only if none of these is available should one reference the paywalled version as the primary source.
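As a concrete (and purely hypothetical) illustration of this preference order, here is a minimal Python sketch; the version labels and helper function are my own invention, not anything proposed in the article:

    # Hypothetical sketch of the suggested preference order for choosing
    # which version of a paper a reference should link to.
    PREFERENCE_ORDER = [
        "open_access_journal",        # 1. OA journal
        "hybrid_journal",             # 2. OA article in a hybrid journal
        "institutional_repository",   # 3. university/institutional repository
        "other_open_access_site",     # 4. e.g. ResearchGate, Academia.edu, SSRN
        "paywalled_publisher",        # last resort: the paywalled version
    ]

    def pick_reference_target(available):
        """Return the URL of the most preferred available version, or None.

        `available` maps a version label (see above) to its URL.
        """
        for version in PREFERENCE_ORDER:
            if version in available:
                return available[version]
        return None

    # Example: only a repository postprint and the paywalled version exist,
    # so the reference should point at the repository copy.
    print(pick_reference_target({
        "paywalled_publisher": "http://dx.doi.org/10.xxxx/example",
        "institutional_repository": "http://ir.example.edu/handle/123",
    }))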


Conclusion 

Assuming open access is inevitable, I feel it is only a slight exaggeration to say that the upcoming disruption to academic libraries will potentially be bigger than the shift from print to digital. For good or ill, for the last 20-30 years or so, providing access to journal articles behind paywalls has been the major purpose of academic libraries as seen by faculty and students, and open access will change that.

In a way, I suppose none of the consequences in this blog post is particularly earthshaking, assuming open access occurs. But is there sufficient reason to believe that open access is inevitable? I know many librarians who disagree and think it's not so simple.

Even if it does occur, how fast will the transition be? Will it be gradual, allowing academic libraries to slowly transition operations and competencies, or will it be a dramatic shift that catches us off-guard?

What would be some signals or signs that open access is gaining ground and that it might be time to scale back on traditional activities? Downloads per FTE for subscribed journals starting to trend downwards? Decreasing library homepage hits? At what percentage of annual output being open access do you start scaling back?


Acknowledgements
Much of this blog post, on open access, its benefits and its consequences, draws on the State of Open Access interviews by Richard Poynder.

Sunday, July 27, 2014

Size of Google Scholar vs other indexes, personally tuned discovery layers & other discovery news

Regular readers of my blog know that I am interested in discovery, and the role academic libraries should play in promoting discovery for our patrons.

If you feel the same, here is a mix of links on the topic I came across recently that might be of interest.

The Number of papers in Google Scholar is estimated to be about 100 million

When talking about discovery, one can't avoid discussing Google Scholar. My last blog post, 8 surprising things I learnt about Google Scholar, raced into my top 20 most-read blog posts of all time in just 3 weeks, showing the intense interest in this subject.

As such, The Number of Scholarly Documents on the Public Web is a fascinating paper: it attempts to estimate the number of scholarly documents on the public web using the capture/recapture method, and in particular it gives you a figure for the number of papers in Google Scholar.

This is quite an achievement, since Google refuses to give out this information.

It took me a while to wrap my head around the idea, but essentially the paper:

  • Defines the number of scholarly documents on the web as the sum of the papers in Google Scholar (GS) and Microsoft Academic Search (MAS)
  • Takes the stated number of papers in MAS to be a bit below 50 million
  • Calculates the amount of overlap between the papers found in GS and MAS; this overlap has to be estimated via sampling, of course
  • Estimates the overlap using papers that cite 150 selected papers
  • Using the Lincoln–Petersen method, the measured overlap and the given value of about 50 million papers in MAS, estimates the number of papers in Google Scholar and hence the total number of papers on the public web (you may have to take some time to understand this last step, it took me a while for sure - see the sketch below)
There are other technicalities, such as the paper estimating English-language papers only and being careful to sample papers with fewer than 1,000 cites (because GS shows at most 1,000 results per query).
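To get a feel for the arithmetic, here is a minimal sketch of the Lincoln–Petersen step as I understand it. The sample counts below are made-up numbers purely for illustration; only the roughly 48.7 million MAS figure comes from the paper:

    # Lincoln-Petersen capture/recapture. Treat the papers citing a seed
    # paper that MAS finds as the first "capture" and those GS finds as the
    # "recapture"; the overlap lets us estimate the whole citing population.
    def lincoln_petersen(n_mas, n_gs, n_both):
        """Estimate the total population given two samples and their overlap."""
        return n_mas * n_gs / n_both

    # Hypothetical sample counts (not the paper's actual data):
    n_mas, n_gs, n_both = 1_000, 2_040, 700
    total_citing = lincoln_petersen(n_mas, n_gs, n_both)  # ~2,914 papers

    # GS's coverage of that population works out to n_both / n_mas, and MAS's
    # to n_both / n_gs. Scaling the known MAS size by the ratio of coverages,
    # the GS size estimate reduces to N_MAS * (n_gs / n_mas).
    MAS_SIZE = 48_700_000
    gs_estimate = MAS_SIZE * n_gs / n_mas
    print(f"{gs_estimate / 1e6:.1f} million papers in GS")  # 99.3 million

With these illustrative numbers the estimate happens to land near the paper's 99.3 million figure, but again, the sample counts are invented; see the paper for the real data.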

For more see also How many academic documents are visible and freely available on the Web? which summarises the paper, and assesses the strengths and weaknesses of the methodology employed in the paper.

The major results are:

  1. Google Scholar has an estimated 99.3 million English-language papers, and in total there are about 114 million papers on the web (where the web is defined as Google Scholar + MAS). If the 114 million is taken as the union of the two and MAS sits at about 48.7 million, that implies an overlap of roughly 34 million papers indexed by both.
  2. Roughly 24% of papers are free online.
The figures here are meant to be lower bounds, but they are still interesting, as they provide an estimate of the size of Google Scholar. Is 99.3 million a lot?

Here are some comparable systems and the index sizes I am aware of as of July 2014. Scopes might differ slightly, but I will focus mostly on comparing scholarly or peer-reviewed articles, which are the bulk of most indexes anyway. I did not adjust for English-language articles only, though many of these systems do allow filtering for that.
  • PubMed - 20-30 million - the go-to source for the medical and life sciences.
  • Scopus - 53 million - mostly articles and conference proceedings, but now including some books and book chapters. This is one of the biggest traditional library A&I databases; its main competitor, Web of Science, is at roughly the same level, with more historical data but fewer titles indexed.
  • BASE - 62 million - drawn from open access institutional repositories. Mostly but not 100% open access, and may include non-article items.
  • CrossRef Metadata Search - 67 million - indexed DOIs - may include books or book chapters.
So far these are around the level of Microsoft Academic Search at about 50 million.

Are there indexes comparable to Google Scholar's roughly 100 million? Basically, the library web scale discovery services are the only ones at that level.

  • Summon - 108 million - with the scholarly material facet on + "Add beyond library collection" + authenticated = including restricted A&I records from Scopus, Web of Science and more. (Your instance of Summon might have more or less, depending on the A&I indexes subscribed to and the size of your catalogue and institutional repository.)
  • WorldCat - 2.1 billion holdings, of which 148 million are peer reviewed and 203 million are articles [as of Nov 2013]
I am unable to get figures for the other two major library web scale discovery services - EBSCO Discovery Service and Primo Central - but I figure they should be at roughly the same level.



108 million items of scholarly material in Summon - may vary for your Summon instance



  • Mendeley - 181 million? This is an interesting case: Mendeley used to list the number of papers in its search but has removed it. The last figure I could get at is 181 million (from the Wayback Machine), which fits with some of the statements made online, but looks a bit on the high side to me.

The figures I've given above, with the exception of Mendeley's, I would think tend to be pretty accurate (subject to the issues of deduping etc.), at least compared to the estimates given in the paper.

I think the fact that web scale discovery services are producing results at the same scale (>100 million) suggests that the estimated Google Scholar figure is in the right ballpark.

Still, my subjective experience is that Google Scholar tends to return substantially more than our library web scale discovery service, so I suspect the 99.3 million obtained for Google Scholar is an underestimate.

I wonder if one could use the same methodology as in The Number of Scholarly Documents on the Public Web to estimate the size of Google Scholar but using Summon or one of the other indexes mentioned above to measure overlap instead of Microsoft Academic Search.

There would be some advantages to this.

For example, there is some concern that the size of Microsoft Academic Search, assumed in the paper to be 48.7 million, is not accurate, whereas the figures given for, say, Summon are likely to be more accurate (again, deduping issues aside).

It would also be interesting to see how Google Scholar fares when compared against an index that is about twice as large as MAS.

Would using a web scale library discovery service to estimate the size of Google Scholar give a similar figure of about 100 million? 

Arguably not, since we are talking about different populations, i.e. MAS + GS vs Summon + GS, though both can be seen as rough estimates of the amount of scholarly material in the world that can be discovered online. (Also, can the results you find in Summon be considered the "public web" if you need to authenticate before searching to see the subset of results from A&I databases like Scopus?)

The main issue, though, with using Summon or anything similar in place of MAS is, I think, a technical one.

The methodology measures overlap in a way that has been described as "novel and brilliant": instead of running the same query on the two search engines and looking for overlaps, they do it this way:

"If we collect the set of papers citing p from both Google Scholar and MAS, then the overlap between these two is an estimate of the overlap between the two search engines." 

Unfortunately, none of the web scale discovery services has a cited-by feature (they do draw on and display Scopus and Web of Science citation counts, but that's a different matter).

One can fall back on older methodologies and measure overlap by running the same queries on GS and Summon, but this has drawbacks, described as "bias and dependence" issues.
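For completeness, here is roughly what that older query-based approach looks like. The two search functions are stand-ins (neither GS nor Summon is actually being called), so treat this as a hypothetical sketch of the sampling idea rather than working client code:

    # Hypothetical sketch of query-based overlap sampling. search_gs() and
    # search_summon() are placeholders; assume each returns a set of
    # normalised identifiers (e.g. DOIs) for the results of a query.
    def estimate_overlap(queries, search_gs, search_summon):
        """Average Jaccard overlap between two engines over sample queries."""
        overlaps = []
        for q in queries:
            gs, summon = search_gs(q), search_summon(q)
            union = gs | summon
            if union:
                overlaps.append(len(gs & summon) / len(union))
        return sum(overlaps) / len(overlaps)

The catch, as noted, is that the result depends heavily on which queries you sample, and both engines are responding to the same query rather than being sampled independently - the "bias and dependence" problems the citing-papers method was designed to avoid.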

Boolean versus ranked retrieval - clarified thoughts

My last blog post, Why Nested Boolean search statements may not work as well as they did, was pretty popular, but what I didn't realise was that I was implicitly saying that relevance ranking of documents retrieved using Boolean operators does not generally work well.

This was pointed out by Jonas on Twitter.



I tweeted back asking why we couldn't have good ranked retrieval on documents retrieved using Boolean operators, and he replied that he thinks the two approaches reflect different mindsets: one should either "trust relevance or created limited sets."

On the opposite end, Dave Pattern of Huddersfield reminded me that Summon's relevance ranking is based on the open source Lucene software, with some amount of tweaking. You can find some details, but essentially Lucene is designed to combine Boolean with vector space models etc., i.e. it is designed to do, or at least can do, Boolean + ranked retrieval.

After reading through some documentation and the excellent paper Boolean versus ranked querying for biomedical systematic reviews, I realized my thinking on this topic was somewhat unclear.

As a librarian, I have always assumed it makes perfect sense to (1) pull out possibly relevant articles using Boolean operators, then (2) rank them using various techniques, from classic tf-idf factors to more modern signals like link popularity.

I knew, of course, that there were two paradigms: classic Boolean set retrieval assumed every result was "relevant" and did not bother with ranking beyond sorting by date etc. But it still seemed odd to me not to at least try to add ranking. What's the harm, right?

The flip side was: what is ranked retrieval by itself? If one entered SINGAPORE HISTORICAL BUILDINGS ARCHITECTURE, it would still be ranking only documents that had all four terms, right (maybe with stemming)? Or wasn't that really still Boolean with ranking?

The key point I was missing, which now seems obvious, is that in ranked retrieval paradigms not every search term in the query has to be matched.

I know that those knowledgeable in information retrieval reading this might think this is obvious, and that I am dense for not realizing it. I guess I did know it, except that as a librarian I am so trapped in Boolean thinking that I assume implicit AND is the rule.

In fact, we like to talk about how Google and some web search engines do a "soft AND", and we kick up a fuss when they sometimes drop one or more search terms. But in ranked retrieval that's what you do: you throw in a "bag of words" (it could be a whole paragraph of words), and the ranking algorithm tries to do the best it can, but the documents it pulls up may not have all the words in the query.
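To make the difference concrete, here is a toy sketch of my own (not how Summon or Lucene actually work internally): Boolean AND keeps only documents containing every term, while a simple tf-idf-style ranker happily scores documents that match only some of the bag of words:

    import math

    docs = {
        "d1": "singapore historical buildings",             # missing "architecture"
        "d2": "architecture of singapore buildings today",  # missing "historical"
        "d3": "historical architecture in europe",
    }
    query = "singapore historical buildings architecture".split()

    def boolean_and(docs, terms):
        # Classic Boolean set retrieval: every term must be present.
        return {d for d, text in docs.items()
                if all(t in text.split() for t in terms)}

    def ranked(docs, terms):
        # Toy bag-of-words ranking: sum a tf-idf-style weight for each
        # matching term; documents missing some terms still get a score.
        n = len(docs)
        def idf(t):
            df = sum(t in text.split() for text in docs.values())
            return math.log((n + 1) / (df + 1)) + 1
        scores = {d: sum(idf(t) for t in terms if t in text.split())
                  for d, text in docs.items()}
        return sorted(scores.items(), key=lambda kv: -kv[1])

    print(boolean_and(docs, query))  # set() - no document has all 4 terms
    print(ranked(docs, query))       # but every document is still ranked

The Boolean search returns nothing, while the ranker returns all three documents in order of how many (weighted) query terms they contain.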

Boolean versus ranked querying for biomedical systematic reviews is a particularly interesting paper, showing how different search approaches (from straight-out Boolean, to ranked retrieval techniques that involve throwing in titles and abstracts, to hybrid techniques combining Boolean with ranked retrieval) fare in terms of retrieving clinical studies for systematic reviews.

It's an amazing paper, with different metrics and a good explanation of systematic reviews if you are unfamiliar with them. Particularly interesting is that they compare Boolean and Lucene results, which I think gives you a hint of how Summon might fare.

The best algorithm for ranking might surprise you.... 



Read the full paper to understand the table! 


Large search indexes like Google Scholar and discovery services flatten knowledge, but is that a good thing?

Like many librarians, I have an obsession with the size of databases, but is size really that important?

Over at Library Babel Fish, Barbara Fister, in The Library isn't flat, worries that academic libraries' discovery services "are (once again) putting too high a value on volume of information and too little on curation".

She ends with the following questions:

"Is there some other way that libraries could enable discovery that is less flat, that helps make the communities of inquiry and the connections between ideas easier to follow? Is there a way to help people who want to join those conversations see the patterns and discern which ideas were groundbreaking and significant and which are simply filling in the details? Or is curation and connection too labor-intensive and inefficient for the globalized marketplace of ideas?"

Which makes the next section interesting....

Library Top Trends - Personally tuned discovery layers 

Ken Varnum, at the recently concluded LITA Top Technology Trends session, certainly thinks that what is missing in current library discovery services is the ability for librarians to provide personally tuned discovery layers for local use.

He certainly thinks there is value in librarians slicing the collections into customized streams of knowledge to suit local conditions. You can jump to his section on this trend here. Roger Schonfeld's section on anticipatory discovery for current awareness of new publications is interesting as well.




To Barbara Fister's question on whether curation is too labour-intensive or inefficient, Ken would probably answer no; he suggests that in the future librarians could customize collections based on subject as well as appropriateness of use (e.g. undergraduate vs scholar).

It sounds like a great idea: Summon and EBSCO discovery layers currently provide only hardcoded discipline sets, and I can imagine eventually being able to create subject sets based on collections at the database and/or journal-title level (shades of the old federated search days, or of librarians creating Google custom search engines, e.g. one covering NGO sites, or Jurn for open access in the humanities).

At an even more granular level, I suppose one could also pull from reading lists etc. (a rough sketch of the idea follows).
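For what it's worth, here is a purely hypothetical sketch of what configuring such a tuned scope might look like; all the names and fields below are invented for illustration:

    # Hypothetical sketch: a librarian-curated discovery scope.
    NGO_STUDIES_SCOPE = {
        "audience": "undergraduate",
        "databases": {"PAIS Index", "Sociological Abstracts"},
        "journal_titles": {"Voluntas", "Nonprofit and Voluntary Sector Quarterly"},
    }

    def in_scope(record, scope):
        """Keep a discovery result only if it comes from a curated source."""
        return (record.get("database") in scope["databases"]
                or record.get("journal") in scope["journal_titles"])

A tuned discovery layer would then run the user's query as usual and filter or boost results with something like in_scope() before display.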

Unlike Ken, though, I am not 100% convinced it would take just "a little bit of work" to make this worthwhile, or at least better than the hardcoded discipline sets.


NISO Publishes Recommended Practice on Promoting Transparency in Library Discovery Services


NISO RP-19-2014, Open Discovery Initiative: Promoting Transparency in Discovery [PDF] was just published.

Somewhat related is the older NFAIS Recommended practices on Discovery Services [PDF]

I've gone through it, as well as the "EBSCO supports recommendations of ODI" press release, and I am still digesting the implications, but clearly there is some disagreement about the handling of A&I resources (not that shocking).

Discovery Tools, a Bibliography

Highly recommended resource - this is a bibliography by François Renaville. Very comprehensive, covering papers from 2010 onwards.

It is a duplicate of the Mendeley group "Libraries & [Web-Scale] Discovery Tools".


Ebsco Discovery Layer related news

EBSCO has launched a blog, Discovery Pulse, with many interesting posts. Some tidbits:

Note: I am just highlighting EBSCO items in this post because their new blog may be of interest to readers. I would be happy to highlight Primo, Summon, or WorldCat discovery service items when and if I become aware of them.


Summon integrates Flow research management tool

It was announced that in July Summon will integrate with ProQuest Flow, ProQuest's new cloud-based reference management tool.


The word Login is extremely misleading in my opinion. 

I have very little information about this and about how overt the integration will be. But given that Mendeley was acquired by Elsevier and Papers by Springer, it's no wonder that ProQuest wants to get into the game as well.

It's all about trying to get into the researcher's workflow, and as "discovery happens elsewhere" increasingly often, it would be smart to focus on reference management, an area the likes of Google currently seem to be ignoring (though moves like Scholar Library, where one can add citations found in Google Scholar to one's own personal library, may suggest otherwise).

Mendeley for certain has shown that reference management is a very powerful place to start to get a digital foothold.

While it's still early days, Flow currently seems to have pretty much the standard features one sees in most modern reference managers, e.g. free storage up to 2GB, support for Citation Style Language (CSL), capabilities for collaboration, etc. I don't see any distinguishing features or unique angles yet.

Here's a comparison in terms of storage space for the major competitors such as Mendeley.

The webinar I attended on it (sorry, I don't have a link to the recording) suggests ProQuest has big plans for Flow beyond being a reference manager. It will aim to support the whole research cycle, and I think this includes support as a staging ground for publication (submission to PQDT??) as well as support for pre-publication works (posting to institutional or subject repositories?).

It will be interesting to see whether ProQuest will try to leverage its other assets, such as Summon, to support Flow. E.g. would ProQuest tie recommender services drawn from Summon usage into it?

Currently you can turn off Flow in Summon without much ill effect, and it seems some libraries have done so because it takes time to evaluate the tool and prepare staff to support it, but it remains to be seen whether, in the long run, Flow might simply have too many features and too much value to be turned off.

BTW, if you want to keep up with articles, blog posts, videos etc. on web scale discovery, do consider subscribing to my custom magazine on Flipboard (currently over 1,200 readers) or looking at the bibliography on web scale discovery services.

