Finding data in the long tail

Scientists increasingly try to assemble the most comprehensive catalogue of datasets available for a given question. Making sure you can find as much of the data relevant to a particular problem as possible thus begins to loom as a large issue. Although institutional repositories (such as NCBI, Dryad, Figshare, etc.) are great at storing the final published versions of datasets, some early and smaller-scale research data can get lost in the “long tail”. Anne Thessen has a great post over on her blog, Data Detektiv, on how to locate and keep track of such “dark data”:

Finding relevant data, especially if the needed data are dark, can be a difficult and lengthy task. … Was there a way to discover data based on events earlier in the research workflow? After some thought, I realized that databases and lists of awards made by funding agencies were an excellent source of information about potentially relevant data sets and who was likely to have them….

Read more at Data Detektiv.