Data Management

NSF requires that data generated on a system it funded be archived and shared in some way. Below are the commitments we made to NSF in order to purchase the cluster. Please adhere to them in your projects that use the cluster.

Types of data, metadata, and standards used: The data and metadata to be archived vary across the different projects. For those projects that produce software, including source code and various scripts, each PI will make the software available either by posting it on the project's web page or by placing it in a public repository such as GitHub, linked from the PI's web page. The other data and metadata generated as part of the projects will be made available in the same way. For both software and other data, these dissemination options provide sufficient space and support the file types that will be produced.

Standard data formats vary among the PIs' disciplines, but most results will be made available as standard text files, formatted, for example, with one sample per row for ease of analysis using spreadsheets or tools such as R and MATLAB. In addition to this raw data, processed ("cooked") data, such as that typically found in spreadsheets, will also be included. Both the raw and cooked data will include accompanying "readme" metadata describing how the data was generated, along with any procedural information required for replication. Finally, like the experimental data, public domain source code used in the research will be made accessible to others, primarily through URLs included in the relevant papers. Note that only source code from the public domain will be included; any proprietary source code, or data and facts derived from it, will be handled in a manner that respects the rights of the intellectual property owner.
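
As a concrete illustration of the one-sample-per-row text format mentioned above, the sketch below shows how such a file might be read and summarized. The file name, column names, and choice of Python with the pandas library are hypothetical examples for illustration only; they are not requirements of this plan.

    # Hypothetical example: a results file with one sample per row.
    # "results.csv" and its column names are illustrative, not prescribed.
    #
    # results.csv might look like:
    #   sample_id,runtime_sec,memory_mb
    #   1,12.4,512
    #   2,11.9,498
    import pandas as pd

    data = pd.read_csv("results.csv")  # each row is one sample
    print(data.describe())             # per-column summary statistics

Because each row is a complete sample, the same file opens directly in a spreadsheet and can be read with R's read.csv or MATLAB's readtable.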

Access, sharing, and redistribution policies: Access to existing data involves two sources. Open-source software (code and related artifacts) will be obtained from common Internet repositories such as GitHub or similar discipline-specific sources. Any industrial source code and related artifacts will be obtained through appropriate industrial contacts. There is no cost for this data, which will be stored on the cluster.

Open-source software, and data obtained from it, will be shared by making it freely available on each PI's research web page or in a standard web repository such as GitHub, with no barriers (e.g., passwords) to access. The data will be in a format easily readable by commonly available tools. Each PI plans to retain exclusive use of the data only long enough to publish research results before making it available. While no human subjects are involved in the proposed work, for any future experiments involving human subjects, all data will be gathered, analyzed, and disseminated subsequent to, and consistent with, the required approval from Loyola's Institutional Review Board (FWA# to be added). The only other privacy, confidentiality, security, or intellectual property concern that will limit the availability of the data relates to the use of industrial source code and the facts derived from it.

Results dissemination: Each PI will share the results of their work via publications and presentations appropriate to their field. Student research not yet ready for wider dissemination will be presented at university-supported events such as the Hauber Summer Research Program and the Undergraduate Research Symposium.

Archiving plan: All project-related data will be retained in a publicly available venue for not less than ten years, and likely much longer. While the particular venue depends on the PI, possibilities include the PI's web page, data archiving associated with a journal, or a public repository such as GitHub or Bitbucket. Loyola has a data sustainability plan in place for locally stored data of current employees; the Office of Technology Services maintains off-site redundancy to ensure data preservation and backup. Should a PI leave Loyola, or should the University stop maintaining faculty web pages, locally stored data will be moved to a public repository.

In terms of discovery, researchers who have read the relevant papers are anticipated to form the target audience for this data. The data will also be indexed by all major search engines and will thus be discoverable via web search. To facilitate searching, web page titles will include relevant keywords. All web pages and repositories will include an acknowledgement of federal support. PIs will maintain links to off-campus data on their university web pages to aid in discovery.

In addition to the data, metadata such as the experimental protocols and details of the statistical tests performed will be archived through scientific publication. The resulting papers will be available from the PI's web page, from a research-sharing website such as Academia.edu or ResearchGate, or from the copyright holder's site. They will be written with replication in mind and thus will include key metadata such as the steps necessary to reproduce the experiments.

Roles and responsibilities: Each PI will take sole responsibility for the management of their data. Should a PI leave Loyola, existing data will be maintained by the PI in a web repository such as Bitbucket.

The only factor that will limit the availability of the data is source code of industrial origin, which requires that appropriate confidentiality be maintained. No ethical restrictions related to data from human subjects are expected, as no sensitive data will be collected.