Private Recipes


Important

If you use GGD, please cite the Nature Communications GGD paper

Although GGD was developed to provide reproducible access to and management of public scientific data, it can also be used to reproducibly curate and manage private data.

A researcher may need to keep data private, at least temporarily, but still want to use GGD to document data access, curation, and management. The steps below describe how to use GGD to manage private data on a local machine.

Note

Private data management is not the primary use of GGD; GGD is intended for public data access and use. Private data recipes cannot be added to the ggd-recipes repository on GitHub, to any ggd conda channel, or to the ggd cloud caching services. All private data recipes must be stored and maintained by the creator or user of those recipes.

1. Create a private GitHub repository to store private data recipes

We suggest that you create a private GitHub repository to store your private data recipes. This provides both versioning and storage for the recipes. As noted above, private data recipes cannot be stored in the ggd repositories.

It is important that these recipes are stored and maintained, because they serve as both the infrastructure and the instructions that ggd uses to install the data. They are key to the reproducibility and provenance aspects of data management.

2. Create a ggd recipe

Use ggd make-recipe to create the private data recipe(s) from a bash script that contains the instructions for data access, curation, and processing.

See ggd make-recipe for more information on how to create a data recipe.
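As a rough illustration of the kind of bash script that ggd make-recipe works from, the sketch below writes a small script to a file named recipe-script.sh. The URL, the file names, and the choice of bgzip and tabix for processing are assumptions made for this example and are not taken from a real recipe; replace them with the steps your private data actually requires.

```shell
# Write a hypothetical data-processing script to recipe-script.sh.
# Nothing is downloaded here; the script is only created on disk.
cat > recipe-script.sh <<'EOF'
#!/bin/bash
set -eo pipefail

# Data access: download the raw file (placeholder URL, an assumption).
curl -fsSL "https://example.org/private/de-novo-callset.vcf.gz" -o raw.vcf.gz

# Curation and processing: recompress with bgzip and index with tabix
# (both from htslib), using a hypothetical output file name.
gzip -dc raw.vcf.gz | bgzip -c > grch37-de-novo-callset-lark-v1.vcf.gz
tabix -p vcf grch37-de-novo-callset-lark-v1.vcf.gz

# Clean up the intermediate file.
rm raw.vcf.gz
EOF

echo "Wrote $(wc -l < recipe-script.sh) lines to recipe-script.sh"
```

The resulting script would then be passed to ggd make-recipe together with the metadata arguments described in the ggd make-recipe documentation.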

3. Check and install the data recipe

Because the private ggd data recipe will not be added to the ggd library, you won’t be able to use ggd install to install it. Instead, use the ggd check-recipe command to install the recipe locally. To do this, add the -du argument to the ggd check-recipe command; without it, the recipe is uninstalled again once the check finishes.

For example, if you have a private data recipe called grch37-de-novo-callset-lark-v1, you would install it using the following command:

ggd check-recipe grch37-de-novo-callset-lark-v1 -du

Warning

The ggd check-recipe command does not have a --prefix argument. This means the conda environment where you wish to install the data must be the active environment when you run the command.

For more information on the ggd check-recipe command, see ggd check-recipe.
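Because the install always targets the active environment, it can help to confirm which environment is active before running the command. The following is a minimal sketch, assuming the recipe name from the example above and relying on the CONDA_PREFIX variable that conda sets to the path of the active environment. The status variable is only there for illustration.

```shell
# Recipe name taken from the example above.
recipe="grch37-de-novo-callset-lark-v1"

# CONDA_PREFIX holds the path of the active conda environment, if any.
echo "Active conda environment: ${CONDA_PREFIX:-none}"

# Only attempt the install when an environment is active and ggd is on PATH.
if [ -n "${CONDA_PREFIX:-}" ] && command -v ggd >/dev/null 2>&1; then
    ggd check-recipe "$recipe" -du
    status="attempted"
else
    echo "Activate the target conda environment (with ggd installed) first." >&2
    status="skipped"
fi
```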

Note

Installing private data recipes takes longer than installing public data recipes because each recipe goes through the normal check and validation steps before it is installed. Additionally, there is no way to cache private data recipes, so the speedup seen when installing public data recipes is not available.

4. Add the data recipe to GitHub

Now that the data recipe has been created, checked, and installed, you can add it to your private GitHub repository.

Note

We suggest adding the data recipe, not the actual data, to the private GitHub repository.
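One possible way to do this from the command line is sketched below. It assumes a local clone of your private repository in a directory named private-ggd-recipes (a hypothetical name) and assumes the recipe lives in a directory named after the recipe in the current working directory. Adjust both to match your setup.

```shell
repo_dir="private-ggd-recipes"             # hypothetical local clone of the private repository
recipe="grch37-de-novo-callset-lark-v1"    # assumed recipe directory in the current directory

if [ -d "$repo_dir/.git" ] && [ -d "$recipe" ]; then
    # Copy only the recipe directory (the recipe files, not the installed data).
    cp -R "$recipe" "$repo_dir/"
    git -C "$repo_dir" add "$recipe"
    git -C "$repo_dir" commit -m "Add private data recipe $recipe"
    git -C "$repo_dir" push
    status="pushed"
else
    echo "Expected a clone at $repo_dir and a recipe directory at $recipe" >&2
    status="skipped"
fi
```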

5. Access installed data from a private recipe

Once a private data recipe has been installed on your system you can access and use it in a few different ways.

  1. Using environment variables

    A private ggd recipe comes with environment variables, just as a non-private ggd recipe does. You must be in the same conda environment where the data recipe was installed in order to use them. You must run source activate base before the environment variables become active.

  2. Showing active environment variables

    Environment variables for private data recipes can be listed using ggd show-env.

    For more information about the ggd show-env command see: ggd show-env

  3. Listing installed private recipes

    A list of installed data recipes, including private recipes, can be seen using ggd list.

    Note

    ggd list has a --prefix argument that is used to list installed data recipes in different conda environments. The --prefix argument can be used for private recipes. This means you can list private data recipes that are installed in a conda environment other than the currently active one.

    For more information about the ggd list command see: ggd list

  4. Getting data files for private recipes

    As with normal data recipes, you can use the ggd get-files command to get data files created by private data recipes.

    Note

    ggd get-files has a --prefix argument that is used to get installed data files from different conda environments. This --prefix argument can be used for private recipes. This means you can get installed data files from private recipes that are in a different conda environment than the currently active one.

    For more information about the ggd get-files command see: ggd get-files
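The access options above can be sketched together as follows. The environment name my-other-env is hypothetical, and the variable naming pattern (ggd_ followed by the recipe name with dashes replaced by underscores, then _dir) is an assumption that should be confirmed against the output of ggd show-env.

```shell
recipe="grch37-de-novo-callset-lark-v1"
other_env="my-other-env"   # hypothetical conda environment holding the recipe

# Assumed naming pattern for the directory variable; confirm with ggd show-env.
dir_var="ggd_$(printf '%s' "$recipe" | tr '-' '_')_dir"
echo "Expected directory variable: \$$dir_var"

if command -v ggd >/dev/null 2>&1; then
    # Show the environment variables available in the active environment.
    ggd show-env
    # List recipes installed in another environment, private ones included.
    ggd list --prefix "$other_env"
    # Get the data files that the private recipe installed in that environment.
    ggd get-files "$recipe" --prefix "$other_env"
else
    echo "ggd is not available in the active environment." >&2
fi
```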

6. GGD commands that won’t work with private recipes

There are a few GGD commands that won’t work with private recipes. Those include:

  • ggd search

  • ggd predict-path

  • ggd uninstall

  • ggd pkg-info

7. Uninstalling a previously installed private data recipe

To uninstall a private data recipe, run ggd check-recipe <recipe name>, where <recipe name> is the path to and name of the data recipe.

Note

To uninstall the private data recipe, you must omit the -du argument from the ggd check-recipe command. Without -du, the recipe is checked and then uninstalled.
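As a minimal sketch of the uninstall step, assuming the recipe directory sits at the hypothetical path below and that the environment where the recipe was installed is active:

```shell
recipe_path="./grch37-de-novo-callset-lark-v1"   # hypothetical path to the recipe

if command -v ggd >/dev/null 2>&1; then
    # No -du here, so the recipe is uninstalled once the check finishes.
    ggd check-recipe "$recipe_path"
    # The recipe should no longer appear among the installed recipes.
    ggd list
else
    echo "ggd is not available in the active environment." >&2
fi
```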

Finally

GGD is a data management system built to manage and distribute publicly available scientific data. Because this is the main purpose of GGD, we encourage users to add ggd recipes to the public ggd repositories for the scientific community to use. GGD is built to help remove the inconsistencies in data processing and management that have plagued researchers for years. Therefore, GGD will continue to encourage public data access, management, and reproducibility. We understand that sometimes data cannot be shared publicly, but a user may still wish to use GGD to process and manage their data and to use the infrastructure of data recipes for reproducibility. The features on this page are here to assist if you want to use GGD but truly need to retain data privacy. However, GGD will continue to promote public data sharing whenever possible, and the GGD features will therefore be maintained to support that goal.