ggd pkg-info¶
[Click here to return to the home page]
ggd pkg-info
is used to get package information for a specific ggd data package installed on your system.
This is a great resource for identifying the information for a specific ggd data package installed on your system. It will provide information about the data type, the provider of the data, the actual version of the data, the version of the ggd data package, and more.
The examples below help to illustrate what is available from ggd pkg-info
.
This tool provides a resource to help enforce reproducibility. The information provide from running
ggd pkg-info
will help to distinguish the data package and allow you to provide such information to
others or in a citation
Using ggd pkg-info¶
pkg-info is useful when you want to know a bit more about the data package you have installed. This tool provides general information about the data package and can access the original recipe used to download and process the data.
Note
The ggd data package must be installed on your system in order to get the package information
Running ggd pkg-info -h
will give you the following help message:
pkg-info arguments:
pkg-info |
Get the information for a specific ggd data package installed in the current conda environment |
---|---|
|
show this help message and exit |
|
Required The name of the recipe to get info about. |
|
(Optional) The ggd channel of the recipe to list info about (default: genomics) |
|
(Optional) When the flag is set, the recipe will be printed to the stdout. This will provide info on where the data is hosted and how it was processed. (NOTE: -sr flag does not accept arguments) |
|
(Optional) The name or the full directory path to a conda environment where a ggd recipe is stored. (Only needed if listing pkg data info for a pkg not installed in the current environment) |
Additional argument explanation:¶
Required arguments:
name: The
name
argument represents the name of the ggd data package for which to retrieve info. No flag is required for this argument, just supply the name.
Optional arguments:
-c: The
-c
flag represents the ggd channel the package came from. The default is genomics.-sr: The
-sr
flag is used to show the recipe for the data package. Showing the recipe will allow the user to identify where the data was originally downloaded, how it was processed, and other information about the data being used.–prefix: The
--prefix
flag is used to designate which conda environment/prefix to get the file from. This allows one to store ggd data packages in one environment and access it from another.
If the package has not been installed on your system then the package info will not be displayed and the recipe will not be accessible.
Examples¶
1. Example listing pkg info:¶
$ ggd pkg-info hg19-gaps-ucsc-v1
----------------------------------------------------------------------------------------------------
GGD-Package: hg19-gaps-ucsc-v1
GGD-Channel: ggd-genomics
GGD Pkg Version: 1
Summary: Assembly gaps from UCSC in bed format
Species: Homo_sapiens
Genome Build: hg19
Keywords: gaps, region, bed-file
Cached: uploaded_to_aws
Data Version: 27-Apr-2009
File type(s): bed
Data file coordinate base: 0-based-inclusive
Included Data Files:
hg19-gaps-ucsc-v1.bed.gz
hg19-gaps-ucsc-v1.bed.gz.tbi
Approximate Data File Sizes:
hg19-gaps-ucsc-v1.bed.gz: 5.16K
hg19-gaps-ucsc-v1.bed.gz.tbi: 8.22K
Pkg File Path: <conda root>/share/ggd/Homo_sapiens/hg19/hg19-gaps-ucsc-v1/1
Installed Pkg Files:
<conda root>/share/ggd/Homo_sapiens/hg19/hg19-gaps-ucsc-v1/1/hg19-gaps-ucsc-v1.bed.gz.tbi
<conda root>/share/ggd/Homo_sapiens/hg19/hg19-gaps-ucsc-v1/1/hg19-gaps-ucsc-v1.bed.gz
----------------------------------------------------------------------------------------------------
2. Example listing pkg info and recipe:¶
$ ggd pkg-info hg19-gaps-ucsc-v1 -sr
----------------------------------------------------------------------------------------------------
GGD-Package: hg19-gaps-ucsc-v1
GGD-Channel: ggd-genomics
GGD Pkg Version: 1
Summary: Assembly gaps from UCSC in bed format
Species: Homo_sapiens
Genome Build: hg19
Keywords: gaps, region, bed-file
Cached: uploaded_to_aws
Data Version: 27-Apr-2009
File type(s): bed
Data file coordinate base: 0-based-inclusive
Included Data Files:
hg19-gaps-ucsc-v1.bed.gz
hg19-gaps-ucsc-v1.bed.gz.tbi
Approximate Data File Sizes:
hg19-gaps-ucsc-v1.bed.gz: 5.16K
hg19-gaps-ucsc-v1.bed.gz.tbi: 8.22K
Pkg File Path: <conda root>/share/ggd/Homo_sapiens/hg19/hg19-gaps-ucsc-v1/1
Installed Pkg Files:
<conda root>/share/ggd/Homo_sapiens/hg19/hg19-gaps-ucsc-v1/1/hg19-gaps-ucsc-v1.bed.gz.tbi
<conda root>/share/ggd/Homo_sapiens/hg19/hg19-gaps-ucsc-v1/1/hg19-gaps-ucsc-v1.bed.gz
----------------------------------------------------------------------------------------------------
hg19-gaps-ucsc-v1 recipe file:
*****************************************************************************
* #!/bin/sh
* set -eo pipefail -o nounset
* genome=https://raw.githubusercontent.com/gogetdata/ggd-recipes/master/genomes/Homo_sapiens/hg19/hg19.genome
* wget --quiet -O - http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/gap.txt.gz \
* | gzip -dc \
* | awk -v OFS="\t" 'BEGIN {print "#chrom\tstart\tend\tsize\ttype\tstrand"} {print $2,$3,$4,$7,$8,"+"}' \
* | gsort /dev/stdin $genome \
* | bgzip -c > hg19-gaps-ucsc-v1.bed.gz
*
* tabix hg19-gaps-ucsc-v1.bed.gz
*****************************************************************************
:ggd:pkg-info: NOTE: The recipe provided above outlines where the data was accessed and how it was processed
3. Example listing pkg info and recipe in a different prefix:¶
Using the code:ggd pkg-info to get the package metadata info in a different conda environment such as a conda environment called ggd_data
$ ggd pkg-info hg19-gaps-ucsc-v1 -sr --prefix ggd_data
----------------------------------------------------------------------------------------------------
GGD-Package: hg19-gaps-ucsc-v1
GGD-Channel: ggd-genomics
GGD Pkg Version: 1
Summary: Assembly gaps from UCSC in bed format
Species: Homo_sapiens
Genome Build: hg19
Keywords: gaps, region, bed-file
Cached: uploaded_to_aws
Data Version: 27-Apr-2009
File type(s): bed
Data file coordinate base: 0-based-inclusive
Included Data Files:
hg19-gaps-ucsc-v1.bed.gz
hg19-gaps-ucsc-v1.bed.gz.tbi
Approximate Data File Sizes:
hg19-gaps-ucsc-v1.bed.gz: 5.16K
hg19-gaps-ucsc-v1.bed.gz.tbi: 8.22K
Pkg File Path: <ggd_data env>/share/ggd/Homo_sapiens/hg19/hg19-gaps-ucsc-v1/1
Installed Pkg Files:
<ggd_data env>/share/ggd/Homo_sapiens/hg19/hg19-gaps-ucsc-v1/1/hg19-gaps-ucsc-v1.bed.gz.tbi
<ggd_data env>/share/ggd/Homo_sapiens/hg19/hg19-gaps-ucsc-v1/1/hg19-gaps-ucsc-v1.bed.gz
----------------------------------------------------------------------------------------------------
hg19-gaps-ucsc-v1 recipe file:
*****************************************************************************
* #!/bin/sh
* set -eo pipefail -o nounset
* genome=https://raw.githubusercontent.com/gogetdata/ggd-recipes/master/genomes/Homo_sapiens/hg19/hg19.genome
* wget --quiet -O - http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/gap.txt.gz \
* | gzip -dc \
* | awk -v OFS="\t" 'BEGIN {print "#chrom\tstart\tend\tsize\ttype\tstrand"} {print $2,$3,$4,$7,$8,"+"}' \
* | gsort /dev/stdin $genome \
* | bgzip -c > hg19-gaps-ucsc-v1.bed.gz
*
* tabix hg19-gaps-ucsc-v1.bed.gz
*****************************************************************************
:ggd:pkg-info: NOTE: The recipe provided above outlines where the data was accessed and how it was processed