About the structure

The Nephroseq data structure can be broken into three layers:

1) Data input

This layer consists of incoming gene expression data and gene annotation data. Candidate datasets are identified, prioritized, and pre-processed; pre-processing includes quality control steps.

2) Data processing and analysis

This layer consists of sample metadata standardization and automated statistical analysis. The metadata standardization uses an internally developed ontology specific to chronic kidney disease. The automated statistical analysis component is implemented in Perl, Bash, and C. A series of scripts monitors the database for new data and sample parameters and automatically runs differential expression analysis and cluster analysis when needed.
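
As a rough illustration of the monitoring step described above, the sketch below shows one pass that finds datasets not yet analyzed and computes a per-gene statistic for them. It is written in Python for brevity and is not the Perl, Bash, and C implementation. The in-memory DATASETS dictionary, its field names, the group labels, the expression values, and the choice of Welch's t-statistic are all assumptions made for illustration. Cluster analysis is not shown.

```python
import math
import statistics

# Hypothetical in-memory stand-in for the database: dataset id -> record.
# Field names ("analyzed", "groups", "expression") and all values are
# made up for illustration only.
DATASETS = {
    "ds1": {
        "analyzed": False,
        "groups": {"s1": "control", "s2": "control", "s3": "control",
                   "s4": "disease", "s5": "disease", "s6": "disease"},
        "expression": {
            "GENE_A": {"s1": 5.0, "s2": 5.2, "s3": 4.8,
                       "s4": 7.9, "s5": 8.1, "s6": 8.0},
            "GENE_B": {"s1": 6.0, "s2": 6.1, "s3": 5.9,
                       "s4": 6.0, "s5": 6.2, "s6": 5.8},
        },
    },
}


def welch_t(a, b):
    """Welch's t-statistic (mean of b minus mean of a, unequal variances).

    Assumes each group has at least two values and that the groups are not
    both constant; those edge cases are not handled in this sketch.
    """
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se


def differential_expression(record, reference="control", test="disease"):
    """Return {gene: t-statistic} comparing the test group to the reference."""
    results = {}
    for gene, values in record["expression"].items():
        ref = [v for s, v in values.items() if record["groups"][s] == reference]
        alt = [v for s, v in values.items() if record["groups"][s] == test]
        results[gene] = welch_t(ref, alt)
    return results


def process_pending(datasets):
    """One monitoring pass: analyze every dataset not yet marked as analyzed."""
    processed = {}
    for dataset_id, record in datasets.items():
        if record["analyzed"]:
            continue  # already handled in an earlier pass
        processed[dataset_id] = differential_expression(record)
        record["analyzed"] = True
    return processed


if __name__ == "__main__":
    for dataset_id, stats in process_pending(DATASETS).items():
        for gene, t in stats.items():
            print(dataset_id, gene, round(t, 2))
```

In this sketch a second call to process_pending returns nothing new, which mirrors the idea of running analysis only when needed. A production pipeline would also need multiple-testing correction and a real data store.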

3) Data visualization

The Nephroseq web servers query data from the Nephroseq database and display tabular and graphical representations of the data and analysis results.
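
The query-and-display pattern described above can be sketched as follows, under stated assumptions. The table name de_results, its columns, and the rows in it are hypothetical and exist only for illustration; the actual Nephroseq schema and web stack are not described here. The sketch uses an in-memory SQLite database from the Python standard library and renders a plain-text table in place of a web page.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical results table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE de_results ("
        "dataset TEXT, gene TEXT, fold_change REAL, p_value REAL)"
    )
    # Made-up placeholder rows, not real analysis results.
    conn.executemany(
        "INSERT INTO de_results VALUES (?, ?, ?, ?)",
        [
            ("ds1", "GENE_A", 3.10, 0.0004),
            ("ds1", "GENE_B", 1.02, 0.7300),
            ("ds1", "GENE_C", 0.45, 0.0120),
            ("ds2", "GENE_A", 1.90, 0.0300),
        ],
    )
    return conn


def top_genes(conn, dataset, limit=10):
    """Return (gene, fold_change, p_value) rows for one dataset, best p first."""
    cursor = conn.execute(
        "SELECT gene, fold_change, p_value FROM de_results "
        "WHERE dataset = ? ORDER BY p_value ASC LIMIT ?",
        (dataset, limit),
    )
    return cursor.fetchall()


def render_table(rows):
    """Format result rows as a fixed-width plain-text table."""
    lines = [f"{'gene':<10}{'fold_change':>12}{'p_value':>10}"]
    for gene, fold_change, p_value in rows:
        lines.append(f"{gene:<10}{fold_change:>12.2f}{p_value:>10.3g}")
    return "\n".join(lines)


if __name__ == "__main__":
    connection = build_demo_db()
    print(render_table(top_genes(connection, "ds1", limit=2)))
```

The query uses bound parameters rather than string formatting, which is the usual way to keep user-supplied filter values out of the SQL text.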