nf-core/tfactivity
Bioinformatics pipeline that makes use of expression and open chromatin data to identify differentially active transcription factors across conditions.
Introduction
This document describes the output produced by the pipeline. Most of the plots and analyses are generated from the test dataset for the pipeline.
The pipeline identifies the most differentially active transcription factors (TFs) between multiple conditions by integrating gene expression data with open chromatin information (ATAC-seq, DNase-seq, ChIP-seq). The workflow combines chromatin accessibility, motif scanning, differential expression analysis, and machine learning to rank transcription factors by their regulatory activity.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
Pipeline overview
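The pipeline is built using Nextflow and processes data using the following steps:
- Prepare genome
- Counts
- Motifs
- Peaks
- DYNAMITE
- Ranking
- FIMO (optional)
- SNEEP (optional)
- Report
- Pipeline information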
Pipeline information
Output files
pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
- Parameters used by the pipeline run: params.json.
Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.
Prepare genome
This step prepares the reference assets used throughout the workflow, so that all downstream analyses have access to properly formatted and indexed reference data. Both compressed and uncompressed inputs are accepted.
If compressed inputs are provided (.fa.gz, .gtf.gz), the reference FASTA and GTF files are transparently decompressed to standard formats for compatibility with downstream tools. From the GTF annotation file, the pipeline extracts a mapping between stable gene identifiers and gene symbols, as well as gene and transcript length tables that are needed for quantification and normalization in expression analyses.
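As an illustration of the gene ID to symbol extraction described above, the following is a minimal sketch and not the pipeline's EXTRACT_ID_SYMBOL_MAP implementation. It assumes the GTF carries `gene_id` and `gene_name` attributes on `gene` records; the function name and the example identifiers are hypothetical.

```python
import re

# Matches GTF attributes of the form: key "value";
_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def id_symbol_map(gtf_lines):
    """Return {gene_id: gene_symbol} built from the 'gene' records of a GTF."""
    mapping = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records are used in this sketch
        attributes = dict(_ATTRIBUTE.findall(fields[8]))
        gene_id = attributes.get("gene_id")
        if gene_id:
            # Assumption: fall back to the ID when no symbol is annotated.
            mapping[gene_id] = attributes.get("gene_name", gene_id)
    return mapping


# Hypothetical records, for illustration only.
example_gtf = [
    "#!example header\n",
    'chr1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "GENE0001"; gene_name "Abc1";\n',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "GENE0001"; gene_name "Abc1";\n',
    'chr1\tsrc\tgene\t2000\t2900\t.\t-\t.\tgene_id "GENE0002";\n',
]
print(id_symbol_map(example_gtf))
```

The actual layout of id_symbol_map.txt may differ from this dictionary; the sketch only shows where the information comes from in the GTF.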
The reference FASTA genome file is indexed using SAMtools to obtain chromosome sizes and enable efficient random access for subsequent genomic tools. This indexing step is crucial for tools like STARE that need to access genomic sequences rapidly during transcription factor binding site analysis.
The structured output layout follows nf-core conventions, making it easy to locate and reuse reference files across different pipeline runs.
Output files
00_prepare_genome/
- 01_fasta/
  - chr1.fa: Decompressed reference FASTA when input --fasta is .fa.gz (GUNZIP_FASTA).
- 02_gtf/
  - chr1.gtf: Decompressed reference GTF when input --gtf is .gtf.gz (GUNZIP_GTF).
- 03_id_symbol_map/
  - id_symbol_map.txt: Gene ID to gene symbol mapping extracted from the GTF (EXTRACT_ID_SYMBOL_MAP).
- 04_gtftools_length/
  - gtf.txt: Transcript/gene length table derived from the GTF (GTFTOOLS_LENGTH).
- 05_samtools_faidx/
  - chr1.fa.fai: FASTA index and chromosome sizes generated from the reference FASTA (SAMTOOLS_FAIDX).
Counts
This step processes gene expression data from raw count integration through differential expression analysis. It consolidates count matrices from multiple sources, filters out lowly expressed genes, and tests for differentially expressed genes between conditions.
The pipeline begins by combining raw count tables from the primary input with any optional additional count sources specified by the user. It then calculates Transcripts Per Million (TPM) values using reference gene lengths extracted from the GTF annotation, providing normalized expression values that account for gene length differences and sequencing depth variations.
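The TPM calculation can be sketched as follows for a single sample. This is the standard TPM definition rather than the pipeline's own code; the function and variable names are hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts of one sample to TPM.

    counts:     {gene: raw read count}
    lengths_bp: {gene: gene length in base pairs}
    """
    # Step 1: normalize by gene length (reads per kilobase).
    rpk = {gene: count / (lengths_bp[gene] / 1000.0) for gene, count in counts.items()}
    # Step 2: normalize by sequencing depth so that values sum to one million.
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        return {gene: 0.0 for gene in rpk}
    return {gene: value / scale for gene, value in rpk.items()}


# Gene B has twice the counts of gene A but is twice as long,
# so both end up with the same TPM.
tpm = counts_to_tpm({"A": 10, "B": 20}, {"A": 1000, "B": 2000})
print(tpm)
```

Because the length normalization happens before the depth normalization, TPM values of a sample always sum to one million, which makes them comparable across samples.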
Multiple filtering steps ensure data quality: genes are filtered based on minimum total counts (--min_count) and minimum TPM values (--min_tpm) to remove lowly expressed features that may introduce noise. Transcription factors receive special treatment with separate filtering thresholds (--min_count_tf, --min_tpm_tf) to retain potentially important regulatory genes that might be expressed at lower levels.
The experimental design is automatically prepared from the counts design input, and DESeq2 performs robust differential expression analysis. DESeq2 uses negative binomial generalized linear models to test for differential expression while controlling for library size and dispersion. Each pairwise comparison between conditions generates comprehensive statistical results including normalized counts, variance-stabilized transformations, log2 fold changes, and adjusted p-values.
The outputs are systematically organized to separate combined inputs, derived quantifications, filtered feature sets, and DESeq2 analysis results, making it easy to trace the analysis workflow and access intermediate results for quality control or downstream analysis.
DESeq2 generates diagnostic plots including dispersion plots that show the relationship between mean expression and dispersion estimates, helping users assess the quality of the statistical model fit.
Output files
01_counts/
- 01_combined/
  - counts.clean.tsv: Combined count matrix from primary and extra sources after aggregation.
  - genes.txt: List of genes retained after combination.
- 02_tpm/
  - counts.tpm.tsv: TPM matrix computed from counts using reference gene lengths and the gene map.
- 03_filtered_genes/
  - counts.counts_filtered.tsv: Counts after gene-level filtering by --min_count.
  - counts.tpm_filtered.tsv: TPM after gene-level filtering by --min_tpm.
  - counts.genes_filtered.txt: Gene list retained after filtering.
- 04_filtered_tfs/
  - TFs.counts_filtered.tsv: TF counts filtered by --min_count_tf.
  - TFs.tpm_filtered.tsv: TF TPM filtered by --min_tpm_tf.
  - TFs.genes_filtered.txt: TF gene list retained after filtering.
- 05_deseq2/
  - design.design.csv: Experimental design prepared from the counts design input.
  - <contrast_id>/: Per-contrast DESeq2 outputs, e.g. L1:L10/ containing:
    - <contrast>.normalised_counts.tsv
    - <contrast>.vst.tsv
    - <contrast>.deseq2.results.tsv
    - <contrast>.deseq2.sizefactors.tsv
    - <contrast>.deseq2.model.txt
    - <contrast>.deseq2.dispersion.png
    - <contrast>.dds.rld.rds
    - <contrast>.R_sessionInfo.log
Motifs
This step prepares transcription factor binding motifs for downstream scanning and scoring analyses. Motifs can be supplied by the user or fetched from JASPAR, and are converted into the formats required by the downstream tools.
Motif Acquisition: You can either provide your own curated motif collection via --motifs (supporting multiple formats including JASPAR, MEME, TRANSFAC, CisBP, HOMER, and UniPROBE), or the pipeline can automatically fetch a taxon-specific collection from the JASPAR database when --taxon_id is supplied. JASPAR provides manually curated transcription factor binding profiles derived from experimental data.
Processing Pipeline: All motifs are converted to a universal internal representation using the universalmotif R package, ensuring consistent handling regardless of input format. The pipeline then filters the motifs based on user-defined parameters, including removal of low-quality motifs and handling of duplicate motifs according to the --duplicate_motifs setting (remove, merge, or keep).
Format Conversion: The processed motifs are exported to multiple standard formats to ensure compatibility with different analysis tools:
- MEME format: For use with MEME Suite tools like FIMO
- TRANSFAC format: For compatibility with various motif analysis software
- Position-Specific Energy Matrices (PSEM): Optimized format for STARE affinity calculations
The systematic organization of outputs allows users to trace the motif processing workflow and access motifs in the format most suitable for their downstream analyses or external tools.
Output files
02_motifs/
- 01_jaspar/
  - motifs.jaspar: Retrieved motif collection and metadata from the specified JASPAR release.
- 02_universal/
  - motifs.converted.universal: Motifs converted into a pipeline-universal format for consistent processing.
- 03_filtered/
  - motifs.filtered.RDS: Universal motifs after applying user-defined filtering parameters.
- 04_meme/
  - motifs.converted.meme: Motif set exported in MEME format for compatibility with MEME Suite tools.
- 05_transfac/
  - motifs.converted.transfac: Motif set exported in TRANSFAC-like format.
- 06_psem/
  - motifs.psem: Position-specific energy matrices (PSEM) derived from the filtered motifs.
Peaks
This step converts peak regions from ChIP-seq, ATAC-seq, or DNase-seq experiments into quantitative TF-gene affinities. It proceeds in several stages: peaks are cleaned and optionally merged, candidate regulatory elements are identified, and TF binding affinities are calculated for them.
Peak Processing: The pipeline begins by cleaning and standardizing peak coordinates, ensuring consistent 6-column BED format. When --footprinting is enabled, closely spaced peaks (within --max_peak_gap) are merged to create more biologically meaningful regulatory regions. Users can choose to either merge peaks across samples with the same condition and assay (--merge_samples) or process them individually, depending on experimental design and analysis goals.
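The gap-based merging described above can be sketched as follows. This is an illustrative re-implementation in the spirit of `bedtools merge -d`, not the pipeline's module; the function name is hypothetical and intervals are assumed to be half-open BED coordinates.

```python
def merge_peaks(peaks, max_gap):
    """Merge peaks on the same chromosome that lie at most max_gap bp apart.

    peaks: iterable of (chrom, start, end) tuples in BED coordinates.
    """
    merged = []
    for chrom, start, end in sorted(peaks):
        previous = merged[-1] if merged else None
        if previous and previous[0] == chrom and start - previous[2] <= max_gap:
            # Close enough to the previous region: extend it.
            previous[2] = max(previous[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


peaks = [("chr1", 0, 10), ("chr1", 15, 20), ("chr1", 100, 110), ("chr2", 5, 8)]
print(merge_peaks(peaks, max_gap=5))
```

In this example the first two peaks are 5 bp apart and are merged, while the third peak and the peak on chr2 remain separate.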
Chromatin State Annotation: When BAM files are provided and ChromHMM is enabled, the pipeline uses ChromHMM to learn chromatin states from histone modification patterns. ChromHMM applies unsupervised learning to identify --chromhmm_states distinct chromatin states, then classifies regions as enhancers or promoters based on user-specified histone marks (--chromhmm_enhancer_marks, --chromhmm_promoter_marks) and confidence thresholds (--chromhmm_threshold).
Regulatory Region Refinement: The optional ROSE analysis serves as a post-processing tool that refines ChromHMM predictions by applying additional filtering and stitching operations. ROSE performs TSS proximity filtering and stitches nearby regions within --rose_stitching_window to create more coherent regulatory domains. The tool removes regions that overlap with multiple transcription start sites, helping to distinguish distal regulatory elements from promoter-proximal regions and reducing potential confounding effects in downstream analyses.
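The removal of stitched regions that overlap multiple transcription start sites can be sketched as below. This is a simplified illustration under the assumption that "multiple" means two or more overlapping TSS windows; the function name is hypothetical and the pipeline performs these steps with separate BED operations.

```python
def split_by_tss_overlap(stitched, tss_windows, max_allowed=1):
    """Separate stitched regions by how many TSS windows they overlap.

    Both inputs are lists of (chrom, start, end) in half-open BED coordinates.
    Returns (kept, multi_tss): regions overlapping at most max_allowed TSS
    windows, and regions overlapping more than that.
    """
    kept, multi_tss = [], []
    for chrom, start, end in stitched:
        overlaps = sum(
            1
            for tss_chrom, tss_start, tss_end in tss_windows
            if tss_chrom == chrom and tss_start < end and start < tss_end
        )
        (kept if overlaps <= max_allowed else multi_tss).append((chrom, start, end))
    return kept, multi_tss


stitched = [("chr1", 100, 500), ("chr1", 1000, 3000)]
tss_windows = [("chr1", 450, 460), ("chr1", 1200, 1300), ("chr1", 2500, 2600)]
kept, multi_tss = split_by_tss_overlap(stitched, tss_windows)
print(kept, multi_tss)
```

Here the second stitched region spans two TSS windows and would be replaced by its original unstitched regions.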
Affinity Calculation: STARE computes quantitative TF-gene binding affinities by combining position weight matrix scanning with distance-based decay functions. STARE uses the Activity-By-Contact model to estimate regulatory potential, considering both binding site strength and genomic distance from target genes. The --window_size parameter defines the search radius around genes, while --decay controls distance-dependent attenuation of regulatory effects.
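To make the idea of a distance-dependent decay concrete, here is a minimal sketch of summing per-peak affinities into a gene-level score. It is not STARE's implementation: the exponential form, the `decay_length` value of 5000 bp, the treatment of the window as a radius, and all names are assumptions chosen for illustration.

```python
import math


def gene_affinity(peaks, tss, window_radius, decay_length=5000.0, use_decay=True):
    """Sum the TF affinities of peaks near a gene's TSS.

    peaks: list of (peak_center, affinity) pairs for one TF.
    Peaks farther than window_radius from the TSS are ignored; the remaining
    ones are optionally down-weighted by exp(-distance / decay_length).
    """
    total = 0.0
    for peak_center, affinity in peaks:
        distance = abs(peak_center - tss)
        if distance > window_radius:
            continue  # outside the search window
        weight = math.exp(-distance / decay_length) if use_decay else 1.0
        total += affinity * weight
    return total


peaks = [(1000, 2.0), (6000, 2.0), (50000, 2.0)]
print(gene_affinity(peaks, tss=1000, window_radius=10000, use_decay=False))
print(gene_affinity(peaks, tss=1000, window_radius=10000, use_decay=True))
```

With decay enabled, the peak 5 kb away contributes only a fraction of its affinity, and the peak at 50 kb is excluded by the window in both cases.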
Data Integration: For multi-replicate experiments, affinities are averaged across biological replicates. Gene symbol synonyms are aggregated using the specified method (--affinity_aggregation), and duplicate motifs are handled according to user preferences. Finally, per-contrast affinity ratios and sums are calculated for matched assays, providing the quantitative regulatory relationships needed for downstream TF activity scoring.
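The replicate averaging and per-contrast combination can be sketched as follows. The exact definitions used by AFFINITY_MEAN, AFFINITY_RATIO and AFFINITY_SUM are not given here, so this sketch assumes an element-wise mean, ratio (condition 1 over condition 2) and sum over (gene, TF) pairs; missing entries are treated as zero in the mean and zero denominators are skipped in the ratio. All names are hypothetical.

```python
def mean_affinities(replicates):
    """Average a list of {(gene, tf): affinity} dicts across replicates."""
    keys = set().union(*replicates)
    return {
        key: sum(rep.get(key, 0.0) for rep in replicates) / len(replicates)
        for key in keys
    }


def affinity_ratio(cond1, cond2):
    """Element-wise cond1 / cond2 for pairs present in both conditions."""
    return {k: cond1[k] / cond2[k] for k in cond1 if k in cond2 and cond2[k] != 0}


def affinity_sum(cond1, cond2):
    """Element-wise cond1 + cond2 for pairs present in both conditions."""
    return {k: cond1[k] + cond2[k] for k in cond1 if k in cond2}


pair = ("GeneA", "TF1")  # hypothetical gene and TF labels
condition1 = mean_affinities([{pair: 2.0}, {pair: 4.0}])
condition2 = {pair: 1.5}
print(affinity_ratio(condition1, condition2), affinity_sum(condition1, condition2))
```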
The comprehensive output organization reflects each processing stage, allowing users to access intermediate results for quality control and understand how regulatory relationships were derived.
Output files
03_peaks/
- 01_cleaned/
  - <sample>.clean.bed: Peak BEDs normalized to 6 columns (CLEAN_BED).
- 02_footprinting/
  - 01_merged/: Peaks merged within max_peak_gap per sample/assay (BEDTOOLS_MERGE).
  - 02_subtracted/: Footprinted regions after subtracting overlaps as configured (BEDTOOLS_SUBTRACT).
- 03_merged_samples/ (only if --merge_samples is true)
  - 01_annotated/: Peaks annotated with sample identifiers (ANNOTATE_SAMPLES).
  - 02_concatenated/: Sample BEDs concatenated (CONCAT_SAMPLES).
  - 03_sorted/: Concatenated BEDs sorted (BEDTOOLS_SORT).
  - 04_merged/: Merged regions with occurrence counts across samples (BEDTOOLS_MERGE).
  - 04_filtered/: Regions filtered by --min_peak_occurrence.
  - 05_cleaned/: Final 3-column BED of merged regions (CLEAN_BED).
- 03_sorted/ (only if samples are not merged)
  - <condition>_<assay>_sorted.bed: Per-sample/assay sorted BEDs (SORT_PEAKS).
- 04_chromhmm/ (created unless --skip_chromhmm)
  - 01_binarized/: Binarized signals from input BAMs (BINARIZE_BAMS; requires chrom_sizes).
  - 02_learned/: Learned model and state assignments (LEARN_MODEL; --chromhmm_states).
  - 03_enhancers/: Enhancer regions at --chromhmm_threshold from selected marks.
  - 04_promoters/: Promoter regions at --chromhmm_threshold from selected marks.
- 05_rose/ (created only if ChromHMM ran and --skip_rose is false)
  - 01_filtered/
    - <cond>_<assay>_filtered.bed (FILTER_CONVERT_GTF).
  - 02_sorted/: Sorted GTF-derived BED.
  - 03_sorted/: Sorted chromosome sizes matching BED order (*_sorted.fa.fai).
  - 04_tss/
    - tss.bed: ±--rose_tss_window TSS windows.
  - 05_inverted/: Inverted TSS windows for promoter filtering.
  - 06_filtered/
    - <cond>_<assay>_filtered.bed: Predicted regions after TSS filtering.
  - 07_stitched/
    - <cond>_<assay>_stitched.bed: Stitched regions within --rose_stitching_window.
  - 08_tss_overlap/
    - <cond>_<assay>_tss-overlap-counts.bed: Overlap counts of stitched regions with TSS.
  - 09_filtered/
    - <cond>_<assay>_overlap.bed: Regions overlapping ≥2 TSS.
  - 10_subtracted/
    - <cond>_<assay>.bed: Stitched regions with multi-TSS overlaps removed.
  - 11_unstitched/
    - <cond>_<assay>_original_regions.bed: Original unstitched regions for multi-TSS overlaps.
  - 12_concatenated/
    - <cond>_<assay>_stitched.bed: Combined correctly-stitched and original unstitched, sorted.
- 06_stare/
  - <condition>_<assay>/Gene_TF_matrices/<condition>_<assay>_TF_Gene_Affinities.txt: TF–gene affinities computed by STARE.
- 07_affinity_mean/ (only when samples are not merged)
  - <condition>_<assay>.tsv: Replicate affinities averaged across samples per condition and assay (AFFINITY_MEAN).
- 08_aggregated/
  - <condition>_<assay>.agg_affinities.tsv: Affinities after aggregating gene symbol synonyms and optional duplicate motif merging (AGGREGATE_SYNONYMS).
- 09_affinity_ratio/
  - <condition1:condition2>_<assay>.tsv: Per-contrast affinity ratio results for matched assays (AFFINITY_RATIO).
- 09_affinity_sum/
  - <condition1:condition2>_<assay>.tsv: Per-contrast affinity sum results for matched assays (AFFINITY_SUM).
DYNAMITE
This step employs DYNAMITE, a machine learning approach that identifies transcription factors responsible for differential gene expression through regularized regression analysis. DYNAMITE integrates differential expression data with TF-gene binding affinities to determine which transcription factors are the most likely drivers of observed expression changes between conditions.
Statistical Framework: DYNAMITE uses elastic net regularization to build predictive models that relate TF binding affinities to gene expression changes. The method performs nested cross-validation with --dynamite_ofolds outer folds and --dynamite_ifolds inner folds to ensure robust model selection and prevent overfitting. The --dynamite_alpha parameter controls the balance between L1 (LASSO) and L2 (Ridge) regularization, allowing the model to both select relevant features and handle correlated predictors.
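For orientation, the elastic net objective is commonly written as below, following the parameterization used by the glmnet package. This assumes that --dynamite_alpha corresponds to \alpha in this formula, which the description above suggests but does not state explicitly. Here \mathcal{L}(\beta) stands for the model's loss on the training data and \lambda for the overall regularization strength, which is the kind of hyperparameter typically selected in the inner cross-validation folds.

```latex
\hat{\beta} = \arg\min_{\beta} \; \mathcal{L}(\beta)
  + \lambda \left[ \alpha \lVert \beta \rVert_1
  + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right]
```

With \alpha = 1 the penalty reduces to pure LASSO, which drives many coefficients to exactly zero; with \alpha = 0 it reduces to pure ridge, which shrinks correlated predictors together without removing them.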
Input Processing: The pipeline preprocesses differential expression results and TF-gene affinity matrices into the format required by DYNAMITE. This includes proper scaling and alignment of gene identifiers between the expression and affinity datasets.
Model Execution: DYNAMITE fits regularized regression models to identify TF-gene regulatory relationships that best explain observed expression changes. The algorithm can optionally randomize input data (--dynamite_randomize) for negative control analysis. If an input dataset is too small for modeling, DYNAMITE can fail with exit status 139; the module's error strategy allows the pipeline to ignore such runs instead of stopping.
Results Filtering: Raw DYNAMITE regression coefficients are filtered based on --dynamite_min_regression to retain only transcription factors with substantial regulatory effects. This threshold helps focus on biologically meaningful regulatory relationships while reducing noise from weak or spurious associations.
The DYNAMITE analysis provides quantitative estimates of each transcription factor’s contribution to differential gene expression, forming a crucial component of the overall TF activity scoring framework.
Output files
04_dynamite/
- 01_preprocessed/: Inputs converted/prepared for DYNAMITE (PREPROCESS).
- 02_dynamite/: Raw DYNAMITE outputs; runs with too-small input may be ignored per module error strategy (RUN_DYNAMITE).
- 03_filtered/: Tabular results filtered by --dynamite_min_regression threshold (FILTER).
Ranking
This final analytical step integrates all upstream analyses to generate comprehensive transcription factor activity rankings. The workflow combines differential expression data, TF-gene binding affinities, and DYNAMITE regression coefficients to calculate composite TF-target gene (TF-TG) scores, then performs statistical testing to rank transcription factors by their regulatory activity.
Score Calculation: The pipeline computes TF-TG scores by integrating three key components:
- Differential Expression: Log2 fold changes and significance levels from DESeq2 analysis
- Binding Affinity: Quantitative TF-gene binding predictions from STARE
- Regulatory Coefficients: DYNAMITE-derived estimates of TF contributions to expression changes
These components are combined using a weighted scoring function that accounts for both the magnitude of expression changes and the strength of regulatory relationships.
Statistical Testing: The pipeline performs Mann-Whitney U tests to assess the statistical significance of TF activity differences between conditions. The --alpha parameter sets the significance threshold for identifying differentially active transcription factors. This non-parametric test is robust to outliers and does not assume normal distributions, making it well-suited for gene expression and regulatory data.
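As a reminder of what the Mann-Whitney U statistic measures, the following sketch computes it from rank sums with average ranks for ties. It only illustrates the statistic, not the p-value calculation and not the pipeline's code; which two groups of scores the pipeline compares is not specified here, so the inputs are generic lists of numbers.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using average ranks for ties."""
    # Pool both samples, remembering the group of each value.
    pooled = sorted([(value, 0) for value in x] + [(value, 1) for value in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Find the run of tied values starting at position i.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_x = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
print(mann_whitney_u([1, 2], [2, 3]))
```

A U value of zero for one group means that all of its values are smaller than every value in the other group; values near half of len(x) * len(y) indicate no separation.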
Ranking Generation: Transcription factors are ranked based on their composite activity scores, with separate rankings generated for each assay (e.g., H3K27ac, H3K4me3, ChromHMM-derived enhancers/promoters). Target genes are also ranked to identify the most strongly regulated genes for each transcription factor.
Cross-Assay Integration: The pipeline generates several levels of ranking aggregation:
- Per-assay rankings: Individual rankings for each chromatin assay type
- Cross-assay integration: Combined rankings that integrate evidence across multiple assay types
- Comprehensive matrices: Easy-to-use tables that facilitate downstream analysis and visualization
The ranking outputs provide the primary results of the pipeline: prioritized lists of transcription factors most likely to drive differential gene expression between experimental conditions.
Output files
05_ranking/
- 01_tf_tg_score/
  - <contrast>_<assay>.score.tsv: Intermediate TF–TG scores derived from affinities and counts (TF_TG_SCORE).
- 02_ranking/
  - <contrast>_<assay>.tf_ranking.tsv, <contrast>_<assay>.tg_ranking.tsv: Ranked outputs for TFs and TGs by chosen criteria (CREATE_RANKING).
- 03_combined_tfs_per_assay/
  - <assay>.tf_ranking.tsv: TF ranking matrices per assay (COMBINE_TFS_PER_ASSAY).
- 04_combined_tfs_across_assays/
  - all.tsv: TF ranking matrices combined across assays (COMBINE_TFS_ACROSS_ASSAYS).
- 05_combined_tgs_per_assay/
  - <assay>.tg_ranking.tsv: TG ranking matrices per assay (COMBINE_TGS_PER_ASSAY).
- 06_combined_tgs_across_assays/
  - all.tsv: TG ranking matrices combined across assays (COMBINE_TGS_ACROSS_ASSAYS).
FIMO
This optional analysis step uses FIMO (Find Individual Motif Occurrences) from the MEME Suite to perform comprehensive motif scanning within candidate regulatory regions. FIMO provides detailed, site-specific predictions of transcription factor binding sites that complement the STARE affinity calculations.
Motif Preparation: The pipeline filters the processed motif collection to include only high-quality motifs suitable for scanning. Motifs are converted to MEME format, which is the native input format for FIMO analysis.
Sequence Extraction: Genomic sequences are extracted from the reference FASTA file for all candidate regulatory regions identified in the Peaks step. This includes regions from ChIP-seq peaks, ChromHMM-predicted enhancers/promoters, and ROSE-identified super-enhancers.
Motif Scanning: FIMO scans each regulatory region for occurrences of all transcription factor binding motifs using position weight matrices. The algorithm calculates p-values for each potential binding site based on the motif’s scoring distribution. FIMO’s statistical framework accounts for multiple testing correction and provides reliable significance estimates for predicted binding sites.
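The core of position weight matrix scanning can be sketched as a sliding-window log-odds score. This is a simplified illustration and not FIMO itself: it scans one strand only, assumes a uniform background of 0.25 per base, requires non-zero motif probabilities, and omits the p-value and q-value computation. The motif and all names are hypothetical.

```python
import math

BACKGROUND = 0.25  # assumption: uniform base frequencies


def scan_sequence(sequence, ppm):
    """Score every window of a sequence against a position probability matrix.

    ppm: list of {base: probability} dicts, one per motif position.
    Returns a list of (start, matched_sequence, log_odds_score).
    """
    width = len(ppm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous bases such as N
        score = sum(
            math.log2(column[base] / BACKGROUND)
            for column, base in zip(ppm, window)
        )
        hits.append((start, window, score))
    return hits


# Hypothetical two-position motif with consensus "AC".
ppm = [
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.01, "C": 0.97, "G": 0.01, "T": 0.01},
]
best_hit = max(scan_sequence("GGACGG", ppm), key=lambda hit: hit[2])
print(best_hit)
```

FIMO additionally converts such scores into p-values using the score distribution of each motif, which is what makes hits comparable across motifs of different length and information content.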
Output Formats: FIMO generates multiple output formats for each scan:
- TSV files: Tab-separated tables with detailed binding site predictions including coordinates, scores, p-values, and sequences
- GFF files: Genomic feature format files compatible with genome browsers for visualization
- HTML reports: Human-readable summaries with graphical representations of motif matches
- XML files: Machine-readable results in standardized format
Result Aggregation: Individual FIMO results are combined across all scanned regions and motifs, providing comprehensive binding site catalogs for downstream analysis or integration with other genomic datasets.
FIMO analysis is particularly valuable for users who need detailed binding site predictions, want to validate STARE affinity calculations, or plan to integrate results with other motif analysis pipelines.
Output files
06_fimo/
- 01_filtered_motifs/
  - motifs.filtered.meme or similar: Final motif subset for FIMO scanning (FILTER_MOTIFS in FIMO context).
- 02_extracted_sequence/
  - <condition>_<assay>.fa: FASTA sequences extracted from regions to be scanned (EXTRACT_SEQUENCE).
- 03_fimo/
  - <region_or_assay>_<motifId>/: Per-id FIMO outputs (RUN_FIMO), including: *.tsv, *.gff, *.html, *.xml, *cisml.xml, *best_site.narrowPeak.
- 04_combined_results/
  - <condition>_<assay>.tsv, <condition>_<assay>.gff: Collated FIMO hits across inputs (COMBINE_RESULTS).
SNEEP
This optional variant analysis step uses SNEEP (SNP exploration and analysis using epigenomics data), a statistical approach for identifying single nucleotide variants that affect transcription factor binding and potentially alter gene expression. SNEEP provides insights into how genetic variants within regulatory regions might contribute to phenotypic differences between individuals or conditions.
Variant-Motif Analysis Framework: SNEEP evaluates the impact of SNPs on transcription factor binding by comparing motif scores between reference and alternative alleles. The method uses a probabilistic framework to assess whether variants significantly alter binding site strength, accounting for the natural variation in motif scores across the genome.
Motif Processing: The pipeline prepares motifs specifically for SNEEP analysis by applying organism-specific scaling factors (--sneep_scale_file) and using curated motif collections (--sneep_motif_file). These preprocessing steps ensure that binding site predictions are calibrated appropriately for variant effect analysis.
Region Preparation: Candidate regulatory regions from the Peaks analysis are converted from GFF to BED format and sorted for efficient intersection with variant datasets. Overlapping or duplicate regions are merged to create a non-redundant set of regulatory intervals.
SNP Filtering: The pipeline intersects the provided SNP dataset (--snps) with regulatory regions to identify variants that fall within potential transcription factor binding sites. This filtering step focuses the analysis on variants most likely to have functional regulatory effects.
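The SNP filtering step amounts to an interval intersection, which can be sketched as below. This is an illustration rather than the FILTER_SNPS_BY_REGION module: it assumes 0-based SNP positions and half-open BED regions, and the function name and SNP identifiers are hypothetical.

```python
def snps_in_regions(snps, regions):
    """Keep SNPs whose position falls inside at least one region.

    snps:    list of (chrom, position, snp_id), 0-based positions.
    regions: list of (chrom, start, end), half-open BED coordinates.
    """
    regions_by_chrom = {}
    for chrom, start, end in regions:
        regions_by_chrom.setdefault(chrom, []).append((start, end))
    kept = []
    for chrom, position, snp_id in snps:
        if any(start <= position < end for start, end in regions_by_chrom.get(chrom, [])):
            kept.append((chrom, position, snp_id))
    return kept


regions = [("chr1", 100, 200), ("chr2", 50, 60)]
snps = [
    ("chr1", 150, "snp_a"),  # inside the chr1 region
    ("chr1", 200, "snp_b"),  # at the excluded end coordinate
    ("chr2", 55, "snp_c"),   # inside the chr2 region
    ("chr3", 10, "snp_d"),   # chromosome without regions
]
print(snps_in_regions(snps, regions))
```

For large inputs a sorted sweep or a tool such as bedtools is the usual choice; the linear check per SNP is kept here only for readability.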
Statistical Analysis: SNEEP performs statistical tests to identify SNPs that significantly alter transcription factor binding affinity. The method accounts for multiple testing correction and provides effect size estimates for each variant-motif combination.
Applications: SNEEP results are particularly valuable for:
- Functional annotation of variants from GWAS or population genetics studies
- Prioritization of regulatory variants for experimental validation
- Understanding the molecular mechanisms by which genetic variants affect gene expression
- Integration with expression QTL (eQTL) mapping studies
The SNEEP analysis extends the pipeline’s capabilities beyond condition-based comparisons to include genetic variation as a source of regulatory differences.
Output files
07_sneep/
- 01_filtered_scales_motifs/
  - filtered_sneep_*_mouse_*.txt: Motif set prepared for SNEEP (FILTER_SCALES_MOTIFS).
- 02_gff_to_bed/
  - <condition>_<assay>.bed: Genomic annotations converted from GFF to BED (GFF_TO_BED).
- 03_sorted/
  - <condition>_<assay>_sorted.bed: Sorted BED regions for downstream intersection (SORT_BED).
- 04_merged/
  - merged_<condition>_<assay>.bed: Duplicate/overlapping regions merged per ID (MERGE_DUPLICATE_REGIONS).
- 05_filtered_snps/
  - filtered_<condition>_<assay>.bed: SNPs filtered to annotated regions (FILTER_SNPS_BY_REGION).
- 06_sneep/
  - <condition>_<assay>/*: SNEEP outputs for variant effect analysis (RUN_SNEEP).
Report
This final step generates a comprehensive, interactive HTML report that consolidates all pipeline results into an accessible format for sharing with collaborators and stakeholders. The report provides an intuitive interface for exploring transcription factor rankings, understanding regulatory relationships, and accessing detailed analysis results.
Interactive Visualization: The report features a modern, responsive web interface built with JavaScript frameworks that allows users to:
- Dynamically filter transcription factor rankings by assay type
- Search for specific transcription factors or target genes
- Explore detailed views for individual TFs including expression plots and regulatory networks
- Compare results across different experimental conditions and chromatin assays
- Access parameter settings used for the analysis
Content Organization: The report includes several key sections:
- Main Rankings Page: Interactive tables showing transcription factor activity scores across all assays
- Individual TF Pages: Detailed views for each transcription factor with expression analysis, target gene lists, and regulatory evidence
- Parameters Page: Complete documentation of all analysis parameters and settings used in the pipeline run
- Navigation System: Intuitive menu structure with links to external resources like GeneCards
Data Integration: The report integrates results from all pipeline steps:
- Transcription factor activity rankings from the statistical analysis
- Expression data and differential expression results from DESeq2
- Binding affinity predictions from STARE
- Regulatory coefficients from DYNAMITE analysis
- Motif information and binding site predictions
Distribution Formats: Results are provided in two convenient formats:
- ZIP Archive: Complete report bundle optimized for email sharing or archiving
- HTML Directory: Expanded report structure that can be hosted on web servers or opened directly in browsers
Quality Assurance: The report includes data quality indicators, analysis statistics, and links to the pipeline documentation, ensuring users can assess the reliability of results and understand the analytical methods.
The report serves as the primary deliverable for most users, providing both high-level summaries for quick interpretation and detailed data for in-depth analysis.
Output files
08_report/
- report.zip: Final report bundle, optimal for sending to collaborators.
- report/index.html: Entry point for the interactive report, can be opened in the browser with a double-click.
- report/parameters/index.html: Parameters used for the run.
- report/tf/<symbol>(<motifId>)/index.html: Per-TF pages.
Workflow reporting and genomes
Pipeline information
Output files
pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and nf_core_tfactivity_software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
- Parameters used by the pipeline run: params.json.
Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.
The nf_core_tfactivity_software_versions.yml file lists all software versions used in the pipeline execution, which supports reproducibility of results. This is particularly important for transcription factor analysis, where different tool versions may produce slightly different results due to algorithm improvements or parameter changes.