Mutation

The two main types of genetic studies are population studies and familial (pedigree) studies. This workflow covers both. For the familial part, individual genotype data are used to estimate the relatedness between individuals. For the population part, data from The Cancer Genome Atlas (TCGA) landmark paper on the most common mutations in acute myeloid leukemia (AML) are used to reproduce some of the figures from that publication.

Workflow steps shown in the diagram:

Read data: the "AML genotype" and "AML meta" inputs are read with read.delim() into a list(), which is then validated with length(), intersect() and order().

Reproduce Figure 1a: aggregate and format the data with rbind(), split() and table('tier'); merge with the meta information (input$meta) using merge(x, y); subset('tier1'); plot().

Reproduce Figure 1b: aggregate and format the data with rbind(), split() and table('gene_name'); order(sample_count); subset('tier1'); head(100) of data.frame(gene_name, tier, sample_count).

Load family data: read the genotyping data with read.delim(genotypes); summarize genotype frequencies with rbind() and table('genotype'); create the mutation likelihood matrix with matrix(ncol(genotypes), nrow(genotypes)); compute likelihood scores as number of matching alleles * allele occurrence; run a sliding window over the mutations and compute the 3rd quartile of the mean marker likelihoods for different window sizes using lapply(), summary(), base:::simplify2array() and seq().

Diagram of mutation analysis workflow.
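The sliding-window step named in the diagram can be sketched roughly as follows. This is a minimal illustration, not the workflow's own code: the helper name window_q3, the toy scores and the window sizes are all assumptions, and the per-marker scores are simply taken as given.

```r
# Hypothetical helper: 3rd quartile of the window means for one window size.
window_q3 <- function(scores, size) {
  stopifnot(size >= 1, size <= length(scores))
  # Start index of every full window of the requested size.
  starts <- seq(1, length(scores) - size + 1)
  window_means <- vapply(
    starts,
    function(i) mean(scores[i:(i + size - 1)]),
    numeric(1)
  )
  # summary() labels the third quartile "3rd Qu.".
  as.numeric(summary(window_means)["3rd Qu."])
}

# Made-up per-marker scores, assumed to be in genomic order.
scores <- c(0, 1, 2, 2, 1, 0, 2, 2)

# Repeat the computation for several window sizes.
sizes <- seq(2, 4, by = 2)
q3_by_size <- simplify2array(lapply(sizes, function(s) window_q3(scores, s)))
names(q3_by_size) <- sizes
q3_by_size
```

Larger windows smooth out single-marker noise, so comparing the result across window sizes gives a feel for how stable the signal is.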

Packages and Dependencies

This workflow uses two core R packages, which have no further dependencies.

Used packages:

Core: stats, utils

Data

The familial data was obtained from the nice people at Genomes Unzipped, who make their own genomic data publicly available.

The dataset we are using comes from the 23andMe v2 genotyping service.
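One such file could be loaded roughly as follows. This is a minimal sketch that assumes the unzipped download uses the usual 23andMe raw-data layout: tab-separated text, comment lines starting with "#", and four columns (rsid, chromosome, position, genotype). The helper name read_genotypes and the toy file contents are made up for illustration.

```r
# Hypothetical helper: read one 23andMe-style raw genotype file.
read_genotypes <- function(path) {
  utils::read.delim(
    path,
    header = FALSE,          # the column names sit in a "#" comment line
    comment.char = "#",      # read.delim() does not skip comments by default
    col.names = c("rsid", "chromosome", "position", "genotype"),
    colClasses = c("character", "character", "integer", "character")
  )
}

# Toy file standing in for an unzipped download (made-up values).
toy <- tempfile(fileext = ".txt")
writeLines(c(
  "# rsid\tchromosome\tposition\tgenotype",
  "rs0001\t1\t1000\tAA",
  "rs0002\t1\t2000\tAG",
  "rs0003\tX\t3000\t--"
), toy)

geno <- read_genotypes(toy)

# Genotype frequencies, as in the "summarize genotype frequencies" step.
table(geno$genotype)
```

Reading the chromosome column as character keeps labels such as "X" intact.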

Though the individuals are not related, this data can still be used to perform some typical tests carried out on pedigree studies, such as determining “relatedness” between individuals.

member	dataset id	link
Daniel MacArthur	DGM001	http://s3.amazonaws.com/gnz.genotypes/DGM001_genotypes.zip
Luke Jostins	LXJ001	http://s3.amazonaws.com/gnz.genotypes/LXJ001_genotypes.zip
Dan Vorhaus	DBV001	http://s3.amazonaws.com/gnz.genotypes/DBV001_genotypes.zip
Caroline Wright	CFW001	http://s3.amazonaws.com/gnz.genotypes/CFW001_genotypes.zip
Kate Morley	KIM001	http://s3.amazonaws.com/gnz.genotypes/KIM001_genotypes.zip
Vincent Plagnol	VXP001	http://s3.amazonaws.com/gnz.genotypes/VXP001_genotypes.zip
Jeff Barrett	JCB001	http://s3.amazonaws.com/gnz.genotypes/JCB001_genotypes.zip
Jan Aerts	JXA001	http://s3.amazonaws.com/gnz.genotypes/JXA001_genotypes.zip
Joe Pickrell	JKP001	http://s3.amazonaws.com/gnz.genotypes/JKP001_genotypes.zip
Don Conrad	DFC001	http://s3.amazonaws.com/gnz.genotypes/DFC001_genotypes.zip
Carl Anderson	CAA001	http://s3.amazonaws.com/gnz.genotypes/CAA001_genotypes.zip
Ilana Fisher	IPF001	http://s3.amazonaws.com/gnz.genotypes/IPF001_genotypes.zip
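"Relatedness" between two of the individuals listed above can be approximated in several ways. The sketch below uses a simple identity-by-state proportion: count the alleles two people share at each marker (0, 1 or 2) and average over markers. It is an illustrative stand-in and not necessarily the scoring this workflow uses; the function names and the genotypes are made up.

```r
# Count the alleles shared by two unphased genotype calls, e.g. "AG" vs "GG".
shared_alleles <- function(g1, g2) {
  a <- strsplit(g1, "")[[1]]
  b <- strsplit(g2, "")[[1]]
  n <- 0
  for (allele in a) {
    hit <- match(allele, b)
    if (!is.na(hit)) {
      n <- n + 1
      b <- b[-hit]  # each allele in the second call can be matched only once
    }
  }
  n
}

# Proportion of alleles shared across all markers (0 = none, 1 = all).
# Assumes both vectors list the same markers in the same order and that
# no-calls such as "--" have already been filtered out.
ibs_score <- function(x, y) {
  stopifnot(length(x) == length(y))
  mean(mapply(shared_alleles, x, y)) / 2
}

# Made-up genotypes for two people at four markers.
person1 <- c("AA", "AG", "GG", "CT")
person2 <- c("AA", "GG", "AA", "TC")
ibs_score(person1, person2)
```

Because the calls are unphased, "CT" and "TC" are treated as the same genotype.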

The population study data come from the TCGA consortium publication (TCGA, 2013): the publication data archive, the mutation and annotation file (MAF), and the patient meta data.
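The per-gene counting behind the Figure 1b step (subset to tier 1, count samples per gene, order, keep the top rows) might look like the sketch below. The column names gene_name, tier and sample_count follow the workflow diagram; the sample column and all values are made up, and the real MAF file has a different, much wider layout.

```r
# Toy mutation table: one row per mutation call (made-up values).
maf <- data.frame(
  sample    = c("s1", "s1", "s2", "s3", "s3", "s3"),
  gene_name = c("FLT3", "NPM1", "FLT3", "FLT3", "DNMT3A", "NPM1"),
  tier      = c("tier1", "tier1", "tier1", "tier1", "tier1", "tier2"),
  stringsAsFactors = FALSE
)

# Keep tier 1 mutations only.
tier1 <- subset(maf, tier == "tier1")

# Count each sample at most once per gene.
pairs <- unique(tier1[, c("sample", "gene_name")])
counts <- as.data.frame(
  table(gene_name = pairs$gene_name),
  responseName = "sample_count",
  stringsAsFactors = FALSE
)

# Most frequently mutated genes first, top 100 at most.
counts <- counts[order(counts$sample_count, decreasing = TRUE), ]
top_genes <- head(counts, 100)
top_genes
```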
