I finally found time to rewrite the code using base graphics rather than ggplot2. The code is now much faster, and if you're familiar with base R's plot options and graphical parameters, most of these can now be passed to the functions to tweak the plots' appearance. The code also behaves differently depending on whether you have results for one or more than one chromosome.
Here's a quick demo.
First, either copy and paste the code from GitHub, or run the following commands in R to download and source the code from GitHub (you can't directly read from https in R, so you have to download the file first, the source it). Note the command is different on Mac vs Windows.
Download the function code on Mac:
download.file("https://raw.github.com/stephenturner/qqman/master/qqman.r", destfile="./qqman.r", method="curl")
Download the function code on Windows, leave out the method="curl" argument:
download.file("https://raw.github.com/stephenturner/qqman/master/qqman.r", destfile="./qqman.r")
Now, source the script containing the functions.
source("./qqman.r")
Next, load some GWAS results, and take a look at the relevant columns (same as above, download first then read locally from disk). This is standard output from PLINK's --assoc option.
Download example data on Mac:
download.file("https://raw.github.com/stephenturner/qqman/master/plink.assoc.txt.gz", destfile="./plink.assoc.txt.gz", method="curl")
Download example data on Windows:
download.file("https://raw.github.com/stephenturner/qqman/master/plink.assoc.txt.gz", destfile="./plink.assoc.txt.gz")
Read in the sample results, and take a look at the first few lines:The manhattan function assumes you have columns named SNP, CHR, BP, and P, corresponding to the SNP name (rs number), chromosome number, genomic coordinate, and p-value. Missing values (where regression failed to converge, for instance) should be recoded NA before reading into R. Do this with a quick sed command. Here's what the data looks like:
SNP CHR BP P 1 rs10495434 1 235800006 0.62220 2 rs6689417 1 46100028 0.06195 3 rs3897197 1 143700035 0.10700 4 rs2282450 1 202300047 0.47280 5 rs567279 1 66400050 NA 6 rs11208515 1 64900051 0.53430
Now, create a basic manhattan plot (click the image to enlarge):
manhattan(results)
If you type args(manhattan) you can see the options you can set. Here are a few:
colors: this is a character vector specifying the colors to cycle through for coloring each point. Here's a PDF chart of R's color names.
ymax: this is the y-axis limit. If ymax="max" (default), the y-axis will always be a little bit higher than the most significant -log10(p-value). Otherwise you can set this value yourself.
*** Update March 9 2012*** cex.x.axis is deprecated. To change the x-axis size, use the default base graphics argument cex.axis.
limitchromosomes: you can limit which chromosomes you want to display. By default this restricts the plot to chromosomes 1-23(x).
suggestiveline and genomewideline: set these to FALSE if you don't want threshold lines, or change the thresholds yourself.
annotate: by default this is undefined. If you supply a character vector of SNP names (e.g. rs numbers), any SNPs in the results data frame that also show up here will be highlighted in green by default. example below.
... : The dot-dot-dot means you can pass most other plot or graphical parameters to these functions (e.g. main, cex, pch, etc).
Make a better looking manhattan plot. Change the plot colors, point shape, and remove the threshold lines:
manhattan(results, colors=c("black","#666666","#CC6600"), pch=20, genomewideline=F, suggestiveline=F)
Now, read in a text file with SNP names that you want to highlight, then make a manhattan plot highlighting those SNPs, and give the plot a title:
Download a SNP list on Mac:
download.file("https://raw.github.com/stephenturner/qqman/master/snps.txt", destfile="./snps.txt", method="curl")
Download a SNP list on Windows:
download.file("https://raw.github.com/stephenturner/qqman/master/snps.txt", destfile="./snps.txt")
Read in the SNP list, and create an annotated plot:
snps_to_highlight = scan("./snps.txt", character())
manhattan(results, annotate=snps_to_highlight, pch=20, main="Manhattan Plot")
Finally, zoom in and plot only the results for chromosome 11, still highlighting those results. Notice that the x-axis changes from chromosome to genomic coordinate.
manhattan(subset(results, CHR==11), pch=20, annotate=snps_to_highlight, main="Chr11")
Finally, make a quantile-quantile plot of the p-values. To make a basic qq-plot of the p-values, pass the qq() function a vector of p-values:
qq(results$P)
Perhaps we should have made the qq-plot first, as it looks like we might have some unaccounted-for population stratification or other bias.
The code should run much faster and use less memory than before. All the old functions that use ggplot2 are still available, now prefixed with "gg." Please feel free to use, modify, and redistribute, but kindly link back to this post. The function source code, data, SNPs of interest, and example code used in this post are all available in the qqman GitHub repository.






This comment has been removed by the author.
ReplyDeleteThis is fantastic! Just what I was looking for. I stumbled across your earlier version this afternoon, and was having difficulty getting it to work. The new version worked beautifully and gave me the plot I was dreading having to code up myself from scratch. Could not have come a at a better time! Thanks!
ReplyDeleteJust wanted to add: great! Really, thanks a bunch. It works like a charm...
ReplyDeleteI've added some code so it'll automatically outputs .EPS-files.
Only thing I'm looking for now, is to annotate the top hits (-log10(P)>=8) with the gene name... That would be great!
Thanks!
I have run it and this code is very cooool
ReplyDeleteAlso, a quick question, when I creat my own results and followed the format of “SNP CHR BP P”
However, R returns
“Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 2”
Anybody knows why?
Thanks
I'm getting the same error message :/ I thought that maybe it was to do with my SNP names, as I'm working with another species, but even after renaming them to rsnumber I still get
DeleteError in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 7627
Joanna,
DeleteThis is a known issue. Do you "skip" a chromosome? I.e., have SNPs for chrs 1-3, 5-7, skipping 4, for example? I think this causes a problem. A solution is in the works. Stay tuned.
I'm guessing the error might be because you have something other than a number in the p-value column (maybe a NaN?). This column can only be a number between 0 and 1 or NA for missing. Next time I update the code I'll add in some checks for this.
ReplyDeleteHi Stephen,
DeleteThanks for posting a very nice code. I am entirely new to R and GWAS. I ran this script and got this same error as above & below (my pvalues are in exponential forms. Please see below).
> source("http://people.virginia.edu/~sdt5z/0STABLE/qqman.r")
> results <-read.table("LDLtopHits1.txt", T)
> head(subset(results, select=c(SNP, CHR, BP, P)))
SNP CHR BP P
1 kgp11202561 19 45426792 2.200786e-48
2 kgp7807118 19 45413576 1.488841e-35
3 kgp8411531 19 45414399 2.522903e-24
4 rs7528419 1 109817192 5.896030e-13
5 rs1160985 19 45403412 1.611327e-12
6 kgp807278 19 45389596 1.075362e-11
> manhattan(results)
Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 3
Calls: manhattan -> $<- -> $<-.data.frame
Execution halted
Thanks
Segun
I've double checked that I only have values between 0 and 1 in my p-value column and still get the same error...
DeleteThank you!
ReplyDeleteI started learning how to run GWAS two hours ago, and thanks to this I'm already looking at some preliminary results.
Science has never been simpler! ;)
Thanks for the nice script! While using it I found out, that it gives an error when there are no snps for a specific Chromosome.
ReplyDeleteMany thanks! This is great!
ReplyDeleteOne thing though: when I try to use the "ymax" in order to limit the Y axis values, it doesn't really work. maybe I'm using it the wrong way.
Here's what I typed:
manhattan(results, ymax=5, limitchromosomes=F, colors=c("black","#666666","#CC6600"), pch=20, genomewideline=F, suggestiveline=F)
Thanks!
Sagi, I think I see your problem here. These two lines gave it the behavior I desired:
ReplyDeleteif (ymax=="max") ymax<-ceiling(max(d$logp))
if (ymax<8) ymax <- 8
... which is to say, if ymax="max", make the maximum value on the y axis at least one point higher on the -log10p scale than the most significant SNP. But if the max y was less than 8, set it at 8 (because genome-wide significance is conventionally 5e-8).
If you want this code to behave the way you want, remove the second line that resets ymax to 8 if it's less than that.
Is it also possible to change the y-axis min? If i start the y-axis from 1 it will decrease the output and result in smaller pdf files. Thank you
DeleteSure, just change the ylim=c(0,ymax) line to ylim=c(1,ymax) or whatever you want to start it at.
DeleteThank you for your quick response!
DeleteDo you need to do this twice in the script?
Is it possible to do this via:
manhattan(results, ylim=c(1,ymax))?
Somehow i don't get it working, right.
Thank you,
Jonas
Make sure to make the change on line 59, on the points() call in the else {} block
DeleteThis comment has been removed by the author.
DeleteHi,
ReplyDeleteAs many wrote: thanks a bunch! I'm having difficulty getting the genome-wide significance at 5e-8. It just won't get to 8, it's always a little off... Any clues?
I changed your code a little, but only in making the line for instance dotted. Here's the code:
# manhattan plot using base graphics
manhattan = function(dataframe, colors=c("black", "gray50"), ymax="max", cex.x.axis=0.8, limitchromosomes=1:23, suggestiveline=-log10(1e-4), genomewideline=-log10(5e-8), annotate=NULL, ...) {
d=dataframe
if (!("CHR" %in% names(d) & "BP" %in% names(d) & "P" %in% names(d))) stop("Make sure your data frame contains columns CHR, BP, and P")
if (any(limitchromosomes)) d=d[d$CHR %in% limitchromosomes, ]
d=subset(na.omit(d[order(d$CHR, d$BP), ]), (P>0 & P<=1)) # remove na's, sort, and keep only 0<P<=1
d$logp = -log10(d$P)
d$pos=NA
ticks=NULL
lastbase=0
colors <- rep(colors,max(d$CHR))[1:max(d$CHR)]
if (ymax=="max") ymax<-ceiling(max(d$logp))
if (ymax<9) ymax<-9
numchroms=length(unique(d$CHR))
if (numchroms==1) {
d$pos=d$BP
ticks=floor(length(d$pos))/2+1
} else {
for (i in unique(d$CHR)) {
if (i==1) {
d[d$CHR==i, ]$pos=d[d$CHR==i, ]$BP
} else {
lastbase=lastbase+tail(subset(d,CHR==i-1)$BP, 1)
d[d$CHR==i, ]$pos=d[d$CHR==i, ]$BP+lastbase
}
ticks=c(ticks, d[d$CHR==i, ]$pos[floor(length(d[d$CHR==i, ]$pos)/2)+1])
}
}
if (numchroms==1) {
with(d, plot(pos, logp, ylim=c(0,ymax), ylab=expression(-log[10](italic(P))), xlab=paste("Chromosome",unique(d$CHR),"position"), ...))
} else {
with(d, plot(pos, logp, ylim=c(0,ymax), ylab=expression(-log[10](italic(P))), xlab="Chromosome", xaxt="n", type="n", ...))
axis(1, at=ticks, lab=unique(d$CHR), cex.axis=cex.x.axis)
icol=1
for (i in unique(d$CHR)) {
with(d[d$CHR==i, ],points(pos, logp, col=colors[icol], ...))
icol=icol+1
}
}
if (!is.null(annotate)) {
d.annotate=d[which(d$SNP %in% annotate), ]
with(d.annotate, points(pos, logp, col="darkorange", ...))
}
if (suggestiveline) abline(h=suggestiveline, col="royalblue4", lty=5)
if (genomewideline) abline(h=genomewideline, col="red",lty=5)
}
Thanks!
@swvanderlaan - the reason the line won't get to 8 on the y-axis is because genomewideline is set to -log10(5e-8), which is about 7.30 on the -log10 scale. The p=5e-8 number comes from the notion that there are ~1 million independent tests in the genome, so by a Bonferroni correction, 0.05/1e6=5e-8, the widely accepted holy grail p-value for "genome-wide significance."
ReplyDeleteIf you want the line to show up at 8 on the y-axis, set genomewideline=-log10(1e-8), or simply, genomewideline=8.
Hope this helps.
Hey Stephen,
ReplyDeleteSo, you stare for minutes, hours at the code, and you just don't see it... Haha, thanks!
Now, what I wrote earlier: I want names with my top hits. And I'll figure that one out, I'm sure. And when I do, I'll be sure to get back to this post, to share it!
Thanks.
Sander
Try using the text() function to add text to the plot. It probably wouldn't be too hard to work this into the function to annotate SNPs with gene names using the text() function. You'll probably want to limit it to SNPs having annotation information AND having p-value less than a certain threshold, so that it's readable and looks nice.
ReplyDeleteDoes Dr. swvanderlaan or Dr. Stephen Turner can help to solve how to make top 10 SNPs name on the plots with R?
ReplyDeleteCan anyone tell me how to adjust the size of the points on the plot?
ReplyDeleteUse the cex= option. See here: http://www.inside-r.org/r-doc/graphics/par
ReplyDeletehmm...Read through that but didn't understand a whole lot. par(pch=20, cex=0.5) kind of does what I want but it also resizes the axis label fonts. One would think there would be something like pch.cex to do this... Can you give me any clearer guidance please?
ReplyDeleteOh, I see what's going on here now. I included a graphical parameter that I named cex.x.axis to resize x-axis labels. When using the cex= argument, it conflicts with the argument I supply. Source this version, and you should be able to change the point size with the cex option.
ReplyDeletemanhattan(results) #normal size
manhattan(results, cex=.5) #half size
https://gist.github.com/1094160
Thanks Stephen that worked.
ReplyDeleteHi Stephen,
ReplyDeleteI ran this script and got the following error message :
"Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 36720"
a similiar problem somebody had.
I double-checked the P values, all fall in (0,1] and no missing values.
Do you know how this would happen? anything is wrong in my dataset?
Many thanks,
Hmm, not sure without seeing your data. Are you sure that all the other relevant columns contain numeric values only? How many chromosomes do you have?
ReplyDeleteHi Stephen,
ReplyDeleteI'm trying to run this script on some results from mice, and am getting this error: Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :max not meaningful for factors.
I thought this may be due to the choromosome number being different, but I have now changed the script to chromosome 1:19 and it still has the same error. Do you know what might be causing this problem? Thanks.
Hi Stephen,
ReplyDeleteI posted a problem a few days ago when I ran this script and the error message was:
:"Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 36720"
I double-checked the varaibles in the dataset and all the other relevant columns contain numeric value. However, there was only 21 chromosomes (chromomsome 3 was missing). I am wondering if this is the reason which caused the problem?
Thanks,
Anonymous 1 and anonymous 2: I'll try to take a look at the code sometime soon but I've got a major grant deadline coming up that's sucking much of my time away. Anonymous 2: That could easily be the problem - for a dirty fix, try putting in a single line in your results file between chr 2 and 4, where the chr is 3, position is 1, and p-value is 1, see if it works and let me know. Anonymous 1 - it's difficult for me to troubleshoot without some example data. Could you use something like http://jotonce.com/ to copy a few lines out of your results file, and email me the password to access it?
ReplyDeleteStephen,
ReplyDeleteafter adding a redundant line for chr 3, the function works well! thank you so much!
Thanks for helping me figure that one out. I'm soon going to be updating the code to fix a few other things and add things like automatic annotation based on gene region, SNP features, etc. I'll add this to the list of things to fix.
ReplyDeleteHey Stephen,
ReplyDeleteI'm running into the following error when trying to create a manhattan plot:
Error in Math.factor(d$P) : log10 not meaningful for factors In addition: Warning messages: 1: In Ops.factor(P, 0) : > not meaningful for factors 2: In Ops.factor(P, 1) : <= not meaningful for factors
Any insight into what might be the source of this error would be greatly appreciated as I'm stumped!
Do you have something other than numbers between 0 and 1 (not including 0)?
ReplyDeleteThank you for sharing the code! I need to prepare one of these plots quickly and your programming is pretty straightforward. Nice work!
ReplyDeleteIs there a way to get a chromosome 0 for scaffolds not incorporated into CHR?
ReplyDeleteI'll add it to the growing list of changes for the next update. For now, what happens if you simply have a chromosome 0, or chromosome 24? Make sure you appropriately change the limitchromosomes argument.
ReplyDeleteIf I change the limit chromosome argument to 0:8, and include CHR 0 in my source file I get this error:
ReplyDeleteError in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 213
I get the same error if I set limitchromosomes=F
I have not tried it with 24 as my organism only has a few chromosomes.
Also, if I leave it as 1:8 it plots perfectly, but just excludes the CHR 0 data.
ReplyDeleteChanging this i==1 to i ==0 allows for CHR 0,but for some reason the last chromosome has not points.
ReplyDelete+ } else {
+ for (i in unique(d$CHR)) {
+ if (i==0) {
and changing icol=icol+0 restores those points for the last CHR, but the colors by CHR are missing
ReplyDelete+ for (i in unique(d$CHR)) {
+ with(d[d$CHR==i, ],points(pos, logp, col=colors[icol], ...))
+ icol=icol+0
Finally got it with changing from i==1 to i==0 like so:
ReplyDelete+ } else {
+ for (i in unique(d$CHR)) {
+ if (i==0) {
and then changing type= "n" to type = "p"
+ with(d, plot(pos, logp, ylim=c(0,ymax), ylab=expression(-log[10](italic(p))), xlab="Chromosome", xaxt="n", type="p", ...))
Excellent! Thanks for hammering away at this. I'll try to make the code more flexible on my next round of updates.
ReplyDeleteHi Stephen,
ReplyDeleteThanks. This is great !
I have tried to add the SNP name in the plot using text. Using the BP for that SNP, how can I find the x-coordinate in the plot? For example, what is x-coordinate in the plot for CHR=2, BP=53606599)?
Thanks in advance.
If you look at the code here, in lines 45-53 I'm creating a new variable in the data frame called d$pos. On chromosome 1 d$pos is set to d$BP. For every remaining chromosome, d$pos is set to the BP position on that chromosome, added to the last base position on the previous chromosome. For instance, if I have 3 SNPs on chromosome 1 at positions 10, 20, and 30, and 3 SNPs on chromosome 2 at positions 5, 15, 25, the X coordinates for the six SNPs would be 10, 20, 30, 35, 45, 55. The x-axis uses the d$pos variable, and you can mod the code to use d$pos as the x coordinate for your text. I'd love to see what you do with this, please post the code to https://gist.github.com/. Thanks!
ReplyDeleteWe were just using plink the other day and I wanted to create a Manhattan plot for our data. I am running into this error:
ReplyDeleteError in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 23
My plink.assoc file is from a small dataset so it only contains two CHR. I've searched the P column for non-0 or 1 but didn't find any.
Thanks for writing this code! I appreciate the time and help!
I can e-mail the the plink.assoc file if you need it.
ara
I actually figured it out using the args(manhattan). When you have less then 23 CHR you need to use the limitchromosomes=#,# . Where #s are your CHRs that are present in the file.
ReplyDeleteWell I thought I had it solved but...
ReplyDeleteSo I have two CHR 11 and 17. I use the limitchromosomes to get it to plot 11 but how to let R know I want just 11 and 17?
Thanks for figuring this out. As with the many other little inconsistencies, I'll try to iron this out and make the code a little more robust in the next version.
ReplyDeleteJust another quick note. I tried to use the limitchromosomes=11:12 (basically turned my 17 into 12 so it's consecutive) the same error pops up. It looks like you can do a single CHR using limitchromosomes but not consecutive CHRs unless you have all 1:23 present.
ReplyDeleteThanks for the note!
ReplyDeleteIs there any way to change the colour of the annotation from green to red? I don't quite understand how the dot-dot-dot function is supposed to work?
ReplyDeleteChange the color on line 70.
ReplyDeleteHi Stephen,
ReplyDeleteJust a quick question: how do you usually select you annotation SNPs? Do you say (using PLINK): pick a window of 100 SNPs around my hit? Or do make it broader?
Thanks!
Sander
Where I've needed this in the past is to highlight known hits based on RS number from the GWAS catalog or something. If you wanted to highlight "skyscrapers" in the manhattan plot, you could just select all the SNPs within a certain window like you said.
ReplyDeleteExactly, but what range would you take 100-250kb, or would rather take 500-1500kb?
ReplyDeleteIt's hard to say really. You might base it on how big the haplotype blocks are in that region. But if you're just coloring neighboring SNPs to highlight the peak on the plot, you probably just want to eyeball it, and pick whatever size looks good.
ReplyDeleteIt works great! Thanks a lot!!
ReplyDeleteI also encountered that error when there are no snps for a specific Chromosome. I solved it with a minor adjustment to the loop on line 45. Below is the code.
CHRs <- unique(d$CHR)
numchroms=length(CHRs)
if (numchroms==1) {
d$pos=d$BP
ticks=floor(length(d$pos))/2+1
} else {
for (i in 1:numchroms) {
if (i==1) {
d[d$CHR==CHRs[i], ]$pos=d[d$CHR==CHRs[i], ]$BP
} else {
lastbase=lastbase+tail(subset(d,CHR==CHRs[i-1])$BP, 1)
d[d$CHR==CHRs[i], ]$pos=d[d$CHR==CHRs[i], ]$BP+lastbase
}
ticks=c(ticks, d[d$CHR==CHRs[i], ]$pos[floor(length(d[d$CHR==CHRs[i], ]$pos)/2)+1])
}
}
Bahar - thanks for sharing your solution. I think others have had this problem before but I never knew what was causing it.
ReplyDeleteThis is great. Thanks a lot Dr. Turner.
ReplyDeleteI was trying to figure it how to Zoom in a specific location of Chromosome. For example, if I want to have Manhattan plot only for BP "1e+07 to 2e+07" of CHR1. Any suggestions?
Use LocusZoom.
ReplyDeleteHi Stephen,
ReplyDeleteThanks for your code. It's awesome. I did a MH plot using your code. In the x-axis, all the chr # didn't show up. How do I resolve this issue?
All the #s came up between 1 and 9 and then I have 11,13, 15, 19 etc.
Thanks,
-Joey
Joey - make a bigger plog. Save the plot to a file and make it wider.
ReplyDeleteHi Stephen,
ReplyDeleteThis function is really helpful. I love reading your blog, really very informative, for people like me who have to learn statistics to be in field of human genetics.
I used the function to plot Manhattan but, I am having trouble with pch, dots are very big and when I plot 6million SNP's with imputed data big dots merge with each other and it looks messy. I tried to change size with cex, but it changed the size of x axis and dots remained of same size. Can you suggest something to deal with this?
Thanks,
Ah, I see what's going on here. I named the argument "cex.x.axis" so I could control this at line 60 in the code. Copy the function, and rename the "cex.x.axis" to something like xcex, and when you pass in the cec= argument, it should work properly to resize the points, not the x-axis labels.
ReplyDeleteGreat codes! Very useful. Thank you Stephen
ReplyDeleteHi Stephen,
ReplyDeleteThis code is so useful - thank you for sharing it!
I have been adapting your script, but I'm having some problems with adding a title to the Manhattan plot (although I can successfully change the colours):
# manhattan plot using base graphics
manhattan = function(dataframe, colors=c("#8A2BE2", "#5F9EA0", "#483D8B" ), ymax="max", cex.x.axis=1, limitchromosomes=1:23, suggestiveline=-log10(1e-5), genomewideline=-log10(5e-8), annotate=NULL,
main="Viral load in cell lines", ...) {
If you could suggest what I should do to add a title, that would be much appreciated!
I'm also having trouble changing the size of the points using pch:
ReplyDelete# manhattan plot using base graphics
manhattan = function(dataframe, colors=c("#8A2BE2", "#5F9EA0", "#483D8B" ), ymax="max", cex.x.axis=1, limitchromosomes=1:23, suggestiveline=-log10(1e-5), genomewideline=-log10(5e-8), annotate=NULL,
main="Viral load in cell lines", pch=20, ...) {
Hello Anonymous from July 19 - I believe I solved your cex problem.
ReplyDeleteHi Stephen,
ReplyDeleteAwesome work! I am using R studio to run the code. I am having trouble getting the last command to execute. manhattan(results) gives an Error: could not find function "manhattan".
Any ideas?
Thanks
It looks like you didn't source in the function. Copy and paste the function yourself, or source it from the web directly using source("http://www.StephenTurner.us/qqman.r")
ReplyDeleteHello Stephen,
ReplyDeleteIts a very nice piece of code you have compiled. I am using R to plot my GWAS data. I was wondering if we could plot lines instead of dots with colour intensity proportional to LOD scores. In that way, it'll be easy to draw region specific conclusion about association.
Regards,
Amol
Best Wishes
Amol - that would be pretty simple with the old ggplot2 code - just use geom="line" and colour=LOD or the like. I'm not sure how to build this into the current implementation, but contributions are welcome.
ReplyDeleteHi Stephen,
ReplyDeleteI want to have grids at beginning and end of each chromosome on X-axis in my Manhattan plot. Can you tell how should I specify that?
Thanks,
Manav
This should be easy to do with the abline function.
ReplyDeleteHi Stephen,
ReplyDeleteGreat code! Thank you so much!
I am trying to replicate the manhattan plot in this paper, http://www.pnas.org/content/107/39/16910.full.pdf+html?with-ds=yes, (Fig.S8) with the data provided, http://heim.ifi.uio.no/bioinf/Projects/ASCAT/Version1/AllelicSkewness.txt.
I had the same error as some described above:
Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 401
So, I tried to input the data bit by bit to determine where the error is.
I located the first error in this line in the data:
rs12526698 6 2610972 4 2 4 2 2 6 12 8 0.6 0.503444672
at Chromosome 6 with p-value 0.503444672. When I changed this p-value to NA it gave me this message:
Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 400
indicating that a p-value other than NA would result in an error. (Notice the change from 401 to 400). Don't know why it is only accepting "NA".
There are 23 chromosomes by the way.
Thank you,
Miah
Hi Stephen,
ReplyDeleteGreat code! Thank you so much!
I am trying to replicate the manhattan plot in this paper, http://www.pnas.org/content/107/39/16910.full.pdf+html?with-ds=yes, (Fig.S8) with the data provided, http://heim.ifi.uio.no/bioinf/Projects/ASCAT/Version1/AllelicSkewness.txt.
I had the same error as some described above:
Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 401
So, I put in the data bit by bit to determine where the error is.
I located the first error in this line in the data:
rs12526698 6 2610972 4 2 4 2 2 6 12 8 0.6 0.503444672
at Chromosome 6 with p-value 0.503444672. When I changed this p-value to NA it gave me this message:
Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 400
indicating that a p-value other than NA would result in an error. (Notice the change from 401 to 400)
There are 23 chromosomes by the way.
Thank you,
Miah
I'm guessing you changed the column headers? The manhattan function assumes you have columns named SNP, CHR, BP, and P, corresponding to the SNP name (rs number), chromosome number, genomic coordinate, and p-value. Missing values (where regression failed to converge, for instance) should be recoded NA before reading into R.
ReplyDeleteI suppose I should add in some error checking but it's been a while since I've looked at this code. Hopefully I can update it soon. Let me know if this helps.
Does it work otherwise if you put say all the data from chromosomes 1-5 in there? If so, something's odd with the data at that point on chr 6.
Yes, I did change the corresponding headers. It does work when I put all the data from chromosome 1-5 until that point on chromosome 6. When I changed the p value at that point to NA, it worked fine.
ReplyDeleteThanks,
Miah
i have used "source("http://www.StephenTurner.us/qqman.r")" server several times before, but suddenly it's not working for last few days ! can you please suggest some way ?
ReplyDeleteBest,
Sabyasachi
Sabyasachi - I had these files hosted under my google sites account, but it looks like they changed something with how they handle redirects. I moved the files to my server at UVA and updated the code. You should be good to go with the updated code above.
DeleteOh, and you can always download the code from GitHub.
DeleteHi Stephen, thanks for this amazing code!
ReplyDeleteI am just wondering if I can change the label for chr X ("X" instead of "23").
Is it possible?
Thank you!
Diego
Diego - the quick and dirty solution would be to change line 60 of the code where it says lab=unique(d$CHR), change that to lab=c(1:22,"X").
ReplyDeleteStephen:
ReplyDeleteI have used the code successfully in the past. I am now obtaining the error code below. Do you have any guidance on this issue. I have attached by data.
Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 1
CHR SNP BP A1 TEST NMISS OR SE L95 U95 STAT P
1 rs3813605 84743535 A DOM 2200 1.109 0.09787 0.9154 1.343 1.056 0.2908
2 rs4953223 45370846 C DOM 2201 1.323 0.09777 1.092 1.602 2.862 0.004208
2 rs34058885 49417012 A DOM 2200 0.9881 0.09331 0.8229 1.186 -0.1288 0.8975
2 rs7599054 135257016 G DOM 2201 0.7543 0.09637 0.6245 0.9111 -2.926 0.003435
3 rs3896023 173034387 A DOM 2197 0.883 0.09549 0.7323 1.065 -1.303 0.1925
4 rs1554668 114576711 T DOM 2199 1.28 0.1163 1.019 1.608 2.125 0.03362
5 rs12520934 67162619 A DOM 2199 1.38 0.09121 1.154 1.65 3.528 0.0004186
6 rs9469084 32188361 T DOM 2201 0.591 0.1319 0.4564 0.7653 -3.988 6.672e-005
6 rs1123969 155276341 A DOM 2201 1.299 0.09375 1.081 1.561 2.789 0.005294
8 rs1016646 20307688 A DOM 2200 0.7924 0.1284 0.6161 1.019 -1.811 0.07009
10 rs10788275 124051491 T DOM 2200 1.376 0.09063 1.152 1.643 3.521 0.0004295
11 rs733454 76155369 A DOM 2200 1.335 0.1226 1.05 1.698 2.358 0.01838
20 rs2865873 38808983 T DOM 2182 1.28 0.09351 1.066 1.537 2.64 0.008289
John Paul,
ReplyDeleteThis is a known issue - when you skip over chromosomes you run into this problem. It will work fine on the first 9 rows of your data. I haven't had time to fix this. I'd welcome any contributions!
Best,
Stephen
Stephen:
DeleteThanks for your response on this matter. Unfortunately I don't have the expertise to help with the program. With the information you've provided we can move ahead to generate the figure. Thanks for making the product available -- it has helped us a great deal.
Sincerely,
John Paul
I have a couple of suggestions to work around this. One would require a change to the code the other would not.
Delete1) I assume that in the provided code, the min and max values for the SNP positions are used as the length of the associated chromosome and that no SNP data for a particular chromosome would mean no min and max and thus generate an error. It might be a good idea to allow the user to provide a coordinates file as an input option to the function that details the expected lengths of each chromosome to plot. Then these more accurate widths can be used in place of the min and max values.
2) If you don't have data to plot for a particular chromosome but do for subsequent chromosomes and wanted to avoid this error you could create two fake entries in your input file each defining a fake SNP call and other required data for the start of the chromosome and one for the expected end. Then you could use the annotate option to annotate this SNPs in white. They would be plotted, but you wouldn't see them and the correct width would be maintained in the figure.
Not that I've not tested this out myself but logically it should work. Maybe the author could comment?
Cheers!
Anon - Thanks for the suggestion. I've known about this error for some time but have not had the time to make the necessary code updates. Hopefully will soon. Thanks for leaving these tips here.
DeleteHello,
ReplyDeleteI am using the manhattan() function to plot the results of THREE different analyses for the same subset of 20 snps (ie 3 sets of p-values for the 20 snps). The function seems to have no problem plotting multiple SNPs at the same position. However, I would like to attribute a different color to each set of P-values. Would this be possible with the manhattan() function? I've tried including a fourth column ("COL") with the colors - ie
SNP CHR POS P COL
rs1 8 123 0.05 green
...
and calling it with
manhattan(results,col=results$COL)
but the colors do not come up.
Any suggestions?
Thank you.
Try modifying line 63 of the code
ReplyDeleteHello.
ReplyDeleteI think it is a good program to make manhattan in r. But i had a problem
When I made manhattan plot in cattle(chromosome = 29) using your manhattan function, It had a one problem. I can not see whole chromesome number in x-axis
(for example, x-axis: 1,2,3,4,5,6,7,---, 23,25,27,28,29)
How to change font size on x-axis about numeric value(here, chromosome number)
Thank you.
Stephen,
ReplyDeleteThanks for the great code! How can I specify the size of the plot? I'd like to reduce the height and make it wider.
png("filename.png", width=2000, height=500); manhattan(...); dev.off()
DeleteI couldn't figure out how to run the QQ plot without the HLA region MB25-MB35 on chromosome 6. Any suggestions? Thanks!
ReplyDeleteUse R's subset() command to get rid of those regions before running.
DeleteDear Stephen,
ReplyDeleteOn the QQ plot, I'm trying to adjust the point size, and the maximum on the x- and y-axes using:
> myQQ<-qq(mydata$P, pch=20, xmax=10, ymax=10)
However R doesn't like this.
1: In plot.window(...) : "xmax" is not a graphical parameter
2: In plot.window(...) : "ymax" is not a graphical parameter
Error in localWindow(xlim, ylim, log, asp, ...) :
formal argument "pch" matched by multiple actual arguments
I might be missing something fundamental to R as I haven't used it much (yet...!)
Cheers,
Blue
P.S. Thanks for posting the scripts and hope the grant application went well.
Because you're trying to use the qq() function, not the manhattan() function. ymax and xmax are specific to manhattan() only.
DeleteHi Stephen,
ReplyDeleteFirst of all: Thanks for the useful code!
I've used it several times before without problems but now qq(results) comes up with the "P value vector is not numeric" error. Worryingly, the problem persists even when I use your example dataset, whether it's using the R desktop interface or R in a unix environment.
manhattan(results) works fine, but gg.qq(results) also apppears broken, with the following error: "Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) : undefined columns selected"
Do you have any idea what this might be about?
Thanks,
Laura
You actually need to provide the P-value vector. I.e., qq(results$P). Just tried this using my example dataset.
ReplyDeleteD'oh. Sorry, I'm an idiot. Thanks..
ReplyDeleteGuys THIS does not map for X, MT Y DNA
ReplyDeleteSo, you can convert Cromossome X to a 23 , and y yo 24 and MT to 25
For that, on the manhattan function, put on the line after
d=data...
paste this:
d$CHR[d$CHR=="X"] <- "23"
d$CHR[d$CHR=="Y"] <- "24"
d$CHR[d$CHR=="MT"] <- "25"
Then all your problems will be solved.
Also , there is an erro i think in the line , lastbase=lastbase+tail(subset(d,CHR==CHRs[i-1])$BP, 1)
the i-1 is not correct, since it outputs erro in my data, you can put just
lastbase=lastbase+tail(subset(d,CHR==CHRs[i])$BP, 1) ?
Try out, see if this changes Works good!
I believe I used the CHRs[i-1] bit to get the last base from the *previous*, i.e., i-1'th chromosome, so as to start numbering from there on the x-axis for the next chromsome.
ReplyDeleteThanks for providing code for X, Y, and MT. I didn't want to make assumptions about how users were naming these chromosomes. I.e., some name them 23, 24, 25 as you did. Others have PAR regions from Y, not sure how they're naming those.
To annotate the rs number for pvalues >0.001 if you can use this:
ReplyDeletefor (i in 1:length(d$P)){
if(d[i,"logp"]> -log10(0.001)){
text(d[i,"pos"],d[i,"logp"],d[i,"SNP"],pos=3)
}}
before the line " if (!is.null(annotate)) { "
The dataframe does not accept X, Y, MT strings characters as a numerical, so its easy to see it wont work, and will give error, when plotting the value "X on the axis x". So thats the only aproach i see right now.
ReplyDeleteAbout the i-1 you probably right, but somehow, it gives me error.. i have to check again.
About the last chromossome, the function is not very ... well contructed, because it assumes your data as the chromossomes from 1 to 23 all of them.
ReplyDeleteAnd what happens is if you have chromossome 7 and then only the 9..
the function as it is, will call chrom[i] where i is the 9 and then you make i-1 which is 8, then on the list there is no chromossome 8, so it gives error, thats the problem i found. So i have to rewritte the code, and now its working like i want.
I had to call a new list, with the unique chromossomes, then use that array as reference for the i , so then it will call the correct index-1 chromossome on the list of "chromossomes that exist only".
So this is the code:
for the whole manhattan function. Although the axis has some weird bars that dont get named, dont know if thats normal.
# manhattan plot using base graphics
manhattan <- function(dataframe, colors=c("#0174DF", "#DF013A"), ymax="max", limitchromosomes=1:23, suggestiveline=-log10(1e-5), genomewideline=-log10(5e-8), annotate=NULL, ...) {
d=dataframe
d$CHR[d$CHR=="X"] <- "23"
d$CHR[d$CHR=="Y"] <- "24"
d$CHR[d$CHR=="MT"] <- "25"
d$CHR <- sapply(d$CHR, as.numeric)
if (!("CHR" %in% names(d) & "BP" %in% names(d) & "P" %in% names(d))) stop("Make sure your data frame contains columns CHR, BP, and P")
if (any(limitchromosomes)) d=d[d$CHR %in% limitchromosomes, ]
d=subset(na.omit(d[order(d$CHR, d$BP), ]), (P>0 & P<=1)) # remove na's, sort, and keep only 0 -log10(0.001)){
text(d[i,"pos"],d[i,"logp"],paste(d[i,"SNP"],",",d[i,"Allele"]),pos=2,cex=0.6)
}
}
if (!is.null(annotate)) {
d.annotate=d[which(d$SNP %in% annotate), ]
with(d.annotate, points(pos, logp, col="green3", ...))
}
if (suggestiveline) abline(h=suggestiveline, col="blue")
if (genomewideline) abline(h=genomewideline, col="red")
}
This comment has been removed by the author.
ReplyDeleteHere is the google Drive Link to the code:
ReplyDeleteSomehow posting the whole code here, removes some parts (some protection from blogger perhaps).
Link: https://docs.google.com/document/pub?id=1YM8HXVZ_gBRdKe3Yp5aVcA6EI6z428adHgT8NXd7nDA
MQ - thanks for updating this. FYI you can easily fork my code on GitHub, which will syntax-highlight if you name the Gist with a .R extension.
ReplyDeleteDear Stephen,
ReplyDeleteI start using R recently, and I need to plot some chromosome wise values in manhattan plot.
but my values are greater-than 1 (Max 11, Min 0). But I am confused how to change in the R script, you provided for manhattan plot. Can you please help me, I'll be very thankful to you.
Dear Stephen,
ReplyDeleteI really love your script, however I cannot get it to work. That is probably because I have five chromosomes called 2L, 2R, 3L, 3R and X. Is there a way to get around this?
Best, Palle
Hello,
ReplyDeleteI tried to run the function "manhattan" but I got this error:
Error: could not find function "manhattan".
Which package has this function?
I couldn't find it in the ggplot2.
Can anyone help?
Maybe it's because I am using a newer version of R?
Thanks,
Einat
Looks like you didn't define the functions. You can copy and paste the code from the github gist, or source the function in from here: source("http://people.virginia.edu/~sdt5z/0STABLE/qqman.r")
Deletehello stephen. i have a (hopefully simple) question. can you explain how i can make sure my manhattan plot y-axis is always high enough to show the suggestive and genomewide lines?
ReplyDeletethanks in advance, and happy new year!
I think the ymax option will control this.
DeleteI'm trying to plot a manhattan plot for 431885 SNPs. I 've already excluded chromossomes X, Y, MT and 0, and the following error appears:
ReplyDeleteError in Math.factor(d$P) : log10 not meaningful for factors
In addition: Warning messages:
1: In Ops.factor(P, 0) : > not meaningful for factors
2: In Ops.factor(P, 1) : <= not meaningful for factors
What do I have to change on the original script? Is it possible to plot only with data from chromosomes and p-value?
Thank you.
Hi Stephen,
ReplyDeleteI wonder if you can help.
The simple file below (which has lost its formatting here but was tab delimited) fails when producing a manhattan plot but remove the chr 21 line and all is well. I can't for the life of me work out why.
The error message I'm getting is Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) : and in fact changing the 21 to a 19 makes it all work again.
The code I was using is straight from your demo:
source("http://people.virginia.edu/~sdt5z/0STABLE/qqman.r")
results <- read.table("file", header=T, sep="\t")
manhattan(results, suggestiveline=F, genomewideline=F, pch=20, main="Manhattan Plot")
Do you have any suggestions?
CHR SNP BP P
19 LbrM.19.embl_731631 731631 0.934
19 LbrM.19.embl_731734 731734 0.246
19 LbrM.19.embl_731758 731758 0.098
21 LbrM.21.embl_3445 3445 0.277
19 LbrM.19.embl_731898 731898 0.534
In case anyone is also struggling with this and like me didn't see the earlier post about this same issue (oops). This error can appear when there is no data present for a particular chromosome. In my case it was chr20.
DeleteI've replied above to make a suggestion to prevent this error.
Cheers!
Hi!
ReplyDeleteI can't run the function 'annotate' and I don't have a clue... Could you help me?
Hello,
ReplyDeleteThank you very much for your great code.
I am having a problem, I am using this code for cattle dataset with 29 chromosomes. I have changed the limitchromosomes=1:29 but when I plot it, a gap appears between chromosomes 23 and 24, it would be great if you could help me with that.
Cheers!
Thanks for sharing the code! Any advice on how to add a shaded region around the abline corresponding to 95% confidence intervals?
ReplyDeleteDoes anyone here have the time or interest to help me go through all these bugs and feature requests and clean up the code, perhaps even create an R package?
ReplyDeletea question for more than 23 chromosomes plotting how I can do since the genomoma handling of cattle that is 30.
ReplyDeletegreetings.
When I tried to open the output Rplots.pdf from the manhattan function, it said that the file is damaged and cannot be repaired. I thought it's because it's too big (600MB!) and tried to convert it to PNG, but it still tells me that the pdf is damaged. I tried to rerun it several times and also ran the qqplot, yet it never worked. Any insight?
ReplyDeleteThanks!
Hello, I would like to print my SNP names in the Axis X and not the BP. How can I do it?
ReplyDeleteThanks!
Julie
Just want to say thanks, really helpful
ReplyDeleteI am having problems annotating with SNPs I want to highlight. My SNPs are in both files (all the ones to plot first and then the chosen SNPs to highlight in green) characters (not a string of characters). I get no error message when I make the manhattan function plot with my annotation data.frame but I also get no green dots.. Anyone know how to solve this? Thanks!
ReplyDeleteHi Stephan,
ReplyDeleteI'm newbie here. I just start learning R a month ago.
I was trying to make manhattan plot with 32 chromosomes and the last chromosomes is chromosome X. I tried to change your script to chromosome 1:32 and it gave an error like this. "Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :max not meaningful for factors."
I saw there was someone on August 15, 2011 at 5:08 AM also having the same problem with me, but I couldn't find the solution *or maybe I missed it*
Can you help me to solve this problem? Thanks!
I wrote yesteray about annotating, I have now fixed it by transposing the list of SNPs to highligt. AO. Herlino: I think you have to make sure that the column with CHR is numeric.
ReplyDeleteHi Anonymous,
DeleteThanks for the reply! I was sure CHR column is numeric, I don't know what happen actually with this? I just change the script from limitation of 1:23 into 1:32, but it didn't work. Do you have any idea with it?
Thanks!
Hello Stephen,
ReplyDeletegreat code! I have found it very helpful in my continuing quest to "get genetics done" I am slowly getting to the point where I have somewhat of an understanding of what the code is doing. I would like to create a plot in which the SNP surrounding a particular lead SNP are color coded by their LD with that SNP. My initial thought was to add an additional variable to the annotation file and then use that on line 70 in place of green3. Am I even on the right track here?
Thanks
Darryl
Hi Darryl. My strategy would be to look at the snps in the snps annotate file and get their positions, then add more snps to that file extending out by a certain distance, a kb for example. Other ways to do this. I hope to get back to this code at some point to clean up all the bugs listed above, and add this as a feature. Stay tuned.
Delete> snps_to_highlight <- scan("http://www.StephenTurner.us/snps.txt", character())
ReplyDeleteError in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") : cannot open: HTTP status was '404 Not Found'
Is anyone else getting this error?
Hannah: So sorry about that. I should take my own advice and upload the code and data somewhere other than my personal website. All the code and data is now available on GitHub, and I've modified the code in the post above to read data from GitHub directly.
DeleteI still cannot seem to get it..
DeleteWarning message:
In download.file("https://raw.github.com/stephenturner/qqman/master/snps.txt", :download had nonzero exit status
I have also tried manually downloading the file and importing it, but I also cannot get it to work!
Hannah,
DeleteThat's one single line that unfortunately wraps in the blog display text. Did you type the entire line?
download.file("https://raw.github.com/stephenturner/qqman/master/qqman.r", destfile="./qqman.r", method="curl")
Yes, I am typing in the entire line of code provided. I am using rstudio..
DeleteHannah - thanks for pointing this out. I just tried this on a Windows machine and got the same error. If you're using Windows, omit the method="curl" argument and that should work. I updated the code.
DeleteHi Stephen,
ReplyDeleteHope all is well! I am making several Q-Q plots and I am trying to compare different association methods that correct for population stratification. Anyway I would like the y-axis on the plots to be the same. Fro example, I have some plots that have p-values of 1.0E-24 and some with p-values of 1.0E-8. Is there a way to use ymax with the QQ command? I tried but it didn't work for me.
Thanks!
Janina
Hey Janina. I added an extra argument called ymax. Copy/paste this function and the examples to see it in action.
Deletehttps://gist.github.com/stephenturner/5321887
Thanks Stephen. Everything worked just fine :)
DeleteHi Stephen,
ReplyDeleteThanks a lot for the code! I'm new to R and it made my life easier. One quick question please: Is it possible to plot a Manhattan plot with two p values for two phenotypes for the same SNP ? What do I need to change in the code ?
Thanks gain for the post !
Maen
Several ways to do this. Not really recommended to modify results files, but the easiest way is to make one of them have a position 1bp higher or lower than the other, and if you're hilighting, add a _1 and _2 to the rs numbers.
DeleteI see, Thanks a lot.
DeleteMaen
Stephen, thanks for the great script. I am using it on Illumina 450K chip methylation data so instead of using rs numbers I am using IDs like: "cg26544072". Doing the basic manhattan plot is working fine but when I try to do:
ReplyDeletemanhattan(graphTable, annotate=highlight)
where "highlight" is a character vector of IDs I get the error:
"Error in i - 1 : non-numeric argument to binary operator"
Any ideas?
Hmm.. hard to say without seeing your data. Are the columns still named "CHR", "SNP", "BP", and "P"?
DeleteYep, I kept the same names for the columns of the data.frame and it seems to work fine unless I include the "annotate=LDsites" bit. Both the SNP column and the LDsites vector are character vectors.
DeleteHmmm, saved my workspace, reopened and now I am getting the same error trying without the 'annotate'...something odd is going on!
DeleteCracked it. After doing it without 'annotate' I used my data.frame to generate the list for 'annotate', and somewhere in there I changed the 'CHR' column to a character vector (to deal with X and Y) instead of the numeric vector that is required.
Delete