This website explains how to obtain the whole-genome shotgun sequence of the Cannabis sativa cultivar "Chemdawg." The data are provided by Medicinal Genomics with the help of Nimbus Informatics. Academic use is free of charge, but Amazon EC2 costs are the responsibility of the user. If you are a commercial enterprise, please contact Medicinalgenomics@gmail.com for a commercial license.


The sequence data were generated on an Illumina HiSeq using v2.0 chemistry with 2x100 reads. There are 7 lanes in total, which add up to 131Gb of sequence. Quality statistics for the run can be found here. The genome is estimated to be 400Mb, which gives an estimated coverage of 327X.
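The coverage figure above follows from dividing the total sequenced bases by the estimated genome size. The sketch below reproduces that arithmetic; the function name and the per-lane estimate are illustrative, and the per-lane number assumes every lane yielded the same amount of sequence, which the page does not state.

```python
def estimated_coverage(total_bases: float, genome_size: float) -> float:
    """Return average fold coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


TOTAL_BASES = 131e9  # 131 Gb across all 7 lanes (from the text)
GENOME_SIZE = 400e6  # estimated 400 Mb genome (from the text)
LANES = 7            # number of lanes (from the text)

total = estimated_coverage(TOTAL_BASES, GENOME_SIZE)

# Assumption: each lane contributed an equal share of the sequence.
per_lane = total / LANES

print(f"total coverage: {total:.1f}X")
print(f"approximate coverage per lane: {per_lane:.1f}X")
```

The total works out to 327.5, which the page reports as 327X. Because the genome size is itself an estimate, the result should be read as approximate.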

There are several ways in which we anticipate people will want to use this data:

  • Reassembly of the data with different assemblers. Only two have been tried so far, SOAPdenovo and CLC bio, and neither has assembled more than 2 lanes of data. It's possible that a far better assembly could be made using Contrail or the Celera Assembler, which can be found on the web.
  • SNP and indel calling. We have performed preliminary calls and are mapping these to BLASTX hits to prioritize functional variants. The C. sativa strain is more polymorphic than the C. indica strain currently being assembled.
  • Other cloud-based annotation tools.

If you improve the assembly or variant calls, we ask that you post the results to Amazon as public EBS volumes and send a note to Medicinalgenomics@gmail.com so we can link to your improvements from our website.

Download the Assembly

We have made a preliminary assembly available via S3:

Access the Sequence

We have removed the direct download links for the fastq files in favor of the public EBS snapshot as the distribution mechanism for the C. sativa genome. You can create your own EBS volume from the snapshot "snap-f8af5298"; please see the public dataset page hosted by Amazon here. For more information on using an EBS volume snapshot, please see the documentation provided by Amazon here.
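As a rough illustration of creating a volume from the snapshot, here is a minimal sketch using the third-party boto3 library, which is not mentioned on this page. The helper names, the region, and the availability zone are assumptions for illustration only. The snapshot must be restored in the region where Amazon hosts it, so check the dataset page for the correct region, and note that the snapshot may no longer be available.

```python
def build_volume_request(snapshot_id: str, availability_zone: str) -> dict:
    """Build the keyword arguments for an EC2 CreateVolume call."""
    return {
        "SnapshotId": snapshot_id,
        "AvailabilityZone": availability_zone,
    }


def create_volume_from_snapshot(snapshot_id: str,
                                availability_zone: str,
                                region_name: str) -> str:
    """Create an EBS volume from a snapshot and return the new volume ID.

    Requires boto3 and valid AWS credentials; the resulting volume is
    billed to the account that creates it.
    """
    import boto3  # imported here so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.create_volume(
        **build_volume_request(snapshot_id, availability_zone)
    )
    return response["VolumeId"]


# Example call (hypothetical region and zone; not run automatically):
# volume_id = create_volume_from_snapshot(
#     "snap-f8af5298", "us-east-1a", "us-east-1")
```

The volume must be created in the same availability zone as the EC2 instance that will attach and mount it.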

Latest News

September 2, 2011

Please note, we have removed the direct download links. Use the public EBS snapshot to access the sequence data instead. You can get more information by going to the dataset page.

August 27, 2011

Amazon is now hosting the C. sativa genome snapshot as a public dataset. You can get more information by going to the dataset page. The snapshot ID is still "snap-f8af5298" and remains unchanged from the version linked to previously.

August 18, 2011

Today we posted the fastq files for download and made an EBS snapshot of the data available on Amazon.