04-04-2013

Create New File in Finder with Alfred 2


I have created a simple workflow for Alfred 2 that makes it easy to create a new text file in the frontmost Finder window. An alternative, more advanced workflow for Alfred 2 has also been created by Ian Isted.

Usage

Open Alfred 2 and type new followed by the name of the file. If you type new on its own, a file called ‘untitled.txt’ will be created.
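The workflow itself isn't reproduced here, but as a rough sketch of the same idea, this Python version asks Finder for the frontmost window's folder through osascript (using a commonly used AppleScript one-liner) and creates the file there. The function names are hypothetical and not taken from the workflow, and it assumes the typed name is used exactly as given:

```python
import subprocess
from pathlib import Path

# AppleScript that returns the POSIX path of the frontmost Finder window's folder.
FINDER_SCRIPT = 'tell application "Finder" to get POSIX path of (target of front window as alias)'


def resolve_filename(query: str) -> str:
    """Return the name to create: the typed name, or 'untitled.txt' if nothing was typed."""
    name = query.strip()
    if not name:
        return "untitled.txt"
    if "/" in name:
        raise ValueError("file name must not contain '/'")
    return name


def frontmost_finder_folder() -> Path:
    """Ask Finder (macOS only, via osascript) for the folder shown in its front window."""
    result = subprocess.run(
        ["osascript", "-e", FINDER_SCRIPT],
        capture_output=True, text=True, check=True,
    )
    return Path(result.stdout.strip())


def create_new_file(query: str) -> Path:
    """Create an empty file in the frontmost Finder window, refusing to overwrite."""
    path = frontmost_finder_folder() / resolve_filename(query)
    path.touch(exist_ok=False)  # raises FileExistsError if the file already exists
    return path
```

In an Alfred Run Script action you might call create_new_file("{query}"). The sketch refuses to overwrite an existing file rather than silently replacing it.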

Download

03-06-2013

Excel Template for Mapping Four 96-Well Plates to One 384-Well Plate


Featured on Getting Genetics Done, this Excel template helps you map four 96-well plates to a single 384-well plate using Microsoft Excel.
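The template's exact layout isn't described here, so the following Python sketch assumes the common interleaved (quadrant) layout: plate 1 fills rows A, C, E, … and columns 1, 3, 5, …; plate 2 the same rows with columns 2, 4, 6, …; plate 3 rows B, D, F, … with odd columns; plate 4 even rows with even columns. The function name is hypothetical:

```python
ROWS_96 = "ABCDEFGH"            # 96-well plate: 8 rows x 12 columns
ROWS_384 = "ABCDEFGHIJKLMNOP"   # 384-well plate: 16 rows x 24 columns


def map_96_to_384(plate: int, well: str) -> str:
    """Map a well on 96-well plate 1-4 to its position on the 384-well plate.

    Assumes the interleaved layout: plate 1 -> rows A, C, ... / columns 1, 3, ...;
    plate 2 -> rows A, C, ... / columns 2, 4, ...; plate 3 -> rows B, D, ... /
    columns 1, 3, ...; plate 4 -> rows B, D, ... / columns 2, 4, ...
    """
    if plate not in (1, 2, 3, 4):
        raise ValueError("plate must be 1, 2, 3 or 4")
    row = ROWS_96.index(well[0].upper())  # 0-based row; ValueError if not A-H
    col = int(well[1:]) - 1               # 0-based column
    if not 0 <= col < 12:
        raise ValueError("column must be 1-12")
    row_offset = (plate - 1) // 2         # plates 3 and 4 take rows B, D, F, ...
    col_offset = (plate - 1) % 2          # plates 2 and 4 take columns 2, 4, 6, ...
    return f"{ROWS_384[2 * row + row_offset]}{2 * col + col_offset + 1}"
```

For example, map_96_to_384(2, "A1") gives "A2" and map_96_to_384(4, "H12") gives "P24". If your lab uses block quadrants instead (plate 1 in the top-left 8 x 12 block, and so on), only the offset arithmetic changes.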

Download

01-09-2013

Django models for Chado


Here is my first attempt at Django models for the Chado database schema.

12-27-2012

Install Tabix and Samtools on Mac


I was having a tough time getting Tabix and Samtools installed on my Mac, but I found a very easy way to do it. You’ll need to install Homebrew and Xcode first.

Xcode can be installed from the App Store.

Homebrew can be installed by copying and pasting the following into the terminal:

ruby -e "$(curl -fsSkL raw.github.com/mxcl/homebrew/go)"

Next, type:

brew install tabix
brew install samtools

And you are done. Homebrew also has simple commands for symlinking these tools; the details are printed when you install each package.

12-19-2012

ccmatch


ccmatch randomly matches cases to controls based on specified criteria. For instance, if you want to match cases and controls on age, you can use ccmatch to pair up people of the same age. You can list several variables to match on multiple criteria at once.

Installation

ssc install ccmatch

Syntax

ccmatch variable_list, cc( ) [id( )]

*Specifying an id is optional.

  • variable_list – the categorical or discrete variables you want to match on (for example: age, sex, weight class).
  • cc( ) – your case/control variable. Coding 0 = control and 1 = case makes the most sense to me, but the reverse works as well.
  • id( ) – (optional) a variable you use as an ID. When it is specified, ccmatch creates the match_id variable, which lists each person's case/control partner.

ccmatch creates one or two variables:

  • match – an integer shared by a matched case and control.
  • match_id – (optional) the ID of the case/control partner; created only when id( ) is specified.

Example

match_id  match  name  case_control  age
a6        1      a2    0             15
a2        1      a6    1             15
a7        2      a4    0             16
a4        2      a7    1             16
a8        3      a5    0             17
a5        3      a8    1             17
a10       4      a1    0             19
a1        4      a10   1             19
          .      a3    0             15
          .      a9    1             18

The above output is an example of what ccmatch can do. The first two variables, match_id and match, were created by ccmatch. The original data (name, case_control, age) are unchanged, except that the observations have been reordered. The command used was:

ccmatch age, id(name) cc(case_control)

age was specified after ccmatch to indicate that we wanted to match cases and controls of the same age.

The case/control variable is specified as an option using cc( ), and the id of each individual is specified using id( ).
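For readers curious about the logic, here is a minimal Python sketch of exact 1:1 random matching that produces the same kind of match and match_id variables. It is an illustration under my own assumptions (cases coded 1, controls coded 0, one control per case, a hypothetical function name), not ccmatch's actual algorithm:

```python
import random
from collections import defaultdict


def match_cases_controls(rows, keys, cc="case_control", id_var="name", seed=None):
    """Randomly pair each case with one control that has identical values of `keys`.

    Each row is a dict. Adds 'match' (an integer shared by the pair) and
    'match_id' (the partner's ID); rows left unmatched get None for both.
    Assumes cases are coded 1 and controls 0.
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: {0: [], 1: []})
    for row in rows:
        row["match"] = None
        row["match_id"] = None
        groups[tuple(row[k] for k in keys)][row[cc]].append(row)

    next_match = 1
    for key in sorted(groups):
        controls, cases = groups[key][0], groups[key][1]
        rng.shuffle(controls)  # random choice of which control gets which case
        rng.shuffle(cases)
        for case, control in zip(cases, controls):  # zip stops at the smaller group
            case["match"] = control["match"] = next_match
            case["match_id"], control["match_id"] = control[id_var], case[id_var]
            next_match += 1
    return rows
```

Run on the example data above with match_cases_controls(rows, ["age"]), a9 (the only 18-year-old) is left unmatched, and a6 is paired at random with either a2 or a3, so one of those two controls is left over.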

12-19-2012

vcf


The Variant Call Format (VCF), developed by the 1000 Genomes Project, makes it easy to filter and manage large amounts of variant information for a set of subjects.

STATA offers an easy interface for sorting, filtering, and manipulating large datasets. I have developed a tool, vcf, that makes it easy to import .vcf files into Stata (no easy task!).

The program does two challenging things to prepare the file for Stata:

  1. It splits the INFO column (delimited by ;) into separate columns. This is necessary because STATA has a string limit of 244 characters and would otherwise truncate this column.
  2. It recodes genotypic data, showing the genotypes of each individual.
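To make those two steps concrete, here is a Python sketch of them; it is not the vcf program's Stata code. It assumes the GT value has already been pulled out of each sample column (the first colon-separated subfield when FORMAT starts with GT), and the function names are hypothetical:

```python
import re


def split_info(info):
    """Split a VCF INFO string such as 'DP=14;AF=0.5;DB' into a dict.

    Flag fields without '=' (like DB) become True; '.' means no INFO at all.
    """
    if info == ".":
        return {}
    fields = {}
    for item in info.split(";"):
        if not item:  # tolerate a stray trailing ';'
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def recode_genotype(gt, ref, alt):
    """Translate a GT value such as '0/1' or '1|2' into alleles such as 'A/G'.

    Allele indices are split on '/' or '|', so indices of 10 and above are read
    whole; missing calls ('.') are kept as '.'.
    """
    alleles = [ref] + alt.split(",")  # index 0 = REF, 1.. = ALT alleles
    sep = "|" if "|" in gt else "/"
    return sep.join(
        idx if idx == "." else alleles[int(idx)]
        for idx in re.split(r"[/|]", gt)
    )
```

For example, split_info("DP=14;AF=0.5;DB") returns {'DP': '14', 'AF': '0.5', 'DB': True}, and recode_genotype("0/1", "A", "G") returns "A/G". Each INFO key then becomes its own column, which keeps every string well under the 244-character limit.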

Installation

ssc install vcf

Requirements

I have only tested it with STATA 12/SE. I believe it will also work with STATA 11 and perhaps earlier versions.

Usage

vcf using "path/to/file.vcf"

Limits

  1. While it can read in very large files (I have successfully loaded files of a few gigabytes), this program cannot handle enormous VCF files. Ideally, filter enormous VCF files before importing them.
  2. If your VCF file contains sites with more than 9 alternative alleles, this program will incorrectly assign alleles beyond the 9th alternative allele.

Important!

This program is still under development, and I need your feedback: comments, suggestions, and ideas. It has been tested with VCF format 4.1 but not with 4.0 or earlier.

12-18-2012

dataplink


Description

dataplink is a simple program for importing recoded data from plink. It imports genotype data from .ped files and variable names (SNP names) from .map files.

Installation

ssc install dataplink

Usage

Export from plink

Data from plink must be exported using the following options:

  • --recode OR --recode12
  • --tab

Syntax

dataplink using "/path/to/file/without/extension"

Important!

When you specify the filename, do not include an extension (i.e., do not add .ped or .map). dataplink looks for a .map and a .ped file with the same name.
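To show what pairing the two files involves, here is a rough Python sketch (not dataplink's Stata code) that takes SNP names from the .map file and genotypes from the .ped file. It assumes the standard six leading .ped columns (family ID, individual ID, father, mother, sex, phenotype) and that, with --tab, fields are tab-separated while the two alleles of each genotype are separated by a space; the function names are hypothetical:

```python
# Standard leading columns of a PLINK .ped file.
PED_COLUMNS = ["fid", "iid", "father", "mother", "sex", "phenotype"]


def split_ped_line(line):
    """Return (six pedigree fields, list of genotypes) for one .ped line."""
    line = line.rstrip("\r\n")
    if "\t" in line:
        # --tab output: tabs between fields, a space between the two alleles.
        fields = line.split("\t")
        return fields[:6], fields[6:]
    # Space-delimited output: each allele is its own column, so pair them up.
    fields = line.split()
    alleles = fields[6:]
    return fields[:6], [f"{a} {b}" for a, b in zip(alleles[0::2], alleles[1::2])]


def parse_plink(map_lines, ped_lines):
    """Combine SNP names from .map lines with genotypes from .ped lines."""
    # .map columns: chromosome, SNP id, genetic distance, base-pair position.
    snps = [line.split()[1] for line in map_lines if line.strip()]
    records = []
    for line in ped_lines:
        if not line.strip():
            continue
        pedigree, genotypes = split_ped_line(line)
        if len(genotypes) != len(snps):
            raise ValueError("number of genotypes does not match the .map file")
        record = dict(zip(PED_COLUMNS, pedigree))
        record.update(zip(snps, genotypes))  # one column per SNP
        records.append(record)
    return records


def read_plink(prefix):
    """Open prefix.map and prefix.ped, following the no-extension convention."""
    with open(f"{prefix}.map") as map_file, open(f"{prefix}.ped") as ped_file:
        return parse_plink(map_file, ped_file)
```

Each record ends up with one key per SNP, which is also why the number of SNPs you can bring into Stata is bounded by its variable limit.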

Limits

STATA SE and MP flavors support a maximum of 32,767 variables while IC supports 2,047. This means you can only import ~32,000 SNPs with SE/MP or ~2,000 with IC.

12-18-2012

Sync STATA programs and settings with Dropbox


With the release of STATA 12, users are allowed to install STATA across platforms (Linux, Mac, Windows) on up to three computers per user. If you frequently install or edit programs, you can use Dropbox to sync the ado directories, where programs are stored, across your computers.

Step 1: Install Dropbox

Go to Dropbox, sign up for an account, and install it.

Step 2: Create an ado directory

Create the following directories within your Dropbox directory:

  1. Dropbox/ado
  2. Dropbox/ado/plus – for installed ado files
  3. Dropbox/ado/personal – for personal ado files

Step 3: Edit profile.do

profile.do is a file that runs every time you start STATA. STATA looks for the file in a variety of places depending on your operating system; type help profile for more information on where it is stored on yours.

This file needs to be created and edited on each system you want to sync. Here’s where you might store it on Mac and Linux:

  • Mac /Users/[your username]/Library/Application Support/Stata/ado/personal/profile.do
  • Linux /bin/profile.do

Next, edit this file on each system to point at the appropriate Dropbox directories. On a Mac, it might look like this:

*! profile.do
sysdir set PERSONAL "~/Dropbox/ado/personal/"
sysdir set PLUS "~/Dropbox/ado/plus/"

Restart STATA and you should see something like this:

running /Users/Dan/Library/Application Support/Stata/ado/personal/profile.do 

Dropbox does the rest, syncing the files across your systems. If you want to edit startup settings centrally (such as setting memory or turning the ‘more’ option off), you can have profile.do run an additional do-file stored in your Dropbox folder.