I have created a simple workflow for Alfred 2 that makes it easy to create a new text file in the frontmost Finder window. Ian Isted has also created an alternative, more advanced workflow for Alfred 2.
Open Alfred 2 and type new followed by the name of the file. If you type only new, a file called ‘untitled.txt’ is created.
Featured on Getting Genetics Done, this Excel template helps you map 96-well plates onto a single 384-well plate.
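The template itself is a spreadsheet, but the mapping it encodes can be sketched in Python. This sketch assumes the common interleaved ("quadrant") layout, in which four 96-well plates fill alternating rows and columns of the 384-well plate: plate 1 takes odd rows and odd columns, plate 2 odd rows and even columns, plate 3 even rows and odd columns, and plate 4 even rows and even columns. The function name map_96_to_384 is hypothetical, and the template may use a different layout.

```python
import string

ROWS_96 = string.ascii_uppercase[:8]    # rows A-H on a 96-well plate
ROWS_384 = string.ascii_uppercase[:16]  # rows A-P on a 384-well plate


def map_96_to_384(plate: int, well: str) -> str:
    """Map a well (e.g. 'C7') on 96-well plate 1-4 to its 384-well position,
    assuming the interleaved quadrant layout described above."""
    if plate not in (1, 2, 3, 4):
        raise ValueError("plate must be 1, 2, 3 or 4")
    row = ROWS_96.index(well[0].upper())  # raises ValueError for rows outside A-H
    col = int(well[1:]) - 1
    if not 0 <= col < 12:
        raise ValueError("column must be 1-12")
    # Plate number picks the row offset (0 or 1) and column offset (0 or 1).
    row_off, col_off = divmod(plate - 1, 2)
    return f"{ROWS_384[2 * row + row_off]}{2 * col + col_off + 1}"
```

For example, map_96_to_384(2, "A1") returns "A2", and well H12 of plate 4 lands on P24, the last well of the 384-well plate.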
Here is my first stab at Django models for the Chado database schema.
I was having a tough time getting Tabix and Samtools installed on my Mac, but found a very easy way to do it. You’ll need to install Homebrew and Xcode.
Xcode can be installed from the Mac App Store.
Homebrew can be installed by copying and pasting the following into the terminal:
ruby -e "$(curl -fsSkL raw.github.com/mxcl/homebrew/go)"
Next, type this:
brew install tabix
brew install samtools
And you are done. Homebrew also has easy commands for symlinking these tools; the details are printed when you install each package.
ccmatch randomly matches cases and controls based on specified criteria. For instance, if you want to match cases and controls on age, you can use ccmatch to randomly pair up people of the same age. You can list several variables to match on multiple criteria at once.
ssc install ccmatch
ccmatch variable_list, cc( ) [id( )]
*specifying an id is optional
- variable_list: the categorical or discrete variables you want to match on (for example, age, sex, or weight class).
- cc( ): your case/control variable. Coding 0 = control and 1 = case makes the most sense to me, but it could be the reverse as well.
- id( ) (optional): a variable that identifies each individual. If you specify it, ccmatch also creates a match_id variable listing each person's case/control partner.
ccmatch creates one or two variables:
- match: an integer shared by a matched case and control.
- match_id (optional, created only when id( ) is specified): the ID of the other member of the case/control pair.
The output above is an example of what ccmatch can do. The highlighted variables were created by ccmatch. The original data (name, case_control, age) are unchanged, except that the rows have been reordered. The command used was:
ccmatch age, id(name) cc(case_control)
The variable age is listed after ccmatch to indicate that we want to match cases and controls who are the same age.
The case/control variable is specified as an option using cc( ), and the id of each individual is specified using id( ).
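ccmatch itself is a Stata command; the Python sketch below only illustrates the kind of exact-match random pairing described above, and it is not ccmatch's actual implementation. The function cc_match and its arguments are hypothetical. It assumes cases are coded 1 and controls 0, pairs one control with each case within every group of identical matching values, and leaves surplus cases or controls unmatched.

```python
import random
from collections import defaultdict


def cc_match(rows, match_vars, cc="case_control", id_var=None, seed=None):
    """Randomly pair cases (cc == 1) with controls (cc == 0) that share
    identical values on every variable in match_vars.

    Adds 'match' (a pair number shared by case and control) and, if id_var
    is given, 'match_id' (the partner's id) to each matched row in place.
    Rows that cannot be paired are left without these keys.
    """
    rng = random.Random(seed)
    # Group rows by their matching values, separately for controls (0) and cases (1).
    groups = defaultdict(lambda: {0: [], 1: []})
    for row in rows:
        key = tuple(row[v] for v in match_vars)
        groups[key][row[cc]].append(row)

    pair = 0
    for key in sorted(groups):  # fixed group order so a seed gives reproducible pairs
        cases, controls = groups[key][1], groups[key][0]
        rng.shuffle(cases)
        rng.shuffle(controls)
        # zip stops at the shorter list, so extra cases/controls stay unmatched.
        for case, control in zip(cases, controls):
            pair += 1
            case["match"] = control["match"] = pair
            if id_var is not None:
                case["match_id"] = control[id_var]
                control["match_id"] = case[id_var]
    return rows
```

Calling cc_match(rows, ["age"], id_var="name") roughly mirrors the ccmatch age, id(name) cc(case_control) example; fixing seed makes the random pairing repeatable.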
The Variant Call Format (VCF), developed by the 1000 Genomes Project, makes it easy to filter and manage large amounts of variant information for a set of subjects.
Stata offers an easy interface for sorting, filtering, and manipulating large datasets. I have developed a tool, vcf, that makes it easy to import .vcf files into Stata (no easy task!).
The program does two challenging things to prepare the file for Stata:
- It splits the INFO column (delimited by ;) into separate columns. This is necessary because Stata has a string limit of 244 characters and would otherwise truncate this column.
- It recodes the genotype data so that each individual's genotype is shown.
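The two transformations above can be sketched in Python for a single VCF record. This is an illustration under stated assumptions, not the code inside vcf: split_info and recode_gt are hypothetical names, recode_gt expects only the GT subfield of a sample column (e.g. 0/1, not 0/1:35:...), and INFO flags without a value (such as DB) are stored as True. Splitting GT on its separator, rather than reading single characters, keeps allele indices of 10 and above intact.

```python
def split_info(info: str) -> dict:
    """Split a semicolon-delimited VCF INFO field into a dict of columns.
    Flags without '=' (e.g. DB) are stored as True."""
    if info in ("", "."):
        return {}
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def recode_gt(gt: str, ref: str, alt: str) -> str:
    """Translate a GT value such as '0/1' or '1|2' into allele strings,
    using REF as allele 0 and the comma-separated ALT list as alleles 1, 2, ..."""
    alleles = [ref] + alt.split(",")
    sep = "|" if "|" in gt else "/"  # phased vs. unphased genotype
    out = []
    for idx in gt.split(sep):
        out.append("." if idx == "." else alleles[int(idx)])  # '.' = missing call
    return sep.join(out)
```

For example, split_info("DP=14;AF=0.5;DB") gives {"DP": "14", "AF": "0.5", "DB": True}, and recode_gt("0/1", "A", "G") gives "A/G".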
ssc install vcf
I have only tested vcf with Stata 12/SE. I believe it will also work with Stata 11 and perhaps earlier versions.
vcf using "path/to/file.vcf"
- Although vcf can read in fairly large files (I have successfully loaded files of a few gigabytes), it cannot handle enormous VCF files, so ideally you should filter enormous files before importing them.
- If your VCF file has more than 9 alternative alleles at a site, this program will incorrectly assign alleles beyond the 9th alternative allele.
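One way to filter an enormous file before importing it is to stream it and keep only the region you need. The sketch below is a hypothetical helper (filter_vcf is not part of vcf) that keeps every header line plus the records on one chromosome within a POS range. It assumes a plain-text (not bgzipped) VCF and reads it line by line, so memory use stays small regardless of file size. For bgzipped files indexed with tabix, tabix can extract a region directly instead.

```python
from typing import Iterable, Iterator


def filter_vcf(lines: Iterable[str], chrom: str, start: int, end: int) -> Iterator[str]:
    """Yield all header lines, plus records whose CHROM equals `chrom`
    and whose POS lies in the inclusive range [start, end]."""
    for line in lines:
        if line.startswith("#"):  # '##' meta lines and the '#CHROM' header
            yield line
            continue
        # Only CHROM and POS are needed, so split off at most two fields.
        fields = line.split("\t", 2)
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield line
```

To use it, pass an open input file as lines and write the result to a new file with writelines; the smaller file can then be imported with vcf.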
This program is still under development, and I need your feedback: comments, suggestions, and ideas. It has been tested with VCF version 4.1 but not with 4.0 or earlier.
dataplink is a simple program for importing recoded data from plink. It imports genotype data from .ped files and variable names (SNP names) from .map files.
ssc install dataplink
Export from plink
Data must be exported from plink using one of the following options:
- --recode or --recode12
dataplink using "/path/to/file/without/extension"
When you specify the file name, do not include an extension (i.e., do not add .ped or .map). dataplink looks for a .map and a .ped file with that name.
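To make the file pairing concrete, here is a Python sketch of reading a .map/.ped pair from a base name given without an extension. It is not dataplink's code; read_plink is a hypothetical name. It assumes the standard plink text layout: the second .map column is the SNP name, and each .ped line holds six ID columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by two allele columns per SNP, in .map order. Each genotype is returned as an "allele1/allele2" string.

```python
from pathlib import Path


def read_plink(base: str):
    """Read <base>.map and <base>.ped (plink --recode/--recode12 text output).

    Returns (snp_names, rows), where each row is a dict holding the six .ped
    ID columns plus one 'allele1/allele2' entry per SNP name."""
    map_lines = Path(base + ".map").read_text().splitlines()
    snps = [line.split()[1] for line in map_lines if line.strip()]

    id_cols = ["FID", "IID", "PAT", "MAT", "SEX", "PHENOTYPE"]
    rows = []
    for line in Path(base + ".ped").read_text().splitlines():
        fields = line.split()
        if not fields:
            continue
        row = dict(zip(id_cols, fields[:6]))
        alleles = fields[6:]
        if len(alleles) != 2 * len(snps):
            raise ValueError(
                f"{row['IID']}: expected {2 * len(snps)} allele columns, got {len(alleles)}"
            )
        # Two consecutive allele columns form one SNP's genotype.
        for i, snp in enumerate(snps):
            row[snp] = f"{alleles[2 * i]}/{alleles[2 * i + 1]}"
        rows.append(row)
    return snps, rows
```

Because every SNP becomes its own column, the number of SNPs you can load is bounded by how many variables your software allows.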
Stata SE and MP support a maximum of 32,767 variables, while IC supports 2,047. Because each SNP becomes its own variable, you can import only about 32,000 SNPs with SE/MP or about 2,000 with IC.
Starting with Stata 12, a single user may install Stata on up to three computers, across platforms (Linux, Mac, Windows). If you frequently install or edit programs, you can use Dropbox to sync the ado directory, where programs are stored, across your computers.
Step 1: Install Dropbox
Go to Dropbox, sign up for an account, and install it.
Step 2: Create an ado directory
Create the following directories within your dropbox directory:
- dropbox/ado/plus – For installed ado files.
- dropbox/ado/personal – For personal ado files.
Step 3: Edit profile.do
profile.do is a file that runs every time you start Stata. Stata looks for the file in a variety of places depending on your operating system; type help profile for details on where it is stored on yours.
This file needs to be created and edited on each system you want to sync. Here’s where you might store it on Mac and Linux:
- Mac: /Users/[your username]/Library/Application Support/Stata/ado/personal/profile.do
- Linux: /bin/profile.do
Next, edit this file on each system so that it points at the appropriate Dropbox directory. On a Mac, it might look like this:
sysdir set PERSONAL "~/Dropbox/ado/personal/"
sysdir set PLUS "~/Dropbox/ado/plus/"
Restart Stata and you should see something like this:
running /Users/Dan/Library/Application Support/Stata/ado/personal/profile.do
Dropbox does the rest, syncing the files across your systems. If you want to edit startup settings centrally, such as setting memory or turning the ‘more’ option off, you can run an additional do file located in your Dropbox folder.
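Because profile.do has to be created on each machine, you could generate its contents with a small script. The Python sketch below is a hypothetical helper (profile_do is not part of Stata) that builds the two sysdir lines shown above for a given Dropbox folder and, optionally, a do line that runs a shared do file from that folder. It writes forward-slash paths, matching the Mac example; the file name startup.do in the usage note is only an example.

```python
from pathlib import PurePosixPath
from typing import Optional


def profile_do(dropbox_root: str, central_do: Optional[str] = None) -> str:
    """Build profile.do text pointing Stata's PERSONAL and PLUS system
    directories at <dropbox_root>/ado/personal/ and <dropbox_root>/ado/plus/.
    If central_do is given, also run that do file from the Dropbox folder."""
    root = PurePosixPath(dropbox_root)
    lines = [
        f'sysdir set PERSONAL "{root / "ado" / "personal"}/"',
        f'sysdir set PLUS "{root / "ado" / "plus"}/"',
    ]
    if central_do is not None:
        lines.append(f'do "{root / central_do}"')
    return "\n".join(lines) + "\n"
```

For example, profile_do("~/Dropbox", "startup.do") produces the two sysdir lines above followed by do "~/Dropbox/startup.do"; save the result as profile.do in the location for each operating system.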