Our Work

Updated April 8, 2002

All of these tools are written in Perl. Many of the scripts have module dependencies, which are listed at the end of each description. All have been tested under Win2K Pro running ActiveState Perl (build 631). I have also packaged the scripts into Windows executables using perlapp from the Perl Developer's Kit 4.0. The executables have the modules prepackaged and thus work as standalone programs. Most of the scripts should run on any system that supports Perl and the required modules.

All of the software is copyrighted and distributed under the terms of the GNU General Public License. You can view this license at http://www.gnu.org/licenses/gpl.txt. If you have not read my disclaimer yet, please do so here.

Please e-mail cckim47@gmail.com with any bugs, suggestions, and success stories (these encourage me to maintain and write more programs!).

Microarray

A brief summary of file formats used in the following programs.
Updated 3/25/03
AACK : Add Annotation

Adds annotation from a file to a PCL (pre-clustering) data file. The annotation file requires an ID column and a Name column. The IDs are read from the first column of the PCL file, and the annotations are added to the second column. Does not affect the data. Intended for situations in which a different annotation is desired, such as when an annotation is published for a microarray that was constructed before the annotation was available.
Requires: Tk
Current Version: 1.0
Last Update: 3/25/02
Source
Windows Executable
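
As a rough illustration of the ID lookup described above (a Python sketch, not the Perl source of AACK), the following replaces the second column of each row with the name found for that row's ID. The function name, the in-memory row layout, and the choice to leave unmatched rows unchanged are all assumptions made for the example.

```python
def add_annotation(pcl_rows, annotations):
    """Return copies of pcl_rows with column 2 set from an ID -> Name mapping.

    pcl_rows: list of rows, each a list of strings (ID first, Name second).
    annotations: dict mapping ID to the new Name.
    Rows whose ID is not in the mapping keep their existing second column
    (an assumption; the real tool may behave differently).
    """
    annotated = []
    for row in pcl_rows:
        new_row = list(row)  # copy so the data columns are never modified in place
        new_row[1] = annotations.get(row[0], row[1])
        annotated.append(new_row)
    return annotated


# Hypothetical IDs and names, for illustration only.
rows = [["ORF001", "", "0.52", "-1.10"], ["ORF002", "old name", "1.30", "0.07"]]
names = {"ORF001", }
names = {"ORF001": "geneA"}
for row in add_annotation(rows, names):
    print("\t".join(row))
```

Only the second column changes; the data columns pass through untouched.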
CCACK : Constant Cutoff Analysis

This tool takes an input PCL or CDT (pre- or post-clustering) file and converts all values to a binary scale. The cutoff is user-defined. Primarily intended for genomotyping analysis. Also see GACK below.
Requires: Tk
Current Version: 1.0
Last Update: 3/25/02
Source
Windows Executable
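
The conversion to a binary scale can be sketched as follows. This is a Python illustration rather than the CCACK source; treating values at or above the cutoff as 1, values below it as 0, and leaving blank cells blank are assumptions.

```python
def binarize(values, cutoff):
    """Convert a list of data cells (strings) to 1/0 using a fixed cutoff.

    Cells at or above the cutoff become 1, cells below become 0.
    Empty cells (missing data) are passed through unchanged.
    """
    result = []
    for cell in values:
        if cell == "":
            result.append("")
        else:
            result.append(1 if float(cell) >= cutoff else 0)
    return result


print(binarize(["0.5", "-1.2", "", "0.0"], 0.0))
```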
FLICK : File Linker

This tool takes multiple PCL files and creates a single aggregate PCL file. Intended for joining datasets which have been created on separate occasions, and which may have different ID values (normally requiring the creation of a database to join the data fields).
Requires: Tk
Current Version: 1.1
Last Update: 3/25/02
Source
Windows Executable
FRICK : Filter/Retrieve IDs

Load a list of IDs, and filter or retrieve the associated data from a .pcl or .cdt file. Note: version 1.0 (5/14/02) counted retrieved IDs improperly; this is fixed in version 1.1.
Requires: Tk
Current Version: 1.1
Last Update: 6/18/02
Source
Windows Executable
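
A minimal sketch of the two modes, written in Python for illustration (not the FRICK source). It assumes that "retrieve" keeps the rows whose ID is in the list and "filter" keeps the rows whose ID is not; the function name is hypothetical.

```python
def split_by_ids(rows, ids):
    """Split rows into (retrieved, filtered) by whether the first column is in ids."""
    wanted = set(ids)
    retrieved = [row for row in rows if row[0] in wanted]
    filtered = [row for row in rows if row[0] not in wanted]
    return retrieved, filtered


rows = [["ORF001", "geneA", "0.5"], ["ORF002", "geneB", "1.1"]]
retrieved, filtered = split_by_ids(rows, ["ORF002"])
print(len(retrieved), "retrieved;", len(filtered), "remaining after filtering")
```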
GACK : Genomotyping Analysis

Dynamically chooses cutoffs for grouping into present/divergent genes based on the shape of the distribution. More resistant to variation in hybridization data than CCACK.
If you use GACK, please cite:
Kim CC, Joyce EA, Chan K, Falkow S.
Improved analytical methods for microarray-based genome-composition analysis.
Genome Biol. 2002 Oct 29;3(11):RESEARCH0065.
Requires: Tk, GD, POSIX, Math-Round, File-Basename, Cwd
Current Version: 3.631
Last Update: 2/13/02
Publication
Source
Windows Executable
Manual

I received an e-mail from Adam Witney indicating that GACK works on MacOS X with a few modifications. Here are his notes.
GODACK : Good Data

A simple filtering tool which demands that a certain percentage of datapoints be present in a PCL file. Spots which fail to contain the user-specified percentage of good data points are removed from the dataset.
Requires: Tk
Current Version: 1.0
Last Update: 3/25/02
Source
Windows Executable
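
The percentage filter can be sketched as below. This is a Python illustration under stated assumptions, not the GODACK source: a blank cell is taken to mean missing data, and the index of the first data column is a parameter because it depends on the file layout.

```python
def filter_good_rows(rows, min_percent, first_data_col=2):
    """Keep rows in which at least min_percent of the data cells are non-blank."""
    kept = []
    for row in rows:
        data = row[first_data_col:]
        present = sum(1 for cell in data if cell.strip() != "")
        if data and 100.0 * present / len(data) >= min_percent:
            kept.append(row)
    return kept


rows = [
    ["ORF001", "geneA", "0.5", "1.2", "0.9", ""],   # 3 of 4 present
    ["ORF002", "geneB", "0.1", "", "", ""],         # 1 of 4 present
]
print(filter_good_rows(rows, 75))
```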
HIMACK : Histogram Maker

This tool takes an input PCL or CDT file and generates a graphical histogram file in JPEG format. Allows rapid simultaneous viewing of histograms for multiple datasets for quality assessment. Primarily intended for DNA and RNA hybridizations.
Requires: Tk, GD, Math-Round, POSIX
Current Version: 1.4
Last Update: 3/25/02
Source
Windows Executable
LACK : Lexical Analysis

I often do a SAM or Cluster analysis and see a common theme in some of the significant genes. However, I'm usually not sure if it's really overrepresented, or just a product of observer bias. This program addresses whether or not a theme is actually overrepresented in your significant genes list. The program takes a list of significant genes and a list of user-specified search terms, and counts the number of genes which contain one of the search terms. Then, the program takes a random set of genes of the same size as the significant set from a genome annotation file and counts the hits. This process is repeated a user-specified number of times so that statistics regarding the randomness of the frequencies can be calculated. Statistics and histogram data are output to a text file. NOTE: PLEASE DO NOT USE VERSIONS OF THIS SOFTWARE OLDER THAN VERSION 2.02. THE STATISTICAL ANALYSIS HAS BEEN CHANGED.
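
The resampling procedure described above can be sketched roughly as follows. This is a Python illustration, not the LACK source, and it does not reproduce the statistics LACK reports. Case-insensitive substring matching, sampling without replacement, and the empirical fraction used as a summary are all assumptions, as are the function names.

```python
import random


def count_hits(annotations, terms):
    """Count annotations containing at least one search term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return sum(1 for a in annotations if any(t in a.lower() for t in lowered))


def resample_hits(genome, sample_size, terms, iterations, seed=None):
    """Draw random gene sets of sample_size from genome and count hits in each."""
    rng = random.Random(seed)
    return [count_hits(rng.sample(genome, sample_size), terms)
            for _ in range(iterations)]


def empirical_fraction(observed, null_counts):
    """Fraction of random sets with at least as many hits as the observed set."""
    return sum(1 for c in null_counts if c >= observed) / len(null_counts)


# Hypothetical annotations, for illustration only.
genome = ["flagellar protein", "ribosomal protein", "flagellar motor",
          "hypothetical protein", "transport protein", "ribosomal subunit"]
significant = ["flagellar protein", "flagellar motor"]
observed = count_hits(significant, ["flagell"])
null = resample_hits(genome, len(significant), ["flagell"], 1000, seed=1)
print(observed, empirical_fraction(observed, null))
```

A small fraction suggests the theme appears in the significant list more often than in random gene sets of the same size.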

If you use LACK, please cite:
Kim CC, Falkow S.
Significance analysis of lexical bias in microarray data.
BMC Bioinformatics. 2003 Apr 3;4(1):12.
Requires: Tk, Statistics-Descriptive, Math::BigInt
Current Version: 4.3
Last update: 04/20/06
Source
Windows Executable
v4.3 Note: I've updated some functionality to handle Mac input better.
v4.2 Note: this version has several new functionalities, including the ability to search for a list of terms individually, as well as the automated word list generation previously available as automated LACK (see below). The calculations for the binomial statistics have also been partially optimized to dramatically improve speed of execution. The software has performed well with all of my test files, but please note that testing has been limited. Feedback is appreciated.

Previous Version: 3.1
Last Update: 01/24/05
Source
Windows Executable
Manual
Sample files (zipped)

ALACK: Automated LACK
This is an automated version of LACK which does not require advance generation of a word list. However, this version is limited to single-word analysis; LACK must be used for multiple search-term analyses. NOTE THAT THIS SOFTWARE HAS BEEN INTEGRATED INTO LACK 4.2 ABOVE.
Previous Version: 0.1
Last Update: 03/23/03
Perl Source
Windows Executable
NACK : Name Averaging

Averages data values for a PCL file, but only if the Name (second) column is identical. Intended for averaging data values when multiple spots are present on a single array for a given gene.
Requires: Tk, File::Basename
Current Version: 1.2
Last Update: 3/25/02
Source
Windows Executable
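
A sketch of averaging by identical Name, in Python for illustration (not the NACK source). It assumes every row has the same number of columns, that data begin in the third column, that blank cells are skipped when averaging, and that the averaged row keeps the ID of the first spot in the group.

```python
def average_by_name(rows):
    """Average data columns across rows that share an identical Name (column 2)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[1], []).append(row)  # dict keeps first-seen order

    averaged = []
    for name, members in groups.items():
        means = []
        for col in range(2, len(members[0])):
            values = [float(m[col]) for m in members if m[col] != ""]
            means.append(sum(values) / len(values) if values else "")
        averaged.append([members[0][0], name] + means)
    return averaged


rows = [
    ["spot1", "geneA", "1.0", "3.0"],
    ["spot2", "geneA", "3.0", ""],
    ["spot3", "geneB", "2.0", "2.0"],
]
for row in average_by_name(rows):
    print(row)
```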
REDUCK : Remove Duplicates

Removes duplicate lines from a PCL file. Actually, it removes duplicate lines from any text file, but was intended for use with PCL files. Does exactly what it says; if the lines are not 100% identical, they are not removed.
Requires: Tk
Current Version: 1.0
Last Update: 3/25/02
Source
Windows Executable
Samster : SAM to Cluster

You've done a SAM analysis and have your lists of significantly upregulated and downregulated genes. You now want a more visual representation, or you want to see if there is even more detailed substructure within these genes by using Cluster. Samster will take an Excel spreadsheet or text files and extract the raw data into a text output file, which can be fed directly into Cluster or opened in Treeview. This program circumvents the need to create databases each time you wish to accomplish this task.
Update: Version 1.4 did not work with Cluster 3. It has been updated and some additional minor bugs have been fixed; the newer version 1.5 works with both Eisen's Cluster and Cluster 3.

If you use SAMster, please cite:
Mueller A, O'Rourke J, Chu P, Kim CC, Sutton P, Lee A, Falkow S.
Protective immunity against Helicobacter is characterized by a unique transcriptional signature.
Proc Natl Acad Sci U S A. 2003 Oct 14;100(21):12289-94.
Requires: Tk, Spreadsheet::ParseExcel::Simple

Current Version: 1.5 (For SAM versions prior to 2.0)
Last Update: 2/19/04
Source
Windows Executable
Manual
Sample input file (Excel format)

Current Version: 2.0 (For SAM version 2.0)
Last Update: 6/21/05
Source
Windows Executable

Pathogenesis Tools

COP : Competition Plotter

One of the measures of virulence of different strains of an organism is the competitive index. In this model, a mixed infection is performed in a single host with the assumption that the more fit strain will outperform the other. The current standard is to report the competitive index (CI), the ratio of recovered colony forming units of one strain to the other. While this measure is informative, it discards overall organ-load information due to the one-dimensional nature of the CI. I've developed a special plot type and a tool, COP, for generation of plots which preserve this information. Paired t-test statistics are calculated.
Requires: Tk, Tk::NumEntry, Statistics::Descriptive, Statistics::Distributions
Current version: 0.3
Last update: 4/1/03
Source
Windows Executable
Sample data file
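
To make the point about lost organ-load information concrete, here is a small Python sketch (not the COP source). It computes the CI as the plain ratio described above and a textbook paired t statistic; whether COP applies the test to raw or log-transformed counts is not stated here, so the sketch simply takes whatever values it is given. All numbers are hypothetical.

```python
import math
import statistics


def competitive_index(cfu_a, cfu_b):
    """Ratio of recovered CFU of strain A to strain B."""
    return cfu_a / cfu_b


def paired_t(xs, ys):
    """Paired t statistic: mean of the differences over its standard error."""
    diffs = [x - y for x, y in zip(xs, ys)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Two hypothetical animals with the same CI but a 1000-fold difference in load.
animals = [(1e3, 1e4), (1e6, 1e7)]
for cfu_a, cfu_b in animals:
    print(competitive_index(cfu_a, cfu_b), cfu_a + cfu_b)
```

Both animals give the same ratio, which is why a plot that keeps both CFU values carries more information than the CI alone.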
LD50 Calculators

Input a tab-delimited text file of surviving mice. LD50 values are calculated by Reed-Muench (command-line interface only) or Moving Average Interpolation (graphical interface).

If you use the LD50 calculators, please cite:
Kim CC, Monack D, Falkow S.
Modulation of virulence by two acidified nitrite-responsive loci of Salmonella enterica serovar Typhimurium.
Infect Immun. 2003 Jun;71(6):3196-205.
Last update: 3/25/03
Reed-Muench source
Reed-Muench Windows executable
MAI source
MAI Windows executable
Sample input file

Molecular Biology

DRACK : Differential Restriction Analysis

This program takes up to 6 FASTA DNA sequence files as input and outputs a tab-delimited text file containing sizes of restriction fragments (which can be opened in a spreadsheet program). Several options are available, including 4-6 base cutters and running in differential or list-all cutters mode. Differential analysis will analyze the sequences and only output restriction enzymes which distinguish between the sequences, while listing all cutters will list even those cutters which do not distinguish between the sequences. This program's primary purpose was to automatically choose sites to distinguish plasmid clones with an insert in two possible orientations.
Requires: Tk, Bioperl
Current Version: 3.1
Last Update: 3/25/02
Source
Windows Executable
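
The difference between the two modes can be sketched as follows, in Python for illustration (not the DRACK source, which uses Bioperl). The sketch assumes linear sequences, non-degenerate recognition sites, and a cut at the first base of each site; the function names are hypothetical. An enzyme counts as differential here if at least two sequences give different fragment-size patterns.

```python
def fragment_sizes(sequence, site):
    """Sorted fragment sizes after cutting a linear sequence at every site."""
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)


def differential_enzymes(sequences, enzymes):
    """Return only the enzymes whose fragment patterns differ between sequences."""
    keep = {}
    for name, site in enzymes.items():
        patterns = [tuple(fragment_sizes(s, site)) for s in sequences]
        if len(set(patterns)) > 1:
            keep[name] = patterns
    return keep


enzymes = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}
seqs = ["AAAAGAATTCAA", "AAGAATTCAAAA"]  # toy sequences, same site at different positions
print(differential_enzymes(seqs, enzymes))
```

Listing all cutters would correspond to returning every enzyme's patterns without the `len(set(patterns)) > 1` check.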

Bioinformatics

FOCK : Frequency of Oligos

Analyzes a FASTA sequence for n-mer frequency (specified by user). Useful for identifying common or rare restriction sites, etc. Command-line interface.
Requires: Bioperl, File::IO
Current version: 1.0
Last update: 11/14/04
Source
Windows Executable (Recompiled 11/14/04)
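
Counting n-mer frequency amounts to sliding a window of length n along the sequence. A minimal Python sketch (not the FOCK source) is shown below; counting overlapping windows on the given strand only is an assumption.

```python
from collections import Counter


def nmer_counts(sequence, n):
    """Count every overlapping n-mer in a sequence (single strand, upper-cased)."""
    seq = sequence.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


counts = nmer_counts("GAATTCGAATTC", 6)
for nmer, count in counts.most_common(3):
    print(nmer, count)
```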
LOCK : Locater of Oligos

Locates user-specified oligonucleotide patterns within a larger sequence. For example, restriction digest sites and fragment sizes can be determined for a plasmid or genome.
Requires: Tk, Win32::FileOp, Statistics::Descriptive, GD
Current version: 1.2
Last update: 11/14/04
Source
Windows Executable
Motif Search

A simple motif searcher. No matrices, nothing fancy, just simple searching for base strings. Allows degenerate bases to be used.
Requires: Tk, Bioperl
Current Version: 0.2
Last Update: 10/17/03
Source
Windows Executable NOTE: This was not working, but has now been updated and tested 1/22/04
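
Searching with degenerate bases can be sketched by translating each IUPAC nucleotide code into a regular-expression character class. This is a Python illustration, not the Motif Search source; the function name is hypothetical, and only the given strand is searched.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(sequence, motif):
    """Return 0-based start positions of every (possibly overlapping) match."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead reports overlapping matches that a plain search would skip.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", sequence.upper())]


print(find_motif("AAGCTTAAGATT", "AAGMTT"))
```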
