Design of 240,000 orthogonal 25mer DNA barcode probes

Qikai Xu, Michael R. Schlabach, Gregory J. Hannon and Stephen J. Elledge

P.N.A.S., (2009)

Downloads

FASTA file of the 240K 25mer barcodes (ZIP archive)

Implementation of the Network Elimination Algorithm

Introduction

The Network Elimination Algorithm select a subset of non-connected vertices from a network. The algorithm is simple, however the implementation can be tricky because it will not be feasible to construct a network with large number of vertices in computer memory. Here we describe an implementation of this algorithm. The key is that the network is not initiated into computer memory, it is rather a “thought” network that actually exists in the BLAST output file.

When BLAST was run with the option set to “-m 8”, the output is in tabular format which list the query and target in the first two columns. A representative fraction of the BLAST output is shown below:

r1_115  r1_115          100.00  25      0       0       45      69      1       25      1e-07   50.1
r1_115  r1_507959       100.00  14      0       0       17      30      20      7       0.41    28.2
r1_115  r4_8034963      100.00  13      0       0       3       15      21      9       1.6     26.3
r1_115  r4_6475537      100.00  13      0       0       23      35      11      23      1.6     26.3
r1_115  r3_8812463      100.00  13      0       0       36      48      1       13      1.6     26.3
r1_115  r3_7963191      100.00  13      0       0       26      38      6       18      1.6     26.3
r1_115  r3_4728731      100.00  13      0       0       8       20      15      3       1.6     26.3
r1_115  r3_1563017      100.00  13      0       0       9       21      23      11      1.6     26.3
r1_115  r4_9915080      100.00  12      0       0       65      76      11      22      6.4     24.3
r1_115  r4_8881354      100.00  12      0       0       68      79      11      22      6.4     24.3
r1_115  r4_8427789      100.00  12      0       0       60      71      12      1       6.4     24.3
r1_115  r4_7539348      100.00  12      0       0       57      68      15      4       6.4     24.3
r1_370  r1_370          100.00  25      0       0       45      69      1       25      1e-07   50.1
r1_370  r4_6135221      100.00  14      0       0       41      54      3       16      0.41    28.2
r1_370  r3_5112696      100.00  14      0       0       60      73      6       19      0.41    28.2
r1_370  r1_507959       100.00  14      0       0       17      30      20      7       0.41    28.2
r1_370  r4_8960831      100.00  13      0       0       66      78      12      24      1.6     26.3
r1_370  r4_8034963      100.00  13      0       0       3       15      21      9       1.6     26.3

We first create two buckets, one for storage of orthogonal probes (the “keeper” bucket), the other for discarded non-orthogonal probes (the “trashbin”). We then read in the BLAST output file line by line. If it is a self match line (query and target are the same sequence) or the HSP length is below 13 bases, we discard the line and continue. Otherwise, we check the query in the keeper bucket, if it’s already there, we put the target in the trashbin and go for next line. Otherwise, we look it up in the trashbin, if it’s there we do nothing and go for next line, if not, we put the query in the keeper bucket and the target in the trashbin and continue on next line. This procedure is shown in the flow chart below:

The algorithm can be easily implemented in PERL or any other programming languages. With this implementation, it only takes minutes to run the select-and-eliminate procedure on even low-end computers.