I was trying to create an algorithm that would take the list of motifs/regular expressions and extract redundant sections out of them.

I think it is easier to explain this with an example:

The regular expressions I am dealing with are of the form:

1. [A B C] [B D] [A C E] [B D E]

2. [A C] [B C] [C E] [B D F]

3. [A B E] [D] [A C D E] [B C E]

Although these motifs are different from each other if one tries to enumerate all possible instances there are overlapping instances e.g.

1 can create ABDB so does 2.

One way to clean the motifs so that they do not overlap at all is to take out intersections of motifs in an algorithm like this:

1. Start from first motif.

2. Find the intersection of the first and second motifs and add the difference between this and the second motif to the list.

3. Add the third motif to the list by adding the difference of 1 and 3's intersection from 3 and then the difference of this newly added element from the intersection of the second element and the newly created third element. So we will have now four elements in our list.

This sounded good at the beginning until I realized that in this fashion I would need $$2^{n-1}$$ elements for and original n number of motifs and it is not feasible. So I need to think about something else.

## Labels

- Bioinformatics (13)
- blog (78)
- General Site (28)
- Ideas (1)
- Multiscale Analysis (1)
- Network Analysis (2)
- Pattern Recognition (2)
- Peptides (4)
- Presentation (1)
- Projects (1)
- Rant (1)
- Research (1)
- Resources (17)
- Technology (4)

## 0 comments:

## Post a Comment