Título Tesis: "Compressed Suffix Tree for Repetitive Collections
based on Block Trees".
Resumen:
Suffix Trees which are capable of solving many problems in stringology. However, a serious problem about these trees is their space usage. Carefully engineered implementations require at least 10 bytes per symbol, something that some applications cannot afford.
A promising line of research to deal with this problem is the construction of compact representations, named Compressed Suffix Trees. Practical implementations based on these ideas face a well known time/space trade-off, some of them achieving as little as ∼ 5 bits per character (less than 1 byte). Despite the progress done, these implementations are not enough to process huge amounts of data, due to their space usage or time complexity. Fortunately, many of the longest text collections are highly repetitive; for example DNA sequences of multiple people differ in less than 0.5% of their content.
This work will be dedicated to adapt and implement Block Trees to use them in new implementations of Compressed Suffix Trees. The Block Tree is a relatively new Lempel-Ziv based index that has shown good results in terms of space and access time on highly repetitive text collections.