The Positional Burrows-Wheeler Transform (PBWT) is a data structure designed for efficiently representing and querying large collections of sequences, such as haplotype panels in genomics. Forward and backward stepping operations, the analogues of LF- and FL-mapping in the traditional BWT, are fundamental to the PBWT and underpin many PBWT-based algorithms for haplotype matching and related analyses. Although the run-length encoded variant of the PBWT (also known as the μ-PBWT) uses O(r̃) words of space, where r̃ is the total number of runs, no data structure supporting both forward and backward stepping in constant time within this space bound was previously known. In this paper, we consider the multi-allelic PBWT, which extends the original binary form to a general ordered alphabet {0, …, σ-1}. We first establish bounds on r̃ and then introduce a new O(r̃)-word data structure, built over a list of haplotypes {S_1, …, S_h}, each of length w, that supports constant-time forward and backward stepping. We further revisit two key applications, haplotype retrieval and prefix search, leveraging our efficient forward stepping technique. Specifically, we design an O(r̃)-word data structure that supports haplotype retrieval in O(log log_w h + w) time. For prefix search, we present an O(h + r̃)-word data structure that answers queries in O(m' log log_w σ + occ) time, where m' denotes the length of the longest common prefix returned and occ denotes the number of haplotypes prefixed by that longest common prefix.
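To make the stepping operations concrete, the following is a minimal Python sketch of a multi-allelic PBWT with naive forward and backward stepping. It is not the O(r̃)-word structure described in the paper: it stores every column uncompressed and each step takes time linear (or worse) in h. All function names (`build_pbwt`, `forward_step`, `backward_step`, `retrieve`, `count_runs`) are hypothetical and chosen only for illustration, and the sketch assumes the usual PBWT convention that column j lists the symbols at site j in the order of the haplotypes sorted by their reversed prefixes.

```python
from typing import List, Tuple


def build_pbwt(haps: List[List[int]], sigma: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Build uncompressed PBWT columns and prefix arrays for a panel over {0..sigma-1}."""
    h, w = len(haps), len(haps[0])
    order = list(range(h))            # prefix array for column 0 (input order)
    columns, orders = [], [order]
    for j in range(w):
        columns.append([haps[i][j] for i in order])
        # A stable partition by the symbol at site j yields the next prefix array.
        buckets: List[List[int]] = [[] for _ in range(sigma)]
        for i in order:
            buckets[haps[i][j]].append(i)
        order = [i for bucket in buckets for i in bucket]
        orders.append(order)
    return columns, orders


def forward_step(col: List[int], i: int) -> int:
    """Map position i in a column to its position in the next column (LF-like)."""
    c = col[i]
    smaller = sum(1 for x in col if x < c)       # symbols that sort before c
    rank = sum(1 for x in col[:i] if x == c)     # earlier occurrences of c
    return smaller + rank


def backward_step(col: List[int], k: int) -> int:
    """Inverse of forward_step: position in this column that maps to k (FL-like)."""
    # Python's sort is stable, so this reproduces the stable partition by symbol.
    return sorted(range(len(col)), key=lambda i: col[i])[k]


def retrieve(columns: List[List[int]], i: int) -> List[int]:
    """Recover haplotype i by reading one symbol per column and stepping forward."""
    out, pos = [], i
    for col in columns:
        out.append(col[pos])
        pos = forward_step(col, pos)
    return out


def count_runs(columns: List[List[int]]) -> int:
    """Total number of maximal equal-symbol runs over all non-empty columns."""
    return sum(1 + sum(a != b for a, b in zip(col, col[1:])) for col in columns)
```

In this sketch, `retrieve` performs w forward steps, which illustrates why constant-time forward stepping leads to a retrieval time dominated by w; the run count returned by `count_runs` corresponds to the quantity the abstract calls r̃.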

Bonizzoni, P., Cozzi, D., Gao, Y. (2026). Optimal-Time Mapping in Run-Length Compressed PBWT. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026) (pp. 1-18).

Optimal-Time Mapping in Run-Length Compressed PBWT

Bonizzoni, P; Cozzi, D; Gao, Y
2026

paper
PBWT, LF-Mapping, prefix searches, run-length encoding
English
37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026) - June 15-17, 2026
2026
Bille, P; Prezza, N
37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)
9783959774208
2026
1
18
Ch. 22
https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.22
open
Files in this item:
File: Bonizzoni et al-2026-CPM-VoR.pdf
Access: open access
Attachment type: Publisher's Version (Version of Record, VoR)
License: Creative Commons
Size: 980.37 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/611863