Motivation

The analysis of bacterial isolates to detect plasmids is important due to their role in the propagation of antimicrobial resistance. In short-read sequence assemblies, both plasmids and bacterial chromosomes are typically split into several contigs of various lengths, making identification of plasmids a challenging problem. In plasmid contig binning, the goal is to distinguish short-read assembly contigs based on their origin into plasmid and chromosomal contigs and subsequently sort plasmid contigs into bins, each bin corresponding to a single plasmid. Previous works on this problem consist of de novo approaches and reference-based approaches. De novo methods rely on contig features such as length, circularity, read coverage, or GC content. Reference-based approaches compare contigs to databases of known plasmids or plasmid markers from finished bacterial genomes.

Results

Recent developments suggest that leveraging information contained in the assembly graph improves the accuracy of plasmid binning. We present PlasBin-flow, a hybrid method that defines contig bins as subgraphs of the assembly graph. PlasBin-flow identifies such plasmid subgraphs through a mixed integer linear programming model that relies on the concept of network flow to account for sequencing coverage, while also accounting for the presence of plasmid genes and the GC content that often distinguishes plasmids from chromosomes. We demonstrate the performance of PlasBin-flow on a real dataset of bacterial samples.

Availability and implementation

https://github.com/cchauve/PlasBin-flow.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.