This dataset is an example of PacBio circular consensus sequencing (CCS) data, which we used for data analysis and for software and algorithm development.
It comes from a project on de novo sequencing of MSG isoforms in human Pneumocystis. Pneumocystis is a fungus that can infect people with weakened immune systems, such as cancer patients and AIDS patients. The infection severely reduces these patients' quality of life and shortens their lives. MSG, the major surface glycoprotein, is located on the surface of Pneumocystis and is believed to be critical for its invasion of the host. Yet little is known about the MSG genes, which is why we set out to sequence them.
The Pneumocystis genome carries about 50 to 100 copies of the MSG gene, which share 80% to 90% sequence similarity. This high similarity severely hindered assembly when we tried short-read sequencing platforms. The PacBio RS can produce long, accurate reads, which greatly facilitated our MSG sequencing.
The example data contains a mixture of 10 known 1.5 kb MSG fragments, which served as a control in our PacBio sequencing. Researchers can use it to test our QC software and our upcoming algorithm for improving read quality.