The PhytoREF database contains 6490 plastidial 16S rDNA reference sequences originating from a large diversity of eukaryotes that represent all known major photosynthetic lineages. We compiled 3333 amplicon sequences available from public databases and 879 sequences extracted from plastidial genomes, and generated 411 novel sequences from cultured marine microalgal strains belonging to different eukaryotic lineages. A total of 1867 environmental Sanger 16S rDNA sequences were also included in the database. Stringent quality filtering and a phylogeny-based taxonomic classification were applied to each 16S rDNA sequence. The database focuses mainly on marine microalgae, but sequences from land plants (representing nearly half of the PhytoREF sequences) and freshwater taxa were also included to broaden the applicability of PhytoREF to different aquatic and terrestrial habitats.
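As a quick consistency check on the figures above, the four sequence sources can be tallied and expressed as shares of the whole database. This is only an illustrative sketch: the dictionary labels and the `share` helper are hypothetical names introduced here, and the land plant count is the streptophyte figure quoted in the class-level composition caption.

```python
# Sequence counts by origin, as stated in the database description.
sources = {
    "public amplicon sequences": 3333,
    "plastid genome extracts": 879,
    "novel sequences from cultured strains": 411,
    "environmental Sanger sequences": 1867,
}

total = sum(sources.values())


def share(count, whole):
    """Return count as a percentage of whole, rounded to one decimal place."""
    return round(100 * count / whole, 1)


for name, count in sources.items():
    print(f"{name}: {count} ({share(count, total)}%)")
print(f"total: {total}")

# Streptophyte (land plant) count, taken from the class-level composition caption.
land_plants = 2973
print(f"land plants: {share(land_plants, total)}% of all sequences")
```

The four sources sum to the stated database size, and land plants account for somewhat less than half of all sequences.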

Distribution and number of PhytoREF plastidial 16S rDNA sequences in the tree of eukaryotic life. The schematic phylogenetic tree is based on up-to-date phylogenomic and morphological evidence (Burki & Keeling 2014). Each plastid-containing eukaryotic lineage is highlighted in green, and the number of plastidial 16S rDNA sequences available in the PhytoREF database is indicated in small grey circles.

Taxonomic composition of the PhytoREF database at the class level. Bar charts represent the number of PhytoREF plastidial 16S rDNA sequences and of taxonomically described families, genera and species present in a given class. Several key groups of microalgae, such as the prasinophytes (clade VII) and the rappemonads, lack a full taxonomic description. Streptophytes (land plants), which are represented by 2973 sequences (373 families and 796 genera), are not shown here for clarity.

Roscoff Culture Collection; Centre National de la Recherche Scientifique; Station Biologique de Roscoff; Oceanomics; Université Pierre et Marie Curie