The recently released draft horse genome is incompletely characterised in terms of its repetitive element profile. This paper presents a characterisation of the endogenous retroviruses (ERVs) of the horse genome based on a data-mining strategy using murine leukaemia virus proteins as queries. In total, 978 ERV gene sequences were identified, representing the gamma-, epsilon- and betaretrovirus genera. At least one full-length gammaretroviral locus was identified, although the gammaretroviral sequences are highly degenerate. Using these data, the RNA expression of these ERVs was derived from RNA transcriptome data from a variety of equine tissues. In contrast to the well-studied human and murine ERVs, no particular phylogenetic groups of equine ERVs appear to be more transcriptionally active. This novel approach provided a more technically feasible method to characterise ERV expression than those used in previous studies.
Highlights
► A large number of epsilon-like ERVs were identified in the horse genome via data mining.
► Further gammaretroviral and betaretroviral ERV groups were also identified.
► At least one full-length gammaretroviral locus was identified.
► RNA expression of ERVs was quantified using RNA transcriptome data.
► Phylogenetic analysis did not show clustering of transcriptionally active viruses.