
Convergent evolution of type I antifreeze proteins from four different progenitors in response to global cooling

Abstract

Alanine-rich, alpha-helical type I antifreeze proteins (AFPs) in fishes are thought to have arisen independently on at least four occasions within the last 30 Ma. This hypothesis has recently been proven for flounder and sculpin AFPs, both of which originated by gene duplication and divergence followed by substantial expansion of gene copy number. Here, we examined the origins of the cunner (wrasse) and snailfish (liparid) AFPs. The cunner AFP arose by a similar route, through the duplication and divergence of a GIMAP gene. The coding region of this AFP stems from an alanine-rich region flanking the GTPase domain of GIMAPa. The AFP gene has remained within the GIMAP gene locus, where it has been amplified along with the GIMAPa gene. The AFP gene originated after the cunner diverged from its common ancestor with the closely related spotty and ballan wrasses, which show similar gene synteny but lack AFP genes. Snailfish AFPs also appear to have evolved recently, as they are confined to a single genus of this family. In these AFP-producing species, the AFP locus shares no similarity with functional genes. Instead, it is replete with repetitive DNA and transposons, several stretches of which could encode alanine tracts with a dominant codon (GCC) that matches the bias observed in the AFP genes. All four known instances of type I AFPs in fishes are independent evolutionary events that occurred soon after the onset of Northern Hemisphere Cenozoic glaciations. Collectively, these results provide a remarkable example of convergent evolution to one AFP type.


Introduction

Ice-binding proteins (IBPs) share a common ligand, namely ice, but have a variety of functions, including anchoring to ice, controlling the growth of ice channels, preventing recrystallization of ice in the frozen state, or preventing internal ice growth in freeze-intolerant organisms. They are found in a small percentage of known species scattered throughout the tree of life, including bacteria, diatoms, insects, plants and fish [1, 2]. When they are used to prevent freezing, they are generally called antifreeze proteins (AFPs), and they act by irreversibly binding to ice crystals, lowering their non-equilibrium freezing point [3, 4].

Fish living in ice-laden regions of the ocean often employ AFPs because, without them, their body fluids would freeze at a subzero temperature higher than the freezing point of seawater [5]. To date, four types of AFPs (types I, II, III, and IV) and one antifreeze glycoprotein (AFGP) have been described in fish [6]. Type III AFP is a globular protein derived from the C-terminal domain of sialic acid synthase [7,8,9], and it is found exclusively in the infraorder Zoarcales [10]. Type II AFP is also globular, but it is derived from a C-type lectin [11, 12]. It is found in three separate fish orders that diverged from each other more than 200 Ma, and it was gained in two of these orders through lateral gene transfer [13, 14]. Type IV AFP was identified in longhorn sculpin serum and shows similarity to apolipoproteins that form helical bundles [15]. However, its serum concentration is insufficient for freeze protection [16], and type I AFPs were subsequently found in the skin of this fish [17]; therefore, the function of type IV AFP may be unrelated to freeze protection.

In contrast to the AFPs described above, type I AFP and AFGP are non-globular. AFGPs are extremely repetitive, containing between 4 and ~50 tripeptide repeats, mostly Ala-Ala-Thr, with an O-linked disaccharide on each Thr residue [18]. AFGP adopts a polyproline type II structure in solution [19] and is found in fishes at both poles. The AFGP gene of Antarctic fishes in the suborder Notothenioidei was derived from a duplicated trypsinogen gene [20]. In this case, the bulk of the gene was lost, but the signal peptide and 3ʹ UTR were retained, along with a 9-bp segment encoding Ala-Ala-Thr that spans the start of the second exon; this segment was amplified many times. Northern cods of the suborder Gadoidei were found to produce similar AFGPs [21], but in this group, they arose from non-coding DNA [22]. Type I AFPs are Ala-rich α-helical proteins that are somewhat less repetitive than AFGPs. Many of these proteins have 11-a.a. repeats, each delineated by a single non-glycosylated Thr residue [23, 24]. The Ala residues that dominate one side of the helix are well conserved, but residues on the other side are more variable. Like AFGPs, they vary in length, from 33 to 195 a.a. [24, 25]. They are found in three different orders and four different families of fish: the flounders (order Pleuronectiformes) [26,27,28], cunner (Tautogolabrus adspersus, order Labriformes, family Labridae) [29], snailfish (Liparis atlanticus and L. gibbus, order Perciformes, family Liparidae) [30] and a number of sculpins, including the shorthorn sculpin (Myoxocephalus scorpius, family Cottidae) [17, 24, 31, 32] (Fig. 1). The shorter type I variants are less active and form isolated helices, often with N- and C-terminal modifications [33, 34], whereas the longest, hyperactive variant (Maxi) folds in half and dimerizes to form a four-helix bundle stabilized by internal waters [35].

Fig. 1

Divergence of AFP-producing fishes during changing climatic conditions. The relationships between the fishes or fish groups shown, all within the clade Percomorphaceae, were obtained from a time-calibrated phylogeny of almost 2000 fishes [84] or of more than 200 labrids [85]. Most species within this clade do not produce AFPs; only those examined in this study are included (black font). Cartoon graphics of the AFP types, generated using PyMOL [87], are shown along each branch. The timing and intensity of the warmest (red) and coldest (blue) climatic periods of the last 125 Ma are indicated above the time scale [83]

Type I AFPs may have pro-peptides and/or signal peptides, or they may have neither. All three variations are found in flounder AFPs [25, 26, 36], whereas cunner, sculpin and snailfish AFPs lack signal peptides and are still secreted [17, 29, 30, 32]. Most of these AFPs have Thr residues at 11 a.a. intervals, while those from snailfish [30] and one from shorthorn sculpin [37] do not. Interestingly, the skin isoforms of flounder are more similar to sculpin sequences than to either the liver or maxi isoforms. However, the patchy taxonomic distribution of type I-producing fishes (Fig. 1), lack of similarity between untranslated regions (UTRs) and differential codon usage led us to suggest that these proteins were not homologous and that their similarities arose through convergence [38].

Phenotypic similarities frequently arise independently, often but not exclusively when similar environmental challenges are encountered, and Michael P. Speed and Kevin Arbuckle argued that “analyses of convergence should typically be paired with broader investigations of the evolutionary history of the trait” [39]. For type I AFPs, this requirement has now been met: the last piece of the puzzle needed to demonstrate their convergence unequivocally in these four fish groups, namely, how each of them arose, is now available for all four lineages. We recently determined that the flounder AFP arose from Gig2 [40], which encodes a protein involved in viral resistance [41]. Subsequently, we found that the sculpin AFP arose from lunapark [42], a gene whose protein resides within the endoplasmic reticulum and stabilizes membrane junctions in this organelle [43]. Here, we demonstrate that the cunner AFP also arose from a different pre-existing gene, confirming its origin via convergent evolution. The snailfish AFP gene does not resemble any other gene locus. Instead, its coding sequence is tightly flanked by transposons from which it likely originated.

Methods

BLAST searches and databases

Known type I AFP sequences (nucleotide and protein) were used as queries at the BLAST interface of NCBI [44], limiting the taxonomic range to teleost fishes. Low-complexity filters, masks and/or compositional adjustments were turned off, and moderate stringency was selected (BLASTN discontiguous megablast, or BLASTP with the BLOSUM45 matrix) to detect more divergent sequences. High-throughput genomic and transcriptomic sequence datasets were accessed through the NCBI SRA portal (https://www.ncbi.nlm.nih.gov/sra), and genome assemblies were accessed through the genome portal (https://www.ncbi.nlm.nih.gov/genome/).

Analysis of cunner and snailfish AFP sequences

A 250-kb segment of chromosome 4 that contained AFP genes was downloaded from NCBI (GenBank JAJGRF010000003.1, 6,121,000 to 6,371,000 bp) from the representative genome of cunner (GCA_020745675.1) [45]. Gene annotation was performed in SnapGene Viewer (from Insightful Science; available at snapgene.com) using BLASTn to identify AFP coding sequences and GeneWise [46] to identify AFP-adjacent genes based on homologs in the annotated genomes of the closely related ballan wrasse (Labrus bergylta GCA_900080235.1, scaffold NW_018114954.1, 83,944 to 184,596 bp) [47] and New Zealand spotty (Notolabrus celidotus GCA_009762535.1, chromosome 7, NC_048278.1, 5,367,202 to 5,491,088 bp). Data use policy: https://genome10k.ucsc.edu/data-use-policies/.

A similar process was used to annotate a 70-kb segment of chromosome 11 from Tanaka’s snailfish (JAYMGU010000011.1, 3,810,001 to 3,880,000 bp) from assembly GCA_036178185.1 [48], which was compared with a 50-kb annotated segment of chromosome 8 from the hadal snailfish (NC_079395, 17,210,001 to 17,260,000 bp) from assembly GCF_029220125.1 [48]. Another 86.7-kb segment (JBEEID010000351.1, 1861 to 88,524) from the long-read dusky snailfish assembly (GCA_040955725.1) was also annotated [49]. Repetitive elements within these sequences, defined as those found > 30 times within any single Cottid genome, were identified using BLASTn searches. The majority were matched to known transposable elements using the Dfam sequence search tool, which was also used to identify low-complexity sequence (https://www.dfam.org/search/sequence) [50]. Promoter elements were predicted with the online tool ‘YAPP Eukaryotic Core Promoter Predictor’ (http://www.bioinformatics.org/yapp/cgi-bin/yapp.cgi).
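The genomic segments above, and the AFP coding sequences listed in the Fig. 6 caption, are given as 1-based, inclusive GenBank coordinates, and some are written from a higher to a lower number (e.g. "42,139 to 41,882"), which denotes the minus strand. The sketch below shows that bookkeeping in plain Python. The function names are hypothetical and are not part of SnapGene, GeneWise or any NCBI tool.

```python
# Hypothetical helpers for 1-based, inclusive GenBank-style coordinates.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_region(seq: str, start: int, end: int) -> str:
    """Return bases start..end (1-based, inclusive) of seq.

    If start > end, the region is assumed to lie on the minus strand, so the
    reverse complement of bases end..start is returned.
    """
    if start <= end:
        return seq[start - 1:end]
    return reverse_complement(seq[end - 1:start])


def region_length(start: int, end: int) -> int:
    """Number of bases covered by 1-based, inclusive coordinates (either orientation)."""
    return abs(end - start) + 1
```

For example, `region_length(3_810_001, 3_880_000)` is exactly 70,000 bp, and `region_length(34_522, 34_758)` is 237 bp, i.e. 79 codons, consistent with a 78-a.a. AFP plus a stop codon.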

Phylogenetic comparisons of the GIMAP sequences

The coding sequences of the GIMAP genes of the ballan wrasse were verified or reannotated based on matching reads from transcriptomic sequences (accession SRR5454465) from the NCBI Sequence Read Archive (SRA). The spotty GIMAP genes were reannotated, when necessary, by comparison with teleost GIMAPs from the non-redundant (nr) protein database. The sequences of the GTPase domains of the GIMAPs of the cunner, spotty and ballan wrasses were aligned in SeaView version 5.0.1 [51] and are shown in Supplementary Fig. 1. A maximum-likelihood phylogenetic tree was generated in MEGA 10.1.8 [52] using the JTT + G + I model with 100 bootstrap replicates.
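MEGA performs bootstrapping internally. To illustrate what each of the 100 replicates involves, the sketch below implements only the resampling step of the standard nonparametric bootstrap: alignment columns are drawn with replacement to build a pseudo-replicate alignment of the same length. Tree inference on each replicate and the tallying of clade support are not shown, and the function name is hypothetical.

```python
from __future__ import annotations

import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return one bootstrap pseudo-replicate of a multiple sequence alignment.

    Columns are sampled with replacement; every sequence uses the same sampled
    column indices, so each new column is an intact column of the original.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}
```

Under this scheme, the bootstrap value at a node is the percentage of replicate trees that recover that clade.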

Derivation of codon usage statistics

The frequency at which Ala codons were used in various sequences was determined using the online Codon Usage Calculator from Biologics International Corp. (Indianapolis, IN, USA, https://www.biologicscorp.com/tools/CodonUsageCalculator/). The sequences used for cunner were the combined coding sequences of the 11 AFP isoforms as well as the segment encoding the Ala-rich C-terminal region of GIMAP-a5 (56 a.a.). For the ballan wrasse, the sequence encoding the last 114 a.a. of GIMAP-a1 was used. For snailfish, the seven sequences encoding the AFPs shown in Fig. 6 were used, as well as the consensus sequence of a Danio rerio copia element DF000003478.1 [50] and an intronic region with Ala-coding potential shown in Fig. 5C (JBEEID010000351 bases 79,339 to 80,166) [49]. Ala codon usage in teleost fishes was taken from the CoCoPUTs database [53], in which 5,376,783 teleost coding sequences had been analyzed as of March 3, 2022 (https://dnahive.fda.gov/dna.cgi?cmd=codon_usage&id=537&mode=cocoputs).
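The calculation behind these percentages is simple: Ala codons (GCT, GCC, GCA, GCG) are counted in frame and each is expressed as a fraction of all Ala codons, pooled across the sequences analyzed. The sketch below reproduces that arithmetic locally, assuming each input is an in-frame coding sequence starting at its first base. It is not the Biologics calculator or the CoCoPUTs pipeline, and the function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

ALA_CODONS = ("GCT", "GCC", "GCA", "GCG")


def ala_codon_usage(cds_seqs: Iterable[str]) -> tuple[int, dict[str, float]]:
    """Pool Ala codons from in-frame coding sequences.

    Returns the total number of Ala codons and the fraction encoded by each of
    GCT, GCC, GCA and GCG. Each sequence is read in frame from its first base;
    trailing bases that do not form a complete codon are ignored.
    """
    counts: Counter = Counter()
    for cds in cds_seqs:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in ALA_CODONS:
                counts[codon] += 1
    total = sum(counts.values())
    fractions = {c: (counts[c] / total if total else 0.0) for c in ALA_CODONS}
    return total, fractions
```

Pooling counts before dividing, rather than averaging per-isoform percentages, weights each Ala codon equally, which matches the description of combining the coding sequences of all 11 cunner isoforms.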

Fig. 2

AFP locus in cunner compared to the syntenic location in two other wrasses. A) The cunner AFP locus on chromosome 4, showing the locations of the 11 AFP genes (cyan) relative to the GIMAPa genes (dark yellow) and GIMAPb genes (red). The arrows indicate gene orientation (all AFP genes and GIMAPa genes are transcribed left to right) and span the coding region and intron(s). Flanking genes are numbered consecutively in gray. The scale bar shows a 20-kb stretch of DNA. The GenBank accession number for the region shown is JAJGRF010000003.1, bases 6,121,000 to 6,371,000. B) Ballan wrasse locus colored as above, GenBank accession NW_018114954.1, bases 83,944 to 184,596. C) Spotty locus, colored as above, with GIMAPc genes in dark red, GenBank accession NC_048278.1, bases 5,367,202 to 5,491,088. The unrelated flanking genes, numbered sequentially, are solute carrier family 25 member 10-like (SLC25A10) and claudin-9-like (CLDN9) at the 5ʹ side and Ras-related dexamethasone-induced 1-like (RASD1), MYCBP-associated protein-like (MYCBPAP) and epsin 3-like (EPN3) at the 3ʹ side. D) Comparison of the cunner AFP11 and GIMAP-a5 genes. The percent identity between the homologous regions (gray shading) is indicated, with the two exons (wide) and intron (narrow) of each gene in cyan or dark yellow. The portion of the second exon of GIMAP-a5 that contains an Ala-rich region is shaded darker. The scale bar shows a 0.5-kb stretch of DNA

Results

Part 1: Cunner

Cunner AFPs reside at a single locus

The cunner reference genome, included in the Vertebrate Genomes Project [54], was screened for AFP sequences using the cunner cDNA sequence [29]. Matches were found at a single location, spanning 133 kbp, on chromosome 4 (Fig. 2A). Because this genome was not annotated, the eleven AFP genes found here were annotated based on the known cDNA sequence (Fig. 2A, cyan arrows). Seven interspersed genes (yellow and red arrows) and five flanking genes (gray arrows) in the immediate neighborhood were also identified and annotated based on the annotated genomes of the spotty and ballan wrasses [54], two closely related fishes in the same family (Labridae, commonly called wrasses). The microsynteny of the flanking genes is conserved among the three species (Fig. 2, gray arrows), with the five encoded proteins sharing 89–99% identity between the cunner and ballan wrasse. As expected, identities between the cunner or ballan wrasse and the more distantly related spotty are lower, ranging from 73 to 95%. The two other cunner assemblies (pseudohaplotype GCA_020745675.1 and GCA_024362835.1) were incomplete in this region, underscoring the difficulty of assembling multigene families.

Fig. 3

Phylogenetic relationships of the GIMAP proteins found in the three wrasses via maximum-likelihood analysis of an alignment of the GTPase domains (Supplementary Fig. 1). The coloring of the labels matches the coloring of the genes in Fig. 2. The bootstrap values (%) are shown at each node. Note that cunner-a1 and -a4 are identical

AFP genes in the cunner are interspersed with GIMAP genes and share sequence similarity

A total of seven proteins belonging to the GTPase IMAP family (GIMAPs) were encoded by genes interspersed among the AFP genes of the cunner (Fig. 2A). The ballan wrasse and spotty each had six GIMAPs (Fig. 2B-C), but AFP genes were not present at these loci or elsewhere in these genomes. The AFP genes share both proximity and sequence similarity with the GIMAP genes. The pair with the highest similarity was AFP11 with GIMAP-a5, where five segments had identities ranging from 73 to 98% (Fig. 2D). These segments lie both upstream and downstream of the coding sequence and overlap the first exon and the majority of the intron. The most notable difference between the loci is that the majority of the coding sequence within exon 2 is absent from the AFP. These similarities are sufficient to indicate that the AFP gene arose from a duplicated GIMAP-a gene.

Before the GIMAP genes were compared between the cunner, spotty and ballan wrasse, errors in the automated annotation of this repetitive gene family were corrected as described in the Methods. The accession numbers and sequences, if modified, are shown in Supplementary Table 1. Phylogenetic analysis indicated that the GIMAPs of these three species cluster into three groups, herein labeled type a, b or c (Fig. 3). Type c is restricted to spotty (four isoforms), which has just one each of the type a and type b isoforms. Ballan wrasse and cunner have two divergently transcribed type b genes and four or five type a genes, respectively. These type a proteins cluster along species lines, with shorter branch lengths between the cunner isoforms, indicating that these genes were duplicated after the divergence of these two lineages and that this occurred more recently in the cunner. Taken together, these findings indicate that the GIMAP gene family is dynamic and that the AFP genes arose from a GIMAP-a gene, with subsequent tandem amplification of both genes within the cunner lineage.

Fig. 4

Sequences, models, and codon usage of cunner AFPs and Ala-rich C-terminal regions of GIMAP genes. A) AFP isoforms aligned with Ala highlighted in yellow, acidic and basic residues in red and blue font, respectively; Gly and Pro highlighted in pink, polar residues other than Thr highlighted in green, and aliphatic residues highlighted in gray, with the spacing of Thr residues (black highlighting) indicated above and asterisks indicating 100% conservation below. The last two residues (faded gray) are naturally removed when the C-terminus is amidated [29]. The AFPs are numbered sequentially as they appear in JAJGRF010000003.1, bases 6,121,000 to 6,371,000. B) Models of the long (AFP-1) and short (AFP-2) cunner AFPs generated using AlphaFold2-Colab [88] and rendered using PyMOL [87]. Residues are colored as above but with backbone atoms in light gray, Thr in green and other polar non-charged residues in dark green. Two views related by a 180° rotation are displayed, with their termini marked N and C. C) Ala-rich C-terminal regions of Cunner-A5 and Ballan-A1 GIMAPs relative to the shorter Cunner A isoforms. The Ala and Thr residues within the extension are colored as in A above, with the end of the GTPase domain in italics. Conservation between the unambiguously aligned residues of the four isoforms with extensions is indicated below the alignment by asterisks (all four sequences identical) or dots (three of four sequences identical). The beginnings of the two segments used to derive Ala codon usage are indicated with arrows. Accession numbers and reannotated sequences are given in Supplementary Table 1. D) Ala-codon usage of cunner AFPs (all isoforms, 322 codons) compared to the Ala-rich extensions of Cunner-A5 (26 codons), Ballan-A1 (75 codons), and Ala codons sampled from more than 5 million coding sequences from teleost fishes

The eleven cunner AFPs are highly similar

The eleven AFP genes encode four AFP isoforms (Fig. 4A). Seven of these genes (AFP2, AFP4–8 and AFP10) encode a protein that matches the previously characterized sequence [29]. Over 50% of the residues are Ala, and with one exception, Thr is spaced at 11-residue intervals. AFP9 differs from the main sequence at a single position (residue 4, Gly to Arg), while AFP11 contains one additional 11-a.a. repeat. Two genes (AFP1 and AFP3) encode identical isoforms that match AFP11, except that they contain an 18-a.a. insertion in which the Thr residues are spaced 18 residues apart.
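Checking Thr spacing in a one-letter protein sequence amounts to listing the distances between consecutive Thr residues. The minimal sketch below does this; the toy sequence in the usage note is hypothetical and only mimics the pattern described for AFP1 and AFP3 (an 11-residue spacing followed by an 18-residue one). It is not a real cunner AFP sequence.

```python
from __future__ import annotations


def thr_spacings(protein: str) -> list[int]:
    """Distances between consecutive Thr (T) residues in a one-letter protein sequence."""
    positions = [i for i, aa in enumerate(protein.upper()) if aa == "T"]
    return [b - a for a, b in zip(positions, positions[1:])]
```

For the hypothetical sequence `"T" + "A" * 10 + "T" + "A" * 17 + "T"`, the function returns `[11, 18]`.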

Fig. 5

A comparison of snailfish AFP genes with respect to repeats, sequence identities, and codon biases. A) Schematics of an AFP-containing gene locus from dusky snailfish (GenBank accession JBEEID010000351 bases 1861 to 88,664) with the entire region, including the flanking genes ETV6 and PARP12, shown on top. An expansion of the AFP-containing region is shown beneath this in two segments. AFP coding sequences are indicated with blue arrows, repetitive elements with bars, and simple repeats with narrow red bars. Bars of the same color indicate that the repetitive elements are homologous and unique elements are shown in alternating shades of light and dark grey. The matching regions of AFP3 and the inverted AFP4 are indicated by black lines beneath. Segments corresponding to fragments that match the PARP12 gene are colored dark red and labelled. B) Characteristics of the locus encoding Tanaka-1 (GenBank accession JAYMGU010000011.1, 3811k-3878k), showing flanking genes (light brown). The expansion of the AFP-containing region is colored as above, with inverted repeats indicated with arrows and the matching region of the AFP and pseudogene by black lines beneath. C) Detailed schematics of four genes encoding AFPs from above. Repetitive elements identified as transposable elements (TEs) by Dfam [50] are indicated with wider bars and colored by type, other repetitive sequences are indicated by narrower bars with color indicating similarity. Matching segments are indicated with gray shading, with percent identity indicated. D) Ala codon usage in the AFPs from Fig. 6, the Ala-rich segment within intron 3 of PARP12 and a D. rerio Copia transposon

AlphaFold2 models of the longest (AFP-1) and shortest (AFP-2) isoforms, which differ by 29 residues, are very similar (Fig. 4B). Both form extended amphipathic α-helices with a hydrophobic, Ala-rich surface punctuated by Thr residues. The other side of the helix is also enriched in Ala, but it carries all of the charged residues, many of which appear to form helix-stabilizing salt bridges. The disruption of the 11-a.a. spacing of the Thr residues by one 18-a.a. segment in the long isoform is an interesting deviation. An exact periodicity of 11 a.a. corresponds to 11 residues/3 turns, or 3.67 residues/turn, whereas the typical α-helix has 18 residues/5 turns, or 3.60 residues/turn. An examination of winter flounder AFP isoforms, including the crystal structure of a short isoform [34] and the NMR structure of an engineered variant [55], as well as the crystal structure of the hyperactive isoform [35], revealed that residues at 11-a.a. intervals precess slightly around the helix, because the periodicity of these helices approaches ~3.65 residues/turn rather than 3.67. Therefore, the 18-a.a. insertion serves to counteract this drift, bringing the Thr residues back into register (Fig. 4B, top).
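The register argument can be made concrete with a little arithmetic. On an ideal helix with r residues per turn, each residue rotates by 360°/r, so residues n apart differ in angular position by n × 360°/r modulo 360°. The sketch below computes that offset, assuming a perfectly regular helix; real helices bend and vary locally, so these numbers only illustrate the direction and rough size of the drift.

```python
def phase_offset(n_residues: int, residues_per_turn: float) -> float:
    """Angular offset in degrees, wrapped to [-180, 180), between residue i and
    residue i + n_residues on an ideal helix with the given residues per turn."""
    degrees = n_residues * 360.0 / residues_per_turn
    return (degrees + 180.0) % 360.0 - 180.0
```

At 3.60 residues/turn, an 11-residue step is offset by +20° and an 18-residue step by 0°; at 11/3 (3.67) residues/turn, the 11-residue step is exactly in register. At ~3.65 residues/turn, each 11-residue step drifts by about +5°, whereas an 18-residue step is offset by about −25°, i.e. in the opposite direction, which is the sense in which an 18-a.a. segment can offset accumulated drift.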

Cunner AFP arose from the C-terminus of the GIMAP-a protein

The Ala content of the GIMAP proteins is generally low. For example, cunner GIMAPa-1 has only seven Ala residues, making up 4% of the total. The four exceptions to this are cunner GIMAPa-5 (12%), ballan wrasse GIMAPa-1 (24%) and GIMAPa-4 (11%), plus spotty GIMAPa (10%). Their Ala-richness is restricted to the C-terminal region, which is outside of the GTPase domain, as shown in Fig. 4C. These extensions are present in all three wrasses being compared, whereas AFPs are found only in the cunner, so the Ala-rich extension arose prior to the AFP.
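The composition figures used here and in the following sections (e.g. 27 Ala residues in a 37-residue helix is about 73%) are simple residue percentages. A minimal sketch of that calculation is shown below; the function name is hypothetical, and it treats every character of the one-letter sequence as a residue.

```python
def residue_percent(protein: str, residues: str = "A") -> float:
    """Percentage of positions in a one-letter protein sequence occupied by any of the given residues."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    targets = residues.upper()
    hits = sum(1 for aa in protein if aa in targets)
    return 100.0 * hits / len(protein)
```

Passing several letters, such as `residues="GP"`, gives the combined percentage of those residues.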

While these extensions are rich in Ala, they lack the periodicity of the Thr residues and contain more Gly and fewer charged residues than the AFPs. An AlphaFold2 model of the isoform with the longest C-terminal extension (Ballan-a1, Fig. 4C) predicts three α-helical segments in this region (Supplementary Fig. 2A), with the last spanning 37 aa (underlined in Fig. 4C) with 27 Ala residues (73%). The Ala-rich region of the shorter extension of Cunner-a5 is predicted to be unstructured (Supplementary Fig. 2B). Nevertheless, there is sufficient sequence similarity (Fig. 2D, darker yellow) to indicate that the Ala-rich extension gave rise to the AFP. The partial overlap of two of the matches is consistent with an internal duplication within the longer AFP11 allele. Interestingly, the similarity between the coding sequences was lower than that between the non-coding regions, consistent with positive selection of the AFP for its new function. Taken together, these data indicate that the AFP arose from a duplicated GIMAP-a gene containing an Ala-rich extension from which the GTPase domain was lost.

Ala codon usage of GIMAP-a and AFPs is similarly atypical

A further line of evidence that supports the Ala-rich extension of GIMAP-a as the progenitor of the AFP is that they share a similar codon usage bias. Cunner AFPs are unique among the type I AFPs in that Ala is preferentially encoded by GCT (72%), rarely by GCC (< 1%), and not at all by GCG (Fig. 4D). In contrast, in teleost fishes, GCT and GCC each encode approximately one-third of all Ala residues, while GCG encodes 11%. A similar bias is observed in the Ala-rich extensions, with Ballan-a1 employing GCT almost exclusively. This GCT bias is not observed in the flanking genes of any of these fishes (not shown), indicating that it is a characteristic of the C-terminal extension of the GIMAP-a genes that was retained in the AFP genes.

Part 2: Snailfish

AFP sequences are only present in one genus of Liparidae (snailfishes)

BLAST searches of genome sequences, the transcriptome shotgun assembly, and a selection of SRA datasets using both cDNA and protein sequences from snailfish AFPs [30, 56] revealed that in addition to species previously known to produce AFPs, namely, Atlantic (Liparis atlanticus), dusky (L. gibbus) and Tanaka’s (L. tanakae) snailfish, they are also found in L. liparis and L. tunicatus. Similar searches failed to identify homologs in other members of the same family (Liparidae) in different genera (Supplementary Table 2). However, the low complexity of snailfish AFPs (55–61% Ala, encoded primarily by GCC) means that divergent AFP sequences are difficult to identify.

Snailfish AFPs are members of a multigene family

There are genome assemblies for both Tanaka’s snailfish and the dusky snailfish that were generated from long-read sequences. The dusky contig-level assembly was generated from PacBio sequences [49] and the Tanaka chromosome-level assembly from Oxford Nanopore sequences [48]. A dusky locus containing three AFP genes and a putative pseudogene is shown in Fig. 5A. Three AFP loci were previously identified from a short-read Tanaka genome assembly from a fish isolated from the Sea of Japan [56], but only one, located on chromosome 11, was found in the long-read genome assembly of the fish from the Yellow Sea (Fig. 5B) [48].

Fig. 6

Alignment of snailfish AFP sequences colored and annotated as in Fig. 4A. The GenBank accession numbers of the DNA sequences encoding these isoforms are: Atlantic, from cDNA, AY455862 [30]; Dusky-1, from a transcriptome, MT678484 [56]; Dusky-2, from cDNA, AY455863 [30]; Dusky-3 to -5, from genomic DNA, JBEEID010000351.1 bases 34,522 to 34,758, 42,139 to 41,882, and 75,299 to 75,556 [49]; and Tanaka-1, from chromosome 11, JAYMGU010000011.1 bases 3,855,161 to 3,855,574 [48]

Snailfish AFPs vary in length and sequence and largely lack regular Thr periodicity

An alignment of seven AFP sequences revealed three size classes: 78 to 85 a.a., 113 to 116 a.a., and 137 a.a. (Fig. 6). The Atlantic and Dusky-2 sequences are almost identical, with four differences restricted to the C-terminus. The three dusky sequences from the same locus (Dusky-3, -4 and -5, Fig. 6) share 85 to 91% identity with one another, whereas all other pairings fall below 70% identity.
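Pairwise identities such as these depend on how gapped columns are treated, and alignment tools differ on this. As an illustration only, the sketch below computes percent identity from two already-aligned sequences, assuming the convention that columns with a gap in either sequence are excluded from the denominator. The function name is hypothetical and the convention may differ from the one underlying the values reported here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are excluded from the denominator
    (one of several conventions in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)
```

For the toy pair `"AAT-A"` and `"AATGC"`, four columns are ungapped and three match, giving 75%.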

As in all type I AFPs, the Ala content of these snailfish isoforms is high, ranging from 54 to 61%. However, the 11-a.a. Thr periodicity that is prevalent in the cunner AFPs (Fig. 4A) is largely lacking: each sequence has only one or two pairs of Thr residues with this spacing (Fig. 6). Nevertheless, Thr is the second most abundant residue in all of these sequences, ranging from 8 to 14%. Another notable difference is that the snailfish sequences all have two or more helix-breaking residues (Gly or Pro) around their midpoints (Fig. 6, pink highlighting), which are lacking in cunner AFPs (Fig. 4A).
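The two features contrasted above, Thr pairs with exactly 11-residue spacing and the positions of helix-breaking Gly or Pro residues, can both be read directly from a one-letter sequence. The sketch below is a minimal illustration with hypothetical function names; the toy sequence in the test is invented and is not a snailfish AFP.

```python
from __future__ import annotations


def thr_pairs_at_spacing(protein: str, spacing: int = 11) -> list[tuple[int, int]]:
    """1-based position pairs (i, i + spacing) at which both residues are Thr."""
    p = protein.upper()
    return [(i + 1, i + 1 + spacing) for i in range(len(p) - spacing)
            if p[i] == "T" and p[i + spacing] == "T"]


def helix_breaker_positions(protein: str) -> list[int]:
    """1-based positions of Gly and Pro, residues that tend to disrupt alpha-helices."""
    return [i + 1 for i, aa in enumerate(protein.upper()) if aa in "GP"]
```

Unlike consecutive-gap counting, this counts every pair of Thr residues 11 positions apart, even if other Thr residues lie between them, which matches the way pairs are described above.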

The Tanaka AFP locus is absent from the dusky and hadal snailfishes

The single locus in the long-read Tanaka assembly contains one AFP gene and one AFP pseudogene (Fig. 5B, Supplementary Fig. 3A). This pseudogene shares 92% DNA sequence identity with Tanaka-1 but has two single nucleotide deletions (not shown) that disrupt the open-reading frame. These genes lie between the tensin-3 like protein (TNS3) and two convergently-transcribed isoforms of the insulin-like growth factor-binding protein (IGFBP) (Fig. 5B).

This locus was compared with the corresponding genomic region of a fish from the same family, the hadal snailfish, Pseudoliparis swirei [48]. These deep-water fish are unlikely to require an AFP, as ice does not form at the constant near-zero temperatures and high pressures found in the deep ocean [57]. As expected, type I AFPs were absent from this location and were not found elsewhere in the genome. A similar comparison was made with the dusky genome, and again, AFP sequences were absent from this location. The genomic sequences of all three fishes aligned very well throughout most of the flanking regions overlapping the TNS3 and IGFBP genes, as indicated by the green lines in Fig. 5B, albeit with some insertions and deletions ≤ 1 kb in length. However, the hadal and dusky sequences shared no similarity with the Tanaka sequence in the region between these genes, indicating that Tanaka’s snailfish was the only species of the three with AFP genes at this locus.

The dusky AFP locus is absent from the hadal and Tanaka snailfishes

Three AFP genes and a putative AFP pseudogene are found between the ETV6 and Poly(ADP-Ribose) Polymerase Family Member 12 (PARP12) genes of the dusky snailfish (Fig. 5A). The corresponding Tanaka locus is devoid of AFPs and, as above, matches only in the flanking genes, as indicated by the green lines. The match with the hadal snailfish locus ended at the same spot near PARP12 but did not extend past the end of ETV6. The putative pseudogene (Supplementary Fig. 3B) may actually be a functional AFP: although only the last third of its open reading frame resembles the other AFPs, the first two-thirds is Ala-rich. A monomeric model of this sequence suggests that it could form a bundle of six helices of equal length (Supplementary Fig. 3C).

The coding sequences of Tanaka and dusky snailfishes are flanked by numerous TEs

The intergenic regions of both the dusky and Tanaka locus are populated largely by repetitive sequences, shown by bars in Fig. 5A, B. Many of these show similarity to transposable elements (TEs) detected by Dfam searches [50] and these are shown as wider bars in Fig. 5C and are listed in Supplementary Table 3. Additional segments (thinner bars) showed more than 80% identity to sequences present at least 100 times in several different fish genomes, suggesting that they are likely TEs that are not yet present in the Dfam database.

All of the AFP sequences are flanked by the same four TEs/repetitive elements (Fig. 5). The first flanking element corresponds to the end of TE2 (magenta bars) and lies 23 bases upstream of each AFP gene and pseudogene. This segment was predicted to contain both a high-scoring TATA box (score 0.97) and an initiation motif (score 0.84), separated by 16 bp. This suggests that this portion of TE2, which was identified in an African cichlid (Supplementary Table 3), has been co-opted as a promoter. Additional segments overlapping a second region of this TE lie nearby or adjacent (dark purple bars).
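YAPP scores promoter elements with its own models, which are not reproduced here. As a much cruder stand-in that only shows how candidate TATA-like sites can be located in a sequence, the sketch below scans one strand for the commonly cited TATA-box consensus TATAWAWR (W = A/T, R = A/G), using a lookahead so that overlapping matches are reported. The consensus choice and the function name are assumptions, and a regex hit carries no score comparable to YAPP's 0.97.

```python
from __future__ import annotations

import re

# Assumed TATA-box consensus TATAWAWR (W = A/T, R = A/G); a crude stand-in,
# not the scoring model used by YAPP.
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT][AG]))")


def find_tata_candidates(dna: str) -> list[int]:
    """0-based start positions of (possibly overlapping) TATAWAWR matches on the given strand."""
    return [m.start() for m in TATA_PATTERN.finditer(dna.upper())]
```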

The three additional flanking repetitive elements lie downstream of the genes. The first two (bright green and forest green bars) were not identified by Dfam. These are followed by TE3 (yellow). Although these repetitive elements are shared, numerous insertions and deletions, both within these elements and elsewhere, break up the matches. For example, Tanaka-1 contains an insertion of two segments of TE8 (light pink bars) between the coding sequence and the green elements, and a large portion of TE3 (yellow bars) was lost in Dusky-3. Dusky AFP3 and AFP4 appear to be inverted repeats with some additional similarity (Fig. 5A), and dusky AFP3 is flanked by inverted repeats of TE2 and TE5 (Fig. 5C), suggesting that these elements may have led to duplication of this gene. The regions between the AFPs share minimal similarity, as indicated by the alternating light and dark gray bars, which correspond to repetitive sequences found only once at these loci; those shown in other colors occur more than once. Most of the low-complexity sequence indicated by narrow red bars consists of dinucleotide repeats.

The snailfish AFP likely arose from a combination of repetitive non-coding DNA and transposons

Given that the flanking sequences of the flounder [40], cunner (Fig. 2), and sculpin genes [58] were clearly associated with progenitor genes, it was presumed that the same would be true for snailfish. This may well be the case, but rather than resembling any one gene, the snailfish flanking sequences resemble a variety of TEs and putative TEs, suggesting that these could have been the progenitors (Fig. 5). Additionally, liparid genomes contain many simple repeats that have the potential to encode runs of Ala residues. One of these lies nearby, within intron 3 of PARP12 (Fig. 5, Supplementary Fig. 4A), and its DNA sequence can be aligned to that of dusky AFP3 with 67% identity (Supplementary Fig. 5A). Examples from other fishes include the Dada transposon from the Siamese fighting fish (Betta splendens) [59], which contains several stretches with Ala-coding potential, the longest of which is 177 bp (not shown), and the Copia transposon from Danio rerio (Supplementary Fig. 4B), which can be aligned to dusky AFP3 with 62% identity (Supplementary Fig. 5B). An extreme example is (GCC)₁₅₁, located within the last intron of the gene encoding potassium voltage-gated channel subfamily H member 5 (GenBank: XP_048106505) from Allis shad, which can be aligned to dusky AFP3 with 68% identity (Supplementary Fig. 5C). An origin from any one of these sequences could explain the biased codon usage of the snailfish AFPs (Fig. 5D), in which 75% of Ala codons are GCC, in contrast with the cunner AFP, in which GCT is dominant (Fig. 4C).
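The codon-bias figure quoted above, the fraction of Ala codons that are GCC, is a tally over in-frame codons. The sketch below assumes an intron-free coding sequence that starts in frame. `ala_codon_usage` is a hypothetical helper, and the toy input is invented for illustration rather than taken from an actual AFP gene.

```python
from collections import Counter

ALA_CODONS = ("GCT", "GCC", "GCA", "GCG")


def ala_codon_usage(cds: str) -> dict:
    """Fraction of Ala codons that are GCT, GCC, GCA or GCG in an in-frame CDS."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    counts = Counter(c for c in codons if c in ALA_CODONS)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in ALA_CODONS}
    return {c: counts[c] / total for c in ALA_CODONS}


toy_cds = "ATG" + "GCC" * 6 + "GCT" + "GCA" + "ACC"   # invented sequence
print(ala_codon_usage(toy_cds))
# {'GCT': 0.125, 'GCC': 0.75, 'GCA': 0.125, 'GCG': 0.0}
```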

Fragments of the PARP12 gene suggest the snailfish AFP gene arose in its vicinity

The sequence with Ala-coding potential in intron 3 of PARP12 is suggestive of a possible origin for the AFP gene, but, as shown above, it is no more similar to the AFP coding sequence than (GCC)n repeats are (67% versus 68% identity to dusky AFP3). Therefore, the flanking regions were compared. Within a 4 kb span that included the coding region, no similarity was found between the sequence flanking this part of PARP12 and dusky-5 (Fig. 5C) or any of the other AFP genes (not shown).
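Identity values such as the 67% and 68% compared here depend on how alignment columns are counted. This minimal sketch assumes that a pairwise alignment is already available as two equal-length gapped strings. By default it excludes gapped columns from the denominator, which is one of several conventions. The helper is hypothetical, so its values need not match those reported in the Supplementary Figures.

```python
def percent_identity(a: str, b: str, gap: str = "-", count_gaps: bool = False) -> float:
    """Percent identity of two aligned sequences.

    By default, columns with a gap in either sequence are excluded from the
    denominator; with count_gaps=True they are kept (columns that are gaps in
    both sequences are always dropped)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if not (x == gap and y == gap)]
    if not count_gaps:
        pairs = [(x, y) for x, y in pairs if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


a = "GCCGCC-ACC"   # invented aligned fragments
b = "GCCGCTAACC"
print(round(percent_identity(a, b), 1))         # 88.9
print(percent_identity(a, b, count_gaps=True))  # 80.0
```

The same pair of sequences scores 88.9% or 80.0% depending only on the gap convention. For this reason, identities are best compared within one tool and one setting.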

However, beyond this span, there is evidence that small portions of the PARP12 gene were duplicated along with the AFP genes (Fig. 5A, B, dark red bars with labels). For example, a 322 bp segment found 2.9 kb upstream of dusky AFP4 matches a segment overlapping part of exon 1 and intron 1 of PARP12 with 97% identity (Supplementary Fig. 6A). Additional matches with 88% identity were also found near Tanaka AFP6 and upstream of the putative dusky pseudogene (Supplementary Fig. 6B, C). Therefore, it is possible that the N-terminal region of PARP12, from exon 1 through intron 3, was duplicated, after which all but the segment with Ala-coding potential and a few small fragments were replaced by TEs, one of which provided a promoter for the nascent AFP gene. The sense orientation of the PARP12 fragments upstream of both dusky-4 and the putative dusky pseudogene supports this hypothesis. However, the lack of clear homology adjacent to the AFPs could also suggest that the AFP gene arose entirely from repetitive elements near PARP12, in a region into which a small portion of the 5ʹ end of that gene had been duplicated.

The AFPs of snailfish form a folded helix and may be dimeric

Snailfish AFPs are known to be largely helical [60], but there are two short regions where this is likely not the case. At the N-terminus, one or two Pro residues may prevent this segment from forming a helix (Fig. 6). The rest of the protein is roughly bisected by two helix-breaking residues (pink highlighting) spaced three to five residues apart. This is reminiscent of the large isoform of winter flounder AFP, in which a 195-a.a. α-helix folds in half and then associates with another molecule as an antiparallel dimer that forms a four-helix bundle [35]. Therefore, the snailfish AFP was modeled here as both a monomer and a dimer.
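Finding a central pair of helix breakers like the one described above requires only a simple scan. The following sketch is hypothetical and rests on several assumptions. It treats only Pro and Gly as breakers and reads "three to five residues apart" as a difference of 3 to 5 in sequence position. It also uses an arbitrary window of 20% of the length on either side of the midpoint to stand for "roughly bisected". The test sequence is invented.

```python
def helix_breakers(seq: str, breakers: str = "PG") -> list:
    """1-based positions of residues treated here as helix breakers (Pro, Gly)."""
    return [i + 1 for i, aa in enumerate(seq) if aa in breakers]


def central_breaker_pairs(seq: str, min_gap: int = 3, max_gap: int = 5,
                          window: float = 0.2) -> list:
    """Breaker pairs spaced min_gap..max_gap positions apart whose midpoint lies
    within +/- window * len(seq) of the sequence centre (a candidate bend)."""
    pos = helix_breakers(seq)
    centre = len(seq) / 2
    pairs = []
    for i, p in enumerate(pos):
        for q in pos[i + 1:]:
            near_centre = abs((p + q) / 2 - centre) <= window * len(seq)
            if min_gap <= q - p <= max_gap and near_centre:
                pairs.append((p, q))
    return pairs


toy = "MDP" + "A" * 20 + "PAAG" + "A" * 20   # invented 47-residue sequence
print(helix_breakers(toy))          # [3, 24, 27]
print(central_breaker_pairs(toy))   # [(24, 27)]
```

The N-terminal Pro at position 3 is reported as a breaker but is not paired, which mirrors the distinction drawn above between the N-terminal segment and the central bend.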

The models that were generated (Fig. 7), whether for the monomer or the dimer, folded the polypeptide chain in the same manner. The first five residues were not predicted to form part of the helix. The rest of the chain was helical, except for the bend, which is punctuated by Pro and/or Gly residues (pink). The helical segments on either side of the bend are predicted to lie alongside each other, and in the dimeric model the two monomers are antiparallel. Interestingly, one face of these models is very flat (Fig. 7B) and, like other type I AFPs, it is dominated by Ala and Thr and devoid of charged residues, unlike the opposite face (Fig. 7C). The dimer also appears plausible because several potential intermolecular salt bridges are predicted from the antiparallel pairing (red and blue).

Fig. 7

Models of the dusky-1 AFP dimer (left) and monomer (right) generated using AlphaFold2-Colab [88] and rendered using PyMOL [87]. A) Cartoon representation of helices with the N (back) and C (front) termini indicated. B) Space-filling model with the putative ice-binding surface facing forward. Residues are colored as in Fig. 4 but with backbone atoms in light grey, Thr in green, and other polar residues in dark green. C) View of the reverse side relative to B.

Part 3: Summary of the convergent origins of the four type I AFPs

The AFPs of flounders, sculpins, cunner and snailfishes all arose within the last 30 Ma (Fig. 1). This is recent enough that their non-coding regions still resemble other sequences, so the origins of all but the snailfish AFP could be definitively traced, via duplication and divergence, to pre-existing functional genes. The cunner AFP arose from the GIMAP-a gene (Fig. 2), a conclusion that has been independently confirmed by Rives et al. [61], whereas the flounder AFP arose from a Gig-2 gene (Fig. 8A, B) [40], and the sculpin AFP arose from a lunapark gene (Fig. 8C) [58]. In the flounder, the antiviral Gig-2 genes were duplicated at a new location (not shown), and the AFP gene arose from a single copy of the pre-existing Gig-2 gene (Fig. 8B). It was later duplicated at the site of origin multiple times, giving rise to a single locus containing multiple AFP genes in tandem (Fig. 8A), but Gig-2 genes were not retained at this location. In cunner (Fig. 2D), as in flounder (Fig. 8B), the gene structure and much of the flanking sequence of the progenitor were retained. However, in cunner, the GIMAP-a progenitor was also retained at the site of origin of the AFP, as both genes were duplicated multiple times in situ (Fig. 2A). In sculpin, the 15-exon lunapark gene gave rise to the AFP (Fig. 8C), but only small portions of the original gene were retained in the AFP gene. Despite these different origins, the AFPs of these three species share up to 80% identity, with similar N-termini (Met-Asp), Thr periodicity and Ala-richness, which would be taken as evidence of homology in the absence of additional information (Fig. 8D). In contrast, the snailfish lacks short AFPs, and its longer isoforms display little Thr periodicity, although one isoform does begin with Met-Asp (Fig. 6). The snailfish AFP gene may have arisen from the PARP12 gene that is found flanking the dusky AFP region, but the sequence similarity is too limited to provide a definitive answer. What is evident is that most of the flanking sequence, and perhaps the coding sequence as well, likely arose from transposons and repetitive DNA (Fig. 5).
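The Thr periodicity compared in Fig. 8D can be summarized by the gaps between successive Thr positions. The sketch below uses hypothetical helpers and an invented toy sequence, not a real AFP. The default period of 11 is an assumption, chosen because 11-residue Thr repeats are characteristic of flounder type I AFPs. Irregular gaps in a real isoform would correspond to the weak periodicity described for the snailfish.

```python
def residue_spacing(seq: str, residue: str = "T") -> list:
    """Gaps between successive occurrences of `residue` in `seq`."""
    positions = [i + 1 for i, aa in enumerate(seq) if aa == residue]
    return [b - a for a, b in zip(positions, positions[1:])]


def fraction_with_period(seq: str, period: int = 11, residue: str = "T") -> float:
    """Fraction of consecutive-residue gaps equal to `period` (0.0 if < 2 hits)."""
    gaps = residue_spacing(seq, residue)
    return sum(g == period for g in gaps) / len(gaps) if gaps else 0.0


toy = "DT" + "A" * 10 + "T" + "A" * 10 + "TAA"   # invented: Thr at 2, 13, 24
print(residue_spacing(toy))        # [11, 11]
print(fraction_with_period(toy))   # 1.0
```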

Fig. 8

Origin of flounder and sculpin type I AFPs from progenitor genes. A) Comparison of the corresponding genomic regions containing the HDAC5 and XYLT1 genes of starry flounder and Pacific halibut. The two alleles from an individual flounder contained either 4 or 33 AFP gene copies. B) Comparison of a single flounder Gig-2 gene and skin AFP gene. The percent identity between the homologous regions (gray shading) is indicated, with the two exons (wide) and intron (narrow) of each gene in blue or purple. The portion of the second exon containing the coding region is shaded darker. C) Comparison of the AFP gene of sculpin with the lunapark locus of a closely related fish, denoted as above, except that the lunapark exons are in orange and the lunapark locus is scaled 1:10 relative to the smaller AFP gene. D) Alignment of flounder, cunner and sculpin AFPs, colored as in Fig. 4A, with the spacing of Thr residues (black highlighting) indicated above and asterisks indicating 100% conservation below. GenBank accession numbers are UW46952.1 (starry flounder) and AGZ85412 (shorthorn sculpin); for cunner AFP2, see Fig. 4.

The Ala-rich coding sequences of the AFPs are the portions of the genes that have diverged the most since their origins, consistent with positive selection for a new function. Nevertheless, there are clues to their origins, and these also differ among the four groups. In flounder, a very small helical region of Gig-2, encoding a single Thr and three Ala, was likely tandemly amplified many times, while the rest of the coding sequence was lost. As two of the three Ala were encoded by GCC in the Gig-2 gene [40], this may explain the preference for GCC in the AFPs of this species [38]. In sculpin, the coding sequence arose by frameshifting and mutation of a Glu/Gln-rich region at the end of the lunapark coding sequence, giving rise to an Ala-rich region with a preference for GCG (from a single substitution within the Glu codon, GAG) and GCA (from frameshifted Gln codons, CAG) [58]. In cunner, by contrast, a long Ala-rich stretch was present in the progenitor prior to the origin of the AFP (Fig. 4), and the GCT bias present in the GIMAP gene was retained in the AFP genes. The strong bias toward GCC in the snailfish AFP (Fig. 5D) could be due to conversion of repetitive DNA containing GCC repeats into the AFP coding sequence.
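The codon arithmetic behind the sculpin bias can be checked directly against the standard genetic code. This sketch uses invented repeat tracts, not the actual lunapark sequence. A single A-to-C change at the second codon position turns Glu (GAG) into Ala (GCG). Reading a CAG tract two bases downstream of its original frame yields GCA (Ala) codons. The other shifted frame yields AGC (Ser), so only one of the two possible frameshifts produces Ala.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def translate(dna: str) -> str:
    """Translate complete codons from the start of `dna`; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODE[dna[i:i + 3]] for i in range(0, usable, 3))


glu_tract = "GAG" * 4                            # invented Glu tract
substituted = glu_tract.replace("GAG", "GCG")    # A->C at codon position 2
gln_tract = "CAG" * 5                            # invented Gln tract

print(translate(glu_tract), translate(substituted))        # EEEE AAAA
print(translate(gln_tract))                                # QQQQQ
print(translate(gln_tract[1:]), translate(gln_tract[2:]))  # SSSS AAAA
```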

Discussion

AFPs with diverse structures are found in a wide variety of organisms [2]. β-helical folds are prominent in all groups except fish and have arisen via convergence multiple times. In contrast, helical type I AFPs have arisen only in fish, albeit four times. Most type I AFPs, as well as other types of fish AFPs, are only moderately active [62]. This may be related to the different environments that AFP-producing species inhabit. Oceans do not cool below about −1.8 °C, so fish require only ~1 °C of TH protection in addition to ~0.8 °C of colligative freezing point depression [5, 6], whereas terrestrial organisms can be exposed to much colder temperatures. β-helical AFPs can have large ice-binding surfaces [63, 64], and small increases in the area of the ice-binding face, through duplication of coils, can result in large increases in activity [65, 66]. A single straight α-helix can only generate a narrow ice-binding face, and the increase in activity upon duplication of a repeat is more modest [67]. Therefore, these moderately active AFP types are unlikely to be of much use to terrestrial organisms, whereas they are active enough to protect fish. However, type I AFPs with significantly greater activity have been found to either dimerize [35] or form multimers [68], but these AFPs appear to have arisen from much shorter progenitors [40].
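The ~1 °C figure follows from simple additive arithmetic. The AFP must lower the non-equilibrium freezing point from the colligative value (about −0.8 °C) to at least the coldest seawater temperature (about −1.8 °C). The minimal sketch below treats both values from the text as fixed approximations. `th_required` is a hypothetical helper that ignores complications such as supercooling or direct contact with ice.

```python
COLDEST_SEAWATER_C = -1.8   # approximate minimum ocean temperature (from the text)
COLLIGATIVE_FP_C = -0.8     # approximate colligative freezing point of fish fluids (from the text)


def th_required(water_temp_c: float = COLDEST_SEAWATER_C,
                colligative_fp_c: float = COLLIGATIVE_FP_C) -> float:
    """Thermal hysteresis (degrees C) needed so that the non-equilibrium freezing
    point (colligative freezing point minus TH) is at or below the water temperature."""
    return max(0.0, colligative_fp_c - water_temp_c)


print(round(th_required(), 2))         # 1.0
print(th_required(water_temp_c=0.0))   # 0.0 (water warmer than the colligative freezing point)
```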

The TH activity of cunner plasma during the winter is low (0 to 0.16 °C) [29, 69], with higher activity in the skin that is still below 1 °C [69], even in fish living in ice-laden waters off Newfoundland. This modest activity can be explained by two factors. First, all of the isoforms are expected to fold as isolated α-helices, although the longer isoforms are likely to be somewhat more active. Second, the sequenced fish, which was obtained from New Brunswick waters (GenBank BioSample SAMN22589422), has only eleven AFP genes, whereas ocean pouts (type III AFP) from the same waters have approximately 40 gene copies [70]. However, cunners become torpid and do not feed during the winter [69], so they may not require high levels of AFPs throughout their bodies, in contrast to more winter-active species.

Snailfishes have much higher TH levels in their plasma than do cunners, averaging 0.73 °C in the Atlantic snailfish and 0.92 °C in the dusky snailfish [60]. Both of these fishes inhabit colder waters than Tanaka’s snailfish [71]. The fish used to generate the chromosome-level genome assembly, in which only one AFP gene (Tanaka-1) was found, was caught in the Yellow Sea [48]. The other fish, in which several loci were identified from short contigs assembled from Illumina reads [56], originated from the Sea of Japan, a much icier location [72]. However, because multigene families are notoriously difficult to assemble from Illumina reads, the actual number of AFP genes in this fish cannot be determined. By contrast, multiple genes were found at a single locus in a dusky snailfish from Nova Scotia waters [49]. These AFPs are considerably longer (78–137 a.a.) than those of the cunner (47–76 a.a.). Rather than forming extended α-helices, they are predicted by AlphaFold2 to form hairpins that likely dimerize in an antiparallel fashion, similar to the larger (195-a.a.) dimeric isoform of winter flounder [35]. If so, two lineages will have independently generated Ala-rich AFPs that are more complex than a single extended α-helix and that fold in the same way.

When Evans and Fletcher probed a snailfish cDNA library using AFP sequences from sculpin and flounder, they did not obtain AFP sequences. Instead, they recovered clones encoding keratin or eggshell proteins that could encode an Ala-rich protein, either directly or by frameshifting [73]. Although this was a reasonable hypothesis, the AFP genes identified herein do not share any similarity with these genes outside of the coding sequence. Interestingly, frameshifting did appear to play a role in the origin of the sculpin AFP from the lunapark locus [58]. Without the conserved flanking non-coding DNA, the similarity between lunapark and the AFP would have been unrecognizable, because the AFP coding sequences are under strong selection for their new function. Because the snailfish genes are flanked mostly by TEs, each present hundreds of times within the genome, the progenitor of the snailfish AFP coding sequence has not been fully characterized. However, the codon bias is consistent with an origin from DNA rich in GCC repeats. The genomes assembled from long-read sequences contain both functional AFP genes and pseudogene(s), and these genes are clearly surrounded primarily by sequences that correspond to TEs and other repetitive elements, with a few short segments that suggest, but do not prove, that the gene is somehow related to the PARP12 locus. Therefore, it appears that the bulk of the snailfish AFP gene originated from repetitive DNA.

The three genes that gave rise to type I AFPs have different functions. The GIMAP family of proteins, one of which gave rise to the cunner AFP, are generally found in the cytoplasm of lymphocytes, and their loss has been shown to be detrimental to cellular function in mammals. These proteins have an N-terminal GTP-binding domain and C-terminal tails of different lengths that likely confer unique properties, such as membrane anchoring, on different family members [74]. The Gig-2 proteins that gave rise to the flounder AFP also belong to a family that is involved in immune function, but these proteins appear to be restricted to non-amniote vertebrates, and their overexpression leads to resistance to viral infection [75]. The progenitor of the sculpin AFP, lunapark, is not a member of a multigene family, nor is it involved in immune function. Instead, this protein helps stabilize nascent three-way junctions of the endoplasmic reticulum [43]. Other fish AFPs have also arisen from the neofunctionalization of duplicated genes with unrelated functions: type II arose from C-type lectins [11], type III from the C-terminal domain of sialic acid synthase [7], and the AFGPs of Antarctic fishes from trypsinogen [20].

It is not just pre-existing genes that have given rise to AF(G)Ps, as the AFGP of northern cods arose from non-coding DNA [22]. The snailfish gene may also have arisen from repetitive non-coding DNA, as well as from the TEs that define its non-coding regions. Despite the myriad detrimental effects that TEs have on the genome, they also have positive effects, and their regulatory sequences have been co-opted by a number of genes [76, 77]. In addition, numerous genes, including those encoding proteins involved in brain and placental function [77] and a large number of transcription factors [78], have arisen through the ‘domestication’ of TEs. TEs can also drive gene duplication, even after they have lost the ability to transpose on their own [77].

Positive selection has acted at two levels to enable AFP-producing fish to avoid freezing. First, the similarities between the type I AFPs of sculpins [42], cunner, and flounders [40] and their progenitor genes are greatest in the non-coding sequences and much lower, or barely detectable, in the coding sequences. These are extreme examples of positive selection driving the diversification of duplicated genes [79]. Second, multiple copies of all of these genes are present, which is indicative of positive selection for increased dosage in response to environmental stress [80]. An additional benefit is that with more gene copies, there is a greater chance that one of them will accrue a beneficial mutation. Amplification of the beneficial variant will again generate a greater number of targets for additional beneficial mutations [81], resulting in rapid evolution of the gene family through multiple rounds of birth-and-death evolution [82]. Positive selection on gene duplicates is usually assessed from the ratio of non-synonymous (amino acid-changing) to synonymous (silent) substitutions between the copies, each normalized to the corresponding number of sites (dN/dS). Proteins under strong positive selection for a new function will have ratios near or exceeding unity, whereas those under negative selection will have ratios well below one [80]. In the case of these AFPs, however, the ratio cannot be calculated because the coding sequences have diverged to such a degree that they cannot be accurately aligned, making a one-to-one correspondence of their codons impossible. Nevertheless, these coding sequences are clearly subject to strong positive selection, because the flanking sequences, which have diverged far more slowly, can still be accurately aligned.
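To show why codon alignment is a prerequisite, the sketch below estimates the ratio using Nei–Gojobori-style site counting. It is a simplified illustration, not the authors' analysis. It returns uncorrected proportions (pN/pS) without a Jukes–Cantor or other multiple-hit correction. When tallying sites, it counts changes to stop codons as non-synonymous. It assumes gap-free, codon-aligned input without stop codons. All function names are hypothetical.

```python
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> float:
    """Each of the 9 single-base neighbours adds 1/3 if it encodes the same amino
    acid (changes to stop codons count as non-synonymous in this simplification)."""
    aa = CODE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    sites += 1 / 3
    return sites


def codon_differences(c1: str, c2: str):
    """Synonymous and non-synonymous differences averaged over all mutational
    pathways from c1 to c2, skipping pathways through intermediate stop codons."""
    diff = [p for p in range(3) if c1[p] != c2[p]]
    if not diff:
        return 0.0, 0.0
    syn = nonsyn = 0.0
    n_paths = 0
    for order in permutations(diff):
        current, s, n, valid = c1, 0, 0, True
        for step, pos in enumerate(order):
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*" and step < len(order) - 1:
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        if valid:
            syn += s
            nonsyn += n
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn / n_paths, nonsyn / n_paths


def pn_ps(seq1: str, seq2: str) -> float:
    """Uncorrected pN/pS for two gap-free, codon-aligned coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        s_sites = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s_sites
        N += 3 - s_sites
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    pS, pN = Sd / S, Nd / N
    if pS == 0:
        return float("nan") if pN == 0 else float("inf")
    return pN / pS


# Invented example: one silent change (GCC->GCT) and one Thr->Ala change (ACC->GCC).
print(round(pn_ps("GCCGCCACC", "GCTGCCGCC"), 3))   # 0.5
```

The length and codon-alignment check at the top of `pn_ps` is exactly the precondition that the highly diverged AFP coding sequences fail to meet.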

The convergent evolution of AFPs in numerous lineages, in which multiple AFP types were generated from different progenitors, suggests that the onset of global cooling at approximately 30 Ma [83] was a strong environmental stressor for fish. The lineages that possess AFPs diverged from their non-AFP-producing relatives during this period (Fig. 1). These include the snailfish, which diverged from the type II AFP-producing sea raven ~25 million years ago, and the cunner, which diverged from wrasses that lack the AFP, also ~25 million years ago [84, 85]. However, the snailfish AFP may have arisen even more recently, as the AFP-producing snailfishes diverged ~12 million years ago from other members of the family Liparidae that do not produce AFPs [84]. In stark contrast, lineages of Collembola (primitive arthropods commonly referred to as springtails) possess a unique type of AFP that they have inherited through common descent rather than acquired through multiple convergent evolutionary events. This polyproline type II helical bundle protein has not been found in any other organisms to date. It arose during a much earlier glaciation event 400 million years ago during the Ordovician Period and is, therefore, widespread in Collembola species that radiated from a survivor(s) of this event [86].

In summary, Ala-rich α-helical type I AFPs arose independently in four different fish lineages during the late Cenozoic Ice Age but have not been found in any other organisms. Equally remarkable are the various mechanisms by which similar AFPs arose via convergent evolution, from both duplicated genes and nongenic sequences. At least in the case of AFPs derived from pre-existing genes, the initial gene duplication event was followed by rapid diversification and additional gene duplication events.

Data availability

The sequence data that support the findings of this study were extracted from, and can be found in, the NCBI database. For the analysis of the cunner AFP locus, the following three genomic DNA regions were used: cunner JAJGRF010000003.1, 6,121,000 to 6,371,000 bp; ballan wrasse NW_018114954.1, 83,944 to 184,596 bp; New Zealand spotty NC_048278.1, 5,367,202 to 5,491,088 bp. For the analysis of the snailfish AFP locus, the following three genomic regions were used: Tanaka’s snailfish JAYMGU010000011.1, 3,810,001 to 3,880,000 bp; hadal snailfish NC_079395, 17,210,001 to 17,260,000 bp; dusky snailfish JBEEID010000351.1, 1,861 to 88,644 bp. The protein sequences corresponding to the genes examined are included within the manuscript or the supplementary information, and accession numbers are provided therein.

References

  1. Duman JG. Animal ice-binding (antifreeze) proteins and glycolipids: an overview with emphasis on physiological function. J Exp Biol. 2015;218(Pt 12):1846–55.

  2. Bar Dolev M, Braslavsky I, Davies PL. Ice-Binding Proteins and Their Function. Annu Rev Biochem. 2016;85:515–42.

  3. Raymond JA, DeVries AL. Adsorption inhibition as a mechanism of freezing resistance in polar fishes. Proc Natl Acad Sci U S A. 1977;74(6):2589–93.

  4. Celik Y, Drori R, Pertaya-Braun N, Altan A, Barton T, Bar-Dolev M, Groisman A, Davies PL, Braslavsky I. Microfluidic experiments reveal that antifreeze proteins bound to ice crystals suffice to prevent their growth. Proc Natl Acad Sci U S A. 2013;110(4):1309–14.

  5. DeVries AL. Glycoproteins as biological antifreeze agents in antarctic fishes. Science. 1971;172(3988):1152–5.

  6. Davies PL, Graham LA. Protein evolution revisited. Syst Biol Reprod Med. 2018;64(6):403–16.

  7. Baardsnes J, Davies PL. Sialic acid synthase: the origin of fish type III antifreeze protein? Trends Biochem Sci. 2001;26(8):468–9.

  8. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860–921.

  9. Deng C, Cheng CH, Ye H, He X, Chen L. Evolution of an antifreeze protein by neofunctionalization under escape from adaptive conflict. Proc Natl Acad Sci U S A. 2010;107(50):21593–8.

  10. Hobbs RS, Hall JR, Graham LA, Davies PL, Fletcher GL. Antifreeze protein dispersion in eelpouts and related fishes reveals migration and climate alteration within the last 20 Ma. PLoS ONE. 2020;15(12):e0243273.

  11. Ewart KV, Rubinsky B, Fletcher GL. Structural and functional similarity between fish antifreeze proteins and calcium-dependent lectins. Biochem Biophys Res Commun. 1992;185(1):335–40.

  12. Ng NF, Hew CL. Structure of an antifreeze polypeptide from the sea raven. Disulfide bonds and similarity to lectin-binding proteins. J Biol Chem. 1992;267(23):16069–75.

  13. Graham LA, Lougheed SC, Ewart KV, Davies PL. Lateral transfer of a lectin-like antifreeze protein gene in fishes. PLoS ONE. 2008;3(7):e2616.

  14. Graham LA, Davies PL. Horizontal Gene Transfer in Vertebrates: A Fishy Tale. Trends Genet. 2021;37(6):501–3.

  15. Deng G, Andrews DW, Laursen RA. Amino acid sequence of a new type of antifreeze protein, from the longhorn sculpin Myoxocephalus octodecimspinosis. FEBS Lett. 1997;402(1):17–20.

  16. Gauthier SY, Scotter AJ, Lin FH, Baardsnes J, Fletcher GL, Davies PL. A re-evaluation of the role of type IV antifreeze protein. Cryobiology. 2008;57(3):292–6.

  17. Low WK, Lin Q, Stathakis C, Miao M, Fletcher GL, Hew CL. Isolation and characterization of skin-type, type I antifreeze polypeptides from the longhorn sculpin, Myoxocephalus octodecemspinosus. J Biol Chem. 2001;276(15):11582–9.

  18. DeVries AL, Komatsu SK, Feeney RE. Chemical and physical properties of freezing point-depressing glycoproteins from Antarctic fishes. J Biol Chem. 1970;245(11):2901–8.

  19. Izumi R, Matsushita T, Fujitani N, Naruchi K, Shimizu H, Tsuda S, Hinou H, Nishimura S. Microwave-assisted solid-phase synthesis of antifreeze glycopeptides. Chemistry. 2013;19(12):3913–20.

  20. Chen L, DeVries AL, Cheng CH. Evolution of antifreeze glycoprotein gene from a trypsinogen gene in Antarctic notothenioid fish. Proc Natl Acad Sci U S A. 1997;94(8):3811–6.

  21. O’Grady SM, Schrag JD, Raymond JA, Devries AL. Comparison of antifreeze glycopeptides from arctic and antarctic fishes. J Exp Zool. 1982;224(2):177–85.

  22. Baalsrud HT, Torresen OK, Solbakken MH, Salzburger W, Hanel R, Jakobsen KS, Jentoft S. De Novo Gene Evolution of Antifreeze Glycoproteins in Codfishes Revealed by Whole Genome Sequence Data. Mol Biol Evol. 2018;35(3):593–606.

  23. DeVries AL, Lin Y. Structure of a peptide antifreeze and mechanism of adsorption to ice. Biochim Biophys Acta. 1977;495(2):388–92.

  24. Hew CL, Joshi S, Wang NC, Kao MH, Ananthanarayanan VS. Structures of shorthorn sculpin antifreeze polypeptides. Eur J Biochem. 1985;151(1):167–72.

  25. Graham LA, Marshall CB, Lin FH, Campbell RL, Davies PL. Hyperactive antifreeze protein from fish contains multiple ice-binding sites. Biochemistry. 2008;47(7):2051–63.

  26. Hew CL, Wang NC, Yan S, Cai H, Sclater A, Fletcher GL. Biosynthesis of antifreeze polypeptides in the winter flounder. Characterization and seasonal occurrence of precursor polypeptides. Eur J Biochem. 1986;160(2):267–72.

  27. Marshall CB, Fletcher GL, Davies PL. Hyperactive antifreeze protein in a fish. Nature. 2004;429(6988):153.

  28. Scott GK, Davies PL, Kao MH, Fletcher GL. Differential amplification of antifreeze protein genes in the pleuronectinae. J Mol Evol. 1988;27(1):29–35.

  29. Hobbs RS, Shears MA, Graham LA, Davies PL, Fletcher GL. Isolation and characterization of type I antifreeze proteins from cunner, Tautogolabrus adspersus, order Perciformes. FEBS J. 2011;278(19):3699–710.

  30. Evans RP, Fletcher GL. Type I antifreeze proteins expressed in snailfish skin are identical to their plasma counterparts. FEBS J. 2005;272(20):5327–36.

  31. Chakrabartty A, Hew CL, Shears M, Fletcher G. Primary Structures of the Alanine-Rich Antifreeze Polypeptides from Grubby Sculpin, Myoxocephalus-Aenaeus. Can J Zool. 1988;66(2):403–8.

  32. Yamazaki A, Nishimiya Y, Tsuda S, Togashi K, Munehara H. Freeze Tolerance in Sculpins (Pisces; Cottoidea) Inhabiting North Pacific and Arctic Oceans: Antifreeze Activity and Gene Sequences of the Antifreeze Protein. Biomolecules. 2019;9(4).

  33. Kwan AH, Fairley K, Anderberg PI, Liew CW, Harding MM, Mackay JP. Solution structure of a recombinant type I sculpin antifreeze protein. Biochemistry. 2005;44(6):1980–8.

  34. Sicheri F, Yang DS. Ice-binding structure and mechanism of an antifreeze protein from winter flounder. Nature. 1995;375(6530):427–31.

  35. Sun T, Lin FH, Campbell RL, Allingham JS, Davies PL. An antifreeze protein folds with an interior network of more than 400 semi-clathrate waters. Science. 2014;343(6172):795–8.

  36. Gong Z, Ewart KV, Hu Z, Fletcher GL, Hew CL. Skin antifreeze protein genes of the winter flounder, Pleuronectes americanus, encode distinct and active polypeptides without the secretory signal and prosequences. J Biol Chem. 1996;271(8):4106–12.

  37. Low WK, Miao M, Ewart KV, Yang DS, Fletcher GL, Hew CL. Skin-type antifreeze protein from the shorthorn sculpin, Myoxocephalus scorpius. Expression and characterization of a Mr 9, 700 recombinant protein. J Biol Chem. 1998;273(36):23098–103.

  38. Graham LA, Hobbs RS, Fletcher GL, Davies PL. Helical antifreeze proteins have independently evolved in fishes on four occasions. PLoS ONE. 2013;8(12):e81285.

  39. Speed MP, Arbuckle K. Quantification provides a conceptual basis for convergent evolution. Biol Rev Camb Philos Soc. 2017;92(2):815–29.

  40. Graham LA, Gauthier SY, Davies PL. Origin of an antifreeze protein gene in response to Cenozoic climate change. Sci Rep. 2022;12(1):8536.

  41. Sun C, Liu Y, Hu Y, Fan Q, Li W, Yu X, Mao H, Hu C. Gig1 and Gig2 homologs (CiGig1 and CiGig2) from grass carp (Ctenopharyngodon idella) display good antiviral activities in an IFN-independent pathway. Dev Comp Immunol. 2013;41(4):477–83.

  42. Graham LA, Davies PL. Fish antifreeze protein origin in sculpins by frameshifting within a duplicated housekeeping gene. FEBS J. 2024;291(18):4043–61.

  43. Chen S, Desai T, McNew JA, Gerard P, Novick PJ, Ferro-Novick S. Lunapark stabilizes nascent three-way junctions in the endoplasmic reticulum. Proc Natl Acad Sci U S A. 2015;112(2):418–23.

  44. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10.

  45. Nugent CM, Kess T, Brachmann MK, Langille BL, Duffy SJ, Lehnert SJ, Wringe BF, Bentzen P, Bradbury IR. Whole-genome sequencing reveals fine-scale environment-associated divergence near the range limits of a temperate reef fish. Mol Ecol. 2023;32(17):4742–62.

  46. Madeira F, Pearce M, Tivey ARN, Basutkar P, Lee J, Edbali O, Madhusoodanan N, Kolesnikov A, Lopez R. Search and sequence analysis tools services from EMBL-EBI in 2022. Nucleic Acids Res. 2022.

  47. Lie KK, Torresen OK, Solbakken MH, Ronnestad I, Tooming-Klunderud A, Nederbragt AJ, Jentoft S, Saele O. Loss of stomach, loss of appetite? Sequencing of the ballan wrasse (Labrus bergylta) genome and intestinal transcriptomic profiling illuminate the evolution of loss of stomach function in fish. BMC Genomics. 2018;19(1):186.

  48. Xu W, Zhu C, Gao X, Wu B, Xu H, Hu M, Zeng H, Gan X, Feng C, Zheng J, et al. Chromosome-level genome assembly of hadal snailfish reveals mechanisms of deep-sea adaptation in vertebrates. eLife. 2023;12.

  49. Correard S, Jones SJ, Leelakumari S, Yueh H, Chida A, Paton T, Ho K, Djambazian H, Berube P, Emberley J, et al. The genome of the dusky seasnail (Liparis gibbus), unpublished. In: Canadian BioGenome Project, Canada’s national platform for genome sequencing and analysis. 2024.

  50. Storer J, Hubley R, Rosen J, Wheeler TJ, Smit AF. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob DNA. 2021;12(1):2.

  51. Gouy M, Guindon S, Gascuel O. SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol. 2010;27(2):221–4.

  52. Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol Biol Evol. 2018;35(6):1547–9.

  53. Alexaki A, Kames J, Holcomb DD, Athey J, Santana-Quintero LV, Lam PVN, Hamasaki-Katagiri N, Osipova E, Simonyan V, Bar H, et al. Codon and Codon-Pair Usage Tables (CoCoPUTs): Facilitating Genetic Variation Analyses and Recombinant Gene Design. J Mol Biol. 2019;431(13):2434–41.

  54. Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, Uliano-Silva M, Chow W, Fungtammasan A, Kim J, et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature. 2021;592(7856):737–46.

  55. Liepinsh E, Otting G, Harding MM, Ward LG, Mackay JP, Haymet AD. Solution structure of a hydrophobic analogue of the winter flounder antifreeze protein. Eur J Biochem. 2002;269(4):1259–66.

  56. Burns JA, Gruber DF, Gaffney JP, Sparks JS, Brugler MR. Transcriptomics of a Greenlandic Snailfish Reveals Exceptionally High Expression of Antifreeze Protein Transcripts. Evol Bioinform Online. 2022;18:11769343221118347.

  57. Vernberg WB, Vernberg FJ. The Deep Sea. In: Environmental Physiology of Marine Animals. Edited by Vernberg WB, Vernberg FJ. Berlin, Heidelberg: Springer Berlin Heidelberg; 1972: 302–318.

  58. Graham LA, Davies PL. Fish antifreeze protein origin in sculpins by frameshifting within a duplicated housekeeping gene. FEBS J. 2024;291(18):4043–61.

  59. Kojima KK. Diversity and Evolution of DNA Transposons Targeting Multicopy Small RNA Genes from Actinopterygian Fish. Biology (Basel). 2022;11(2).

  60. Evans RP, Fletcher GL. Isolation and characterization of type I antifreeze proteins from Atlantic snailfish (Liparis atlanticus) and dusky snailfish (Liparis gibbus). Biochim Biophys Acta. 2001;1547(2):235–44.

  61. Rives N, Lamba V, Cheng CHC, Zhuang X. Diverse origins of near-identical antifreeze proteins in unrelated fish lineages provide insights into evolutionary mechanisms of new gene birth and protein sequence convergence. Mol Biol Evol. 2024;41(9).

  62. Scotter AJ, Marshall CB, Graham LA, Gilbert JA, Garnham CP, Davies PL. The basis for hyperactivity of antifreeze proteins. Cryobiology. 2006;53(2):229–39.

  63. Hakim A, Nguyen JB, Basu K, Zhu DF, Thakral D, Davies PL, Isaacs FJ, Modis Y, Meng W. Crystal structure of an insect antifreeze protein and its implications for ice binding. J Biol Chem. 2013;288(17):12295–304.

  64. Lin FH, Davies PL, Graham LA. The Thr- and Ala-rich hyperactive antifreeze protein from inchworm folds as a flat silk-like beta-helix. Biochemistry. 2011;50(21):4467–78.

  65. Marshall CB, Daley ME, Sykes BD, Davies PL. Enhancing the activity of a beta-helical antifreeze protein by the engineered addition of coils. Biochemistry. 2004;43(37):11637–46.

  66. Leinala EK, Davies PL, Doucet D, Tyshenko MG, Walker VK, Jia Z. A beta-helical antifreeze protein isoform with increased activity. Structural and functional insights. J Biol Chem. 2002;277(36):33349–52.

  67. Chao H, Hodges RS, Kay CM, Gauthier SY, Davies PL. A natural variant of type I antifreeze protein with four ice-binding repeats is a particularly potent antifreeze. Protein Sci. 1996;5(6):1150–6.

  68. Mahatabuddin S, Hanada Y, Nishimiya Y, Miura A, Kondo H, Davies PL, Tsuda S. Concentration-dependent oligomerization of an alpha-helical antifreeze polypeptide makes it hyperactive. Sci Rep. 2017;7:42501.

  69. Valerio PF, Kao MH, Fletcher GL. Thermal hysteresis activity in the skin of the cunner, Tautogolabrus adspersus. Can J Zool. 1990;68(5):1065–7.

  70. Hew CL, Wang NC, Joshi S, Fletcher GL, Scott GK, Hayes PH, Buettner B, Davies PL. Multiple genes provide the basis for antifreeze protein diversity and dosage in the ocean pout, Macrozoarces americanus. J Biol Chem. 1988;263(24):12049–55.

  71. FishBase. [www.fishbase.org].

  72. Nihashi S, Ohshima KI, Saitoh S-I. Sea-ice production in the northern Japan Sea. Deep Sea Res Part I. 2017;127:65–76.

  73. Evans RP, Fletcher GL. Type I antifreeze proteins: possible origins from chorion and keratin genes in Atlantic snailfish. J Mol Evol. 2005;61(4):417–24.

  74. Limoges MA, Cloutier M, Nandi M, Ilangumaran S, Ramanathan S. The GIMAP Family Proteins: An Incomplete Puzzle. Front Immunol. 2021;12:679739.

  75. Zhang YB, Liu TK, Jiang J, Shi J, Liu Y, Li S, Gui JF. Identification of a novel Gig2 gene family specific to non-amniote vertebrates. PLoS ONE. 2013;8(4):e60588.

  76. Gebrie A. Transposable elements as essential elements in the control of gene expression. Mob DNA. 2023;14(1):9.

  77. Bourque G, Burns KH, Gehring M, Gorbunova V, Seluanov A, Hammell M, Imbeault M, Izsvak Z, Levin HL, Macfarlan TS, et al. Ten things you should know about transposable elements. Genome Biol. 2018;19(1):199.

  78. Mukherjee K, Moroz LL. Transposon-derived transcription factors across metazoans. Front Cell Dev Biol. 2023;11:1113046.

  79. Wolfe K, O’HUigin C. Significance of positive selection and gene duplication in adaptive evolution: in memory of Austin L. Hughes. Immunogenetics. 2016;68(10):749–53.

  80. Kondrashov FA. Gene duplication as a mechanism of genomic adaptation to a changing environment. Proc Biol Sci. 2012;279(1749):5048–57.

  81. Andersson DI, Jerlstrom-Hultqvist J, Nasvall J. Evolution of new functions de novo and from preexisting genes. Cold Spring Harb Perspect Biol. 2015;7(6).

  82. Eirin-Lopez JM, Rebordinos L, Rooney AP, Rozas J. The birth-and-death evolution of multigene families revisited. Genome Dyn. 2012;7:170–96.

  83. Scotese CR, Song H, Mills BJW, van der Meer DG. Phanerozoic paleotemperatures: The earth’s changing climate during the last 540 million years. Earth Sci Rev. 2021;215:103503.

  84. Betancur RR, Wiley EO, Arratia G, Acero A, Bailly N, Miya M, Lecointre G, Orti G. Phylogenetic classification of bony fishes. BMC Evol Biol. 2017;17(1):162.

  85. Hughes LC, Nash CM, White WT, Westneat MW. Concordance and Discordance in the Phylogenomics of the Wrasses and Parrotfishes (Teleostei: Labridae). Syst Biol. 2022.

  86. Scholl CL, Holmstrup M, Graham LA, Davies PL. Polyproline type II helical antifreeze proteins are widespread in Collembola and likely originated over 400 million years ago in the Ordovician Period. Sci Rep. 2023;13(1):8880.

  87. The PyMOL Molecular Graphics System. Version 2.5.2 Schrödinger, LLC. [https://pymol.org/].

  88. Mirdita M, Schutze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: making protein folding accessible to all. Nat Methods. 2022;19(6):679–82.

Acknowledgements

We are grateful for the pioneering work performed by Drs. Rod Hobbs and Garth Fletcher to characterize the AFPs of the cunner and snailfish. This research was supported by CIHR Foundation Grant FRN 148422 and NSERC Discovery Grant RGPIN-2016-04810 to PLD.

Funding

This research was supported by CIHR Foundation Grant FRN 148422 and NSERC Discovery Grant RGPIN-2016-04810 to PLD.

Author information

Contributions

L.A.G. and P.L.D. conceived the study and made manuscript revisions. L.A.G. accessed the databases, analyzed data, generated the figures and wrote the first draft. P.L.D. supervised the study.

Corresponding author

Correspondence to Peter L. Davies.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Graham, L.A., Davies, P.L. Convergent evolution of type I antifreeze proteins from four different progenitors in response to global cooling. BMC Mol and Cell Biol 25, 27 (2024). https://doi.org/10.1186/s12860-024-00525-5

  • DOI: https://doi.org/10.1186/s12860-024-00525-5

Keywords