What's New in Release 57
Data updates
Xref projection
Gene names and GO xrefs have been projected between species as usual.
Updated ontology database
The ontology database, ensembl_ontology_57, has been updated with the latest data from GO and SO.
Species name in meta table
The meta table has been populated with the scientific name for each species.
Multiple alignments
- 5-way primates EPO (human, chimp, gorilla, orang, macaque)
- 10-way eutherian mammals EPO
- 33-way eutherian mammals EPO-2X based on the 10-way EPO
- 13-way Mercator-Pecan
- 3-way avian Pecan (chicken, turkey, zebrafinch)
Families
- Updated MCL families, including all Ensembl transcript isoforms and the newest UniProt Metazoa
- Clustering by MCL
- Multiple Sequence Alignments with MAFFT
- Family stable ID mapping
Protein Homologies
- GeneTrees with new/updated genebuilds and assemblies
- Clustering using hcluster_sg
- Multiple Sequence Alignments using consistency-based MCoffee meta-aligner (mafftgins+muscle+kalign+probcons) and exon-skipping aware "skipper" algorithm
- Homology inference including the recent 'putative gene split' and 'contiguous gene split' exceptions
- Pairwise gene-based dN/dS calculations for high coverage species pairs only
- GeneTree stable ID mapping
Release 57 Ensembl marts
Ensembl Genes 57
- Addition of new species Meleagris gallopavo (Turkey)
- Elephant new assembly (loxAfr3), Gorilla new assembly (gorGor3), Rabbit new assembly (oryCun2.0)
- Addition of ortholog ancestor attribute and paralogy type attribute in the Homologs attribute section
- Addition of GO Term Name (e.g. regulation of biological process) filter as well as GO Term Accession (e.g. GO:0050789) in the GENE ONTOLOGY filter section
- GPCR db filters and attributes removed for Drosophila, as these will also be removed from FlyBase.
- The variation filter->variation type is now multi select
- Example IDs added to ID list limit filter drop down
- Recommended Pubmed ID for the Transcript Event has been added to the filters and attribute section
- Transcript splicing event information added for more species:
- human
- mouse
- rat
- worm
- zebrafish
- fruitfly
- Gorilla now has a chromosome drop down in the region filter section
- SMART ID URL link has been fixed
Ensembl variation 57
- *NEW* Homo sapiens Structural Variation dataset now available
- New Horse variation database (dbSNP 130)
- New Pig variation database (dbSNP 128)
- Update of Rat, Zebrafish and Cow to dbSNP 130
- GENE ASSOCIATED VARIATION FILTERS->Consequence Type now multi select
- Addition of phenotype drop-down in the variation mart filters
- BioMart code has been patched to fix the problem of downstream sequences missing 1bp in cases where there is an Indel [-/N]
- Variation->SEQUENCE VARIATION attribute section now has Strain SNP and Strain - Other Variants (Indels, Multiple Nucleotide Polymorphisms).
- New Variation->variation set information (filters and attributes) in Human dataset
- Restructuring of the strain polymorphism section for mouse and rat and removal of this radio button for all other species as the genotype information will now be in the variation attribute section.
Ensembl Functional Genomics 57
- Addition of link to Ensembl eFG documentation in filter sections.
Web features
Linking to Ensembl
For the benefit of external sites linking to Ensembl, we now support URLs that rely on the gene or transcript stable ID to identify the species. For example, you can now link to http://www.ensembl.org/id/ENSG00000139618 which redirects to http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000139618.
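As a minimal sketch of how an external site might build such links, the Python helper below relies only on the /id/ URL pattern quoted above. The function name ensembl_id_url is illustrative and is not part of any Ensembl API.

```python
from urllib.parse import quote

ENSEMBL_BASE = "http://www.ensembl.org"  # base URL taken from the example above


def ensembl_id_url(stable_id: str) -> str:
    """Build a species-independent link from a gene or transcript stable ID.

    Hypothetical helper: only the '/id/<stable ID>' pattern comes from the notes.
    """
    stable_id = stable_id.strip()
    if not stable_id:
        raise ValueError("stable ID must not be empty")
    # The server works out the species from the stable ID and redirects.
    return f"{ENSEMBL_BASE}/id/{quote(stable_id, safe='')}"


print(ensembl_id_url("ENSG00000139618"))
```

The linking site never needs to know the species; it only stores the stable ID.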
SNP Effect Predictor
By popular demand, we have added a web-based tool that calculates the consequence type of SNPs. Upload your data in our GTF-like SNP format and the results will be available to download as text or HTML.
Improvements to user data upload
The following features have been added to the user upload facility:
- Support for gzipped files (all formats)
- To reduce the risk of timeouts, an upload limit of 5MB is now in force; note that this limit also applies to compressed files.
- Support for the 'priority' attribute in track info lines to order tracks in Location/View
- Except on large gzipped files (which are slow to parse), the upload page gives feedback on the number of features found in the file
- User data tracks can now be up to 20 features deep
In addition, if uploading via a Location page with chromosome coordinates, a link to the nearest 100KB region with data from your file is supplied. If no nearest location can be identified, the first feature in the file is used for the location of the link. Please note that these coordinates are currently stored with your file during upload and do not get updated when you navigate to a new location.
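The upload rules above can be sketched as a small pre-upload check in Python. This is an illustration under stated assumptions, not Ensembl code: the limit is read as 5 MiB, the track line follows the common `track name="..." priority=N` convention, the feature lines are BED-like, and the names track_line and prepare_upload are made up for the example.

```python
import gzip
from typing import List

UPLOAD_LIMIT_BYTES = 5 * 1024 * 1024  # assumption: the 5 MB limit read as 5 MiB


def track_line(name: str, priority: int) -> str:
    """Return a track info line carrying the 'priority' attribute."""
    return f'track name="{name}" priority={priority}'


def prepare_upload(track: str, features: List[str]) -> bytes:
    """Gzip a track line plus its feature lines and enforce the upload limit."""
    text = "\n".join([track, *features]) + "\n"
    payload = gzip.compress(text.encode("utf-8"))
    # The limit applies to the compressed file as well, so check after gzipping.
    if len(payload) > UPLOAD_LIMIT_BYTES:
        raise ValueError("compressed file exceeds the 5 MB upload limit")
    return payload


payload = prepare_upload(
    track_line("my_features", 10),
    ["chr1\t1000\t2000\tfeature_a", "chr1\t5000\t6000\tfeature_b"],
)
print(len(payload) <= UPLOAD_LIMIT_BYTES)
```

Checking the size locally avoids a failed upload, which matters most for large gzipped files where the page gives no feature-count feedback.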
Image configuration
- Lefthand menu buttons have a more prominent style, including a small icon
- New species selector for multi-species view
- Individual images refresh when the configuration for them is updated
- Updating the configuration of an image causes only that image to reload (not the whole page)
- "Configure page" tab now has a left menu on pages where configuration options could be sensibly grouped together (eg compara alignment views)
- Image configurations now have an Enable/Disable All option for each non-external track type
- Image configuration menus now have an "External data sources" header where appropriate, and the "show info" buttons have been right-aligned to improve readability
- Section headers on the Active Tracks page for image configuration now act as links to the appropriate sections
- The option to display tracks with no data in the region has been reinstated
- Labelling of genes and transcripts now depends on the renderer: when collapsed, the track is labelled according to the biotype of the gene; when expanded, it is labelled according to the biotype of the transcript. The label is much shorter than before, consisting of the biotype alone rather than status-biotype-analysis.
- Colouring of genes and tracks is according to biotype and analysis, with features being coloured according to the gene properties when the track is collapsed and transcript properties when the track is expanded. For all species, protein-coding genes/transcripts are coloured red and processed transcripts coloured blue. For the human and mouse merged Ensembl-Havana genes/transcripts, this approach is overridden and they are coloured gold regardless of biotype.
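The labelling and colouring rules in the last two items can be summarised in a short Python sketch. It models only what is stated here: biotype-based labels, red and blue for the two named biotypes, and gold for merged Ensembl-Havana features. The biotype strings, function names and the merged_ensembl_havana flag are assumptions for illustration, and the role of the analysis in colouring is not modelled.

```python
from typing import Optional

# Only the colours named in the notes; other biotypes are left unmapped.
BIOTYPE_COLOURS = {"protein_coding": "red", "processed_transcript": "blue"}


def displayed_biotype(gene_biotype: str, transcript_biotype: str, expanded: bool) -> str:
    """Collapsed tracks use the gene biotype, expanded tracks the transcript biotype."""
    return transcript_biotype if expanded else gene_biotype


def feature_colour(
    gene_biotype: str,
    transcript_biotype: str,
    expanded: bool,
    merged_ensembl_havana: bool = False,
) -> Optional[str]:
    """Return the colour for a feature, or None if the notes do not state one."""
    if merged_ensembl_havana:
        # Human and mouse merged features override the biotype rule.
        return "gold"
    return BIOTYPE_COLOURS.get(displayed_biotype(gene_biotype, transcript_biotype, expanded))


print(feature_colour("protein_coding", "processed_transcript", expanded=True))
```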
Variation web features
- Location/Genome - SNP locations can be displayed by phenotype (linked to from variation pages)
- Structural variation track now available on Region in Detail
- Variations displayable in Exon view
- LDview is now available in the Location menu and can display data for more than one population.
Marker displays
- Markers without locations can now be viewed on the Location/Markers page
- Markers will now display on the bottom panel of 'Region in Detail' regardless of map weight, but only when the marker synonym is passed in the URL (e.g. linking from Location/Marker).
New configuration tool for alignments and LD data
Two of our Location pages, 'Multi-species view' and 'Linkage data', now use a pop-up window to configure the data to be viewed.
- Multi-species view
- Click on the 'Select species for comparison' button above the image to add or remove one or more species to the view
- Linkage data
- When going straight to this page via the left-hand menu, there will be no data displayed. Click on the 'Select populations for comparison' button to add your choice of data.
Also note that LD tables can now be exported from the 'Linkage data' page.
Other new web features
- Transcript table on gene/transcript pages has been expanded and re-ordered to make the protein coding transcripts easier to identify
- Glossary entries now pop up when you mouse over biotype names on gene/transcript summary tables
- A "BLAST this sequence" button has been added on gene sequence, transcript sequence and cDNA views
- A "Download as RTF" button has been added on cDNA view and Exon view
- Form validation now takes place on the fly, with error messages appearing once the user leaves an input box with an invalid entry. The submit button is disabled when this happens.
- Drag/select navigation is now enabled on Location alignments, but only the primary species is used to find the new location
API and schema changes
Variation schema and API changes
- Structural variation table
- Indels have been moved from the compressed_genotype table to the multiple_basepair_genotype table
- A fix has been added to the subsnp ID system to allow BioMart to build.
- set and variation_set tables added to allow sub-setting of variations
- API calls added to retrieve strain sequence with "N" for areas with no sequencing coverage
- Added source object with adaptor
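To illustrate the idea behind returning strain sequence with "N" where there is no sequencing coverage, here is a minimal Python sketch. It is not the Ensembl API: the function name, the 1-based inclusive interval convention and the example data are assumptions, and it only shows the masking step, not the substitution of strain alleles.

```python
from typing import List, Tuple


def mask_uncovered(reference: str, covered: List[Tuple[int, int]]) -> str:
    """Return the sequence with 'N' at every base not inside a covered interval.

    Intervals are assumed to be 1-based and inclusive.
    """
    masked = ["N"] * len(reference)
    for start, end in covered:
        first = max(start, 1)
        last = min(end, len(reference))
        # Copy the covered bases back from the reference.
        for pos in range(first, last + 1):
            masked[pos - 1] = reference[pos - 1]
    return "".join(masked)


print(mask_uncovered("ACGTACGTAC", [(2, 4), (8, 9)]))  # NCGTNNNTAN
```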
DBAdaptor species selection
The Funcgen::DBAdaptor now uses the meta table to automatically set the species parameter if present, facilitating automatic selection of the correct dnadb.
DAS multi assembly support
The eFG DAS environment now supports DAS hydra sources on multiple assemblies.
eFG Array Mapping Pipeline Config
The eFG array mapping environment now handles the configuration of the pipeline BatchQueue.pm file automatically, removing the need to edit this file.
eFG Array Mapping Pipeline LOCAL mode
The eFG array mapping pipeline now supports running in LOCAL mode for those without access to LSF.
Alternative initiation
To support alternative initiation (one transcript having many translations), we have added references to canonical translations to the transcript table. There is also Core API support for fetching all translations of a transcript (fetch_all_by_Transcript() in the TranslationAdaptor) as well as the canonical transcript (using old unchanged semantics).
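The relationship described above, one transcript with several translations and a reference to the canonical one, can be pictured with a small Python data model. This is only a conceptual sketch: the class names, fields and stable IDs are hypothetical and do not mirror the Core schema or the Perl API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Translation:
    stable_id: str
    start: int  # hypothetical coordinates, for illustration only
    end: int


@dataclass
class Transcript:
    stable_id: str
    translations: List[Translation] = field(default_factory=list)
    canonical_translation_id: Optional[str] = None  # reference held on the transcript


def all_translations(transcript: Transcript) -> List[Translation]:
    """Every translation of the transcript, including alternative initiations."""
    return list(transcript.translations)


def canonical_translation(transcript: Transcript) -> Optional[Translation]:
    """The single translation the transcript marks as canonical, if any."""
    for translation in transcript.translations:
        if translation.stable_id == transcript.canonical_translation_id:
            return translation
    return None


# Made-up identifiers, not real Ensembl stable IDs.
tx = Transcript(
    "TRANSCRIPT_EXAMPLE",
    [Translation("PROTEIN_EXAMPLE_1", 100, 900), Translation("PROTEIN_EXAMPLE_2", 250, 900)],
    canonical_translation_id="PROTEIN_EXAMPLE_1",
)
print(len(all_translations(tx)), canonical_translation(tx).stable_id)
```

Keeping the canonical reference on the transcript means code that expects a single translation per transcript can keep working unchanged.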
InputSet support
InputSet is a new class in the functional genomics codebase which supersedes the ExperimentalSet class, with new functionality to capture short read alignments within a ResultSet. This will be used as a basis for storing short read alignments in the eFG DB.
ResultFeature Collections
The ResultFeature class and underlying result_feature table have changed to support compressed binary representations of raw chip or short reads data.
Other news
Turkey - new species (Turkey)
Ensembl 57 features a complete genebuild for a new species, Turkey (Meleagris gallopavo). This is our third bird genome, after chicken and zebrafinch, and the first to use new sequencing technologies. The build includes an otherfeatures database.
Gorilla genebuild (Gorilla)
A new genebuild has been done on the latest gorilla assembly, gorGor3. The new assembly uses high-coverage Solexa short reads to improve the low-coverage WGS.
A new otherfeatures database has also been prepared.
Elephant genebuild (Elephant)
There is a new genebuild on the latest elephant assembly, loxAfr3. This is a high coverage 7X assembly which replaces the previous 2X build.
New rabbit assembly and genebuild (Rabbit)
Ensembl 57 features a new genebuild on a new assembly for rabbit, oryCun2.0. An otherfeatures database has also been created for this species.
Human gene update (Human)
The human gene set has been cleaned up and about 1000 genes have been removed.
Mouse Havana Merge (Mouse)
The mouse gene set has been updated using the new HavanaMerge code.
Human: updates to otherfeatures database (Human)
Two updates have been made to the human otherfeatures database:
- The human EST alignments have been rerun. This means that recently sequenced ESTs should be available.
- The NCBI gene set, currently attached as a DAS track in Ensembl, now lives in the otherfeatures database.
The gene set is a more recent set than that currently seen in the DAS track and viewing will be faster now that the genes are stored in the otherfeatures database.
External database references (Human, Mouse, Rat, Cow)
There are new external database references for human, mouse and cow, and the references for rat have been updated.
Human eFG data update (Human)
The human eFG DB has been updated with some new histone modification data sets.
Array Mapping (multiple species)
Array mapping information has been updated for all species, including a new Affy porcine array for Sus scrofa.
lincRNAs (Human, Mouse)
Ensembl human and mouse databases now include lincRNA (large intervening non-coding RNA) data.
Pairwise alignments (multiple species)
Pairwise alignments
- chicken-turkey Blastz
- chicken-zebrafinch Blastz
- pig-cow Blastz
- opossum-wallaby Blastz
- human-turkey Blastz
- human-elephant Blastz
- human-gorilla Blastz
Pig Xrefs (Pig)
Extra xrefs have been added to pig by projecting human HGNC names to pig genes, enabling us to display VEGA gene names for some of the pig genes.
Clarification of MT analyses (Human, Mouse)
Imported MT genomes now have a distinct analysis/logic_name to distinguish them from the Ensembl annotated gene set. Their source has also been changed to reflect this.
Fish multiple alignments (Zebrafish, Fugu, Tetraodon, Medaka, Stickleback)
Ensembl Compara now includes a multiple alignment of all five species: Danio, Gasterosteus, Oryzias, Takifugu and Tetraodon.
cDNA updates (Human, Mouse)
An updated version of the cDNA database is now available for human and mouse.
Human variation data updates (Human)
- GWAS data from a paper in BMC Medical Genetics: "An Open Access Database of Genome-wide Association Results"
- A set of old/retired rsIDs has been added to the variation_synonym table as synonyms of the corresponding current rsIDs for human.
- Import of EGA data from a genomewide association study of variations linked to stroke and ischemic stroke ("Genomewide Association Studies of Stroke", Ikram et al., N Engl J Med. 2009 Apr 23;360(17):1718-28).
- New variation consequence predictions for the new human gene set
- Import of additional data from the NHGRI GWAS catalog
- First CNV data set from:
- Redon 2006 "Global variation in copy number in the human genome" PMID:17122850
- Wang 2009 "The diploid genome sequence of an Asian individual" PMID:18987735
- Fixes in human database:
- flanking sequences that should have been reversed
- merged rsIDs that map to the same location
- corrected typo in Watson source
- reimported Affymetrix data
- Watson and Venter's read_coverage redone to include MT chromosome
New variation data (multiple species)
- Imported dbSNP 130 for horse, rat, zebrafish and cow
- Imported and re-mapped dbSNP 128 data for pig
- Updated the import of SNP data from UniProt
- SS IDs for the zebra finch data
- Improved descriptions for the variation data sources displayed in the configuration track panel
- New variation consequence predictions for the new human, mouse, and rat gene sets.
- Sample name for sequenced orangutan individual changed from 'abelii' to 'Susie'
- Remapped MT variations to the new MT sequence on human and rat.
Loading new rat MT (Rat)
The rat MT genome has been replaced with its corrected version, NC_001665.
Cleaning joined human genes and unlinked xrefs (Human)
A number of "fused" genes and unlinked xrefs were removed from Ensembl Human, including the following transcripts:
- ENST00000361469, which caused JAK3 and INSL3 genes to get fused and be called JAK3
- ENST00000428942, which caused TMEM91 and BCKDHA to join, labelling BCKDHA as TMEM91
We would like to thank our users who reported these errors. Please contact us at helpdesk@ensembl.org if you see any others.
Clean up of pairwise alignments (Human, Rat)
Pairwise alignments using the old MT chromosome sequence from human and rat have been deleted from the compara database.
Synteny update (Human, Cow, Pig, Gorilla)
Synteny data has been added for human-gorilla and pig-cow.
Patch - update versions (all species)
There have been no core schema changes this release, so the only patch required is one to update the version number in the meta table to 57.
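A version-only patch of this kind amounts to a single update of the meta table. The Python sketch below demonstrates the idea against an in-memory SQLite database standing in for a real core database; the simplified two-column table and the key name schema_version are assumptions made for the example, and the actual patch file shipped with the release should be used in practice.

```python
import sqlite3


def apply_version_patch(conn: sqlite3.Connection, version: int) -> str:
    """Set the schema version in a simplified meta table and return the stored value."""
    conn.execute(
        "UPDATE meta SET meta_value = ? WHERE meta_key = 'schema_version'",
        (str(version),),
    )
    row = conn.execute(
        "SELECT meta_value FROM meta WHERE meta_key = 'schema_version'"
    ).fetchone()
    return row[0]


# Stand-in database: a minimal meta table holding the previous release number.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (meta_key TEXT, meta_value TEXT)")
conn.execute("INSERT INTO meta VALUES ('schema_version', '56')")

print(apply_version_patch(conn, 57))
```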
Array mapping - Transcript alignments (multiple species)
The ProbeTranscriptAlign analyses have been rerun for all species with functional genomics data, to correct a data bug which was overlooking some valid alignments.
RegulatoryFeature StableID prefixes (Human, Mouse)
The eFG API now uses stable_id_prefix from the core meta table.
Human MT genome import (Human)
The human MT genome has been replaced by the revised reference sequence NC_012920 (AC_000021).
Alternative Splicing Events (Mouse, Zebrafish, Rat, Fly, C.elegans)
Computations have been made of alternative splicing events for the following species (in addition to Homo sapiens): Mus musculus, Rattus norvegicus, Danio rerio, Caenorhabditis elegans, Drosophila melanogaster.

