We are interested in finding an excellent postdoc with interests in
protein functional annotation, machine learning and grid computing. The
position is open for 3.5 years at the Université Pierre et Marie Curie,
in the heart of Paris.
Research topic: Protein function annotation, multiple probabilistic
models, domain architecture, machine learning, combinatorial
optimization, computer grid.
Title: A novel integrative platform for large scale protein annotation
that exploits a multitude of diversified probabilistic models in several
protein signature databases.
Abstract: Precise genome annotations are a gold mine for biologists, who
use them to identify proteins involved in biological processes.
Databases of protein domains and functional sites are vital resources to
provide functional analysis for these new proteins. Most databases
describe known domains with probabilistic models representing the consensus
among all domain sequences, while only a few associate each protein
domain family with several probabilistic models, built from a
sample of diversified homologous sequences. In an attempt to unify
the annotation process and provide a more accurate tool, integrative
approaches combine different types of protein signatures from multiple
databases into a single searchable resource. However, the increasing
number of proteins with no annotation, present in highly divergent
genomes, and the large number of erroneous annotations produced by
current tools call for the development of innovative solutions.
We propose a novel integrated approach for large scale protein
annotation that will exploit an unprecedented amount of genomic data as
well as sophisticated machine learning techniques and combinatorial
optimization approaches that take advantage of High Performance Computing
(HPC) environments. The idea is to uncover, as far as possible, the
evolutionary processes of protein sequences that took place throughout
the whole tree of life and that affected the evolution of a protein
family. We have already shown in previous work that the problem
of functional annotation is closely tied to the ability to uncover such
paths. We shall now extend this approach to large scale genome
annotation by considering 11 different protein databases, comprising
about 10^9 protein sequences, and by producing a large pool of
diversified probabilistic models coding for about 10^7 evolutionary
protein pathways. Such models will be used to search for specific
domains in genomes to be annotated. Our previous methodology needs to be
fundamentally improved to deal with this large amount of biological
data. In this project, we shall work on the algorithms to reduce the
space of models and the search complexity, and we shall implement some
important algorithmic changes towards the realization of a powerful
integrated annotation tool.
Where: This project is run in the Laboratoire de Biologie
Computationnelle et Quantitative UMR7238 CNRS-UPMC – Analytical Genomics
team, headed by A. Carbone. It is co-advised with Pierre-Henri Wuillemin,
Laboratoire d’Informatique de Paris 6 – Equipe DECISION.
Period: The postdoc will be paid under an Ingénieur de Recherche
contract lasting 3.5 years, and the position is available from September 1st,
Contact: Alessandra Carbone at email@example.com
Laboratory of Computational and Quantitative Biology
UMR 7238 Université Pierre et Marie Curie-CNRS
Les Cordeliers, Escalier A 4e étage
15, rue de l’Ecole de Médecine, 75006 Paris