eScience Research Network

Modern science is interdisciplinary and data-intensive. For instance, in the 1000 Genomes Project (www.1000genomes.org), the comparative study of 629 individuals has already generated 7.3 TB of data. Analogous situations exist in fields such as astronomy, agriculture and the social sciences. Ten years ago, the problem was how to obtain data. Today, the bottleneck is the lack of new computational strategies and tools that allow scientists to manage these massive volumes of heterogeneous, distributed data and to generate new knowledge from their processing, analysis and visualization. This laid the foundations of so-called eScience: the combination of advanced research in computer science and mathematical modeling to enable and accelerate research in other knowledge domains. Recognizing the importance of this theme for the advancement of science, the US, Great Britain, Australia and other countries have created national programs in eScience. The main goal of this project is the design and construction of a collaborative network for research in eScience, in a partnership that involves computer science, mathematical modeling and specific domains in the exact, life, agricultural and social sciences.

General objectives:

  • Development of basic research in computer science, mathematics, statistics and the target domain associated with each collaboration (e.g., biology and medicine). Original, relevant results are expected to be obtained and published in both computer science and the target domain.
  • Establishing the e-Science collaborative network at USP in order to develop statistical and computational tools for the operational modeling of biological functions and other complex systems, based on systematic measurements obtained through experiments performed in specific domains (genomics, imaging, etc.).
  • Setting up the e-Science infrastructure in the associated laboratories – The infrastructure for data acquisition and processing will be organized as a network. It will consist of hardware and software (mostly developed within the network) supporting the main research steps: data capture, storage, maintenance, analysis and visualization. Data will be stored in distributed databases integrated with other databases available on the Internet. Computational, statistical and mathematical analysis tools and data visualization techniques will be developed to help domain specialists interpret the data.
  • Human Resources Training in e-Science – The proposing group is already involved in courses related to this field, such as the USP Inter-unit Graduate Programs in Bioinformatics and in Biotechnology, in addition to specific graduate and undergraduate courses in related disciplines (statistics, computational science, genomics, etc.). The project's activities are expected to contribute to the dissemination of e-Science and to the training of new researchers and professionals working in this area.
  • Technology Generation and Transfer & Knowledge Diffusion – The main e-Science products of the Network that can be transferred to external users are: i) open-source software for biological systems, developed mainly in partnership with CCSL-USP (Centro de Competência de Software Livre da USP); ii) online tools for data storage, transfer, classification, mining and analysis; iii) online tools for e-learning. Two Network units, IME and CPB, have extensive experience in patent preparation and licensing processes, as well as in technology transfer to the productive sector.
  • Increasing knowledge diffusion in e-Science in the country and the visibility of Brazilian e-Science research abroad – Knowledge diffusion will be accomplished through publications and the e-Science tools developed by the network. Two research meetings on e-Science will be held during the project.

Specific objectives related to the components of an e-Science environment:

  • Development of new mathematical models and algorithms for analysis and visualization of scientific data.
  • Advances on the scientific questions addressed in the target domains.
  • Data integration and data mining at multiple spatial and temporal scales, combined with textual data and annotations made by specialists.
  • Algorithm development for data retrieval, summarization, classification and mining, associated with two specific topics: time series processing (from sensor data or from data extracted from images) and pattern recognition in biological and biomedical images.
  • Development of visualization and interaction tools for knowledge discovery in the target domains.