The 1st International Workshop
on Data Analysis in Life Science



19-23 September, 2022 in Grenoble, France

Online workshop


In conjunction with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2022)

About

Data scientists play a fundamental role in analyzing biological, genomic, and health care data, in particular to pave the way to personalized medicine. Such data need to be properly managed, integrated, and analyzed using statistical inference tools as well as machine learning, data mining, and deep learning methods. An increasing number of data scientists are actively working with bio-data, with many different goals, including patient stratification, personalized medicine, and drug and therapy design. Furthermore, analyzing and mining large public biological and clinical databases (e.g., ENCODE, CCLE, GDC, and MIMIC) has already proven to be paramount for knowledge discovery. Thus, the data science community may benefit from learning and sharing the various approaches that have been developed to deal with biological and clinical data.

The workshop will gather researchers with expertise in data management and analysis, machine learning, and knowledge discovery applied to bioinformatics, healthcare, and life science problems. This workshop aims to share current cutting-edge data science methodologies and their applications. It will be an opportunity for researchers to meet and share their ideas on improving data-driven personalized medicine, genetic data management, and health care system advancement.

Topics of interest

The topics of interest include, but are not limited to:

  • Visualization and exploratory analysis methods for genomic and health care data
  • Data analysis of genomics and molecular biology data
  • Knowledge discovery from biological and clinical databases
  • Analysis of electronic health records (EHR)
  • Data analysis of heterogeneous health information systems and databases
  • Natural language processing methods applied to genomics and healthcare data
  • Tracking and visualization of viral sequences
  • Machine learning methods applied to viral sequences
  • Deep learning methods for bioinformatics and biomedical images
  • Data pre-processing approaches for single-cell data
  • Single-cell data analysis
  • Data integration practices for precision medicine
  • Network-based approaches for genomic data analysis

We invite contributions from both industry and academia to share their research and experience in using data science, machine learning and database knowledge practices with biological and clinical data.

Program

DALS 2022 will take place on Monday, September 19th, starting at 9:00 AM (CET). The workshop program is as follows:

9:00-9:15, Welcome and Introduction

9:15-10:15, Keynote by Anna Bernasconi: Data analysis for unveiling the SARS-CoV-2 evolution
Abstract: The COVID-19 epidemic has brought enormous attention to the genetics of viral infection and the corresponding disease. In this seminar, I will provide a viral genomic primer. Then, I will discuss the potential of big data in this domain, especially when millions of SARS-CoV-2 sequences are available on open databases. I will present a collection of current analysis problems, focusing on viral evolution, monitoring of variants, and the categorization of their effects. Finally, I will hint at open problems that should attract the interest of data scientists.
Biography: Anna Bernasconi is a postdoctoral researcher with the Dipartimento di Elettronica, Informazione e Bioingegneria at Politecnico di Milano and a visiting researcher at Universitat Politècnica de València. Her research areas are Bioinformatics, Databases, and Data Science, where she applies conceptual modeling, data integration, and semantic web technologies to biological and genomic data. Starting from a Ph.D. thesis on the modeling and integration of data and metadata of human genomic datasets, she has since extended her expertise to the fast-growing field of viral genomics, which has become particularly relevant since the COVID-19 pandemic outbreak. She is active in the conceptual modeling and database communities, with several paper presentations and the organization of tutorials and workshops.

10:15-10:45, Paper presentation: I-CONVEX: Fast and Accurate de Novo Transcriptome Recovery from Long Reads.

10:45-11:15, Paper presentation: Italian debate on measles vaccination: how Twitter data highlight communities and polarity.

Paper submission guidelines

Papers must be written in English and formatted according to the Springer LNCS guidelines.
Authors can submit their manuscripts using the EasyChair platform.
Submissions may not exceed 12 pages in PDF format for full papers and 6 pages for short or demo papers, including figures and references. Submitted papers must be original work that has not appeared in and is not under consideration for another conference or journal. Work in progress is welcome, but preliminary results should be made available as a proof of concept. Submissions consisting only of a proposal will be rejected.
At least one author should commit to presenting their work at the workshop.

Post-conference publication

Accepted papers will be published by Springer as joint proceedings of several ECML PKDD workshops. We are working towards a preliminary agreement with the journal BMC Bioinformatics (2-year Impact Factor: 3.169) for a post-conference supplement related to Data modeling, Processing and Analysis for Life Sciences. If the agreement is reached, the authors of all papers accepted to our workshop will be invited to submit a revised and extended version to the journal supplement.

Important dates

The following deadlines are in AoE time zone (UTC – 12).

  • Paper submission: 31 August 2022 (previously 20 June 2022)
  • Notification: 10 September 2022 (previously 13 July 2022)
  • ECML PKDD Conference dates: 19-23 September 2022

Chairs

Arif Canakoglu, Dipartimento di Anestesia, Rianimazione ed Emergenza-Urgenza,
Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Milan, Italy
arif.canakoglu@policlinico.mi.it

Arif Canakoglu currently works as a data scientist at Policlinico di Milano, where he works mainly on the electronic health records of intensive care unit patients in the Lombardy region. He is leading research, with the support of the medical group, on patients' quality of life after hospital discharge. Previously, he was involved in the ERC-awarded "Data-driven Genomic Computing" project (2016-2021), where he contributed to the integration of heterogeneous genomic data and to the development of computational methods for genomic applications. In 2016, he received his PhD on biomolecular knowledge data integration using a modular schema data warehouse. His research interests include data integration and data-driven genomic computing, big data analysis and processing on cloud computing, and artificial intelligence applications. His main areas of expertise are heterogeneous data integration, data-driven models and machine learning approaches in genomics, and big data processing, especially on cloud computing.



Gaia Ceddia, Life Sciences, Integrative Computational Network Biology
Barcelona Supercomputing Center (BSC), Barcelona, Spain
gaia.ceddia@bsc.es

Gaia Ceddia is a Recognised Researcher at Barcelona Supercomputing Center. She is currently working in the Integrative Computational Network Biology group directed by Prof. Nataša Pržulj. She received her PhD cum laude in 2021, with a thesis titled “Computational Methods for Data-driven Predictions and Understanding of Biological Interactions” at Politecnico di Milano. Her research interests include the design of novel network science and machine learning algorithms to extract new biomedical information and uncover molecular mechanisms of disease from biological data.



Sara Pido, Laboratory for Information and Decision Systems (LIDS)
Massachusetts Institute of Technology, US
Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
Politecnico di Milano, Italy
sarapid@mit.edu

Sara Pido is a Ph.D. candidate at Politecnico di Milano in Data Analytics and Decision Sciences, supervised by Professor Stefano Ceri. She is currently a visiting student at the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, under the supervision of Kalyan Veeramachaneni. She obtained her master's degree at Politecnico di Milano under the supervision of Professor Marco Masseroli, with a thesis on the analysis of gene expression data through the use of complex networks. She is currently working on drug repurposing techniques and on automatic data science methods.



Pietro Pinoli, Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
Politecnico di Milano, Italy
pietro.pinoli@polimi.it

Pietro Pinoli works as a Research Fellow and lecturer at the Department of Electronics, Information and Bioengineering at the Politecnico di Milano (Italy). He received his PhD cum laude in 2017, with a thesis titled “Modeling and Querying Genomic Data”, in which he proposed and benchmarked data structures and algorithms to manage, search, and process huge collections of genomic datasets by means of cloud and distributed technologies. He was a visiting PhD candidate at Harvard University (Cambridge, MA, US). His research interests include bioinformatics and computational biology, databases and data management, big data technology and algorithms, machine learning and natural language processing, and drug repurposing. He participated in the Italian PRIN GenData, ERC GeCo, and EIT VirusLab projects. In recent years he has delivered talks at IBM Research in Almaden, Broad Institute of Boston, University of Trento, University of Lausanne, and IBM Research Zurich. He co-organized tutorial sessions at ER and EDBT and a workshop at ICWE, and he served as guest editor for MDPI Biotech and BMC Supplements.



Program Committee

TBD