DALS 2022

About

Data scientists play a fundamental role in analyzing biological, genomic, and health care data, in particular to pave the way to personalized medicine. Such data need to be properly managed, integrated, and analyzed using statistical inference tools as well as machine learning, data mining, and deep learning methods. An increasing number of data scientists are actively working with bio-data, with many different goals, including patient stratification, personalized medicine, and drug and therapy design. Furthermore, analyzing and mining large public biological and clinical databases (e.g., ENCODE, CCLE, GDC, and MIMIC) has already proven to be paramount for knowledge discovery. Thus, the data science community may benefit from learning and sharing the various approaches that have been developed to deal with biological and clinical data.

The workshop will gather researchers with expertise in data management and analysis, machine learning, and knowledge discovery applied to bioinformatics, healthcare, and life science problems. This workshop aims to share current cutting-edge data science methodologies and their applications. It will be an opportunity for researchers to meet and share their ideas on improving data-driven personalized medicine, genetic data management, and health care system advancement.

Topics of interest

The topics of interest include, but are not limited to:

Visualization and exploratory analysis methods for genomic and health care data
Data analysis of genomics and molecular biology data
Knowledge discovery from biological and clinical databases
Analysis of electronic health records (EHR)
Data analysis of heterogeneous health information systems and databases
Natural language processing methods applied to genomics and healthcare data
Tracking and visualization of viral sequences
Machine learning methods applied to viral sequences
Deep learning methods for bioinformatics and biomedical images
Data pre-processing approaches for single-cell data
Single-cell data analysis
Data integration practices for precision medicine
Network-based approaches for genomic data analysis

We invite contributions from both industry and academia to share their research and experience in using data science, machine learning and database knowledge practices with biological and clinical data.

Program

DALS 2022 will take place on Monday, September 19, starting at 9:00 AM (CET). The workshop program follows:

9:00-9:15, Welcome and Introduction

9:15-10:15, Keynote by Anna Bernasconi: Data analysis for unveiling the SARS-CoV-2 evolution
Abstract: The COVID-19 epidemic has brought enormous attention to the genetics of viral infection and the corresponding disease. In this seminar, I will provide a viral genomic primer. Then, I will discuss the potential of big data in this domain, especially when millions of SARS-CoV-2 sequences are available on open databases. I will present a collection of current analysis problems, focusing on viral evolution, monitoring of variants, and the categorization of their effects. Finally, I will hint at open problems that should attract the interest of data scientists.
Biography: Anna Bernasconi is a postdoctoral researcher with the Dipartimento di Elettronica, Informazione e Bioingegneria at Politecnico di Milano and a visiting researcher at Universitat Politècnica de València. Her research areas are Bioinformatics, Databases, and Data Science, where she applies conceptual modeling, data integration, and semantic web technologies to biological and genomic data. Starting from a Ph.D. thesis on the modeling and integration of data and metadata of human genomic datasets, she has since extended her expertise to the fast-growing field of viral genomics, which has become particularly relevant since the outbreak of the COVID-19 pandemic. She is active in the conceptual modeling and database communities, with several paper presentations and the organization of tutorials and workshops.

10:15-10:45, Paper presentation: I-CONVEX: Fast and Accurate de Novo Transcriptome Recovery from Long Reads.

10:45-11:15, Paper presentation: Italian debate on measles vaccination: how Twitter data highlight communities and polarity.

Chairs

Arif Canakoglu, Dipartimento di Anestesia, Rianimazione ed Emergenza-Urgenza,
Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Milan, Italy
arif.canakoglu@policlinico.mi.it

Arif Canakoglu currently works as a data scientist at Policlinico di Milano, mainly on the electronic health records of intensive care unit patients in the Lombardy region. He leads research, with the support of the medical group, analyzing patients' quality of life after hospital discharge. Previously, he was involved in the ERC-awarded "Data-driven Genomic Computing" project (2016-2021), where he contributed to the integration of heterogeneous genomic data and to the development of computational methods for genomic applications. In 2016, he received his PhD on biomolecular knowledge data integration using a modular schema data warehouse. His research interests include data integration and data-driven genomic computing, big data analysis and processing on cloud computing, and artificial intelligence applications. His main areas of expertise are heterogeneous data integration, data-driven models and machine learning approaches in genomics, and big data processing, especially on cloud computing.

Gaia Ceddia, Life Sciences, Integrative Computational Network Biology
Barcelona Supercomputing Center (BSC), Barcelona, Spain
gaia.ceddia@bsc.es

Gaia Ceddia is a Recognised Researcher at Barcelona Supercomputing Center. She currently works in the Integrative Computational Network Biology group directed by Prof. Nataša Pržulj. She received her PhD cum laude in 2021 at Politecnico di Milano, with a thesis titled “Computational Methods for Data-driven Predictions and Understanding of Biological Interactions”. Her research interests include the design of novel network science and machine learning algorithms to extract new biomedical information and uncover molecular mechanisms of disease from biological data.

Sara Pido, Laboratory for Information and Decision Systems (LIDS)
Massachusetts Institute of Technology, US
Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
Politecnico di Milano, Italy
sarapid@mit.edu

Sara Pido is a Ph.D. candidate in Data Analytics and Decision Sciences at Politecnico di Milano, supervised by Professor Stefano Ceri. She is currently a visiting student at the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, under the supervision of Kalyan Veeramachaneni. She obtained her master's degree at Politecnico di Milano under the supervision of Professor Marco Masseroli, with a thesis on the analysis of gene expression data through the use of complex networks. She is currently working on drug repurposing techniques and on automatic data science methods.

Pietro Pinoli, Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
Politecnico di Milano, Italy
pietro.pinoli@polimi.it

Pietro Pinoli works as a Research Fellow and lecturer at the Department of Electronics, Information and Bioengineering at the Politecnico di Milano (Italy). He received his PhD cum laude in 2017, with a thesis titled “Modeling and Querying Genomic Data”, in which he proposed and benchmarked data structures and algorithms to manage, search, and process huge collections of genomic datasets by means of cloud and distributed technologies. He was a visiting PhD candidate at Harvard University (Cambridge, MA, US). His research interests include bioinformatics and computational biology, databases and data management, big data technology and algorithms, machine learning and natural language processing, and drug repurposing. He participated in the Italian PRIN GenData, ERC GeCo, and EIT VirusLab projects. In recent years he has delivered talks at IBM Research in Almaden, Broad Institute of Boston, University of Trento, University of Lausanne, and IBM Research Zurich. He co-organized tutorials at ER and EDBT and a workshop at ICWE, and he served as guest editor for MDPI Biotech and BMC Supplements.

The 1st International Workshop
on Data Analysis in Life Science

19-23 September, 2022 in Grenoble, France

Online workshop

In conjunction with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2022)
