Publications | Jakub Chłędowski

Jungkyu Park, Yoel Shoshan, Robert Martí, Pablo Gómez del Campo, Vadim Ratner, Daniel Khapun, Aviad Zlotnick, Ella Barkan, Flora Gilboa-Solomon, Jakub Chłędowski, Jan Witowski, Alexandra Millet, Eric Kim, Alana Lewin, Kristine Pysarenko, Sardius Chen, Julia Goldberg, Shalin Patel, Anastasia Plaunova, Melanie Wegener, Stacey Wolfson, Jiyon Lee, Sana Hava, Sindhoora Murthy, Linda Du, Sushma Gaddam, Ujas Parikh, Laura Heacock, Linda Moy, Beatriu Reig, Michal Rosen-Zvi, Krzysztof J. Geras

July 2021 In Nature Machine Intelligence

Lessons from the first DBTex Challenge

A new international competition aims to speed up the development of AI models that can assist radiologists in detecting suspicious lesions from hundreds of millions of pixels in 3D mammograms. The top three winning teams compare notes.

Jakub Chłędowski, Adam Polak, Bartosz Szabucki, Konrad Żołna

July 2021 In ICML

Robust Learning-Augmented Caching: An Experimental Study

Effective caching is crucial for the performance of modern-day computing systems. A key optimization problem arising in caching, namely which item to evict to make room for a new item, cannot be solved optimally without knowing the future. There are many classical approximation algorithms for this problem, but researchers have more recently begun to successfully apply machine learning to decide what to evict by discovering implicit input patterns and predicting the future. While machine learning typically does not provide any worst-case guarantees, the new field of learning-augmented algorithms proposes solutions that leverage classical online caching algorithms to make the machine-learned predictors robust. We are the first to comprehensively evaluate these learning-augmented algorithms on real-world caching datasets and state-of-the-art machine-learned predictors. We show that a straightforward method (blindly following either a predictor or a classical robust algorithm, and switching whenever one becomes worse than the other) incurs only a low overhead over a well-performing predictor, while competing with classical methods when the coupled predictor fails, thus providing cheap worst-case insurance.
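The switching rule described above can be illustrated with a short Python sketch. It is a minimal illustration under our own assumptions, not the paper's implementation: the predictor interface `predict(item, t)` (returning the predicted time of the item's next request) is hypothetical, LRU stands in for the classical robust algorithm, and the switch is done lazily by evicting, on each miss, some cached item that the currently better algorithm does not hold.

```python
from collections import OrderedDict


class LRU:
    """Classical baseline (our stand-in): evict the least recently used item."""

    def __init__(self, k):
        self.k = k
        self.cache = OrderedDict()  # keys ordered from least to most recently used
        self.misses = 0

    def access(self, item, t):
        if item in self.cache:
            self.cache.move_to_end(item)
            return
        self.misses += 1
        if len(self.cache) >= self.k:
            self.cache.popitem(last=False)  # drop the least recently used key
        self.cache[item] = None


class FollowPredictions:
    """Evict the item predicted to be requested furthest in the future."""

    def __init__(self, k, predict):
        self.k = k
        self.predict = predict  # hypothetical: predict(item, t) -> predicted next request time
        self.cache = {}  # item -> predicted time of its next request
        self.misses = 0

    def access(self, item, t):
        if item not in self.cache:
            self.misses += 1
            if len(self.cache) >= self.k:
                victim = max(self.cache, key=self.cache.get)
                del self.cache[victim]
        self.cache[item] = self.predict(item, t)  # refresh prediction on every request


class SwitchingCache:
    """Simulate both policies in the background; lazily follow the one with fewer misses."""

    def __init__(self, k, predict):
        self.k = k
        self.a = FollowPredictions(k, predict)
        self.b = LRU(k)
        self.cache = set()
        self.misses = 0

    def access(self, item, t):
        # Shadow simulations only tell us which policy is currently ahead.
        self.a.access(item, t)
        self.b.access(item, t)
        leader = self.a if self.a.misses <= self.b.misses else self.b
        if item in self.cache:
            return
        self.misses += 1
        if len(self.cache) >= self.k:
            # Lazy switching: evict an item the leader does not hold. One always exists,
            # because the leader holds `item` plus at most k - 1 other items.
            victim = min(self.cache - set(leader.cache), key=repr)
            self.cache.remove(victim)
        self.cache.add(item)


def run(trace, k, predict):
    cache = SwitchingCache(k, predict)
    for t, item in enumerate(trace):
        cache.access(item, t)
    return cache


if __name__ == "__main__":
    trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]

    def oracle(item, t):
        # Perfect predictions, for illustration only: the true next request time.
        later = [j for j in range(t + 1, len(trace)) if trace[j] == item]
        return later[0] if later else float("inf")

    c = run(trace, 3, oracle)
    print(c.misses, c.a.misses, c.b.misses)  # prints: 7 7 10
```

With perfect predictions the predictor-driven policy behaves like Belady's offline optimum, so the combined cache simply tracks it; with a poor predictor, the LRU simulation takes the lead and the combined cache drifts toward LRU's contents. Switching lazily avoids flushing the cache at every switch, at the cost of some extra misses while the two cache contents converge.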

Kangning Liu, Yiqiu Shen, Nan Wu, Jakub Chłędowski, Carlos Fernandez-Granda, Krzysztof J. Geras

February 2021 In MIDL

Weakly-supervised High-resolution Segmentation of Mammography Images for Breast Cancer Diagnosis

In the last few years, deep learning classifiers have shown promising results in image-based medical diagnosis. However, interpreting the outputs of these models remains a challenge. In cancer diagnosis, interpretability can be achieved by localizing the region of the input image responsible for the output, i.e., the location of a lesion. Alternatively, segmentation or detection models can be trained with pixel-wise annotations indicating the locations of malignant lesions. Unfortunately, acquiring such labels is labor-intensive and requires medical expertise. To overcome this difficulty, weakly-supervised localization can be utilized. These methods allow neural network classifiers to output saliency maps highlighting the regions of the input most relevant to the classification task (e.g., malignant lesions in mammograms) using only image-level labels (e.g., whether the patient has cancer or not) during training. When applied to high-resolution images, existing methods produce low-resolution saliency maps. This is problematic in applications in which suspicious lesions are small relative to the image size. In this work, we introduce a novel neural network architecture to perform weakly-supervised segmentation of high-resolution images. The proposed model selects regions of interest via coarse-level localization, and then performs fine-grained segmentation of those regions. We apply this model to breast cancer diagnosis with screening mammography, and validate it on a large, clinically realistic dataset. Measured by the Dice similarity score, our approach outperforms existing methods by a large margin in localizing benign and malignant lesions, with relative improvements of 39.6% and 20.0%, respectively. Code and the weights of some of the models are available at https://github.com/nyukat/GLAM.
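For reference, the Dice similarity score compares a predicted mask P with a ground-truth mask G as 2|P ∩ G| / (|P| + |G|). Below is a minimal Python sketch over flattened binary masks; the function names `dice` and `relative_improvement`, and the convention for two empty masks, are our own assumptions rather than details taken from the GLAM code.

```python
def dice(pred, target):
    """Dice similarity between two binary masks given as flat, equal-length sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, g in zip(pred, target) if p and g)  # |P ∩ G|
    size_sum = sum(1 for p in pred if p) + sum(1 for g in target if g)  # |P| + |G|
    if size_sum == 0:
        return 1.0  # both masks empty: treated as perfect agreement (our convention)
    return 2 * overlap / size_sum


def relative_improvement(baseline, new):
    """Relative gain of `new` over `baseline`, e.g. 0.396 for a 39.6% improvement."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # prints 0.5
```

A relative improvement of 39.6% therefore means the new Dice score is 1.396 times the baseline's, not 39.6 percentage points higher.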

Tomasz Dwojak, Michał Pietruszka, Łukasz Borchmann, Jakub Chłędowski, Filip Graliński

November 2020 In CoNLL

From Dataset Recycling to Multi-Property Extraction and Beyond

This paper investigates various Transformer architectures on the WikiReading Information Extraction and Machine Reading Comprehension dataset. The proposed dual-source model outperforms the current state of the art by a large margin. Next, we introduce WikiReading Recycled, a newly developed public dataset, and the task of multiple-property extraction. The new dataset uses the same data as WikiReading but does not inherit its predecessor’s identified disadvantages. In addition, we provide a human-annotated test set with diagnostic subsets for a detailed analysis of model performance.