Publications

Learning Document Graphs with Attention for Image Manipulation Detection

Published in International Conference on Pattern Recognition and Artificial Intelligence (ICPRAI), 2022

Detecting manipulations in images is becoming increasingly important for combating misinformation and forgery. While recent advances in computer vision have led to improved methods for detecting spliced images, most state-of-the-art methods fail when applied to images containing mostly text, such as images of documents. We propose a deep-learning method for detecting manipulations in images of documents which leverages the unique structured nature of these images in comparison with those of natural scenes. Specifically, we re-frame the classic image splice detection problem as a node classification problem, in which Optical Character Recognition (OCR) bounding boxes form nodes and edges are added according to a text-specific distance heuristic. We propose a system composed of a Variational Autoencoder (VAE)-based embedding algorithm and a graph neural network with attention, trained end-to-end for robust manipulation detection. Our proposed model outperforms both a state-of-the-art image splice detection method and a document-specific method.

Recommended citation: Hailey James, Otkrist Gupta, and Dan Raviv. "Learning Document Graphs with Attention for Image Manipulation Detection." ICPRAI 2022. https://link.springer.com/chapter/10.1007/978-3-031-09037-0_22

Printing and Scanning Attack for Image Counter Forensics

Published in EURASIP Journal on Image and Video Processing, 2022

Examining the authenticity of images has become increasingly important as manipulation tools become more accessible and advanced. Recent work has shown that while CNN-based image manipulation detectors can successfully identify manipulations, they are also vulnerable to adversarial attacks, ranging from simple double JPEG compression to advanced pixel-based perturbation. In this paper, we explore another highly plausible method of attack: printing and scanning. We demonstrate the vulnerability of two state-of-the-art models to this type of attack. We also propose a new machine learning model that performs comparably to these state-of-the-art models when trained and validated on printed and scanned images. Of the three models, our proposed model outperforms the others when trained and validated on images from a single printer. To facilitate this exploration, we create a dataset of over 6,000 printed and scanned image blocks. Further analysis suggests that variation between images produced from different printers is significant, large enough that good validation accuracy on images from one printer does not imply similar validation accuracy on identical images from a different printer.

Recommended citation: Hailey James, Otkrist Gupta, and Dan Raviv. "Printing and Scanning Attack for Image Counter Forensics." J Image Video Proc. 2022, 2 (2022). https://doi.org/10.1186/s13640-022-00579-5 https://jivp-eurasipjournals.springeropen.com/articles/10.1186/s13640-022-00579-5

Probabilistic Bias Mitigation in Word Embeddings

Published in NeurIPS Workshop on Human-Centered Machine Learning, 2019

It has been shown that word embeddings derived from large corpora tend to incorporate biases present in their training data. Various methods for mitigating these biases have been proposed, but recent work has demonstrated that these methods hide but fail to truly remove the biases, which can still be observed in word nearest-neighbor statistics. In this work, we propose a probabilistic view of word embedding bias. We leverage this framework to present a novel method for mitigating bias which relies on probabilistic observations to yield a more robust bias mitigation algorithm. We demonstrate that this method effectively reduces bias according to three separate measures of bias while maintaining embedding quality across various popular benchmark semantic tasks.

Recommended citation: Hailey James and David Alvarez-Melis. "Probabilistic Bias Mitigation in Word Embeddings." NeurIPS Workshop on Human-Centered Machine Learning (2019). https://arxiv.org/abs/1910.14497