The ACM Multimedia Conference is a premier conference for multimedia experts and practitioners across academia and industry. The 2018 conference was held at the Lotte Hotel in Seoul, Korea from 22 to 26 Oct 2018. As a co-author of the paper "SLIONS: A Karaoke Application to Enhance Foreign Language Learning", which was accepted for oral presentation, I had the privilege of attending the conference under an SoC grant.

The conference featured an extensive program of technical sessions covering all aspects of the multimedia field, in the form of oral and poster presentations, tutorials, panels, exhibits, demonstrations, videos, a doctoral symposium, and workshops, as well as Grand Challenge Competitions and Open Source Software Competitions. The program covered important foundations as well as challenging and novel topics in the field. Below, I describe a few presentations that I found particularly interesting.

Context-Aware Unsupervised Text Stylisation

In this work, the researchers presented a novel algorithm for unsupervised text stylisation. The main structural imagery of the source image is first extracted to create an initial mapping, which is then refined by a legibility-preserving structure transfer algorithm that adds shape characteristics to the text. As a result, the framework is able to automatically transfer the text style to a target image with high accuracy, without the need for ideal input as required by supervised methods.

The framework has many real-world applications, including photography post-processing and graphic design.

Polygon Annotation of Fashionable Clothing

Recognising different pieces of clothing in an image has long been a challenging computer vision problem, as they vary greatly in appearance, pattern, layering and position.
Yet, with the proliferation of e-commerce, the ability to detect and classify fashion items has become crucial for functionalities such as visual search and product recommendation.

In this work, the researchers collected a million street fashion photos, filtered out unsuitable ones (e.g. low-quality images) and annotated the rest, applying various deep learning techniques, including Faster R-CNN, ResNet, SSD and YOLO. The annotated dataset should also be very useful for further research on suitable training models for fashion item detection.

Oral and poster presentation

We presented our paper in both an oral session and a poster session. The audience offered many valuable insights and suggestions, which will help us decide on the next steps for the project.


Through the conference, I have gained deeper insights into many aspects of the multimedia field in Computer Science, as well as exposure to the frontier of multimedia research and industrial innovation. Through interactions with experts and practitioners from all over the world, across academia and industry, I have also developed a more holistic view of Computer Science research. Overall, the conference has strengthened my presentation and research skills and broadened my horizons on the potential of multimedia research in real-world applications.