Generative AI at the Intersection of Vision and Language: Advances in Synthesis, Editing, and Multimodal Understanding

Location: South Campus, VKYM - tbd
Time: December 20, 11:00-12:00
 
Abstract:
Generative AI is reshaping how we synthesize and edit visual content while bridging the gap between vision and language. In this talk, I will present our recent contributions that explore this intersection through advanced models and benchmarks. These include a method that incorporates text-conditioned adapter layers into pretrained GAN inversion networks, enabling precise and intuitive text-driven editing of real images. Building on this, our work on text-based neural video manipulation disentangles content and motion, allowing for coherent and semantically meaningful video edits. Expanding beyond text, our SonicDiffusion framework introduces audio-driven image generation and editing by transforming audio features into representations compatible with diffusion models. To complement these efforts, we have also created robust benchmarks for evaluating multimodal systems, assessing their linguistic and temporal grounding as well as their compositional generalization and reasoning abilities. Collectively, this work advances the capabilities of generative models, fostering more intuitive and flexible multimodal AI systems.
 
Biography:
Aykut Erdem is an Associate Professor in the Department of Computer Engineering at Koç University, Istanbul, and is affiliated with the KUIS AI Center. He earned his Ph.D. from Middle East Technical University (METU) in Ankara. Prior to joining Koç University, he was a faculty member in the Computer Engineering Department at Hacettepe University, where he co-directed the Computer Vision Lab. His research focuses on developing methods to better understand, interpret, and manipulate visual data. In recognition of his contributions, he received the Young Scientist Award (BAGEP) from the Science Academy in 2021 and was recently awarded funding from the TÜBİTAK 2247-A National Outstanding Researchers Program. He also serves as an Associate Editor for the IEEE Transactions on Image Processing.

 

Friday, December 20, 2024 - 11:00