Internships
by Mathieu Cyr & Guillaume Freyermuth
11/07/2024
DiverSE Coffee
Rennes, France
Abstract
Detecting Anomalous Behaviors in Cyber Systems using Generative AI and Large Language Models, Mathieu Cyr
The primary objective of my project is to develop a system that detects and characterizes abnormal behaviors in cyber systems by analyzing execution traces. These traces, which include system calls, memory usage, and network packets, are treated as semi-structured data. LLMs are proficient at processing and synthesizing complex textual data, which makes them promising candidates for identifying errors, bugs, malware, and potential cyber-attacks in such traces. Carried out in collaboration with the National Information Systems Security Agency (ANSSI), this research involves conducting a thorough literature review, building a test environment with realistic cyber scenarios, and experimenting with LLMs to obtain preliminary results. The ultimate goal is to design an automated supervisory program that provides cybersecurity experts with actionable insights and helps strengthen defensive strategies against cyber threats.
Information Extraction from Scientific Publications using LLM, Guillaume Freyermuth
Open science involves practices that aim to provide transparency and supporting materials in order to strengthen confidence in the results. Open science practices can take many forms, for example:
Making statements: disclosing conflicts of interest, sources of funding, …
Providing materials: the data used for the study, a statistical analysis plan, …
Following a methodology or guidelines that guarantee reproducibility, consistency, …
The way a study announces that it follows some of these practices is not standardised, which makes this information difficult to retrieve with standard parsing methods. In addition, the same statement can have different meanings depending on the context of the study. Hence the idea of using an LLM to retrieve such information from the text.
The goal of my internship is to evaluate how well an LLM can retrieve such information from a text, which techniques can be used in combination with the LLM, and how they affect performance.