Mathematics (Faculty of)
http://hdl.handle.net/10012/9924
2024-03-28T16:50:17Z
Cardinality Estimation in Streaming Graph Data Management Systems
http://hdl.handle.net/10012/20366
Akillioglu, Kerem
Graph processing has become an increasingly popular paradigm for data management
systems. At the same time, there is strong demand for specialized stream processing
systems that can handle the continual flow of data and the inherent dynamism of
streaming data. Yet existing technologies lack a standardized, general-purpose
query framework designed specifically for streaming graphs.
This gap underscores the need for a more comprehensive solution for processing
and analyzing streaming graph data efficiently in real time. Such a solution
depends critically on improving the query processing pipeline, especially cardinality
estimation and query optimization, both of which are key to good
system performance.
This thesis proposes GraphSketch, a novel cardinality estimation technique
tailored for streaming graph database management systems (GDBMS).
GraphSketch is a sketch-based framework designed to concisely summarize streaming
graphs, enabling both accurate and efficient cardinality estimations. The thesis delves
into the theoretical foundations of GraphSketch, outlining its conceptual design and the
specific methodologies employed in its construction. Additionally, the thesis elaborates
on the suitability of GraphSketch for streaming systems, highlighting its capability for
incremental updates, which are pivotal in maintaining efficiency in the rapidly evolving
environment of streaming data.
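To make the idea of an incrementally updatable summary concrete, here is a minimal sketch of a generic Count-Min sketch that tracks how many edges carry each label in a stream. It is not GraphSketch itself, whose design the abstract does not describe; the class name, the `width` and `depth` parameters, and the example edge labels are all assumptions chosen for illustration.

```python
import hashlib


class CountMinSketch:
    """Generic Count-Min sketch (illustrative only, not GraphSketch)."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        # One row of counters per hash function.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Derive a per-row hash by prefixing the key with the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def update(self, key, count=1):
        # Incremental update: O(depth) work per arriving or expiring edge.
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        # The minimum over rows never underestimates the true count,
        # provided no key's true count ever goes negative.
        return min(self.table[row][self._index(row, key)]
                   for row in range(self.depth))


# Hypothetical stream of (source, label, target) edges.
edges = [
    ("a", "follows", "b"),
    ("b", "follows", "c"),
    ("a", "likes", "c"),
    ("c", "follows", "a"),
]

sketch = CountMinSketch()
for _, label, _ in edges:
    sketch.update(label)

follows_before = sketch.estimate("follows")

# An edge leaving the window is handled with a negative update.
sketch.update("follows", -1)
follows_after = sketch.estimate("follows")
```

Each arriving or expiring edge touches only `depth` counters, which is the property that makes this family of summaries attractive for streaming workloads.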
2024-02-23T00:00:00Z
MS/MS Spectrum Prediction for MHC-Associated Peptides with a Fine-Tuned Model
http://hdl.handle.net/10012/20364
Li, Zhenbo
To improve the quality of spectral library search, several MS/MS spectrum predictors have been developed in recent decades. Following their success in various fields, deep learning techniques have been adopted by MS/MS spectrum predictors to increase the accuracy of predicted spectra. However, training a deep learning model requires a training set of both sufficient quality and sufficient quantity. Because MHC-associated peptides are underrepresented in most spectral libraries, current MS/MS spectrum predictors produce less accurate spectra for MHC-associated peptides than for other peptides.
In this thesis, we built several MHC-associated peptide spectral libraries for training and evaluation purposes. We selected PredFull as our base model and performed transfer learning with these MHC-associated peptide libraries, which are much smaller than common tryptic spectral libraries. The results showed that the fine-tuned model significantly outperformed the original model when predicting spectra for MHC-associated peptides.
2024-02-23T00:00:00Z
Analyzing Threats of Large-Scale Machine Learning Systems
http://hdl.handle.net/10012/20355
Lukas, Nils
Large-scale machine learning systems such as ChatGPT are rapidly transforming how we interact with and trust digital media. However, the emergence of such a powerful technology poses a dual-use dilemma. While it can have many positive societal impacts, such as providing equitable access to information, ML systems can also be misused by untrustworthy entities to cause intentional harm. For example, a system could unintentionally disclose private information about its training data and jeopardize the privacy of individuals in the training data. The system's generated content could also be misused for unethical purposes, such as eroding trust in digital media by misrepresenting generated content as authentic. Providing untrustworthy users with these new capabilities could amplify the potential negative consequences of this technology, such as a proliferation of deepfakes or disinformation. I analyze these threats from two perspectives: (i) data leakage, when the model cannot be trusted because it has memorized private information during training, and (ii) misuse, when users cannot be trusted to use the system for its intended purposes. This thesis presents five projects that assess these risks to the privacy and security of ML systems and evaluate the reliability of known countermeasures. To do so, I assess the privacy risks of extracting Personally Identifiable Information from language models trained with differential privacy. As a method of controlling unintended use, I study the effectiveness and robustness of fingerprinting and watermarking methods to detect the provenance of models and their generated content.
2024-02-22T00:00:00Z
Safety-Critical Control for Dynamical Systems under Uncertainties
http://hdl.handle.net/10012/20345
Wang, Chuanzheng
Control barrier functions (CBFs) and higher-order control barrier functions (HOCBFs) have shown great success in addressing control problems with safety guarantees. These methods usually find the next safe control input by solving an online quadratic programming problem. However, model uncertainty is a major challenge in synthesizing controllers: it may lead to unsafe control actions with severe consequences. In this thesis, we discuss safety-critical control problems for systems with different levels of uncertainty. We first study systems modeled by stochastic differential equations (SDEs) driven by Brownian motion. We propose a notion of stochastic control barrier functions (SCBFs) and show that SCBFs can significantly reduce the control effort, especially in the presence of noise, and can provide a reasonable worst-case safety probability. Based on this less conservative probabilistic estimation for the proposed notion of SCBFs, we further extend the results to handle higher relative degree safety constraints using higher-order SCBFs. We demonstrate, through both theoretical analysis and numerical simulations, that the proposed SCBFs achieve a good trade-off between performance and control effort.
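As a rough illustration of the quadratic-program safety filter mentioned above, consider a deterministic single integrator with state x, dynamics dx/dt = u, and safe set h(x) = x_max - x >= 0. The CBF condition dh/dt >= -alpha * h reduces to u <= alpha * (x_max - x), and with a single scalar constraint the QP that minimizes (u - u_nominal)^2 has a closed-form solution. This toy system, the function names, and the values of `x_max`, `alpha`, and `dt` are assumptions made for the sketch and are not taken from the thesis, which treats stochastic and higher-order cases.

```python
def cbf_safe_input(x, u_nominal, x_max=1.0, alpha=2.0):
    """Closed-form solution of the scalar CBF-QP for dx/dt = u.

    Minimizes (u - u_nominal)^2 subject to u <= alpha * (x_max - x).
    """
    h = x_max - x            # barrier value; the state is safe when h >= 0
    u_bound = alpha * h      # largest input allowed by the CBF condition
    return min(u_nominal, u_bound)


def simulate(x0=0.0, u_nominal=1.0, dt=0.01, steps=1000):
    """Forward-Euler rollout with the safety filter applied at every step."""
    x = x0
    for _ in range(steps):
        x += dt * cbf_safe_input(x, u_nominal)
    return x


# The nominal input pushes the state toward the boundary at x_max = 1.0;
# the filter slows it down so the boundary is approached but not crossed.
x_final = simulate()
```

For systems with several inputs or constraints the closed form no longer applies, and a numerical QP solver would be called at each step instead.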
Next, we discuss deterministic systems with imperfect information. We focus on higher relative degree safety constraints and use HOCBFs to develop a learning framework that deals with such uncertainty. The proposed method learns the derivatives of an HOCBF, and we show that, for each order, the derivative of the HOCBF can be separated into the nominal derivative of the HOCBF and some remainders. This implies that we can use a neural network to learn the remainders and thereby approximate the real residual dynamics of the HOCBF. Next, we study stochastic systems with unknown diffusion terms. We propose a data-driven method to handle the case where we cannot calculate the generator of the stochastic barrier functions. Using the universal approximation theorem, we provide guarantees that the data-driven method can approximate the Itô derivative of the stochastic control barrier function (SCBF) under partially unknown dynamics.
Finally, we study completely unknown stochastic systems. We extend our setting to the case where neither the drift nor the diffusion term of the SDE is known. We employ Bayesian inference as a data-driven approach to approximate the system. More specifically, we use Bayesian linear regression together with the central limit theorem to estimate the drift term, and we employ Bayesian inference to approximate the diffusion term. We validate our theoretical results with numerical examples in each chapter.
2024-02-15T00:00:00Z