Among the research topics of the Security Group, here we focus on the following activities:
See also our sections on Security Economics and Malware Analysis.
Most importantly, do you want data? We know that building datasets is difficult, error-prone, and time-consuming, so we have decided to share our efforts of the past 4 years. Check our Security Datasets in Trento.
In our new paper in the Empirical Software Engineering Journal we propose an automated method to determine the code evidence for the presence of vulnerabilities in retro software versions.
Why should one bother? After all, the code is old code. This is only true if you are thinking about your free browser, which is shipped to you in exchange for your personal data and is updated periodically, possibly breaking your extensions. In all other cases, when software is proprietary, it actually has a very long shelf life.
The method scans the code base of each retro version of the software for the code evidence to determine whether that version is vulnerable or not. It identifies the lines of code that were changed to fix vulnerabilities: if an earlier version still contains these deleted lines, it is highly likely that this version is vulnerable. To show the scalability of the method we performed a large-scale experiment on Chrome and Firefox (spanning 7,236 vulnerable files and approximately 9,800 vulnerabilities) reported in the National Vulnerability Database (NVD). The elimination of spurious vulnerability claims (e.g. entries in a vulnerability database such as NVD) found by our method may change the conclusions of studies on the prevalence of foundational vulnerabilities.
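As an illustration of the idea, here is a minimal sketch of the screening step, assuming the lines deleted by the fix have already been extracted from the fixing commit; the function names and the whitespace normalisation are our own simplifications, not the exact implementation from the paper:

<code python>
# Sketch: flag a retro version as likely vulnerable if it still contains
# the lines that the security fix deleted. Illustrative only.

def normalize(line):
    """Collapse whitespace so cosmetic edits do not hide the evidence."""
    return " ".join(line.split())

def is_likely_vulnerable(retro_source, deleted_fix_lines):
    """True if the retro version still contains the code the fix removed."""
    retro_lines = {normalize(l) for l in retro_source.splitlines()}
    evidence = {normalize(l) for l in deleted_fix_lines if l.strip()}
    if not evidence:
        return False  # no usable code evidence for this fix
    return evidence <= retro_lines  # all deleted lines are still present
</code>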
Vulnerability exploitation is, reportedly, a major threat to system and software security. Assessing the risk represented by a vulnerability has therefore been at the center of a long debate. Eventually, the security community widely adopted the Common Vulnerability Scoring System (CVSS for short) as the reference methodology for vulnerability risk assessment. CVSS is used in reference vulnerability databases such as CERT and NIST's NVD, and is referenced as the de-facto standard methodology by national and international standards and best practices for system security (e.g. the U.S. Government's SCAP protocol).
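For readers unfamiliar with the metric, the following snippet implements the standard CVSS v2 base-score equations as specified by FIRST; this is background material, not part of our methodology:

<code python>
# Standard CVSS v2 base-score equations (FIRST specification).
AV  = {"L": 0.395, "A": 0.646, "N": 1.0}    # Access Vector
AC  = {"H": 0.35,  "M": 0.61,  "L": 0.71}   # Access Complexity
AU  = {"M": 0.45,  "S": 0.56,  "N": 0.704}  # Authentication
CIA = {"N": 0.0,   "P": 0.275, "C": 0.66}   # Conf./Integ./Avail. impact

def cvss2_base(av, ac, au, c, i, a):
    impact = 10.41 * (1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a]))
    exploitability = 20 * AV[av] * AC[ac] * AU[au]
    f = 0.0 if impact == 0 else 1.176
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * f, 1)

# AV:N/AC:L/Au:N/C:C/I:C/A:C scores the maximum 10.0
print(cvss2_base("N", "L", "N", "C", "C", "C"))
</code>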
Today's baseline is that if you have a vulnerability and its CVSS score is high, you are in trouble and must fix it. But this may not be so realistic…
We are trying to assess to what degree this “baseline” is reasonable to follow: after all, any CIO of a company big enough to care about security will tell you “if this were a perfect world, maybe: but you are crazy if you think I'll fix every vulnerability out there, high CVSS or not”. Surely, CIOs and CEOs care about business continuity on top of business security, and to this extent applying an update can sometimes be riskier than leaving a vulnerability unpatched.
We are going to present a detailed analysis of how CVSS influences (positively and negatively) your patching policy at the beginning of August at BlackHat USA 2013 in Las Vegas, USA. Want to come? See the PDF of the BlackHat presentation or the talk video on YouTube. You can also check out the full paper in ACM TISSEC (now ACM TOPS).
Our research gravitates around the question “are all (high CVSS score) vulnerabilities really interesting for the attacker?” With this question we give up on providing a general answer that would identify every possible attack vector, and thereby end up identifying nothing in particular (see our CCS BADGERS work); on the contrary, we seek a general law of macro-security that covers the great majority of the risk.
We developed a methodology that allows an organisation to quantify the likelihood of a successful attack against its systems and the risk reduction obtained by fixing specific vulnerabilities.
For example, our methodology enables CIOs and decision makers to make assessments such as “System K has a Z% likelihood of being successfully attacked. If I fix these V vulnerabilities on the system, my risk of being attacked will decrease by X%”.
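As a back-of-the-envelope illustration of this kind of statement (the per-vulnerability likelihoods below are made-up numbers, and the independence assumption is a simplification rather than our actual model):

<code python>
def attack_likelihood(p_exploit):
    """P(at least one vulnerability is exploited), assuming independence."""
    q = 1.0
    for p in p_exploit:
        q *= 1.0 - p
    return 1.0 - q

before = attack_likelihood([0.30, 0.10, 0.05, 0.02])  # all vulns present
after  = attack_likelihood([0.05, 0.02])              # two vulns patched
print("risk before: %.0f%%, after: %.0f%%, reduction: %.0f%%"
      % (100 * before, 100 * after, 100 * (1 - after / before)))
</code>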
Additional information on the methodology can be found in our ACM TISSEC article.
To perform our study we collected five databases of vulnerabilities and exploits.
The picture on the right is a Venn diagram representation of the vulnerabilities and CVSS scores in our database. Colours represent High, Medium, and Low CVSS scores (red, orange, and cyan respectively), and areas are proportional to the volume of vulnerabilities. As one can immediately see, NVD is disproportionately big with respect to any other database. Remember that, according to the SCAP protocol, NVD contains all the vulnerabilities you should fix, while SYM is the dataset of actually exploited vulnerabilities. Adjusting by software type and year of the vulnerability does not change the overall picture: NVD is full of uninteresting (or at least not high-risk) vulnerabilities, despite what the CVSS score says.
EDB (or the equivalent OSVDB) is often used as the reference dataset for “actually exploited” vulnerabilities. Many researchers have observed that a vulnerability should represent a higher risk if a public exploit for it exists. Still, EDB intersects SYM for only ~4% of its surface: most publicly available exploits are not used by attackers! And, most interestingly, more than 75% of SYM is not covered by EDB, which considerably weakens the credibility of the latter as a risk marker for vulnerabilities.
EKITS is the small square at the intersection of SYM and EDB. It features only about 100 vulnerabilities and still, according to Google, it may drive as much as 60% of the overall attacks against end users (see Trends in circumventing web-malware detection (PDF)). EKITS is covered by SYM for 80% of its surface, meaning that if a vulnerability is traded in the black markets it is, most likely, going to be attacked.
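All the overlap figures above reduce to simple set arithmetic over CVE identifiers. A toy version, with placeholder IDs instead of real entries:

<code python>
# Toy stand-ins for the real databases, keyed by CVE identifier.
nvd   = {"CVE-a", "CVE-b", "CVE-c", "CVE-d", "CVE-e"}
edb   = {"CVE-b", "CVE-c"}  # vulnerabilities with a public exploit
sym   = {"CVE-c", "CVE-d"}  # vulnerabilities exploited in the wild
ekits = {"CVE-c"}           # vulnerabilities traded in exploit kits

def coverage(a, b):
    """Fraction of set a that also appears in set b."""
    return len(a & b) / len(a) if a else 0.0

print("EDB covered by SYM:    %.0f%%" % (100 * coverage(edb, sym)))
print("SYM not in EDB:        %.0f%%" % (100 * (1 - coverage(sym, edb))))
print("EKITS covered by SYM:  %.0f%%" % (100 * coverage(ekits, sym)))
print("NVD exploited in wild: %.0f%%" % (100 * coverage(nvd, sym)))
</code>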
Focusing on the CVSS score distributions, a few facts are worth underlining:
Overall, these results show, in our opinion, that there is much room for improvement in vulnerability metrics and risk assessment. Our contribution is rooted in:
For further insights we refer the interested reader to the Malware Analysis and Security Economics sections of this Wiki.
A Vulnerability Discovery Model (VDM) operates on known (observed) vulnerability data to estimate the cumulative number of vulnerabilities found and reported in released software. A VDM is a family of functions with some parameters; for example, the linear model (LN) is LN(t) = At + B, where t is the time and A, B are two parameters. These parameters are estimated by fitting the model to the observed data.
A successful model should not only fit the observed data well, but also be able to predict the future trend of vulnerabilities.
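As a minimal sketch, here is how the LN model can be fitted to observed counts and used to extrapolate the trend; the data points below are illustrative, not drawn from our datasets:

<code python>
import numpy as np
from scipy.optimize import curve_fit

def LN(t, A, B):
    """Linear vulnerability discovery model: LN(t) = A*t + B."""
    return A * t + B

months = np.array([1, 3, 6, 9, 12], dtype=float)       # time since release
observed = np.array([4, 11, 22, 35, 45], dtype=float)  # cumulative vulns

(A, B), _ = curve_fit(LN, months, observed)            # least-squares fit
print("fitted model: LN(t) = %.2f*t + %.2f" % (A, B))
print("predicted count at month 18: %.0f" % LN(18, A, B))  # future trend
</code>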
The figure on the right shows a taxonomy of recent VDMs. VDMs fall into two major categories: time-based models and effort-based models. Most state-of-the-art models belong to the first category; only one model falls into the second. The time-based models further divide into three subcategories:
The above models were supported by one or more pieces of empirical evidence from their proponents, except Anderson's. However, there are some concerns with their experiments:
We propose an experimental methodology to systematically assess the performance of a model based on two quantitative metrics: quality and predictability. The methodology includes the following steps:
Step 1. Acquire the data sets: collect different data sets of vulnerabilities with respect to different definitions of vulnerability and different versions of the software. The assessment of a VDM is then based on these data sets, so it covers different definitions of vulnerability. By doing this, we address the vulnerability-definition and multi-version-software concerns.
Step 2. Fit the VDM to the collected data: estimate the parameters of the VDM so that it fits the collected data as well as possible. The goodness of fit of the fitted model can be evaluated with the chi-square goodness-of-fit test (a sketch follows these steps).
Step 3. Perform goodness-of-fit quality analysis: analyse the quality of the VDM over the software lifetime. This addresses the brittle-goodness-of-fit concern.
Step 4. Perform predictability analysis: analyse the predictability of the VDM.
Step 5. Compare VDMs: compare a VDM with others to determine which one is better.
We apply the proposed methodology to evaluate the state-of-the-art models in several usage scenarios.
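To make Step 2 concrete, here is a sketch of the chi-square goodness-of-fit computation on the toy LN example above; the fitted parameters are hypothetical:

<code python>
import numpy as np
from scipy.stats import chi2

months = np.array([1, 3, 6, 9, 12], dtype=float)
observed = np.array([4, 11, 22, 35, 45], dtype=float)
A, B = 3.7, 0.9                       # hypothetical fitted LN parameters
expected = A * months + B             # counts predicted by the fitted model

stat = np.sum((observed - expected) ** 2 / expected)  # Pearson chi-square
dof = len(observed) - 2               # two estimated parameters (A and B)
p_value = chi2.sf(stat, dof)          # 1 - CDF of the chi-square distribution
print("chi-square = %.2f, p = %.2f" % (stat, p_value))
# Conventionally the fit is accepted when p exceeds a threshold, e.g. 0.05.
</code>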
Our analysis has revealed that the most appropriate model is the simplest one (LN) when software is young (12 months). In other cases, s-shaped models perform better: the AML model is significantly better for middle-aged software (36 months), but there is no statistically significant difference among s-shaped models when software is old (72 months).
The details of our methodology as well as the validation of existing models are published in TSE. A preliminary analysis of VDMs can be found in our ESSoS'12 and ASIACCS'12 papers.
The following is a list of people who have been involved in the project at some point in time.
This activity was supported by a number of projects.