Cybersecurity CVE Trends
Feedly for Cybersecurity includes a special security skill which allows Leo to understand which software vulnerabilities are trending. The logic here is that vulnerabilities that are trending have a higher level of awareness and are more likely to be exploited (and should be patched more quickly).
We offer this trending metadata as a CSV file that includes metadata for all the CVEs published in the last 18 months. This document describes the fields included in the CSV file for each CVE.
- date: The Unix timestamp of the first article we crawled in Feedly that mentioned the CVE
- n_articles: The number of articles mentioning the CVE. Each article is weighted by a factor of 1/sqrt(number of CVEs mentioned in the article)
- baseline_value: The mean + 3·std of the n_articles values, measured at the same age (number of hours since first mention) as the CVE under consideration, across the CVEs of the main vendor (see below for how we choose the main vendor)
- baseline_confidence: This boolean is True if we estimate that the baseline is computed from enough CVEs to be accurate
- last_update: The Unix timestamp of the last update on that row (if the CVE is less than 1 week old, then it should increase every hour).
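The two computed fields above can be sketched in a few lines of Python. This is a minimal illustration, not Feedly's implementation: the function names are hypothetical, and the choice of population standard deviation for the "std" in mean + 3·std is an assumption the document does not specify.

```python
import math
import statistics

def weighted_article_count(cve_counts_per_article):
    # n_articles rule: each article contributes 1/sqrt(k), where k is
    # the number of CVEs that article mentions.
    return sum(1 / math.sqrt(k) for k in cve_counts_per_article)

def baseline_value(peer_n_articles, n_std=3):
    # baseline_value rule: mean + 3*std of the n_articles values of the
    # main vendor's CVEs, sampled at the same age as the CVE considered.
    # Population std (pstdev) is an assumption here.
    return statistics.mean(peer_n_articles) + n_std * statistics.pstdev(peer_n_articles)

# Three articles mentioning 1, 4, and 9 CVEs respectively:
print(weighted_article_count([1, 4, 9]))  # 1 + 0.5 + 1/3
print(baseline_value([2.0, 4.0, 6.0]))
```

A dedicated article about a single CVE counts fully (weight 1), while a round-up mentioning many CVEs contributes only a fraction per CVE, which keeps bulk coverage from inflating n_articles.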
The file is updated every hour, around HH:55 (the operation takes about 3 minutes). For every CVE whose first article was published less than a week ago:
- The n_articles column is updated with the new articles mentioning the CVE.
- The baseline_value is recomputed for the new hour (and we check whether the CVE's list of vendors has been updated).
- The baseline_confidence is updated for the vendors that qualify.
- The last_update timestamp is set to the current time.
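The one-week refresh window above can be sketched as follows. This is a hypothetical sketch assuming each CSV row is a dict keyed by the field names described earlier; the helper names are illustrative, not part of Feedly's code.

```python
WEEK_SECONDS = 7 * 24 * 3600

def rows_to_refresh(rows, now):
    # A CVE row keeps being refreshed hourly while its first article
    # (the `date` field, a Unix timestamp) is less than one week old.
    return [r for r in rows if now - r["date"] < WEEK_SECONDS]

def stamp_last_update(row, now):
    # Final step of the hourly pass: record when the row was touched.
    row["last_update"] = now
    return row

# A row first seen just over a week ago is no longer refreshed:
recent = rows_to_refresh([{"date": 0}, {"date": 100}], WEEK_SECONDS + 50)
print(recent)  # only the row with date=100 remains
```

This also explains the note on last_update above: for rows inside the one-week window, the timestamp advances every hour; older rows keep their final value.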
The main vendor of a CVE is chosen by comparing, for each vendor associated with the CVE:
- The bucket the vendor's total number of CVEs falls into (bucket limits are: 0, 10, 20, 50, 100, 200, 500, 1000 CVEs). (eg, a vendor with 361 CVEs will be in the 6th bucket, and a vendor with only 12 CVEs will be in the 2nd)
- The baseline value at H168 (one week)
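The bucketing rule maps directly onto a sorted-insertion lookup. The sketch below is an assumption about the exact boundary handling (a count equal to a limit goes into the higher bucket, which is consistent with the examples of 12 and 361 CVEs above); only the bucket limits come from the document.

```python
import bisect

# Bucket limits from the document.
BUCKET_LIMITS = [0, 10, 20, 50, 100, 200, 500, 1000]

def vendor_bucket(n_cves):
    # 1-based bucket index: bucket 2 covers [10, 20), bucket 6 covers
    # [200, 500), etc. Boundary handling is an assumption.
    return bisect.bisect_right(BUCKET_LIMITS, n_cves)

print(vendor_bucket(12))   # 2
print(vendor_bucket(361))  # 6
```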