Following the recent incident related to Jan version 0.4.4 triggering Bitdefender on Windows with Gen:Variant.Tedy.258323 on January 10, 2024, we wanted to provide a comprehensive postmortem and outline the necessary follow-up actions.
Incident Overview
Bug Description
Jan 0.4.4 installation on Windows triggered Bitdefender to flag it as infected with Gen:Variant.Tedy.258323, leading to automatic quarantine.
Affected Antivirus
- McAfee / Microsoft Defender was unaffected
- Bitdefender consistently flagged the issue.
Incident Timeline
- 10 Jan, 2:18 am SGT: Hawke flags up Malware antivirus errors for 0.4.4 installation on Windows computers.
- 10 Jan, 2:21 am SGT: @0xSage responds in Discord.
- 10 Jan, 2:35 am SGT: Hawke confirms multiple people have experienced this error on fresh installs.
- 10 Jan, 2:41 am SGT: @louis-jan and @dan-jan revert 0.4.4 out of an abundance of caution.
- Incident ongoing: To triage and investigate the next day.
- 10 Jan, 11:36 am SGT: @Hien has investigated all versions of Nitro and conducted scans using Bitdefender. Only the 2 latest versions raised warnings (0.2.7, 0.2.8).
- 10 Jan, 12:44 pm SGT: @Hien tested again for the 0.2.6 and suggested using 0.2.6 for now, the 2 remaining Nitro version (0.2.7, 0.2.8) will under further investigation.
- The team started testing on the fixed build.
- 10 Jan, 3:22 pm SGT: Diagnosis found that it's most likely a false positive. @Hien has only found a solution by attempting to build Nitro Windows CPU on a GitHub-hosted runner and hasn't identified the root cause yet.
- 10 Jan, 5:24 pm SGT: @Hien testing two scenarios and still trying to understand the workings of Bitdefender.
- 11 Jan, 5:46 pm SGT: Postmortem meeting
Investigation Update
- @Hien has investigated all versions of Nitro and conducted scans using Bitdefender. and only the 2 latest versions raised warnings from Bitdefender. Nitro 0.2.6, which is the highest version without the issue, was tested again, and it no longer triggers a warning from Bitdefender.
- We have observed that Nitro versions up to 0.2.6 remain unaffected. However, Bitdefender flags versions 0.2.7 and 0.2.8 as infected, leading to the deletion. In order to proceed with the current release, Hien suggests downgrading Nitro to version 0.2.6 and conducting tests with this version. Simultaneously, he will investigate why Bitdefender is flagging versions 0.2.7 and 0.2.8.
- It's essential to note that between versions 0.2.6, 0.2.7, and 0.2.8, only minor changes were made, which should not trigger a malicious code warning. We can refer to the changelog between 0.2.7 and 0.2.8 to pinpoint these changes.
- Our primary message is to convey that we did not introduce malicious code into Jan (indicating a false positive), and the investigation aims to understand the root cause behind Bitdefender flagging versions 0.2.7 and 0.2.8.
- The current diagnosis looks like a false positive but it's still under investigation. Reference link: here, here, and here.
- @Hien testing two scenarios and still trying to understand the workings of Bitdefender. Still under investigation: is the issue with the code or the CI?
- In Case 1, using the same CI agent for tags 0.2.6 and 0.2.8, after PRs by Alan and myself, Bitdefender flagged the Nitro CPU binary build. Naturally, one would conclude this is due to the code.
- However, I proceeded with a further experiment: for the 0.2.8 code, instead of using our CI agent, I used a GitHub hosted agent. This time, Bitdefender did not flag our binary build.
- We've identified the Bitdefender warning was not an attack. There is no malicious code
- We've isolated the event to originate from a CI agent, which resulted in a BitDefender false positive alert.
Follow-ups and Action Items
-
Reproduce Bitdefender Flag in Controlled Environment [Done]:
- Objective: To replicate the issue in a controlled environment to understand the triggers and specifics of Bitdefender's detection.
-
Investigate Malicious Code or False Positive:
- Objective: Determine whether the flagged issue is a result of actual malicious code or a false positive. If it's a false positive, work towards resolution while communicating with Bitdefender.
-
Supply Chain Attack Assessment:
- Objective: Evaluate the possibility of a supply chain attack. Investigate whether the Nitro 0.4.4 distribution was compromised or tampered with during the release process.
-
Testing after the Hotfix:
- Objective: In addition to verifying the issue after the fix, it is essential to conduct comprehensive testing across related areas, ensuring compatibility across different operating systems and antivirus software (latest version / free version only).
-
Process Improvement for Future Releases:
- Objective: Identify and implement improvements to our release process to prevent similar incidents in the future. This may include enhanced testing procedures, code analysis, and collaboration with antivirus software providers during the pre-release phase. Additionally, we should add verifying the latest antivirus software in the release checklist.
-
Documentation of Tested Antivirus Versions:
- Objective: Create a document that outlines the testing conducted, including a matrix that correlates Jan versions with the tested antivirus versions.
- Sample list: for consideration purpose
- Bitdefender
- McAfee
- Avira
- Kaspersky
- Norton
- Microsoft defender
- AVG
- TotalAV
Next Steps
- The team should follow up on each action item with clear ownership priority, and deadlines.
- Communicate progress transparently with the community and clients through appropriate channels. If any insights or suggestions, share them within the dedicated channels.
- Update internal documentation and procedures based on the lessons learned from this incident.
Lessons Learned
-
Antivirus Compatibility Awareness:
- Observation: The incident underscored the significance of recognizing and testing for antivirus compatibility, particularly with widely-used solutions like Bitdefender.
- Lesson Learned: In the future, we will integrate comprehensive checks for compatibility with various antivirus software, including both antivirus and "Malicious Code Detection," into our CI or QA checklist. This proactive measure aims to minimize false positive detections during the release and testing processes.
-
Cross-Platform Testing:
- Observation: The problem did not occur on MacOS and Linux systems, implying a potential oversight in cross-platform testing during our release procedures.
- Lesson Learned: Clarification — This observation is not directly related to antivirus testing. Instead, it underscores the necessity to improve our testing protocols, encompassing multiple operating systems. This ensures a thorough evaluation of potential issues on diverse platforms, considering the various antivirus software and differences in architectures on Mac and Linux systems.
-
User Communication and Documentation:
- Observation: Due to the timely response from Nicole, who was still active on Discord and Github at 2 am, this quick response facilitated our ability to assess the impact accurately.
- Lesson Learned: While our communication with users was effective in this instance, it was mainly due to Nicole's presence during the incident. To improve our overall response capability, we should prioritize "24/7 rapid triage and response." This involves ensuring continuous availability or establishing a reliable rotation of team members for swift user communication and issue documentation, further enhancing our incident response efficiency.
-
Proactive Incident Response:
- Observation: The incident response, while involving a prompt version rollback, experienced a slight delay due to the release occurring at midnight. This delay postponed the initiation of the investigation until the next working hours.
- Lesson Learned: Recognizing the importance of swift incident response, particularly in time-sensitive situations, we acknowledge that releasing updates during off-hours can impact the immediacy of our actions. Moving forward, we will strive to optimize our release schedules to minimize delays and ensure that investigations can commence promptly regardless of the time of day. This may involve considering alternative release windows or implementing automated responses to critical incidents, ensuring a more proactive and timely resolution.
-
Supply Chain Security Measures:
- Observation: While the incident prompted consideration of a potential supply chain attack, it's crucial to emphasize that this was not the case. Nonetheless, the incident underscored the importance of reviewing our supply chain security measures.
- Lesson Learned: Going forward, we should strengthen supply chain security by introducing additional verification steps to uphold the integrity of our release process. Collaborating with distribution channels is essential for enhancing security checks and ensuring a robust supply chain.
- Longer-term: Exploring options for checking Jan for malicious code and incorporating antivirus as part of our CI/CD pipeline should be considered for a more comprehensive and proactive approach.
-
User Education on False Positives:
- Observation: Users reported Bitdefender automatically "disinfecting" the flagged Nitro version without allowing any user actions.
- Lesson Learned: Educate users about the possibility of false positives and guide them on how to whitelist or report such incidents to their antivirus provider (if possible). Provide clear communication on steps users can take in such situations.
These lessons learned will serve as a foundation for refining our processes and ensuring a more resilient release and incident response framework in the future. Continuous improvement is key to maintaining the reliability and security of our software.
Thank you for your dedication and cooperation in resolving this matter promptly.