Why does our dataset have unsafe files?

We have uploaded a set of .gz files.

However, on the main page of our dataset (liwu/MNBVC · Datasets at Hugging Face), the following warning is shown:

This dataset has 267 files that have been marked as unsafe.

For each file, the following error can be seen:

Virus: Can’t write to file ERROR

I am wondering why. Why is our dataset marked as unsafe?

These files are just compressed text files, specifically compressed .jsonl files in UTF-8. They are absolutely SAFE.
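For anyone who wants to confirm that claim locally, here is a minimal sketch using only the Python standard library. The function name `count_jsonl_records` is hypothetical (it is not part of any Hugging Face tooling), and the check only shows that a file is valid gzip containing UTF-8 JSON lines; it says nothing about what an antivirus engine will match on.

```python
import gzip
import json


def count_jsonl_records(path):
    """Return the number of JSON records in a gzip-compressed UTF-8 .jsonl file.

    Raises an exception if the file is not valid gzip, is not valid UTF-8,
    or contains a non-empty line that is not valid JSON.
    """
    count = 0
    # "rt" mode decompresses and decodes as text in one step.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines between records
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"{path}: line {line_number} is not valid JSON"
                ) from exc
            count += 1
    return count
```

Calling `count_jsonl_records("some_file.jsonl.gz")` (the path is a placeholder) on each uploaded file would surface any archive that is truncated or not actually JSONL.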

cc @mcpotato

Any update?

Ah sorry about that, an error must have happened during scanning. I’ll check to see what happened and try to re-scan.

The re-scan was broken, but it seems to have gone through this time. Only 27 files are marked as unsafe now, with “valid” flags.

These files are just text in JSON format. How can text be “unsafe”? :sweat_smile:

It depends on the matching rules of the antivirus. Some text files can contain code or strings that match the patterns it uses to detect viruses.
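To illustrate the idea, here is a toy sketch of signature matching. Everything in it is an assumption made for illustration: the signature name, the marker string, and the premise that the scanner inspects decompressed content. Real antivirus engines use far larger and more sophisticated rule sets.

```python
import gzip

# Hypothetical signature table: name -> byte pattern to look for.
SIGNATURES = {
    "Toy.Example.Signature": b"HYPOTHETICAL-MALWARE-MARKER",
}


def scan_bytes(data):
    """Return the names of all toy signatures whose pattern occurs in data."""
    return [name for name, pattern in SIGNATURES.items() if pattern in data]


def scan_gz_bytes(compressed):
    """Decompress gzip data in memory, then run the toy signature scan on it."""
    return scan_bytes(gzip.decompress(compressed))


# A harmless JSONL record that merely quotes the marker still gets flagged.
record = b'{"text": "a forum post quoting HYPOTHETICAL-MALWARE-MARKER"}\n'
flagged = scan_gz_bytes(gzip.compress(record))
```

The matcher has no notion of whether the bytes are executable, so a large text corpus that quotes source code or malware samples can plausibly trigger flags like this.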