Web control: huge number of false positives in Greek links


carmik


Hello,

We are using ESET Endpoint Security 8.x to control our users' access to the net. Specifically, one of the major categories we block is news. We are running into a major problem here, though: the number of false positives in this category is so large (we are talking about Greek sites) that we are considering allowing this category, just to mitigate the more serious issue of users having no access to non-news sites that are flagged as news.

Is this a problem with international links in general? I am trying to figure out whether ESET is aware of the issue or not. As it is, ESET's filtering of news sites (and possibly other categories as well) is not only useless but damaging to our daily operation, so we will have to consider an alternative product for URL filtering...

Just for the record, we've been running a Squid proxy with squidGuard, using the Shalla public domain URL blocklist, for more than 10 years and we definitely did not run into the same issues.

I feel extremely disappointed with the web filtering performance of this product...



Sure, I've just communicated to ESET Greece the following links:

https://boro.gr/31192/sxizofreneia-ta-prwta-symptwmata-kai-oi-kindynoi/

https://www.iatronet.gr/ygeia/psyxiki-ygeia/article/27098/sxizofreneia-poia-symptwmata-prepei-na-mas-anisyxoyn.html

The latter site contains medical information, the former get-well/fitness related information. Both links were categorized as "news". ESET Greece changed these to "health"/"magazine".

An example logged just some minutes ago: https://www.daniilidisbio.gr/

This is a site about farming: plants and varieties. It was logged under "food and restaurants".

The case opened with the local ESET distributor is 64499, hope that helps.


  • Administrators

First of all, websites are put into appropriate categories by automatic mechanisms that employ AI. It is beyond human capabilities to open every website that exists, read the content and categorize it manually.

Secondly, categorization is subjective; a website that one person would put in category X, another would put in category Y, etc.

Thirdly, there is a limited number of website categories. Therefore some websites may fall into the other/miscellaneous group or into a broader group, such as News, which may contain news about political affairs, health, technology, etc.

Likewise, Food and restaurants contains websites related to food, which vegetables and fruits are.

As for boro.gr:

This is indeed correctly evaluated by automatic mechanisms as Magazine/News: image.png

The fact that the news concerns health does not make the categorization incorrect. It's just a broader category for health news. And no, the category was not changed to health by the Greek ESET partner. All they can do is report miscategorization to the website categorization provider. In this case, it's not miscategorization but rather a more specific category "health" that the website might belong to. I assume that the broader category is reported if a website falls into multiple categories.

As for www.daniilidisbio.gr, it also looks like a correct categorization - food & restaurants. It definitely concerns food, so it is not a miscategorization. There is no category such as agriculture. The Food & drink category also encompasses categories like International Cuisines, Desserts & Baking, Dining Out, Food Allergies, Health Cooking, Vegan, Vegetarian.

image.png

 

 


Thank you for tackling this faster than a bullet.

I was under the impression that this sort of categorization is community-driven in a way. That is, users might upload specific sites to specific categories, or tag them in multiple categories. Which can be subjective.

Now, as for the test cases discussed, I might agree with you about boro.gr. But I do disagree about daniilidisbio.gr. This is a site that offers plants for biological farming. Food refers to something you can eat, and this is not what this site is about. The farm owner basically ridiculed us today because his site was not accessible to our government agency. The fact that the site contains a small number of references to fruits does not imply that this is about selling food.

You did not mention the iatronet.gr site, which contains medical related information. It does contain some news, but it is not the site's core function. Therefore if your AI thinks it is, then perhaps you should try to do something here.

I have three "solutions":

  1. either block only pornographic sites, downloads/warez and basically red-flagging sites, avoiding blocking categories that will effectively carpet bomb sites not belonging there or
  2. do a whitelisting of everything ending in *.gr (allowing bad material from *.gr to creep in), followed by blocking whatever we are blocking right now...
  3. get rid of ESET web control which does not work as it should and tender for something that can accomplish the task better (see my notes below)

I cannot really offer much technical help on how you should go about reducing the false positives. You might believe that your product works well and does its job properly. My own experience with how other products classify sites for our users (FortiGate comes to mind, as well as the community-driven Shalla squidGuard URL lists) makes me feel that ESET web control is unusable for broad business application here.

Will most likely go with (2) above for the time being, that is if ESET allows a top level domain (.gr) to be whitelisted.
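For illustration, option (2) comes down to rule ordering: the allow rule for the top-level domain is checked first, and the category block only applies to whatever is left. The snippet below is a minimal sketch of that ordering, not ESET's actual rule engine; the names `ALLOWED_SUFFIXES`, `BLOCKED_CATEGORIES` and `is_allowed` are hypothetical, and the category is assumed to be supplied by whatever categorization service is in use.

```python
from urllib.parse import urlsplit

# Hypothetical policy values, for illustration only.
ALLOWED_SUFFIXES = (".gr",)
BLOCKED_CATEGORIES = {"news", "food and restaurants"}


def is_allowed(url: str, category: str) -> bool:
    """Return True if the request should pass the filter."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    # Rule 1: the allow-list wins, whatever category was assigned.
    if host.endswith(ALLOWED_SUFFIXES):
        return True
    # Rule 2: everything else falls back to the category block.
    return category.lower() not in BLOCKED_CATEGORIES
```

With this ordering a miscategorized *.gr site stays reachable, at the cost already noted in (2): anything genuinely unwanted under *.gr passes as well.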


  • Administrators

You can suggest a website categorization provider that does categorization better than the current one and we could investigate if it's really better than the current provider.


I'm sorry, I can't really suggest something here. I've not looked into dedicated/paid solutions in the past 3-4 years (my experience with FortiGuard is from before that time, using a small UTM for the purpose).

Shalla is available at hxxp://www.shallalist.de/faq.html, although obviously one should check the terms and conditions. It has quite fine-grained categorization and includes not only domains but URL regexps as well. It does the job much, much better than the current ESET filtering.

Like I said, I'm certain that this job can be performed better. It's not only me who is interested in ESET being a better product; it's mainly ESET itself!
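To make the "domains plus URL regexps" point concrete, here is a minimal sketch of how a list with both kinds of entries can be matched. It shows the general idea only, not squidGuard's actual matching logic or the Shalla file format; the entries and the `is_blocked` helper are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Hypothetical entries; a real list ships its own files per category.
BLOCKED_DOMAINS = {"news.example"}
BLOCKED_URL_PATTERNS = [re.compile(r"example\.org/news/", re.IGNORECASE)]


def is_blocked(url: str) -> bool:
    """Return True if the URL matches a domain entry or a URL pattern."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Domain entries match the host itself and any of its subdomains.
    labels = host.split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in BLOCKED_DOMAINS:
            return True
    # Regex entries are searched against host + path.
    target = host + parts.path
    return any(p.search(target) for p in BLOCKED_URL_PATTERNS)
```

The regex entries are what allow one section of a site to be blocked (here, a hypothetical /news/ path) without blocking the whole domain.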


This topic is now closed to further replies.