Proposal for new <Article> property: type


#1

I have a proposal for a new HTML element <comment>. I’m possibly explaining the obvious here, but it would be used to wrap the comments on a webpage.
The behavior of the tag would be similar to a <div> element.
Originally, I was thinking it would wrap each individual comment, as in:

<comment>
  I am a comment.
</comment>

But it might work equally well or better to have a <comments> element that wraps the entire comment section.

<comments>
  <div class="comment">
    I am a comment.
  </div>
  <div class="comment">
    Me too.
  </div>
</comments>

The reasoning is similar to this proposal for ownership of content.
However, the main driving reason I am proposing this is for SEO considerations and the reduction of spam. My idea is that once website creators are using <comment> tags for their comment sections, search engine web crawlers can then be made to recognize the comments on a page and not increase page ranking based on links from those comment sections. Since page ranking appears to be the primary motivator for spam comments, perhaps spam bot programmers would avoid comment sections that are wrapped in this new <comment> tag. Even if they didn’t stop posting comments, the website that they are spamming for would not get rewarded for this annoying behavior.
And of course, if a developer wanted to, they could make the <comment> tag wrap only comments by new users or users who have not met some arbitrary criteria. That way, relevant links posted by legitimate users don’t have to be lost to the page-ranking algorithm.

Thanks,
Jon


#2

This would make the semantics less useful, because not all actual comments would be marked up as comments. To mark an untrusted comment, an attribute such as untrusted could be used:

<comment untrusted>
    Here is a comment by a new unverified user.
</comment>

I’m not sure, though, that the whole feature should be represented by an element. A comment is generally an ARTICLE, so it would probably make more sense to mark up the type of content via an attribute:

<article type="comment">
    Here is a comment.
</article>

#3

Sure. Those modifications to the proposal seem reasonable to me. My main concern is that there is a well-defined way to designate certain content as being user-created and thus unreliable for determining page rank for search engines.
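
To make the crawler side of this concrete, here is a minimal Python sketch that uses only the standard library's html.parser. It assumes the hypothetical type="comment" attribute on <article> discussed above (it is not part of HTML today), and the names RankableLinkCollector and rankable_links are made up for illustration. It does not describe how any real search engine works; it only shows that the rule "do not count links inside a marked comment container" is simple to express.

```python
from html.parser import HTMLParser


class RankableLinkCollector(HTMLParser):
    """Collect hrefs a crawler might count, skipping links inside comments.

    Assumption: comments are wrapped in <article type="comment">, the
    hypothetical markup proposed in this thread.
    """

    def __init__(self):
        super().__init__()
        self.links = []
        # One entry per currently open <article>; True marks a comment container.
        self._article_stack = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article":
            self._article_stack.append(attrs.get("type") == "comment")
            return
        href = attrs.get("href")
        if tag == "a" and href:
            rel_tokens = (attrs.get("rel") or "").lower().split()
            inside_comment = any(self._article_stack)
            # Links already marked rel="nofollow" are skipped as well.
            if not inside_comment and "nofollow" not in rel_tokens:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "article" and self._article_stack:
            self._article_stack.pop()


def rankable_links(page):
    """Return the hrefs in `page` that sit outside any comment container."""
    collector = RankableLinkCollector()
    collector.feed(page)
    collector.close()
    return collector.links
```

Given a page whose main <article> links to "/post" and whose nested <article type="comment"> links to a spam site, rankable_links would return only "/post".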


#4

To me, this seems better suited for an aria value or a property, not an element.


#5

“To me, this seems better suited for an aria value or a property, not an element.”

ARIA is only exposed in the accessibility tree, and only accessed by ATs (mostly screen readers), so an ARIA property wouldn’t fulfill the SEO and/or spam reduction use cases Jon described.


#6

What about making a set of custom elements to do what you’d like first? This way you can prototype the idea without getting vendors involved. If they get enough support from web developers then vendors could look at integrating it into browsers directly.

The general idea here seems to be, “Any anchors within this node should have rel=nofollow applied.” Which is very easy to script client-side. Applying it based on criteria is as easy as having an attribute to control whether or not a comment should have the rel modified.

I’m not seeing where this need currently rises to the level where vendors need to add anything new to the platform. Search engines have documented ways of handling this (typically rel attributes), and developers just need to apply them properly. The Extensible Web Manifesto dictates that this be tested by developers using technology that is already available. Then, if it gains wide enough traction with developers, browser vendors can integrate it directly.


#7

I believe the goal of the Web Annotation Data Model was to enable things like this. See especially Motivation and Purpose.


#8

The Web Annotation Data Model seems to be aimed more at creating an interoperable data structure for sharing data along with some context about why it exists. It isn’t about making elements that tell search engines not to index the content. Nor, as far as I can see, would anything in it specifically provide that kind of indication.

It would be cool if a custom element applied the Web Annotation model where possible in that context. But from what I understand of the specification, it isn’t something that, in its own right, carries out the intent described by the requesting developer.


#9

I will go ahead and get working on a custom element prototype.
I think we won’t see the main benefits of attacking comment spam at the motivation level until the implementation is widespread enough for spambot creators to know about it, but I’ll certainly have a go at prototyping the solution if that’s the best way to proceed.
Thanks!


#10

Honestly, you aren’t going to stop spambots with this kind of thing. If their content is even rendered, they win, because that means everyone visiting your site can see their content, regardless of SEO reference juice. That’s why search engines are going to reduce your ranking if you have an excessive amount of spam content in comments. It shows that site authors who allow it don’t really care about maintaining their appearance to visitors, which gives visitors a worse experience.

Proper spam controls are far more about being proactive in maintaining a comment system, and about putting rules in place, such as closing comments on articles after some number of weeks. You can then decide, based on some internal measure, that users with a certain level of trust may ignore that rule and comment on older posts.
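
As a rough illustration of that kind of rule, here is a small Python sketch. The function name comments_open, the is_trusted flag, and the four-week default are all hypothetical; the thread only says "some number of weeks" and "some level of trust".

```python
from datetime import date, timedelta


def comments_open(published, today, is_trusted, close_after_weeks=4):
    """Decide whether a user may still comment on an article.

    Assumptions (not from any real comment system): trusted users bypass
    the cutoff, and everyone else may comment until `close_after_weeks`
    weeks after publication.
    """
    if is_trusted:
        return True
    return today - published <= timedelta(weeks=close_after_weeks)
```

A comment system would call this before accepting a submission, so a rejected comment is never stored or rendered.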

The goal shouldn’t be, “Allow tools to easily ignore possible spam.” It should be, “Let us not show spam to end users ever.” Yes, it is a difficult process, but it’s far more user friendly than letting them see the spam (spammers win here!) because some person or organization doesn’t want to watch the content they are sending to visitors.


#11

I completely disagree with you on that.
I don’t think you can know what the effect on spambot behavior would be if their links stopped giving their sites reference juice. I don’t believe that a significant number of actual people are clicking on links in spam comments, and the text content of spam comments is usually impossible to decipher for any meaning.
Reference: A blog post on spam comments
Furthermore, I’m not suggesting that website authors wrap their comments in <article type="comment"> tags and then ignore and leave the spam that gets added to their site. Of course, website authors would still have the same tools that they already have for getting rid of spam. I’m suggesting that we can attack spam in a different way.
The tools we have and the methods you’re suggesting are all about policing the spam. I’m saying that that is treating the symptom. I want to remove the underlying cause of spam because I believe that will be more effective, whereas all of our CAPTCHAs and spam filter plugins are arms races. I’m not saying they aren’t effective in policing the spam. I’m not saying we should stop using them. I believe that removing the reference juice will remove the primary motivator and thus drastically reduce the amount of spam that we have to police with those tools.


#12

That’s why you apply rel=nofollow as most sites already do, as explained before. This should be done automatically before the content even hits the client. Most comment systems have this option available or on-by-default. Which was also explained in the post you referenced.

I don’t think another element or additional attribute in browsers helps this at all. We already have the tools available.
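
For anyone who wants to see what "apply rel=nofollow before the content hits the client" can look like, here is a minimal server-side sketch in Python using only the standard library's html.parser. The names NofollowRewriter and add_nofollow are hypothetical, and this is not the code of any particular comment system. It assumes the input is a small comment fragment (it ignores doctypes and processing instructions) and that the fragment has already been sanitized by other means.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit an HTML fragment, adding rel="nofollow" to every <a> tag."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    @staticmethod
    def _with_nofollow(attrs):
        attrs = list(attrs)
        for i, (name, value) in enumerate(attrs):
            if name == "rel":
                tokens = (value or "").split()
                if "nofollow" not in [t.lower() for t in tokens]:
                    tokens.append("nofollow")
                attrs[i] = ("rel", " ".join(tokens))
                return attrs
        attrs.append(("rel", "nofollow"))
        return attrs

    def _emit_tag(self, tag, attrs, closing=">"):
        if tag == "a":
            attrs = self._with_nofollow(attrs)
        # The parser hands us decoded attribute values, so escape them again.
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )
        self.parts.append(f"<{tag}{rendered}{closing}")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, closing=" />")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")


def add_nofollow(fragment):
    """Return `fragment` with rel="nofollow" present on every anchor."""
    rewriter = NofollowRewriter()
    rewriter.feed(fragment)
    rewriter.close()
    return "".join(rewriter.parts)
```

Existing rel tokens are kept and nofollow is appended only when it is missing, so running the function twice on the same fragment gives the same result as running it once.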


#13

Where? I don’t see any explanations that “most sites” are already applying rel=nofollow to the links in their comments. I also don’t see any explanation that “most comment systems” have an option to apply rel=nofollow to links.
If that’s the case, then great. That shows that removing that motivation does affect how much spam those comment sections receive.
My argument is that an official semantic wrapper around comments would be a much cleaner method, in both the resultant markup and the programming required.


#14

So the following first example is cleaner than the second example?

Example 1:

<article type="comment">
  <a href="thing">Reference</a>
</article>

Example 2:

<a href="thing" rel="nofollow">Reference</a>

Yup, I mis-wrote my intent there. Let’s consider this though. Most sites are built on a prefabricated solution, such as WordPress. These prefabs almost all apply nofollow automatically to comments by default.

Ever since 2005, search engines have supported the rel="nofollow" attribute to comment links which is computer code for “The website owner did not curate this link and does not vouch for it, so do not use it to influence search results.” Almost every blog software (including WordPress and Blogger) supports it and adds it in without you having to do anything.

Reference: Blog comment spam is too high

Also, saying that eyeballs aren’t the point and that SEO juice is the only driving factor for spam is misguided. If that’s the case, why is email spam so massive? An extremely small percentage of spam email is replied to, yet spammers make tons of money out of that. A spammer doesn’t care whether “many people” click their links. They only care that the right people do, so they can be taken advantage of.

Spam isn’t going to be defeated by making yet another element. We have all the tools available. 95% of spam control is policing it or preventing it, not controlling the fallout after it happens and letting the content sit and still get sent to users.


#15

There are currently other comment-spam methods besides posting links. For example, pure text with no links, but with messenger or Skype contact details or a phone number provided. rel="nofollow" can’t help with that.


#16

Wrapping these other types of spam in an element isn’t really helpful either, though. The only way I’d see it maybe helping is on mobile devices that automatically turn things like phone numbers into links for making a call. However, I think in this context it would be best to investigate a new global attribute to indicate untrusted content rather than introducing a new element to contain it.


#17

I wonder if you actually recognize that this is a false comparison. When putting this together did you think about how you would respond if you were on the other side of this debate?
The actual examples would more accurately be:
Example 1:

<article type="comment">
  <a href="link_url1">Adidas</a><a href="link_url2">Nike</a><a href="link_url3">New Balance</a> etc...
</article>

Example 2:

<article>
  <a rel="nofollow" href="link_url1">Adidas</a><a rel="nofollow" href="link_url2">Nike</a><a rel="nofollow" href="link_url3">New Balance</a> etc...
</article>

Each comment is going to have a container element. What we are talking about is whether that container element should say that what it contains is a comment.

Ah, sorry. I thought you meant in this thread, not in my link.

Remember, no one is suggesting that all other tools for dealing with spam should be abandoned.

I know the original post in this thread talks about introducing a new element, but after the suggestion that it could just be an attribute on the <article> element, that is what we have been talking about.