The Beauty of Semantic Markup, Part 2: strong, b, em, i ~ A Blog Not Limited


Aug 06, 2010

So, I had planned to focus the second installment of this series on markup for images with captions. The topic was a request from my friend Ian, and his birthday was coming up. However, his birthday has passed, and I'm just now writing. Plus, I've been thinking a lot lately about something more fundamental: bold and italicized text.

This may seem a trivial thing to be consuming my "markup mind," but after Tantek Çelik's HTML5 presentation, it's been bugging me. And what specifically has been bugging me is the recommendation of <b> and <i> in HTML5.

Shut the Front Door!

Yep, it is true. <b> and <i> are back and, apparently, more "useful." And when I first learned this, I was instantly put off. I come from the "separate content from presentation" school that dropped these two elements in favor of the "more semantic" <strong> and <em>.

At the time when folks were thinking about the structural/semantic markup approach, <b> and <i> were strictly presentational. The HTML 4 spec classified these two as font style elements that simply rendered text in bold and italics, respectively. Further, screen readers didn't differentiate them in any special manner, adding to the logic that they were only useful for visual differentiation.

Conversely, in HTML 4, <em> and <strong> offered meaning, as well as the default visual rendering. Content marked up with <em>, for example, semantically indicated emphasis (and defaulted to italics in visual browsers), while the use of <strong> indicated strong emphasis (and defaulted to bold).

Re-Definitions in HTML5

The W3C's HTML5 working draft, however, brings us some redefinitions of these elements:

The <b> element now represents a span of text to be stylistically offset from the normal prose without conveying any extra importance, such as keywords in a document abstract, product names in a review, or other spans of text whose typical typographic presentation is emboldened.

The <i> element now represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose, such as a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, a ship name, or some other prose whose typical typographic presentation is italicized. Usage varies widely by language.

The <strong> element now represents importance rather than strong emphasis.

<em>, meanwhile, isn't featured on the list of changed elements in HTML5. That said, the working draft does seem to define it slightly differently than the previous spec: as emphatic stress rather than just emphasis.

Building an Argument

Upon first consideration, I was totally cool with the modified definitions of <strong> and <em> (although, admittedly, slightly confused as to what emphatic stress meant), but I still felt <b> and <i> were presentational. I mean, the W3C even uses "stylistically" and "typographic presentation" in its definitions for those elements.

But, thanks to this new series, I got to doing some research. And when I started, I was aiming to build an argument against <b> and <i>. First, I wanted to find out how screen readers treated these guys.

Screen Readers

As it turns out (and as I expected), the two most popular screen readers don't, by default, read content contained by these elements any differently than other content.

What I didn't expect to discover, though, is that they also don't treat <strong> and <em> in any special way.

There goes the main "they aren't accessible" argument I was hoping for. None of the tags seems to offer any special accessibility to screen reader users.

Search Engine Optimization

So then I began a hunt for an SEO argument. Somewhere in the dusty annals of my mind, I recalled reading that Google paid particular attention to content contained by <strong>.

Turns out, I was wrong yet again. In fact, at one point, Google gave greater weight to content marked up with <b>, not <strong>.

As of today, though, the search engine gives equal weight to <b> and <strong>, as well as <i> and <em>.

Crap. I thought I had all this ammunition against <b> and <i>, when what I really had were outdated and incorrect notions.

Forget the Argument, Focus on Semantics

I've said it before, but apparently I need to listen to my own advice about not being too wedded to a particular semantic point of view … especially when operating with wrong assumptions. Time to focus on the entire point of this series: semantics.

So, let's take a closer look at <b> and <i> in HTML5.

Presentation via CSS

In addition to the definitions I shared above, the HTML5 draft also specifies that CSS should control the presentation of <b> and <i>, and that neither will necessarily appear in bold or italics by default.

Of course, this ultimately comes down to what the browser makers do, but this is a good clarification that these elements are no longer exclusively presentational in nature.

Further, the draft encourages authors to use the class attribute to define why a <b> or <i> element is used, so that different uses can be styled uniquely.

Consider the new definition of assigning <b> to keywords and product names to offset those terms without adding importance. By extending <b> with class="keyword" or class="product" (or some other equally semantic values), you have your CSS hooks to give each a unique presentation, and you are also adding meaning to your markup (kinda like how microformats work).

The same is true for applying <i> to taxonomy terms, idioms, phrases in another language and the like. Specifying the "why I'm using <i>" via class offers potential for both styling and semantics.

Common Typographic Conventions

Even with these caveats, though, I can't help but still think about the presentational nature of the definitions in HTML5. As I mentioned, "stylistically offset" just screams presentation to me.

But then I started thinking about how bold and italicized text is commonly used in print. Yes, they do offer visual indicators, but more often (at least in my experience), text offset with italics or bold does convey meaning, especially when considered in context.

Latin words, inner dialog or thoughts, titles of songs … I often see this type of content italicized in print. And, in context, I recognize the additional meaning the italics provides the content.

Media Independence

In HTML5, <b> and <i> are explicitly media independent. Essentially, because each element is no longer tied to bold or italics (visual presentation), the new semantic meaning they offer is available to non-visual browsers.

Again, it is up to those browser and screen reader makers to take advantage of that meaning, but media independence further supports the new semantic direction of these elements.

Warming Up

With all this additional information, I'm warming up a bit to using <b> and <i> again. But I'd be lying if I said I was completely comfortable.

<b> and <i> have historic ties to the notions of bold and italic. I mean, that's what "b" and "i" represent.

It bugs me a bit that a new element, independent of this presentational history, wasn't introduced instead. But, then again, using what people are already familiar with isn't always a bad thing.

Still, I worry that people will use these elements for presentational purposes. Or that folks won't apply the recommended class values to differentiate instances of these elements.

I can't help but think that this is just a big can of worms that will get messy if markup authors don't understand and apply the spec properly. And let's not even talk about the "challenges" that could result from what browser makers will end up doing or not doing.

Practical Usage

Aside from my concerns, I do want to give some thought to how I would actually use <b> and <i>, now that they are semantic. And, of course, what roles <strong> and <em> will play in my markup.

<strong>

Even with the realigned definition of <strong> in HTML5, I plan to use it as I always have, because I never really thought of it as strong emphasis. I always used it as it is now defined: indicating importance.

For my projects, the types of content I commonly mark up with <strong> include:

Alerts
Warnings
Reminders
Important content (duh)

For example:

Registration is required for this event.

The presentation begins at 6:30 pm, so be sure to show up a few minutes early to avoid interrupting our speaker.

Password provided for this username is incorrect. Please try again or you may request your password be emailed to you.

I don't think there is a hard-and-fast rule about applying <strong>. To me, it is more about content. What is important? Is it the time a presentation starts, or is it the reminder to arrive early?
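Here is a rough sketch of that decision in markup. Each version treats a different part of the presentation reminder above as the important part, and the tag placement in both is illustrative:

```html
<!-- Option 1: the start time is what matters most -->
<p>The presentation begins at <strong>6:30 pm</strong>, so be sure to show up
a few minutes early to avoid interrupting our speaker.</p>

<!-- Option 2: the reminder to arrive early is what matters most -->
<p>The presentation begins at 6:30 pm, so <strong>be sure to show up a few
minutes early</strong> to avoid interrupting our speaker.</p>
```

Both are valid. The difference is purely a content decision about what the reader must not miss.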

And this is what I dig about semantic markup. Focusing on content.

<em>

Like <strong>, I pretty much plan to use <em> the same as I always have. Even with the new (slightly unclear) definition of emphatic stress, <em> still means, to me, stressed content. As in content that I would verbalize in a stressed tone to indicate emphasis.

And because I write the way I talk (with lots of stressed words), I use this element often in my content:

Talking about microformats in less than 30 minutes (plus leaving time for questions) was quite a challenge.

You can use the cite attribute with <q> to indicate the source of a quote, if it's online.

It is really a matter of knowing the content well enough to know what terms and/or phrases should be emphasized in this fashion.
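As a rough illustration, here are two readings of the microformats sentence above. The first stresses the time limit, the second the difficulty. Which words get the <em> is a judgment call, and both placements are only examples:

```html
<!-- Stress on the time limit -->
<p>Talking about microformats in <em>less than 30 minutes</em> (plus leaving
time for questions) was quite a challenge.</p>

<!-- Stress on how hard it was -->
<p>Talking about microformats in less than 30 minutes (plus leaving time for
questions) was <em>quite</em> a challenge.</p>
```

Moving the <em> changes which words a reader would hear stressed if the sentence were spoken aloud, and that is the meaning the element carries.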

<b>

To be honest, based on the HTML5 definition of <b>, I'm not sure how often I'll actually use it. The draft suggests it can be used with product names and keywords, but I, personally, don't see a need to differentiate this type of content in any way.

Of course, a client might feel differently. Perhaps a client might want all of the product names on their site to appear stylistically offset. So, in that situation, I would use it and take advantage of the recommended application of class to indicate the purpose of the element:

For data management, we offer two flagship products: Moxie and Mojo.

And if that same client also wanted to highlight keywords associated with their products, I might:

Moxie offers users the ability to cleanse, extract and transform data.

Meanwhile, in my CSS, I would style .product and .keyword in some fashion, likely giving each its own look.
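Here is a minimal sketch of that client scenario. The product and keyword class names come from the text above. Which words get wrapped, and the specific CSS declarations, are assumptions standing in for whatever the design actually calls for:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Moxie and Mojo</title>
  <style>
    /* Product names: offset from the prose, no extra importance implied */
    b.product { font-weight: bold; color: #336; }
    /* Keywords: a different treatment; the stylesheet, not the element, decides the look */
    b.keyword { font-weight: normal; font-variant: small-caps; }
  </style>
</head>
<body>
  <p>For data management, we offer two flagship products:
  <b class="product">Moxie</b> and <b class="product">Mojo</b>.</p>

  <p><b class="product">Moxie</b> offers users the ability to
  <b class="keyword">cleanse</b>, <b class="keyword">extract</b> and
  <b class="keyword">transform</b> data.</p>
</body>
</html>
```

Note that b.keyword isn't bold at all here. The element only marks the text as offset, the class says why, and the presentation lives entirely in CSS.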

Also, HTML5 does specify that <b> can be used simply to indicate text that needs unique styling, such as those typographic conventions of drop caps and paragraph leads:

It was a cold and rainy night.

That said, I'm not sure I would favor this approach over using :first-letter in my CSS (like I already do on this blog). But I guess I could see it for styling a paragraph lead uniquely:

It was a cold and rainy night, despite what the weatherman had announced on the evening news. Bob was annoyed his stargazing plans were in danger from the looming storm.
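Here is a sketch comparing the two approaches, using a hypothetical lead class. The drop cap comes entirely from CSS via :first-letter, with no extra markup, while the paragraph lead uses <b> purely as a styling hook:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Paragraph lead</title>
  <style>
    /* Drop cap without extra markup: first letter of the first paragraph in its container */
    p:first-child:first-letter { float: left; font-size: 3em; line-height: 1; margin-right: 0.1em; }
    /* Paragraph lead: <b> supplies the hook, the class says why it is there */
    b.lead { font-weight: normal; font-variant: small-caps; }
  </style>
</head>
<body>
  <p><b class="lead">It was a cold and rainy night,</b> despite what the weatherman
  had announced on the evening news. Bob was annoyed his stargazing plans were in
  danger from the looming storm.</p>
</body>
</html>
```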

Still, even after considering those scenarios, I'm frankly not convinced <b> is going to be a regular element in my arsenal.

<i>

As for <i>, I can see using it much more often than <b>. Particularly for technical, legal or medical terms, as well as foreign language phrases:

A patent foramen ovale is a congenital defect between the two upper chambers of the heart.

I try to live my life according to the axiom, illegitimi non carborundum.
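Following the class advice from earlier, the two examples above might be marked up like this. The class names medical-term and phrase are hypothetical, chosen only to record why each <i> is there:

```html
<p>A <i class="medical-term">patent foramen ovale</i> is a congenital defect
between the two upper chambers of the heart.</p>

<p>I try to live my life according to the axiom,
<i class="phrase">illegitimi non carborundum</i>.</p>
```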

Foreign Languages

Since I used a Latin phrase in the last example, now might be a good time to address use of the lang attribute. HTML5 Doctor provides an excellent article on the same topics I'm covering here.

In their examples of using <i> for foreign language phrases, they apply the lang attribute to indicate which foreign language is being referenced. For example:

Mix baking soda and vinegar together, and voilà, you get a cool chemical reaction.

However, another article on the topic, Using <b> and <i> elements, warns against this approach:

… the language attribute only describes the language of the text, not the meaning. It is possible that you will want to style text in a different language differently according to the context in which it is used, either now or in the future.

As for me, I think that if I do use <i> for foreign phrases, I'll likely skip the lang attribute and rely on class for any special styling.
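Side by side, the two approaches look roughly like this. The first follows the lang-attribute pattern described above, assuming French (fr) for voilà. The second is the class-only route, with foreign-phrase as a hypothetical class name. Either one gives you a styling hook: the :lang() pseudo-class for the first, a class selector for the second.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Foreign phrase</title>
  <style>
    /* Hook 1: target by language, via the :lang() pseudo-class */
    i:lang(fr) { color: #555; }
    /* Hook 2: target by purpose, via a class */
    i.foreign-phrase { color: #555; }
  </style>
</head>
<body>
  <!-- With lang: says which language the phrase is in -->
  <p>Mix baking soda and vinegar together, and <i lang="fr">voilà</i>, you get a
  cool chemical reaction.</p>

  <!-- With class only: says why the phrase is set apart -->
  <p>Mix baking soda and vinegar together, and <i class="foreign-phrase">voilà</i>,
  you get a cool chemical reaction.</p>
</body>
</html>
```

Nothing prevents putting both attributes on the same element, such as <i lang="fr" class="foreign-phrase">, if you want the language information and a purpose-based hook together.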

Exercise Discretion

While I'm admittedly still a bit on the fence about actually using <b> and <i> regularly, you may feel differently and want to start marking up right away. If that is the case, please use these elements intelligently and correctly.

Don't just apply <b> because you need a bold effect and you are feeling lazy. Don't use <i> for a publication title, when <cite> may be the appropriate element (see part 1 of this series for more on <cite>).

Even the HTML5 draft recommends discretion:

… authors are encouraged to consider whether other elements might be more applicable than the i element, for instance the em element for marking up stress emphasis, or the dfn element to mark up the defining instance of a term.
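For instance, here is how the alternatives mentioned above might look in practice. The book title and the defined term are just illustrative content:

```html
<!-- A publication title: <cite> rather than <i> -->
<p>I keep a copy of <cite>The Elements of Typographic Style</cite> on my desk.</p>

<!-- The defining instance of a term: <dfn> rather than <i> -->
<p><dfn>POSH</dfn> stands for Plain Old Semantic HTML.</p>

<!-- Stress emphasis: <em> rather than <i> -->
<p>Read the spec <em>before</em> you pick an element.</p>
```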

Go Forth & Experiment

After gathering all this information, I had hoped to have a firm conclusion about <b> and <i>. Alas, I don't. So, what I shall do is try different approaches and see how they work for me, my sites and my clients.

I have some clients who I know won't take the time to add the extra markup, such as class values on <b> and <i>, while some clients may embrace that extra level of control. And I have some CMS implementations that currently aren't configured in a way that will easily allow adding it.

And then there's still that little voice in my head that can't seem to fully accept <b> and <i> as semantic elements.

Only time and practice will tell how big a role <b> and <i> will play for me. Until then, I'm eager to hear your thoughts!

Tags: b; Beauty of Semantic Markup series; bold; browsers; em; emphasis; HTML; HTML5; i; importance; italic; markup; POSH; screen readers; semantic markup; semantics; SEO; strong; web standards; XHTML


Stephen opines:

08/06/2010

I’m of the party that would never use strong or em tags because b and i are more intuitive, they take less time to write, they make for cleaner code, and they take up fewer bytes (which becomes important for absurd scalability cases).

At the very least I would suggest that, instead of using em or strong, new names should be assigned that describe a sibling relationship - or use the same tag with an additional predefined property that describes the nature of its content.

Emily responds:

08/06/2010

@Stephen - Can’t lie … I strongly disagree with you. HTML5 aside, using <b> and <i> is simply non-semantic. And semantic markup is not only the point of this series, but it is what I aim for every day.

As for more intuitive, that is only true if people use those elements for presentation purposes, which is, again, not the point of this series. Further, it goes against the goal of web standards to separate content from presentation. Bold and italic should be controlled in CSS, not markup.

Now, if we bring HTML5 into the argument, they are semantic, at least as far as the draft recommendation views it. However, if people use <b> and <i> in HTML5 for presentation, that’s invalid (as the spec stands now).

Not sure what you mean by "new names" … new elements? I do think that a new element for the new semantic goal of <b> and <i> in HTML5 might help avoid confusion and misuse. However, I don’t favor that over <strong> or <em>. They are both semantic elements that just happen to currently have visual rendering associated with them. So, semantically speaking, they will remain a large part of my markup.

Virginia opines:

08/06/2010

Let’s talk about Adobe Dreamweaver. Part of the problem with the proper use of strong and em, or even b and i, is to be found in the Dreamweaver Properties panel.

For the last several iterations of DW, you have been able to set a preference so that any use of the B or I icons on the Property panel would result in either a strong or an em element. Conversely, you could set the Preferences so that the B and I icons resulted in either bold or italic text. A cite element had to be added by hand or with the little code editor at the bottom right of the Properties panel.

OK, so we have 5 potential elements, all rendered by the majority of DW users with the B or I icons on the Property panel. If you’re looking for a reason why the semantics of these elements are still so confused, Adobe’s interface in DW is a good place to start looking.

I’d like to see Adobe offer 5 icons, not two, and for the semantic effect of each icon to be briefly described in the tooltip that accompanies the icon. A little clarity for the everyday DW user could mean better semantic code produced in a lot of places.

Emily responds:

08/07/2010

@Virginia - Excellent points. It isn’t just web professionals who should be aware of (and hopefully following) the specs … the software makers, too, have a responsibility.

Stephen opines:

08/07/2010

@Emily after some thought, I suppose semantics are more important overall than convenience. Maybe web browsers should refuse to style certain elements unless a “backward compatibility mode” is turned on.

I feel like the names “strong” and “em” are like apples and oranges, when their functions are more like different kinds of apples. The H1 through H6 tags have a sibling relationship. I think it would make more sense to have a sibling relationship between elements used to convey different tones in a text. This could be accomplished by creating new tags, like I was suggesting earlier, or maybe there’s a better way.

What would you think about a scheme like this (?):

or better yet

...completely ditching the <strong> element.

Toby opines:

08/07/2010

To add my 2C: prior to these issues being discussed in the HTML5 spec, I had hardly heard of strong or em. I usually try to envisage what my content would look like printed out on a page from a typewriter (helps me structure flow correctly). In relation to this, if bolding and italicising are used correctly, much as with quotes (double and single) and dash sizes, they should be all that is needed semantically. strong and em seem to me to cover grey shades of the meaning of bold and italic.

Chris opines:

08/08/2010

I think half the confusion arises from people doing things the way *they* think things should be done.

Emily, I appreciate your for and against arguments re strong and em but, at the same time, I can appreciate Stephen’s comments re perhaps having em class=“italic” as something that makes more sense.

Having said that I think it really is a matter of interpretation ... which brings us back to the point I was trying to make in the first. Opinion is the killer here and perhaps the real solution would be for the W3C to document and define the spec better so that there can’t possibly be any misinterpretation?

Semantics are so much more important now than they used to be that I think we all really need to get it “right”, as impossible as that almost is with the lack of a single standard on the web. If only the web world was perfect. :)

Emily responds:

08/08/2010

@Stephen - As much as I appreciate the additional perspective you offer, I don’t agree with your suggestion of dropping <strong>. From my perspective, it has semantic value. As both the HTML 4 spec and HTML5 draft define it, it indicates importance. So, as you can see in the examples I provided in this post, I do see a valuable role for it.

Same holds true for <em>, as it indicates linguistic emphasis, which I, too, use all the time.

Using the rel attribute to indicate a presentation of bold or italic seems, to me, to be an invalid use of this attribute. It is intended to indicate relationship, not presentation.

I’m not opposed to applying classes to either <strong> or <em> in order to indicate their visual presentation. That’s one of the purposes of class. However, I personally prefer semantic class names, rather than presentational.

So, if I were using, for example, <strong> to mark up a user alert message that needed to appear in bold (and maybe red and maybe all caps), I would assign class=“alert” rather than class=“bold”.

With regard to the sibling relationship you mentioned, I’m still not quite clear where you are going with that. But perhaps you are alluding to what seems to be happening with the HTML5 re-definition of <strong>: "The relative level of importance of a piece of content is given by its number of ancestor strong elements; each strong element increases the importance of its contents."

Great stuff to think about, though. Thanks for adding to the discussion!

Emily responds:

08/08/2010

@Toby - Have to admit I’m shocked you haven’t heard of <strong> or <em>, so I’m glad I’m providing some information on these elements.

I do appreciate your approach to adding punctuation (em dashes, quotation marks and the like) to your content based on a typographic perspective. I do the same for the most part.

However, from a truly semantic standpoint (remember, semantic markup provides meaning for content, not presentation), bold and italic are not semantic. They are presentational concepts. And the <b> and <i> elements in the HTML 4 spec are defined as such.

<strong> and <em>, meanwhile, provide meaning. I do not see them as bold and italic elements, nor do any of the specs define them as such (although in HTML 4, they do come with default rendering in visual browsers).

And then once you get into HTML5, <b> and <i> have absolutely nothing to do with bold and italic. Sure, you can style them as such, but their definitions are fully semantic.

May I suggest you read my article that explains what semantic markup is and why it is important: Meaningful Markup: POSH and Beyond.

Emily responds:

08/08/2010

@Chris - Couldn’t agree with you more that changing specs make the semantic discussion more challenging and confusing. I often wish we had a single set of rules to follow that never changed, so that I could focus on making great web sites rather than making sure I’m following the latest changes. But then again, I also embrace the changing nature of the web.

I also agree (slightly) with your point about interpretation being the killer here. There are certainly elements where I interpret them in my own fashion that may not be strictly how the specs define them (<cite> and <dl> come to mind).

For me, though, <em> and <strong> are clear: emphasis and importance. Similarly (and HTML5 aside), <b> and <i> are bold and italic.

Of course, if I’m writing HTML5, <b> and <i> (as I hope I’ve demonstrated in my examples) are less clear to me and certainly more open to interpretation. Which I think was the intention of the HTML5 authors … for these elements to be used more broadly than presentational purposes.

Jason opines:

08/09/2010

I rarely use <b> or <i>. However, the new HTML5 definitions of these elements put them just barely above the non-semantic inline element <span>. I believe it was Eric Meyer a few years ago who mentioned substituting an alternately-styled <b> or <i> in place of <span> simply as a non-semantic styling hook. From that view, they don’t sound all that terrible (and will even save a few bytes here and there), though the dangers of future misuse are increased as they become more common (again).

Emily responds:

08/09/2010

@Jason - You and I are on the exact same page :)

Anders opines:

08/16/2010

This series is a great initiative. Keep the posts coming!

… the language attribute only describes the language of the text, not the meaning. It is possible that you will want to style text in a different language differently according to the context in which it is used, either now or in the future.

As for the lang attribute, I can’t find any mention in the cited article that you shouldn’t be using it. You could certainly make use of attribute selectors to target texts written in another language. One could also assume that it would be helpful for screen readers to know in what language it should read out the text.

Debbie opines:

10/02/2010

AWESOME information. Thanks so much!

