Thibaut Wdowiak's Blog

An xml resume for html and pdf output

06 Dec 2007 // projects

A few months back I wrote some scripts to produce a cv in multiple languages and multiple formats from a single file. Now every version of my cv contains the same information, whether it is in French or in English, in html or in pdf.

I am not the only one to have had this idea: I found a project on SourceForge that takes the same kind of approach.

And oh by the way, the cvs on this site have been generated with these tools.

You may (or may not) have noticed that the CV section of this website is somewhat special:

  • The French and the English versions share the same presentation.
  • The pdf and the html versions share the exact same content.

The reason is that content is separated from presentation. I built this to tackle the problems I ran into while trying to maintain a cv. My requirements were the following:

  1. Only one place to write one thing.
  2. Support for different languages with an easy way to synchronize the content across the languages.
  3. Display in html and pdf, extensible to other formats as the needs arise.

Design

The structure of the document is described in a dtd file. The content itself is written in xml and complies with the dtd. I then developed two xsl stylesheets that transform the xml into either html or an fo tree (XSL Formatting Objects, which serves as a transitory format before pdf generation). The glue that holds everything together is a bash script that calls xsltproc and fop with the required parameters.

The document structure (DTD)

The root node of the document is called Cv. It contains the following elements that form the logical structure of my cv:

  • Person,
  • Heading,
  • Education,
  • Languages,
  • WorkExperience,
  • Skills,
  • Occupations

Each of these main elements is broken down further, down to the leaf nodes that contain the content. For example, an Organisation is declared as follows:

<!ELEMENT Organisation (url?,t+)>
    <!ELEMENT url (#PCDATA)>
    <!ELEMENT t (#PCDATA)>
    <!ATTLIST t lang CDATA #REQUIRED>

It contains an optional url that points to the website of the organisation, followed by one or more t elements. Each t element must have a lang attribute. I use t elements to hold the content in different languages, currently English and French.

The content (XML)

The xml file complies with the dtd. For example, an organisation looks like this:

<Organisation>
    <url>http://www.micropole-univers.com</url>
    <t lang="fr">Micropole-Univers (SSII spécialiste BI)</t>
    <t lang="en">Micropole-Univers (Software Firm, BI specialist)</t>
</Organisation>
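
Compliance with the dtd can be checked mechanically before any transformation is run. The following is a minimal sketch, assuming xmllint (shipped with libxml2, the library xsltproc is built on) is installed; the file name cv.dtd and the function name validate_cv are placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: cv.dtd is a placeholder name for the DTD file.

# Validate an xml file against a DTD.
# Returns 0 if the document is valid, 127 if xmllint is not installed,
# and xmllint's own non-zero exit code otherwise.
validate_cv() {
    local xml=${1:-cv.xml} dtd=${2:-cv.dtd}
    if ! command -v xmllint >/dev/null 2>&1; then
        echo "xmllint not found" >&2
        return 127
    fi
    # --noout: do not print the parsed document, only report errors.
    # --dtdvalid: validate against the given DTD file rather than a DOCTYPE.
    xmllint --noout --dtdvalid "$dtd" "$xml"
}
```

Called as validate_cv cv.xml cv.dtd, this should flag, for instance, a t element that is missing its required lang attribute.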

The transformations (XSL and fop)

The xsl stylesheets take three parameters:

  • with lang and capt, we specify the language of the output document
  • with medium, we specify whether it is for restricted use or for release on the internet (I prefer not to leave my personal address on the web).

Let's see how we can transform our organisation into html:

<strong>
    <!-- web release: link the organisation name to its website -->
    <xsl:if test="$medium='web'">
        <xsl:element name="a">
            <xsl:attribute name="href">
                <xsl:value-of select="Organisation/url"/>
            </xsl:attribute>
            <xsl:value-of select="Organisation/t[@lang=$lang]"/>
        </xsl:element>
    </xsl:if>
    <!-- any other medium: plain name, no link -->
    <xsl:if test="$medium!='web'">
        <xsl:value-of select="Organisation/t[@lang=$lang]"/>
    </xsl:if>
</strong>

The glue (bash)

A bash script collects the commands I run most often. To convert the file cv.xml into pdf, I use xsltproc and fop:

xsltproc --stringparam lang 'en' --stringparam capt 'capt-en' --stringparam medium 'web' cv.fo.xsl cv.xml > cven_web.fo
fop -fo cven_web.fo -pdf cven_web.pdf
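
The same two commands can be wrapped in a loop to rebuild every pdf variant in one go. This is only a sketch of how such a script might look, not the actual script: the medium value 'print', the capt value 'capt-fr' (derived by analogy with 'capt-en'), the cv<lang>_<medium> naming scheme and the function names are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: 'print' as a second medium value, 'capt-<lang>' for every
# language and the cv<lang>_<medium> naming scheme are assumptions.

# Base name shared by the .fo and .pdf files, e.g. "cven_web".
out_base() {
    printf 'cv%s_%s' "$1" "$2"
}

# Two-step pipeline for one variant: xml -> fo (xsltproc), then fo -> pdf (fop).
# fop only runs if xsltproc succeeded.
build_pdf() {
    local lang=$1 medium=$2 base
    base=$(out_base "$lang" "$medium")
    xsltproc --stringparam lang "$lang" \
             --stringparam capt "capt-$lang" \
             --stringparam medium "$medium" \
             cv.fo.xsl cv.xml > "$base.fo" &&
        fop -fo "$base.fo" -pdf "$base.pdf"
}

# Build every combination only when called with "all",
# so the functions can also be sourced and used one at a time.
if [ "${1:-}" = "all" ]; then
    for lang in en fr; do
        for medium in web print; do
            build_pdf "$lang" "$medium" || exit 1
        done
    done
fi
```

Saved as, say, build.sh, it would be run with bash build.sh all; adding a language then means adding one word to the outer loop rather than copying another pair of commands.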