% \iffalse meta-comment
% !TEX encoding = IsoLatin
%<*internal>
\begingroup
\input docstrip.tex
\keepsilent
\preamble
  ______________________________________________________
  The FixLtxHyph package
  Copyright (C) 2011-2012 Claudio Beccari
  All rights reserved
  License information appended
\endpreamble
\postamble
Copyright 2011-2012 Claudio Beccari
Distributable under the LaTeX Project Public License,
version 1.3c or higher (your choice). The latest version of
this license is at: http://www.latex-project.org/lppl.txt
This work is "author-maintained"
This work consists of this .dtx file, a README file,
the derived file fixltxhyph.sty, and the English documentation
fixltxhyph.pdf.
By running pdflatex on fixltxhyph.dtx the user gets the .sty file,
and the English documentation file in pdf format.
\endpostamble
\askforoverwritefalse
\generateFile{fixltxhyph.sty}{t}{\from{fixltxhyph.dtx}{style}}
\def\tmpa{plain}
\ifx\tmpa\fmtname\endgroup\expandafter\bye\fi
\endgroup
%</internal>
%
% \fi
%
% \iffalse
%<*driver>
\documentclass{ltxdoc}
\ProvidesFile{fixltxhyph.dtx}[2012/04/02 v.0.4 Documented TeX file for the FixLtxHyph package]
\GetFileInfo{fixltxhyph.dtx}
\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{color}
\usepackage{multicol}
\title{\centering The FixLtxHyph package\protect\\
   A small fix in order to hyphenate emphasized words after a vocalic elision\protect\\
   in Catalan, French, Italian, Romansh, and Friulan}
\date{\fileversion\space\filedate}
\author{Claudio Beccari}
\usepackage{array}
\usepackage{metalogo}
\def\prog#1{\textsf{#1}}
\begin{document}\errorcontextlines=9
\maketitle
\begin{multicols}{2}
\tableofcontents
\end{multicols}
\setlength\hfuzz{20pt}
\DocInput{fixltxhyph.dtx}
\end{document}
%</driver>
% \fi
% \CheckSum{84}
%
% \begin{abstract}
% This file fixes a small feature of the hyphenation algorithm used by the \TeX\ system
% typesetting engines that manifests itself only with those languages that use the
% apostrophe for marking a vocalic
% elision. This small package was set up to fix this
% little undesirable feature in Italian, but it was extended to Catalan, French, the
% future implementation of the fourth official Swiss language Rumantsch Grischun (Romansh
% in English) and the future implementation of the regional language Friulan, spoken
% and written in north-eastern Italy. This fix operates correctly with both
% \prog{pdflatex} and \prog{xelatex}.
% \end{abstract}
% \section{What is the feature to be fixed}
% The five languages Catalan, French, Italian, Romansh, and Friulan use the apostrophe
% for marking the elision of the final vowel of prepositions, articles,
% articulated prepositions, definite adjectives, and other words playing similar rôles when
% they immediately precede nouns, adjectives, verbs, or numerals that start with a vowel.
% Probably there are other languages that use the apostrophe in a similar way. I can easily
% upgrade this small package if \LaTeX\ users of other languages let me know about such
% languages.
%
% This feature is common to most Romance languages (from West to East) from Catalan and
% Valencian, to French, Langue d'oc, Occitan, Provençal, Vivaroalpin, Italian, Piedmontese,
% Lombard, Romansh, Ladin, Friulan. Up to now only Catalan, French and Italian are handled
% by the \TeX\ system programs. At the same time most of these languages are minority ones
% and are protected by local legislation or are supported by specific cultural or
% linguistic institutions. Romansh has a national/federal legal status in
% Switzerland and is used in legal and official documents in the whole Swiss
% Confederation, not only in its area of everyday use, the Kanton Graubünden or Canton
% Grigioni or Chantun Grischun (where seven Romansh varieties are spoken, besides
% Swiss German, Italian, and other languages). The Friulan language has an official
% regional status in the north-eastern Italian Region Friuli\,-Venezia Giulia.
%
% This spelling rule is very rigorous in French; I suppose it is also a rigorous rule in
% Catalan, Romansh, and Friulan, but I am not that familiar with these languages even if I
% can understand their written forms. In Italian it used to be a rigorous rule many years
% ago, but nowadays it is less frequently used when plurals are involved.
% Nevertheless the apostrophe is practically the only non-alphabetic sign you see in an
% Italian text, apart from punctuation and quotation marks.
%
% In order to hyphenate these word combinations correctly, all five languages have to
% declare the apostrophe, which has category code~12, as a character with a nonzero
% lowercase code. In fact all five languages declare:
%\begin{verbatim}
%\lccode`\'=`\'
%\end{verbatim}
% or something equivalent. With this little trick, the typesetting engine considers the
% apostrophe as a valid word character and treats the whole string as a single word; the
% patterns of these languages, of course, also take the apostrophe into consideration, so
% that the correct line breaks are easily found:
%\begin{center}
%\begin{tabular}{l>{\ttfamily}ll}
%Catalan & d'aquesta      & d'a-ques-ta \\
%French  & l'électricité  & l'élec-tri-ci-té \\
%Italian & dell'eleganza  & del-l'e-le-gan-za \\
%Romansh & l'identitad    & l'i-den-ti-tad \\
%Friulan & l'arbul        & l'ar-bul
%\end{tabular}
%\end{center}
%
% So where is the problem?
% It emerges when the second part of the string is emphasized,
% because in this case no hyphenation takes place:
%\begin{center}
%\begin{tabular}{l>{\ttfamily}ll}
%Catalan & d'\string\emph\{aquesta\}     & d'\emph{aquesta} \\
%French  & l'\string\emph\{électricité\} & l'\emph{électricité} \\
%Italian & dell'\string\emph\{eleganza\} & dell'\emph{eleganza} \\
%Romansh & l'\string\emph\{identitad\}   & l'\emph{identitad} \\
%Friulan & l'\string\emph\{arbul\}       & l'\emph{arbul}
%\end{tabular}
%\end{center}
%
% This behavior is easily explained, so it is not to be considered a bug, but a
% feature; a feature that is annoying only when using the five languages named above.
% The point is that all \TeX\ system typesetting engines consider a word to be the
% character string that starts after a character invalid in a word and finishes with the
% first token invalid in a word. Notice that when the hyphenation algorithm comes into play
% the command |\emph| has already been expanded, and what remains of it is the selection
% of the chosen font; therefore a string such as \verb*| d'aquesta | (starting after
% a space and ending before the following space) is made up of valid characters; but
% \verb*| d'\emph{aquesta} | is a ``word'' starting after a space and ending before
% a space, but containing a font change. And this makes the word invalid for hyphenation.
% The \TeX\-book is clear in this respect: ``If a suitable letter is found [as a starting
% character], let it be in font $f$. \dots\ \TeX\ continues to scan forward until coming
% to something that's not one of the following three ``admissible items'': (1) a character
% in font $f$ whose |\lccode| is not zero; (2) a ligature formed entirely from characters
% of type (1); (3) an implicit kern.
% \dots\ Notice that all these letters are
% in font~$f$.''
%
% This was a specific programming choice made by Donald~E.\ Knuth together with Frank
% Liang, his PhD student who developed the hyphenation algorithm implemented in the
% typesetting engines of the \TeX\ system\footnote{I have been told that Lua\TeX\ is
% developing a different algorithm that eliminates this feature.}.
% Like all such decisions, it is a compromise between accuracy and speed. And remember that
% at the beginning the program \prog{tex} was used essentially with English, a language
% that does not use accented letters and uses elision in a way quite different from the
% one discussed here. The problem did not exist and, I suppose, it will never exist in
% English.
%
% \section{The solutions}
% As a compromise I decided to solve the problem in an automatic way only when the second
% part of the ``word'' to be hyphenated is emphasized. I suppose it is the most frequent
% situation, although one can easily think of other situations; for example: the
% second part of such a ``word'' after the apostrophe is set in boldface, is colored, is
% written in another font selected on purpose or is in another alphabet, is in italics
% (with no automatic inclination switching); in such cases the solution is manual and
% remains manual, because there are too many possibilities and it is cumbersome to deal
% with all of them.
%
% But manual or automatic, how should we proceed? We simply have to convince \TeX\ that
% the starting letter is not the first letter of the part preceding the apostrophe, but
% the first letter of what follows it.
% This is simple: it suffices to put after the apostrophe an unbreakable, zero width glob
% of glue; \TeX\ starts looking for a potential starting letter after the glue.
% Therefore the manual solution consists in defining a short macro such as the following
% one:
%\begin{verbatim}
%\newcommand\hz{\nobreak\hskip\z@skip}
%\end{verbatim}
% or, if you want to avoid putting this short command into a personal \texttt{.sty} file,
% simply replace |\z@skip| with |0pt|. You will then have to modify the font changing
% phrase into something such as:
%\begin{verbatim}
%... d'\hz\textbf{aquesta} ...
%\end{verbatim}
% The |\hz| command, whose name recalls the phrase ``Horizontal skip of an unbreakable Zero
% width glob of glue'', finishes the preceding word and prepares the ground for the search
% for the starting letter of another word; it will be found after the font selection code
% introduced in the horizontal list by the selected font identification.
%
% The automatic solution, on the contrary, implies a small but substantial modification of
% the |\emph| command. In fact the text command uses the text declaration |\em|; in turn
% |\em| is a robust command, that is it is defined as \verb*|\protect\em |: it would be
% very unwise to modify the outer, protecting command, so it is necessary to modify the
% inner, \texttt{protect}ed one, and this operation is not trivial because of the space in
% this macro name.
% In any case, once we find out how, we must add |\hz| to the definition of \verb*|\em |
% so that it is executed before the real text to emphasize is processed.
%
% This small package does exactly this, only for the five named languages, and only if
% they are used, and only with the |\emph| command. The |\hz| command is available to the
% user in a global way, so that when this package is loaded, the manual solution remains
% valid for every language, although in very unlikely situations.
%
% It has been tested with the five languages with both \prog{pdflatex} and \prog{xelatex},
% and apparently it works as expected; it has been thoroughly tested in all situations with
% Italian; it should work properly also in French, in Romansh, and in Friulan.
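%
% A minimal test file along the following lines may help to check the behaviour. It is
% only a sketch: the document class, the choice of Italian, and the very narrow |\parbox|
% (used here just to force as many line breaks as possible) are illustrative assumptions
% and are not part of the package.
%\begin{verbatim}
%\documentclass{article}
%\usepackage[T1]{fontenc}
%\usepackage[italian]{babel}
%\usepackage{fixltxhyph}% to be loaded after babel or polyglossia
%\begin{document}
%\parbox{1pt}{\hspace{0pt}dell'\emph{eleganza}}
%\end{document}
%\end{verbatim}
% Commenting out the line that loads \texttt{fixltxhyph} and compiling again should make
% the difference in the line breaks visible.
%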
% The adopted
% solution does not fiddle with active characters and therefore it does not interfere with
% the internal workings and settings of Catalan and the other languages.
%
% \section{Installation}
% With modern \TeX\ distributions these instructions are superfluous; should you need to
% install by hand, download the package from \textsc{ctan} into a scratch directory (create
% one if necessary and, after finishing, delete the whole directory with its contents) and
% run the file \texttt{fixltxhyph.dtx} through \prog{pdflatex}; you get two files; move
% the files as follows:
%\begin{itemize}
%\item Move all the files into the following directories on your disk; if you don't already
% have those directories, create them.
%\item These directories should be created in your personal \texttt{texmf} tree; if you
% don't have one, create it; how to do this and where to root it depends on your operating
% system; before making any change to your hard disk, please read carefully the TeX Live
% or the MiKTeX documentation in order to find out what a personal tree is.
%\item Move \texttt{fixltxhyph.dtx} to \texttt{.../texmf/source/latex/FixLtxHyph/};
%\item Move \texttt{fixltxhyph.pdf} to \texttt{.../texmf/doc/latex/FixLtxHyph/};
%\item Move \texttt{fixltxhyph.sty} to \texttt{.../texmf/tex/latex/FixLtxHyph/};
%\item If your distribution requires it, refresh the file name database.
%\end{itemize}
% You are now ready to use the package by simply invoking it in the preamble of your
% documents:
%\begin{verbatim}
%\usepackage{fixltxhyph}
%\end{verbatim}
%
%\section{Acknowledgements}
% I wish to thank Lorenzo Pantieri who tested the preliminary and the current versions of
% this package and directly or indirectly helped in debugging the code, especially in the
% preliminary version that used active characters and was particularly buggy. Many
% thanks also to Enrico Gregorio who spotted the protection problem of the |\em| command.
%
% \StopEventually{}
%
%\section{The documented code}
% We start by identifying the package and the necessary format file:
%    \begin{macrocode}
%<*style>
\ProvidesPackage{fixltxhyph}[2011/04/02 v.0.4 Small fix for hyphenating
            emphasized words preceded by vocalic elision]
\NeedsTeXFormat{LaTeX2e}[2011/06/27]
%    \end{macrocode}
% Then we make sure that the package \texttt{babel} or \texttt{polyglossia} has already
% been loaded; otherwise we warn the user and exit; no patches can be made to an unknown
% package.
%    \begin{macrocode}
\@ifpackageloaded{babel}{}{\@ifpackageloaded{polyglossia}{}{%
  \PackageWarning{fixltxhyph}{This package must be loaded after
                  babel or polyglossia}%
  \endinput}}
%    \end{macrocode}
%
% We need the package |etoolbox| in order to perform any action on control sequences
% that contain spaces in their names; we do not need any means to test if we are working
% with \texttt{babel} or with \texttt{polyglossia} because, thanks to the previous
% tests, one of the two packages has certainly been loaded.
%    \begin{macrocode}
\@ifpackageloaded{etoolbox}{}{\RequirePackage{etoolbox}}
%    \end{macrocode}
% We define a very short command |\hz| in order to have available a handy command
% for inserting an unbreakable zero-width glob of glue in case we need to do some
% sort of patching by hand.
%    \begin{macrocode}
\newcommand\hz{\nobreak\hskip\z@skip}
%    \end{macrocode}
% We make patches only if one or more of the five languages Catalan, French, Italian,
% Romansh, or Friulan (or its alias Furlan) has been invoked as an option to \texttt{babel}
% or specified to \texttt{polyglossia}; if none of these options has been selected,
% evidently the user was thinking of other details and missed the point that this patch is
% necessary only for the five languages mentioned above. In any case no harm is done
% if from now on nothing else gets done, except for the definition of |\hz| that remains
% available to the user.
%
% The next bit of code defines two aliases: one keeps the original meaning of the
% declaration |\em|; the other is a copy to be patched, so that we can set the proper
% definition only for the five named languages and restore the original one when
% a change of language takes place.
%    \begin{macrocode}
\letcs{\FLH@originalem}{em }
\let\FLH@newem\FLH@originalem
\preto\FLH@newem{\hz}
%    \end{macrocode}
%
% We then use a loop over a list of language names; if the
% language with one of the listed names has been invoked as an option to \texttt{babel},
% or specified to \texttt{polyglossia}, then the patched \verb*|\em | definition is made
% the default for that language, while when changing language the original definition is
% restored; we define a macro that contains the language names:
%    \begin{macrocode}
\def\@tempB{catalan,french,italian,romansh,friulan,furlan}
%    \end{macrocode}
% then we perform the loop mentioned above; we have to distinguish whether we are using
% \texttt{polyglossia} or \texttt{babel} because the internal setting and resetting macros
% of these two packages have different names; they are all made up by joining
% a prefix to the language name, so we have to build the macro names to patch with the
% usual deferred name construction by means of |\expandafter| and |\csname| with its
% companion |\endcsname|.
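% As an illustration (a sketch of what the expansion amounts to, not additional package
% code), when \texttt{babel} is in use and |\@tempA| holds \texttt{italian}, the
% |\expandafter|\dots|\csname| construction in the loop that follows is equivalent to
% writing:
%\begin{verbatim}
%\addto\extrasitalian{\cslet{em }{\FLH@newem}}
%\addto\noextrasitalian{\cslet{em }{\FLH@originalem}}
%\end{verbatim}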
%    \begin{macrocode}
\@ifpackageloaded{polyglossia}%
{\@for\@tempA:=\@tempB\do{%
  \expandafter\ifx\csname captions\@tempA\endcsname\relax\else
    \expandafter\addto\csname noextras@\@tempA\endcsname{\cslet{em }{\FLH@originalem}}%
    \expandafter\addto\csname blockextras@\@tempA\endcsname{\cslet{em }{\FLH@newem}}%
    \expandafter\addto\csname inlineextras@\@tempA\endcsname{\cslet{em }{\FLH@newem}}%
  \fi}%
}{\@for\@tempA:=\@tempB\do{%
  \expandafter\ifx\csname captions\@tempA\endcsname\relax\else
    \expandafter\addto\csname extras\@tempA\endcsname{\cslet{em }{\FLH@newem}}%
    \expandafter\addto\csname noextras\@tempA\endcsname{\cslet{em }{\FLH@originalem}}%
  \fi}%
}
%    \end{macrocode}
%
% This documented file is now terminated and its final commands are issued.
%    \begin{macrocode}
\endinput
%</style>
%    \end{macrocode}
%
% \Finale
% \endinput