佛教圖書館館訊 (Buddhist Library Newsletter), No. 24, December 2000 (ROC year 89)

Editing XML

Christian Wittern, Chung-Hwa Institute of Buddhist Studies

【Abstract】: This article explores ways to edit XML files. XML is the source format of CBETA's text files, so finding good and powerful tools to work with these files is of central importance to the success of the project.

  As described here, the author looked for a suitable editor for SGML/XML files for many years. There are a number of such editors available, but most of them do not support the Chinese language, are cumbersome to use, or both. Only recently did he find that Emacs, in conjunction with PSGML, can not only serve as a high-quality XML editing environment but also handles Chinese and many other languages very well. Emacs also has the potential to serve not only as an editor but as a whole reading environment for the CBETA texts, including indices, dictionaries and the like.

  The article then gives some starting points for working with Emacs and closes with a very short comparison of the markup paradigm, which is central to the success of CBETA's work procedure, with other methods, e.g. editing files in word processors like Microsoft Word.

Keywords: Markup, Editors, XML, SGML, Emacs

1. Introduction

  It was about eight years ago that I first discovered the concept of markup and the markup language SGML. By then, I had used computers for a couple of years, written a thesis, and used a computer to help analyse texts for my research.

  I was very excited, because markup seemed to solve many of the problems that troubled me:

● It is hard to move texts between different platforms or even applications.
● It is very difficult to inform the computer about features in the text.
● Some characters can't be typed or displayed at all.
● And, of course, all kinds of difficulties arise when trying to mix German and Chinese in one computer file.

  Markup, or so I thought, provided a solution to all of this. Furthermore, there was an effort going on to standardize markup practice among scholars of the humanities (the TEI, or Text Encoding Initiative [1]). And indeed, markup was wonderful. At the beginning, I was also pleased that I did not have to buy new software to try out the concept, because markup can be typed in any text editor.
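  To make the second and fourth points above a little more concrete, here is a minimal sketch in Python. It is not taken from the CBETA files: the sample sentence is invented, the element names `p`, `foreign` and `term` are merely modelled on TEI conventions, and the function names are hypothetical. It shows how a program can find the Chinese passage inside a German sentence once the language is stated explicitly in the markup, and how the plain text can still be recovered.

```python
import xml.etree.ElementTree as ET

# Invented sample: a German sentence containing one Chinese term.
# Element names are assumptions modelled on TEI, not CBETA's actual markup.
SAMPLE = (
    '<p xml:lang="de">Der Begriff '
    '<foreign xml:lang="zh">禪</foreign>'
    ' wird oft mit <term>Meditation</term> übersetzt.</p>'
)

# ElementTree exposes the reserved xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def foreign_spans(xml_text):
    """Return (language, text) pairs for every <foreign> element."""
    root = ET.fromstring(xml_text)
    return [(el.get(XML_LANG), el.text or "") for el in root.iter("foreign")]


def plain_text(xml_text):
    """Return the text with all markup stripped, in document order."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())
```

  Calling `foreign_spans(SAMPLE)` lists the passages marked as foreign together with their language code, while `plain_text(SAMPLE)` returns the sentence without any tags. The point is only that the information about the text travels with the text itself, independent of the program used to write it.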

  After a while, however, I grew tired of typing all these angle brackets and started looking for something more sophisticated that could actually make use of the markup. Over the years, I tried a few such editors, even some of the commercial ones. None was a good solution, and I continued to use ordinary text editors for working with markup. Even with the arrival of HTML, later XML, and a whole new generation of specialized editors, I tried some of these and always came back to powerful text editors like Textpad or Ultraedit [2]. Meanwhile, although I knew and appreciated the advantages of markup, I continued to write my articles with commercial word processors and my slides for oral presentations with presentation software that tried to hide the structure of my texts from me.

  This situation was unsatisfying, but it continued for many years, until about six months ago, when I discovered Emacs.

2. Emacs, the customizable display editor

  Of course, I had heard of Emacs before. In fact, I had tried Emacs out more than once. In 1994, I even bought a book and tried to work my way through it. Alas, I always discovered some shortcoming of Emacs that prevented me from using it for my work.

  By mid-2000, however, with Emacs 20.7, it had become possible to edit texts in a large variety of languages, including German, Chinese, Japanese, Korean, Tibetan, Hindi, Hebrew, Arabic and many others. Although Emacs has its own way of handling these languages, it also offers support for Unicode, which is not complete, but good enough for now.

  And of course, Emacs has always had excellent support for editing texts in a variety of markup languages, including SGML and XML. Since this works on top of all the other powerful features of this amazing piece of software, it is hard to imagine a dedicated XML editor that could be better than Emacs, except maybe the next version of Emacs :-)

  Emacs was first written by Richard M. Stallman at MIT around 1975. Mr. Stallman is a very outspoken advocate of free software [3]; he later founded the Free Software Foundation (FSF) and started working on his version of the Unix operating system, which he called GNU.

  In the meantime, Emacs has been rewritten a couple of times and improved quite a bit. Today it is maintained by a group of volunteers scattered around the globe, and improvements and new features are constantly added. It is available on almost every computer platform, past or present, including all Windows versions, all Unix/Linux flavors, and the Macintosh operating system. It is thus excellent for work across different computing environments.

3. Getting started with Emacs

  Having said all this, I have to admit that getting started with Emacs can be hard. Many things do not work as Windows users have come to expect. For example, Emacs prefers the keyboard to the mouse, which means that you have to memorize a number of keystrokes. This requires effort and slows things down in the beginning. It does not take long, however, and once this first phase is over, working with Emacs is much faster than with other editors. Using Emacs is like learning to meditate: it is very different from what we are used to, but once we understand it, it helps us a lot in our daily tasks.

  Maybe one of the biggest problems for new users is setting Emacs up to work flawlessly with SGML/XML files. Some of the necessary components are not part of the Emacs distribution and have to be downloaded separately. This is the one big obstacle for many who would like to try Emacs and fail to get it started. To overcome this problem, I have put together everything needed to work with Emacs using the XML version of TEIlite (and other XML documents), all set up for a Chinese operating environment on Windows 98/Me/NT/2000, in one self-extracting archive [4]. All that needs to be done is to download this file and execute it. This will copy all the files needed and set things up.

  The next step is to start Emacs, which is done by clicking on the file PSGML (or PSGML.BAT), which should be in the directory C:\PFILES. If you want a guided tour through the first adventures with Emacs and XML, you might want to try a tutorial, which is also available on my homepage.

4. Using Emacs

  In the past few months, I have become used to using Emacs for virtually all my editing needs. I have even started to read my email with Gnus, the mail and news reader built into Emacs, which among other things has helped to protect me from email viruses.

  Even this article, which I could also have written in a word processor, is more conveniently written in Emacs. This might seem strange, since a program like MS-Word is so much more "user-friendly". To me, this "user-friendliness" means just the opposite: it hides important features of my texts from me, it does not give me control over how I want my text to be handled, and it locks me into one specific application and environment (see Figure 1).

Figure 1: The beginning of this article displayed in Emacs

  Much could be said about the differences between Emacs and MS-Word, which represent two radically different views of how computers should interface with their users, and maybe even of what the role of computers should be. Microsoft tries to make working with computers 'easier' by introducing 'entertaining' elements like an animated paperclip, but I have not yet met anybody who actually finds this helpful. It is, however, clearly in line with a global society that changes 'education' to 'edutainment', 'information' to 'infotainment' and the like. Accordingly, this probably should be called 'computainment'.

  Markup, on the other hand, tries to make explicit and clear what you want to talk about. This precision requires some extra effort at the beginning, because it requires thinking. But it makes a lot of sense and saves a lot of time in the long run. The Emacs way of doing things is perfectly in line with this: it empowers the user to do the task he wants to do with the least amount of intrusion. In fact, the Emacs maintainers make a conscious effort not to make assumptions about what the user wants, but instead give users the power to adapt and change literally every single function, key sequence or menu entry, and a complex and highly efficient model exists for this.

  This might sound as if working with Emacs were much less fun than working with MS-Word, but in fact the contrary is true: being able to do what one wants to do with the least necessary effort, in an efficient and clearly understandable way, is very rewarding in itself. Furthermore, by using Emacs you connect with a worldwide community of maintainers and users who work together to improve the software and help new users become familiar with it, a community built on values like free sharing of resources, mutual encouragement and empowerment, and global cooperation. Problems with Emacs are easily detected and quickly solved [5].

  By using Microsoft's flagship product Word, however, you are contributing to the revenue of the world's largest software company and the assets of the world's richest man (who arrived at that position in less than twenty years). You can go to any bookstore and find many books on how to use this 'easy' and 'user-friendly' software, but if you find problems with the program, you will have to wait for months, maybe years, until they are possibly fixed, and you will have to pay again to get the new 'upgraded' version of software you have already paid for.

  I could go on like this, but I will stop here. I hope to have written enough to make you at least curious and give it a try. It should be much easier today to get started than it used to be.

  Over the past few years, I have been working on an environment that would enable readers of texts like those of CBETA to work productively and enhance their understanding by providing convenient access to various tools, such as dictionaries and reference works. The project is called SMART [6] and has gone through various phases. I have tried several networked, browser-based approaches and an approach on top of Microsoft Word (yes, I have tried that), and finally decided to use Emacs as a platform, since it offers by far the best functionality. Over time, this might well become another reason to use Emacs :-)


1. More information about the TEI is available on the Internet at www.tei-c.org and < http://www.oucs.ox.ac.uk/TEI >.
2. I even put up some notes on the Web on how you can set up a poor man's XML editing environment with a built-in parser, see < http://www.chibs.edu.tw/~chris/smart/editxml.htm >.
3. That is, software that you are free to make any changes to and share with your friends, not necessarily distributed free of charge, although in practice this also happens to be the case.
4. The archive is available for download through my homepage www.chibs.edu.tw/~chris.
5. This happened just today on the Usenet forum gnu.emacs.help: a user was trying to set UTF-8 as a keyboard encoding (Emacs can work in virtually any language setting and can use a variety of input methods, both internal to Emacs and external from the operating system). The message was posted on Saturday, 3 Mar 2001 (22:27:21 +0100). After some discussion, different solutions were suggested, but none of them did what the original poster wanted: the problem was one Emacs could solve, but the solution was neither elegant nor convenient for the user. This would be the end of the story in a situation where commercial software is involved. In the Emacs forum, however, on Sunday, 4 Mar 2001 (23:10:17 +0100), somebody posted a solution that changed a tiny bit of code in Emacs and fixed the problem immediately. This took almost exactly one day, on a weekend.
6. More information about SMART can be found on my homepage as above.
