Abstract
The paper describes an approach that converts the structure of sequential text types, such as textbooks and scientific papers, into hypertext networks. Using XML as the technical basis, the approach implements sets of rules that automatically generate hypertext views as additional layers while preserving the original sequence and content of the sequential documents. These rules process mark-up information from different annotation layers: the document structure layer, the terms and definitions layer, the thematic structure layer and the cohesion layer. In addition, the semantics of technical terms in these domains are represented in a WordNet-style semantic network. This lexical representation is used to link technical terms with their definitions and to generate a glossary that is linked to the terms in the corpus documents. The feasibility and performance of the approach were evaluated using a German corpus with documents from the domains of text technology and hypertext research. This paper concentrates on the conversion methodology and its linguistic background; the related paper by Lenz (in this volume) focuses on implementation issues and presents a specialized hypertext transformation language that she has developed in this framework.
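The idea of generating a hypertext view as an additional layer, without altering the sequential document, can be illustrated with a minimal sketch. It is not the rule set or the transformation language described in the chapter: the element names (`doc`, `definition`, `p`, `term`, `links`, `link`), the attributes (`id`, `term`, `ref`, `from`, `to`) and the sample text are all hypothetical, chosen only to show one rule that links annotated terms to their definitions and derives a glossary from the same annotation layer.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated source: every element and attribute name here is
# an assumption made for illustration, not the chapter's annotation schema.
SOURCE = """<doc>
  <definition id="d1" term="hypertext">Text organized as a network of linked nodes.</definition>
  <p id="p1">A <term ref="hypertext">hypertext</term> can be derived from a
  sequential <term ref="document">document</term>.</p>
</doc>"""


def build_link_layer(root):
    """Return a separate link layer; the source tree is left unchanged."""
    # Index definitions by the term they define.
    definitions = {d.get("term"): d.get("id") for d in root.iter("definition")}
    layer = ET.Element("links")
    for para in root.iter("p"):
        for term in para.iter("term"):
            target = definitions.get(term.get("ref"))
            if target is None:
                continue  # no definition annotated for this term, so no link
            ET.SubElement(
                layer,
                "link",
                {"from": para.get("id"), "term": term.get("ref"), "to": target},
            )
    return layer


def build_glossary(root):
    """Collect term -> definition text from the definition annotations."""
    return {d.get("term"): (d.text or "").strip() for d in root.iter("definition")}


if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    print(ET.tostring(build_link_layer(root), encoding="unicode"))
    print(build_glossary(root))
```

Keeping the links in their own tree, rather than rewriting the paragraphs, mirrors the requirement that the original sequence and content stay intact; a term without an annotated definition simply produces no link.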
References
Beißwenger, M. (2004). Annotation definitorischer Textsegmente und “terminologiesensitives Linking”. Arbeitsbericht, Forschergruppe Texttechnologische Informationsmodellierung.
Beißwenger, M., Lenz, E. A., and Storrer, A. (2002). Generierung von Linkangeboten zur Rekonstruktion terminologiebedingter Wissensvoraussetzungen. In Busemann, S., editor, KONVENS 2002. 6. Konferenz zur Verarbeitung natürlicher Sprache. Proceedings, Saarbrücken, 30.09.-02.10.2002, pages 187–191, Saarbrücken.
Beißwenger, M., Storrer, A., and Runte, M. (2003). Modellierung eines Terminologienetzes für das automatische Linking auf der Grundlage von WordNet. LDV-Forum, 19(1/2):95–104.
Carr, L., Hall, W., Bechhofer, S., and Goble, C. (2001). Conceptual linking: Ontology-based open hypermedia. In Proceedings of the Tenth International World Wide Web Conference, Hong Kong, pages 334–342.
Fellbaum, C. (1998). WORDNET: An electronic lexical database. MIT Press, Cambridge, MA.
Foltz, P. W. (1996). Comprehension, coherence, and strategies in hypertext and linear text. In Rouet, J.-F., Levonen, J. J., Dillon, A., and Spiro, R. J., editors, Hypertext and Cognition, pages 109–136. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Fritz, G. (1999). Coherence in hypertext. In Bublitz, W., Lenk, U., and Ventola, E., editors, Coherence in Spoken and Written Discourse, Pragmatics and Beyond New, pages 221–232. John Benjamins, Amsterdam/Philadelphia.
Hammwöhner, R. (1997). Offene Hypertextsysteme. Das Konstanzer Hypertextsystem (KHS) im wissenschaftlichen und technischen Kontext. Konstanzer Universitätsverlag, Konstanz.
Hammwöhner, R. (1990). Macro-operations for hypertext construction. In Jonassen, D. H. and Mandl, H., editors, Designing Hypermedia for Learning, pages 71–96. Springer, Berlin.
Hoffmann, L. (2000). Thema, Themenentfaltung, Makrostruktur. In Brinker, K., Antos, G., Heinemann, W., and Sager, S. F., editors, Text- und Gesprächslinguistik – ein internationales Handbuch zeitgenössischer Forschung, volume 16.1 of Handbücher zur Sprach- und Kommunikationswissenschaft, pages 344–356. de Gruyter, Berlin/New York.
Holler, A. (2003). Spezifikation für ein Annotationsschema für Koreferenzphänomene im Hinblick auf Hypertextualisierungsstrategien. Technical report, Forschergruppe Texttechnologische Informationsmodellierung.
Holler, A., Maas, J. F., and Storrer, A. (2004). Exploiting coreference annotations for text-to-hypertext conversion. In Proceedings of the Fourth International Conference on Language Resources and Evaluation LREC 2004, Lisboa, pages 655–658.
Kuhlen, R. (1991). Hypertext. Ein nicht-lineares Medium zwischen Buch und Wissensbank. Springer, Berlin/Heidelberg/New York.
Kunze, C. and Wagner, A. (2001). Anwendungsperspektiven des GermaNet, eines lexikalisch-semantischen Netzes für das Deutsche. In Lemberg, I., editor, Chancen und Perspektiven computergestützter Lexikographie, pages 229–246. Niemeyer, Tübingen.
Lenz, E. A. and Storrer, A. (2002). Converting a corpus into a hypertext: An approach using XML topic maps and XSLT. In Proceedings of the Third International Conference on Language Resources and Evaluation LREC 2002, Las Palmas, pages 432–436.
Lenz, E. A., Birkenhake, B., and Maas, J. F. (2003). Von der Erstellung bis zur Nutzung: Wortnetze als XML Topic Maps. LDV-Forum, 19(1/2):113–125.
Mayfield, J. (1997). Two-level models of hypertext. In Nicholas, C. K. and Mayfield, J., editors, Intelligent Hypertext, volume 1326 of Lecture Notes in Computer Science, pages 90–108. Springer, New York.
Miller, G. A. (1998). Nouns in WordNet. In WORDNET: An electronic lexical database, pages 23–46. MIT Press, Cambridge, MA.
Müller, F. H. (2004). Stylebook for the Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z). http://www.sfb441.uni-tuebingen.de/a1/Publikationen/stylebook-04.pdf
Pepper, S. and Moore, G. (2001). XML Topic Maps (XTM) 1.0. TopicMaps.org Specification. http://www.topicmaps.org/xtm/1.0/.
Storrer, A. (2002). Coherence in text and hypertext. Document Design, 3(2):156–168.
Tochtermann, K. (1995). Ein Modell für Hypermedia: Beschreibung und integrierte Formalisierung wesentlicher Hypermediakonzepte. Shaker, Aachen.
Witt, A., Goecke, D., Sasaki, F., and Lüngen, H. (2005). Unification of XML documents with concurrent markup. Literary and Linguistic Computing, 20(1):103–116.
Zifonun, G., Hoffmann, L., and Strecker, B., editors (1997). Grammatik der deutschen Sprache. de Gruyter, Berlin/New York.
Copyright information
© 2010 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Storrer, A. (2010). Mark-up Driven Strategies for Text-to-Hypertext Conversion. In: Witt, A., Metzing, D. (eds) Linguistic Modeling of Information and Markup Languages. Text, Speech and Language Technology, vol 41. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3331-4_11
DOI: https://doi.org/10.1007/978-90-481-3331-4_11
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-3330-7
Online ISBN: 978-90-481-3331-4
eBook Packages: Computer Science, Computer Science (R0)