Mark-up Driven Strategies for Text-to-Hypertext Conversion

Linguistic Modeling of Information and Markup Languages

Part of the book series: Text, Speech and Language Technology (TLTB, volume 41)

Abstract

The paper describes an approach that converts the structure of sequential text types, such as textbooks and scientific papers, into hypertext networks. Using XML as the technical basis, the approach implements sets of rules that automatically generate hypertext views as additional layers while preserving the original sequence and content of the sequential documents. These rules process mark-up information at different annotation layers: the document structure layer, the terms and definitions layer, the thematic structure layer, and the cohesion layer. In addition, the semantics of technical terms in these domains are represented in a WordNet-style semantic network. This lexical representation is used to link technical terms with their definitions and to generate a glossary that is linked to the terms in the corpus documents. The feasibility and performance of the approach were evaluated using a German corpus with documents from the domains of text technology and hypertext research. This paper concentrates on the conversion methodology and its linguistic background; the related paper by Lenz (in this volume) focuses on implementation issues and presents a specialized hypertext transformation language that she has developed in this framework.
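
The idea of generating a link layer from term and definition mark-up, without altering the sequential source, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the element names (`sec`, `def`, `term`), the attributes (`id`, `term`, `lemma`), the sample document, and the function `build_link_layer` are hypothetical and are not taken from the chapter, whose actual annotation schema and transformation language are not shown in this abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated source document (schema invented for this sketch):
# <def> marks a definition of a term, <term> marks a use of that term.
SOURCE = """<doc>
  <sec id="s1"><p>A <def term="hyperlink">hyperlink connects two nodes</def>.</p></sec>
  <sec id="s2"><p>Each <term lemma="hyperlink">hyperlink</term> has an anchor.</p></sec>
</doc>"""


def build_link_layer(xml_text):
    """Return (glossary, links) derived from the mark-up.

    The parsed source tree is only read, never modified, so the original
    sequence and content stay intact; the links form a separate layer.
    """
    root = ET.fromstring(xml_text)

    # Pass 1: collect definitions into a glossary keyed by term.
    glossary = {}
    for sec in root.iter("sec"):
        for definition in sec.iter("def"):
            text = "".join(definition.itertext())
            glossary[definition.get("term")] = (sec.get("id"), text)

    # Pass 2: link every annotated term use to the section defining it.
    links = []
    for sec in root.iter("sec"):
        for term in sec.iter("term"):
            lemma = term.get("lemma")
            if lemma in glossary:
                links.append(
                    {"from": sec.get("id"), "term": lemma, "to": glossary[lemma][0]}
                )
    return glossary, links


glossary, links = build_link_layer(SOURCE)
for link in links:
    print(link["from"], "->", link["to"], "via", link["term"])
```

Keeping the links in a separate list rather than rewriting the source mirrors the notion of hypertext views as additional layers; a semantic network of related terms could extend the lookup step but is left out here.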

Author information

Corresponding author

Correspondence to Angelika Storrer.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Storrer, A. (2010). Mark-up Driven Strategies for Text-to-Hypertext Conversion. In: Witt, A., Metzing, D. (eds) Linguistic Modeling of Information and Markup Languages. Text, Speech and Language Technology, vol 41. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3331-4_11

  • DOI: https://doi.org/10.1007/978-90-481-3331-4_11

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-3330-7

  • Online ISBN: 978-90-481-3331-4

  • eBook Packages: Computer Science, Computer Science (R0)
