This guide summarizes the functionality of WebAnno from the user’s perspective.

It is assumed that you plan to test the WebAnno standalone version or an already existing server installation of WebAnno. For information on how to set up WebAnno for a group of users on a server, please refer to the Administrator Guide.

All materials, including this guide, are available via the WebAnno homepage.


Getting started

In order to run WebAnno, you need to have Java installed on your system in version 8 or higher. If you do not have Java installed yet, please install the latest Oracle Java or OpenJDK.

Download the stand-alone JAR from the WebAnno downloads page.

Start the application simply by double-clicking on the downloaded JAR file in your file manager. After a moment, a splash screen will be displayed while the application is initializing. Once the initialization is complete, a dialog will appear which you can use to open the application in your default browser or to shut down the application.

Alternatively, you can start it on the command line using

$ java -jar webanno-4.0.0-SNAPSHOT.standalone.jar

The splash screen and dialog will not appear in this case and you have to manually point your browser at http://localhost:8080.

The first time you start the application, a default user with the name admin and the password admin is created. Use this username and password to log in to the application after opening it in your browser.

Your first annotation

After logging in, you will see the main menu. Click on Projects to go to the project management page and there click on create to start a new project. Enter a name for your project, select the type annotation project and press save.

Next, switch to the Documents tab and press choose files to select a plain text file from your local hard disk (the file should be in UTF-8 encoding). As part of the import process, WebAnno automatically processes the file to identify sentence and token boundaries.

Use the Home link at the top of the screen to return to the main menu and select Annotation to open your text file in the annotation editor.

To create your first annotation, select Named entity from the Layer dropdown menu on the right side of the screen. Then, use the mouse to select a word in the Annotation area. When you release the mouse button, the annotation is immediately created and you can edit its details in the right sidebar.

Congratulations! You have created your first annotation project.

Where to go from here

To familiarize yourself with the functionalities of WebAnno, try importing some of the example projects from the WebAnno downloads page.

Running WebAnno in the way you just did is a great way to get started and try out its capabilities, but it is not the best way of working with the application. If you like WebAnno, please be sure to check out the Administrator Guide to learn how to set up a production-ready instance.

By default, WebAnno creates and uses an embedded database and stores all its data in a folder called .webanno (dot webanno) within your home folder. While this allows you to get started very quickly in trying out the application, it is not a recommended configuration for serious use. For production use, please configure WebAnno to use a database server. For more information, please refer to the Administrator Guide.

By default the server starts on port 8080 and you can access it via a browser at http://localhost:8080 after you started it. But if you already have a service running on that port, you can add the parameter -Dserver.port=9999 at the end of the command line to start the server on port 9999 (or choose any other port).


The following image shows an exemplary workflow of an annotation project with WebAnno.

progress workflow

First, the projects need to be set up. In more detail, this means that users are to be added, guidelines need to be provided, documents have to be uploaded, tagsets need to be defined and uploaded, etc. The process of setting up and managing a project is described in Projects.

After the setup of a project, the users who were assigned the task of annotation annotate the documents according to the guidelines. The task of annotation is further explained in Annotation. The work of the annotators is managed and controlled by monitoring. Here, the person in charge has to assign the workload. For example, in order to prevent redundant annotation, documents which have already been annotated by several annotators and need not be annotated by another person can be blocked for others. The person in charge is also able to follow the progress of individual annotators. All these tasks are demonstrated in Monitoring in more detail. The person in charge should control not only the quantity but also the quality of annotation by looking closer into the annotations of individual annotators. This can be done by logging in with the credentials of the annotators.

After at least two annotators have finished the annotation of the same document by clicking on Done, the curator can start their work. The curator compares the annotations and corrects them if needed. This task is further explained in Curation.

The document merged by the curator can be exported as soon as the curator has clicked on Done for the document. The extraction of curated documents is also explained in Projects.

Core functionalities

Logging in

Upon opening the application in the browser, the login screen opens. Please enter your credentials to proceed.

When WebAnno is started for the first time, a default user called admin with the password admin is automatically created. Be sure to change the password for this user after logging in (see User Management).
version3 login

Menu bar

At the top of the screen, there is always a menu bar visible which allows a quick navigation within the application. It offers the following items:

  • Home - always takes you back to the main menu.

  • Help - opens the integrated help system in a new browser window.

  • Username - shows the name of the user currently logged in. If the administrator has allowed it, this is a link which allows accessing the current user’s profile, e.g. to change the password.

  • Log out - logs out of the application.

  • Timer - shows the remaining time until the current session times out. When this happens, the browser is automatically redirected to the login page.

Main Menu

After login, you will be presented with the overview screen. This screen can be reached at any time from within the GUI by clicking on the Home link in the upper left corner.

Here, you can navigate to one of the currently seven options:

  • Annotation - The page to perform annotations

  • Curation - Compare and merge annotations from multiple users (only for curators)

  • Correction - Correcting automatic annotation (under development)

  • Automation - Creating automatically annotated data

  • Projects - Set up or change annotation projects (only for administrators and managers)

  • Monitoring - Allows you to see the projects, their progress and change document status (only for managers and curators)

  • User Management - Allows you to manage the rights of users

Please click on the functionality you need. The individual functionalities will be explained in further chapters.


This functionality is only available to annotators and managers. Annotators and managers only see projects in which they hold the respective roles.

The annotation screen allows you to view text documents and to annotate them.

Opening a Document

When navigating to the Annotation page, a dialog opens that allows you to select a project and a document within the project. If you want to open a different project or document later, click on Open to open the dialog again.

open doc

Projects appear as folders, and contain the documents of the project. Double-click on a document to open it for annotation. Document names written in black show that the document has not been opened by the current user, blue font means that it has already been opened, whereas red font indicates that the document has already been marked as done.

Users that are managers can additionally open other users' documents to view their annotations but cannot change them. This is done by selecting the project, user and then document in the dialog described above. The user’s own name is listed at the top and marked (me).

open doc manager


Sentence numbers on the left side of the annotation page show the exact sentence numbers in the document.


The arrow buttons first page, next page, previous page, last page, and go to page allow you to navigate accordingly. The Prev. and Next buttons in the Document frame allow you to go to the previous or next document on your project list. You can also use the following keyboard assignments in order to navigate only using your keyboard.

Table 1. Navigation key bindings
Key Action


jump to first sentence


jump to last sentence


move to the next page, if not in the last page already


move to previous page, if not already in the first page


go to next document in project, if available


go to previous document in project, if available

A click on the Help button displays the Guidelines for the tool and The Annotator’s Guide to NER-Annotation. When you are finished with annotating or curating a document, please click on the Done button so that the document can be processed further. If the symbol above the Done button is a cross, the document has already been finished. If the symbol is a tick, the document is still open.


Annotation of spans works by selecting the span, or double-clicking on a word. This activates the Actions-box on the right, where you can choose a layer. You can also type in the initial letters and choose the needed layer. After having chosen a layer, the drop-down menu inside the Features-box displays the features you can use during the annotation. The tag can be selected out of the drop-down menu inside the Features-box which contains the tags of the chosen layer.

annotation edit version3

To change or delete an annotation, double-click on the annotation (span or link annotations). The Actions-box is now activated. Changes and Deletions are possible via the respective buttons.

Link annotations (between POS tags) are created by selecting the starting POS-tag, then dragging the arrow to connect it to its target POS tag. All possible targets are highlighted.

annotation pos span

Creating annotations

The Layer box in the right sidebar shows the presently active span layer. To create a span annotation, select a span of text or double-click on a word.

If a relation layer is defined on top of a span layer, clicking on a corresponding span annotation and dragging the mouse creates a relation annotation.

Once an annotation has been created or if an annotation is selected, the Annotation box shows the features of the annotation.

The result of changing the active layer in the Layer box while an annotation is selected depends on the Remember layer setting. If this setting is disabled, changing the active layer causes the currently selected annotation to be deleted and replaced with an annotation of the selected layer. In this mode, it is necessary to unselect the current annotation by pressing the Clear button before an annotation on another layer can be created. If Remember layer is enabled, changing the active layer has no effect on the currently selected annotation.

The definition of layers is covered in Section Layers.


To create an annotation over a span of text, click with the mouse on the text and drag the mouse to create a selection. When you release the mouse, the selected span is activated and highlighted in orange. The annotation detail editor is updated to display the text you have currently selected and to offer a choice on which layer the annotation is to be created. As soon as a layer has been selected, it is automatically assigned to the selected span. To delete an annotation, select a span and click on Delete. To deactivate a selected span, click on Clear.

Depending on the layer behavior configuration, span annotations can have any length, can overlap, can stack, can nest, and can cross sentence boundaries.


For example, for NE annotation, select the options as shown below (red check mark):


NE annotation can be chosen from a tagset and can span over several tokens within one sentence. Nested NE annotations are also possible (in the example below: "Frankfurter" in "Frankfurter FC").

annotation ner

Lemma annotation, as shown below, is freely selectable over a single token.

annotation lemma

POS can be chosen over one token out of a tagset.

annotation pos
Zero-width spans

To create a zero-length annotation, hold SHIFT and click on the position where you wish to create the annotation. To avoid accidental creations of zero-length annotations, a simple single-click triggers no action by default. The lock to token behavior cancels the ability to create zero-length annotations.

A zero-width span between two tokens that are directly adjacent, e.g. the full stop at the end of a sentence and the token before it (end.) is always considered to be at the end of the first token rather than at the beginning of the next token. So an annotation between d and . in this example would be rendered at the right side of end rather than at the left side of ..
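The placement rule for zero-width spans can be illustrated with a small sketch. It is not WebAnno's rendering code; the function name `anchor_zero_width` and the representation of tokens as `(begin, end)` character offsets are assumptions made for illustration.

```python
def anchor_zero_width(position, tokens):
    """Decide where a zero-width span at `position` is rendered.

    tokens -- list of (begin, end) character offsets, sorted by begin
    Returns (token_index, side), or None if the position does not touch
    any token boundary.
    """
    # Token ends are checked first: a position shared by two adjacent
    # tokens belongs to the end of the first token.
    for index, (begin, end) in enumerate(tokens):
        if position == end:
            return index, "right"
    for index, (begin, end) in enumerate(tokens):
        if position == begin:
            return index, "left"
    return None


# The text "end." consists of the tokens "end" (0-3) and "." (3-4).
tokens = [(0, 3), (3, 4)]
between = anchor_zero_width(3, tokens)  # between "d" and "."
```

In this example, `between` refers to the right side of the token `end` (index 0), not the left side of the full stop.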

Co-reference annotation can be made over several tokens within one sentence. A single token sequence can have several co-ref spans simultaneously.

Forward annotation mode

The forward annotation mode is useful for annotation tasks where every single token should receive an annotation - typical examples are part-of-speech or lemma annotation. When this mode is enabled, completing an annotation automatically creates another annotation on the next token.

The forward annotation mode is available for layers fulfilling the following conditions:

  • the layer is a span layer

  • the layer anchors to single tokens

  • there is exactly one enabled and visible String feature

  • if the feature uses a tagset, this tagset must be non-empty

If the feature is associated with a tagset, you will notice that the cursor does not jump to the feature editor when an annotation is created. This is intentional. Assume your tagset includes the tags ADJ, ADV and NOUN. When you press the A key once, the feature editor loads the first value starting with that letter, i.e. ADJ. Press A again to move to the second tag ADV. Pressing a key multiple times cycles through all the tags starting with the respective letter. Thus, pressing A a third time loads ADJ again. Pressing N loads NOUN.
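The key cycling described above can be sketched as follows. This is only an illustration of the behavior, not WebAnno's implementation; the function name `next_tag` and the case-insensitive matching of the key are assumptions.

```python
def next_tag(tagset, key, current=None):
    """Return the tag that is loaded after pressing `key` once.

    tagset  -- tags in the order in which they appear in the tagset
    key     -- the letter that was pressed
    current -- the tag currently shown in the feature editor, if any
    """
    candidates = [tag for tag in tagset if tag.upper().startswith(key.upper())]
    if not candidates:
        return current  # no tag starts with this letter: nothing changes
    if current in candidates:
        # Pressing the same key again moves on to the next matching tag
        # and wraps around after the last one.
        return candidates[(candidates.index(current) + 1) % len(candidates)]
    return candidates[0]


tags = ["ADJ", "ADV", "NOUN"]
first = next_tag(tags, "A")           # ADJ
second = next_tag(tags, "A", first)   # ADV
third = next_tag(tags, "A", second)   # ADJ again
noun = next_tag(tags, "N", third)     # NOUN
```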

If the feature is associated with a tagset, pressing the BACKSPACE key deletes the current annotation and moves on. Otherwise the current annotation is removed if the feature editor is empty (no value entered) when completing the annotation.

Press ENTER to complete the annotation and to move on to the next token. In the general case, a new annotation is created on the next token and it is loaded into the feature editor. However, if it is not permitted to stack annotations or make them overlap (i.e. if the behaviour setting for Overlap is set to anything other than Any) and if there is already an annotation on that token, this would naturally fail. In this case, no new annotation is created and instead the existing annotation is opened for editing.

key binding POS

If the Remember layer setting is turned on, it is possible to select and edit an annotation which is not on the layer for which the forward-mode is enabled. In this case, the forward-mode is paused and the user can edit the annotation normally. The forward-mode resumes when the user creates a new annotation or selects an annotation on the forward-enabled layer.


In order to create relation annotation, a corresponding relation layer needs to be defined between the source and target spans. If this is the case, there are two ways of creating a relation:

  • for short-distance relations, you can conveniently create a relation by left-clicking on a span and, while keeping the mouse button pressed, moving the cursor over to the target span. A rubber-band arc is shown during this drag-and-drop operation to indicate the location of the relation. To abort the creation of an annotation, hold the CTRL key when you release the mouse button.

  • for long-distance relations, first select the source span annotation. Then locate the target annotation. You can scroll around or even switch to another page of the same document - just make sure that your source span stays selected in the annotation detail editor panel on the right. Once you have located the target span, right-click on it and select Link to…​. Mind that long-ranging relations may not be visible as arcs unless both the source and target spans are simultaneously visible (i.e. on the same "page" of the document). So you may have to increase the number of visible rows in the settings dialog to make them visible.

Navigating along relations

When a relation annotation is selected, the annotation detail panel includes two fields From and To which indicate the origin and target annotations of the relation. These fields include a small cross-hair icon which can be used to jump to the respective annotations.

When a span annotation is selected, any incoming or outgoing relations are also shown in the annotation detail panel. Here, the cross-hair icon can be used to jump to the other endpoint of the relation (i.e. to the other span annotation). There is also an icon indicating whether the relation is incoming to the selected span annotation or whether it is outgoing from the current span. Clicking on this icon will select the relation annotation itself.

Depending on the layer behavior configuration, relation annotations can stack, can cross each other, and can cross sentence boundaries.

Self-looping relations

To create a relation from a span to itself, press the SHIFT key before starting to drag the mouse and hold it until you release the mouse button. Or alternatively select the span and then right-click on it and select Link to…​.

Currently, there can be at most one relation layer per span layer. Relations between spans of different layers are not supported.
Not all arcs displayed in the annotation view belong to chain or relation layers. Some are induced by Link Features.

When moving the mouse over an annotation with outgoing relations, the info pop-up includes the yield of the relations. This is the text transitively covered by the outgoing relations. This is useful e.g. in order to see all text governed by the head of a particular dependency relation. The text may be abbreviated.

annotation relation yield
Figure 1. Example of the yield of a dependency relation
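To make the notion of a yield more concrete, the following sketch collects the text that is transitively reachable through outgoing relations. It is an illustration under assumptions, not WebAnno's code: spans are represented as `(begin, end)` offsets, the function name `relation_yield` is hypothetical, and the selected span itself is left out of the yield.

```python
def relation_yield(start, outgoing, text):
    """Return the text transitively covered by relations leaving `start`.

    start    -- (begin, end) offsets of the selected span
    outgoing -- dict mapping a source span to the list of its target spans
    text     -- the document text
    """
    covered = set()
    pending = list(outgoing.get(start, []))
    while pending:
        span = pending.pop()
        if span in covered or span == start:
            continue  # already visited; this also guards against cycles
        covered.add(span)
        pending.extend(outgoing.get(span, []))
    # Join the covered spans in document order.
    return " ".join(text[begin:end] for begin, end in sorted(covered))


text = "The quick fox jumps"
the, quick, fox, jumps = (0, 3), (4, 9), (10, 13), (14, 19)
# Hypothetical dependency relations: jumps -> fox, fox -> The, fox -> quick
outgoing = {jumps: [fox], fox: [the, quick]}

yield_of_jumps = relation_yield(jumps, outgoing, text)
yield_of_fox = relation_yield(fox, outgoing, text)
```

Here the yield of `jumps` is `The quick fox`, while the yield of `fox` is only `The quick`.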


A chain layer combines span and relation annotations in a single structural layer. Creating a span annotation in a chain layer basically creates a chain of length one. Creating a relation between two chain elements has different effects depending on whether the linked list behavior is enabled for the chain layer or not. To enable or disable the linked list behaviour, go to Layers in the Projects Settings mode. After choosing Coreference, linked list behaviour is displayed as a checkbox and can either be marked or unmarked.

LinkedList 1
Figure 2. Configuration of a chain layer in the project settings
annotation span many
Figure 3. Example of chain annotations

To abort the creation of an annotation, hold CTRL when you release the mouse button.

Table 2. Chain behavior
Linked List Condition Result

disabled

the two spans are already in the same chain

nothing happens

disabled

the two spans are in different chains

the two chains are merged

enabled

the two spans are already in the same chain

the chain will be re-linked such that a chain link points from the source to the target span, potentially creating new chains in the process.

enabled

the two spans are in different chains

the chains will be re-linked such that a chain link points from the source to the target span, merging the two chains and potentially creating new chains from the remaining prefix and suffix of the original chains.
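The simpler of the two behaviors, in which connecting two spans either does nothing or merges their chains, can be sketched as follows. This is a simplified model and not WebAnno's data structure: chains are represented as plain sets of span names, and the function name `connect` is hypothetical. The re-linking behavior of linked list mode is not covered here.

```python
def connect(chains, source, target):
    """Connect two spans when the linked list behavior is disabled.

    chains -- list of sets; each set holds the spans of one chain
    Returns the resulting list of chains.
    """
    source_chain = next(chain for chain in chains if source in chain)
    target_chain = next(chain for chain in chains if target in chain)
    if source_chain is target_chain:
        return chains  # same chain already: nothing happens
    # Different chains: replace both with their union.
    rest = [c for c in chains if c is not source_chain and c is not target_chain]
    return rest + [source_chain | target_chain]


chains = [{"John", "he"}, {"Mary"}, {"she", "her"}]
unchanged = connect(chains, "John", "he")
merged = connect(chains, "Mary", "she")
```

After the second call, `Mary`, `she` and `her` form a single chain and the chain containing `John` and `he` is untouched.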

Primitive Features

Supported primitive feature types are string, boolean, integer, and float. Boolean features are displayed as a checkbox that can either be marked or unmarked. Integer and float features are displayed using a number field. However, if an integer feature is limited and the difference between the maximum and minimum is lower than 12, it can also be displayed with a radio button group instead. String features are displayed using a text field or a text area with multiple rows. If multiple rows are enabled, the text area can either be dynamically sized or a size for collapsing and expanding can be configured. The multiple rows, non-dynamic text area can be expanded if focused and collapses again if focus is lost. If the string feature has a tagset, a combobox is used instead.
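The rules above can be summarized in a small decision function. This is only a sketch of the documented rules; the function name `editor_options`, its parameters and the returned strings are hypothetical and do not correspond to WebAnno configuration keys.

```python
def editor_options(feature_type, minimum=None, maximum=None,
                   has_tagset=False, multiple_rows=False):
    """Return the editors that can be used for a primitive feature."""
    if feature_type == "boolean":
        return ["checkbox"]
    if feature_type == "integer":
        options = ["number field"]
        limited = minimum is not None and maximum is not None
        # A radio button group is only offered for small, limited ranges.
        if limited and maximum - minimum < 12:
            options.append("radio button group")
        return options
    if feature_type == "float":
        return ["number field"]
    if feature_type == "string":
        if has_tagset:
            return ["combobox"]
        return ["text area"] if multiple_rows else ["text field"]
    raise ValueError("unsupported primitive feature type: " + feature_type)


rating = editor_options("integer", minimum=1, maximum=5)
wide_range = editor_options("integer", minimum=0, maximum=12)
```

An integer feature limited to the range 1 to 5 can use a radio button group, whereas a range from 0 to 12 cannot because the difference is not lower than 12.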

Link features can be used to link one annotation to others. Before a link can be made, a slot must be added. If role labels are enabled, enter the role label in the text field and press the add button to create the slot. Next, click on the field in the newly created slot to arm it. The field’s color will change to indicate that it is armed. Now you can fill the slot by double-clicking on a span annotation. To remove a slot, arm it and then press the del button.

Navigating along links

Once a slot has been filled, there is a cross-hair icon in the slot field header which can be used to navigate to the slot filler.

When a span annotation is selected which acts as a slot filler in any link feature, then the annotation owning the slot is shown in the annotation detail panel. Here, the cross-hair icon can be used to jump to the slot owner.

Role labels

If role labels are enabled, they can be changed by the user at any time. To change a previously selected role label, no prior deletion is needed. Just double-click on the instance you want to change (it will be highlighted in orange) and choose another role label.

If role labels are disabled for the link feature layer they cannot be manually set by the user. Instead the UI label of the linked annotation is displayed.


Once the document is opened, a default of 5 sentences is loaded on the annotation page. The Settings button will allow you to specify the settings of the annotation layer.

annotation settings

The Editor setting can be used to switch between different modes of presentation. It is currently only available on the annotation page.

The Sidebar size controls the width of the sidebar containing the annotation detail editor and actions box. In particular on small screens, increasing this can be useful. The sidebar can be configured to take between 10% and 50% of the screen.

The Font zoom setting controls the font size in the annotation area. This setting may not apply to all editors.

The Page size controls how many sentences are visible in the annotation area. The more sentences are visible, the slower the user interface will react. This setting may not apply to all editors.

The Remember layer checkbox controls whether the annotation layer selected in the Actions box works as the main layer during the annotation process. If it is enabled, only instances of this layer are created, even if an annotation in another layer is selected. If necessary, it is still possible to change active instances. However, as soon as a new instance is selected, the main layer is automatically activated again.

The Auto-scroll setting controls if the annotation view is centered on the sentence in which the last annotation was made. This can be useful to avoid manual navigation. This setting may not apply to all editors.

The Collapse arcs setting controls whether long ranging relations can be collapsed to save space on screen. This setting may not apply to all editors.

The Read-only palette controls the coloring of annotations on read-only layers. This setting overrides any per-layer preferences.

Layer preferences

In this section you can select which annotation layers are displayed during annotation and how they are displayed.

Hiding layers is useful to reduce clutter if there are many annotation layers. Mind that hiding a layer which has relations attached to it will also hide the respective relations. E.g. if you disable POS, then no dependency relations will be visible anymore.

The Palette setting for each layer controls how the layer is colored. There are the following options:

  • static / static pastelle - all annotations receive the same color

  • dynamic / dynamic pastelle - all annotations with the same label receive the same color. Note that this does not imply that annotations with different labels receive different colors.

  • static grey - all annotations are grey.

Mind that there is a limited number of colors such that eventually colors will be reused. Annotations on chain layers always receive one color per chain.
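The effect of a limited palette on dynamic coloring can be illustrated with a short sketch. How WebAnno actually picks a color for a label is not described here, so the assignment in order of first appearance, the three-color palette and the function name `dynamic_colors` are all assumptions.

```python
# Hypothetical palette with only three colors to make the reuse visible.
PALETTE = ["#8dd3c7", "#ffffb3", "#bebada"]


def dynamic_colors(labels, palette=PALETTE):
    """Assign one color per distinct label, reusing colors when they run out."""
    assigned = {}
    for label in labels:
        if label not in assigned:
            # Wrap around once every palette color has been handed out.
            assigned[label] = palette[len(assigned) % len(palette)]
    return assigned


colors = dynamic_colors(["PER", "LOC", "PER", "ORG", "MISC"])
```

Both `PER` annotations share one color, but with four distinct labels and three colors, `MISC` ends up with the same color as `PER`. This is why the same color does not imply the same label.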


Annotations are always immediately persistent in the backend database. Thus, it is not necessary to save the annotations explicitly. Also, losing the connection through network issues or timeouts does not cause data loss. To obtain a local copy of the current document, click on the export button. The following frame will appear:

annotation export

Choose your preferred format. Please note that the plain text format does not contain any annotations and that the files in the binary format need to be unpacked before further usage. For further information on the supported formats, please consult the corresponding chapter Formats.

The document will be saved to your local disk, and can be re-imported via adding the document to a project by a project manager. Please export your data periodically, at least when finishing a document or not continuing annotations for an extended period of time.


This functionality is only available to curators.

When navigating to the Curation Page, the procedure for opening projects and documents is the same as in Annotation. The navigation within the document is also equivalent to Annotation.

Table 3. Explanation of the project colors in the curation open document dialog

No curatable documents


Curatable documents


Table 4. Explanation of the document colors in the curation open document dialog



Annotation in progress


Curation in progress


Curation finished


In the left frame of the window, named Sentences, an overview of the chosen document is displayed. Sentences are represented by their number inside the document. Sentences containing a disagreement between annotators are colored in red. Click on a sentence in order to select it and to edit it in the central part of the page.

curation 1

The center part of the annotation page is divided into the Annotation pane which is a full-scale annotation editor and contains the final data from the curation step. Below it are multiple read-only panes containing the annotations from individual annotators. Clicking on an annotation in any of the annotator’s panes transfers the respective annotation to the Annotation pane.

When a document is opened for the first time in the curation page, the application analyzes agreements and disagreements between annotators. All annotations on which all annotators agree are automatically copied to the Annotation pane. Any annotations on which the annotators disagree are skipped.

The annotator’s panes are color-coded according to their relation with the contents of the Annotation pane and according to the agreement status. If the annotations are the same, they are marked grey in the lower panels. If the annotators do not agree, the respective annotations are shown in dark blue in the lower panels. By default, they are not taken into the merged file (cf. Merging strategies).

Left-click on an annotation in one of the lower panels to merge it. This action copies the annotation to the upper panel. The merged annotation will turn green in the lower panel from which it was selected. If other annotators had a conflicting opinion, these will turn red in the lower panels of the respective annotators.

Right-click on an annotation in the lower panels to bring up a menu with additional options.

  • Merge all XXX: merge all annotations of the given type from the selected annotator. Note that this overrides any annotations of the type which may previously have been merged or manually created in the upper panel.

The upper Annotation pane is not color-coded. It uses whatever coloring strategy is configured in the Settings dialog.
Table 5. Explanation of the annotation colors in the annotator’s panes (lower panes)

grey

all annotators agree

dark blue

disagreement requiring curation; annotators disagree and there is no corresponding annotation in the upper Annotation pane yet

green

accepted; matches the corresponding annotation in the upper Annotation pane

red

rejected; different to the corresponding annotation in the upper Annotation pane
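The four states listed above can be expressed as a small decision function. This is a sketch of the described color-coding only, not WebAnno's implementation; the function name `pane_state`, the use of plain label strings and the order of the checks are assumptions.

```python
def pane_state(own_label, all_labels, curated_label):
    """Classify one annotation shown in an annotator's (lower) pane.

    own_label     -- label assigned by the annotator of this pane
    all_labels    -- labels of all annotators at this position
                     (None for an annotator without an annotation)
    curated_label -- label in the upper Annotation pane, or None
    """
    if all(label == own_label for label in all_labels):
        return "agree"           # shown in grey
    if curated_label is None:
        return "disagreement"    # shown in dark blue, curation required
    if own_label == curated_label:
        return "accepted"        # shown in green
    return "rejected"            # shown in red


before_merge = pane_state("NN", ["NN", "VB"], None)
after_merge_winner = pane_state("NN", ["NN", "VB"], "NN")
after_merge_loser = pane_state("VB", ["NN", "VB"], "NN")
```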

Merging strategies

By default, the merging strategy only considers annotations if all annotators made the same annotation at the same location (i.e. complete and agreeing annotations). Any annotation not provided by all annotators is treated as a disagreement between the annotators.

However, there are situations where it is desirable to merge annotations from all annotators, even if some did not provide it. For example, if your project has two annotators, one working on POS tagging and another working on lemmatization, then as a curator, you might simply want to merge the annotations from the two. This can be done by using the Re-merge action and activating the checkbox Merge incomplete annotations. This will re-merge the current document (i.e. discard the entire state of curation and merge from scratch).
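The difference between the default strategy and merging incomplete annotations can be sketched as follows. This is a simplified model, not WebAnno's merge code: annotations are keyed by a hypothetical `(layer, begin, end)` position, the function name `merge` is made up, and it is an assumption that an incomplete annotation is merged when all annotators who provided it agree.

```python
def merge(annotations, merge_incomplete=False):
    """Merge the annotations of several annotators.

    annotations -- dict: annotator name -> dict: position -> label
    Returns a dict: position -> merged label.
    """
    positions = set()
    for per_annotator in annotations.values():
        positions.update(per_annotator)

    merged = {}
    for position in sorted(positions):
        labels = [a.get(position) for a in annotations.values()]
        provided = [label for label in labels if label is not None]
        complete = len(provided) == len(labels)   # every annotator has it
        agreeing = len(set(provided)) == 1        # and all labels match
        if agreeing and (complete or merge_incomplete):
            merged[position] = provided[0]
    return merged


annotations = {
    "anna": {("POS", 0, 3): "DET", ("POS", 4, 7): "NN"},
    "ben": {("Lemma", 0, 3): "the", ("Lemma", 4, 7): "dog"},
}
default_result = merge(annotations)
incomplete_result = merge(annotations, merge_incomplete=True)
```

With the default strategy nothing is merged in this example because no annotation was made by both annotators. With `merge_incomplete=True`, all four annotations are taken over.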

Anonymized curation

By default, the curator can see the annotators' names on the curation page. However, in some cases, it may not be desirable for the curator to see the names. In this case, enable the option Anonymous curation in the project detail settings. Users with the curator role will then only see an anonymous label like Anonymized annotator 1 instead of the annotator names. Users who are project managers can still see the annotator names.

The order of the annotators is not randomized - only the names are removed from the UI. Only annotators who have marked their documents as finished are shown. Thus, which annotator receives which number may change depending on documents being marked as finished or put back into progress.


This functionality is only available to curators and managers.

This page allows you to observe the progress and document status of projects you are responsible for. Moreover, you are able to see the time of the last login of every user. After clicking on Monitoring in the main menu, the following page is displayed:


In the right frame, the overall progress of all projects is displayed. In the left frame you can see all projects that you are allowed to curate or manage. By clicking on one of the projects on the left, it may be selected and the following view is opened:


The progress of individual annotators as a percentage of their workload can be viewed, as well as the number of finished documents. The table below these statistics shows the document status of each document for each user via a symbol.

The following table explains the different possible symbols:

Table 6. Document Status
Symbol Meaning

icon new

Annotation has not started yet

icon locked

Document not available to user

icon annotation in progress

Annotation is in progress

icon done

Annotation is complete

icon curation in progress

Curation is in progress

You can also alter the document status of annotators. By clicking on the symbols, you can switch between Done and In Progress. You can also switch between the New and Locked status. The second column of the document status frame displays the status of the curation.

As there is only one curator for one document, curation is not divided into individual curators.
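
The toggling behavior described above can be pictured as a small lookup table. The following is only a sketch of that description, not WebAnno code; the enum and function names are invented for illustration.

```python
from enum import Enum


class DocumentStatus(Enum):
    NEW = "New"
    LOCKED = "Locked"
    IN_PROGRESS = "In Progress"
    DONE = "Done"


# Clicking a symbol switches a status to its partner:
# Done <-> In Progress and New <-> Locked.
TOGGLE = {
    DocumentStatus.DONE: DocumentStatus.IN_PROGRESS,
    DocumentStatus.IN_PROGRESS: DocumentStatus.DONE,
    DocumentStatus.NEW: DocumentStatus.LOCKED,
    DocumentStatus.LOCKED: DocumentStatus.NEW,
}


def click(status):
    """Return the status a document has after its symbol is clicked once."""
    return TOGGLE[status]


def finished_count(statuses):
    """Count the documents an annotator has marked as done."""
    return sum(1 for status in statuses if status is DocumentStatus.DONE)


statuses = [DocumentStatus.DONE, DocumentStatus.NEW, DocumentStatus.DONE]
print(finished_count(statuses))
print(click(DocumentStatus.NEW))
```

Note that in this model a document cannot move directly from New to Done via the Monitoring page; clicking twice always returns to the starting status.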


This functionality is only available to curators and managers.

This page allows you to calculate inter-annotator agreement between users. Agreement can be inspected on a per-feature basis and is calculated pair-wise between all annotators across all documents.

The Feature dropdown allows the selection of layers and features for which an agreement shall be computed.

agreement feature

A measure for the inter-annotator agreement can be selected by opening the Measure dropdown menu. A short description of available measures and their differences follows in the Measures section.

Selecting both Feature and Measure and clicking Calculate.. will start the agreement calculation and the results will be shown in a Pairwise agreement matrix:

agreement table

Below this matrix, the results of Coding measures can be exported. An export format can be chosen. Currently, only CSV format is possible.


Several agreement measures are supported.

Table 7. Supported agreement measures
Measure Type Short description

Cohen’s kappa


Chance-corrected inter-annotator agreement for two annotators. The measure assumes a different probability distribution for each rater. Incomplete annotations are always excluded.

Fleiss' kappa


Generalization of Scott’s pi-measure for calculating a chance-corrected inter-rater agreement for multiple raters, which is known as Fleiss' kappa and Carletta’s K. The measure assumes the same probability distribution for all raters. Incomplete annotations are always excluded.

Krippendorff’s alpha (nominal)


Chance-corrected inter-rater agreement for multiple raters for nominal categories (i.e. categories are either equal (distance 0) or unequal (distance 1)). The basic idea is to divide the estimated variance within the items by the estimated total variance.

Krippendorff’s alpha (unitizing)


Chance-corrected inter-rater agreement for unitizing studies with multiple raters. As a model for expected disagreement, all possible unitizations for the given continuum and raters are considered. Note that units coded with the same categories by a single annotator may not overlap with each other.
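
To make the idea of chance correction concrete, here is a minimal sketch of Cohen's kappa for two annotators who have both labeled every position. It is a textbook formulation written for illustration; it is not the implementation WebAnno uses, and the function name is an assumption.

```python
import math
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences.

    Each annotator has an individual label distribution, which is what
    distinguishes this measure from Fleiss' kappa.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty positions")
    n = len(labels_a)
    # Observed agreement: share of positions with identical labels.
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    # Expected agreement: chance of picking the same label independently.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return math.nan  # undefined when both annotators use a single label
    return (observed - expected) / (1.0 - expected)


# Annotators agree on 3 of 4 tokens; expected chance agreement is 0.5.
print(cohen_kappa(["N", "N", "V", "V"], ["N", "V", "V", "V"]))  # 0.5
```

In the example, the raw agreement of 0.75 shrinks to 0.5 once the agreement expected by chance is removed.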

Coding vs. Unitizing

Coding measures are based on positions, i.e. two annotations are either at the same position or not. If they are, they can be compared; otherwise they cannot. This makes coding measures unsuitable in cases where partial overlap of annotations needs to be considered, e.g. in the case of named entity annotations where it is common that annotators do not agree on the boundaries of the entity. In order to calculate the positions, all documents are scanned for annotations and annotations located at the same positions are collected in configuration sets. To determine if two annotations are at the same position, different approaches are used depending on the layer type. For a span layer, the begin and end offsets are used. For a relation layer, the begin and end offsets of the source and target annotation are used. Chains are currently not supported.

Unitizing measures basically work by internally concatenating all documents into a single long virtual document and then considering partial overlaps of annotations from different annotators, i.e. there is no averaging over documents. The partial overlap agreement is calculated based on character positions, not on token positions. So if one annotator annotates the blackboard and another annotator just blackboard, then the partial overlap is comparatively high because blackboard is a longish word. Relation and chain layers are presently not supported by the unitizing measures.
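
The difference between counting characters and counting tokens can be seen in a small sketch. The ratio computed here is a plain intersection-over-union of character ranges, chosen only to illustrate the point; it is not Krippendorff's unitizing alpha, and the function names are assumptions.

```python
def char_overlap(span_a, span_b):
    """Number of characters shared by two (begin, end) spans, end exclusive."""
    begin = max(span_a[0], span_b[0])
    end = min(span_a[1], span_b[1])
    return max(0, end - begin)


def overlap_ratio(span_a, span_b):
    """Shared characters divided by all characters covered by either span."""
    shared = char_overlap(span_a, span_b)
    covered = (span_a[1] - span_a[0]) + (span_b[1] - span_b[0]) - shared
    return shared / covered if covered else 0.0


text = "the blackboard"
annotator_1 = (0, 14)   # "the blackboard"
annotator_2 = (4, 14)   # "blackboard"

print(text[annotator_2[0]:annotator_2[1]])
print(char_overlap(annotator_1, annotator_2))   # 10 shared characters
print(overlap_ratio(annotator_1, annotator_2))  # 10 of 14 characters
```

Counted in tokens, the two annotators share one of two tokens; counted in characters, they share 10 of 14, which is why the overlap is comparatively high for a long word.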

Incomplete annotations

When working with coding measures, there is the concept of incomplete annotations. For a given position, the annotation is incomplete if at least one annotator has not provided a label. In the case of the pairwise comparisons that are used to generate the agreement table, this means that one annotator has produced a label and the other annotator has not. Due to the way that positions are generated, it also means that if one annotator annotates the blackboard and another annotator just blackboard, we are actually dealing with two positions (the blackboard, offsets 0-14 and blackboard, offsets 4-14), and both of them are incompletely annotated. Some measures cannot deal with incomplete annotations because they require that every annotator has produced an annotation. In these cases, the incomplete annotations are excluded from the agreement calculation. The effect is that in the (the) blackboard example, there is actually no data to be compared. If we augment that example with some other word on which the annotators agree, then only this word is considered, meaning that we have perfect agreement despite the annotators not having agreed on (the) blackboard. Thus, one should avoid measures that cannot deal with incomplete annotations, such as Fleiss' kappa and Cohen’s kappa, except for tasks such as part-of-speech tagging where it is known that positions are the same for all annotators and all annotators are required (not just expected) to provide an annotation.

The agreement calculation considers an unset feature (with a null value) to be equivalent to a feature with the value of an empty string. Empty strings are considered valid labels and are not excluded from agreement calculation. Thus, an incomplete annotation is not one where the label is missing, but rather one where the entire annotation is missing.

In general, it is a good idea to use at least a measure that supports incomplete data (i.e. missing labels) or even a unitizing measure which is able to produce partial agreement scores.
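
The following sketch walks through the (the) blackboard example for one pair of annotators. It models positions as (begin, end) offsets, treats a null label like an empty string, and treats a position as incomplete when one annotator has no annotation there at all. The data structures and function names are assumptions made for this illustration, not WebAnno internals.

```python
MISSING = object()  # marks "this annotator made no annotation at this position"


def normalize(label):
    """An unset (null) feature value counts as an empty string."""
    return "" if label is None else label


def configuration_sets(annotator_a, annotator_b):
    """Collect both annotators' labels per position.

    Each argument maps a (begin, end) position to a label.
    """
    sets = []
    for position in sorted(set(annotator_a) | set(annotator_b)):
        label_a = normalize(annotator_a[position]) if position in annotator_a else MISSING
        label_b = normalize(annotator_b[position]) if position in annotator_b else MISSING
        sets.append((position, label_a, label_b))
    return sets


def is_complete(config):
    _, label_a, label_b = config
    return label_a is not MISSING and label_b is not MISSING


def agreement_on_complete(sets):
    """Raw agreement after dropping incomplete positions (None if no data)."""
    complete = [config for config in sets if is_complete(config)]
    if not complete:
        return None
    return sum(1 for _, a, b in complete if a == b) / len(complete)


# Text: "the blackboard is green"
annotator_a = {(0, 14): "OBJ", (18, 23): "COLOR"}   # "the blackboard", "green"
annotator_b = {(4, 14): "OBJ", (18, 23): "COLOR"}   # "blackboard", "green"

sets = configuration_sets(annotator_a, annotator_b)
print(len(sets), sum(is_complete(c) for c in sets))  # 3 positions, 1 complete
print(agreement_on_complete(sets))                   # 1.0
```

Only the position for green survives the exclusion, so the pair appears to agree perfectly even though the annotators disagreed on the boundaries of the other annotation.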

Table 8. Possible combinations for agreement
Feature value annotator 1 Feature value annotator 2 Agreement Complete









no annotation

















no annotation



Stacked annotations

Multiple interpretations in the form of stacked annotations are not supported in the agreement calculation! This also includes relations for which source or target spans are stacked.

Pairwise agreement matrix

The lower part of the agreement matrix displays how many configuration sets were used to calculate agreement and how many were found in total. The upper part of the agreement matrix displays the pairwise agreement scores.

Annotations for a given position are considered complete when both annotators have made an annotation. Unless the agreement measure supports null values (i.e. missing annotations), incomplete annotations are implicitly excluded from the agreement calculation. If the agreement measure does support incomplete annotations, then excluding them or not is the user's choice.


This functionality is only available to managers of existing projects, project creators (users with the ability to create new projects), and administrators. Project managers only see projects in which they hold the respective roles. Project creators only see projects in which they hold the project manager role.

This is the place to specify/edit annotation projects. You can either select one of the existing projects for editing, or click Create Project to add a project.

Click on Create Project to create a new project.


After doing so, a new pane is displayed, where you can name and describe your new project. It is also important to choose the kind of project you want to create. You have the choice between annotation, automation, and correction. Please do not forget to save.


After saving the details of the new project, it can be treated like any other already existing one. Also, a new pane with many options to organize the project is displayed.


To delete a project, click on it in the frame Details. The project details are displayed. Now, click on Delete.

The pane with the options to organize and edit a project, as described above, can also be reached by clicking on the desired project in the left frame.


By clicking on the tabs, you can now set up the chosen project.


Here, you can import project archives such as the example projects provided on our website or projects exported from the Export tab.

When a user with the role project creator imports a project, that user automatically becomes a manager of the imported project. However, no permissions for the project are imported!

If the current instance has users with the same name as those who originally worked on the import project, the manager can add these users to the project and they can access their annotations. Otherwise, only the imported source documents are accessible.

When a user with the role administrator imports a project, the user can choose whether to import the permissions and whether to automatically create users who have permissions on the imported project but so far do not exist. If the option to create missing users is disabled, but the option to import permissions is enabled, then projects still maintain their association to users by name. If the respective user accounts are created manually after the import, the users will start showing up in the projects.

Automatically added users are disabled and have no password. They must be explicitly enabled and a password must be set before the users can log in.


After clicking on the Users tab, a new pane is displayed in which you can add new users by clicking on the Add users text field. A dropdown list shows the enabled users in the system that can be added to the project. Users who are already part of the project are not offered. As you type, the dropdown list is filtered to match your input. By clicking on a username or by pressing Enter, you can select the corresponding user. You can keep typing to add more users to the project. When you press the Add button, the selected users are added to your project.

For privacy reasons, the administrator may choose to restrict the users shown in the dropdown. If this is the case, you have to enter the full name of a user before it appears in the dropdown and can be added.

By default, the users are added to the project as annotators. If you want to assign additional roles, click on the user and then select the appropriate permissions in the Permissions pane.


After ticking the desired permissions, click on Save. To remove a user, remove all the permissions and then click on Save.


To add or delete documents, you have to click on the tab Documents in the project pane. Two frames will be displayed. In the first frame you can import new documents.


Open the upload dialog by clicking on Choose Files. If your browser supports uploading multiple documents, you can usually either select a range of files or multiple files individually. To select a range, click on the first document, press SHIFT and then click on the last document in the desired range. To (de)select individual files, press CTRL and then click the file you want to (de)select.

Select the Format corresponding to the documents you upload. If you are uploading multiple documents at once, they must all have the same format.

Then click on Import Document. The imported documents can be seen in the list below. To delete a document from the project, click on it and then click on Delete in the lower right corner.

Uploading large numbers of documents

While it is possible to upload multiple documents at once, there are limits to how many documents can be uploaded in a single upload operation. For a start, it can take quite some time to upload thousands of documents. Also, the server configuration limits the individual file size and total batch size (the default limit is 100MB for both). Finally, browsers differ in their capability of dealing with large numbers of documents in an upload. In a test with 5000 documents of ca. 2.5kb each using Chrome, Safari and Firefox, only Chrome (80.0.3987.122) completed the operation successfully. Safari (13.0.5) was only able to upload about 3400 documents. Firefox (73.0.1) froze during the upload and was unable to deliver anything to the server. With a lower number of documents (e.g. 500), none of the browsers had any problems.
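
If you need to upload a very large collection, one practical approach is to split it into several smaller upload operations. The sketch below groups files into batches that stay under a file count and a total size. The default of 500 files per batch follows the test described above and the 100MB limit is interpreted as 100 * 1024 * 1024 bytes; both values, as well as the function name, are assumptions that you should adapt to your server configuration.

```python
def split_into_batches(files, max_files=500, max_bytes=100 * 1024 * 1024):
    """Group (name, size_in_bytes) pairs into upload batches.

    A new batch is started whenever adding the next file would exceed
    either the file count or the total size of the current batch.
    """
    batches = []
    current = []
    current_bytes = 0
    for name, size in files:
        if size > max_bytes:
            raise ValueError(f"{name} exceeds the individual file size limit")
        if current and (len(current) >= max_files or current_bytes + size > max_bytes):
            batches.append(current)
            current = []
            current_bytes = 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# 5000 small documents of about 2.5kb each, as in the test described above.
documents = [(f"doc{i:04d}.txt", 2500) for i in range(5000)]
batches = split_into_batches(documents)
print(len(batches), len(batches[0]))  # 10 batches of 500 documents
```

Each batch can then be selected and imported in a separate upload operation.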


All annotations belong to an annotation layer. Each layer has a structural type that defines if it is a span, a relation, or a chain. It also defines how the annotations behave and what kind of features they carry.

Creating a custom layer

This section provides a short walk-through on the creation of a custom layer. The following sections act as reference documentation providing additional details on each step. In the following example, we will create a custom layer called Sentiment with a feature called Polarity that can be negative, neutral, or positive.

  1. Create the layer Sentiment

    • Go to the Layers tab in your project’s settings and press the Create layer button

    • Enter the name of the layer in Layer name: Sentiment

    • Choose the type of the layer: Span

    • Enable Allow multiple tokens because we want to mark sentiments on spans longer than a single token.

    • Press the Save layer button

  2. Create the feature Polarity

    • Press the New feature button

    • Choose the type of the feature: Primitive: String

    • Enter the name of the feature: Polarity

    • Press Save feature

  3. Create the tagset Polarity values

    • Go to the Tagsets tab and press Create tagset

    • Enter the name of the tagset: Polarity values

    • Press Save tagset

    • Press Create tag, enter the name of the tag: negative, press Save tag

    • Repeat for neutral and positive

  4. Assign the tagset Polarity values to the feature Polarity

    • Back in the Layers tab, select the layer: Sentiment and select the feature: Polarity

    • Set the tagset to Polarity values

    • Press Save feature

Now you have created your first custom layer.

Built-in layers

WebAnno comes with a set of built-in layers that allow you to start annotating immediately. Also, many import/export formats only work with these layers as their semantics are known. For this reason, the ability to customize the behaviors of built-in layers is limited and it is not possible to extend them with custom features.

Table 9. Built-in layers
Layer Type Enforced behaviors



Lock to multiple tokens, no overlap, no sentence boundary crossing



(no enforced behaviors)


Relation over POS,

Any overlap, no sentence boundary crossing



Locked to token offsets, no overlap, no sentence boundary crossing

Named Entity


(no enforced behaviors)

Part of Speech (POS)


Locked to token offsets, no overlap, no sentence boundary crossing

The coloring of the layers signals the following:

Table 10. Color legend
Color Description


built-in annotation layer, enabled


custom annotation layer, enabled


disabled annotation layer

To create a custom layer, select Create Layer in the Layers frame. Then, the following frame will be displayed.

Exporting layers

At times, it is useful to export the configuration of a layer or of all layers, e.g. to copy them to another project. There are two options:

  • JSON (selected layer): exports the currently selected layer as JSON. If the layer depends on other layers, these are included as well in the JSON export.

  • UIMA (all layers): exports a UIMA type system description containing all layers of the project. This includes built-in types (i.e. DKPro Core types) and it may include additional types required to allow loading the type system description file again. However, this type system description is usually not sufficient to interpret XMI files produced by WebAnno. Be sure to load XMI files together with the type system description file which was included in the XMI export.

Both types of files can be imported back into WebAnno. Note that any built-in types that have been included in the files are ignored on import.


Table 11. Properties
Property Description

Layer name

The name of the layer (obligatory)


A description of the layer. This information will be shown in a tooltip when the mouse hovers over the layer name in the annotation detail editor panel.


Whether the layer is enabled or not. Layers can currently not be deleted, but they can be disabled.

When a layer is first created, only ASCII characters are allowed for the layer name because the internal UIMA type name is derived from the initial layer name. After the layer has been created, the name can be changed arbitrarily. The internal UIMA type name will not be updated. The internal UIMA name is e.g. used when exporting data or in constraint rules.

Technical Properties

In the frame Technical Properties, the user may select the type of annotation that will be made with this layer: span, relation, or chain.

Table 12. Technical Properties
Property Description

Internal name

Internal UIMA type name


The type of the layer (obligatory, see below)

Attach to layer (Relations)

Determines which span layer a relation attaches to. Relations can only be created between annotations of this span layer.

The layer type defines the structure of the layer. Three different types are supported: spans, relations, and chains.

Table 13. Layer types
Type Description Example


Span

Continuous segment of text delimited by a start and end character offset. The example shows two spans.

project layer type span

Relation

Binary relation between two spans visualized as an arc between spans. The example shows a relation between two spans.

project layer type relation

Chain

Directed sequence of connected spans in which each span connects to the following one. The example shows a single chain consisting of three connected spans.

project layer type chain

For relation annotations the type of the spans which are to be connected can be chosen in the field Attach to layer. Here, only non-default layers are displayed. To create a relation, first the span annotation needs to be created.

Currently for each span layer there can be at most one relation layer attaching to it.
It is currently not possible to create relations between spans in different layers. For example if you define span layers called Men and Women, it is impossible to define a relation layer Married to between the two. To work around this limitation, create a single span layer Person with a feature Gender instead. You can now set the feature Gender to Man or Woman and eventually define a relation layer Married to attaching to the Person layer.


Table 14. Behaviors
Behavior Description


The layer may be viewed but not edited.


When pre-annotated data is imported or when the behavior settings are changed, it is possible that annotations exist which do not conform to the current behavior settings. This setting controls when a validation of annotations is performed. Possible settings are Never (no validation when a user marks a document as finished) and Always (validation is performed when a user marks a document as finished). Mind that changing the document state via the Monitoring page does not trigger a validation. Also, problematic annotations are highlighted using an error marker in the annotation interface. NOTE: the default setting for new projects/layers is Always, but for any existing projects or for projects imported from versions of WebAnno where this setting did not exist yet, the setting is initialized with Never.

Granularity (span, chain)

The granularity controls at which level annotations can be created. When set to Character-level, annotations can be created anywhere. Zero-width annotations are permitted. When set to Token-level or Sentence-level, annotation boundaries are forced to coincide with token/sentence boundaries. If the selection is smaller, the annotation is expanded to the next larger token/sentence covering the selection. Again, zero-width annotations are permitted. When set to Single tokens only, annotations may be applied only to a single token. If the selection covers multiple tokens, the annotation is reduced to the first covered token. Zero-width annotations are not permitted in this mode. Note that in order for the Sentence-level mode to allow annotating multiple sentences, the Allow crossing sentence boundary setting must be enabled, otherwise only individual sentences can be annotated.


This setting controls if and how annotations may overlap. For span layers, overlap is defined in terms of the span offsets. If any character offset that is part of span A is also part of span B, then they are considered to be overlapping. If two spans have exactly the same offsets, then they are considered to be stacking. For relation layers, overlap is defined in terms of the end points of the relation. If two relations share any end point (source or target), they are considered to be overlapping. If two relations have exactly the same end points, they are considered to be stacking. Note that some export formats are unable to deal with stacked or overlapping annotations. E.g. the CoNLL formats cannot deal with overlapping or stacked named entities.

Allow crossing sentence boundary (chain)

Allow annotations to cross sentence boundaries.

Behave like a linked list

Controls what happens when two chains are connected with each other. If this option is disabled, then the two entire chains will be merged into one large chain. Links between spans will be changed so that each span connects to the closest following span - no arc labels are displayed. If this option is enabled, then the chains will be split if necessary at the source and target points, reconnecting the spans such that exactly the newly created connection is made - arc labels are available.
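
Two of the behaviors above can be expressed compactly in code: the distinction between overlapping and stacking, and the expansion of a selection to token boundaries. The sketch below is written from the descriptions in the table and assumes (begin, end) offsets with an exclusive end; zero-width spans are not treated specially. All names are illustrative and not taken from WebAnno.

```python
def classify_spans(span_a, span_b):
    """Return 'stacking', 'overlapping' or 'disjoint' for two spans."""
    if span_a == span_b:
        return "stacking"  # exactly the same offsets
    if span_a[0] < span_b[1] and span_b[0] < span_a[1]:
        return "overlapping"  # at least one shared character offset
    return "disjoint"


def classify_relations(relation_a, relation_b):
    """Relations are (source, target) pairs of span identifiers."""
    if relation_a == relation_b:
        return "stacking"  # same source and same target
    if set(relation_a) & set(relation_b):
        return "overlapping"  # at least one shared end point
    return "disjoint"


def snap_to_tokens(selection, tokens):
    """Expand a selection to the boundaries of the tokens it touches.

    `tokens` is a list of (begin, end) token spans in text order.
    Returns None if the selection touches no token.
    """
    touched = [t for t in tokens if t[0] < selection[1] and selection[0] < t[1]]
    if not touched:
        return None
    return (touched[0][0], touched[-1][1])


tokens = [(0, 3), (4, 14)]  # "the", "blackboard"
print(classify_spans((0, 14), (4, 14)))           # overlapping
print(classify_spans((4, 14), (4, 14)))           # stacking
print(classify_relations(("s1", "s2"), ("s2", "s3")))  # overlapping
print(snap_to_tokens((5, 9), tokens))             # (4, 14)
```

Selecting only part of blackboard in Token-level mode therefore yields an annotation covering the whole token.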



In this section, features and their properties can be configured.

When a feature is first created, only ASCII characters are allowed for the feature name because the internal UIMA name is derived from the initial feature name. After the feature has been created, the name can be changed arbitrarily. The internal UIMA feature name will not be updated. The internal UIMA name is e.g. used when exporting data or in constraint rules.
Features cannot be added to or deleted from built-in layers.

The following feature types are supported.

Table 15. Feature types
Type Description


Textual feature that can optionally be controlled by a tagset. It is rendered as a text field or as a combobox if a tagset is defined.


Boolean feature that can be true or false and is rendered as a checkbox.


Numeric feature for integer numbers.


Numeric feature for decimal numbers.

uima.tcas.Annotation (Span layers)

Link feature that can point to any arbitrary span annotation

other span layers (Span layers)

Link feature that can point only to the selected span layer.

Table 16. General feature properties
Property Description

Internal name

Internal UIMA feature name


The type of the feature (obligatory, see below)


The name of the feature (obligatory)


A description that is shown when the mouse hovers over the feature name in the annotation detail editor panel.


Features cannot be deleted, but they can be disabled


Whether the feature value is shown in the annotation label. If this is disabled, the feature is only visible in the annotation detail editor panel.


Whether the annotation detail editor should carry values of this feature over when creating a new annotation of the same type. This can be useful when creating many annotations of the same type in a row.

String features

A string feature uses a simple input field to enter text. When multiple rows are enabled, a multi-line text area is used instead. In this case, no tagset can be assigned to the feature.

If a tagset is assigned, then a combo box or auto-complete input field will be used - depending on the number of tags in the tagset. The instance administrator can globally configure the threshold for switching from a combo box to an auto-complete via the annotation.feature-support.string.autoCompleteThreshold in the file.

Table 17. String feature properties
Property Description


The tagset controlling the possible values for a string feature.

Multiple Rows

If enabled the textfield will be replaced by a textarea which expands on focus. This also enables options to set the size of the textarea and disables tagsets.

Dynamic Size

If enabled the textfield will dynamically resize itself based on the content. This disables collapsed and expanded row settings.

Collapsed Rows

Set the number of rows for the textarea when it is collapsed and not focused.

Expanded Rows

Set the number of rows for the textarea when it is expanded and focused.

Number features
Table 18. Number feature properties
Property Description


If enabled a minimum and maximum value can be set for the number feature.


Only visible if Limited is enabled. Determines the minimum value of the limited number feature.


Only visible if Limited is enabled. Determines the maximum value of the limited number feature.

Editor Type

Select which editor should be used for modifying this feature's value.

Boolean features
Table 19. Link feature properties
Property Description


The tagset controlling the possible values for the link roles.

Enable Role Labels

Allows users to add a role label to each slot when linking annotations. If disabled, the UI labels of annotations will be displayed instead of role labels. This property is enabled by default.

Key bindings

Some types of features support key bindings. This means you can assign a combination of keys to a particular feature value. Pressing these keys on the annotation page while an annotation is selected will set the feature to the assigned value. E.g. you could assign the key combo CTRL P to the value PER for the value feature on the Named Entity layer. So when you create a Named Entity annotation and then press CTRL P, the value would be set to PER.

If the focus is on an input field, the key bindings are suppressed. That means you could even assign single key shortcuts like p for PER while still being able to use p when entering text manually into an input field. Normally, the focus would jump directly to the first feature editor after selecting an annotation. But this is not the case if any features have key bindings defined, because it would render the key bindings useless (i.e. you would have to click outside of the feature editor input field so it loses the focus, thus activating the key bindings).

When defining a key binding, you have to enter a key combo consisting of one or more of the following key names:

  • Modifier keys: Ctrl, Shift, Alt, Meta

  • Letter keys: a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z

  • Number keys: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

  • Function keys: F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12

  • Navigation keys: Home, End, Page_up, Page_down, Left, Up, Right, Down

  • Other keys: Escape, Tab, Space, Return, Enter, Backspace, Scroll_lock, Caps_lock, Num_lock, Pause, Insert, Delete

Typically you would combine zero or more modifier keys with a regular key (letter, number, function key, etc). A combination of multiple number or letter keys does not work.

Mind that you need to take care not to define the same key binding multiple times. Duplicate definitions are only sensible if you can ensure that the features on which they are defined will never be visible on screen simultaneously.
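
The rules for key combos can be checked with a small validator. The sketch below assumes that the key names in a combo are separated by spaces (as in CTRL P), that modifiers come before the regular key, and that names are matched without regard to case. These are assumptions made for illustration; the function names are not part of WebAnno.

```python
MODIFIER_KEYS = {"ctrl", "shift", "alt", "meta"}
REGULAR_KEYS = (
    set("abcdefghijklmnopqrstuvwxyz")
    | set("0123456789")
    | {f"f{i}" for i in range(1, 13)}
    | {"home", "end", "page_up", "page_down", "left", "up", "right", "down"}
    | {"escape", "tab", "space", "return", "enter", "backspace",
       "scroll_lock", "caps_lock", "num_lock", "pause", "insert", "delete"}
)


def parse_key_combo(combo):
    """Split a combo into (modifiers, key): zero or more modifiers, one regular key."""
    parts = combo.lower().split()
    if not parts:
        raise ValueError("empty key combo")
    *modifiers, key = parts
    unknown = [m for m in modifiers if m not in MODIFIER_KEYS]
    if unknown:
        raise ValueError(f"not a modifier key: {unknown[0]}")
    if key not in REGULAR_KEYS:
        raise ValueError(f"not a regular key: {key}")
    return frozenset(modifiers), key


def duplicate_bindings(bindings):
    """Return groups of feature values that share the same key combo."""
    by_combo = {}
    for value, combo in bindings.items():
        by_combo.setdefault(parse_key_combo(combo), []).append(value)
    return [values for values in by_combo.values() if len(values) > 1]


print(parse_key_combo("Ctrl P"))
print(duplicate_bindings({"PER": "Ctrl P", "ORG": "Ctrl O", "LOC": "ctrl p"}))
```

A combo such as a b is rejected because it combines two letter keys, and the duplicate check flags PER and LOC because both are bound to the same combo.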
Coloring rules

Coloring rules can be used to control the coloring of annotations. A rule consists of two parts: 1) a regular expression that matches the label of an annotation, 2) a hexadecimal color code.

A simple color rule could use the pattern PER and the color code #0000ff (blue). This would display all annotations with the label PER on the given layer in blue.

In order to assign a specific color to all annotations from the given layer, use the pattern .*.

It is also possible to assign a color to multiple labels at once by exploiting the fact that the pattern is a regular expression. E.g. PER|OTH would match annotations with the label PER as well as with the label OTH. Mind not to add extra space such as PER | OTH - this would not work!

Be careful when creating coloring rules on layers with multiple features. If there are two features with the values a and b, the label will be a | b. In order to match this label in a coloring rule, the pipe symbol (|) must be escaped - otherwise it is interpreted as a regular expression OR operator: a \| b.
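
The pattern examples above can be tried out with any regular expression engine. The sketch below uses Python's re module and assumes that a pattern has to match the whole label and that the first matching rule wins; both are assumptions made for this illustration, and WebAnno's own regular expression dialect may differ in details.

```python
import re


def color_for(label, rules, default=None):
    """Return the color of the first rule whose pattern matches the whole label."""
    for pattern, color in rules:
        if re.fullmatch(pattern, label):
            return color
    return default


rules = [
    ("PER|OTH", "#0000ff"),   # two labels share one color
    (r"a \| b", "#00ff00"),   # escaped pipe: matches the literal label "a | b"
    (".*", "#cccccc"),        # catch-all for every other label
]

print(color_for("PER", rules))    # #0000ff
print(color_for("OTH", rules))    # #0000ff
print(color_for("a | b", rules))  # #00ff00
print(color_for("LOC", rules))    # #cccccc

# With extra spaces, the alternatives become "PER " and " OTH",
# so the plain label "PER" no longer matches.
print(color_for("PER", [("PER | OTH", "#0000ff")]))  # None
```

The last call shows why PER | OTH does not work: the spaces become part of the alternatives.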


To manage the tagsets, click on the tab Tagsets in the project pane.


To edit one of the existing tagsets, select it by a click. Then, the tagset characteristics are displayed.


In the frame Tagset details, you can change them, export a tagset, save the changes you made to it or delete it by clicking on Delete tagset. To change an individual tag, select it in the list displayed in the frame Tags. You can then change its description or name or delete it by clicking Delete tag in Tag details. Please do not forget to save your changes by clicking on Save tag. To add a new tag, click on Create tag in Tag details. Then add the name and the description, which is optional. Again, do not forget to click Save tag or the new tag will not be created.

To create your own tagset, click on Create tagset and fill in the fields that are displayed in the new frame. Only the first field is obligatory. Adding new tags works the same way as described for already existing tagsets. If you want free-text annotation, as could be used for lemma or meta information annotation, do not add any tags.


To export a tagset, choose the format of the export at the bottom of the frame and click Export tagset.



Two modes of exporting projects are supported:

  • Export the whole project for the purpose of creating a backup, of migrating it to a new WebAnno version, of migrating to a different WebAnno instance, or simply in order to re-import it as a duplicate copy.

  • Export curated documents for the purpose of getting an easy access to the final annotation results. If you do not have any curated documents in your project, this export option is not offered. A re-import of these archives is not possible.

A whole project export always serves as an archive which can be re-imported since it includes the annotations in the format internally used by the application. In addition to the internal format, the annotations can be included in a secondary format in the export. This format is controlled by the Format drop-down field. When AUTO is selected, the file format corresponds to the format of the source document. If there is no write support for the source format, the file is exported in the WebAnno TSV3 format instead.

The AUTO format exports annotated files in the format of the originally imported file. If the original file format did not contain any annotations (e.g. plain text files) or only specific types of annotations (e.g. CoNLL files), the secondary annotation files will also have no or only limited annotations.

When exporting a whole project, the structure of the exported ZIP file is as follows:

  • <project ID>.json - project metadata file

  • annotation

    • <source document name>

      • <user ID>.XXX - file representing the annotations for this user in the selected format.

      • CORRECTION_USER.XXX - correction project: original document state; automation project: automatically generated suggestions

  • annotation_ser

    • <source document name>

      • <user ID>.ser - serialized CAS file representing the annotations for this user

      • CORRECTION_USER.ser - correction project: original document state; automation project: automatically generated suggestions

  • curation

    • <source document name>

      • CURATION_USER.XXX - file representing the state of curation in the selected format.

  • curation_ser

    • <source document name>

      • CURATION_USER.ser - serialized UIMA CAS representing the state of curation

  • log

    • <project ID>.log - project log file

  • source - folder containing the original source files

Some browsers automatically extract ZIP files into a folder after the download. Zipping this folder and trying to re-import it into the application will generally not work because the process introduces an additional folder level within the archive. The best option is to disable the automatic extraction in your browser. E.g. in Safari, go to Preferences → General and disable the setting Open "safe" files after downloading.
The files under annotation and curation are provided for convenience only. They are ignored upon import.
The CORRECTION_USER.XXX and CURATION_USER.ser may be located in the curation and curation_ser folders for old exported projects.
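
The folder-level problem described above can be spotted before attempting a re-import. The following Python sketch is only an illustration: the helper names are hypothetical, and the heuristic (a project metadata .json file must sit at the top level of the archive) is an assumption derived from the structure listed above, not a check performed by the application.

```python
import io
import zipfile


def has_top_level_project_json(archive):
    """Return True if a *.json file sits at the top level of the ZIP.

    Heuristic only: a re-zipped folder usually nests everything one
    level deeper, so the project metadata file is no longer top-level.
    """
    with zipfile.ZipFile(archive) as zf:
        return any(
            "/" not in name and name.endswith(".json")
            for name in zf.namelist()
        )


def make_zip(names):
    """Build a small in-memory ZIP with empty files (demonstration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name in names:
            zf.writestr(name, "")
    buffer.seek(0)
    return buffer


# Hypothetical archive layouts: a direct export and a re-zipped folder.
good = make_zip(["42.json", "source/doc.txt", "annotation_ser/doc.txt/admin.ser"])
nested = make_zip(["project/42.json", "project/source/doc.txt"])
```

An archive for which the function returns False most likely contains the additional folder level and should be downloaded again with automatic extraction disabled.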

Currently, it is not possible to choose a specific format for bulk-exporting annotations. However, this mailing list post describes how DKPro Core can be used to transform the UIMA CAS formats into alternative formats.

User Management

This functionality is only available to administrators.

After selecting this functionality, a frame which shows all users is displayed. Selecting a user opens a second frame on the right.

In this frame you may change the user's role or password, specify an e-mail address, and enable or disable the account by setting or clearing the tick.

Disabling an account prevents the user from logging in. The user remains associated with any projects and remains visible on the Monitoring page.

To create a new user, click on Create in the left frame. This displays a frame similar to the one described above, in which you have to enter a login name for the new user.

In both cases, do not forget to save your changes by pressing the Save button.

User roles

  • User. Required to log in to the application. Removing this role from an account prevents login even for users that additionally hold ROLE_ADMIN!

  • Administrator. Can manage users and has access to all other functionalities.

  • Project creator. Can create new projects.

  • Remote API access. Currently experimental and undocumented. Do not use.

Advanced functionalities


Correction

This functionality is only available to annotators and managers.

In this page, already annotated documents may be checked, corrected and enhanced.

Before being able to see and correct documents, make sure to have chosen correction when creating your project in projects. For detailed instructions please refer to Projects. Also make sure that the documents you upload are already annotated.

After clicking on the Correction symbol on the main page, the Correction page is opened. In the frame that appears, the user first has to choose a project.

Afterwards, the documents assigned to the user are displayed and a document can be chosen. Just like in Annotation and Curation, the color of the document names signals the following: black - unopened document, blue - opened document and red - document finished.

After having chosen the document, two frames are displayed.


The upper one, Annotation, is the frame in which annotations can be made by the user. Moreover, it displays the chosen annotations. The lower frame, User: Suggestion, displays the annotation that was previously made in the uploaded document. By clicking on the annotations (not the words), they are chosen as right and are therefore displayed in the Annotation frame. Additional annotations may be made just like in Annotation, by selecting the span or relation to be annotated, choosing the layer and tag. For more detailed instruction or the guidelines for the navigation in the upper frames (Document, Page, Help, Workflow), see the guidelines for Annotation. No changes may be made in the lower frame.

The coloring of the annotation signals the same as in Curation.


Automation

This functionality is only available to managers and administrators.

This functionality gives the possibility to choose features and documents, which can be used for training of all layers that are offered in WebAnno (lemma, NER, POS and co-ref).


After clicking on Create Project on the Projects page, select automation as your project type. The detailed description may be found in Projects.

The documents that are to be annotated have to be uploaded in the frame Documents. Please make sure that the chosen format corresponds to the format of the files you are uploading.


To manage the automation process, choose the Automation frame. The following frame will appear:


First choose your target layer in the Select automation layer frame. If you want to train a non-custom layer, please make sure you created or imported it in the Layer frame (for instructions to do so, see Projects).


Here you may choose the format of the target layer and optionally add some feature layers on which you want to train.

In the tab Target layer you may upload training files containing the target layer in WebAnno Export formats (WebAnno CPH TEI reader, plain text, binary format, XMI format, old WebAnno Format, WebAnno Format, Weblicht TCF Format; for more information on these formats, see Appendix A: Formats).

In the next tab TAB-SEP target, you may upload training files containing the target layer in a tab-separated format, which is structured by writing each single word in a line together with its target tag, separated by a tab. Sentences are separated by blank lines.

The same applies analogously to the feature layers. The Other layers tab lets you upload files in WebAnno Export formats and choose, in the format window, the layers that are to be used in training. The TAB-SEP feature tab lets you upload files in the tab-separated format described above, containing the feature tags in the second column. Every file will be regarded as one separate feature.
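
To make the tab-separated layout concrete, here is a minimal Python sketch of a reader for such files. It is not the application's importer: the function name is hypothetical, the tags in the sample are made up, and the sketch assumes that every non-blank line has at least two tab-separated columns.

```python
def parse_tab_sep(text):
    """Parse tab-separated training data into a list of sentences.

    Each sentence is a list of (word, tag) pairs; blank lines separate
    sentences.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        word, tag = line.split("\t")[:2]
        current.append((word, tag))
    if current:
        sentences.append(current)
    return sentences


# Two short sentences; the tag names are illustrative only.
sample = "The\tDET\ndog\tNOUN\nbarks\tVERB\n\nIt\tPRON\nruns\tVERB\n"
```

Running the reader over a file before uploading it is a quick way to find lines that lack the tab or the tag.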

After choosing the training files, uploading them in the right format and importing them (by clicking on Import), every file will be displayed in the corresponding tab in the frame Documents. Click on the button Start Automation on the left, when you have uploaded your training data. Be prepared to wait for some time, as automation is a non-trivial process.

You can see that the automation has finished either by the fact that the Start Automation button is enabled again, or on the Monitoring page, by choosing the project in Monitoring and looking at the progress shown in the Training results /status frame.


To see the tags that were automatically created during the training described above, go to Home and choose the Automation page. Then select a project and a file, analogously to Annotation. The Automation page is then displayed. The navigation, export and the marking of finished documents are the same as in Annotation.


In the lower part, you see two horizontal frames, the lower one showing the automatically created annotation. By clicking on the tags, they are selected and therefore appear in the upper frame Annotation. You may see that selected tags turn grey in the Automation frame and blue in the Annotation frame. You may also add new tags to the Annotation, just like on the Annotation page.


Constraints

Constraints reorder the choice of tags based on the context of an annotation. For instance, for a given lemma, not all possible part-of-speech tags are sensible. Constraint rules can be set up to reorder the choice of part-of-speech tags such that the relevant tags are listed first. This speeds up the annotation process as the annotator can choose from the relevant tags more conveniently.

The choice of tags is not limited, only the order in which they are presented to the annotator. Thus, if the project manager has forgotten to set up a constraint or possibly did not consider an oddball case, the annotator can still make a decision.
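
The effect can be pictured with a small Python sketch. It is an illustration only: the function name reorder_tags and the sample tagset are hypothetical, and the application's actual ordering logic may differ in detail.

```python
def reorder_tags(tagset, preferred):
    """Move tags matched by constraint rules to the front of the list.

    No tag is removed; the relative order within each group is kept.
    """
    matched = [tag for tag in tagset if tag in preferred]
    others = [tag for tag in tagset if tag not in preferred]
    return matched + others


# Hypothetical tagset; a rule is assumed to have matched VERB and NOUN.
tagset = ["ADJ", "DET", "NOUN", "VERB"]
reordered = reorder_tags(tagset, {"VERB", "NOUN"})
```

Every tag of the tagset is still present in the result, so the annotator can always pick a tag that no rule anticipated.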

Importing constraints

To import a constraints file, go to Project and click on the particular project name. On the left side of the screen, a tab bar opens. Choose Constraints. You can now choose a constraint file by clicking on Choose Files. Then, click on Import. Upon import, the application checks whether the constraints file is well formed. If it conforms to the rules for writing constraints, the constraints are applied.

Implementing constraint sets

A constraint set consists of two components:

  • import statement

  • scopes

Import statements are composed in the following way:

import <fully_qualified_name_of_layer> as <shortName>;

It is necessary to declare short names for all fully qualified names because only short names can be used when writing a constraint rule. Short names cannot contain any dots or special characters, only letters, numbers, and the underscore.

All identifiers used in constraint statements are case sensitive.
If you are not sure what the fully qualified name of a layer is, you can look it up going to Layers in Project settings. Click on a particular layer and you can view the fully qualified name under Technical Properties.

Scopes consist of a scope name and one or more rules that refer to a particular annotation layer and define restrictions for particular conditions. For example, it is possible to reorder the applicable tags for a POS layer, based on what kind of word the annotator is focusing on.

While scope names can be freely chosen, scope rules have a fixed structure. They consist of conditions and restrictions, separated by an arrow symbol (->). Conditions consist of a path and a value, separated by an equal sign (=). Values always have to be enclosed in double quotes. Multiple conditions in the same rule are connected via the &-operator, multiple restrictions in the same rule are connected via the |-operator.

Typically a rule’s syntax is

Single constraint rule
<scopeName> {
  <condition_set> -> <restriction_set>;
}

This leads to the following structure:

Multiple constraint rules
<scopeName> {
  <rule_1>;
  ...
  <rule_n>;
}

Both conditions and restrictions are composed of a path and a value. The latter is always enclosed in double quotes.
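
As a way of checking one's understanding of this structure, the following Python sketch splits a single rule into its conditions and restrictions. It is not the parser used by the application: the function names are hypothetical, comments inside a rule are not handled, and quoted values are assumed not to contain the operator characters. The optional (!) flag, which is used for slot roles, is reported as a boolean.

```python
import re

# One  <path> = "<value>"  item, optionally followed by the "(!)" flag.
ITEM = re.compile(r'\s*([@\w.]+)\s*=\s*"([^"]*)"\s*(\(\s*!\s*\))?\s*$')


def parse_item(text):
    """Return (path, value, flagged) for one condition or restriction."""
    match = ITEM.match(text)
    if match is None:
        raise ValueError(f"not a valid condition or restriction: {text!r}")
    path, value, flag = match.groups()
    return path, value, flag is not None


def parse_rule(rule):
    """Split a single rule into conditions and restrictions.

    Naive sketch: it assumes that quoted values contain none of the
    characters '&', '|', ';' or the sequence '->'.
    """
    body = rule.strip().rstrip(";")
    left, right = body.split("->")
    conditions = [parse_item(c)[:2] for c in left.split("&")]
    restrictions = [parse_item(r) for r in right.split("|")]
    return conditions, restrictions


conditions, restrictions = parse_rule(
    '@Lemma.value = "can" -> coarseValue = "VERB" | coarseValue = "NOUN";'
)
```

Here conditions holds one (path, value) pair and restrictions holds two (path, value, flagged) triples.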

Structure of conditions and restrictions

A condition defines whether a particular situation applies, based on annotation layers and their features. Conditions can be defined on features with string, integer or boolean values, but in any case, the value needs to be put into quotes (e.g. someBooleanFeature="true", someIntegerFeature="2").

A condition set consists of one or more conditions. They are connected with logical AND as follows.

<condition> & <condition>

A restriction set defines a set of restrictions which can be applied if a particular condition set is evaluated to true. As multiple restrictions inside one rule are interpreted as conjunctions, they are separated by the |-operator. Restrictions can only be defined on String-valued features that are associated with a tagset.

<restriction> | <restriction>

A path is composed of one or more steps, separated by a dot. A step consists of a feature selector or a type selector. Type selectors are only applicable while writing the condition part of a rule. They comprise a layer operator @ followed by the type (Lemma, POS, etc). Feature selectors consist of a feature name, e.g. coarseValue.

Navigation across layers is possible via @<shortLayerName>. Hereby all annotations of type <shortLayerName> at the same position as the current context are found.


The constraint language supports block comments which start with /* and end with */. These comments may span across multiple lines.

/* This is a single line comment */

/*
   This is a multi-
   line comment
*/


The following simple example of a constraints file re-orders POS tags depending on Lemma values. If the Lemma was annotated as can, the POS tags VERB and NOUN are highlighted. If the Lemma value is the, the POS tag DET is suggested first.

import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Lemma as Lemma;
import de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS as Pos;

Pos {
  @Lemma.value = "can" ->
    coarseValue = "VERB" |
    coarseValue = "NOUN";

  @Lemma.value = "the" ->
    coarseValue = "DET";
}

In the UI, the tags that were matched by the constraints are bold and come first in the list of tags.


Conditional features

Constraints can be used to set up conditional features, that is features that only become available in the UI if another feature has a specific value. Let’s say that for example you want to annotate events and only causing events should additionally offer a polarity feature, while for caused events, there should be no way to select a polarity.

Sticking with the example of annotating events, conditional features can be set up as follows:

  • Go to the Layer tab of the project settings

  • Create a new tagset called Event category and add the tags causing and caused

  • Create a new tagset called Event polarity and add the tags positive and negative

  • Create a new span layer called Event

  • Add a string feature called category and assign the tagset Event category

  • Save the changes to the category feature

  • Add a string feature called polarity and assign the tagset Event polarity

  • Enable the checkbox Hide Un-constraint feature on the polarity feature

  • Save the changes to the polarity feature

  • Create a new text file called constraints.txt with the following contents:

import webanno.custom.Event as Event;

Event {
  category="causing" -> polarity="positive" | polarity="negative";
}

  • Import constraints.txt in the tab Constraints in the project settings.

When you now annotate an Event in this project, then the polarity feature is only visible and editable if the category of the annotation is set to causing.

It is important that both of the features have tagsets assigned - otherwise the conditional effect will not take place.

Constraints for slot features

Constraints can be applied to the roles of slot features. This is useful, e.g. when annotating predicate/argument structures where specific predicates can only have certain arguments.

Consider having a span layer SemPred resembling a semantic predicate and bearing a slot feature arguments and a string feature senseId. We want to restrict the possible argument roles based on the lemma associated with the predicate. The first rule in the following example restricts the senseId depending on the value of a Lemma annotation at the same position as the SemPred annotation. The second rule then restricts the choice of roles for the arguments based on the senseId. Note that to apply a restriction to the role of a slot feature, it is necessary to append .role to the feature name (that is because role is technically a nested feature). Thus, while we can write e.g. senseId = "Request" for a simple string feature, it is necessary to write arguments.role = "Addressee".

Note that some role labels are marked with the flag (!). This is a special flag for slot features and indicates that slots with these role labels should be automatically displayed in the UI ready to be filled. This should be used for mandatory or common slots and saves time as the annotator does not have to manually create the slots before filling them.

SemPred {
  /* Rule 1 */
  @Lemma.value = "ask" -> senseId = "Questioning" | senseId = "Request" | senseId = "XXX";
  /* .. other lemmata */
  /* Rule 2 */
  senseId = "Questioning" ->
    /* core roles */
    arguments.role = "Addressee" (!) | arguments.role = "Message" (!) | arguments.role = "Speaker" (!) |
    /* non-core roles */
    arguments.role = "Time" | arguments.role = "Iterations";
  /* .. other senses */
}

Constraints language grammar

// Basic structure ---------------------------------------
<file>            ::= <import>* <scope>*
<scope>           ::= <shortLayerName> "{" <ruleset> "}"
<ruleset>         ::= <rule>*
<import>          ::= "import" <qualifiedLayerName>
                      "as" <shortLayerName> ";"
<rule>            ::= <conds> "->" <restrictions> ";"

// Conditions --------------------------------------------
<conds>           ::= <cond> | <cond> "&" <conds>
<cond>            ::= <path> "=" <value>
<path>            ::= <featureName> | <step> "." <path>
<step>            ::= <featureName> | <layerSelector>
<layerSelector>   ::= <layerOperator>? <shortLayerName>
<layerOperator>   ::= "@" // select annotation in layer X

// Restrictions ------------------------------------------
<restrictions>    ::= <restriction> |
                      <restriction> "|" <restrictions>
<restriction>     ::= <restrictionPath> "=" <value>
                      ( "(" <flags> ")" )?
<restrictionPath> ::= <featureName> |
                      <restrictionPath> "." <featureName>
<flags>           ::= "!" // core role

CAS Doctor

The CAS Doctor is an essential development tool. When enabled, it checks the CAS for consistency when loading or saving a CAS. It can also automatically repair inconsistencies when configured to do so. This section gives an overview of the available checks and repairs.

It is safe to enable any checks. However, active checks may considerably slow down the application, in particular for large documents or for actions that work with many documents, e.g. curation or the calculation of agreement. Thus, checks should not be enabled on a production system unless the application behaves strangely and it is necessary to check the documents for consistency.

Enabling repairs should be done with great care as most repairs perform destructive actions. Repairs should never be enabled on a production system. The repairs are executed in the order in which they appear in the debug.casDoctor.repairs setting. This is important in particular when applying destructive repairs.

When documents are loaded, CAS Doctor first tries to apply any enabled repairs and afterwards applies enabled checks to ensure that the potentially repaired document is consistent.

Additionally, CAS Doctor applies enabled checks before saving a document. This ensures that a bug in the user interface does not introduce inconsistencies into the document on disk, i.e. the consistency of the persisted document is protected. Of course, this requires that relevant checks have been implemented and are actually enabled.

By default, CAS Doctor generates an exception when a check or repair fails. This ensures that inconsistencies are contained and do not propagate further. In some cases, e.g. when it is known that by its nature an inconsistency does not propagate and can be avoided by the user, it may be convenient to allow the user to continue working with the application while a repair is being developed. In such a case, CAS Doctor can be configured to be non-fatal. Mind that users can always continue to work on documents that are consistent. CAS Doctor only prevents loading inconsistent documents and saving inconsistent documents.
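
The interplay of repairs, checks and the fatal/non-fatal mode can be summarized in a short Python sketch. This is a simplified model and not the application's code: the CAS is represented as a plain dictionary, all names are hypothetical, and the checks are run on every load regardless of whether repairs are configured.

```python
class CasDoctorError(Exception):
    """Raised when a check fails and the doctor is configured as fatal."""


def analyze(cas, checks, fatal=True):
    """Run all checks; each check returns a list of problem messages."""
    messages = [msg for check in checks for msg in check(cas)]
    if messages and fatal:
        raise CasDoctorError("; ".join(messages))
    return messages


def on_load(cas, repairs, checks, fatal=True):
    # Repairs run first, in the configured order, then the checks.
    for repair in repairs:
        repair(cas)
    return analyze(cas, checks, fatal)


def on_save(cas, checks, fatal=True):
    # Checks run before the document is written to disk.
    return analyze(cas, checks, fatal)


def check_no_zero_size(cas):
    """Example check: report tokens whose begin equals their end."""
    return [f"zero-sized token at {b}" for b, e in cas["tokens"] if b == e]


def repair_zero_size(cas):
    """Example (destructive) repair: drop zero-sized tokens."""
    cas["tokens"] = [(b, e) for b, e in cas["tokens"] if b != e]
```

In non-fatal mode the messages are returned to the caller instead of raising an exception, which corresponds to letting the user continue working.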


Settings

  • Whether the extra checks trigger an exception.

  • Extra checks to perform when a CAS is saved (also on load if any repairs are enabled). Value: comma-separated list of checks.

  • debug.casDoctor.repairs - Repairs to be performed when a CAS is loaded; order matters! Value: comma-separated list of repairs.

  • Behave like a release version even if it is a beta or snapshot version.




All feature structures indexed

This check verifies if all reachable feature structures in the CAS are also indexed. We do not currently use any un-indexed feature structures. If there are any un-indexed feature structures in the CAS, it is likely due to a bug in the application and can cause undefined behavior.

For example, older versions of WebAnno had a bug that caused deleted spans still to be accessible through relations which had used the span as a source or target.

This check is very extensive and slow.

Feature-attached spans truly attached



Related repairs: Re-attach feature-attached spans, Re-attach feature-attached spans and delete extras

Certain span layers are attached to another span layer through a feature reference from that second layer. For example, annotations in the POS layer must always be referenced from a Token annotation via the Token feature pos. This check ensures that annotations on layers such as the POS layer are properly referenced from the attaching layer (e.g. the Token layer).

Links reachable through chains



Related repairs: Remove dangling chain links

Each chain in a chain layer consists of a chain and several links. The chain points to the first link and each link points to the following link. If the CAS contains any links that are not reachable through a chain, then this is likely due to a bug.
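
The idea behind this check can be sketched in a few lines of Python. The data model is an assumption made for illustration (links are identified by IDs and a dictionary maps each link to its successor); it does not reflect how the application stores chains.

```python
def unreachable_links(chain_heads, next_link):
    """Return the IDs of links that no chain leads to.

    chain_heads: iterable with the first link ID of every chain.
    next_link:   dict mapping each link ID to the following link ID
                 (or None at the end of a chain).
    """
    reachable = set()
    for link in chain_heads:
        # Follow the chain; the membership test also guards against cycles.
        while link is not None and link not in reachable:
            reachable.add(link)
            link = next_link.get(link)
    return sorted(set(next_link) - reachable)


# Hypothetical data: two chains (a-b and c-d) and one stray link x.
links = {"a": "b", "b": None, "c": "d", "d": None, "x": None}
dangling = unreachable_links(["a", "c"], links)
```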

No multiple incoming relations



Check that nodes have only one in-going dependency relation inside the same annotation layer. Since dependency relations form a tree, every node of this tree can only have at most one parent node. This check outputs a message that includes the sentence number (useful to jump directly to the problem) and the actual offending dependency edges.
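
A minimal Python sketch of this check, assuming that the dependency relations of one layer are available as (governor, dependent) pairs of token IDs. The function name and the data model are hypothetical.

```python
from collections import defaultdict


def multiple_incoming(relations):
    """Group dependency edges by dependent and keep those with > 1 head.

    relations: iterable of (governor, dependent) token IDs within one layer.
    """
    heads = defaultdict(list)
    for governor, dependent in relations:
        heads[dependent].append(governor)
    return {dep: govs for dep, govs in heads.items() if len(govs) > 1}


# Token 3 has two governors (2 and 4), which violates the tree property.
edges = [(2, 1), (2, 3), (4, 3), (0, 2)]
offenders = multiple_incoming(edges)
```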

No 0-sized tokens and sentences



Related repairs: Remove 0-size tokens and sentences

Zero-sized tokens and sentences are not valid and can cause undefined behavior.

Relation offsets consistency



Related repairs: Repair relation offsets

Checks that the offsets of relations match the target of the relation. This mirrors the DKPro Core convention that the offsets of a dependency relation must match the offsets of the dependent.
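
Expressed as a Python sketch, with relations and their targets modeled as plain dictionaries (an assumption made for illustration only):

```python
def inconsistent_relations(relations):
    """Return relations whose offsets differ from those of their target.

    Each relation is a dict with 'begin', 'end' and a 'target' dict that
    carries its own 'begin' and 'end'.
    """
    return [
        rel for rel in relations
        if (rel["begin"], rel["end"])
        != (rel["target"]["begin"], rel["target"]["end"])
    ]


# Hypothetical data: the second relation does not match its target.
target = {"begin": 4, "end": 9}
relations = [
    {"begin": 4, "end": 9, "target": target},
    {"begin": 0, "end": 9, "target": target},
]
bad = inconsistent_relations(relations)
```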

CASMetadata presence



Related repairs: Upgrade CAS

Checks if the internal type CASMetadata is defined in the type system of this CAS. If this is not the case, then the application may not be able to detect concurrent modifications.

Dangling relations



Related repairs: Remove dangling relations

Checks if there are any relations that do not have a source or target. Either the source/target are not set at all or they refer to an unset attach feature in another layer. Note that relations referring to non-indexed end-points are handled by All feature structures indexed.


Re-attach feature-attached spans



This repair action attempts to attach spans that should be attached to another span, but are not. E.g. it tries to set the pos feature of tokens to the POS annotation for that respective token. The action is not performed if there are multiple stacked annotations to choose from. Stacked attached annotations would be an indication of a bug because attached layers are not allowed to stack.

This is a safe repair action as it does not delete anything.

Re-attach feature-attached spans and delete extras



This is a destructive variant of Re-attach feature-attached spans. In addition to re-attaching unattached annotations, it also removes all extra candidates that cannot be attached. For example, if there are two unattached Lemma annotations at the position of a Token annotation, then one will be attached and the other will be deleted. Which one is attached and which one is deleted is undefined.

Re-index feature-attached spans



This repair locates annotations that are reachable via an attach feature but which are not actually indexed in the CAS. Such annotations are then added back to the CAS indexes.

This is a safe repair action as it does not delete anything.

Repair relation offsets



Fixes the offsets of relations so that they match the target of the relation. This mirrors the DKPro Core convention that the offsets of a dependency relation must match the offsets of the dependent.

Remove dangling chain links



This repair action removes all chain links that are not reachable through a chain.

Although this is a destructive repair action, it is likely a safe action in most cases. Users are not able to see chain links that are not part of a chain in the user interface anyway.

Remove dangling feature-attached span annotations



This repair action removes all annotations which are themselves no longer indexed (i.e. they have been deleted), but they are still reachable through some layer to which they had attached. This affects mainly the DKPro Core POS and Lemma layers.

Although this is a destructive repair action, it is sometimes a desired action because the user may know that they do not care to resurrect the deleted annotation as per Re-index feature-attached spans.

Remove dangling relations



This repair action removes all relations that point to unindexed spans.

Although this is a destructive repair action, it is likely a safe action in most cases. When deleting a span, normally any attached relations are also deleted (unless there is a bug). Dangling relations are not visible in the user interface. A dangling relation is one that meets any of the following conditions:

  • source or target are not set

  • the annotation pointed to by source or target is not indexed

  • the attach-feature in the annotation pointed to by source or target is not set

  • the annotation pointed to by attach-feature in the annotation pointed to by source or target is not indexed
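
The four conditions can be written down as a single predicate. The following Python sketch uses a simplified, hypothetical data model: annotations are dictionaries with an indexed flag and, for layers that use an attach feature, an attach entry pointing to the attached annotation.

```python
def is_dangling(relation, uses_attach_feature=False):
    """Apply the four conditions listed above to one relation."""
    for end in (relation.get("source"), relation.get("target")):
        if end is None:                      # source or target not set
            return True
        if not end["indexed"]:               # end point not indexed
            return True
        if uses_attach_feature:
            attached = end.get("attach")
            if attached is None:             # attach-feature not set
                return True
            if not attached["indexed"]:      # attached annotation not indexed
                return True
    return False


# Hypothetical annotations for demonstration.
token = {"indexed": True, "attach": {"indexed": True}}
deleted = {"indexed": False}
```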

Remove 0-size tokens and sentences



This is a destructive repair action and should be used with care. When tokens are removed, also any attached lemma, POS, or stem annotations are removed. However, no relations that attach to lemma, POS, or stem are removed, thus this action could theoretically leave dangling relations behind. Thus, the Remove dangling relations repair action should be configured after this repair action in the settings file.

Upgrade CAS



Ensures that the CAS is up-to-date with the project type system. It performs the same operation which is regularly performed when a user opens a document for annotation/curation.

This is considered to be a safe repair action as it only garbage-collects data from the CAS that is no longer reachable anyway.

Annotation Guidelines

Providing your annotation team with guidelines helps ensure that every team member knows exactly what is expected of them.

Annotators can access the guidelines via the Guidelines button on the annotation page.

Project managers can provide these guidelines via the Guidelines tab in the project settings. Guidelines are provided as files (e.g. PDF files). To upload guidelines, click on Choose files, select a file from your local disk and then click Import guidelines. Remove a guideline document by selecting it and pressing the Delete button.


Appendix A: Formats

CoNLL 2000

The CoNLL 2000 format represents POS and Chunk tags. Fields in a line are separated by spaces. Sentences are separated by a blank new line.

Format Read Write Custom Layers Description

CoNLL 2000




POS, chunks

Table 20. Columns
Column Type Description






part-of-speech tag



chunk (IOB1 encoded)

reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
to TO B-PP
only RB B-NP
# # I-NP
1.8 CD I-NP
billion CD I-NP
in IN B-PP
September NNP B-NP
. . O

CoNLL 2002

The CoNLL 2002 format encodes named entity spans. Fields are separated by a single space. Sentences are separated by a blank new line.

Format Read Write Custom Layers Description

CoNLL 2002




Named entities

Table 21. Columns

  • Word form or punctuation symbol.

  • Named entity (IOB2 encoded).

Wolff B-PER
, O
currently O
a O
journalist O
in O
Argentina B-LOC
, O
played O
with O
Bosque I-PER
in O
the O
final O
years O
of O
the O
seventies O
in O
Real B-ORG
Madrid I-ORG
. O
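
For readers unfamiliar with the IOB2 encoding, the following Python sketch groups tagged tokens into named entity spans. It is a lenient illustration and not the reader used by the application: the function name is hypothetical, and an I- tag that does not continue a span of the same label simply starts a new span.

```python
def decode_iob2(rows):
    """Turn (token, tag) rows into (label, [tokens]) spans.

    Lenient sketch: an I- tag that does not continue a span of the same
    label starts a new span instead of raising an error.
    """
    spans, current = [], None
    for token, tag in rows:
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current is not None and current[0] == label:
            current[1].append(token)
            continue
        current = None
        if prefix in ("B", "I"):
            current = (label, [token])
            spans.append(current)
    return spans


# The last three rows of the example above.
rows = [("Real", "B-ORG"), ("Madrid", "I-ORG"), (".", "O")]
entities = decode_iob2(rows)
```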

CoNLL 2003

The CoNLL 2003 format encodes named entity spans and chunk spans. Fields are separated by a single space. Sentences are separated by a blank new line. Named entities and chunks are encoded in the IOB1 format, i.e. a B prefix is only used when a span immediately follows another span of the same category.

Format Read Write Custom Layers Description

CoNLL 2003




Table 22. Columns

  • Word form or punctuation symbol.

  • Chunk (IOB1 encoded).

  • Named entity (IOB1 encoded).

official NN I-NP O
heads VBZ I-VP O
for IN I-PP O
Baghdad NNP I-NP I-LOC
. . O O
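
In IOB1, most spans start with an I- tag, and B- only appears where two spans of the same category touch. The following Python sketch decodes one tag column under that convention. The function name is hypothetical and the sketch is not the application's reader.

```python
def decode_iob1(tags):
    """Return (label, start, end) spans with an exclusive end index."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):   # sentinel closes last span
        prefix, _, new_label = tag.partition("-")
        continues = prefix == "I" and new_label == label
        if start is not None and not continues:
            spans.append((label, start, i))
            start, label = None, None
        if prefix in ("B", "I") and start is None:
            start, label = i, new_label
    return spans


# Chunk and named entity columns of the example above.
chunks = decode_iob1(["I-NP", "I-VP", "I-PP", "I-NP", "O"])
entities = decode_iob1(["O", "O", "O", "I-LOC", "O"])
```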

CoNLL 2006

The CoNLL 2006 (aka CoNLL-X) format targets dependency parsing. Columns are tab-separated. Sentences are separated by a blank new line.

Format Read Write Custom Layers Description

CoNLL 2006




Lemma, POS, dependencies (basic)

Table 23. Columns

  • ID - Token counter, starting at 1 for each new sentence.

  • FORM - Word form or punctuation symbol.

  • LEMMA - Lemma of the word form.

  • CPOSTAG (POS coarseValue) - Coarse-grained part-of-speech tag.

  • POSTAG (POS PosValue) - Fine-grained part-of-speech tag, where the tagset depends on the language, or identical to the coarse-grained part-of-speech tag if not available.

  • FEATS - Unordered set of syntactic and/or morphological features (depending on the particular language), separated by a vertical bar (|), or an underscore if not available.

  • HEAD - Head of the current token, which is either a value of ID or zero ('0'). Note that depending on the original treebank annotation, there may be multiple tokens with an ID of zero.

  • DEPREL - Dependency relation to the HEAD. The set of dependency relations depends on the particular language. Note that depending on the original treebank annotation, the dependency relation may be meaningful or simply 'ROOT'.

  • PHEAD - Projective head of current token, which is either a value of ID or zero ('0'), or an underscore if not available. Note that depending on the original treebank annotation, there may be multiple tokens with an ID of zero. The dependency structure resulting from the PHEAD column is guaranteed to be projective (but is not available for all languages), whereas the structures resulting from the HEAD column will be non-projective for some sentences of some languages (but is always available).

  • PDEPREL - Dependency relation to the PHEAD, or an underscore if not available. The set of dependency relations depends on the particular language. Note that depending on the original treebank annotation, the dependency relation may be meaningful or simply 'ROOT'.

Heutzutage	heutzutage	ADV	_	_	ADV	_	_

CoNLL 2009

The CoNLL 2009 format targets semantic role labeling. Columns are tab-separated. Sentences are separated by a blank new line.

Format Read Write Custom Layers Description

CoNLL 2009




Lemma, POS, dependencies (basic)

Table 24. Columns

  • ID - Token counter, starting at 1 for each new sentence.

  • FORM - Word form or punctuation symbol.

  • LEMMA - Lemma of the word form.

  • PLEMMA - Automatically predicted lemma of FORM.

  • POS (POS PosValue) - Fine-grained part-of-speech tag, where the tagset depends on the language.

  • PPOS - Automatically predicted major POS by a language-specific tagger.

  • FEAT - Unordered set of syntactic and/or morphological features (depending on the particular language), separated by a vertical bar (|), or an underscore if not available.

  • PFEAT - Automatically predicted morphological features (if applicable).

  • HEAD - Head of the current token, which is either a value of ID or zero ('0'). Note that depending on the original treebank annotation, there may be multiple tokens with an ID of zero.

  • PHEAD - Automatically predicted syntactic head.

  • DEPREL - Dependency relation to the HEAD. The set of dependency relations depends on the particular language. Note that depending on the original treebank annotation, the dependency relation may be meaningful or simply ROOT.

  • PDEPREL - Automatically predicted dependency relation to PHEAD.

  • FILLPRED - Contains Y for argument-bearing tokens.

  • PRED - (sense) identifier of a semantic 'predicate' coming from a current token.

  • APREDs - Columns with argument labels for each semantic predicate (in the ID order).

1	The	the	the	DT	DT	_	_	4	4	NMOD	NMOD	_	_	_	_
2	most	most	most	RBS	RBS	_	_	3	3	AMOD	AMOD	_	_	_	_
3	troublesome	troublesome	troublesome	JJ	JJ	_	_	4	4	NMOD	NMOD	_	_	_	_
4	report	report	report	NN	NN	_	_	5	5	SBJ	SBJ	_	_	_	_
5	may	may	may	MD	MD	_	_	0	0	ROOT	ROOT	_	_	_	_
6	be	be	be	VB	VB	_	_	5	5	VC	VC	_	_	_	_
7	the	the	the	DT	DT	_	_	11	11	NMOD	NMOD	_	_	_	_
8	August	august	august	NNP	NNP	_	_	11	11	NMOD	NMOD	_	_	_	AM-TMP
9	merchandise	merchandise	merchandise	NN	NN	_	_	10	10	NMOD	NMOD	_	_	A1	_
10	trade	trade	trade	NN	NN	_	_	11	11	NMOD	NMOD	Y	trade.01	_	A1
11	deficit	deficit	deficit	NN	NN	_	_	6	6	PRD	PRD	Y	deficit.01	_	A2
12	due	due	due	JJ	JJ	_	_	13	11	AMOD	APPO	_	_	_	_
13	out	out	out	IN	IN	_	_	11	12	APPO	AMOD	_	_	_	_
14	tomorrow	tomorrow	tomorrow	NN	NN	_	_	13	12	TMP	TMP	_	_	_	_
15	.	.	.	.	.	_	_	5	5	P	P	_	_	_	_
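
A sample line like the ones above can be split into named fields with a few lines of Python. This is a minimal sketch, not WebAnno's reader: the column names follow the usual CoNLL 2009 naming and are an assumption here, as is the helper name parse_conll2009_line. It also assumes tab-separated columns, as in the sample.

```python
# Column names follow the common CoNLL 2009 convention (assumed, not taken from WebAnno).
CONLL2009_COLUMNS = [
    "ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS", "FEAT", "PFEAT",
    "HEAD", "PHEAD", "DEPREL", "PDEPREL", "FILLPRED", "PRED",
]


def parse_conll2009_line(line):
    """Split one tab-separated token line into named fields plus argument labels."""
    fields = line.rstrip("\n").split("\t")
    token = dict(zip(CONLL2009_COLUMNS, fields))
    # Everything after the 14 fixed columns is one argument column per predicate.
    token["APREDS"] = fields[len(CONLL2009_COLUMNS):]
    return token


example = "10\ttrade\ttrade\ttrade\tNN\tNN\t_\t_\t11\t11\tNMOD\tNMOD\tY\ttrade.01\t_\tA1"
token = parse_conll2009_line(example)
print(token["FORM"], token["HEAD"], token["PRED"], token["APREDS"])
```

The number of argument columns varies per sentence, which is why they are collected into a list instead of being given fixed names.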

CoNLL 2012

The CoNLL 2012 format targets semantic role labeling and coreference. Columns are whitespace-separated (tabs or spaces). Sentences are separated by a blank new line.

Note that this format cannot deal with the following situations:

  • An annotation has no label (e.g. a SemPred annotation has no category). In such a case, null is written into the corresponding column. However, the reader will then read the value null as the label.

  • If a SemPred annotation is at the same position as a SemArg annotation linked to it, then only the (V*) representing the SemPred annotation is written.

  • SemPred annotations spanning more than one token are not supported.

  • If there are multiple SemPred annotations on the same token, then only one of them is written. This is because the category of the SemPred annotation goes into the Predicate Frameset ID column, which can only hold one value.

Format Read Write Custom Layers Description

CoNLL 2012




Table 25. Columns
Column Type/Feature Description

Document ID


This is a variation on the document filename.

Part number


Some files are divided into multiple parts numbered as 000, 001, 002, etc.

Word number


Word itself

document text

This is the token as segmented/tokenized in the Treebank. Initially, the *_skel files contain the placeholder [WORD], which gets replaced by the actual token from the Treebank that is part of the OntoNotes release.



Parse bit


This is the bracketed structure broken before the first open parenthesis in the parse, and the word/part-of-speech leaf replaced with a *. The full parse can be created by substituting the asterisk with the ([pos] [word]) string (or leaf) and concatenating the items in the rows of that column.

Predicate lemma


The predicate lemma is mentioned for the rows for which we have semantic role information. All other rows are marked with a -.

Predicate Frameset ID


This is the PropBank frameset ID of the predicate in Column 7.

Word sense


This is the word sense of the word in Column 3.



This is the speaker or author name where available. Mostly in Broadcast Conversation and Web Log data.

Named Entities


These columns identify the spans representing various named entities.

Predicate Arguments


There is one column each of predicate argument structure information for the predicate mentioned in Column 7.



Coreference chain information encoded in a parenthesis structure.

en-orig.conll	0	0	John	NNP	(TOP(S(NP*)	john	-	-	-	(PERSON)	(A0)	(1)
en-orig.conll	0	1	went	VBD	(VP*	go	go.02	-	-	*	(V*)	-
en-orig.conll	0	2	to	TO	(PP*	to	-	-	-	*	*	-
en-orig.conll	0	3	the	DT	(NP*	the	-	-	-	*	*	(2
en-orig.conll	0	4	market	NN	*)))	market	-	-	-	*	(A1)	2)
en-orig.conll	0	5	.	.	*))	.	-	-	-	*	*	-
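
The Parse bit description can be turned into code fairly directly: replace the asterisk in each row with the (POS word) leaf and concatenate the results. The sketch below does this for the sample sentence. The function name rebuild_parse is hypothetical, and the column positions (word, part-of-speech, parse bit as the 4th to 6th columns) are read off the sample above.

```python
def rebuild_parse(rows):
    """Concatenate parse bits, replacing each * with the (POS word) leaf."""
    parts = []
    for line in rows:
        cols = line.split()  # columns are whitespace-separated
        word, pos, parse_bit = cols[3], cols[4], cols[5]
        parts.append(parse_bit.replace("*", f"({pos} {word})", 1))
    return "".join(parts)


sample = [
    "en-orig.conll 0 0 John NNP (TOP(S(NP*) john - - - (PERSON) (A0) (1)",
    "en-orig.conll 0 1 went VBD (VP* go go.02 - - * (V*) -",
    "en-orig.conll 0 2 to TO (PP* to - - - * * -",
    "en-orig.conll 0 3 the DT (NP* the - - - * * (2",
    "en-orig.conll 0 4 market NN *))) market - - - * (A1) 2)",
    "en-orig.conll 0 5 . . *)) . - - - * * -",
]
print(rebuild_parse(sample))
# (TOP(S(NP(NNP John))(VP(VBD went)(PP(TO to)(NP(DT the)(NN market))))(. .)))
```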

CoreNLP CoNLL-like format

The CoreNLP CoNLL format is used by the Stanford CoreNLP package. Columns are tab-separated. Sentences are separated by a blank new line.

Format Read Write Custom Layers Description

CoreNLP CoNLL-like format




Table 26. Columns
Column Type/Feature Description



Token counter, starting at 1 for each new sentence.



Word form or punctuation symbol.



Lemma of the word form.


POS PosValue

Fine-grained part-of-speech tag, where the tagset depends on the language, or identical to the coarse-grained part-of-speech tag if not available.



Named Entity tag, or underscore if not available. If a named entity covers multiple tokens, all of the tokens simply carry the same label (no sequence encoding).



Head of the current token, which is either a value of ID or zero ('0'). Note that depending on the original treebank annotation, there may be multiple tokens with an ID of zero.



Dependency relation to the HEAD. The set of dependency relations depends on the particular language. Note that depending on the original treebank annotation, the dependency relation may be meaningful or simply 'ROOT'.

1	Selectum	Selectum	NNP	O	_	_
2	,	,	,	O	_	_
3	Société	Société	NNP	O	_	_
4	d'Investissement	d'Investissement	NNP	O	_	_
5	à	à	NNP	O	_	_
6	Capital	Capital	NNP	O	_	_
7	Variable	Variable	NNP	O	_	_
8	.	.	.	O	_	_


CoNLL-U

The CoNLL-U format targets dependency parsing. Columns are tab-separated. Sentences are separated by a blank new line.

Format Read Write Custom Layers Description





Lemma, POS, dependencies (basic & enhanced), surface form

Table 27. Columns
Column Type/Feature Description



Word index, integer starting at 1 for each new sentence; may be a range for tokens with multiple words.



Word form or punctuation symbol.



Lemma or stem of word form.


POS coarseValue

Part-of-speech tag from the universal POS tag set.


POS PosValue

Language-specific part-of-speech tag; underscore if not available.



List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.



Head of the current token, which is either a value of ID or zero (0).



Universal Stanford dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.



List of secondary dependencies (head-deprel pairs).



Any other annotation.

1	They	they	PRON	PRN	Case=Nom|Number=Plur	2	nsubj	4:nsubj	_
2	buy	buy	VERB	VB	Number=Plur|Person=3|Tense=Pres	0	root	_	_
3	and	and	CONJ	CC	_	2	cc	_	_
4	sell	sell	VERB	VB	Number=Plur|Person=3|Tense=Pres	2	conj	0:root	_
5	books	book	NOUN	NNS	Number=Plur	2	dobj	4:dobj	SpaceAfter=No
6	.	.	PUNCT	.	_	2	punct	_	_
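
The secondary dependencies column holds head-deprel pairs such as 4:nsubj. The following sketch decodes such a cell. It assumes that the column is the ninth field of a line and that multiple pairs are separated by a vertical bar, as in the CoNLL-U specification. The function name parse_deps is hypothetical.

```python
def parse_deps(cell):
    """Turn a secondary-dependency cell such as '4:nsubj' into (head, relation) pairs."""
    if cell == "_":
        return []  # underscore means no secondary dependencies
    pairs = []
    for item in cell.split("|"):
        head, relation = item.split(":", 1)
        pairs.append((int(head), relation))
    return pairs


line = "1\tThey\tthey\tPRON\tPRN\tCase=Nom|Number=Plur\t2\tnsubj\t4:nsubj\t_"
fields = line.split("\t")
print(parse_deps(fields[8]))
```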

WebLicht TCF

The TCF (Text Corpus Format) was created in the context of the CLARIN project. It is mainly used to exchange data between the different web-services that are part of the WebLicht platform.

Format Read Write Custom Layers Description





Lemma, POS, dependencies (basic), coreference, named entities

Plain Text

Basic UTF-8 plain text. Automatic sentence and token detection will be performed.

Format Read Write Custom Layers Description

Plain text




No annotations

Plain Text (one sentence per line)

Basic UTF-8 plain text where each line is interpreted as one sentence.

Format Read Write Custom Layers Description

Plain text




No annotations

Plain Text (pretokenized)

Basic UTF-8 plain text. Tokens are taken to be separated by spaces. Each line is interpreted as a sentence.

Format Read Write Custom Layers Description

Plain text




No annotations


The file format used by WebAnno version 3.

Format Read Write Custom Layers Description







Probably the most commonly used format supported by the Apache UIMA framework is UIMA CAS XMI. It is able to capture all the information contained in the CAS. This is the de-facto standard for exchanging data in the UIMA world. Most UIMA-related tools support it.

The XMI format does not include type system information. When exporting files in the XMI format, a ZIP file is created for each document which contains the XMI file itself as well as an XML file containing the type system.

There are two flavors of CAS XMI, namely XML 1.0 and XML 1.1. XML 1.0 is more widely supported in the world of XML parsers, so you may expect better interoperability with other programming languages (e.g. Python) with the XML 1.0 flavor. XML 1.1 supports a wider range of characters. However, despite dating back to 2006, it is still not supported by all XML parsers.

Format Read Write Custom Layers Description











WebAnno TSV 1

The file format used by WebAnno version 1 and earlier.

Format Read Write Custom Layers Description

WebAnno TSV 1




WebAnno TSV 2

The file format used by WebAnno version 2.

Format Read Write Custom Layers Description

WebAnno TSV 2




Token, multi-token, and arc annotations are supported. Chain annotations and sub-token annotations are not supported.

WebAnno TSV 3.x

The file format used by WebAnno version 3.

Format Read Write Custom Layers Description

WebAnno TSV 3




Appendix B: WebAnno TSV 3.3 File format




In this section, we discuss the WebAnno TSV (Tab-Separated Values) file format version 3.3. The format is similar to the CoNLL file formats, with specialized additions to the header and column representations. A file consists of a header section and a body section. The header section presents information about the different types of annotation layers and features used in the file. Before a WebAnno TSV file can be imported, the layers and features declared in its header must already exist in the target WebAnno project. Otherwise, the file cannot be imported.

The body section of the TSV file presents the document and all the associated annotations including sentence and token annotations.

Encoding and Offsets

TSV files are always encoded in UTF-8. However, the offsets used in the TSV file are based on UTF-16. This is important when using TSV files with texts containing, e.g., emojis or some modern non-Latin Asian, Middle Eastern, and African scripts.

WebAnno is implemented in Java. The Java platform internally uses a UTF-16 representation for text. For this reason, the offsets used in the TSV format currently represent offsets of the 16-bit units in UTF-16 strings. This is important if your text contains Unicode characters that cannot be represented in 16 bits and thus require two 16-bit units. For example, a token represented by the Unicode character 😊 (U+1F60A) requires two 16-bit units. Hence, the offset count increases by 2 for this character. In general, every Unicode character at U+10000 or above increases the offset count by 2.

Example: TSV sentence containing a Unicode character from the Supplementary Planes
#Text=I like it 😊 .
1-1	0-1	I	_
1-2	2-6	like	_
1-3	7-9	it	_
1-4	10-12	😊	*
1-5	13-14	.	_
Since the character offsets are based on UTF-16 and the TSV file itself is encoded in UTF-8, first the text contained in the file needs to be transcoded from UTF-8 into UTF-16 before the offsets can be applied. The offsets cannot be used for random access to characters directly in the TSV file.
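
In languages whose strings are not indexed by UTF-16 units, the transcoding step has to be done explicitly. Python strings, for instance, are indexed by Unicode code points, so the TSV offsets cannot be used as slice indices directly. A minimal sketch of one way to apply them (the helper name utf16_slice is hypothetical):

```python
def utf16_slice(text, begin, end):
    """Return the part of text covered by the UTF-16 code unit offsets [begin, end)."""
    units = text.encode("utf-16-le")  # two bytes per 16-bit unit, no byte order mark
    return units[2 * begin:2 * end].decode("utf-16-le")


sentence = "I like it \U0001F60A ."
print(utf16_slice(sentence, 2, 6))    # like
print(utf16_slice(sentence, 10, 12))  # the emoji, which occupies two 16-bit units
print(utf16_slice(sentence, 13, 14))  # .
```

Encoding the whole text for every lookup is wasteful for large documents. In that case, it may be preferable to encode once and reuse the byte string.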

File Header

The WebAnno TSV 3.3 file header consists of two main parts:

  • the format indicator

  • the column declarations

After the header, there must be two empty lines before the body part containing the annotations may start.

Example: format in file header
#FORMAT=WebAnno TSV 3.3

Layers are marked by the # character followed by T_SP= for span types (including slot features), T_CH= for chain layers, and T_RL= for relation layers. Every layer is written on a new line, followed by the features of the layer. If all layer types exist, the span layers are written first, then the chain layers, and finally the relation layers. Features are separated by the | character and only the short name of the feature is provided.

Example: Span layer with simple features in file header
#T_SP=webanno.custom.Pred|bestSense|lemmaMapped|senseId|senseMapped

Here the layer name is webanno.custom.Pred and the features are named bestSense, lemmaMapped, senseId, senseMapped. Slot features start with a prefix ROLE_ followed by the name of the role and the link. The role feature name and the link feature name are separated by the _ character.

The target of the slot feature always follows the role/link name.

Example: Span layer with slot features in file header

Here the name of the role is webanno.custom.SemPred:RoleSet and the name of the role link is webanno.custom.SemPredRoleSetLink and the target type is uima.tcas.Annotation.

Chain layers always have two features, referenceType and referenceRelation.

Example: Chain layers in file header

Relation layers come last in the list, and the very last entry in the features is the type of the base (governor or dependent) annotations with the prefix BT_.

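Example: Relation layers in file header
#T_RL=de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency|DependencyType|BT_de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS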

Here, the relation type de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency has a feature DependencyType and the relation is between a base type of de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS.
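
The layer declarations described above are simple enough to take apart with string operations. The following is a rough sketch, not WebAnno's own parser. The function name parse_layer_declaration and the sample lines are illustrative, composed from the layer and feature names mentioned in the text. Slot features, whose target type follows as a separate entry, are not treated specially here.

```python
LAYER_KINDS = {"T_SP": "span", "T_CH": "chain", "T_RL": "relation"}


def parse_layer_declaration(line):
    """Split a header line such as '#T_SP=layer|feat1|feat2' into its parts."""
    marker, _, rest = line.lstrip("#").partition("=")
    name, *features = rest.split("|")
    base_type = None
    # On relation layers, the last entry names the base type, prefixed with BT_.
    if marker == "T_RL" and features and features[-1].startswith("BT_"):
        base_type = features.pop()[len("BT_"):]
    return {
        "kind": LAYER_KINDS[marker],
        "layer": name,
        "features": features,
        "base_type": base_type,
    }


span = parse_layer_declaration(
    "#T_SP=webanno.custom.Pred|bestSense|lemmaMapped|senseId|senseMapped"
)
print(span["kind"], span["layer"], span["features"])
```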

File Body / Annotations

In this section, we discuss the different representations of texts and annotations in the WebAnno TSV 3 format.

Reserved Characters

Reserved characters have a special meaning in the TSV format and must be escaped with the backslash (\) character if they appear in text or feature values. Reserved characters are the following:

Reserved Characters
The way that TSV is presently defined/implemented, it considers -> as a single "character", and it is also escaped as a single unit, i.e. -> becomes \->. This is something to be addressed in a future iteration of the format.
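
Escaping with the arrow handled as one unit can be sketched as follows. Note that the reserved set used here is an assumption: it only contains the characters that are used with a special meaning elsewhere in this appendix (backslash, square brackets, vertical bar, underscore, semicolon, asterisk, and the arrow) and may not match the complete list used by WebAnno. The function name escape_value is hypothetical.

```python
# Assumed reserved set, inferred from the characters with special meaning in this appendix.
RESERVED_CHARS = set("\\[]|_;*")


def escape_value(value):
    """Prefix reserved characters with a backslash, treating -> as a single unit."""
    out = []
    i = 0
    while i < len(value):
        if value.startswith("->", i):
            out.append("\\->")  # the arrow is escaped once, not per character
            i += 2
        elif value[i] in RESERVED_CHARS:
            out.append("\\" + value[i])
            i += 1
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


print(escape_value("a->b"))   # a\->b
print(escape_value("x|y_z"))  # x\|y\_z
```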

Sentence Representation

Sentence annotations are presented following the text marker #Text=, before the token annotations. All text given here is inside the sentence boundaries.

Example: Original text sections
#Text=Bell , based in Los Angeles , makes and distributes electronic , computer and building products .

The text of an imported document is reconstructed from the sentence annotations. Additionally, the offset information of the sentence tokens is taken into account to determine whether padding needs to be added between sentences. The TSV format cannot presently record text that occurs in between two sentences.

If a sentence spans multiple lines, the text is split at the line feed characters (ASCII 10) and multiple #Text= lines are generated. Note that carriage return characters (ASCII 13) are kept as escaped characters (\r).

Example: Original multi-line text
#Text=Bell , based in Los Angeles , makes and distributes
#Text=electronic , computer and building products .
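
Reading such a sentence back is the inverse operation: the #Text= lines are joined with line feed characters. A minimal sketch (the function name sentence_text is hypothetical, and escaped characters are left untouched):

```python
def sentence_text(header_lines):
    """Join the #Text= lines of one sentence back into a single string."""
    prefix = "#Text="
    parts = [line[len(prefix):] for line in header_lines if line.startswith(prefix)]
    return "\n".join(parts)


lines = [
    "#Text=Bell , based in Los Angeles , makes and distributes",
    "#Text=electronic , computer and building products .",
]
print(sentence_text(lines))
```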

Optionally, an alphanumeric sentence identifier can be added in the sentence header section.

Example: Sentence identifier
#Text=Bell , based in Los Angeles , makes and distributes electronic , computer and building products .

Token and Sub-token Annotations

Tokens represent a span of text within a sentence. Tokens cannot overlap, although they can be directly adjacent (i.e. without any whitespace between them). The start offset of the first character of the first token corresponds to the start offset of the sentence.

A token annotation starts with a sentence-token number marker followed by the begin-end offsets and the token itself, separated by TAB characters.

Example: Token position
1-2	4-8	Haag

Here 1 indicates the sentence number, 2 indicates the token number (here, the second token in the first sentence) and 4 is the begin offset of the token and 8 is the end offset of the token while Haag is the token.

The begin offset of the first token in a sentence must coincide with the offset at which the first #Text line starts in the original document text.

Example: Valid sentence text header / token offsets
#Text=Hello
1-1	0-5	Hello

Example: Invalid sentence text header / token offsets
#Text= Hello
1-1	1-6	Hello

Sub-token identifiers are formed by appending a . and a number from 1 to N to the sentence-token number.

Example: Sub-token positions
1-3	9-14	plays
1-3.1	9-13	play
1-3.2	13-14	s

Here, the sub-token play is indicated by sentence-token number 1-3.1 and the sub-token s is indicated by 1-3.2.

While tokens may not overlap, sub-tokens may overlap.

Example: Overlapping sub-tokens
1-3	9-14	plays
1-3.1	9-12	pla
1-3.2	11-14	ays
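
The token and sub-token rows shown in this section can be decoded as sketched below. This is only an illustration of the row layout described above. The function name parse_token_line and the dictionary keys are hypothetical.

```python
import re

# sentence-token number with an optional sub-token suffix, e.g. 1-3 or 1-3.2
UNIT_ID = re.compile(r"^(\d+)-(\d+)(?:\.(\d+))?$")


def parse_token_line(line):
    """Decode the unit ID, offsets, and text of one TAB-separated body row."""
    unit_id, offsets, text, *annotations = line.split("\t")
    match = UNIT_ID.match(unit_id)
    if match is None:
        raise ValueError(f"not a token or sub-token ID: {unit_id!r}")
    sentence, token, subtoken = match.groups()
    begin, end = (int(value) for value in offsets.split("-"))
    return {
        "sentence": int(sentence),
        "token": int(token),
        "subtoken": int(subtoken) if subtoken is not None else None,
        "begin": begin,
        "end": end,
        "text": text,
        "annotations": annotations,  # remaining columns, left undecoded here
    }


print(parse_token_line("1-3.2\t13-14\ts"))
```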

Span Annotations

For every feature of a span annotation, the annotation value is presented in the same row as the token/sub-token annotation, separated by a TAB character. If there is no annotation for the given span layer, a _ character is placed in the column. If the feature has no value (null) or if the span layer does not have any features at all, a * character represents the annotation.

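Example: Span layer declaration in file header
#T_SP=de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS|PosValue
#T_SP=webanno.custom.Sentiment|Category|Opinion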
Example: Span annotations in file body
1-9	36-43	unhappy	JJ	abstract	negative

Here, the first annotation at column 4, JJ, is a value for the feature PosValue of the layer de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS. For the two features of the layer webanno.custom.Sentiment (Category and Opinion), the values abstract and negative are presented at columns 5 and 6 respectively.

If, during serialization, a span annotation starts or ends in a space between tokens, the annotation is truncated to start at the next token after the space or to end at the last token before the space. For example, if you consider the text [one two] and there is a span annotation on [one ] (note the trailing space), the extent of this span annotation will be serialized as only covering [one]. It is not possible in this format to have annotations starting or ending in the space between tokens because the inter-token space is not rendered as a row and therefore is not addressable in the format.

Disambiguation IDs

Within a single line, an annotation can be uniquely identified by its type and stacking index. However, across lines, annotations cannot easily be uniquely identified. Also, if the exact type of the referenced annotation is not known, an annotation cannot be uniquely identified. For this reason, disambiguation IDs are introduced in potentially problematic cases:

  • stacked annotations - if multiple annotations of the same type appear in the same line

  • multi-unit annotations - if an annotation spans multiple tokens or sub-tokens

  • un-typed slots - if a slot feature has the type uima.tcas.Annotation and may thus refer to any kind of target annotation.

The disambiguation ID is attached as a suffix [N] to the annotation value. Stacked annotations are separated by | character.

Example: Span layer declaration in file header
Example: Multi-token span annotations and stacked span annotations
1-1	0-3	Ms.	NNP	PER[1]|PERpart[2]
1-2	4-8	Haag	NNP	PER[1]

Here, PER[1] indicates that token 1-1 and 1-2 have the same annotation (multi-token annotations) while PERpart[2] is the second (stacked) annotation on token 1-1 separated by | character.

On chain layers, the number in brackets is not a disambiguation ID but rather a chain ID!
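
A cell with stacked values and bracketed IDs can be decoded as sketched below. The sketch ignores escaped reserved characters inside values, so it is only suitable for simple labels like the ones in the example. The function name parse_span_cell is hypothetical.

```python
import re

# A value optionally followed by a bracketed number, e.g. PER[1] or NNP
VALUE_WITH_ID = re.compile(r"^(.*?)(?:\[(\d+)\])?$")


def parse_span_cell(cell):
    """Split a cell such as 'PER[1]|PERpart[2]' into (value, ID) pairs."""
    if cell == "_":
        return []  # no annotation on this layer
    result = []
    for item in cell.split("|"):
        value, number = VALUE_WITH_ID.match(item).groups()
        result.append((value, int(number) if number is not None else None))
    return result


print(parse_span_cell("PER[1]|PERpart[2]"))
```

Rows that share the same value and ID, such as PER[1] on tokens 1-1 and 1-2, can then be grouped into one multi-token annotation.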

Slot features

Slot features and their target annotations are written in two TAB-separated columns (first the feature column, then the target column). The target column records the sentence-token ID of the annotation that the slot points to.

Unlike other span layer features (which are separated by | character), multiple annotations for a slot feature are separated by the ; character.

Example: Span layer declaration in file header
Example: Span annotations and slot features
2-1	27-30	Bob	_	_	_	bob
2-2	31-40	auctioned	transaction	seller;goods;buyer	2-1;2-3[4];2-6
2-3	41-44	the	_	_	_	clock[4]
2-4	45-50	clock	_	_	_	clock[4]
2-5	52-54	to	_	_	_	_
2-6	55-59	John	_	_	_	john
2-7	59-60	.	_	_	_	_

Here, for example, at token 2-2, we have three slot annotations for the feature Roles, namely seller, goods, and buyer. The targets are on tokens 2-1, 2-3[4], and 2-6 respectively, which carry annotations of the layer webanno.custom.Lu with the values bob, clock and john.
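
Since the roles and the targets are stored in two parallel ;-separated lists, they can be paired up by position. A minimal sketch (the function name pair_slots is hypothetical, and the target IDs are kept as strings):

```python
def pair_slots(role_cell, target_cell):
    """Pair each role in the feature column with its target in the target column."""
    if role_cell == "_":
        return []  # no slot annotations on this row
    roles = role_cell.split(";")
    targets = target_cell.split(";")
    if len(roles) != len(targets):
        raise ValueError("role and target columns have different lengths")
    return list(zip(roles, targets))


print(pair_slots("seller;goods;buyer", "2-1;2-3[4];2-6"))
```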

Chain Annotations

In a chain annotation, two TAB-separated columns are used to represent the referenceType and the referenceRelation. A chain ID is attached to the referenceType to indicate which chain the annotation belongs to. The referenceRelation of the chain is represented by the relation value followed by -> and the CH-LINK number, where CH is the chain number and LINK is the link number (the position within the chain).

Example: Chain layer declaration in file header
Example: Chain annotations
1-1	0-2	He	pr[1]	coref->1-1
1-2	3-7	shot	_	_
1-3	8-15	himself	pr[1]	coref->1-2
1-4	16-20	with	_	_
1-5	21-24	his	pr[1]	*->1-3
1-6	25-33	revolver	_	_
1-7	33-34	.	_	_

In this example, token 1-3 is marked as pr[1], which indicates that the referenceType is pr and that it is part of the chain with the ID 1. The relation label is coref and the CH-LINK number is 1-2, which means that it belongs to chain 1 and is the second link in the chain.
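
The referenceRelation column can be decoded by splitting at the last arrow. The sketch below returns the label together with the chain and link numbers. The function name parse_chain_relation is hypothetical, and escaped arrows inside labels are not handled.

```python
def parse_chain_relation(cell):
    """Decode a cell such as 'coref->1-2' into (label, chain number, link number)."""
    if cell == "_":
        return None  # the token is not part of a chain
    label, _, position = cell.rpartition("->")
    chain, link = (int(value) for value in position.split("-"))
    return label, chain, link


print(parse_chain_relation("coref->1-2"))  # ('coref', 1, 2)
print(parse_chain_relation("*->1-3"))      # ('*', 1, 3)
```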

Relation Annotations

Relation annotations come in the last columns of the TSV file format. Just like for span annotations, every feature of a relation layer is represented in a separate TAB-separated column. In addition, one extra column (after all feature values) records the ID of the token/sub-token from which the arc of the relation annotation is drawn.

Example: Span and relation layer declaration in file header
Example: Span and relation annotations
1-1	0-3	Ms.	NNP	SUBJ	1-3
1-2	4-8	Haag	NNP	SBJ	1-3
1-3	9-14	plays	VBD	P|ROOT	1-5|1-3
1-4	15-22	Elianti	NNP	OBJ	1-3
1-5	23-24	.	.	_	_

In this example (consider token 1-1), column 4 (NNP) is the value of the feature PosValue of the de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS layer. Column 5 (SUBJ) records the value of the feature DependencyType of the de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency relation layer, whereas column 6 (1-3) shows from which governor (VBD) the dependency arc is drawn.

For relations, a single disambiguation ID is not sufficient. If a relation is ambiguous, then the source ID of the relation is followed by the source and target disambiguation ID separated by an underscore (_). If only one of the relation endpoints is ambiguous, then the other one appears with the ID 0. E.g. in the example below, the annotation on token 1-5 is ambiguous, but the annotation on token 1-1 is not.

Example: Disambiguation IDs in relations
#FORMAT=WebAnno TSV 3.3

#Text=This is a test .
1-1	0-4	This	*	_	_
1-2	5-7	is	_	_	_
1-3	8-9	a	_	_	_
1-4	10-14	test	_	_	_
1-5	15-16	.	*[1]|*[2]	*	1-1[0_1]
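
The source column of a relation, with or without the pair of disambiguation IDs, can be decoded as in the following sketch. The function name parse_relation_source is hypothetical. For unambiguous relations, both IDs are returned as None.

```python
import re

# Token or sub-token ID, optionally followed by [sourceID_targetID]
RELATION_SOURCE = re.compile(r"^(\d+-\d+(?:\.\d+)?)(?:\[(\d+)_(\d+)\])?$")


def parse_relation_source(cell):
    """Decode a cell such as '1-1[0_1]' into (unit ID, source ID, target ID)."""
    match = RELATION_SOURCE.match(cell)
    if match is None:
        raise ValueError(f"not a relation source: {cell!r}")
    unit, source_id, target_id = match.groups()
    if source_id is None:
        return unit, None, None
    return unit, int(source_id), int(target_id)


print(parse_relation_source("1-1[0_1]"))  # ('1-1', 0, 1)
print(parse_relation_source("1-3"))       # ('1-3', None, None)
```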


  • 3.3

  • Adds support for the optional stanza in the sentence header

  • 3.2

  • First time the format is fully documented

Appendix C: Troubleshooting

If the tool is kept open in the browser, but not used for a long period of time, you will have to log in again. For this, press the reload button of your browser.

If the tool does not react for more than 1 minute, please also reload and re-login.

We are collecting error reports to improve the tool. For this, the error must be reproducible: if you find a way to reproduce the error, please open an issue and describe it.