Language Technology Paves the Way for New Breakthroughs in Information and Communication Technology
Anders Søgaard is a professor at the Centre for Language Technology and Principal Investigator of the European Research Council (ERC) Starting Grant project LOWLANDS (parsing low-resource languages and domains).
Technology alone does not make good search engines or machine translation systems. Linguistics is just as important for processing large volumes of speech and text and for presenting information as accurately as possible; this is the research field of language technology. Reliable automatic analysis requires large volumes of data annotated by linguists. Such corpora are readily available for major languages like English, Chinese and German, but these make up a small minority of the world’s languages, and the corpora often cover only standard text domains such as news media and reference works. The result is an imbalance: language technology works much better for standard language use in the major languages than for other languages and domains. LOWLANDS seeks to redress this imbalance so that computers can also handle less widely spoken languages and more informal usage.
There is great potential for innovation in developing search engines that can recognize, analyse and link new types of information, e.g. from social media such as Twitter and Facebook. These domains are increasingly important sources of quick, personalized information about events, people and places. However, this information is often written in informal language, with slang and contractions, which poses challenges for current language technology. If accurate data acquisition from these domains could be automated, new areas of application would open up and new user needs could be met. The example of social media shows that progress in ICT is not just a matter for computer science and technical disciplines: linguistics, a field deeply rooted in the humanities, makes a vital contribution to optimizing ICT and expanding its scope.
To address this challenge, Anders Søgaard has established close working relationships with Google and other major companies in the field. These partnerships are not predicated on a predetermined result or product; instead, they systematically link basic research in language technology with the development of search engines and machine translation.
Search engines and data analysis are core functions in the use of web-based information and involve billions of users worldwide. From a democratization perspective, there is great benefit in expanding the range of languages and domains from which knowledge and information are derived. Anders Søgaard’s partnerships with companies like Google ensure an ongoing, mutual exchange of knowledge in which research in language technology has an impact through the companies’ development and innovation work.
The impact of language technology is documented in the following ways:
- Knowledge-sharing: Recordings of presentations or workshops at which Anders Søgaard or his colleagues have presented research to relevant companies or partners.
- Available data and resources: Collections of annotated data (metadata) and free-to-use language technology resources.
- Agreements: Written agreements, contracts and partnership agreements that have been entered into.
Professor, PhD, Centre for Language Technology, Department of Nordic Research
Tel. +45 35 32 90 65