Job offer

Post-doctoral position: Historical Handwritten Text Recognition & Information Extraction in Documents

4 Jul 2025

Job Information

Organisation/Company
La Rochelle Université
Research Field
Computer science
Researcher Profile
First Stage Researcher (R1)
Recognised Researcher (R2)
Positions
Postdoc Positions
Country
France
Application Deadline
Type of Contract
Temporary
Job Status
Full-time
Offer Starting Date
Is the job funded through the EU Research Framework Programme?
Not funded by an EU programme
Is the Job related to staff position within a Research Infrastructure?
No

Offer Description

La Rochelle Université calls for applications for a post-doctoral position in computer science, in the field of handwritten text recognition and machine learning, for historical census documents.

  • Duration: 12 months (renewable for a further 12 months)
  • Desired hiring date: 1 October 2025
  • Salary: 2570 € gross /month
  • Workplace: L3i laboratory in La Rochelle, France
  • Specialities: Computer Science / Machine Learning / Handwritten Text Recognition

Job Summary 
We are seeking a highly motivated and skilled Post-Doc in Machine Learning to join our research team focused on advancing Historical Text Recognition (HTR) systems.
This role will involve developing and optimizing end-to-end solutions for extracting meaningful information from degraded historical documents (circa 1690-1790). The ideal candidate will work with a variety of recent machine learning models and compare their performance against classical HTR and Named Entity Recognition (NER) systems. The role also involves working with multimodal architectures such as Vision-Language Learning Models (VLLM) to improve the explainability, performance, and usability of these systems.

Context and Description of the Project 
The DAI-CRéTDHI project proposes to mobilize and adapt the tools of digital and data sciences, demography and anthroponymy to contribute to a better understanding of the population of France from the 16th to the 19th century. It deploys two approaches: a retrospective approach on a national scale, based on aggregated data from old civil status records; and, on a few selected corpora, an "individual approach" that collects and attributes to each actor a number of characteristics: demographic (sex, age, marital status), family (fertility, household composition and position within it, etc.), relational (neighborhood of relatives, etc.), socio-professional (job, income level, etc.) and geographic (migrant or native, home address). Historians are well aware of the multiplicity of sources likely to provide detailed individual information on a significant number of clearly identified actors. Advances in data processing, in terms of engineering and visualization techniques, now make it possible to process very large volumes of data, provided that they are correctly structured and allow individuals and families to be tracked by name. It is also possible to carry out more or less automated matching and to enrich these large corpora with contextual information (e.g. geographic environment) to broaden these analyses. In addition to the data already held by the partners, collaborative indexing (Geneanet) will make it possible to extend these corpora in time and space.

Job description: 
The main objectives of this Post-Doc position are:

  • Propose End-to-End Systems for Historical Text Recognition (HTR)
    • Design and implement HTR systems to process 17th and 18th-century documents (circa 1690-1790), which may include a variety of scripts and degraded text conditions
    • Use multimodal Vision-Language Learning Models (VLLM) to extract information from historical documents, enabling enhanced information extraction via in-context learning
  • Comparison with classical HTR + NER Systems:
    • Evaluate and benchmark the performance of modern Transformer-based models against classical HTR systems such as Pylaia, Transkribus, etc.
    • Analyze differences in accuracy, speed, explainability, and readability between classical and Transformer-based systems
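
As an illustration of the accuracy comparison mentioned above: the posting does not name a metric, so the sketch below assumes character error rate (CER), a measure commonly used for HTR output. It is the edit distance between a reference transcription and a system's output, divided by the reference length. The function names and sample strings are hypothetical and not taken from the project.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # drop a reference character
                curr[j - 1] + 1,     # add a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Made-up strings for illustration only; they are not project data.
    reference = "Jean Martin, laboureur"
    outputs = {
        "system_a": "Jean Martln, laboureur",
        "system_b": "Jean Martin laboreur",
    }
    for name, text in outputs.items():
        print(name, round(character_error_rate(reference, text), 3))
```

A lower CER means the output is closer to the reference. In a real benchmark the score would normally be aggregated over a whole test set, and reported alongside speed and entity-level measures for the NER step.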

Particular attention will be paid to the analysis and explainability steps, in order to assess how acceptable the proposed systems are to humanities scholars.

In addition to this topic, the candidate will build on terminology extraction methods. The main objective of this young researcher grant is to make French scientific documents accessible to a broader audience, and thus improve the international visibility of publications in French-language scientific journals, by automatically translating keywords and entities into English. To that end, the project aims to develop and adapt recent advances in deep learning for terminology extraction and for cross-lingual, cross-domain information extraction.

The use case is based on the journal Sciences Eaux & Territoires (SET), a scientific and technical journal freely available online and published by Irstea since 2010. Its target audience includes public and private stakeholders and decision-makers involved in territorial development and environmental issues.

Concretely, using the terminology and abstracts of scientific articles that have been translated into English (on the one hand by their French-speaking authors and on the other by professional translators), the postdoctoral researcher will aim to develop tools that provide more effective machine translation. This will be validated both qualitatively (through comparative evaluations by professional translators) and quantitatively (by tracking changes in the number of accesses to the translated articles).

The project’s goal is to develop a prototype that can be generalized to other journals and other languages. 

Requirements

Research Field
Computer science
Education Level
PhD or equivalent
Skills/Qualifications

Candidate Profile: 
The candidate must hold a Ph.D. in computer science, computer engineering, signal processing, or applied mathematics, and must have significant research experience in at least two of the following areas:

  • Deep Learning with Transformers
  • Machine learning
  • Computer Vision or Natural Language Processing (knowledge and/or experience in both domains would be a plus)

The candidate's skills will include:

  • Mastery of the Python programming language and of a deep learning framework (PyTorch, TensorFlow, …)
  • Very good teamwork skills (the work will be carried out jointly with researchers from the L3i laboratory, the R&D department of the TEKLIA company and humanities scholars)
  • Good scientific writing skills, and fluency in written and spoken English

Additional Information

Selection process

To apply: 
Candidates for this position should send a CV and a cover letter (names and contact details of referees would be appreciated).

The application must be submitted via the dedicated application form (Job reference: RECH/L3i/25-09).

Work Location(s)

Number of offers available
1
Company/Institute
La Rochelle Université
Country
France
State/Province
Nouvelle Aquitaine
City
La Rochelle
Postal Code
17000

Contact

City
La Rochelle
Website
Street
23 avenue Albert Einstein, BP 33060
Postal Code
17031
E-Mail
antoine.doucet@univ-lr.fr
mickael.coustaty@univ-lr.fr
