At TTA's AI Reliability Center, we certified an AI-powered video interview assessment system for reliability in the high-stakes field of recruitment. This product, referred to here as the 'subject product,' completed a rigorous certification process against the requirements for AI reliability (TTAK.KO-10.1497), reflecting the developer's commitment to a trustworthy AI system.
By presenting this certification case, we aim to provide practical guidance on achieving reliable AI, which should be valuable for stakeholders considering the adoption of AI in recruitment.
The subject product evaluates candidates by analyzing their video interviews and responses with AI. It combines an assessment of soft skills and communication abilities suited to the organization and the job with behavioural event interview (BEI) techniques, which assess past behaviour to predict high job performance. Together, these methods provide a comprehensive evaluation to support decision-makers.
The subject product's users fall into two main groups: clients (organizations using the product for their recruitment processes) and candidates (individuals applying for positions and undergoing interviews).
This certification case documents adherence to the AI reliability enhancement requirements (TTAK.KO-10.1497), presented sequentially by requirement number.
The developer has implemented a comprehensive risk management plan covering risks that may arise during the development and operation stages. The plan involves nearly all departments (planning, development, quality assurance, operations, etc.), and the process is documented so the risks can be managed effectively.
Each department plays a part in risk management, identifying potential risks based on empirical and research-based knowledge of the AI lifecycle. The process includes:
The AI Reliability Governance Committee oversees the subject product's risk management at the organizational level. Its roles and responsibilities are defined in detail, making it central to ensuring the system's reliability and adherence to core ethical requirements.
To ensure AI reliability, the developer has established and operates an "AI Reliability Governance Committee." This committee has undertaken the following key activities:
The operational environment for the subject product includes the operational managers and interviewers (referred to as recruitment experts) from client companies using the product for their recruitment processes, as well as the interview candidates (referred to as applicants).
A working group including these users and the developers was formed to design the tests for the subject product. The tests were conducted in an environment identical to actual video interviews. The details of the tests created and performed are given below.
(Traceability of AI Decision-Making) The analysis results displayed on the operational page can be verified against information in the applicant database. Additionally, the applicant key can be used to trace the initial input information, such as the storage path of the originally uploaded video.
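As a minimal sketch of how such a lookup could work, assuming a relational store with hypothetical `applicants` and `analysis_results` tables joined on the applicant key (the product's actual schema is not disclosed):

```python
import sqlite3

def trace_applicant(db_path: str, applicant_key: str) -> dict:
    """Resolve an applicant key to the stored analysis result and the
    original video's storage path (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            """
            SELECT a.applicant_key, a.video_storage_path, r.analysis_result
            FROM applicants AS a
            JOIN analysis_results AS r ON r.applicant_key = a.applicant_key
            WHERE a.applicant_key = ?
            """,
            (applicant_key,),
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        raise KeyError(f"unknown applicant key: {applicant_key}")
    return {"applicant_key": row[0], "video_path": row[1], "result": row[2]}
```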
(Source of Training Data) Only the interview videos of actual applicants using the system were utilized as the source of training data, ensuring a relatively reliable data source. The subject product maintains an internal system (the "training data system") for processing, storing, and managing the training data. This system logs data usage, facilitating easy tracking of changes.
(Changes to Training Data) To manage changes to the data used for model training, the development organization maintains "data sheets" and "model cards." These documents enable the reproduction of specific datasets used by models.
| Type | Details |
| --- | --- |
| Data Sheet | Records all information required to reconstruct the same dataset from the training data system, along with the models that used the dataset and the research conducted with it. |
| Model Card | Contains an overview of the model: description, structure, training dataset, libraries used, input/output details, use cases, benchmark results, known usage issues, provision history and status, reliability records (bias, validity), and related models. |
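A minimal sketch of how a model card and data sheet might be represented as a linked pair; the field names paraphrase the table above and are illustrative, not the developer's actual format:

```python
from dataclasses import dataclass, field

@dataclass
class DataSheet:
    """What is needed to reconstruct a dataset from the training data system."""
    dataset_id: str
    reconstruction_query: str                # how to re-extract the same records
    related_models: list[str] = field(default_factory=list)
    related_research: list[str] = field(default_factory=list)

@dataclass
class ModelCard:
    """Summarizes a model and points at the data sheet it was trained on,
    so the model and its data are managed as a pair."""
    model_id: str
    description: str
    training_datasheet: DataSheet
    libraries: list[str] = field(default_factory=list)
    benchmark_results: dict[str, float] = field(default_factory=dict)
    reliability_notes: list[str] = field(default_factory=list)  # bias, validity
```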
(Refined Information) Since the interview videos uploaded through the actual system are used as the source of training data, primary data refinement occurs at upload time. This is handled by the "Information Insufficiency" process, which detects and prevents quality issues in the videos before and during the interview.
(Metadata) The training data system automatically attaches metadata to the uploaded data, so no separate specification is required. However, the following information can be accessed:
(Protected Variables) No demographic information is tagged in the training data. Therefore, protected variables are not explicitly designated or managed, but other factors that could cause bias in the results are defined and managed separately. This is detailed in Requirement 9.
(Labelling Training and Guidelines) A labelling workshop was conducted to educate labellers. During the workshop, documentation on data labelling standards (inspection criteria) and methods was provided and taught.
The strategy for ensuring the subject product's robustness is to preemptively block the inflow of anomalous data. In addition to the "Information Insufficiency" process described in Requirement 5, an "AI Supervisor" feature warns applicants about anomalous data and keeps it out of the system. During the interview recording, the system detects the anomalies listed in the table below; when anomalies such as capturing, recording, or screen sharing are detected, it displays a warning or enforces a logout.
| Feature | Details |
| --- | --- |
| Proxy test prevention | Compares the interview video with the profile picture to confirm that the same person is present. |
| Detection of accompanying persons | Checks whether any face other than the applicant's is detected. |
| Prevention of capturing, recording, and screen sharing | Detects whether the screen is being captured, recorded, or shared. |
| Mask detection | Detects whether a mask is worn to hide facial expressions. |
| Answer similarity check among applicants | Detects whether the applicant verbally repeats answers from a consultation or a previous applicant. |
By implementing these measures, the system aims to maintain the integrity and reliability of the data collected during the interview process, ensuring that the results are accurate and trustworthy.
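A minimal sketch of how the AI Supervisor's detection-to-response mapping and the answer-similarity check could look; the anomaly categories mirror the table above, while the forced-logout rule and the lexical-similarity measure are assumptions for illustration:

```python
from difflib import SequenceMatcher
from enum import Enum, auto

class Anomaly(Enum):
    PROXY_SUSPECTED = auto()   # face does not match the profile picture
    EXTRA_FACE = auto()        # a face other than the applicant's detected
    SCREEN_CAPTURE = auto()    # capturing, recording, or screen sharing
    MASK_DETECTED = auto()     # mask hides facial expressions
    SIMILAR_ANSWER = auto()    # answer repeats a prior applicant's answer

# Anomalies assumed here to trigger a forced logout rather than a warning.
FORCED_LOGOUT = {Anomaly.SCREEN_CAPTURE}

def supervisor_action(anomaly: Anomaly) -> str:
    """Map a detected anomaly to the supervisor's response."""
    return "force_logout" if anomaly in FORCED_LOGOUT else "show_warning"

def answer_similarity(transcript_a: str, transcript_b: str) -> float:
    """Rough lexical similarity (0..1) between two answer transcripts,
    a stand-in for whatever comparison the product actually uses."""
    return SequenceMatcher(None, transcript_a.lower(), transcript_b.lower()).ratio()
```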
(Human Bias in Data Collection) The source training data for the subject product is the interview videos recorded by the applicants. As described in Requirements 5 and 6, this data undergoes a cleaning process before being uploaded to the training system. Since no separate data collector is involved in this process, the intentional selection of source data does not occur.
(Physical Bias of Collection Devices) Applicants record their videos using their own devices. During the interview process using the subject product, users are provided with minimum device specifications and guidelines. The system collects inputs from a variety of devices within these specifications.
The development organization has established rules for using open-source libraries and frameworks and manages these checks through documented verification. To use open-source libraries and frameworks, the following aspects must be verified:
The management documents record the following details:
| Document | Details |
| --- | --- |
| Total open-source usage status | Open-source name, purpose of use, homepage, license, version, activity level, creation date |
| Open-source usage status by model | Model ID, applicable service, department using it, open-source name, license, license display obligation, GitHub link, activity level (number of stars), creation date |
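A minimal sketch of how a per-model usage record could be held in code and used to catch missing license attributions; the field names mirror the table above, and `attribution_gaps` is a hypothetical helper, not part of the developer's tooling:

```python
from dataclasses import dataclass

@dataclass
class OpenSourceEntry:
    """One row of the per-model open-source usage record."""
    model_id: str
    service: str
    department: str
    name: str
    license: str
    attribution_required: bool   # license display obligation
    github_url: str
    stars: int                   # activity level
    created: str

def attribution_gaps(entries: list[OpenSourceEntry], displayed: set[str]) -> list[str]:
    """List libraries whose license must be displayed but currently is not."""
    return [e.name for e in entries
            if e.attribution_required and e.name not in displayed]
```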
When updating models or applying new models to the AI system, the company follows a dedicated procedure to verify their validity and check them for bias. The development company has established step-by-step verification datasets for all defined verification elements: the initial construction of several verification sets is complete, and datasets for the remaining elements are being developed for future experiments.
(Procedure) Verification proceeds in three stages: internal model verification, official validity verification, and model deployment testing.
(Method) Utilize verification datasets built under actual video interview conditions to validate the significance and validity of verification metrics.
(Verification Elements) Define elements that may cause bias as verification elements and check for significant biases in the AI inference results.
Because accurate gender labels were not tagged in the data, a classification model was used to estimate gender and construct the verification set.
| Type | Details |
| --- | --- |
| Verification set construction completed | Brightness, recording equipment, gender, glasses, volume, pitch, speech information, camera angle, recording distance |
| Verification set construction in progress | Resolution, hairstyle, speech speed, pronunciation accuracy, specific behaviours, accessories, lighting, technical jargon, etc. |
(Verification Metrics) Various metrics are used for each verification element.
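As an illustration of one such check, a minimal sketch that measures the gap in mean inference scores across the subgroups of a verification element (e.g., brightness bands or estimated gender); the 0.05 threshold is an assumed placeholder, not the developer's criterion:

```python
from statistics import mean

def subgroup_gap(scores_by_group: dict[str, list[float]]) -> float:
    """Largest difference in mean inference score across subgroups
    of one verification element."""
    means = {group: mean(scores) for group, scores in scores_by_group.items()}
    return max(means.values()) - min(means.values())

def flag_bias(scores_by_group: dict[str, list[float]],
              threshold: float = 0.05) -> bool:
    """Flag the element for review when the subgroup gap exceeds
    the (assumed) threshold."""
    return subgroup_gap(scores_by_group) > threshold
```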
The following measures have been established to address potential attacks on the AI system, such as model extraction and model evasion.
(AI Model Specifications) The subject product's inference model is managed as a pair with its data, using model cards and data sheets (see Requirement 4) to specify and manage model versions.
(Inference Basis) Various XAI (explainable AI) techniques were tested to identify factors that significantly influence bias, although none has yet been applied in production. The techniques tested include:
(Interpretation of Inference Results) The development company undertakes the following activities to aid in the interpretation of the model's inference results:
(Bias from User Interface) User input is received through video input devices, and the system displays queries in text. Bias mitigation related to devices is included in the verification elements as part of the company’s validation procedures (see Requirement 9).
(Bias from Interaction Methods) The system functions as a one-way communication tool, and efforts have been made to enhance the clarity of questions to reduce bias based on the applicant’s understanding.
(User Error Guidance and Exception Handling Policies) Applicants are provided in advance with common issues and their solutions. A dedicated, always-open channel is available for users to report exceptions or system problems.
(Defense Measures Against System Attacks) To defend against attacks, the measures outlined in Requirement 10 are implemented. The system undergoes regular vulnerability scans, and actions are taken based on the results.
(Human Intervention in AI Inference Results) The primary function of the subject product is to support interviewers' decisions, and the system provides reference screens for final decision-making. An option to replace interviews entirely is available, but that decision is left to the client; clients who replace interviews with the system may incorporate human intervention according to their own policies.
(System Error Monitoring and Notification) Continuous server monitoring and auto-scaling technology are applied to prevent server anomalies. Notifications are sent to the development team via collaboration tools if a server issue occurs, prompting immediate response based on internal procedures.
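A minimal sketch of the notification step, assuming the collaboration tool exposes an incoming webhook that accepts a JSON payload (the actual tool and payload format are not disclosed):

```python
import json
import urllib.request

def notify_dev_team(webhook_url: str, message: str) -> None:
    """Post a server-anomaly alert to a collaboration-tool webhook."""
    payload = json.dumps({"text": message}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # fire the alert; error handling omitted
```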
(User Surveys) Surveys were conducted with applicants and client companies to gather feedback, which was then integrated into the system.
(Providing Interpretation Materials) Guides explaining the inference results page, including the meaning of technical terms and result metrics, are provided to interviewers accessing the AI inference results.
(Correct Use of Service) Clients are provided with explanatory materials outlining the purpose and goals of the subject product. As the primary operators, clients are guided to explain the service's limitations and scope to applicants, including the fact that AI analyzes the interview videos.
(Human Intervention in Decision-Making) The product’s inferences are meant to support client interviewers' decisions. Adopting these inferences for final decision-making depends on the client's policies. If clients fully replace interview processes with the product, they are strongly advised to disclose this and establish procedures for human intervention.