/sysroot Mount Failure
Context
System Specifications
Hardware platform: TaiShan200 (model 1280V2) Kernel version: 5.10.0-153.1.0.81.oe2203sp2.aarch64
Software Version
openEuler 22.03-LTS-SP2
Symptom
During repeated power cycling tests, intermittent mount failures occur when the system fails to load ext4 or VFAT driver into the kernel during boot.

Possible Causes
This low-probability boot-time issue requires meticulous investigation. The diagnostic process follows a progressive elimination approach.
1. Mount Operation Failure

Debugging with udev.log_priority=debug rd.debug=1 reveals:

The system call fails during mount. Additional kernel logging shows:

Critical findings regarding VFAT mounting:
- The kernel attempts to load the required module via
request_module1whenget_fs_type1detects its absence. - Though the execution path reaches user-space modprobe invocation (
call_modprobe->call_usermodehelper_exec), it returns error code 256.
Diagnosis: User-space module loading fails with exit code 256, warranting investigation into the modprobe failure mechanism.
2. Module Load Failure
Because user-space logs are directed to the command line and unavailable in emergency mode, kernel-level debug prints were implemented:


Log analysis confirms that the driver failed to load during kernel execution via modprobe.
DIagnosis: Either module_sig_check or setup_load_info returned error -129, causing the failure.
3. Signature Verification Failure
The verification process fails with error code -129, occurring through the following call chain:
load_module->module_sig_check->mod_verify_sig->verify_pkcs7_signature->verify_pkcs7_message_sig->pkcs7_validate_trust->pkcs7_validate_trust_one->verify_signature->public_key_verify_signature->crypto_akcipher_verify->pkcs1pad_verify->pkcs1pad_verify_complete

Log analysis shows that while out_buf remains unchanged in both successful and failed verifications, the values of req_ctx->out_buf + ctx->key_size differ, indicating an anomaly in the data. The problematic data is retrieved via sg_pcopy_to_buffer and corresponds to the digest portion of the signature. This digest is produced by hashing the original data and encrypting it with a private key. During verification, the public key decrypts the signature to extract the digest, while the system independently recalculates the digest from the original data. A mismatch between these values suggests either data corruption or tampering.
Additional logging confirmed that pkcs7->data remains consistent in both passing and failing cases, narrowing the issue to the digest comparison phase. 
Further investigation revealed that all failures occurred when using the sha256-ce cryptographic driver, which employs ARMv8 CPU instructions for accelerated hashing. Since the issue could not be reproduced on other systems with the same kernel version, the problem appears to be hardware-specific.

A test script repeatedly loading and unloading the kernel module successfully reproduced the error in the affected environment.

Diagnosis: The evidence points to a CPU hardware issue.
4. CPU Hardware Defect
The failure does not occur in standard test environments. Hardware diagnostics revealed:
The affected processor is from an early engineering sample batch.
Specific Arm instruction executions fail on:
txtAdvsimdLoadStore LCRTSveVectorMoveDiagnostic tests confirm failures exclusively on CPU cores 130 and 131.
When the kernel module stress test script is pinned to cores 130/131, the signature verification failure consistently occurs. The same test passes when restricted to core 129.

Diagnosis: Defective CPU silicon in cores 130/131 causes cryptographic instruction failures.
Solution
Hardware replacement:
Physically replacing the faulty CPU with a production-qualified unit provides a permanent solution.
Software mitigation:
Implement CPU affinity controls to exclude the malfunctioning cores from critical operations as a temporary mitigation.
Licensed under the MulanPSL2