Academic Humanize Qwen2.5-7B DPO-v2 LoRA

This repository contains the DPO-v2 LoRA adapter for Academic Humanize, a post-training project for reducing AI-like patterns in academic English while preserving meaning, terminology, citations, numbers, and logical relationships.

The adapter is trained on top of:

Qwen/Qwen2.5-7B-Instruct

This is a PEFT LoRA adapter, not a full standalone base model. To use it, load the base model first and then attach this adapter with peft.

Model Details

Model Description

Academic Humanize is designed for academic paragraph rewriting. Given an AI-like academic draft, the model rewrites it into a more natural scholarly English paragraph while keeping the original meaning intact.

The target task format is:

instruction + AI-like academic input -> humanized academic output

The project uses a two-stage post-training pipeline:

  1. QLoRA SFT: teaches the model the academic humanization format and basic rewriting behavior.
  2. SPIN-style iterative DPO: uses model-generated outputs as rejected responses and human references as chosen responses to further align the model toward more natural academic writing.

This repository hosts the DPO-v2 adapter, which is the second DPO iteration after SFT and DPO-v1.

  • Developed by: XiaoXu123123
  • Model type: PEFT LoRA adapter for causal language modeling
  • Language: English
  • Base model: Qwen/Qwen2.5-7B-Instruct
  • Training method: QLoRA SFT followed by two rounds of SPIN-style iterative DPO (DPO-v1, then DPO-v2)
  • Task: Academic text humanization / academic rewriting
  • License: MIT

Intended Use

Direct Use

This adapter is intended for academic English rewriting. It is useful when the input text is grammatically correct but sounds overly generic, templated, or AI-like.

Example input:

This study endeavors to explore the multifaceted role of adaptive feedback mechanisms in online learning environments. The results underscore the pivotal importance of personalized intervention for improving student engagement.

Expected output style:

This study examines how adaptive feedback mechanisms support online learning. The results show that personalized intervention can improve student engagement.

Downstream Use

This adapter can be used as a component in:

  • academic writing assistants
  • paper polishing tools
  • AI-text humanization experiments
  • preference optimization / DPO research demos
  • evaluation pipelines for academic rewriting

Out-of-Scope Use

This model is not intended for:

  • generating fabricated academic claims
  • rewriting text while changing its meaning
  • bypassing academic integrity policies
  • replacing human proofreading for final publication
  • domains where factual precision must be professionally verified

How to Use

Install dependencies:

pip install transformers peft accelerate torch

Load the base model and adapter:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen2.5-7B-Instruct"
adapter_id = "XiaoXu123123/academic-humanize-qwen25-7b-dpo-v2-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

Example inference:

instruction = "Rewrite the following academic paragraph to make it more natural and less AI-like while preserving the original meaning, terminology, numbers, citations, and logical relationships."

input_text = """This study endeavors to explore the multifaceted role of adaptive feedback mechanisms in online learning environments. The results underscore the pivotal importance of personalized intervention for improving student engagement."""

messages = [
    {"role": "system", "content": "You are an academic English rewriting assistant. Reduce AI-like wording while preserving meaning."},
    {"role": "user", "content": f"{instruction}\n\nInput:\n{input_text}"}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

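# Greedy decoding. The sampling parameters are set to None so that the
# sampling defaults in the base model's generation config do not trigger warnings.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        temperature=None,
        top_p=None,
        top_k=None,
    )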

generated = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)

print(generated.strip())

Training Details

Training Data

The training data was constructed for the Academic Humanize V2 task. Each sample contains:

instruction
input: AI-like academic draft
output: human or high-quality academic reference
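
As an illustration, a single record in this format might look like the sketch below. The field names come from the list above and the text reuses this card's own example; storing records as JSONL and the helper name to_jsonl_line are assumptions, since the card does not state the on-disk format.

```python
import json

# Hypothetical record: field names follow the card, the text reuses its example.
sample = {
    "instruction": (
        "Rewrite the following academic paragraph to make it more natural and "
        "less AI-like while preserving the original meaning, terminology, "
        "numbers, citations, and logical relationships."
    ),
    "input": (
        "This study endeavors to explore the multifaceted role of adaptive "
        "feedback mechanisms in online learning environments."
    ),
    "output": (
        "This study examines how adaptive feedback mechanisms support "
        "online learning."
    ),
}

REQUIRED_FIELDS = {"instruction", "input", "output"}


def to_jsonl_line(record: dict) -> str:
    """Serialize one record as a single JSONL line after checking its fields."""
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # ensure_ascii=False keeps any non-ASCII characters readable in the file.
    return json.dumps(record, ensure_ascii=False)


line = to_jsonl_line(sample)
```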

The full training corpus is not released in this model repository because it was derived from academic-paper text and project-specific processing. The GitHub repository provides toy examples and scripts for reproducing the pipeline structure.

Training Procedure

The full training pipeline contains:

  1. Academic paragraph collection and filtering.
  2. AI-like draft construction.
  3. QLoRA SFT on Qwen2.5-7B-Instruct.
  4. DPO-v1 using:
    • chosen = human reference
    • rejected = SFT model prediction
  5. DPO-v2 using:
    • chosen = human reference
    • rejected = DPO-v1 model prediction
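
Steps 4 and 5 differ only in which model produces the rejected response. A minimal sketch of the pair construction follows; build_preference_pairs and the generate callable are hypothetical names, the prompt/chosen/rejected keys are a common preference-data layout rather than something the card specifies, and skipping pairs whose two sides are identical is an added assumption.

```python
from typing import Callable


def build_preference_pairs(
    samples: list[dict],
    generate: Callable[[str, str], str],
) -> list[dict]:
    """Pair each human reference (chosen) with a model prediction (rejected).

    `generate` stands in for the previous-stage model: the SFT adapter when
    building DPO-v1 data, the DPO-v1 adapter when building DPO-v2 data.
    """
    pairs = []
    for s in samples:
        chosen = s["output"]
        rejected = generate(s["instruction"], s["input"])
        # Assumption: an identical pair carries no preference signal, so drop it.
        if rejected.strip() == chosen.strip():
            continue
        pairs.append(
            {
                "prompt": f'{s["instruction"]}\n\nInput:\n{s["input"]}',
                "chosen": chosen,
                "rejected": rejected,
            }
        )
    return pairs
```

Running the same function twice, once per model stage, gives the two datasets for the iterative procedure.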

Training Hyperparameters

The final DPO-v2 training used conservative preference-optimization settings to reduce semantic drift:

base model: Qwen/Qwen2.5-7B-Instruct
adapter method: LoRA / PEFT
training method: iterative DPO
learning rate: 1e-6
beta: 0.05
epochs: 1
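
To show what beta controls, here is a sketch of the standard sigmoid DPO loss for one preference pair, written in plain Python. The card does not say which DPO loss variant was used, so the formula is an assumption; only beta = 0.05 is taken from the settings above, and the log-probability values in the usage note are made up.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.05,
) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)).

    The margin measures how much more the policy prefers the chosen response
    over the rejected one, relative to the frozen reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # Numerically stable -log(sigmoid(z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy equals the reference, the margin is zero and the loss is log 2. A small beta scales the margin down, so each pair pushes the policy away from the reference more gently, which fits the stated goal of limiting semantic drift.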

Evaluation

Evaluation was performed on a held-out Academic Humanize validation set of 346 samples.

The project uses two evaluation layers:

  1. Automatic semantic metrics
    • BERTScore-F1
    • chrF++
    • BLEU
    • TER
    • format violation rate
  2. LLM-as-Judge
    • lexical markers
    • structural patterns
    • naturalness
    • semantic faithfulness
    • terminology accuracy
    • edit value

Automatic Metrics

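| Model    | BERTScore-F1 | chrF++ | BLEU  | TER   | Format Violation |
|----------|--------------|--------|-------|-------|------------------|
| SFT LoRA | 0.9738       | 84.72  | 72.01 | 24.93 | 0.023            |
| DPO-v1   | 0.9664       | 78.26  | 63.95 | 31.73 | 0.023            |
| DPO-v2   | 0.9709       | 81.89  | 68.95 | 27.73 | 0.023            |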

LLM-as-Judge Results

Judge model: deepseek-v4-flash

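| Model    | Judge Norm | Total | Naturalness | Semantic | Terminology | Edit Value |
|----------|------------|-------|-------------|----------|-------------|------------|
| SFT LoRA | 0.9003     | 7.202 | 1.725       | 1.731    | 0.994       | 0.939      |
| DPO-v1   | 0.9241     | 7.393 | 1.827       | 1.633    | 0.988       | 0.965      |
| DPO-v2   | 0.9223     | 7.379 | 1.795       | 1.691    | 0.991       | 0.957      |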
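
The card does not define how Judge Norm is computed. The reported values are consistent with Total divided by a maximum score of 8, and the short check below tests that reading against the three rows. The divisor (ASSUMED_MAX_TOTAL) is inferred from the numbers, not a documented formula.

```python
# (Judge Norm, Total) as reported in the table above.
rows = {
    "SFT LoRA": (0.9003, 7.202),
    "DPO-v1": (0.9241, 7.393),
    "DPO-v2": (0.9223, 7.379),
}

# Assumption: the judge rubric has a maximum total of 8 points.
ASSUMED_MAX_TOTAL = 8.0

for name, (norm, total) in rows.items():
    implied = total / ASSUMED_MAX_TOTAL
    print(f"{name}: total / 8 = {implied:.5f}, reported norm = {norm:.4f}")

# Largest disagreement between the implied and the reported normalized score.
max_gap = max(abs(total / ASSUMED_MAX_TOTAL - norm) for norm, total in rows.values())
```

The gap stays below 1e-4 for all three rows, which is within what rounding of the reported totals would explain.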

Summary

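DPO-v2 keeps most of the judge-score gain from DPO-v1 (Judge Norm 0.9223 vs. 0.9241, against 0.9003 for SFT) while recovering part of the semantic fidelity lost in DPO-v1 (BERTScore-F1 0.9709 vs. 0.9664; judge Semantic score 1.691 vs. 1.633). It does not reach the SFT adapter's BERTScore-F1 of 0.9738. Among the 7B adapters trained in this project, it offers the best balance between naturalness and faithfulness.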

Bias, Risks, and Limitations

  • The model may still introduce subtle semantic drift during rewriting.
  • The model is optimized for academic English and may not generalize well to casual writing or other languages.
  • The model should not be used to fabricate claims, citations, results, or academic evidence.
  • LLM-as-Judge scores are useful for comparison but should not replace human review.
  • The training data and evaluation set are project-specific, so results may vary across domains.

Recommendations

For high-stakes academic use, users should manually verify:

  • factual claims
  • numbers and units
  • citations
  • domain terminology
  • logical strength of conclusions
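
Part of this checklist can be pre-screened automatically before manual review. The sketch below flags numbers and citations that appear in the source paragraph but are missing from the rewrite. The regular expressions and the helper name dropped are illustrative assumptions: they cover only simple numerals, bracketed numeric citations, and parenthesized author-year citations, so they do not replace a human check.

```python
import re
from collections import Counter

# Integers and decimals, with an optional percent sign (e.g. 346, 12.5%).
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?%?")
# Bracketed numeric citations like [3, 4] or parenthesized ones ending in a year.
CITATION_RE = re.compile(
    r"\[\d+(?:\s*[,-]\s*\d+)*\]|\([^()]*(?:19|20)\d{2}[a-z]?\)"
)


def dropped(pattern: re.Pattern, source: str, rewrite: str) -> list[str]:
    """Return items matched in `source` that are missing, by count, from `rewrite`."""
    diff = Counter(pattern.findall(source)) - Counter(pattern.findall(rewrite))
    return sorted(diff.elements())


source = "Accuracy rose by 12.5% across 346 samples (Smith et al., 2021) [3, 4]."
rewrite = "Accuracy rose by 12% across many samples."

missing_numbers = dropped(NUMBER_RE, source, rewrite)
missing_citations = dropped(CITATION_RE, source, rewrite)
```

An empty result means only that every matched item survived; units, terminology, and the strength of conclusions still need a manual read.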

Technical Specifications

Model Architecture and Objective

  • Base architecture: Qwen2.5-7B-Instruct
  • Adapter type: LoRA
  • Library: PEFT
  • Objective: DPO preference optimization after supervised fine-tuning

Software

  • PEFT 0.13.2
  • Transformers
  • TRL
  • PyTorch

Citation

If you use this adapter or the associated pipeline, please cite the GitHub repository:

@misc{academic_humanize_2026,
  title = {Academic Humanize: Post-training LLMs for Academic Text Humanization},
  author = {XiaoXu123123},
  year = {2026},
  howpublished = {\url{https://github.com/haibarazz/academic-humanize}}
}

Model Card Contact

For questions or issues, please contact:

2812156857@qq.com