Academic Humanize Qwen2.5-7B DPO-v2 LoRA

This repository contains the DPO-v2 LoRA adapter for Academic Humanize, a post-training project for reducing AI-like patterns in academic English while preserving meaning, terminology, citations, numbers, and logical relationships.

The adapter is trained on top of:

Qwen/Qwen2.5-7B-Instruct

This is a PEFT LoRA adapter, not a full standalone base model. To use it, load the base model first and then attach this adapter with peft.

Model Details

Model Description

Academic Humanize is designed for academic paragraph rewriting. Given an AI-like academic draft, the model rewrites it into a more natural scholarly English paragraph while keeping the original meaning intact.

The target task format is:

instruction + AI-like academic input -> humanized academic output

The project uses a two-stage post-training pipeline:

QLoRA SFT: teaches the model the academic humanization format and basic rewriting behavior.
SPIN-style iterative DPO: uses model-generated outputs as rejected responses and human references as chosen responses to further align the model toward more natural academic writing.

This repository hosts the DPO-v2 adapter, which is the second DPO iteration after SFT and DPO-v1.

Developed by: XiaoXu123123
Model type: PEFT LoRA adapter for causal language modeling
Language: English
Base model: Qwen/Qwen2.5-7B-Instruct
Training method: QLoRA SFT followed by two rounds of SPIN-style iterative DPO (DPO-v1, then DPO-v2)
Task: Academic text humanization / academic rewriting
License: MIT

Model Sources

GitHub Repository: https://github.com/haibarazz/academic-humanize
Base Model: https://huggingface.co/Qwen/Qwen2.5-7B-Instruct

Intended Use

Direct Use

This adapter is intended for academic English rewriting. It is useful when the input text is grammatically correct but sounds overly generic, templated, or AI-like.

Example input:

This study endeavors to explore the multifaceted role of adaptive feedback mechanisms in online learning environments. The results underscore the pivotal importance of personalized intervention for improving student engagement.

Expected output style:

This study examines how adaptive feedback mechanisms support online learning. The results show that personalized intervention can improve student engagement.

Downstream Use

This adapter can be used as a component in:

academic writing assistants
paper polishing tools
AI-text humanization experiments
preference optimization / DPO research demos
evaluation pipelines for academic rewriting

Out-of-Scope Use

This model is not intended for:

generating fabricated academic claims
rewriting text while changing its meaning
bypassing academic integrity policies
replacing human proofreading for final publication
domains where factual precision must be professionally verified

How to Use

Install dependencies:

pip install transformers peft accelerate torch

Load the base model and adapter:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen2.5-7B-Instruct"
adapter_id = "XiaoXu123123/academic-humanize-qwen25-7b-dpo-v2-lora"

# Load the tokenizer (and its chat template) from the base model.
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)

# Load the base weights in float16; device_map="auto" lets accelerate place them on the available devices.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter to the base model and switch to inference mode.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

Example inference:

instruction = "Rewrite the following academic paragraph to make it more natural and less AI-like while preserving the original meaning, terminology, numbers, citations, and logical relationships."

input_text = """This study endeavors to explore the multifaceted role of adaptive feedback mechanisms in online learning environments. The results underscore the pivotal importance of personalized intervention for improving student engagement."""

messages = [
    {"role": "system", "content": "You are an academic English rewriting assistant. Reduce AI-like wording while preserving meaning."},
    {"role": "user", "content": f"{instruction}\n\nInput:\n{input_text}"}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        temperature=None,
        top_p=None,
    )

generated = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)

print(generated.strip())

Training Details

Training Data

The training data was constructed for the Academic Humanize V2 task. Each sample contains:

instruction
input: AI-like academic draft
output: human or high-quality academic reference
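
As a rough illustration, one record in this format might look like the sketch below. The three field names come from the list above, and the texts reuse the example pair shown earlier in this card. `to_user_prompt` is a hypothetical helper, and the `Input:` separator is assumed from the inference example rather than confirmed as the training-time format.

```python
# Illustrative record only; the texts are the example pair from this card.
sample = {
    "instruction": (
        "Rewrite the following academic paragraph to make it more natural and "
        "less AI-like while preserving the original meaning, terminology, "
        "numbers, citations, and logical relationships."
    ),
    "input": (
        "This study endeavors to explore the multifaceted role of adaptive "
        "feedback mechanisms in online learning environments."
    ),
    "output": (
        "This study examines how adaptive feedback mechanisms support online learning."
    ),
}


def to_user_prompt(record):
    """Join instruction and input the same way the inference example does (assumed format)."""
    return f"{record['instruction']}\n\nInput:\n{record['input']}"


print(to_user_prompt(sample))
```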

The full training corpus is not released in this model repository because it was derived from academic-paper text and project-specific processing. The GitHub repository provides toy examples and scripts for reproducing the pipeline structure.

Training Procedure

The full training pipeline contains:

Academic paragraph collection and filtering.
AI-like draft construction.
QLoRA SFT on Qwen2.5-7B-Instruct.
DPO-v1 using:
- chosen = human reference
- rejected = SFT model prediction
DPO-v2 using:
- chosen = human reference
- rejected = DPO-v1 model prediction
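
The pairing rule used in both DPO rounds can be sketched as follows. This is a minimal illustration, not the project's script: `build_preference_pairs` is a hypothetical name, the `prompt` / `chosen` / `rejected` keys follow the convention commonly used for preference datasets (for example by TRL), and skipping pairs where the prediction already equals the reference is an assumption added here.

```python
def build_preference_pairs(records, predictions):
    """Pair each human reference (chosen) with the current model's output (rejected)."""
    if len(records) != len(predictions):
        raise ValueError("records and predictions must align one-to-one")
    pairs = []
    for record, prediction in zip(records, predictions):
        reference = record["output"].strip()
        rejected = prediction.strip()
        # Assumption: an empty or identical prediction carries no preference signal.
        if not rejected or rejected == reference:
            continue
        pairs.append({
            "prompt": f"{record['instruction']}\n\nInput:\n{record['input']}",
            "chosen": reference,
            "rejected": rejected,
        })
    return pairs
```

For DPO-v1 the `predictions` would come from the SFT model, and for DPO-v2 from the DPO-v1 model, with the same human references as `chosen` in both rounds.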

Training Hyperparameters

The final DPO-v2 training used conservative preference-optimization settings to reduce semantic drift:

base model: Qwen/Qwen2.5-7B-Instruct
adapter method: LoRA / PEFT
training method: iterative DPO
learning rate: 1e-6
beta: 0.05
epochs: 1
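
To show where beta enters, here is a minimal sketch of the standard DPO loss for a single preference pair, computed from sequence log-probabilities. It assumes the standard sigmoid form of the objective; the card does not state which loss variant the project used, and the log-probability values in the usage lines are made up for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.05):
    """Standard DPO loss for one pair: -log(sigmoid(beta * log-ratio gap))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Illustrative (made-up) log-probabilities.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # policy equals reference
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))   # policy already prefers the chosen text
```

When the policy matches the reference, the margin is zero and the loss equals log 2; the loss falls below that as the policy assigns relatively more probability to the chosen response than the reference does.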

Evaluation

Evaluation was performed on a held-out Academic Humanize validation set of 346 samples.

The project uses two evaluation layers:

Automatic semantic metrics
- BERTScore-F1
- chrF++
- BLEU
- TER
- format violation rate
LLM-as-Judge
- lexical markers
- structural patterns
- naturalness
- semantic faithfulness
- terminology accuracy
- edit value

Automatic Metrics

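Higher is better for BERTScore-F1, chrF++, and BLEU; lower is better for TER and format violation rate.

Model	BERTScore-F1	chrF++	BLEU	TER	Format Violation Rate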
SFT LoRA	0.9738	84.72	72.01	24.93	0.023
DPO-v1	0.9664	78.26	63.95	31.73	0.023
DPO-v2	0.9709	81.89	68.95	27.73	0.023

LLM-as-Judge Results

Judge model: deepseek-v4-flash

Model	Judge Norm	Total	Naturalness	Semantic	Terminology	Edit Value
SFT LoRA	0.9003	7.202	1.725	1.731	0.994	0.939
DPO-v1	0.9241	7.393	1.827	1.633	0.988	0.965
DPO-v2	0.9223	7.379	1.795	1.691	0.991	0.957
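
The card does not define how Judge Norm relates to Total. The reported values are consistent with Total divided by 8, which the short check below illustrates. The maximum of 8 is an inference from the table, not a documented fact, and `ASSUMED_MAX_TOTAL` is a name introduced here.

```python
# (Judge Norm, Total) as reported in the table above.
judge_rows = {
    "SFT LoRA": (0.9003, 7.202),
    "DPO-v1": (0.9241, 7.393),
    "DPO-v2": (0.9223, 7.379),
}

ASSUMED_MAX_TOTAL = 8.0  # inferred from the table, not stated in the card

deviations = {
    name: abs(total / ASSUMED_MAX_TOTAL - judge_norm)
    for name, (judge_norm, total) in judge_rows.items()
}

for name, deviation in deviations.items():
    print(f"{name}: |Total / 8 - Judge Norm| = {deviation:.6f}")
```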

Summary

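DPO-v2 keeps most of DPO-v1's judge-score improvement over SFT (Judge Norm 0.9223 vs. 0.9241, against 0.9003 for SFT) while recovering part of the semantic fidelity that DPO-v1 lost (BERTScore-F1 0.9709 vs. 0.9664, against 0.9738 for SFT). Among the 7B adapters trained in this project, it offers the best trade-off between naturalness and semantic fidelity.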
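
The trade-off in this summary can be put in rough numbers using the two tables. The sketch below uses only the reported Judge Norm and BERTScore-F1 values; `fraction_retained` is a helper name introduced here, and treating these two columns as the proxies for "judge-score improvement" and "semantic fidelity" is an assumption.

```python
def fraction_retained(baseline, peak, final):
    """Share of the baseline-to-peak change that is still present at `final`."""
    return (final - baseline) / (peak - baseline)


# Judge Norm: SFT -> DPO-v1 gain, and how much of it DPO-v2 keeps.
judge_gain_kept = fraction_retained(0.9003, 0.9241, 0.9223)

# BERTScore-F1: SFT -> DPO-v1 drop, and how much of it DPO-v2 recovers.
bert_drop_recovered = 1.0 - fraction_retained(0.9738, 0.9664, 0.9709)

print(f"Judge-score gain kept by DPO-v2: {judge_gain_kept:.0%}")
print(f"BERTScore-F1 drop recovered by DPO-v2: {bert_drop_recovered:.0%}")
```

On these figures, DPO-v2 keeps roughly 92% of the judge-score gain and recovers roughly 61% of the BERTScore-F1 drop.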

Bias, Risks, and Limitations

The model may still introduce subtle semantic drift during rewriting.
The model is optimized for academic English and may not generalize well to casual writing or other languages.
The model should not be used to fabricate claims, citations, results, or academic evidence.
LLM-as-Judge scores are useful for comparison but should not replace human review.
The training data and evaluation set are project-specific, so results may vary across domains.

Recommendations

For high-stakes academic use, users should manually verify:

factual claims
numbers and units
citations
domain terminology
logical strength of conclusions
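
Part of this checklist can be pre-screened automatically. The sketch below flags numbers and citations that appear in the source paragraph but not in the rewrite. It is a heuristic aid rather than a substitute for manual review: `missing_items` is a hypothetical helper, and the two regular expressions assume author-year citations such as "(Smith et al., 2020)" and numeric markers such as "[12]".

```python
import re

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?%?")
# Assumed citation styles: "(Author ..., 2020)" parentheticals and "[12]" / "[3, 4]" markers.
CITATION_RE = re.compile(r"\([A-Z][^()]*?\d{4}[a-z]?\)|\[\d+(?:\s*[,-]\s*\d+)*\]")


def missing_items(source, rewrite):
    """Return numbers and citations found in `source` but absent from `rewrite`."""
    report = {}
    for label, pattern in (("numbers", NUMBER_RE), ("citations", CITATION_RE)):
        kept = set(pattern.findall(rewrite))
        report[label] = sorted(set(pattern.findall(source)) - kept)
    return report


source = "Accuracy rose by 12.5% across 346 samples (Smith et al., 2020) [3]."
rewrite = "Accuracy increased by 12.5% (Smith et al., 2020)."
print(missing_items(source, rewrite))
```

An empty report does not prove the rewrite is faithful; it only means no number or citation of the assumed forms was dropped.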

Technical Specifications

Model Architecture and Objective

Base architecture: Qwen2.5-7B-Instruct
Adapter type: LoRA
Library: PEFT
Objective: DPO preference optimization after supervised fine-tuning

Software

PEFT 0.13.2
Transformers
TRL
PyTorch

Citation

If you use this adapter or the associated pipeline, please cite the GitHub repository:

@misc{academic_humanize_2026,
  title = {Academic Humanize: Post-training LLMs for Academic Text Humanization},
  author = {XiaoXu123123},
  year = {2026},
  howpublished = {\url{https://github.com/haibarazz/academic-humanize}}
}

Model Card Contact

For questions or issues, please contact:

2812156857@qq.com
