Academic Humanize Qwen2.5-7B DPO-v2 LoRA
This repository contains the DPO-v2 LoRA adapter for Academic Humanize, a post-training project for reducing AI-like patterns in academic English while preserving meaning, terminology, citations, numbers, and logical relationships.
The adapter is trained on top of:
Qwen/Qwen2.5-7B-Instruct
This is a PEFT LoRA adapter, not a full standalone base model. To use it, load the base model first and then attach this adapter with peft.
Model Details
Model Description
Academic Humanize is designed for academic paragraph rewriting. Given an AI-like academic draft, the model rewrites it into a more natural scholarly English paragraph while keeping the original meaning intact.
The target task format is:
instruction + AI-like academic input -> humanized academic output
The project uses a two-stage post-training pipeline:
- QLoRA SFT: teaches the model the academic humanization format and basic rewriting behavior.
- SPIN-style iterative DPO: uses model-generated outputs as rejected responses and human references as chosen responses to further align the model toward more natural academic writing.
This repository hosts the DPO-v2 adapter, which is the second DPO iteration after SFT and DPO-v1.
- Developed by: XiaoXu123123
- Model type: PEFT LoRA adapter for causal language modeling
- Language: English
- Base model: Qwen/Qwen2.5-7B-Instruct
- Training method: QLoRA SFT + SPIN-style DPO + iterative DPO
- Task: Academic text humanization / academic rewriting
- License: MIT
Model Sources
- GitHub Repository: https://github.com/haibarazz/academic-humanize
- Base Model: https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
Intended Use
Direct Use
This adapter is intended for academic English rewriting. It is useful when the input text is grammatically correct but sounds overly generic, templated, or AI-like.
Example input:
This study endeavors to explore the multifaceted role of adaptive feedback mechanisms in online learning environments. The results underscore the pivotal importance of personalized intervention for improving student engagement.
Expected output style:
This study examines how adaptive feedback mechanisms support online learning. The results show that personalized intervention can improve student engagement.
Downstream Use
This adapter can be used as a component in:
- academic writing assistants
- paper polishing tools
- AI-text humanization experiments
- preference optimization / DPO research demos
- evaluation pipelines for academic rewriting
Out-of-Scope Use
This model is not intended for:
- generating fabricated academic claims
- rewriting text while changing its meaning
- bypassing academic integrity policies
- replacing human proofreading for final publication
- use in domains where factual precision requires professional verification
How to Use
Install dependencies:
pip install transformers peft accelerate torch
Load the base model and adapter:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen2.5-7B-Instruct"
adapter_id = "XiaoXu123123/academic-humanize-qwen25-7b-dpo-v2-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
Example inference:
instruction = "Rewrite the following academic paragraph to make it more natural and less AI-like while preserving the original meaning, terminology, numbers, citations, and logical relationships."
input_text = """This study endeavors to explore the multifaceted role of adaptive feedback mechanisms in online learning environments. The results underscore the pivotal importance of personalized intervention for improving student engagement."""

messages = [
    {"role": "system", "content": "You are an academic English rewriting assistant. Reduce AI-like wording while preserving meaning."},
    {"role": "user", "content": f"{instruction}\n\nInput:\n{input_text}"},
]

# Render the chat messages into the prompt string the base model expects.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; the sampling parameters are set to None because do_sample=False.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        temperature=None,
        top_p=None,
    )

# Decode only the newly generated tokens, skipping the prompt.
generated = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(generated.strip())
Training Details
Training Data
The training data was constructed for the Academic Humanize V2 task. Each sample contains:
instruction
input: AI-like academic draft
output: human or high-quality academic reference
The full training corpus is not released in this model repository because it was derived from academic-paper text and project-specific processing. The GitHub repository provides toy examples and scripts for reproducing the pipeline structure.
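As a rough illustration of the sample format described in this section, the sketch below builds one record and checks that it has the three fields. The helper name `validate_sample` and the JSON Lines serialization are assumptions for illustration, not taken from the project scripts; the example texts are the ones shown under Direct Use.

```python
import json

# Field names come from the sample format listed in this section.
REQUIRED_FIELDS = ("instruction", "input", "output")


def validate_sample(sample: dict) -> bool:
    """Return True if every required field is a non-empty string (hypothetical helper)."""
    return all(
        isinstance(sample.get(field), str) and bool(sample[field].strip())
        for field in REQUIRED_FIELDS
    )


sample = {
    "instruction": (
        "Rewrite the following academic paragraph to make it more natural "
        "and less AI-like while preserving the original meaning."
    ),
    "input": (
        "This study endeavors to explore the multifaceted role of adaptive "
        "feedback mechanisms in online learning environments."
    ),
    "output": (
        "This study examines how adaptive feedback mechanisms support "
        "online learning."
    ),
}

# Assumption: one JSON object per line (JSON Lines) is a convenient on-disk layout.
line = json.dumps(sample, ensure_ascii=False)
restored = json.loads(line)
print(validate_sample(restored))
```

Running a check like this before training can catch empty or missing fields early.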
Training Procedure
The full training pipeline contains:
- Academic paragraph collection and filtering.
- AI-like draft construction.
- QLoRA SFT on Qwen2.5-7B-Instruct.
- DPO-v1 using:
  - chosen = human reference
  - rejected = SFT model prediction
- DPO-v2 using:
  - chosen = human reference
  - rejected = DPO-v1 model prediction
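The pairing rule used in both DPO rounds (human reference as chosen, the previous model's prediction as rejected) can be sketched as follows. This is a minimal sketch, not the project's script: the function name `build_preference_pairs`, the prompt layout, and the decision to skip identical pairs are all assumptions. The `prompt` / `chosen` / `rejected` keys follow the layout that TRL's DPO trainer accepts.

```python
def build_preference_pairs(samples, predictions):
    """Pair each human reference (chosen) with the model's own output (rejected).

    `samples` are dicts with instruction/input/output fields; `predictions`
    are the previous-stage model outputs in the same order.
    """
    if len(samples) != len(predictions):
        raise ValueError("samples and predictions must have the same length")

    pairs = []
    for sample, prediction in zip(samples, predictions):
        chosen = sample["output"].strip()
        rejected = prediction.strip()
        # Assumption: an identical chosen/rejected pair carries no preference
        # signal, so it is skipped.
        if chosen == rejected:
            continue
        pairs.append(
            {
                "prompt": f'{sample["instruction"]}\n\nInput:\n{sample["input"]}',
                "chosen": chosen,
                "rejected": rejected,
            }
        )
    return pairs


# Toy data for illustration only.
toy_samples = [
    {"instruction": "Rewrite.", "input": "Draft A", "output": "Reference A"},
    {"instruction": "Rewrite.", "input": "Draft B", "output": "Reference B"},
]
toy_predictions = ["Model output A", "Reference B"]
toy_pairs = build_preference_pairs(toy_samples, toy_predictions)
print(len(toy_pairs))
```

For DPO-v1 the predictions would come from the SFT model, and for DPO-v2 from the DPO-v1 model, with the same references reused as chosen responses.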
Training Hyperparameters
The final DPO-v2 training used conservative preference-optimization settings to reduce semantic drift:
base model: Qwen/Qwen2.5-7B-Instruct
adapter method: LoRA / PEFT
training method: iterative DPO
learning rate: 1e-6
beta: 0.05
epochs: 1
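To show where beta enters, here is the standard per-pair DPO loss in plain Python. It is a sketch of the general objective rather than the project's training code (the Software section lists TRL), and the log-probability values in the example are made up for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.05):
    """Standard DPO loss for one preference pair.

    Inputs are summed sequence log-probabilities under the policy (adapter)
    and the frozen reference model. beta=0.05 matches the value listed above.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities: the policy prefers the chosen response
# slightly more than the reference model does.
loss = dpo_loss(-10.0, -12.0, -11.0, -11.0, beta=0.05)
print(loss < math.log(2))
```

When the policy and the reference agree, the margin is zero and the loss equals log 2; beta scales how strongly a given log-ratio gap moves the loss away from that value.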
Evaluation
Evaluation was performed on a held-out Academic Humanize validation set of 346 samples.
The project uses two evaluation layers:
Automatic semantic metrics
- BERTScore-F1
- chrF++
- BLEU
- TER
- format violation rate
LLM-as-Judge
- lexical markers
- structural patterns
- naturalness
- semantic faithfulness
- terminology accuracy
- edit value
Automatic Metrics
| Model | BERTScore-F1 | chrF++ | BLEU | TER | Format Violation |
|---|---|---|---|---|---|
| SFT LoRA | 0.9738 | 84.72 | 72.01 | 24.93 | 0.023 |
| DPO-v1 | 0.9664 | 78.26 | 63.95 | 31.73 | 0.023 |
| DPO-v2 | 0.9709 | 81.89 | 68.95 | 27.73 | 0.023 |
LLM-as-Judge Results
Judge model: deepseek-v4-flash
| Model | Judge Norm | Total | Naturalness | Semantic | Terminology | Edit Value |
|---|---|---|---|---|---|---|
| SFT LoRA | 0.9003 | 7.202 | 1.725 | 1.731 | 0.994 | 0.939 |
| DPO-v1 | 0.9241 | 7.393 | 1.827 | 1.633 | 0.988 | 0.965 |
| DPO-v2 | 0.9223 | 7.379 | 1.795 | 1.691 | 0.991 | 0.957 |
Summary
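DPO-v2 keeps most of the judge-score improvement from DPO-v1 (Judge Norm 0.9223 vs. 0.9241, against 0.9003 for SFT) while recovering part of the semantic fidelity lost in DPO-v1 (BERTScore-F1 0.9709 vs. 0.9664; judge Semantic score 1.691 vs. 1.633). Among the 7B adapters trained in this project, it offers the best trade-off between naturalness and fidelity.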
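The trade-off can be made concrete with a little arithmetic on the table values. The sketch below uses only numbers from the two tables above; the helper names `gain_retained` and `drop_recovered` are introduced here for illustration.

```python
# Scores copied from the Automatic Metrics and LLM-as-Judge tables.
SCORES = {
    "judge_norm": {"sft": 0.9003, "dpo_v1": 0.9241, "dpo_v2": 0.9223},
    "semantic": {"sft": 1.731, "dpo_v1": 1.633, "dpo_v2": 1.691},
    "bertscore_f1": {"sft": 0.9738, "dpo_v1": 0.9664, "dpo_v2": 0.9709},
}


def gain_retained(metric):
    """Share of the SFT -> DPO-v1 improvement that DPO-v2 still has."""
    s = SCORES[metric]
    return (s["dpo_v2"] - s["sft"]) / (s["dpo_v1"] - s["sft"])


def drop_recovered(metric):
    """Share of the SFT -> DPO-v1 drop that DPO-v2 wins back."""
    s = SCORES[metric]
    return (s["dpo_v2"] - s["dpo_v1"]) / (s["sft"] - s["dpo_v1"])


print(f"Judge Norm gain retained: {gain_retained('judge_norm'):.1%}")
print(f"Semantic drop recovered:  {drop_recovered('semantic'):.1%}")
print(f"BERTScore drop recovered: {drop_recovered('bertscore_f1'):.1%}")
```

On these figures DPO-v2 retains roughly 92% of the Judge Norm gain and recovers roughly 59% of the Semantic drop and 61% of the BERTScore-F1 drop.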
Bias, Risks, and Limitations
- The model may still introduce subtle semantic drift during rewriting.
- The model is optimized for academic English and may not generalize well to casual writing or other languages.
- The model should not be used to fabricate claims, citations, results, or academic evidence.
- LLM-as-Judge scores are useful for comparison but should not replace human review.
- The training data and evaluation set are project-specific, so results may vary across domains.
Recommendations
For high-stakes academic use, users should manually verify:
- factual claims
- numbers and units
- citations
- domain terminology
- logical strength of conclusions
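Part of this checklist can be pre-screened automatically. The sketch below flags numbers and citations that appear in the source paragraph but not in the rewrite. The regular expressions are simple assumptions covering common author-year and bracketed-number citation styles; they are not part of the project's evaluation code and do not replace manual review.

```python
import re
from collections import Counter

# Assumed patterns: plain or decimal numbers (optionally with %), and
# citations such as "(Smith et al., 2021)" or "[3]" / "[3, 4]".
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?%?")
CITATION_RE = re.compile(r"\([A-Z][^()]*?\d{4}[a-z]?\)|\[\d+(?:[,\-]\s*\d+)*\]")


def missing_items(source: str, rewrite: str, pattern: re.Pattern) -> list:
    """Return items matched in `source` that are absent from `rewrite`."""
    source_counts = Counter(pattern.findall(source))
    rewrite_counts = Counter(pattern.findall(rewrite))
    # Counter subtraction keeps only items the rewrite has fewer of.
    return sorted((source_counts - rewrite_counts).elements())


# Toy example for illustration only.
source = "Accuracy rose by 12.5% across 346 samples (Smith et al., 2021) [3]."
rewrite = "Accuracy increased by 12.5% (Smith et al., 2021)."

print(missing_items(source, rewrite, NUMBER_RE))
print(missing_items(source, rewrite, CITATION_RE))
```

An empty list only means the surface strings survived; whether a number is still attached to the right claim needs a human reader.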
Technical Specifications
Model Architecture and Objective
- Base architecture: Qwen2.5-7B-Instruct
- Adapter type: LoRA
- Library: PEFT
- Objective: DPO preference optimization after supervised fine-tuning
Software
- PEFT 0.13.2
- Transformers
- TRL
- PyTorch
Citation
If you use this adapter or the associated pipeline, please cite the GitHub repository:
@misc{academic_humanize_2026,
title = {Academic Humanize: Post-training LLMs for Academic Text Humanization},
author = {XiaoXu123123},
year = {2026},
howpublished = {\url{https://github.com/haibarazz/academic-humanize}}
}
Model Card Contact
For questions or issues, please contact:
2812156857@qq.com