# Hugging Face Upload Instructions

This folder contains everything needed to publish the African CP Synthetic Dataset to Hugging Face.

## Folder Contents

### Dataset Files (7 CSVs - 3.1 MB total)

- ✅ `africa_cp_train_1000.csv` - Small training set
- ✅ `africa_cp_train_5000.csv` - Medium training set
- ✅ `africa_cp_train_10000.csv` - Large training set
- ✅ `africa_cp_balanced_1000.csv` - Balanced 50/50 set
- ✅ `africa_cp_preterm_2000.csv` - High-risk preterm cohort
- ✅ `africa_cp_cases_only_500.csv` - CP cases only
- ✅ `africa_cp_test_2000.csv` - Hold-out test set

### Documentation

- ✅ `README.md` - Dataset card (auto-displays on HF)
- ✅ `CITATION.bib` - Citation information
- ✅ `LICENSE` - CC BY-NC 4.0 license
- ✅ `dataset_info.yaml` - Dataset metadata

### Code

- ✅ `cp_data_generator.py` - Generator script
- ✅ `load_dataset.py` - Example loading code

### Configuration

- ✅ `.gitattributes` - Git LFS configuration for large files

---

## Upload Steps

### Option 1: Using Hugging Face Web Interface (Easiest)

1. **Create account** at https://huggingface.co/join
2. **Create new dataset**:
   - Go to https://huggingface.co/new-dataset
   - Dataset name: `african-cp-synthetic` (or your choice)
   - License: `cc-by-nc-4.0`
   - Make public or private
3. **Upload files**:
   - Click the "Files" tab
   - Drag and drop all files from this folder
   - Or use "Add file" → "Upload files"
   - Commit message: "Initial dataset upload"
4. **Verify**:
   - `README.md` should auto-display as the dataset card
   - All 7 CSV files should be visible
   - Check that the file sizes are correct

### Option 2: Using Git (Advanced)

```bash
# 1. Set up Git LFS for large files (git-lfs must already be installed)
git lfs install

# 2. Clone your dataset repo
git clone https://huggingface.co/datasets/[your-username]/african-cp-synthetic
cd african-cp-synthetic

# 3. Copy files from this folder
#    (the trailing "/." also copies hidden files such as .gitattributes,
#    which a plain "*" glob would skip)
cp -r /path/to/huggingface_upload/. .

# 4. Add files
git add .

# 5. Commit
git commit -m "Initial dataset upload"

# 6. Push
git push
```

### Option 3: Using Hugging Face CLI

```bash
# 1. Install huggingface_hub
pip install huggingface_hub

# 2. Login
huggingface-cli login

# 3. Upload folder (--repo-type dataset is required; the CLI defaults to a model repo)
huggingface-cli upload [your-username]/african-cp-synthetic /path/to/huggingface_upload --repo-type dataset
```

---

## After Upload

### 1. Update README.md

Replace placeholders in `README.md`:

- `[your-username]` → Your actual Hugging Face username
- Add contributors/authors if applicable

### 2. Add Dataset Card Metadata (Top of README.md)

```yaml
---
license: cc-by-nc-4.0
task_categories:
- tabular-classification
- medical
task_ids:
- binary-classification
pretty_name: African Cerebral Palsy Synthetic Dataset
size_categories:
- 10K