Pull requests: huggingface/datasets
chore: enable Dependabot weekly GitHub Actions bumps
dependabot
#8221
opened May 26, 2026 by
hf-dependantbot-rollout
Bot
Add .conll / .conllu dataset format loader (CoNLL-2003 / 2000 / U)
#8219
opened May 23, 2026 by
CrypticCortex
Include version in DatasetInfo YAML so push_to_hub preserves it
#8218
opened May 22, 2026 by
adityasingh2400
Read cached dataset_info.json to populate config_names offline
#8216
opened May 21, 2026 by
adityasingh2400
Warn instead of raise when user-provided data_files yields a subset
#8215
opened May 21, 2026 by
adityasingh2400
feat: Add return_file_name parameter to JSON builder (#5806)
#8214
opened May 21, 2026 by
Kinetic-Labs-GT
fix docs: use AutoImageProcessor for vision model examples
#8207
opened May 18, 2026 by
wali-reheman
fix(webdataset): when loading data in WebDataset format using load_datasets during multi-machine training.
#8203
opened May 16, 2026 by
Wolfram-St
3 tasks done
fix pathlib.Path support in save_to_disk and load_from_disk
#8202
opened May 16, 2026 by
JiwaniZakir
replace AutoFeatureExtractor with AutoImageProcessor in docs
#8200
opened May 15, 2026 by
ajaystar8
fix(arrow_dataset): clear stale local temp dir before re-downloading from remote FS
#8196
opened May 14, 2026 by
xodn348
3 tasks done
Fix spurious label column when folder builders see split-named directories
#8195
opened May 12, 2026 by
1fanwang
Escape glob chars in base_path so directory paths with [] work (#7468)
#8192
opened May 10, 2026 by
jbbqqf
Preserve info.features across IterableDataset.map(remove_columns=...) (#7568)
#8191
opened May 10, 2026 by
jbbqqf
Reject num_shards > len(dataset) in Dataset.shard (#7443)
#8190
opened May 10, 2026 by
jbbqqf
docs: clarify num_proc semantics in Dataset.batch / Dataset.filter (#7700)
#8189
opened May 10, 2026 by
jbbqqf
Don't infer labels from split-named directories in folder-based builders (#7880)
#8188
opened May 10, 2026 by
jbbqqf
Better error for sliced splits in streaming mode (train[:10%], train[:N]) (#7721)
#8187
opened May 10, 2026 by
jbbqqf
docs: fix broken WebDataset documentation link in audio/video/image dataset pages (#7699)
#8186
opened May 10, 2026 by
jbbqqf
docs: make Dataset.map batched example self-contained (#7703)
#8185
opened May 10, 2026 by
jbbqqf
Preserve triple-slash in remote URLs (HDFS, file://, ...) in _as_str (#7934)
#8184
opened May 10, 2026 by
jbbqqf