DFPY-78: Integrate German regex support #146
Refs DFPY-71
Refs DFPY-74
Refs DFPY-75
Refs DFPY-72
Refs DFPY-73
Refs DFPY-76
Update: this PR now adopts the locale-gated German PII behavior proposed and iterated in #138 by @pranjalparmar. All German
3bbff71 to a0ab963
Hey @sidmohan0 👋 Really glad to see the locale-gated approach from #138 being adopted here! This is actually my first-ever open-source contribution, and I'm genuinely excited to see the German PII patterns being integrated. Since the design and patterns from #138 are being adapted here, would it be possible to add me as a co-author on the relevant commits? GitHub supports this with a commit trailer:
Co-authored-by: Pranjal Parmar <[email protected]>
I'm also looking forward to extending this further: I'm planning to contribute multi-country VAT/IBAN patterns and would love to get involved with the next version's work too. Thanks again for all the guidance throughout this; it's been a great learning experience! 🙌
a0ab963 to f1a6fce
Summary
- locales=["de"] or explicit entity_types, with context guards to avoid ordinary ticket/SKU/order ID false positives.
- scan, redact, guardrail helpers, DataFog, TextService, and the core text CLI commands.
Review notes
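For readers unfamiliar with the pattern, the locale gating and context guards mentioned in the summary can be sketched roughly as below. This is not the PR's implementation: the function name `find_de_tax_ids`, the `DE_TAX_ID` label, the 11-digit pattern, the keyword list, and the 40-character window are all assumptions made for illustration. Only the `locales` and `entity_types` parameter names come from the summary.

```python
import re
from typing import List, Optional, Sequence, Tuple

# Hypothetical entity label and patterns; the PR's real ones may differ.
DE_TAX_ID = "DE_TAX_ID"
_CANDIDATE_RE = re.compile(r"\b\d{11}\b")
_CONTEXT_RE = re.compile(r"steuer-?id|steuernummer|idnr", re.IGNORECASE)
_CONTEXT_WINDOW = 40  # characters inspected before each candidate (assumed)


def find_de_tax_ids(
    text: str,
    locales: Optional[Sequence[str]] = None,
    entity_types: Optional[Sequence[str]] = None,
) -> List[Tuple[int, int, str]]:
    """Return (start, end, value) for guarded 11-digit matches."""
    # Locale gate: the German pattern runs only on explicit opt-in.
    enabled = "de" in (locales or ()) or DE_TAX_ID in (entity_types or ())
    if not enabled:
        return []

    hits = []
    for match in _CANDIDATE_RE.finditer(text):
        # Context guard: require a tax-related keyword shortly before the
        # digits, so bare order or ticket numbers are not reported.
        window = text[max(0, match.start() - _CONTEXT_WINDOW):match.start()]
        if _CONTEXT_RE.search(window):
            hits.append((match.start(), match.end(), match.group()))
    return hits
```

With this sketch, `find_de_tax_ids("Steuer-ID: 12345678901")` returns nothing because no locale was requested, while passing `locales=["de"]` reports the number. A string such as `"Order 12345678901 shipped"` is skipped even with the German locale enabled, since no keyword precedes the digits.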
Verification
- DATAFOG_NO_TELEMETRY=1 DO_NOT_TRACK=1 .venv312/bin/python -m pytest tests/test_de_pii_regex.py tests/test_regex_annotator.py -q
- DATAFOG_NO_TELEMETRY=1 DO_NOT_TRACK=1 .venv312/bin/python -m pytest tests/test_detection_accuracy.py::test_structured_pii_detection_fast tests/test_detection_accuracy.py::test_negative_cases_fast tests/test_main.py::test_lean_datafog_detect tests/test_main.py::test_lean_datafog_process tests/test_client.py::test_scan_text_success tests/test_cli_smoke.py::test_redact_text_command -q
- .venv312/bin/pre-commit run --files README.md datafog/__init__.py datafog/agent.py datafog/client.py datafog/core.py datafog/engine.py datafog/main.py datafog/processing/text_processing/regex_annotator/regex_annotator.py datafog/services/text_service.py docs/cli.rst docs/getting-started.rst docs/python-sdk.rst tests/corpus/structured_pii.json tests/test_detection_accuracy.py tests/test_regex_annotator.py tests/test_de_pii_regex.py --show-diff-on-failure
- DATAFOG_NO_TELEMETRY=1 DO_NOT_TRACK=1 .venv312/bin/python -m sphinx -b html docs docs/_build/html
- DATAFOG_NO_TELEMETRY=1 DO_NOT_TRACK=1 .venv312/bin/python -m pytest tests/test_runtime_dependency_safety.py tests/test_no_network_core.py -q
- git diff --check
- DATAFOG_NO_TELEMETRY=1 DO_NOT_TRACK=1 .venv312/bin/python -m pytest -m "not slow" -q -> 583 passed, 4 skipped, 295 deselected, 19 xfailed

Refs DFPY-78.