Augmenty is an augmentation library based on spaCy for augmenting texts.

Last update: Dec 29, 2022

Overview

Augmenty: The cherry on top of your NLP pipeline

Augmenty is an augmentation library based on spaCy for augmenting texts. Besides a wide array of highly flexible augmenters, Augmenty provides a series of tools for working with augmenters, including combining and moderating them. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the assigned labels under the augmentation, which makes many of the augmenters valid for training tasks beyond sentence classification.

🔧 Installation

To get started using augmenty simply install it using pip by running the following line in your terminal:

pip install augmenty

Note that this is a minimal installation. Some augmenters require additional packages; run the following line to install all dependencies:

pip install augmenty[all]

For more detailed instructions on installing augmenty, including specific language support, see the installation instructions.

🍒 Simple Example

The following shows a simple example of how you can quickly augment text using Augmenty. For more on using augmenty see the usage guides.

import spacy
import augmenty

nlp = spacy.load("en_core_web_sm")

docs = nlp.pipe(["Augmenty is a great tool for text augmentation"])

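entity_augmenter = augmenty.load(
    "ents_replace.v1",
    ent_dict={"ORG": [["spaCy"], ["spaCy", "Universe"]]},
    level=1.0,  # share of entities to replace; 1.0 replaces every matching entity
)

# augmenty.docs also needs the pipeline to rebuild the augmented docs
for doc in augmenty.docs(docs, augmenter=entity_augmenter, nlp=nlp):
    print(doc)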

spaCy Universe is a great tool for text augmentation.

📖 Documentation

Documentation
📚 Usage Guides	Guides and instructions on how to use augmenty and its features.
📰 News and changelog	New additions, changes and version history.
🎛 API References	The detailed reference for augmenty's API, including function documentation.
🍒 Augmenters	Contains a full list of current augmenters in augmenty.
😎 Demo	A simple streamlit demo to try out the augmenters.

💬 Where to ask questions

Type
🚨 Bug Reports	GitHub Issue Tracker
🎁 Feature Requests & Ideas	GitHub Issue Tracker
👩‍💻 Usage Questions	GitHub Discussions
🗯 General Discussion	GitHub Discussions
🍒 Adding an Augmenter	Adding an augmenter

🤔 FAQ

How do I test the code and run the test suite?

augmenty comes with an extensive test suite. In order to run the tests, you'll usually want to clone the repository and build augmenty from the source. This will also install the required development dependencies and test utilities defined in the requirements.txt.

pip install -r requirements.txt
pip install pytest

python -m pytest

This runs all the tests in the augmenty/tests folder.

Specific tests can be run using:

python -m pytest augmenty/tests/test_docs.py

Code Coverage

If you want to check code coverage, you can run the following:

pip install pytest-cov

python -m pytest --cov=.

Does augmenty run on X?

augmenty is intended to run on all major operating systems: Windows (latest version), macOS (Catalina), and Linux (latest version of Ubuntu). The table below shows whether augmenty passes its test suite on each system. Note that these are only the systems augmenty is actively tested on; if you run a similar system (e.g. an earlier version of Linux), augmenty will likely run there as well. If it does not, please create an issue.

Operating System	Status
Ubuntu/Linux (Latest)
MacOS (Catalina)
Windows (Latest)

How is the documentation generated?

augmenty uses Sphinx to generate documentation. It uses the Furo theme with custom styling.

To make the documentation you can run:

# install sphinx, themes and extensions
pip install sphinx furo sphinx-copybutton sphinxext-opengraph

# generate HTML from the documentation
make -C docs html

Many of these augmenters are completely useless for training?

That is true: some of the augmenters, such as randomly adding or removing spacing, are rarely something you would augment with during training. However, augmentation can just as well be used to test whether a model is robust to certain variations.
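
As a rough illustration of the robustness use case, the sketch below measures how often a model's prediction survives spacing noise. It deliberately uses only the Python standard library: `remove_spaces`, `toy_classifier` and `robustness` are hypothetical helpers written for this example, not part of augmenty's API, and the toy classifier merely stands in for a real model.

```python
import random


def remove_spaces(text: str, level: float, rng: random.Random) -> str:
    """Drop each space character with probability `level` (hypothetical helper)."""
    return "".join(ch for ch in text if not (ch == " " and rng.random() < level))


def toy_classifier(text: str) -> str:
    """Stand-in model: 'positive' if the whitespace-split text contains 'great'."""
    return "positive" if "great" in text.split() else "negative"


def robustness(texts, model, augment, n_trials: int = 20, seed: int = 0) -> float:
    """Fraction of augmented texts whose prediction matches the original prediction."""
    rng = random.Random(seed)
    same = total = 0
    for text in texts:
        expected = model(text)
        for _ in range(n_trials):
            same += model(augment(text, rng)) == expected
            total += 1
    return same / total


texts = ["Augmenty is a great tool", "This tool is slow"]
score = robustness(texts, toy_classifier, lambda t, rng: remove_spaces(t, 0.3, rng))
print(f"prediction agreement under spacing noise: {score:.2f}")
```

In practice one would swap in an augmenty augmenter and a trained pipeline; a low agreement score suggests the model is sensitive to that kind of variation.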

Can I use augmenty without using spacy?

Yes, augmenty contains convenience functions for applying augmentation directly to raw texts. Check out the getting started guide to learn how.

🎓 Citing this work

If you use this library in your research, please cite:

@inproceedings{augmenty2021,
    title={Augmenty, the cherry on top of your NLP pipeline},
    author={Enevoldsen, Kenneth and Hansen, Lasse},
    year={2021}
}

Comments

Use of augmenty with spacy config files for training

I didn't see any documentation on how to import these augmenters when using spacy 3.0's config and command line system when training. Is it possible to use it in this sense? If so, how?

Upon further review: for the command line to register new augmentations, the flag --code <code.py> needs to be set when calling the training. I tried pointing it to the specific file that contains the keystroke augmenter I wanted, but it complains about not knowing a parent package for relative imports. I also tried the various __init__.py files, but it complained about those as well. It seems to work when you take the code out, place it in a new file without relative imports, and point to that.

Which page or section is this issue related to?

https://spacy.io/usage/training#data-augmentation-custom

https://kennethenevoldsen.github.io/augmenty/tutorials/introduction.html#Applying-the-augmentation
documentation

opened by Giles-Billenness 3

Added sententence_subset.v1 augmenter following #48

Following #48, added the sententence_subset.v1 augmenter, which subsamples sentences from a document:

import augmenty
import spacy
nlp = spacy.load("en_core_web_sm")

# four sentences
text = """Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool
for obtaining higher performance on limited data. You can also use it to see how
robust your model is to changes. It will sample subset of the paragraf."""
texts = [text]  # augmenty.texts takes an iterable of raw strings

augmenter = augmenty.load("sententence_subset.v1", respect_sentences=True)

list(augmenty.texts(texts, augmenter, nlp))

Missing:

[ ] Add tests
[ ] Add documentation

opened by KennethEnevoldsen 3

Paragraph subset augmenter

A paragraph subset augmentation that can work on the token or sentence level. It samples a random percentage of coherent (contiguous) tokens/sentences to include, together with a random token/sentence start position chosen so that the former constraint is maintained. The augmenter needs to handle annotated entities and avoid breaking them.

Input arguments:

level: How often to apply the augmenter.
min_paragraf: Minimum percentage of tokens or sentences to include. E.g. 4 sentences with min_paragraf=0.5 means it includes at least 2 sentences.
sentence_level: Boolean defining whether to sample at the sentence level (True) or the token level (False).
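
The sampling scheme described above can be sketched at the sentence level as follows. This is only an illustration under simplifying assumptions: `sentence_subset` and its `min_fraction` parameter (mirroring `min_paragraf`) are hypothetical names, sentences are split with a naive regular expression rather than spaCy, and entity handling is left out entirely.

```python
import math
import random
import re


def sentence_subset(text: str, min_fraction: float, rng: random.Random) -> str:
    """Return a random run of consecutive sentences covering at least
    `min_fraction` of all sentences in `text` (hypothetical helper)."""
    # Naive splitter for illustration only; the proposal would rely on
    # spaCy's sentence boundaries instead.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    n = len(sentences)
    min_len = max(1, math.ceil(min_fraction * n))
    length = rng.randint(min_len, n)    # number of sentences to keep
    start = rng.randint(0, n - length)  # start position chosen so the run fits
    return " ".join(sentences[start:start + length])


text = (
    "Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool "
    "for obtaining higher performance on limited data. You can also use it to see how "
    "robust your model is to changes. It will sample a subset of the paragraph."
)
print(sentence_subset(text, min_fraction=0.5, rng=random.Random(0)))
```

Sampling the length first and then restricting the start position guarantees that the minimum-size constraint holds for every draw, at the cost of favouring start positions near the beginning when long runs are drawn.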

Example - sentence level

import augmenty
import spacy
nlp = spacy.load("en_core_web_sm")

# four sentences
texts = [
    "Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool "
    "for obtaining higher performance on limited data. You can also use it to see how "
    "robust your model is to changes. It will sample subset of the paragraf.",
]
docs = list(nlp.pipe(texts))  # nlp() takes a single string; use nlp.pipe for a list

augmenter = augmenty.load("paragraf_subset.v1", level=1.0, min_paragraf=0.5, sentence_level=True)

list(augmenty.texts(texts, augmenter, nlp))

Example outputs:

The first section:

Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool 
for obtaining higher performance on limited data.

The middle section:

Augmentation is a wonderful tool for obtaining higher performance on limited data. 
You can also use it to see how robust your model is to changes.

The last section:

You can also use it to see how robust your model is to changes. It will sample subset 
of the paragraf.

Additional thoughts:

Possibly add a reverse augmenter, e.g. one that removes a coherent section of tokens/sentences.

additional augmenter

opened by martincjespersen 3

⬆️ Bump MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26

Bumps MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26.

Commits

8856b4a Elapsed time in minutes (#63)

See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependencies github_actions
opened by dependabot[bot] 2

⬆️ Update pydantic requirement from <1.9.0,>=1.8.2 to >=1.8.2,<1.10.0

Updates the requirements on pydantic to permit the latest version.

Release notes

Sourced from pydantic's releases.

v1.9.0 (2021-12-31)

Thank you to pydantic's sponsors: @sthagen, @timdrijvers, @toinbis, @koxudaxi, @ginomempin, @primer-io, @and-semakin, @westonsteimel, @reillysiemens, @es3n1n, @jokull, @JonasKs, @Rehket, @corleyma, @daddycocoaman, @hardbyte, @datarootsio, @jodal, @aminalaee, @rafsaf, @jqueguiner, @chdsbd, @kevinalh, @Mazyod, @grillazz, @JonasKs, @simw, @leynier, @xfenix for their kind support.

Highlights

add python 3.10 support, #2885 by @PrettyWood

Discriminated unions, #619 by @PrettyWood

Config.smart_union for better union logic, #2092 by @PrettyWood

Binaries for Macos M1 CPUs, #3498 by @samuelcolvin

Complex types can be set via nested environment variables, e.g. foo___bar, #3159 by @Air-Mark

add a dark mode to pydantic documentation, #2913 by @gbdlin

Add support for autocomplete in VS Code via __dataclass_transform__, #2721 by @tiangolo

Add "exclude" as a field parameter so that it can be configured using model config, #660 by @daviskirk

v1.9.0 (2021-12-31) Changes

Apply update_forward_refs to Config.json_encodes prevent name clashes in types defined via strings, #3583 by @samuelcolvin

Extend pydantic's mypy plugin to support mypy versions 0.910, 0.920, 0.921 & 0.930, #3573 & #3594 by @PrettyWood, @christianbundy, @samuelcolvin

v1.9.0a2 (2021-12-24) Changes

support generic models with discriminated union, #3551 by @PrettyWood

keep old behaviour of json() by default, #3542 by @PrettyWood

Removed typing-only __root__ attribute from BaseModel, #3540 by @layday

Build Python 3.10 wheels, #3539 by @mbachry

Fix display of extra fields with model __repr__, #3234 by @cocolman

models copied via Config.copy_on_model_validation always have all fields, #3201 by @PrettyWood

nested ORM from nested dictionaries, #3182 by @PrettyWood

fix link to discriminated union section by @PrettyWood

v1.9.0a1 (2021-12-18) Changes

Add support for Decimal-specific validation configurations in Field(), additionally to using condecimal(), to allow better support from editors and tooling, #3507 by @tiangolo

Add arm64 binaries suitable for MacOS with an M1 CPU to PyPI, #3498 by @samuelcolvin

Fix issue where None was considered invalid when using a Union type containing Any or object, #3444 by @tharradine

When generating field schema, pass optional field argument (of type pydantic.fields.ModelField) to __modify_schema__() if present, #3434 by @jasujm

Fix issue when pydantic fail to parse typing.ClassVar string type annotation, #3401 by @uriyyo

Mention Python >= 3.9.2 as an alternative to typing_extensions.TypedDict, #3374 by @BvB93

Changed the validator method name in the Custom Errors example to more accurately describe what the validator is doing; changed from name_must_contain_space to value_must_equal_bar, #3327 by @michaelrios28

Add AmqpDsn class, #3254 by @kludex

Always use Enum value as default in generated JSON schema, #3190 by @joaommartins

Add support for Mypy 0.920, #3175 by @christianbundy

validate_arguments now supports extra customization (used to always be Extra.forbid), #3161 by @PrettyWood

... (truncated)

Commits

fbf8002 prepare for v1.9.0 release, extra change

5406423 prepare for v1.9.0 release

87da9ac apply update_forward_refs to json_encoders (#3595)

6f26a1c Support mypy 0.910 to 0.930 including CI tests (#3594)

8ef492b build(deps): bump mypy from 0.920 to 0.930 (#3573)

2d3d266 remove failing release step

ef46789 add step to upload pypi files to release

5d6f48c prepare for v1.9.0a2

e882277 fix: support generic models with discriminated union (#3551)

edad0db fix: keep old behaviour of json() by default (#3542)

Additional commits viewable in compare view

dependencies python
opened by dependabot[bot] 2

⬆️ Bump MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40

Bumps MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40.

Release notes

Sourced from MishaKav/pytest-coverage-comment's releases.

Support GitHub enterprise urls

What's Changed

Minor readme improvements by @AlexanderLanin in MishaKav/pytest-coverage-comment#100

Support GitHub enterprise urls by @jbcumming in MishaKav/pytest-coverage-comment#101

New Contributors

@AlexanderLanin made their first contribution in MishaKav/pytest-coverage-comment#100

Full Changelog: https://github.com/MishaKav/pytest-coverage-comment/compare/v1.1.39...v1.1.40

Changelog

Sourced from MishaKav/pytest-coverage-comment's changelog.

Pytest Coverage Comment 1.1.40

Release Date: 2022-12-03

Changes

Support for url for github enterprise repositories, thanks to @jbcumming for contribution

Minor readme improvements, thanks to @AlexanderLanin for contribution

Commits

b2577f1 Support GitHub enterprise urls (#102)

072a74d Minor readme improvements (#100)

See full diff in compare view

dependencies github_actions
opened by dependabot[bot] 1

⬆️ Bump MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31

Bumps MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31.

Release notes

Sourced from MishaKav/pytest-coverage-comment's releases.

Remove link on badge

add option to remove link on badge

Commits

7c2f420 Remove link on badge (#76)

See full diff in compare view

dependencies github_actions
opened by dependabot[bot] 1

⬆️ Update streamlit requirement from <1.11.0,>=1.5.0 to >=1.5.0,<1.12.0

Updates the requirements on streamlit to permit the latest version.

Commits

b6429b6 Fix linting

f4b2051 Up version to 1.11.1

80d9979 Ignore component requests outside of the component root

4a04eef Replace legacy app URLs in docs with custom subdomains (#4959)

27c29ac Up version to 1.11.0

03babac Test that GitRepo can handle import failures (#4942)

4c39606 Fix table overflow styling (#4934)

26de600 Fix issue with wrongly applied colors with Pandas styler (#4940)

ad4547f Fix widgets overwrites from short to long-hand props (#4935)

3809637 Add gap param to st.columns (#4887)

Additional commits viewable in compare view

dependencies python
opened by dependabot[bot] 1

⬆️ Bump actions/setup-python from 3 to 4.1.0

Bumps actions/setup-python from 3 to 4.1.0.

Release notes

Sourced from actions/setup-python's releases.

v4.1.0

In scope of this pull request we updated actions/cache package as the new version contains fixes for caching error handling. Moreover, we added a new input update-environment. This option allows to specify if the action shall update environment variables (default) or not.

Update-environment input

- name: setup-python 3.9
  uses: actions/setup-python@v4
  with:
    python-version: 3.9
    update-environment: false

Besides, we added such changes as:

Allow python-version-file to be a relative path: actions/setup-python#431

Added new environment variables for Cmake: actions/setup-python#440

Updated error message for resolveVersion: actions/setup-python#450

Assign default value of AGENT_TOOLSDIRECTORY if not set: actions/setup-python#394

v4.0.0

What's Changed

Support for python-version-file input: #336

Example of usage:

- uses: actions/setup-python@v4
  with:
    python-version-file: '.python-version' # Read python version from a file
- run: python my_script.py

There is no default python version for this setup-python major version, the action requires to specify either python-version input or python-version-file input. If the python-version input is not specified the action will try to read required version from file from python-version-file input.

Use pypyX.Y for PyPy python-version input: #349

Example of usage:

- uses: actions/setup-python@v4
  with:
    python-version: 'pypy3.9' # pypy-X.Y kept for backward compatibility
- run: python my_script.py

RUNNER_TOOL_CACHE environment variable is equal AGENT_TOOLSDIRECTORY: #338

Bugfix: create missing pypyX.Y symlinks: #347

PKG_CONFIG_PATH environment variable: #400

Added python-path output: #405

... (truncated)

Commits

c4e89fa Improve readme for 3.x and 3.11-dev style python-version (#441)

0ad0f6a Merge pull request #452 from mayeut/fix-env

f0bcf8b Merge pull request #456 from akx/patch-1

af97157 doc: Add multiple wildcards example to readme

364e819 Merge pull request #394 from akv-platform/v-sedoli/set-env-by-default

782f81b Merge pull request #450 from IvanZosimov/ResolveVersionFix

2c9de4e Remove duplicate code introduced in #440

412091c Fix tests for update-environment==false

78a2330 Merge pull request #451 from dmitry-shibanov/fx-pipenv-python-version

96f494e trigger checks

Additional commits viewable in compare view

dependencies github_actions
opened by dependabot[bot] 1

⬆️ Bump actions/setup-python from 3 to 4

Bumps actions/setup-python from 3 to 4.

Release notes

Sourced from actions/setup-python's releases.

v4.0.0

What's Changed

Support for python-version-file input: #336

Example of usage:

- uses: actions/setup-python@v4
  with:
    python-version-file: '.python-version' # Read python version from a file
- run: python my_script.py

There is no default python version for this setup-python major version, the action requires to specify either python-version input or python-version-file input. If the python-version input is not specified the action will try to read required version from file from python-version-file input.

Use pypyX.Y for PyPy python-version input: #349

Example of usage:

- uses: actions/setup-python@v4
  with:
    python-version: 'pypy3.9' # pypy-X.Y kept for backward compatibility
- run: python my_script.py

RUNNER_TOOL_CACHE environment variable is equal AGENT_TOOLSDIRECTORY: #338

Bugfix: create missing pypyX.Y symlinks: #347

PKG_CONFIG_PATH environment variable: #400

Added python-path output: #405 python-path output contains Python executable path.

Updated zeit/ncc to vercel/ncc package: #393

Bugfix: fixed output for prerelease version of poetry: #409

Made pythonLocation environment variable consistent for Python and PyPy: #418

Bugfix for 3.x-dev syntax: #417

Other improvements: #318 #396 #384 #387 #388

Update actions/cache version to 2.0.2

In scope of this release we updated actions/cache package as the new version contains fixes related to GHES 3.5 (actions/setup-python#382)

Add "cache-hit" output and fix "python-version" output for PyPy

This release introduces new output cache-hit (actions/setup-python#373) and fix python-version output for PyPy (actions/setup-python#365)

The cache-hit output contains boolean value indicating that an exact match was found for the key. It shows that the action uses already existing cache or not. The output is available only if cache is enabled.

... (truncated)

Commits

d09bd5e fix: 3.x-dev can install a 3.y version (#417)

f72db17 Made env.var pythonLocation consistent for Python and PyPy (#418)

53e1529 add support for python-version-file (#336)

3f82819 Fix output for prerelease version of poetry (#409)

397252c Update zeit/ncc to vercel/ncc (#393)

de977ad Merge pull request #412 from vsafonkin/v-vsafonkin/fix-poetry-cache-test

22c6af9 Change PyPy version to rebuild cache

081a3cf Merge pull request #405 from mayeut/interpreter-path

ff70656 feature: add a python-path output

fff15a2 Use pypyX.Y for PyPy python-version input (#349)

Additional commits viewable in compare view

dependencies github_actions
opened by dependabot[bot] 1
⬆️ Update streamlit requirement from <1.9.0,>=1.5.0 to >=1.5.0,<1.10.0
Updates the requirements on streamlit to permit the latest version.

Commits

ecd5428 Up version to 1.9.2

c02c7c1 Make typing-extensions and unconditional dependency (#4697)

5c065b2 Strip surrounding quotes on RC version

1c7a366 Fix shell quoting

a7bc838 Subshell with no output returns "null" not ""

88ebeae Up version to 1.9.1

03958a1 Pin lower version of protobuf (#4783)

f9cef45 Release process fixes (#4753)

c4bea5d Release 1.9.0 (#4673)

27ff5c2 Add more type annotations (#4657)

Additional commits viewable in compare view


dependencies python
opened by dependabot[bot] 1
Sample fake entities for entity augmenter using Faker package

Add sampling of entities (such as names or addresses) from https://faker.readthedocs.io/en/master/locales/da_DK.html. This tool supports random sampling of entities for numerous languages.
enhancement help wanted

opened by martincjespersen 1

implement an oversampling function

Augmentation can be used to oversample a category.

Imagined usage would look something like this:

import augmenty

aug = augmenty.load(...)

def is_positive(example):
    """Return True if the example is labelled with the "positive" category."""
    return example.y.cats["positive"] == 1

# augmenty.oversample is the function proposed in this issue; it does not exist yet
upsampled_corpus = augmenty.oversample(corpus, augmenter=aug, conditional=is_positive, n=1000)
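
One way the proposed function could behave is sketched below. This is only an illustration under stated assumptions, not augmenty's implementation: the `oversample` function, its `augment` and `seed` parameters, and the toy `(text, label)` tuples standing in for spaCy examples are all hypothetical. It keeps the original corpus and appends `n` augmented copies of randomly chosen examples that satisfy the condition.

```python
import random
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def oversample(
    corpus: Iterable[T],
    augment: Callable[[T], T],
    conditional: Callable[[T], bool],
    n: int,
    seed: int = 0,
) -> List[T]:
    """Return the corpus plus n augmented copies of examples matching `conditional`."""
    rng = random.Random(seed)
    examples = list(corpus)
    # Only examples that satisfy the condition are eligible for oversampling
    candidates = [ex for ex in examples if conditional(ex)]
    if not candidates:
        return examples  # nothing to oversample
    extra = [augment(rng.choice(candidates)) for _ in range(n)]
    return examples + extra


# Toy data: (text, label) pairs stand in for spaCy examples (hypothetical)
corpus = [("good film", "positive"), ("bad film", "negative"), ("dull plot", "negative")]
upsampled = oversample(
    corpus,
    augment=lambda ex: (ex[0].upper(), ex[1]),  # placeholder augmentation
    conditional=lambda ex: ex[1] == "positive",
    n=2,
)
```

Sampling with replacement from the matching examples keeps the sketch simple; a real implementation would probably need to work with augmenters that yield several examples at once.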

enhancement

opened by KennethEnevoldsen 0

Back translation augmentation

Augment a document by back translation through various languages, e.g. using Hugging Face models: https://huggingface.co/models?pipeline_tag=translation.

Example blog: https://dzlab.github.io/dltips/en/pytorch/text-augmentation/

Example sentence: Augmenty is an augmentation library based on spaCy for augmenting texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence and document labels under the augmentation.

English -> Danish (Google): Augmenty er et udvidelsesbibliotek baseret på spaCy til forstørrelse af tekster. Augmenty adskiller sig fra andre augmentationsbiblioteker ved, at den korrigerer (så vidt muligt) token-, sætnings- og dokumentetiketterne under augmentationen.

Danish -> English (Google): Augmenty is an extension library based on spaCy for enlarging texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence, and document labels during augmentation.
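
The round trip above can be sketched as a small function that composes two translators. This is a minimal sketch under stated assumptions: `back_translate`, `lookup_translator` and the two lookup tables are hypothetical names made up for illustration, and the stub translators only stand in for real translation models (for example ones from the Hugging Face hub). The sketch does not attempt to realign token-level labels after translation.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def back_translate(text: str, forward: Translator, backward: Translator) -> str:
    """Translate text into a pivot language and back again."""
    return backward(forward(text))


# Stub translators built from tiny lookup tables, for illustration only.
# In practice these callables could wrap real translation models.
EN_TO_DA: Dict[str, str] = {"augmenting": "forstørrelse", "texts": "tekster"}
DA_TO_EN: Dict[str, str] = {"forstørrelse": "enlarging", "tekster": "texts"}


def lookup_translator(table: Dict[str, str]) -> Translator:
    """Build a word-by-word translator; unknown words pass through unchanged."""
    def translate(text: str) -> str:
        return " ".join(table.get(tok, tok) for tok in text.split())
    return translate


augmented = back_translate(
    "augmenting texts",
    forward=lookup_translator(EN_TO_DA),
    backward=lookup_translator(DA_TO_EN),
)
print(augmented)  # enlarging texts
```

Passing the translators in as plain callables keeps the round-trip logic independent of any particular model or service.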
additional augmenter

opened by martincjespersen 1
List of potentially new augmenters
The following is a list of potential new augmenters. If you want a specific augmenter added before the others, please update the issue corresponding to that augmenter (if it doesn't have one, feel free to create one).

A variation of existing augmenters:

[ ] #9

[ ] #8

[ ] Common Danish spelling errors lookups

[ ] Danish synonym augmenters using lookups

[ ] Close Homophones Swap

[ ] Geonames augmentation

[ ] leet augmentation

[ ] Multilingual Lexicon Perturbation

[ ] Character duplication augmentation

[ ] American british augmentation

New augmenters

[x] #5

[x] #6

[ ] Butter finger augmentation

[ ] Date format

[ ] Contractions and Expansions Perturbation

[ ] gender swap

[ ] Appended word soup: adds a random sequence to the end of the sentence

[ ] sentence shuffle augmenter

[ ] conditional token replace augmenter

[ ] replace numerical

[ ] Emojis --> Emoticons augmentation

[ ] conditional string replace

[ ] German ss -> ß

[ ] punct augmentation

[ ] Causal Negation & Strengthen

[ ] Emojify augmentation

[ ] tense augmentation

[ ] OCR Augmentation

[ ] Antonym augmentation

[ ] tfidf augmentation

[x] #25

[ ] sentence swap

Batch augmenters

[ ] Backtranslation e.g. based on this

[ ] Neural paraphraser

[ ] MLM augmentation

[ ] Summarize article by abstractive summarization augmentation

A combination of existing augmenters

[ ] EDA augmenter following the EDA paper

additional augmenter
opened by KennethEnevoldsen 0

Releases(v1.0.1)

v1.0.1(Jun 21, 2022)
Version

What's Changed

Version 1.0.0 by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/50

Update replace.py by @koaning in https://github.com/KennethEnevoldsen/augmenty/pull/51

Documentation updates

added faker based on PR by @martincjespersen by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/85

Added pre-config workflows by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/86

New Contributors

@dependabot made their first contribution in https://github.com/KennethEnevoldsen/augmenty/pull/46

@koaning made their first contribution in https://github.com/KennethEnevoldsen/augmenty/pull/51

@martincjespersen

Full Changelog: https://github.com/KennethEnevoldsen/augmenty/compare/v.0.0.12...v1.0.1
v.0.0.12(Feb 7, 2022)
0.0.12 (03/08/21)

Many bugfixes

Added a few more augmenters

Notable updates to the documentation of the package

0.0.1 (03/08/21)

First version of augmenty launches 🎉

with more than 15 highly customizable augmenters,

A high-quality code-base (coverage of 96% and a codefactor A),

and utilities for easy application of augmenters to strings and spaCy Docs.

Furthermore, it also includes a series of convenience functions for combining and moderating augmentations.

Full Changelog: https://github.com/KennethEnevoldsen/augmenty/commits/v.0.0.12

Owner

Kenneth Enevoldsen

Interdisciplinary PhD Student on representation learning in Clinical NLP and Genetics at Aarhus University and Interacting Minds Centre

