This repository is home to the Optimus data transformation plugins for various data processing needs.

Overview

Transformers

test workflow build workflow

Optimus's transformation plugins are implementations of Task and Hook interfaces that allows execution of arbitrary jobs in optimus.

To install plugins via homebrew

brew tap odpf/taps
brew install optimus-plugins-odpf

To install plugins via shell

curl -sL ${PLUGIN_RELEASE_URL} | tar xvz
chmod +x optimus-*
mv optimus-* /usr/bin/
Comments
  • Fix: fix ignoreupstream helper for big query view

    Fix: fix ignoreupstream helper for big query view

    Hello, Currently, for any query, we try to find the dependancy and ignoredependancy with FindDependenciesWithRegex and then we again pull the Refereced table with big query dry run.

    If query contains view which is marked with /* @ignoreupstream */ helper, then ignoredependancy will contain the view name but not the table referenced by view.

    The change here is to revise ignoredependancy list with table referenced by view.

    I kept the loop execution in sequential manner, please let me know if should add concurrency here

    enhancement 
    opened by SumitAgrawal03071989 2
  • @ignoreupstream ineffective on big query view

    @ignoreupstream ineffective on big query view

    We have a query referencing to table as well as view. select * from proj.dataset.table t1 left join proj.dataset.view v1 on t1.date = v1.date and t1.id = v1.id

    • now if we apply @ignoreupstream helper on table proj.dataset.table then it correctly ignores to create upstream dependancy for this table.
    • But if we apply @ignoreupstream helper on view proj.dataset.view ( note the view query refers to 2 more tables ) then it does not ignore view or table referenced by view.
    opened by SumitAgrawal03071989 2
  • feat : migrate plugins for the inti-container changes in optimus

    feat : migrate plugins for the inti-container changes in optimus

    As per Optimus PR, the executor boot process is standardised and maintained at optimus. Plugin devs need no longer have to wrap the executor image. closes odpf/optimus#405

    opened by smarch-int 1
  • monthly job didn't run for the last day of month

    monthly job didn't run for the last day of month

    Hi team,

    I have a bq2bq job with window configuration

      window:
        size: 720h
        offset: -48h
        truncate_to: M
    

    I expect to have transformation for date 01 to last day of the month, e.g on April, I expect got transformation from date 01 - 30. but currently only got transformation from date 01 - 29

    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-26 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-27 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-28 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-29 00:00:00+00:00
    [2022-06-13 15:08:12,324] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: start transformation job
    [2022-06-13 15:08:12,324] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: sql transformation query:
    

    after checking, I suspect the logic may related to this line, where the last day generated by windows class not included as the transformation partition.

    https://github.com/odpf/transformers/blob/ea1de4f0de3d17d9be7ccefb1e2f3beab1a685f1/task/bq2bq/executor/bumblebee/transformation.py#L393

    please kindly check it, and release the fix. thank you

    opened by novanxyz 1
  • feat: add support for secret env vars

    feat: add support for secret env vars

    With this we are adding support for using secrets in macros, we do not want to print the env vars in the logs, so exporting them as a separate file from optimus.

    Plugins can export this extra file to get env vars.

    opened by sbchaos 1
  • feat : remove wrapper image and use bq2bq executor image in plugin

    feat : remove wrapper image and use bq2bq executor image in plugin

    As per https://github.com/odpf/optimus/pull/425, the executor boot process is standardised and maintained at optimus. Plugin devs need no longer have to wrap the executor image. closes https://github.com/odpf/optimus/issues/405

    opened by smarch-int 0
  • Generate Dependencies is using the dry run apis which is bound to fail with macros

    Generate Dependencies is using the dry run apis which is bound to fail with macros

    The most intuitive way is to parse the query and hit the metadata apis instead of going through the dry run which should be definitly costly then the metadata fetch apis.

    enhancement performance 
    opened by sravankorumilli 0
  • BQ2BQ Replace load dispostion doesn't handle aggregations

    BQ2BQ Replace load dispostion doesn't handle aggregations

    Options

    1. Add an extra option in Replace load dispostition to take input from users to replace a specific or range of partitions using literals / all , dstart, dend. Default is all : all represents splitting of query to multiple partitions from dstart to dend.
    2. Use a new Load Disposition, to replace to a single destination partition which is window start
    bug 
    opened by sravankorumilli 0
Releases(v0.2.1)
Owner
Open Data Platform
Next-gen collaborative, domain-driven and distributed data platform
Open Data Platform
Weird Sort-and-Compress Thing

Weird Sort-and-Compress Thing A weird integer sorting + compression algorithm inspired by a conversation with Luthingx (it probably already exists by

Douglas 1 Jan 03, 2022
Experiments in converting wikidata to ftm

FollowTheMoney / Wikidata mappings This repo will contain tools for converting Wikidata entities into FtM schema. Prefixes: https://www.mediawiki.org/

Friedrich Lindenberg 2 Nov 12, 2021
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

A Deep Learning NLP/NLU library by Intel® AI Lab Overview | Models | Installation | Examples | Documentation | Tutorials | Contributing NLP Architect

Intel Labs 2.9k Jan 02, 2023
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Kashgari Overview | Performance | Installation | Documentation | Contributing 🎉 🎉 🎉 We released the 2.0.0 version with TF2 Support. 🎉 🎉 🎉 If you

Eliyar Eziz 2.3k Dec 29, 2022
A relatively simple python program to generate one of those reddit text to speech videos dominating youtube.

Reddit text to speech generator A basic reddit tts video generator Current functionality Generate videos for subs based on comments,(askreddit) so rea

Aadvik 17 Dec 19, 2022
Rich Prosody Diversity Modelling with Phone-level Mixture Density Network

Phone Level Mixture Density Network for TTS This repo contains pytorch implementation of paper Rich Prosody Diversity Modelling with Phone-level Mixtu

Rishikesh (ऋषिकेश) 42 Dec 13, 2022
A Structured Self-attentive Sentence Embedding

Structured Self-attentive sentence embeddings Implementation for the paper A Structured Self-Attentive Sentence Embedding, which was published in ICLR

Kaushal Shetty 488 Nov 28, 2022
Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downstream tasks like translation and summarisation.

PART 2: CHAIN LINKING AUDIO-TO-TEXT NLP TASKS 2A: TRANSCRIBE-TRANSLATE-SENTIMENT-ANALYSIS In notebook3.0, I demo a simple workflow to: transcribe a lo

Chua Chin Hon 30 Jul 13, 2022
Community and sentiment analysis based on tweets

The project has set itself the goal of analyzing the thoughts and interaction of Italian users through the social posts expressed through the Twitter platform on the day of the entry into force of th

3 Nov 17, 2022
Shirt Bot is a discord bot which uses GPT-3 to generate text

SHIRT BOT · Shirt Bot is a discord bot which uses GPT-3 to generate text. Made by Cyclcrclicly#3420 (474183744685604865) on Discord. Support Server EX

31 Oct 31, 2022
A method to generate speech across multiple speakers

VoiceLoop PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a n

Facebook Archive 873 Dec 15, 2022
GPT-3: Language Models are Few-Shot Learners

GPT-3: Language Models are Few-Shot Learners arXiv link Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-trainin

OpenAI 12.5k Jan 05, 2023
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS)

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real-time. Feel free to check my the

Corentin Jemine 38.5k Jan 03, 2023
Python-zhuyin - An open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo)

Python-zhuyin - An open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo)

2 Dec 29, 2022
Paradigm Shift in NLP - "Paradigm Shift in Natural Language Processing".

Paradigm Shift in NLP Welcome to the webpage for "Paradigm Shift in Natural Language Processing". Some resources of the paper are constantly maintaine

Tianxiang Sun 41 Dec 30, 2022
Awesome-NLP-Research (ANLP)

Awesome-NLP-Research (ANLP)

Language, Information, and Learning at Yale 72 Dec 19, 2022
**NSFW** A chatbot based on GPT2-chitchat

DangBot -- 好怪哦,再来一句 卡群怪话bot,powered by GPT2 for Chinese chitchat Training Example: python train.py --lr 5e-2 --epochs 30 --max_len 300 --batch_size 8

Tommy Yang 11 Jul 21, 2022
A Telegram bot to add notes to Flomo.

flomo bot 使用 Telegram 机器人发送笔记到你的 Flomo. 你需要有一台可访问 Telegram 的服务器。 Steps @BotFather 新建机器人,获取 token Flomo 官网获取 API,链接 https://flomoapp.com/mine?source=in

Zhen 44 Dec 30, 2022
Huggingface Transformers + Adapters = ❤️

adapter-transformers A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models adapter-transformers is an extension of

AdapterHub 1.2k Jan 09, 2023