Python3 to Crystal Translation using Python AST Walker

py2cr.py

A Python-to-Crystal code translator built on Python's AST. It is essentially an ast.NodeVisitor that emits Crystal source. See the AST documentation (https://docs.python.org/3/library/ast.html) for more information.
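To make the NodeVisitor idea concrete, here is a minimal sketch of a visitor that returns Crystal text for a tiny subset of Python. It is not py2cr's actual implementation: the names TinyCrystalEmitter and translate are hypothetical, and the print-to-puts mapping is an assumption chosen for illustration.

```python
import ast
import json


class TinyCrystalEmitter(ast.NodeVisitor):
    """Illustrative only: assignments, calls, names, constants and + - *."""

    def visit_Module(self, node):
        return "\n".join(self.visit(stmt) for stmt in node.body)

    def visit_Expr(self, node):
        return self.visit(node.value)

    def visit_Assign(self, node):
        # Only the first target is handled (no chained or tuple assignment).
        return f"{self.visit(node.targets[0])} = {self.visit(node.value)}"

    def visit_Call(self, node):
        func = self.visit(node.func)
        if func == "print":
            func = "puts"  # assumed mapping, for illustration
        args = ", ".join(self.visit(arg) for arg in node.args)
        return f"{func}({args})"

    def visit_BinOp(self, node):
        ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}
        op = ops[type(node.op)]
        return f"({self.visit(node.left)} {op} {self.visit(node.right)})"

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        value = node.value
        if isinstance(value, bool):  # check bool before other types
            return "true" if value else "false"
        if value is None:
            return "nil"
        if isinstance(value, str):
            # Crystal string literals use double quotes.
            return json.dumps(value, ensure_ascii=False)
        return repr(value)

    def generic_visit(self, node):
        # Anything outside the subset above is rejected explicitly.
        raise NotImplementedError(type(node).__name__)


def translate(source: str) -> str:
    """Parse Python source and return Crystal text for the supported subset."""
    return TinyCrystalEmitter().visit(ast.parse(source))
```

Calling translate('x = 1 + 2\nprint(x * 3)') yields two lines of Crystal-looking output; any unsupported construct, such as an import statement, raises NotImplementedError instead of being silently dropped.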

Status

Currently more than 80% of the relevant tests are passing. See more information below.

Installation

Execute the following:

pip install py2cr

or

git clone https://github.com/nanobowers/py2cr.git

Versions

  • Python 3.6 .. 3.9
  • Crystal 1.1+

Dependencies

Python

pip install pyyaml

# Probably not needed for much longer since py2 support is going to be removed.
pip install six 

# Probably not really needed since there is no crystal equivalent
pip install numpy

Crystal

Currently there are no external dependencies.

Methodology

In addition to walking the Python AST and writing Crystal syntax as output, this tool uses one of the following approaches:

  • Monkey-patches some common Crystal stdlib structs/classes to emulate the equivalent Python functionality.
  • Calls Crystal methods that are direct equivalents of the Python ones.
  • Calls wrapped Crystal methods that provide Python-equivalent functionality.

Usage

Generally, py2cr.py somefile.py > somefile.cr

There is a Crystal shim/wrapper library in src/py2cr (also linked into lib/py2cr) that the generated script references. You may need to copy it alongside your generated code; eventually it may make sense to package it as a shard.

Example

TODO

Tests

$ ./run_tests.py

Will run all tests that are supposed to work. If any test fails, it's a bug. (Currently there are a lot of failing tests!)

$ ./run_tests.py -a

Will run all tests including those that are known to fail (currently). It should be understandable from the output.

$ ./run_tests.py basic

Will run all tests matching basic. Useful because running the entire test-suite can take a while.

$ ./run_tests.py -x or $ ./run_tests.py --no-error

Will run tests but ignore any error raised by the test itself. This does not affect errors generated by the test files in the tests directory.

For additional information on flags, run:

./run_tests.py -h

Writing new tests

Adding tests for most new or existing functionality involves adding additional Python files (with a .py extension) under the tests/ directory.

The test-runner scripts will automatically run py2cr to produce a Crystal script, then run both the Python and Crystal scripts, then compare stdout/stderr and check return codes.
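The comparison step described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in run_tests.py: the helper names run and outputs_match are hypothetical, and the commands are passed in by the caller rather than built from py2cr output.

```python
import subprocess
import sys
from typing import List, NamedTuple


class RunResult(NamedTuple):
    stdout: str
    stderr: str
    returncode: int


def run(cmd: List[str]) -> RunResult:
    """Run a command and capture its text output and exit status."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return RunResult(proc.stdout, proc.stderr, proc.returncode)


def outputs_match(reference_cmd: List[str],
                  candidate_cmd: List[str],
                  expected_exit_status: int = 0) -> bool:
    """True when both commands agree on stdout, stderr and exit status,
    and that exit status is the expected one."""
    reference = run(reference_cmd)
    candidate = run(candidate_cmd)
    return reference == candidate and reference.returncode == expected_exit_status
```

In a real run the reference command would execute the Python script and the candidate command would execute the generated Crystal script; here any two commands can be compared, for example two invocations of sys.executable.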

For special test cases, it is possible to provide a per-test YAML configuration file, placed under tests/ and named with a .config.yaml suffix, which overrides the testing defaults. The following keys/values are supported:

min_python_version: [int, int] # minimum major/minor version
max_python_version: [int, int] # maximum major/minor version
expected_exit_status: int      # exit status for py/cr test script
argument_list: [str, ... str]  # list of strings as extra args for argv
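As a rough illustration of how the version keys might be interpreted, the sketch below takes an already-parsed configuration (a plain dict, as a YAML loader would produce) and decides whether a test applies to a given interpreter version. The function name should_run is hypothetical, and treating both bounds as inclusive is an assumption rather than documented behavior.

```python
import sys
from typing import Any, Dict, Tuple


def should_run(config: Dict[str, Any],
               version: Tuple[int, int] = tuple(sys.version_info[:2])) -> bool:
    """Check a (major, minor) version against the optional bounds.

    Assumption: both bounds are inclusive and a missing key means no limit.
    """
    lowest = tuple(config.get("min_python_version", (0, 0)))
    highest = tuple(config.get("max_python_version", (99, 99)))
    return lowest <= version <= highest


# Example configuration, equivalent to a file containing:
#   min_python_version: [3, 8]
#   expected_exit_status: 1
example_config = {"min_python_version": [3, 8], "expected_exit_status": 1}
```

With example_config, a test would be skipped on Python 3.6 and run on Python 3.8 or 3.9.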

Typing

Some of Python's typing support is translated to Crystal. Completely untyped Python code will in many cases not translate to compilable Crystal. Rudimentary support exists for Python's Optional and Union, which should convert to the appropriate Crystal types.

Some inference of bare list/dict types can now convert to [] of X and {} of X; however, set and tuple may not work properly.
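The kind of annotation mapping described in this section can be sketched as a small function over annotation AST nodes. This is not py2cr's actual logic: the function names and the SIMPLE_TYPES table (for example int to Int32) are assumptions for illustration, and the sketch assumes Python 3.9+, where subscript slices are plain expression nodes.

```python
import ast

# Hypothetical name mapping; the table py2cr really uses may differ.
SIMPLE_TYPES = {
    "int": "Int32",
    "float": "Float64",
    "str": "String",
    "bool": "Bool",
}


def crystal_type(node: ast.expr) -> str:
    """Turn a Python annotation node into a Crystal type string."""
    if isinstance(node, ast.Constant) and node.value is None:
        return "Nil"
    if isinstance(node, ast.Name):
        return SIMPLE_TYPES.get(node.id, node.id)
    if isinstance(node, ast.Subscript) and isinstance(node.value, ast.Name):
        outer = node.value.id
        # Multiple type arguments arrive as a Tuple, a single one as a bare node.
        if isinstance(node.slice, ast.Tuple):
            args = node.slice.elts
        else:
            args = [node.slice]
        inner = [crystal_type(arg) for arg in args]
        if outer == "Optional":
            return f"{inner[0]} | Nil"
        if outer == "Union":
            return " | ".join(inner)
        if outer in ("List", "list"):
            return f"Array({inner[0]})"
        if outer in ("Dict", "dict"):
            return f"Hash({inner[0]}, {inner[1]})"
    raise NotImplementedError(ast.dump(node))


def annotation_to_crystal(annotation: str) -> str:
    """Parse an annotation written as source text and convert it."""
    return crystal_type(ast.parse(annotation, mode="eval").body)
```

Under these assumptions, annotation_to_crystal("Optional[int]") gives "Int32 | Nil", and set or tuple annotations fall through to NotImplementedError, mirroring the caveat above.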

Status

This is incomplete, and many of the tests brought forward from py2rb do not pass. Some of them may never pass as-is due to significant language and compilation differences (even more so than between Python and Ruby).

To some extent, it will always be incomplete. The goal is to cover common cases and reduce the additional work needed to reach a minimum viable program.

Limitations

  • Many Python run-time exceptions are not translatable into Crystal as these issues manifest in Crystal as compile-time errors.
  • A significant portion of Python code is untyped and may not translate properly in places where Crystal demands type information.
    • e.g. Crystal lambda function parameters require type annotations, which are very uncommon in Python, though supplying them may be possible with Callable[] on the Python side.
  • Python's import system differs significantly from Crystal's and thus may never map well.
  • Numpy and unittest, which are common in Python, don't have equivalents in Crystal. With significant additional work, converting tests into Spec format may be possible, using https://github.com/jaredbeck/minitest_to_rspec as a guide.

To-do

  • Remove python2/six dependencies to reduce clutter. Py2 has been end-of-lifed for a while now.
  • Remove numpy dependencies unless/until a suitable target for Crystal can be identified
  • Add additional Crystal shim methods to translate common python3 stdlib methods. Consider a mode that just maps to a close Crystal method rather than using a shim-method to reduce the python-ness.
  • Refactor the code-base. Most of it is in the __init__.py
  • Add additional unit-tests
  • Multi-thread the test-suite so it can run faster.

Contribute

Feel free to submit an issue. This is very much a work in progress; contributions and constructive feedback are welcome.

If you'd like to hack on py2cr, start by forking the repo on GitHub:

https://github.com/nanobowers/py2cr

Contributing

The best way to get your changes merged back into core is as follows:

  1. Fork it (https://github.com/nanobowers/py2cr/fork)
  2. Create a thoughtfully named topic branch to contain your change (git checkout -b my-new-feature)
  3. Hack away
  4. Add tests and make sure everything still passes by running crystal spec
  5. If you are adding new functionality, document it in the README
  6. If necessary, rebase your commits into logical chunks, without errors
  7. Commit your changes (git commit -am 'Add some feature')
  8. Push to the branch (git push origin my-new-feature)
  9. Create a new Pull Request

License

MIT, see the LICENSE file for exact details.
