NumPy String-Indexed is a NumPy extension that allows arrays to be indexed using descriptive string labels

Overview

NumPy String-Indexed

PyPI Version Python Versions

NumPy String-Indexed is a NumPy extension that allows arrays to be indexed using descriptive string labels, rather than conventional zero-indexing. When a friendly matrix object is initialized, labels are assigned to each array index and each dimension, and they stick to the array after NumPy-style operations such as transposing, concatenating, and aggregating. This prevents Python programmers from having to keep track mentally of what each axis and each index represents, instead making each reference to the array in code naturally self-documenting.

NumPy String-Indexed is especially useful for applications like machine learning, scientific computing, and data science, where there is heavy use of multidimensional arrays.

The friendly matrix object is implemented as a lightweight wrapper around a NumPy ndarray. It's easy to add to a new or existing project to make it easier to maintain code, and has negligible memory and performance overhead compared to the size of array (O(x + y + z) vs. O(xyz)).

Basic functionality

It's recommended to import NumPy String-Indexed idiomatically as fm:

import friendly_matrix as fm

Labels are provided during object construction and can optionally be used in place of numerical indices for slicing and indexing.

The example below shows how to construct a friendly matrix containing an image with three color channels:

image = fm.ndarray(
	numpy_ndarray_image,  # np.ndarray with shape (3, 100, 100)
	dim_names=['color_channel', 'top_to_bottom', 'left_to_right'],
	color_channel=['R', 'G', 'B'])

The matrix can then be sliced like this:

# friendly matrix with shape (100, 100)
r_channel = image(color_channel='R')

# an integer
g_top_left_pixel_value = image('G', 0, 0)

# friendly matrix with shape (2, 100, 50)
br_channel_left_half = image(
	color_channel=('B', 'R'),
	left_to_right=range(image.dim_length('left_to_right') // 2))

Documentation

Full documentation can be found here. Below is a brief overview of Friendly Matrix functionality.

Matrix operations

Friendly matrix objects can be operated on just like NumPy ndarrays with minimal overhead. The package contains separate implementations of most of the relevant NumPy ndarray operations, taking advantage of labels. For example:

side_by_side = fm.concatenate((image1, image2), axis='left_to_right')

An optimized alternative is to perform label-less operations, by adding "_A" (for "array") to the operation name:

side_by_side_arr = fm.concatenate_A((image1, image2), axis='left_to_right')

If it becomes important to optimize within a particular scope, it's recommended to shed labels before operating:

for image in huge_list:
	image_processor(image.A)

Computing matrices

A friendly matrix is an ideal structure for storing and retrieving the results of computations over multiple variables. The compute_ndarray() function executes computations over all values of the input arrays and stores them in a new Friendly Matrix ndarray instance in a single step:

'''Collect samples from a variety of normal distributions'''

import numpy as np

n_samples_list = [1, 10, 100, 1000]
mean_list = list(range(-21, 21))
var_list = [1E1, 1E0, 1E-1, 1E-2, 1E-3]

results = fm.compute_ndarray(
	['# Samples', 'Mean', 'Variance']
	n_samples_list,
	mean_list,
	var_list,
	normal_sampling_function,
	dtype=np.float32)

# friendly matrices can be sliced using dicts
print(results({
	'# Samples': 100,
	'Mean': 0,
	'Variance': 1,
}))

Formatting matrices

The formatted() function displays a friendly matrix as a nested list. This is useful for displaying the labels and values of smaller matrices or slice results:

mean_0_results = results({
	'# Samples': (1, 1000),
	'Mean': 0,
	'Variance': (10, 1, 0.1),
})
formatted = fm.formatted(
	mean_0_results,
	formatter=lambda n: round(n, 1))

print(formatted)

'''
Example output:

# Samples = 1:
	Variance = 10:
		2.2
	Variance = 1:
		-0.9
	Variance = 0.1:
		0.1
# Samples = 1000:
	Variance = 10:
		-0.2
	Variance = 1:
		-0.0
	Variance = 0.1:
		0.0
'''

Installation

pip install numpy-string-indexed

NumPy String-Indexed is listed in PyPI and can be installed with pip.

Prerequisites: NumPy String-Indexed 0.0.1 requires Python 3 and a compatible installation of the NumPy Python package.

Discussion and support

NumPy String-Indexed is available under the MIT License.

Owner
Aitan Grossman
Aitan Grossman
Machine learning classifiers to predict American Sign Language .

ASL-Classifiers American Sign Language (ASL) is a natural language that serves as the predominant sign language of Deaf communities in the United Stat

Tarek idrees 0 Feb 08, 2022
code for modular summarization work published in ACL2021 by Krishna et al

This repository contains the code for running modular summarization pipelines as described in the publication Krishna K, Khosla K, Bigham J, Lipton ZC

Approximately Correct Machine Intelligence (ACMI) Lab 21 Nov 24, 2022
Dual languaged (rus+eng) tool for packing and unpacking archives of Silky Engine.

SilkyArcTool English Dual languaged (rus+eng) GUI tool for packing and unpacking archives of Silky Engine. It is not the same arc as used in Ai6WIN. I

Tester 5 Sep 15, 2022
Translators - is a library which aims to bring free, multiple, enjoyable translation to individuals and students in Python

Translators - is a library which aims to bring free, multiple, enjoyable translation to individuals and students in Python

UlionTse 907 Dec 27, 2022
無料で使える中品質なテキスト読み上げソフトウェア、VOICEVOXの音声合成エンジン

VOICEVOX ENGINE VOICEVOXの音声合成エンジン。 実態は HTTP サーバーなので、リクエストを送信すればテキスト音声合成できます。 API ドキュメント VOICEVOX ソフトウェアを起動した状態で、ブラウザから

Hiroshiba 3 Jul 05, 2022
The FinQA dataset from paper: FinQA: A Dataset of Numerical Reasoning over Financial Data

Data and code for EMNLP 2021 paper "FinQA: A Dataset of Numerical Reasoning over Financial Data"

Zhiyu Chen 114 Dec 29, 2022
This project consists of data analysis and data visualization (done using python)of all IPL seasons from 2008 to 2019 and answering the most asked questions about the IPL.

IPL-data-analysis This project consists of data analysis and data visualization of all IPL seasons from 2008 to 2019 and answering the most asked ques

Sivateja A T 2 Feb 08, 2022
NLP command-line assistant powered by OpenAI

NLP command-line assistant powered by OpenAI

Axel 16 Dec 09, 2022
Code for "Generative adversarial networks for reconstructing natural images from brain activity".

Reconstruct handwritten characters from brains using GANs Example code for the paper "Generative adversarial networks for reconstructing natural image

K. Seeliger 2 May 17, 2022
Beyond Paragraphs: NLP for Long Sequences

Beyond Paragraphs: NLP for Long Sequences

AI2 338 Dec 02, 2022
내부 작업용 django + vue(vuetify) boilerplate. 짠 하면 돌아감.

Pocket Galaxy 아주 간단한 개인용, 혹은 내부용 툴을 만들어야하는데 이왕이면 웹이 편하죠? 그럴때를 위해 만들어둔 django와 vue(vuetify)로 이뤄진 boilerplate 입니다. 각 폴더에 있는 설명서대로 실행을 시키면 일단 당장 뭔가가 돌아갑니

Jamie J. Seol 16 Dec 03, 2021
基于pytorch_rnn的古诗词生成

pytorch_peot_rnn 基于pytorch_rnn的古诗词生成 说明 config.py里面含有训练、测试、预测的参数,更改后运行: python main.py 预测结果 if config.do_predict: result = trainer.generate('丽日照残春')

西西嘛呦 3 May 26, 2022
BERTAC (BERT-style transformer-based language model with Adversarially pretrained Convolutional neural network)

BERTAC (BERT-style transformer-based language model with Adversarially pretrained Convolutional neural network) BERTAC is a framework that combines a

6 Jan 24, 2022
Toward Model Interpretability in Medical NLP

Toward Model Interpretability in Medical NLP LING380: Topics in Computational Linguistics Final Project James Cross ( 1 Mar 04, 2022

Text classification is one of the popular tasks in NLP that allows a program to classify free-text documents based on pre-defined classes.

Deep-Learning-for-Text-Document-Classification Text classification is one of the popular tasks in NLP that allows a program to classify free-text docu

Happy N. Monday 2 Mar 17, 2022
Line as a Visual Sentence: Context-aware Line Descriptor for Visual Localization

Line as a Visual Sentence with LineTR This repository contains the inference code, pretrained model, and demo scripts of the following paper. It suppo

SungHo Yoon 158 Dec 27, 2022
TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

Alexa 98 Dec 09, 2022
Official code for Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

Official code for our Interspeech 2021 - Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset [1]*. Visually-grounded spoken language datasets c

Ian Palmer 3 Jan 26, 2022
Chinese real time voice cloning (VC) and Chinese text to speech (TTS).

Chinese real time voice cloning (VC) and Chinese text to speech (TTS). 好用的中文语音克隆兼中文语音合成系统,包含语音编码器、语音合成器、声码器和可视化模块。

Kuang Dada 6 Nov 08, 2022
voice2json is a collection of command-line tools for offline speech/intent recognition on Linux

Command-line tools for speech and intent recognition on Linux

Michael Hansen 988 Jan 04, 2023