Kettle Paoding Jieniu, Chapter 14: JSON Input
2022-04-23 09:23:00 【Feige big data】
Introduction
In the previous article, we introduced XML and XPath, walked through the detailed settings of the XML file input component, and finished with a hands-on demonstration of reading an XML file from disk.
In this article, we continue with the JSON input component (JSON input) in Kettle.
To understand the JSON input component, we first need to talk about JSON and JSONPath.
A brief introduction to JSON
JSON stands for "JavaScript Object Notation". It is a text-based, language-independent, lightweight data exchange format. XML is also a data exchange format, so why not simply use XML?
Although XML works as a cross-platform data exchange format, handling XML in JS (short for JavaScript) is inconvenient. XML also tends to carry more markup than data, which increases the traffic an exchange generates. JSON has no extra markup and can be handled as an object in JS, so JSON is often preferred for exchanging data.
JSON's 3 data structures
Key-value pair
The key-value pair is the most basic data structure in JSON.
{
  "Name": "Big Feige"
}
In the example above, the property "Name" is a string enclosed in double quotes. Its value, "Big Feige", is also a string in this example, but it could be any other type.
Object
A JSON object is an unordered collection of key-value pairs.
{
  "person": {
    "age": "35",
    "sex": "male",
    "name": "Big Feige",
    "weight": "75kg",
    "height": "170cm"
  }
}
In the example above, the person object contains 5 properties, separated by commas.
Array
JSON uses [] to denote an array.
{
  "people": [
    { "firstName": "Big", "lastName": "Feige", "age": 35 },
    { "firstName": "Big", "lastName": "data", "age": 32 }
  ]
}
JSON's 6 data types
string type
A string. It must be enclosed in double quotes.
number type
A number, consistent with JavaScript's number type: integers (written without a decimal point or exponent notation) are accurate up to 15 digits, and the maximum number of decimal places is 17.
object type
The JavaScript object form, written as {key: value}. Objects can be nested.
array type
An array, written in JavaScript's Array form as [value]. Arrays can be nested.
boolean type
true or false, the same as JavaScript's boolean type.
null type
The null value, the same as JavaScript's null.
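To make the six types concrete, here is a minimal Python sketch that parses a small document with the standard `json` module and reports the Python type each value becomes. The document content and the key names are made up for illustration; they do not come from the article.

```python
import json

# Hypothetical document that uses all six JSON types.
text = """
{
  "name": "Big Feige",
  "age": 35,
  "height_m": 1.70,
  "married": true,
  "nickname": null,
  "tags": ["kettle", "etl"],
  "address": {"city": "Beijing"}
}
"""

doc = json.loads(text)

# Record the Python type that the parser produced for every key.
types = {key: type(value).__name__ for key, value in doc.items()}
for key, type_name in types.items():
    print(f"{key}: {type_name}")

# Strings must use double quotes; single quotes are rejected by the parser.
try:
    json.loads("{'name': 'Big Feige'}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print("single quotes accepted:", single_quotes_ok)
```

Note that JSON numbers come back as either `int` or `float` in Python, while JSON itself only has the single number type.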
Advantages and disadvantages of JSON
Advantages
a. The data format is simple and easy to read and write. The format is compact, so it uses little bandwidth.
b. It is easy to parse. Client-side JavaScript can read JSON data with eval(), although JSON.parse() is the safer choice.
c. It is supported by many languages, which makes server-side parsing easy.
d. Server-side code can use it directly, which reduces the amount of code on both server and client and makes it easier to maintain.
Disadvantages
a. It has not been promoted as widely as XML and is not as universal.
b. Adoption of the JSON format in Web Services is still at an early stage.
JSON-related websites
JSON official website (Chinese)
http://www.json.org/json-zh.html
JSON official website
http://www.json.org/
About JSONPath
JsonPath is a simple way to extract part of a given JSON document. Implementations exist for many programming languages, such as JavaScript, Python, PHP, and Java.
JsonPath is very powerful. It offers an expression syntax, somewhat like regular expressions, that can reach essentially any content you want in a JSON document.
Operators
Operator | Description
$ | The root element to query. This starts every path expression.
@ | The current node being processed by a filter predicate.
* | Wildcard. Can be used anywhere a name or a number is required.
.. | Deep scan. Can be used anywhere a name is required.
.<name> | Dot notation for a child node.
['<name>' (, '<name>')] | Bracket notation for a child or children.
[<number> (, <number>)] | Array index or indexes.
[start:end] | Array slice operator.
[?(<expression>)] | Filter expression. The expression must evaluate to a Boolean value.
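To show how these operators walk a document, here is a minimal Python sketch of an evaluator for a small subset: `$`, `.name`, `*`, bracket children, indexes, and slices. It is not the implementation Kettle uses; `select`, `step`, and `children` are hypothetical helper names, and deep scan (`..`), filters, and keys that contain `:` or `,` are deliberately left out.

```python
import re

# One step is ".name", ".*", or a bracket "[...]" (also accepted as ".[...]").
TOKEN = re.compile(r"\.?\[([^\]]+)\]|\.(\*|[A-Za-z_]\w*)")


def children(node):
    """Return the child values of a dict or list, or nothing for scalars."""
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, list):
        return list(node)
    return []


def step(nodes, bracket, name):
    """Apply one path step to every node in the current result set."""
    out = []
    for node in nodes:
        if name == "*" or bracket == "*":          # wildcard
            out.extend(children(node))
        elif name is not None:                     # .name
            if isinstance(node, dict) and name in node:
                out.append(node[name])
        elif ":" in bracket:                       # [start:end]
            if isinstance(node, list):
                start, _, end = bracket.partition(":")
                lo = int(start) if start else None
                hi = int(end) if end else None
                out.extend(node[lo:hi])
        else:                                      # ['a', 'b'] or [0, 1]
            for part in bracket.split(","):
                part = part.strip()
                if not part:
                    continue
                if part[0] in "'\"":
                    key = part[1:-1]
                    if isinstance(node, dict) and key in node:
                        out.append(node[key])
                elif isinstance(node, list) and -len(node) <= int(part) < len(node):
                    out.append(node[int(part)])
    return out


def select(doc, path):
    """Evaluate a path from the supported subset and return the matches as a list."""
    if not path.startswith("$"):
        raise ValueError("a path must start with $")
    nodes, pos = [doc], 1
    while pos < len(path):
        match = TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported syntax: {path[pos:]!r}")
        nodes = step(nodes, match.group(1), match.group(2))
        pos = match.end()
    return nodes


doc = {"people": [
    {"firstName": "Big", "lastName": "Feige", "age": 35},
    {"firstName": "Big", "lastName": "data", "age": 32},
]}
print(select(doc, "$.people[*].lastName"))
print(select(doc, "$.people[0].age"))
print(select(doc, "$.people[:1]"))
```

Every step maps a list of current nodes to a new list of nodes, which is why a path always yields a list of matches rather than a single value.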
Functions
Functions can be called at the end of a path. The input to a function is the output of the path expression, and the function's output is determined by the function itself.
Function | Description | Output
min() | Returns the minimum value of an array of numbers | Double
max() | Returns the maximum value of an array of numbers | Double
avg() | Returns the average value of an array of numbers | Double
stddev() | Returns the standard deviation of an array of numbers | Double
length() | Returns the length of an array | Integer
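As a rough illustration of what these functions compute, the Python sketch below applies the same aggregates to a plain list of numbers. The sample values are made up, and using the population standard deviation for `stddev()` is an assumption; check your JSONPath library if the distinction matters.

```python
import statistics

# Hypothetical number array, such as the result of a path that selects ages.
numbers = [35, 32, 41, 28]

results = {
    "min": float(min(numbers)),              # min()
    "max": float(max(numbers)),              # max()
    "avg": statistics.fmean(numbers),        # avg()
    "stddev": statistics.pstdev(numbers),    # stddev(), population form assumed
    "length": len(numbers),                  # length()
}

for name, value in results.items():
    print(name, value)
```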
Filter operators
A filter is a logical expression used to filter an array.
Operator | Description
== | Left equals right (note that 1 is not equal to '1')
!= | Left is not equal to right
< | Left is less than right
<= | Left is less than or equal to right
> | Left is greater than right
>= | Left is greater than or equal to right
=~ | Left matches the regular expression, for example [?(@.name =~ /foo.*?/i)]
in | Left exists in right, for example [?(@.size in ['S', 'M'])]
nin | Left does not exist in right
size | The size of left (array or string) matches right
empty | Left (array or string) is empty
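The filter operators can be pictured as ordinary predicates over the elements of an array. The following Python sketch mirrors a few of them with list comprehensions; the `items` data is invented for illustration, and treating `=~` as a match against the whole string is an assumption about the library's behavior.

```python
import re

# Hypothetical array to filter.
items = [
    {"name": "foobar", "size": "S", "price": 1, "tags": ["a"]},
    {"name": "Food", "size": "L", "price": "1", "tags": ["b"]},
    {"name": "bar", "size": "M", "price": 2, "tags": []},
]

# [?(@.price == 1)]: the string "1" is not equal to the number 1.
price_is_one = [i["name"] for i in items if i["price"] == 1]

# [?(@.name =~ /foo.*?/i)]: case-insensitive regular expression match.
pattern = re.compile(r"foo.*?", re.IGNORECASE)
name_matches = [i["name"] for i in items if pattern.fullmatch(i["name"])]

# [?(@.size in ['S', 'M'])] and its opposite, nin.
size_in = [i["name"] for i in items if i["size"] in ("S", "M")]
size_nin = [i["name"] for i in items if i["size"] not in ("S", "M")]

# empty: the left side (an array or string) has no elements.
tags_empty = [i["name"] for i in items if len(i["tags"]) == 0]

print(price_is_one, name_matches, size_in, size_nin, tags_empty)
```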
JSONPath usage examples
Original JSON data
[{
  "id": "PRIMARY",
  "name": "Primary school",
  "front_id": "PRIMARY",
  "front_name": "Primary school"
}, {
  "id": "JUNIOR",
  "name": "Junior high school",
  "front_id": "JUNIOR",
  "front_name": "Junior high school"
}, {
  "id": "HIGH",
  "name": "high school",
  "front_id": "HIGH",
  "front_name": "high school"
}, {
  "id": "TECHNICAL",
  "name": "secondary specialized school / Technical school",
  "front_id": "TECHNICAL",
  "front_name": "secondary specialized school / Technical school"
}, {
  "id": "COLLEGE",
  "name": "junior college",
  "front_id": "COLLEGE",
  "front_name": "junior college"
}, {
  "id": "BACHELOR",
  "name": "Undergraduate",
  "front_id": "BACHELOR",
  "front_name": "Undergraduate"
}, {
  "id": "MASTER",
  "name": "master",
  "front_id": "MASTER",
  "front_name": "master"
}, {
  "id": "DOCTOR",
  "name": "Doctor",
  "front_id": "DOCTOR",
  "front_name": "Doctor"
}]
Getting data with JSONPath
JSONPath expression | Result
$.[*].name | The name of every education level
$.[*].id | Every id
$.[*] | All elements
$.[(@.length-2)].name | The name of the second-to-last element
$.[2] | The third element
$.[(@.length-1)] | The last element
$.[0,1] or $.[:2] | The first two elements
$.[?(@.name =~ /.*junior college/i)] | All elements whose name contains "junior college"
$.[*].length() | The number of elements
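For readers who think more easily in code, here is a hedged Python sketch of what each expression in the table selects, written as plain list operations rather than through a JSONPath library. The `levels` list is an abbreviated copy of the sample data that keeps only `id` and `name`.

```python
import re

# Abbreviated copy of the sample array (only id and name are kept).
levels = [
    {"id": "PRIMARY", "name": "Primary school"},
    {"id": "JUNIOR", "name": "Junior high school"},
    {"id": "HIGH", "name": "high school"},
    {"id": "TECHNICAL", "name": "secondary specialized school / Technical school"},
    {"id": "COLLEGE", "name": "junior college"},
    {"id": "BACHELOR", "name": "Undergraduate"},
    {"id": "MASTER", "name": "master"},
    {"id": "DOCTOR", "name": "Doctor"},
]

all_names = [level["name"] for level in levels]          # $.[*].name
all_ids = [level["id"] for level in levels]              # $.[*].id
second_to_last_name = levels[len(levels) - 2]["name"]    # $.[(@.length-2)].name
third = levels[2]                                        # $.[2]
last = levels[len(levels) - 1]                           # $.[(@.length-1)]
first_two = levels[:2]                                   # $.[0,1] or $.[:2]

# $.[?(@.name =~ /.*junior college/i)]
college = [level for level in levels
           if re.search("junior college", level["name"], re.IGNORECASE)]

count = len(levels)                                      # number of elements

print(second_to_last_name, third["id"], last["id"], count)
```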
This explanation of JSON and JSONPath lays the groundwork for the JSON input component we cover next. Let's get started!
Transformation
A transformation is the main part of an ETL solution. It handles the extraction, transformation, and loading operations on rows of data.
Create a transformation
All the ETL operations we want to perform are designed inside a transformation, so we need to create one first.
Save the transformation
Give the new transformation a name and save it.
JSON input
This component reads data from the specified JSON files.
a. File
1. The File tab specifies the source data file. Click the "Browse" button to pick a local JSON file, then click the "Add" button to add it to the "Selected files" list.
Option description
Option | Description
Source is defined in a field | Select to retrieve the source from a previously defined field. When selected, the following fields are available: Select field, Use field as file names, Read source as URL, Do not pass field downstream. When this option is cleared, the following fields are available: File or directory, Regular expression, Exclude regular expression, Selected files.
Select field | Specify the field name from a previous step to use as the source.
Use field as file names | Select to indicate that the source is a file name.
Read source as URL | Select to indicate that the source should be accessed as a URL.
Do not pass field downstream | Select to remove the source field from the output stream. This improves performance and memory utilization with large JSON fields.
File or directory | If the source is not defined in a field, specify the source location here. Click Browse (B) to navigate to the source file or directory. Click Add (A) to include the source in the selected files list.
Regular expression | Specify a regular expression to match file names within the specified directory.
Exclude regular expression | Specify a regular expression to exclude file names within the specified directory.
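The include and exclude regular expressions can be pictured with a short Python sketch. `select_files` is a hypothetical helper, not part of Kettle, and matching the pattern against the whole file name is an assumption made for this illustration.

```python
import re
import tempfile
from pathlib import Path


def select_files(directory, include, exclude=None):
    """Return the sorted file names in directory that match include but not exclude."""
    include_re = re.compile(include)
    exclude_re = re.compile(exclude) if exclude else None
    selected = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        if not include_re.fullmatch(path.name):
            continue
        if exclude_re and exclude_re.fullmatch(path.name):
            continue
        selected.append(path.name)
    return selected


# Build a throwaway directory with a few made-up files to try the filters on.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("bigdata.json", "orders.json", "orders_backup.json", "notes.txt"):
        (Path(tmp) / name).write_text("{}", encoding="utf-8")
    chosen = select_files(tmp, r".*\.json", r".*_backup\.json")

print(chosen)
```

Here the include pattern keeps every `.json` file and the exclude pattern then drops the backup copy.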
b. Content
Option description
Option | Description
Ignore empty file | Select to skip empty files. When cleared, an empty file causes the process to fail and stop.
Do not raise an error if no files | Select to continue when there are no files to process.
Ignore missing path | Select to continue processing files when an error occurs because (1) no fields match the JSON path or (2) all values are null. When cleared, no further rows are processed after such an error.
Default path leaf to null | Select to return a null value for missing paths.
Limit | Specify a limit on the number of records generated by this step. When set to 0, the results are unlimited.
Include filename in output | Select to add a string field with the file name to the result.
Rownum in output | Select to add an integer field with the row number to the result.
Add filenames to result | Select to add processed files to the result file list.
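A few of these options are easier to grasp in code. The Python sketch below imitates the row limit, the extra file name and row number fields, and the null default for missing paths. `read_rows` and its parameters are hypothetical names, and the behavior is a simplified reading of the option descriptions, not Kettle's actual logic.

```python
def read_rows(records, keys, filename=None, limit=0, rownum=False,
              default_leaf_to_null=True):
    """Turn parsed JSON records into rows, imitating a few Content tab options."""
    rows = []
    for number, record in enumerate(records, start=1):
        if limit and number > limit:        # Limit: 0 means unlimited
            break
        row = {}
        for key in keys:
            if key in record:
                row[key] = record[key]
            elif default_leaf_to_null:      # missing path becomes null
                row[key] = None
            else:
                raise KeyError(f"missing path: {key}")
        if filename is not None:            # Include filename in output
            row["filename"] = filename
        if rownum:                          # Rownum in output
            row["rownum"] = number
        rows.append(row)
    return rows


# Made-up records; the second one lacks the "name" key.
records = [
    {"id": "PRIMARY", "name": "Primary school"},
    {"id": "JUNIOR"},
    {"id": "HIGH", "name": "high school"},
]

rows = read_rows(records, ["id", "name"], filename="bigdata.json",
                 limit=2, rownum=True)
for row in rows:
    print(row)
```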
c. Fields
Option description
Option | Description
Name | The name of the field that maps to the corresponding field in the JSON input stream.
Path | The complete path of the field name in the JSON input stream. All records can be retrieved by adding an asterisk (*) to the path.
Type | The data type of the input field.
Format | An optional mask for converting the format of the original field. The common date and number format masks can be used here.
Length | The length of the field.
Precision | The number of floating-point digits for number-type fields.
Currency | The currency symbol (for example $ or €).
Decimal | The decimal symbol can be a . (for example 5,000.00) or a , (for example 5.000,00).
Group | The grouping symbol can be a , (for example 10,000.00) or a . (for example 5.000,00).
Trim type | The trimming method applied to the string.
Repeat | If a row is empty, repeat the corresponding value from the last row.
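The Decimal and Group settings decide how a piece of text such as 5.000,00 becomes a number. Here is a minimal Python sketch of that idea; `parse_number` is a hypothetical helper and ignores currency symbols and format masks.

```python
from decimal import Decimal


def parse_number(text, decimal_symbol=".", grouping_symbol=","):
    """Parse a number written with the given decimal and grouping symbols."""
    if decimal_symbol == grouping_symbol:
        raise ValueError("decimal and grouping symbols must differ")
    # Drop the grouping symbol first, then normalise the decimal symbol to ".".
    cleaned = text.strip().replace(grouping_symbol, "").replace(decimal_symbol, ".")
    return Decimal(cleaned)


english_style = parse_number("5,000.00")                  # . decimal, , grouping
european_style = parse_number("5.000,00", decimal_symbol=",", grouping_symbol=".")
print(english_style, european_style)
```

Both calls produce the same value, which is the point: the same text can mean different numbers unless the two symbols are configured to match the source data.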
d. Additional output fields
Option description
Option | Description
Short filename field | Specify the field that contains the file name without path information but with the extension.
Extension field | Specify the field that contains the file name extension.
Path field | Specify the field that contains the path in operating system format.
Size field | Specify the field that contains the size of the data.
Is hidden field | Specify the field that indicates whether the file is hidden (Boolean).
Last modification field | Specify the field that contains the date and time the file was last modified.
Uri field | Specify the field that contains the URI.
Root uri field | Specify the field that contains only the root part of the URI.
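To give a feel for what these fields carry, the Python sketch below derives similar values for a file with `pathlib`. `file_fields` is a hypothetical helper; the dot-prefix test for hidden files is only the Unix convention and is an assumption here, and the root URI is left out.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def file_fields(path):
    """Collect metadata similar to the additional output fields for one file."""
    path = Path(path).resolve()
    info = path.stat()
    return {
        "short_filename": path.name,                   # name with extension, no path
        "extension": path.suffix.lstrip("."),
        "path": str(path.parent),                      # operating system format
        "size": info.st_size,                          # bytes
        "is_hidden": path.name.startswith("."),        # Unix convention only
        "last_modified": datetime.fromtimestamp(info.st_mtime),
        "uri": path.as_uri(),
    }


# Create a tiny made-up file so the sketch has something to inspect.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "bigdata.json"
    source.write_text("{}", encoding="utf-8")
    fields = file_fields(source)

for name, value in fields.items():
    print(name, value)
```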
Okay, I have explained each tab of the JSON input component as fully as I could. In day-to-day work you will not use all of these options; only a few are common. But while learning, it is better to be thorough, so I hope you take the time to get a general understanding of them. Now let's work through an example, which is a better way to absorb and understand the material.
Hands-on demonstration
a. Create a JSON file
On my D: drive, I created a JSON file named bigdata.
b. Create a transformation
c. Configure the JSON input component
Set the fields (refer to the JSONPath expressions above).
d. Preview the records
Brothers, once you see the data preview window, you have successfully used the JSON input component to read a JSON file from your disk. Congratulations, you now know how to use the JSON input component.
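The demonstration above can be approximated outside Kettle with a few lines of Python: write a JSON file, read it back, and turn each configured field into a column of the output rows. The file content and the `fields` mapping are made up, since the actual bigdata file is not reproduced in this article, and a temporary directory stands in for the D: drive.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical file content; the real bigdata file is not shown in the article.
records = [
    {"id": "PRIMARY", "name": "Primary school"},
    {"id": "JUNIOR", "name": "Junior high school"},
]

# Output field name mapped to the key it reads, like Name and Path on the Fields tab.
fields = {"id": "id", "name": "name"}

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "bigdata.json"
    source.write_text(json.dumps(records), encoding="utf-8")

    data = json.loads(source.read_text(encoding="utf-8"))

    # Each field path yields one column; rows are built by position.
    columns = {name: [item.get(key) for item in data] for name, key in fields.items()}
    rows = list(zip(*columns.values()))

for row in rows:
    print(row)
```

The printed rows play the role of the preview window: one row per array element, one column per configured field.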
Conclusion
This article covered JSON and JSONPath, walked through the detailed settings of the JSON input component, and finished with a hands-on demonstration of reading a JSON file from disk.
Brothers, there is a gap between thinking and doing. What you only think about fades away, but what you actually do sticks.
Enough said. Brothers, follow along with me, and we will keep breaking things down piece by piece. Even better content is coming soon. Thank you for following!
Copyright notice
This article was written by [Feige big data]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/04/202204230757551560.html