Description

A Simple Document Format: JSON serialization library

Package Information

Version: 0.1.3 (2017-Mar-21)
Repository: https://github.com/tamediadigital/asdf
License: BSL-1.0
Copyright: Tamedia Digital, 2016
Authors: Ilya Yaroshenko, Yannick Koechlin
Registered by: Ilya Yaroshenko
Dependencies: none

Installation

To use this package, add asdf to the dependencies section of your project's dub.json or dub.sdl.

Readme


A Simple Document Format

ASDF is a cache-oriented, string-based JSON representation. It is also a convenient JSON library for D that gets out of your way. ASDF is specially geared towards transforming high volumes of JSON dataframes, either into new JSON objects or into custom data types.

❗️: Currently all ASDF method names and all UDAs are in DRAFT state; we might want to make them simpler. Please submit an issue if you have input.

❗️: When using the filter method, invalid records are silently ignored by default. This is a feature, because throwing an exception may be too expensive.

Why ASDF?
  • ASDF is fast. It can be really helpful if you have gigabytes of JSON line separated values.
  • ASDF is simple. It uses D's modelling power to make you write less boilerplate code.
  • ASDF is tested and used in production on real-world JSON generated by millions of web clients (we call it the great fuzzer).

See also github.com/tamediadigital/je, a tool for fast extraction of JSON properties into CSV/TSV.

Simple Example
  1. define your struct
  2. call serializeToJson (or serializeToJsonPretty for pretty printing!)
  3. profit!
import asdf;

struct Simple
{
	string name;
	ulong level;
}

void main()
{
	auto o = Simple("asdf", 42);
	string data = `{"name":"asdf","level":42}`;
	assert(o.serializeToJson() == data);
	assert(data.deserialize!Simple == o);
}
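The same two calls should also cover nested structs and arrays. The following is a minimal sketch of a round trip; the Point and Shape types are hypothetical and exist only for illustration.

```d
import asdf;

// Hypothetical types, used only for this illustration.
struct Point
{
	double x;
	double y;
}

struct Shape
{
	string name;
	Point[] points;
}

void main()
{
	auto shape = Shape("segment", [Point(0, 0), Point(1, 2)]);

	// Serialize to a compact JSON string, then parse it back.
	string json = shape.serializeToJson();
	auto restored = json.deserialize!Shape;

	// The round trip is expected to preserve every public field.
	assert(restored == shape);
}
```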
Documentation

See ASDF API and Specification.

I/O Speed
  • Reading JSON line separated values and parsing them to ASDF - 300+ MB per second (SSD).
  • Writing ASDF range to JSON line separated values - 300+ MB per second (SSD).
Fast setup with the dub package manager


Dub is D's package manager. You can create a new project with:

dub init <project-name>

Now edit dub.json: add asdf as a dependency and set the project's targetType to executable.

{
	...
	"dependencies": {
		"asdf": "~><current-version>"
	},
	"targetType": "executable",
	"dflags-ldc": ["-mcpu=native"]
}

Now you can create a main file in the source directory and run your code with

dub

Flags --build=release and --compiler=ldmd2 can be added for a performance boost:

dub --build=release --compiler=ldmd2

ldmd2 is a wrapper around LDC (LLVM D Compiler) that accepts DMD-style command-line flags. "dflags-ldc": ["-mcpu=native"] allows LDC to optimize ASDF for your CPU.

Instead of using -mcpu=native, you may enable additional instruction sets for a target with -mattr, for example -mattr=+sse4.2. ASDF has specialized code for SSE4.2.

Compatibility
  • LDC (LLVM D Compiler) >= 1.1.0-beta2 (recommended compiler).
  • DMD (reference D compiler) >= 2.072.1.
Main transformation functions

| UDA | Effect |
| ------------- |:-------------:|
| @serializationKeys("bar_common", "bar") | Tries to read the data from either property; writes it to the first one. |
| @serializationKeysIn("a", "b") | Tries to read the data from a, then b; the last one occurring in the JSON wins. |
| @serializationKeyOut("a") | Writes the value to a. |
| @serializationMultiKeysIn(["a", "b", "c"]) | Tries to get the data from a sub-object. Performance is not yet optimal if you use more than one serializationMultiKeysIn in an object. |
| @serializationIgnore | Ignores this property completely. |
| @serializationIgnoreIn | Does not read this property. |
| @serializationIgnoreOut | Does not write this property. |
| @serializationScoped | Dangerous! Non-allocating strings: the data can vanish if the underlying buffer is removed. |
| @serializedAs!string | Calls to!string. |
| @serializationTransformIn!fin | Calls function fin to transform the data. |
| @serializationTransformOut!fout | Runs function fout on serialization (different notation). |
| @serializationFlexible | Is flexible about the data type when reading, e.g. reads longs that are wrapped as strings. |

Please also look into the docs or unittests for concrete examples!
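As one illustration of the table above, here is a small sketch of @serializationTransformIn. It assumes the UDA accepts a string lambda in the style of std.functional.unaryFun, where the parsed value is named a; the Counter type is hypothetical. Check the API documentation for the forms supported by your version.

```d
import asdf;

// Hypothetical type, used only for this illustration.
struct Counter
{
	// Assumption: the string lambda receives the parsed value as `a`.
	@serializationTransformIn!"a += 2"
	int value;
}

void main()
{
	// The transformation runs after the field has been read from JSON.
	auto c = `{"value":3}`.deserialize!Counter;
	assert(c.value == 5);
}
```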

ASDF Example (incomplete)
import std.algorithm;
import std.stdio;
import asdf;

void main()
{
	auto target = Asdf("red");
	File("input.jsonl")
		// Use at least 4096 bytes for real-world apps
		.byChunk(4096)
		// 32 is the minimal value for the internal buffer. The buffer can be reallocated to get more memory.
		.parseJsonByLine(4096)
		.filter!(object => object
			// opIndex accepts array of keys: {"key0": {"key1": { ... {"keyN-1": <value>}... }}}
			["colors"]
			// iterates over an array
			.byElement
			// Comparison with an ASDF value is a little bit faster
			//   than comparison with a string.
			.canFind(target))
			//.canFind("red"))
		// Formatting uses internal buffer to reduce system delegate and system function calls
		.each!writeln;
}
Input

One object per line. The 4th and 5th lines are a single object broken across two lines, and the 7th line is malformed; neither appears in the output.

null
{"colors": ["red"]}
{"a":"b", "colors": [4, "red", "string"]}
{"colors":["red"],
	"comment" : "this is broken (multiline) object"}
{"colors": "green"}
{"colors": "red"]}}
[]
Output
{"colors":["red"]}
{"a":"b","colors":[4,"red","string"]}
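The key lookup and element iteration used in the filter above can also be tried on an in-memory document, without an input file. This is a minimal sketch that assumes parseJson accepts a complete JSON document as a string; the sample document is made up for illustration.

```d
import std.algorithm : canFind;
import asdf;

void main()
{
	// Assumption: parseJson accepts a whole JSON document as a string.
	auto doc = `{"colors":["red","green"],"meta":{"id":7}}`.parseJson;

	// Several keys in one opIndex call walk into nested objects.
	assert(doc["meta", "id"].get(0.0) == 7.0);

	// byElement iterates over the array; elements are compared as ASDF values.
	assert(doc["colors"].byElement.canFind(Asdf("red")));
	assert(!doc["colors"].byElement.canFind(Asdf("blue")));
}
```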
JSON and ASDF Serialization Examples
Simple struct or object
struct S
{
	string a;
	long b;
	private int c; // private fields are ignored
	package int d; // package fields are ignored
	// all other fields in JSON are ignored
}
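A short sketch of what these rules should mean in practice, using a reduced version of the struct above. The "extra" key in the input is made up for illustration.

```d
import asdf;

struct S
{
	string a;
	long b;
	private int c; // not serialized
}

void main()
{
	// Unknown keys ("extra") and non-public fields ("c") are skipped on input.
	auto s = `{"a":"x","b":1,"c":9,"extra":true}`.deserialize!S;
	assert(s.a == "x" && s.b == 1 && s.c == 0);

	// Only public fields appear in the output.
	assert(S("x", 1, 5).serializeToJson() == `{"a":"x","b":1}`);
}
```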
Selection
struct S
{
	// ignored
	@serializationIgnore int temp;
	
	// written to JSON, but never read from it
	@serializationIgnoreIn int a;
	
	// read from JSON, but never written to it
	@serializationIgnoreOut int b;
}
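The effect of the three attributes can be checked with a round trip. This is a sketch under the semantics described in the comments above, not an excerpt from the library's tests.

```d
import asdf;

struct S
{
	@serializationIgnore int temp;
	@serializationIgnoreIn int a;
	@serializationIgnoreOut int b;
}

void main()
{
	// On input only "b" is read; "temp" and "a" keep their default values.
	auto s = `{"temp":1,"a":2,"b":3}`.deserialize!S;
	assert(s == S(0, 0, 3));

	// On output only "a" is written.
	assert(S(1, 2, 3).serializeToJson() == `{"a":2}`);
}
```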
Key overriding
struct S
{
	// key is overridden to "aaa"
	@serializationKeys("aaa") int a;

	// accepts either of several keys when parsing
	@serializationKeysIn("b", "_b")
	// overrides the key used when generating JSON
	@serializationKeyOut("_b_")
	int b;
}
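A sketch of how the overridden keys should behave in both directions, based on the descriptions in the UDA table:

```d
import asdf;

struct S
{
	@serializationKeys("aaa") int a;

	@serializationKeysIn("b", "_b")
	@serializationKeyOut("_b_")
	int b;
}

void main()
{
	// Either input key is accepted for field b.
	assert(`{"aaa":1,"b":2}`.deserialize!S == S(1, 2));
	assert(`{"aaa":1,"_b":2}`.deserialize!S == S(1, 2));

	// Output uses the overridden keys.
	assert(S(1, 2).serializeToJson() == `{"aaa":1,"_b_":2}`);
}
```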
User-Defined Serialization
import std.container : DList;
import std.datetime : DateTime;
import asdf;

struct DateTimeProxy
{
	DateTime datetime;
	alias datetime this;

	static DateTimeProxy deserialize(Asdf data)
	{
		string val;
		deserializeScopedString(data, val);
		return DateTimeProxy(DateTime.fromISOString(val));
	}

	void serialize(S)(ref S serializer)
	{
		serializer.putValue(datetime.toISOString);
	}
}
// Serialize a doubly linked list as a JSON array
struct SomeDoublyLinkedList
{
	@serializationIgnore DList!(SomeArr[]) myDll;
	alias myDll this;

	// A plain function this time, not a template
	void serialize(ref AsdfSerializer serializer)
	{
		auto state = serializer.arrayBegin();
		foreach (ref elem; myDll)
		{
			serializer.elemBegin;
			serializer.serializeValue(elem);
		}
		serializer.arrayEnd(state);
	}
}
Serialization Proxy
struct S
{
	@serializedAs!DateTimeProxy DateTime time;
}
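Putting the proxy and the annotated struct together, a round trip might look like the sketch below. DateTimeProxy is repeated from the section above so that the example is self-contained; the date is arbitrary.

```d
import std.datetime : DateTime;
import asdf;

// Repeated from the User-Defined Serialization section.
struct DateTimeProxy
{
	DateTime datetime;
	alias datetime this;

	static DateTimeProxy deserialize(Asdf data)
	{
		string val;
		deserializeScopedString(data, val);
		return DateTimeProxy(DateTime.fromISOString(val));
	}

	void serialize(S)(ref S serializer)
	{
		serializer.putValue(datetime.toISOString);
	}
}

struct S
{
	@serializedAs!DateTimeProxy DateTime time;
}

void main()
{
	auto s = S(DateTime(2016, 3, 4));

	// The field is written through the proxy as an ISO string.
	assert(s.serializeToJson() == `{"time":"20160304T000000"}`);

	// Reading goes through DateTimeProxy.deserialize.
	assert(`{"time":"20160304T000000"}`.deserialize!S == s);
}
```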
Finalizer

If you need to do additional calculations or ETL transformations that depend on the deserialized data, use the finalizeDeserialization method.

struct S
{
	string a;
	int b;

	@serializationIgnoreIn double sum;

	void finalizeDeserialization(Asdf data)
	{
		auto r = data["c", "d"];
		// local names differ from the fields a and b to avoid shadowing them
		auto e = r["e"].get(0.0);
		auto g = r["g"].get(0.0);
		sum = e + g;
	}
}
assert(`{"a":"bar","b":3,"c":{"d":{"e":6,"g":7}}}`.deserialize!S == S("bar", 3, 13));
serializationFlexible
static struct S
{
	@serializationFlexible uint a;
}

assert(`{"a":"100"}`.deserialize!S.a == 100);
assert(`{"a":true}`.deserialize!S.a == 1);
assert(`{"a":null}`.deserialize!S.a == 0);

Available versions

0.1.3 0.1.2 0.1.1 0.1.0-alpha3 0.1.0-alpha2 0.1.0-alpha1 0.0.11 0.0.10 0.0.9 0.0.8 0.0.7 0.0.6 0.0.5 0.0.4 0.0.3 0.0.2 0.0.1 ~master ~ranges ~ldc