asdf 0.1.4-beta0

Fast, Expressive, and Easy to use JSON Serialization Library with optional SSE4 Optimization.



A Simple Document Format

ASDF is a cache-oriented, string-based JSON representation. It is also a convenient JSON library for D that gets out of your way. ASDF is specially geared towards transforming high volumes of JSON dataframes, either to new JSON objects or to custom data types.

❗️: Currently all ASDF method names and all UDAs are in DRAFT state; we might want to make them simpler. Please submit an issue if you have input.

❗️: When using the filter method, invalid records are silently ignored by default. This is a feature, because throwing an exception may be too expensive.

Why ASDF?
  • ASDF is fast. It can be really helpful if you have gigabytes of line-separated JSON values.
  • ASDF is simple. It uses D's modelling power to make you write less boilerplate code.
  • ASDF is tested and used in production for real-world JSON generated by millions of web clients (we call it the great fuzzer).

See also github.com/tamediadigital/je, a tool for fast extraction of JSON properties into CSV/TSV.

Simple Example
  1. define your struct
  2. call serializeToJson (or serializeToJsonPretty for pretty printing!)
  3. profit!
import asdf;

struct Simple
{
	string name;
	ulong level;
}

void main()
{
	auto o = Simple("asdf", 42);
	string data = `{"name":"asdf","level":42}`;
	assert(o.serializeToJson() == data);
	assert(data.deserialize!Simple == o);
}
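As a small extension of the example above, the following sketch round-trips a value through both serializeToJson and serializeToJsonPretty. The struct name Player is illustrative only, and the sketch assumes that deserialize accepts whitespace-formatted JSON. The exact indentation emitted by the pretty printer is deliberately not asserted.

```d
import asdf;

// Illustrative struct, not part of the library.
struct Player
{
	string name;
	ulong level;
}

void main()
{
	auto p = Player("asdf", 42);

	// Compact form, as in the example above.
	string compact = p.serializeToJson();
	// Human-readable form of the same value.
	string pretty = p.serializeToJsonPretty();

	// Both representations should deserialize back to the same value.
	assert(compact.deserialize!Player == p);
	assert(pretty.deserialize!Player == p);
}
```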
Documentation

See ASDF API and Specification.

I/O Speed
  • Reading line-separated JSON values and parsing them to ASDF: 300+ MB per second (SSD).
  • Writing an ASDF range to line-separated JSON values: 300+ MB per second (SSD).
Fast setup with the dub package manager

Dub is D's package manager. You can create a new project with:

dub init <project-name>

Now edit dub.json: add asdf as a dependency and set targetType to executable.

{
	...
	"dependencies": {
		"asdf": "~><current-version>"
	},
	"targetType": "executable",
	"dflags-ldc": ["-mcpu=native"]
}

Now you can create a main file in the source directory and run your code with:

dub

Flags --build=release and --compiler=ldmd2 can be added for a performance boost:

dub --build=release --compiler=ldmd2

ldmd2 is a wrapper around LDC (LLVM D Compiler). "dflags-ldc": ["-mcpu=native"] allows LDC to optimize ASDF for your CPU.

Instead of using -mcpu=native, you may specify an additional instruction set for the target with -mattr, for example -mattr=+sse4.2. ASDF has specialized code for SSE4.2.

Compatibility
  • LDC (LLVM D Compiler) >= 1.1.0-beta2 (recommended compiler).
  • DMD (reference D compiler) >= 2.072.1.
Main transformation functions
UDA                                         Function
@serializationKeys("bar_common", "bar")     tries to read the data from either property; saves it to the first one
@serializationKeysIn("a", "b")              tries to read the data from a, then b; the last one occurring in the JSON wins
@serializationKeyOut("a")                   writes it to a
@serializationMultiKeysIn(["a", "b", "c"])  tries to get the data from a sub-object; performance is not yet optimal if more than one serializationMultiKeysIn is used in an object
@serializationIgnore                        ignore this property completely
@serializationIgnoreIn                      don't read this property
@serializationIgnoreOut                     don't write this property
@serializationScoped                        Dangerous! Non-allocating strings: the data can vanish if the underlying buffer is removed
@serializedAs!string                        call to!string
@serializationTransformIn!fin               call function fin to transform the data
@serializationTransformOut!fout             run function fout on serialization, different notation
@serializationFlexible                      be flexible about the data type on reading, e.g. read longs that are wrapped as strings

Please also look into the Docs or Unittest for concrete examples!

ASDF Example (incomplete)
import std.algorithm;
import std.stdio;
import asdf;

void main()
{
	auto target = Asdf("red");
	File("input.jsonl")
		// Use at least 4096 bytes for real-world apps
		.byChunk(4096)
		// 32 is the minimal value for the internal buffer. The buffer can be reallocated to get more memory.
		.parseJsonByLine(4096)
		.filter!(object => object
			// opIndex accepts array of keys: {"key0": {"key1": { ... {"keyN-1": <value>}... }}}
			["colors"]
			// iterates over an array
			.byElement
			// Comparison with ASDF is a little bit faster
			//   than comparison with a string.
			.canFind(target))
			//.canFind("red"))
		// Formatting uses internal buffer to reduce system delegate and system function calls
		.each!writeln;
}
Input

Single object per line. The 4th and 5th lines are broken: they form one object split across two lines. The 7th line is malformed as well.

null
{"colors": ["red"]}
{"a":"b", "colors": [4, "red", "string"]}
{"colors":["red"],
	"comment" : "this is broken (multiline) object"}
{"colors": "green"}
{"colors": "red"]}}
[]
Output
{"colors":["red"]}
{"a":"b","colors":[4,"red","string"]}
JSON and ASDF Serialization Examples
Simple struct or object
struct S
{
	string a;
	long b;
	private int c; // private fields are ignored
	package int d; // package fields are ignored
	// all other fields in JSON are ignored
}
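A minimal sketch of the rules stated in the comments above, using an illustrative struct named Point: unknown JSON keys are skipped on input, and the private field is left out of the output.

```d
import asdf;

// Illustrative struct, not part of the library.
struct Point
{
	string label;
	long x;
	private int cache; // private, so asdf neither reads nor writes it
}

void main()
{
	// The "unknown" key has no matching field and is ignored.
	auto p = `{"label":"a","x":1,"unknown":[1,2,3]}`.deserialize!Point;
	assert(p.label == "a");
	assert(p.x == 1);

	// Only the public fields appear in the generated JSON.
	assert(Point("a", 1, 7).serializeToJson() == `{"label":"a","x":1}`);
}
```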
Selection
struct S
{
	// ignored
	@serializationIgnore int temp;
	
	// can be formatted to JSON
	@serializationIgnoreIn int a;
	
	// can be parsed from JSON
	@serializationIgnoreOut int b;
}
Key overriding
struct S
{
	// key is overridden to "aaa"
	@serializationKeys("aaa") int a;

	// accepts multiple keys for parsing
	@serializationKeysIn("b", "_b")
	// overrides the key for generation
	@serializationKeyOut("_b_")
	int b;
}
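The key attributes above can be exercised end to end. In this sketch the struct Event and its key names are illustrative assumptions; it only relies on the UDAs already shown.

```d
import asdf;

// Illustrative struct, not part of the library.
struct Event
{
	// Accept either key on input, always write "user" on output.
	@serializationKeysIn("user", "username")
	@serializationKeyOut("user")
	string user;
}

void main()
{
	auto a = `{"user":"bob"}`.deserialize!Event;
	auto b = `{"username":"bob"}`.deserialize!Event;

	// Both spellings of the key produce the same value.
	assert(a == b);

	// The output key is fixed by @serializationKeyOut.
	assert(b.serializeToJson() == `{"user":"bob"}`);
}
```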
User-Defined Serialization
import std.datetime : DateTime;
import asdf;

struct DateTimeProxy
{
	DateTime datetime;
	alias datetime this;

	static DateTimeProxy deserialize(Asdf data)
	{
		string val;
		deserializeScopedString(data, val);
		return DateTimeProxy(DateTime.fromISOString(val));
	}

	void serialize(S)(ref S serializer)
	{
		serializer.putValue(datetime.toISOString);
	}
}

import std.container : DList;

// Serialize a doubly linked list into an array.
// SomeArr is not defined in this snippet; it stands for a user-defined element type.
struct SomeDoublyLinkedList
{
	@serializationIgnore DList!(SomeArr[]) myDll;
	alias myDll this;

	// No template but a function this time!
	void serialize(ref AsdfSerializer serializer)
	{
		auto state = serializer.arrayBegin();
		foreach (ref elem; myDll)
		{
			serializer.elemBegin;
			serializer.serializeValue(elem);
		}
		serializer.arrayEnd(state);
	}
}
Serialization Proxy
struct S
{
	@serializedAs!DateTimeProxy DateTime time;
}
Finalizer

If you need to do additional calculations or ETL transformations that depend on the deserialized data, use the finalizeDeserialization method.

struct S
{
	string a;
	int b;

	@serializationIgnoreIn double sum;

	void finalizeDeserialization(Asdf data)
	{
		auto r = data["c", "d"];
		auto a = r["e"].get(0.0);
		auto b = r["g"].get(0.0);
		sum = a + b;
	}
}
assert(`{"a":"bar","b":3,"c":{"d":{"e":6,"g":7}}}`.deserialize!S == S("bar", 3, 13));
serializationFlexible
static struct S
{
	@serializationFlexible uint a;
}

assert(`{"a":"100"}`.deserialize!S.a == 100);
assert(`{"a":true}`.deserialize!S.a == 1);
assert(`{"a":null}`.deserialize!S.a == 0);
Authors:
  • Ilya Yaroshenko
  • Yannick Koechlin
Dependencies:
none
Versions:
0.7.17 2023-Feb-07
0.7.16 2023-Feb-01
0.7.15 2022-Jun-02
0.7.14 2022-Mar-24
0.7.13 2021-Nov-09