Saturday, March 22, 2014

Scala lifted and direct embedding and large column count

In the latest version of Slick (2.0), there are two separate APIs named lifted embedding and direct embedding. Lifted embedding is the most stable and is the one recommended for production use, but we will look at both of them here. Lifted embedding is explained in the Slick manual as:

The name Lifted Embedding refers to the fact that you are not working with standard Scala types but with types that are lifted into the scala.slick.lifted.Rep type constructor.

Direct embedding exists as an alternative to lifted embedding, but it is clearly still an experiment in Slick and will be developed further in coming versions. A database table is declared like this, with the lifted syntax first and then the direct embedding syntax:
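The original listings are not reproduced here, so the following is a sketch of what the two declarations could look like in Slick 2.0; the table name COFFEES is an assumption, the column names follow the text below.

```scala
import scala.slick.driver.H2Driver.simple._

// Lifted embedding: columns are lifted Rep values, the row type here is a tuple.
class Coffees(tag: Tag) extends Table[(String, Double)](tag, "COFFEES") {
  def name  = column[String]("NAME")
  def price = column[Double]("PRICE")
  def *     = (name, price)
}
val coffees = TableQuery[Coffees]
```

```scala
import scala.slick.direct.AnnotationMapper._

// Direct embedding: a plain case class annotated with the table/column mapping.
@table(name = "COFFEES")
case class Coffee(
  @column(name = "NAME")  name:  String,
  @column(name = "PRICE") price: Double
)
```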

Both are self-explanatory: a table with the columns NAME and PRICE is declared. In the case of lifted embedding, all queries are lifted by an implicit conversion to a Rep type, in this case Rep[String] and Rep[Double]. For example, a query like this

val q = coffees.filter(_.price > 8.0).map(_.name)

lifts the price column, the literal 8.0, and the name column into Rep[..] types. This is usually transparent when working with Slick, but it can give some confusing compiler errors if you are not aware of it. Queries in direct mode are written a bit differently, so let's take a look at them. There are two factory objects to execute queries against, Queryable and ImplicitQueryable. They both support a few familiar collection methods, but many are missing; the ones supported are drop, filter, flatMap, length, map and take. To use either one a SlickBackend must be created, but with ImplicitQueryable the backend and session objects only need to be supplied once, when the query is created.

  db withDynSession {

    import scala.slick.direct.{SlickBackend, AnnotationMapper}
    val backend = new SlickBackend(scala.slick.driver.H2Driver, AnnotationMapper)
   
    val q1 = Queryable[Coffee]
    val q2 = q1.filter(_.price > 3.0).map(_.name)
    println(backend.result(q2.length, session) + ": " + backend.result(q2, session))

    val iq1 = ImplicitQueryable(Queryable[Coffee], backend, session)
    val iq2 = iq1.filter(_.price > 3.0)
    println(iq2.length + ": " + iq2.map(_.name).toSeq)

  }

If you plan to use direct embedding, access to the Scala compiler is required at run-time, so another dependency must be declared, for example like this in sbt:

libraryDependencies <+= (scalaVersion)("org.scala-lang" % "scala-compiler" % _)

With direct embedding there are some limitations in the 2.0 release. For example, type-safe database inserts are not supported, something that is easily done with lifted embedding. Furthermore, direct embedding only supports the primitives String, Int, and Double in its column mapping.
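For comparison, a type-safe insert with lifted embedding is a one-liner. This is a sketch, assuming a lifted `coffees` TableQuery with (String, Double) rows and an implicit Session in scope:

```scala
// Column types are checked at compile time against the table's row type.
coffees += ("Colombian", 7.99)

// Batch insert works the same way.
coffees ++= Seq(("Espresso", 9.99), ("Decaf", 5.99))
```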

I've used lifted embedding so far and don't yet see a reason to use direct embedding. There are some obvious limitations to its usage, but it will be interesting to follow how this alternative API develops in future versions of Slick.

Another well-known limitation of Slick is that the code-generator, when used in a database-first scenario, produces a different implementation for larger tables with more than 22 columns. I first considered putting this in a separate post, but since I'm on the topic of Slick I might as well continue. The reason for the special handling of tables with more than 22 columns is that Scala only supports tuples with 22 elements or fewer. Let's start by using the code-generator on a table with 23 columns and see what happens. I'm still using an H2 database and have prepared a table with 23 columns of type Int.

  val slickDriver = "scala.slick.driver.H2Driver"
  val jdbcDriver = "org.h2.Driver"
  val url = "jdbc:h2:test"
  val outputFolder = "."
  val pkg = ""
  scala.slick.model.codegen.SourceCodeGenerator.main(
    Array(slickDriver, jdbcDriver, url, outputFolder, pkg))

This will create a file Tables.scala. The interesting rows are these

  implicit def GetResultTable23Row(implicit e0: GR[Int]): GR[Table23Row] = GR{
    prs => import prs._
    Table23Row.tupled((<<[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int]))
  }
    def * = (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c22, c23) <> (Table23Row.tupled, Table23Row.unapply)


The documentation also has an explicit warning about long compilation times for tables with more than 25 columns, so let's do a quick experiment with tables of 2, 23, and 200 columns and measure the time. I'll start with code-generation to see if there are performance effects. This code snippet is used to time the code generations.
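The original snippet is not preserved here, so the following is a minimal sketch of how the timing could be done; the helper name and the averaging over a fixed number of runs are my own assumptions.

```scala
// Run a block of code `runs` times and return the average
// wall-clock time per run, in seconds.
def time[A](runs: Int)(block: => A): Double = {
  val start = System.nanoTime()
  (1 to runs).foreach(_ => block)
  (System.nanoTime() - start) / 1e9 / runs
}

// Example usage, timing the code generation shown above:
// val avg = time(10) {
//   scala.slick.model.codegen.SourceCodeGenerator.main(
//     Array(slickDriver, jdbcDriver, url, outputFolder, pkg))
// }
```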



The average times on my test computer are (in seconds) 0.0871, 0.0623 and 0.1222. The largest table runs longer by roughly a factor of 2, but this is expected since the generated file is larger. The respective file sizes are 2, 5 and 27 KB. Considering that the file size grows by more than a factor of 10 (2 to 27 KB) while the execution time only doubles, the conclusion is that large tables do not severely impact the code-generation.
As a next step, let's look at the compilation time. I create three new projects, called project2, project23 and project200, and add the generated code file for the respective table size.

The project with a code file containing a table with 2 columns compiled in 2 seconds; the project with a single file containing a table of 23 columns compiled in 18 seconds. The project with the large 200-column table did not compile at all, and I have yet to find a workaround for this.

Summary

The new Slick framework from Typesafe has two separate APIs for database queries: lifted embedding and direct embedding. Since the features of direct embedding are still limited, I don't see any reason yet to pick direct embedding over lifted embedding. The authors of Slick warn about slow compile times for larger table dimensions. In my test, a single project containing a small table declaration was considerably faster to compile than a project containing a medium-sized table declaration. My test using a large table (200 columns) caused the compiler to fail, and the project didn't even load in an IDE like Eclipse. Slick is an interesting tool in the ORM/FRM (Functional Relational Mapper) landscape, but it has limitations that are important to be aware of.

Monday, March 17, 2014

A view of the Mastercoin system

This is an attempt to get a bird's-eye view of the Mastercoin system. By collecting all addresses and, for each address, building up a list of all outgoing transactions, I have been able to build a few graphs which I will show here. To render these graphs the Excel template NodeXL was used, and I will post a link to the Excel file at the end if anyone wants to play with this further. In March 2014 there were roughly 2700 unique Mastercoin addresses, out of which 2288 had a positive balance in MSC. In all graphs a vertex (circle) is an address, and an edge (arrow) is a transaction from one address to another.


The first graph shows a raw overview containing all collected data (click for a larger image). As you can see the graph is directed in the direction of sent payments.


As expected there are many outlier vertices with only one edge pointing to them, i.e. addresses that have only received payments from a single address. There are also vertices which appear to be more central, connected by a large number of edges. If you look closely you will see a few arrows that are perfect circles; such a loop is an address sending a payment to itself.

The vertices connected by many edges are obviously of some significance, so I'll focus on them next. The next four graphs are sub-graphs of the four vertices with the highest number of outgoing edges, i.e. addresses which have sent many transactions.




They look very similar and resemble a hub-and-spoke pattern, where one vertex in the center connects to many vertices while the connected vertices connect to few or none. Edges between vertices in the spokes, if they exist, are included; examples are seen primarily in the two graphs on the left.

Next I'll create four sub-graphs for the four vertices with the highest amount sent in MSC. Out of these four vertices, only one is included in the four above. (What is special about this address?)





Next, I'll start with the full graph from the beginning but filter it by outgoing degree, i.e. sent transactions, and raise this number gradually. The size of the vertices is also scaled, so a higher number of transactions gives a larger circle, and everything is presented in a grid with equal spacing between vertices.
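As an illustration, the degree filtering can be sketched in a few lines of Scala, assuming the transactions are available as a hypothetical list of (sender, receiver) pairs:

```scala
// Hypothetical edge list: each pair is a payment from sender to receiver.
val edges = List(("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "C"))

// Outgoing degree per address: number of payments sent.
val outDegree: Map[String, Int] =
  edges.groupBy(_._1).mapValues(_.size).toMap

// Keep only addresses with outgoing degree 2 and higher.
val hubs = outDegree.filter { case (_, d) => d >= 2 }

println(hubs) // Map(A -> 3)
```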

Outgoing degree 1 and higher:


Outgoing degree 4 and higher:


Outgoing degree 10 and higher:


Outgoing degree 20 and higher:




To finish, I'll take a look at some graph-theory metrics for the network:

Vertices*: 2288
Total edges: 3613 (number of payments sent)
Unique edges: 2530 (payment only occurred once)
Self-loops: 4 (payment with the same sender and receiver)
Connected components: 5 (islands with no connection via payments)
Vertices in largest component: 2279
Diameter: 12 (largest shortest path between vertices, i.e. how far money has travelled)
Max out-degree: 245 (most payments sent from one address)
Max in-degree: 150 (most payments received by one address)
Max (eigenvector) centrality: 0.034 (payments between an address with a large number of payments and another address with a large number of payments)
Avg (closeness) centrality: 0.003 (distance from one address to all other addresses; if equal to 1, every address would have sent one payment to every other address)
Avg clustering: 0.017 (payments between the receivers of a payment)
Reciprocal payments: 232 (receiver sent a payment back to the sender)

* as mentioned in the first paragraph


Thursday, March 13, 2014

A very simple example of spray-client usage

This post is about Scala and how to send REST/HTTP requests with the spray-client API. I wrote it because the official tutorial, in my opinion, leaves out some details. I'll start with the dependencies required to compile an example program similar to the one in the tutorial on the spray page. If you have worked with Scala you've probably also used sbt (Simple Build Tool), which is a great tool for managing dependencies. If configured correctly it does a lot of things for you, but I'll show an alternative, manual way to include the libraries needed for this example program.[1]

The first set of dependencies follows; the links will take you to the download pages on mvnrepository.com:

akka-actor 2.2.x
spray-can
spray-http
spray-httpx
spray-util

In addition to the above a few more jars are needed

parboiled-core
parboiled-scala
spray-client
spray-io
typesafe-config (part of Scala distribution, i.e. if you have Scala installed you already have this jar)


With these jars on the build path we can begin with the example code. The code declares a trait (an interface) for a web client with a get method, a class implementing the trait, and an application object. The application object starts the Akka system, which spray is built on, creates the client and sends an HTTP GET request to a URI. At some point in the future the reply is received. This is everything needed, and it shows the conciseness of Scala and spray in a quite eloquent manner.
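The original listing is not reproduced here, so the following is a sketch of what the program could look like with spray-client 1.x; the trait and class names are my own, as is the example URI.

```scala
import scala.concurrent.Future
import akka.actor.ActorSystem
import spray.http.HttpResponse
import spray.client.pipelining._

// The web-client interface: a single asynchronous get method.
trait WebClient {
  def get(uri: String): Future[HttpResponse]
}

// Implementation backed by spray-client's request pipeline.
class SprayWebClient(implicit system: ActorSystem) extends WebClient {
  import system.dispatcher // execution context for the Future
  private val pipeline = sendReceive
  def get(uri: String): Future[HttpResponse] = pipeline(Get(uri))
}

object Main extends App {
  implicit val system = ActorSystem("simple-spray-client")
  import system.dispatcher

  val client = new SprayWebClient
  // The reply arrives at some point in the future.
  client.get("http://spray.io/").foreach { response =>
    println("Status: " + response.status)
    system.shutdown()
  }
}
```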

There is of course much more to spray, and if futures, Akka and actors are not familiar, a very smart next move is to study them before continuing with spray. An introduction to Scala futures and Akka should not take much time and will make you a lot more productive with spray.



[1] When writing this I was on a network where sbt dependency resolution did not work, so manually downloading jar-files was a necessity for me.