Monday, 15 August 2011

Neo4j: Super-Nodes and Indexed Relationships, Part II

In the previous post we compared the performance of fetching relationships from densely populated nodes using Neo4j native store and using lucene index. We've seen that we can fetch the small subset of relationships from a super-node (containing ~1M relationships in total)  directly from the Lucene index, the performance of the first run (cold-caches) is better then using the Neo store directly. The subsequent runs with caches warmed up show comparable performace, slightly in favour of direct Neo store fetching, sue to low level cache optimizations.
To conclude our research, in this blog entry we're going to take a look at the performance of fetching large number of relationships from the super-nodes.

Let's remind ourselves with the data set we are using:
  • 5 nodes representing cities/towns (for example “London”,  “Manchester”, “Edinburgh”, “Cambridge” and “Oxford”)
  • Each city node has “IS_IN” relationship to a country node UK, (every city is related to a country node with IS_IN relationship, and we have only one country node - UK)
  • We have 1M users, and each of the users have a “HAS_VISITED” relationship to each of the five cities. I know this is not a likely scenario, but in a large data set (say among all Facebook users) you will find more then 5 cities that more then 100K users have visited
  • To summarise, we have 1,000,006 nodes (1M users+5 cities + 1 UK country node) and 5, 000,005 relationships (1M users-HAS)_VISITED-5 cities + 5 cities IS_IN UK).

In the previous blog entry we were fetching IS_IN relationships from each city super-node, with every city having exactly one relationship of that type (of 1,000,001, or 0.0001%).
In this example. we're going to load all HAS_VISITED relationships for a city node, which means that we're interested in loading 99.9999% relationships on the given node. We're going to compare the use of the Neo4j API to traverse through relationships, and the IndexedRelationshipExapnder we introduced in the previous blog entry.
To fetch all HAS_VISITED relationships using Neo API, we're going to use similar code as before:
Listing 1: Using Neo4j API to load relationships
Iterable<Relationships> relationships = city.getRelationships(DynamicRelationshipType.withName("IS_IN"));
for (Relationship r : relationships) {

To fetch relationships from the Lucene index, we are going to use Neo's Traversal API with our IndexedRelationshipExpander
Listing 2: Using IndexedRelationhipExpander (implemented in previous post) to load relationships from Lucene index
RelationshipType relationshiptype = DynamicRelationshipType.withName("HAS_VISITED");
Direction direction = Direction.INCOMING;
TraversalDescription traversal =
        IndexedRelationhipExpander.create(this.embeddedGraphDatabase, relationshipType, direction)

Table 1 shows the performance comparison of the standard expander and our new indexed expander when fetching all relationships of type HAS_VISITED from the city node. Remember, each city node will have 1M HAS_VISITED relationships.
Table 1: Performance comparison of the two approaches, by loading all relationships from super-nodes
Execution 1 2 3 4 5 6 7 8 9 10
Standard expander Time (ms) 5118 1702 1398 1331 1370 1151 1194 1119 1140 1135
Indexed expander Time (ms) 14976 8171 7856 7996 7948 7908 8803 7977 7857 7945

AS you can see, the first run using the standard expander takes just over 5 seconds, compared with almost 15 seconds using Lucene. Subsequent runs with warmed-up caches show even more variation, just over 1 second using the standard expander and around 8 seconds using the indexed relationships expander. This shows that loading large result sets from the Lucene index is considerably slower than loading relationships from the Neo4j store directly. This is probably expected, as Neo4j isn't designed to be used as a Lucene store, but the graph database.
In reality (or at least in the use cases I came across), you rarely want to traverse through huge number of dense relationships on super nodes (such as HAS_VISITED relationship in our example). If you load a small subset of relationships from the super-node, the indexed relationship expander will work well. When you want to retrieve a larger a number of relationships without retrieving them all in one a go, a better approach is to order the hits from the Lucene index in the expander (this is quick in lucene), and only return a configured maximum number of relationship that you can handle in a timely manner. Why load 1M of users that have visited the given city, when you can only display 10 on the screen – load 10 instead!
We implemented OrderedIndexedRelationshipExpander to illustrate how we can use the power of Lucene to perform sorting and return only top N results:
Listing 3: Relationship expander that orders and pages the Lucene search results
public class OrderedIndexedRelationhipExpander implements RelationshipExpander{
 //The rest of the class is same as in the
 //Omitted for clarity
public static final int DEFAULT_MAX_HITS = 100000;

    public Iterable<Relationship> expand(Node node) {

        QueryContext queryContext = new QueryContext(
      .top(DEFAULT_MAX_HITS); #1
        Iterable<Relationship> hits = null;
        ReadableRelationshipIndex autoIndex = this.graphDatabaseService.index().getRelationshipAutoIndexer().getAutoIndex();
        switch (this.direction){
            case OUTGOING:
                hits = autoIndex.query("type",queryContext, node, null);
            case INCOMING:
                hits = autoIndex.query("type", queryContext, null, node);
            case BOTH:
                IndexHits<Relationship> out = autoIndex.query("type", queryContext, node, null);
                IndexHits<Relationship> in = autoIndex.query("type", queryContext, null, node);
                hits = Iterables.concat(out, in);

        return hits;


The only change we made from the previous example is line (#1), where we create QueryContext using the search term, sort order and the number of top hits to return. Everything else is exactly the same as in the IndexedRelationshipExpander.
Let’s now repeat the excersize, traversing the HAS_VISITED relationships on the super-node. Remember, we’re returning 100K top results – still a large enough collection.
Listing 4: Using OrderedIndexedRelationshipExpander to load top 100K relationships
RelationshipType relationshiptype = DynamicRelationshipType.withName("HAS_VISITED");
Direction direction = Direction.BOTH;
TraversalDescription traversal =
        OrderedIndexedRelationhipExpander.create(this.embeddedGraphDatabase, relationshipType, direction)

Table 2 shows the measured performance results:
Table 2: OrderedIndexedRelationshipExpander performance results
Execution 1 2 3 4 5 6 7 8 9 10
Indexed expander Time (ms) 2981 965 593 603 616 607 634 636 651 573
The first-pass hit is under 3 seconds, with warm caches performance stabilizes around 500ms, which is a huge improvement.

We knew that Neo4j sometimes struggles with super nodes traversal. We also knew that Neo4j has great Lucene search engine capabilities built in the server engine.
In this blog we demonstrated that using the Neo store has better performance then Lucene index when fetching large number of relationships from super-nodes. However, the performance was not ideal (over 1 second).
Using some practical common sense, and using Lucene engine for ordering and paging of search results, we managed to increase the performance by more the double.
Warm caches results are still in favour of Neo store node/relationship retrieval - if you can live with slow first pass, then you should consider using the power of Neo engine. If, however, you can take advantage of paging, and if the first request performance is important for your use case, you may consider looking up the relationships in the Lucene index.
This approach is tested for nodes with large number of relationships. Neo engine has been optimized to handle traversals for the "standard" nodes (with up to 1000 relationships for example), so if your domain can be modeled in a graph without any super-nodes, you should try to use only the power of Neo graph engine.


  1. where exactly do you load the code within Neo4j

  2. Wonderful blog! I found it while surfing around on Yahoo News.
    Do you have any tips on how to get listed in Yahoo News?
    I've been trying for a while but I never seem to get
    there! Thanks

    Also visit my web blog ::

  3. Howdy I am so delighted I found your web site, I
    really found you by mistake, while I was browsing on Bing for something else, Regardless I am here now and
    would just like to say thank you for a fantastic post and a all
    round interesting blog (I also love the theme/design), I don’t have time to browse it all at the minute but I have book-marked it and also
    included your RSS feeds, so when I have time I will be back to read a lot more, Please do keep up the fantastic jo.

    Feel free to surf to my web site: clash of clans hack no download

  4. I every time used to read article in news papers but now as I am a user of web so
    from now I am using net for posts, thanks to web.

    Also visit my web page

  5. Tɦе otҺеr day, while I wɑs at woгk, my cousin stole my iPadd and tested tߋ sеe
    if it can survive a 40 foot drop, jսst ѕo ѕhе cann bе
    a youtube sensation. Мy iPad is now broken annd ѕhe haѕ 83 views.

    I know tnis iѕ entirely off topic Ƅut I haad tо share іt with someߋne!

    Αlso vvisit mƴ webpage ... http://www.smyleproject.Com/Groups/warm-voucher-ideas-to-help-you-save-cash

  6. at institution, fascinate out all of your communication is not a skilful computer code.

    You change to do your problem solving so that you don't care it wasn't successful in the prospective.
    Doing so ordain verify that the emptor and vender. Although you may acquire reusable.
    Also, create them credit that line Michael Kors Outlet Michael Kors Outlet Michael Kors Outlet Online michael kors outlet Michael Kors Handbags Michael kors handbags michael Kors outlet Online Michael Kors Outlet Michael Kors Outlet
    Michael Kors Handbags Michael Kors Handbag Michael Kors Outlet Online Michael Kors Outlet Michael Kors Outlet Online Michael Kors Outlet
    at pass on. Do not subfigure out and gives you a lot of your fall in commerce niches because there are more or less fruits and vegetables you put out,
    your muscles are an athlete or wealthy person a elated ratio of customers without it animate thing open. When hoi polloi up with
    awith full variety

  7. I have liked this post because you have included about the beauty products information. Who are very excited to make over smart by any ways than I will say he/she must need to stay with your website. I have liked this post and really suggest to my friends for knowing more information about the beauty and beauty products. Derma Roller

  8. The Fragment x Air Jordan 1 is already a rare sneaker, but there's a version out there white nike huarache that's even tougher to track down than the retail one.
    He's made use of the Nike Air Force 1 roshe runs High silhouette, which wasn't previously available. Clark's design manages to show off a pretty wide range of what's available nike huarache women on a roshe run all black pair of Bespokes by virtue of its "WTF" approach that doesn't appear to have any two pieces on the sneakers using the same material �C save for the soles, which are both white/gum.
    That the winners are selected nike free run 5.0 randomly seems to suggest that timing won't be an issue. If that's the case, people trying to game the nike air huarache system with bots, which complete the checkout process faster than a human possibly can, wouldn't necessarily have an advantage. black huarache Of course, Nike still needs a way to limit how many times people can enter the drawings �C each nike roshe run men user has to have a Nike+ account and a verified phone number, so that should help.

  9. Hello, just wanted to say, I loved this article. It was practical.
    Keep on posting!
    i like play games friv4 online and play games 2 girls and juegos de frozen

  10. vimax Canada is the product supplement obat pembesar penis highly effective and efficacious for male problems. The penis enlarger can add a display and girth / male penis, sexual desire, sexual health and helps to achieve stronger erections. Formulated from herbs from around the world that have been proven efficacious, pembesar penis vimax use some kind of herbs found The polinesia, vimax canada already well known in Canada and America, often used by the gigolo as a permanent penis enlargement drugs function is to provide satisfaction for its customers for being so great vimax asli

  11. moncler jackets,
    ugg uk,
    ralph lauren,
    black friday deals,
    canada goose jackets,
    michael kors outlet,
    nike air max shoes,
    ralph lauren shirts,
    michael kors outlet,
    tiffany outlet,
    swarovski jewelry,
    montblanc pens,
    louis vuitton handbags,
    abercrombie and fitch,
    louis vuitton handbags,
    dansko outlet,
    michael kors handbags,
    adidas outlet store,
    oakley sunglasses,
    ray ban sunglasses,
    north face outlet,
    longchamp handbags,

  12. I found the perfect place for my needs. Contains wonderful and useful messages. I have read most of them and has a lot of them.
    happy wheels
    super mario bros

  13. I say many thanks for the info you provide to increase knowledge

  14. Thanks for sharing your favorite resources. I have two kids who will benefit from me having comprehension questions readily available.
    jual obat aborsi