Developing on Staxmanade

Nate's DelegateFactory to bypass reflection for reading properties. (speed up i4o)

So today I was perusing my blog list and ran across a blog by Rick Strahl (Dynamic Delegates with Expression Trees) which is referencing this post by Nate Kohari (Fast Late-Bound Invocation with Expression Trees).

At first I just thought it was a very interesting post and thought, “wow, next time I have to do something requiring a ton of reflection method calls I’ll have to remember this”. I took off with the family for the day, and sometime during the day the idea popped into my head to try it for reading property values (since a property read is actually just a method call under the hood). And I do have the perfect place to try and spike this idea…

One of the things I’ve done to the i4o library is try to reduce the calls to reflection whenever possible. And the only one left that I couldn’t get past was having to use the PropertyInfo’s GetValue reflection call to get at a value for creating the property index. This generally isn’t a big deal because we only have to do it once, however it makes the initialization of the IndexableCollection<T> fairly expensive.

So I decided to spike the idea of using the DelegateFactory from Nate’s blog to read an object’s property values. And below I’m including the project that I spiked to test this idea out.

It compares how fast we can read values out of an object’s property by first using reflection and second using the dynamic delegate.
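For readers without the spike project, here is a minimal sketch of the same comparison. It is not Nate's DelegateFactory or the code from the spike; the Person class, the PropertyReader helper and the read count are made up for illustration, and it assumes a public instance property. It builds a getter delegate with an expression tree once, then times it against PropertyInfo.GetValue.

```csharp
using System;
using System.Diagnostics;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical type used only for this illustration.
public class Person
{
    public string Name { get; set; }
}

// Hypothetical helper; not part of i4o or of Nate's DelegateFactory.
public static class PropertyReader
{
    // Builds (object instance) => (object)((DeclaringType)instance).Property
    // and compiles it. The reflection and compile cost is paid once, here.
    public static Func<object, object> CreateGetter(PropertyInfo property)
    {
        ParameterExpression instance = Expression.Parameter(typeof(object), "instance");
        UnaryExpression typedInstance = Expression.Convert(instance, property.DeclaringType);
        MemberExpression read = Expression.Property(typedInstance, property);
        UnaryExpression boxed = Expression.Convert(read, typeof(object));
        return Expression.Lambda<Func<object, object>>(boxed, instance).Compile();
    }
}

public static class GetterDemo
{
    public static void Main()
    {
        var person = new Person { Name = "Nate" };
        PropertyInfo property = typeof(Person).GetProperty("Name");
        const int reads = 1000000;

        // Plain reflection: every read goes through PropertyInfo.GetValue.
        var reflectionTimer = Stopwatch.StartNew();
        for (int i = 0; i < reads; i++)
        {
            property.GetValue(person, null);
        }
        reflectionTimer.Stop();

        // Compiled delegate: the one-time build cost is included in the timing.
        var delegateTimer = Stopwatch.StartNew();
        Func<object, object> getter = PropertyReader.CreateGetter(property);
        for (int i = 0; i < reads; i++)
        {
            getter(person);
        }
        delegateTimer.Stop();

        Console.WriteLine("Reflection: {0} ms", reflectionTimer.ElapsedMilliseconds);
        Console.WriteLine("Delegate:   {0} ms", delegateTimer.ElapsedMilliseconds);
    }
}
```

The actual numbers will vary by machine and runtime, so treat the output as a rough indication only.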

If you read Rick's post above, he mentions how the dynamic delegate idea is a little reflection-expensive up front. However, it only happens once, and I ran a couple of tests…(p.s. I noticed a small flaw in my timing, however I don’t want to re-zip & re-upload the spike… so if you see the timing flaw, cool…)

I noticed that after about the first 5000 reads the expense of the heavy up-front reflection from DelegateFactory started to become negligible. And the more reads, the more the new reading strategy paid off. Take 1/5 million reads for example… normal reflection reading took .78 seconds, while the delegate reading took only .03 seconds, which is a substantial speedup.

After that I spiked it in the i4o project and proved that this could significantly reduce the index build time of an IndexableCollection<T>. I shelved the spike and am having Aaron take a look at the idea. We’ll see; I haven’t come up with any reasons why this could cause any issues, so it may be included…
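To make the connection to index building concrete, here is a rough sketch. It assumes an index is simply a dictionary from property value to the items that have that value; that is an assumption for illustration, not a description of i4o's internals. IndexBuilder and Student are hypothetical names. The getter is compiled once and then reused for every item, so the per-item cost no longer involves PropertyInfo.GetValue.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical type used only for this illustration.
public class Student
{
    public string FirstName { get; set; }
}

// Hypothetical helper; i4o's real index structure may look quite different.
public static class IndexBuilder
{
    public static Dictionary<object, List<T>> BuildIndex<T>(IEnumerable<T> items, string propertyName)
    {
        PropertyInfo property = typeof(T).GetProperty(propertyName);
        if (property == null)
        {
            throw new ArgumentException("No public property named " + propertyName);
        }

        // Compile item => (object)item.Property once, up front.
        ParameterExpression item = Expression.Parameter(typeof(T), "item");
        Expression body = Expression.Convert(Expression.Property(item, property), typeof(object));
        Func<T, object> getter = Expression.Lambda<Func<T, object>>(body, item).Compile();

        var index = new Dictionary<object, List<T>>();
        foreach (T current in items)
        {
            object key = getter(current);
            if (key == null)
            {
                continue; // Dictionary keys cannot be null; this sketch skips such items.
            }

            List<T> bucket;
            if (!index.TryGetValue(key, out bucket))
            {
                bucket = new List<T>();
                index[key] = bucket;
            }
            bucket.Add(current);
        }
        return index;
    }
}

public static class IndexBuilderDemo
{
    public static void Main()
    {
        var students = new List<Student>
        {
            new Student { FirstName = "Aaron" },
            new Student { FirstName = "Jason" },
            new Student { FirstName = "Aaron" }
        };

        var index = IndexBuilder.BuildIndex(students, "FirstName");
        Console.WriteLine(index["Aaron"].Count);
    }
}
```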

i4o: IndexSpecification<T> for the IndexableCollection<T>

We've removed the IndexableAttribute from the i4o library and replaced it with the IndexSpecification<T>. Below I'll explain how you can add/remove/change the index for an IndexableCollection<T>.

Let's show several examples on how to create an IndexableCollection<T>.

Given any enumeration of objects, you can translate it into an IndexableCollection<T>. For the examples below we're going to use an enumeration of the System.IO.FileInfo class. We are going to index the list of FileInfos by file Extension and by whether the file IsReadOnly or not.

Set up the list:

// Get our thing to index
string dir = @"C:\Windows\System32\";
var fileInfosFromDir = (from f in System.IO.Directory.GetFiles(dir, "*.*", SearchOption.AllDirectories)
select new FileInfo(f)).ToList();


Create the IndexSpecification for the FileInfo's "Extension" and "IsReadOnly" properties and turn the list of FileInfos into an IndexableCollection using the IndexSpecification:

// Create the index specification
var spec = new IndexSpecification<FileInfo>()
.Add(i => i.Extension)
.Add(i => i.IsReadOnly);

// Turn the list of files into an Indexed collection of files
var indexedFileInfosFromDir = fileInfosFromDir.ToIndexableCollection(spec);

Create IndexableCollection<T> without IndexSpecification<T>:

You are not required to specify an IndexSpecification<T> when creating the IndexableCollection<T>. You can translate the list into an IndexableCollection<T> and add properties to index after the fact. For example:

var indexedFileInfosFromDir = fileInfosFromDir.ToIndexableCollection();

// Specify the properties to index dynamically (more late bound)
indexedFileInfosFromDir
.CreateIndexFor(i => i.Extension)
.CreateIndexFor(i => i.IsReadOnly);

Swap one IndexSpecification<T> for another:

If you want to completely swap out the index at run time, you can give the IndexableCollection a new IndexSpecification:

var list = new List<FileInfo>();
var indexedList = list.ToIndexableCollection();

indexedList.UseIndexSpecification(
new IndexSpecification<FileInfo>()
.Add(o => o.Directory)
.Add(o => o.Name));

I think that should cover most of the general cases. Hope this helps...

Comments

Satyanarayana Muddu
Hi Jason,

Thank you for your response.
My Xml looks like

University1
Course1
Student1
Student2
University1
Course2
Student3
Student4
University2
Course1
Student5
etc...

XElement cimXml = XElement.Load(@"C:\Students.xml");

So, here I am grouping on Universities.
var universityGroupedElements = from ele in cimXml.Elements() group ele by ele.Name;

Now I want to add indexes to University, Course, Student elements to make queries faster.

Also, I want to group the courses in each university and also students in each course

Please help, in writing some code.
Jason.Jarrett
Hello @Satyanarayana

I would be happy to help you, but have a request and a couple questions...

1. I've realized my blog comments are starting to become a bit more of the i4o knowledge base than it deserves. Would you please ask your question over on the i4o Discussion board?

2. I'd like a little more detail. What does the xml structure look like? What does the linq query look like that returns your "grouped" data? What is it you are trying to index/search?

Thanks,
Jason
Satyanarayana Muddu
I have a question how to use IndexSpecification for a nested collections in a xml file. For example
University
Course
Student

I have list of Students from a various Courses and various Universities. I used Linq for Grouping Universities and Courses of Students. Now, I have question how to implement Indexing on Students, Courses and Universities.
Please give the implementation details .

Thanks
Satya
Paiwan
At first, I was thinking of modifying i4o but after I have been testing, I have found the power behind i4o with simple changes the way you write query.

For example:

Instead of writing this..
var f = indexedFileInfosFromDir.Where(fi => fi.Extension == "txt" && fi.IsReadOnly == true);

Try this..
var f = indexedFileInfosFromDir.Where(fi => fi.Extension == "txt");
var f2 = f.Where(fi => fi.IsReadOnly == true);

(fi.Extension should be smaller group than fi.IsReadOnly)

From what I have tested,
It responds within a few milliseconds.
I am happy with this.

Thank you for great work.
Jason.Jarrett
@Paiwan - I'm glad to see you're running your own benchmarks to test out the library.

Please keep in mind that this library is very simple, and is not a complete Linq implementation.

It provides some great benefits in the scenarios it was designed for. And I know Aaron has some more improvements on the way.

If you create any patches for the project, we are happy to take a look at any improvements you can come up with.

Thanks again
Paiwan
I have found another limitation of i4o and would like to share.

From the 'Demoi4o' project.

If I change this below query

var studentsNamedAaronFromConstant =
from student in _testStudents
where student.FirstName == studentNameBox.Text
select student;

To

var studentsNamedAaronFromConstant =
from student in _testStudents
where student.FirstName.Contains(studentNameBox.Text)
select student;


It takes longest time!!
Paiwan
Thanks so much for unit test.
Jason.Jarrett
@Paiwan To test your question I wrote a unit test and just checked it in. You can view the test here http://i4o.codeplex.com/SourceControl/changeset/view/31862#398075

If this is what you were asking, then no it will not update the index.

You need to call the IndexableCollection<T>.Add/Remove and not its base type Collection<T>.Add/Remove for the indexes to be updated.

This is, however, something we'd like to support in the future, which is why I checked in the failing test. We probably just need to implement ICollection<T> etc. to fully support this.
Paiwan
Hi Jason,

I have a question.
If we have some changes on our base collection like add or delete, do we have to re create index or do some special steps?

Thanks,
Paiwan
Kevin
Ok, thanks for the confirmation. I wanted to make sure I wasn't crazy. :-)

I was excited when I saw your post and really like the syntax for adding properties to the IndexSpecification. Now, I guess I'll just wait until I can actually use it like that. From Aaron's blog, it sounds like he plans to introduce updates to the Where expression in the next release.

Thanks for your work on this library. I'm hoping this will make it possible to replace a subsystem of our app which currently relies on looking up factors in large in-memory sets of data by using XML. It loads large XML documents into memory and builds XPath queries to get to individual factors that it needs. The XPath queries themselves are pretty fast, but I'm trying to use Linq-to-XML to project the XML into collections of strongly typed objects that I can query with Where lambdas instead. It works well, but when the collections are large enough the queries are much slower than the XPath version. I'm hoping the IndexableCollection will make it possible.

Kevin
Jason.Jarrett
That's a good point and I think you are correct...

I was just illustrating the power of the IndexSpecification. Unfortunately the extension methods used to evaluate the expression need some work.
Kevin
Hi Jason,

Thanks for this post. I got to it from the i4o home page on Codeplex. I noticed in the discussions that someone mentioned the fact that if you have multiple expressions in your Where expression, that it won't use the index.

So does that mean, your example showing multiple properties being added to the index doesn't really help if you wanted to search your collection on both of the properties in the index?

For example, would this code use the index?

var f = indexedFileInfosFromDir.Where(fi => fi.Extension == "txt" && fi.IsReadOnly == true);

From my testing, it appears that no benefit is achieved from using an IndexableCollection vs. a regular List in this case. Seems like there's no point in using the IndexableCollection if you need more than one property in the index. Am I missing something or is that correct?

Thanks,
Kevin

i4o: Update (Index for Objects)

Recently I was added as a developer to the i4o codeplex project by Aaron and last night I made my first commit.

There were a few pretty big changes to the library. I'll highlight some of them below, and in a couple future posts will describe in a little more detail how you can use these features.

Removed the IndexableAttribute:

This was removed for two reasons.

  • It did not allow you to index objects you didn't own.

Say you want to index a collection of System.IO.FileInfo objects. Since that's owned by the .Net framework, you can't apply an attribute to the properties of the class you want to index.

The solution to that was to allow adding indexed properties dynamically, however you had to give it a string representing the property name, which leads into the second issue...

  • Didn't provide any compile time checking or refactor support for properties using the dynamic add/remove methods.

If you refactored a property and forgot to rename the string to match the refactor you would end up with a runtime error.

To resolve the two issues above I've introduced the IndexSpecification<T> (I'll describe how to use it in a later post).
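To illustrate the second issue, here is a small sketch of how a lambda expression can stand in for a property-name string. This is not the IndexSpecification<T> source; PropertyName.Of is a hypothetical helper, and it only handles a plain property access. Because the property is named in code rather than in a string, a typo or a missed rename becomes a compile error instead of a runtime error.

```csharp
using System;
using System.IO;
using System.Linq.Expressions;

// Hypothetical helper; not part of i4o.
public static class PropertyName
{
    // Reads the member name out of an expression such as f => f.Extension.
    public static string Of<T, TProperty>(Expression<Func<T, TProperty>> selector)
    {
        var member = selector.Body as MemberExpression;
        if (member == null)
        {
            throw new ArgumentException("Expected a simple property access such as x => x.Name");
        }
        return member.Member.Name;
    }
}

public static class PropertyNameDemo
{
    public static void Main()
    {
        // String-based: this typo compiles fine and only fails at run time.
        string byString = "Extention";

        // Expression-based: misspelling Extension here would not compile.
        string byExpression = PropertyName.Of((FileInfo f) => f.Extension);

        Console.WriteLine(byString);
        Console.WriteLine(byExpression); // prints "Extension"
    }
}
```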

Performance tuning:

Although we haven't enhanced the Linq support for IndexableCollection (should come soon), I was able to eke out some performance by doing some internal caching of property types and a few other tweaks, resulting in the index creation becoming about 30% faster...

Fluent interface for managing the Indexes

The last big change which fits in with the IndexSpecification<T> is the Fluent interface for dynamically adding/removing properties to be indexed. One short example could be:

image
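As a rough idea of what a fluent interface of this kind looks like, here is a minimal sketch. IndexedPropertySet<T> and its Add/Remove methods are hypothetical stand-ins, not the i4o API; the point is only that each method returns the same instance, so calls can be chained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq.Expressions;

// Hypothetical class for illustration; not the i4o IndexSpecification<T>.
public class IndexedPropertySet<T>
{
    private readonly List<string> _names = new List<string>();

    public IEnumerable<string> Names { get { return _names; } }

    // Returning "this" is what makes the chained calls possible.
    public IndexedPropertySet<T> Add<TProperty>(Expression<Func<T, TProperty>> selector)
    {
        string name = NameOf(selector);
        if (!_names.Contains(name))
        {
            _names.Add(name);
        }
        return this;
    }

    public IndexedPropertySet<T> Remove<TProperty>(Expression<Func<T, TProperty>> selector)
    {
        _names.Remove(NameOf(selector));
        return this;
    }

    private static string NameOf<TProperty>(Expression<Func<T, TProperty>> selector)
    {
        var member = selector.Body as MemberExpression;
        if (member == null)
        {
            throw new ArgumentException("Expected a simple property access such as x => x.Name");
        }
        return member.Member.Name;
    }
}

public static class FluentDemo
{
    public static void Main()
    {
        var properties = new IndexedPropertySet<FileInfo>()
            .Add(f => f.Extension)
            .Add(f => f.IsReadOnly)
            .Remove(f => f.IsReadOnly);

        Console.WriteLine(string.Join(", ", properties.Names)); // prints "Extension"
    }
}
```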

i4o & Silverlight Unit Tests (A little more work than the i4o library)

Follow up to my post on i4o & Silverlight (compiles first try)...

I took a stab at porting the i4o unit tests to Silverlight, which was quite a bit more work than I initially expected.

After creating a Silverlight Unit Test project and linking the original test files into the Silverlight project, I compiled...

  • First, the VB using statement (using Microsoft.VisualBasic;) wasn't even needed, so I removed it.
  • Second, there is no System.Diagnostics.Stopwatch class in Silverlight, so I implemented a quick one using DateTime to get the unit tests to compile in Silverlight. Here's the class, except the Frequency property has been commented out (I didn't spend time figuring out how to make that correct, or what is correct???)
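using System;

// Quick stand-in for System.Diagnostics.Stopwatch, which Silverlight lacks.
// Backed by DateTime.UtcNow, so it is only as precise as the system clock.
public class Stopwatch
{
    private DateTime _StartUtcDateTime;
    private DateTime? _EndUtcDateTime;
    private bool _IsRunning = false;

    //public static readonly long Frequency { get { throw new NotImplementedException(); } }
    public static readonly bool IsHighResolution = false;

    public Stopwatch()
    {
        // Start and end coincide until Start() is called, so Elapsed reads zero.
        Reset();
    }

    public TimeSpan Elapsed
    {
        get
        {
            if (_EndUtcDateTime.HasValue)
            {
                // Stopped: the interval is fixed.
                return new TimeSpan(_EndUtcDateTime.Value.Ticks - _StartUtcDateTime.Ticks);
            }
            else
            {
                // Still running: measure up to now.
                return new TimeSpan(DateTime.UtcNow.Ticks - _StartUtcDateTime.Ticks);
            }
        }
    }

    // TotalMilliseconds is the whole interval; Milliseconds would only be
    // the 0-999 component and wraps around every second.
    public long ElapsedMilliseconds { get { return (long)Elapsed.TotalMilliseconds; } }
    public long ElapsedTicks { get { return Elapsed.Ticks; } }
    public bool IsRunning { get { return _IsRunning; } }

    public static long GetTimestamp()
    {
        // UtcNow keeps this consistent with the rest of the class.
        return DateTime.UtcNow.Ticks;
    }

    public void Reset()
    {
        // Like the framework version: stop and zero the elapsed time.
        _StartUtcDateTime = DateTime.UtcNow;
        _EndUtcDateTime = _StartUtcDateTime;
        _IsRunning = false;
    }

    public void Start()
    {
        // Note: this restarts from zero; it does not resume a stopped watch.
        _EndUtcDateTime = null;
        _IsRunning = true;
        this._StartUtcDateTime = DateTime.UtcNow;
    }

    public static Stopwatch StartNew()
    {
        var w = new Stopwatch();
        w.Start();
        return w;
    }

    public void Stop()
    {
        _EndUtcDateTime = DateTime.UtcNow;
        _IsRunning = false;
    }
}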


The only other issue that came up was some of the Stopwatch dependent tests happened so fast that they would fail intermittently... the quick hack/solution for this was to up the iteration count of whatever they were testing.


After taking care of all the above issues, I was able to get the unit tests to pass.

Comments

Tiaan
You might also want to look at my implementation of the Stopwatch for Silverlight, which supports resuming.
Tiaan
For the Frequency property's implementation, you probably just need to return the System.TimeSpan.TicksPerSecond value.

i4o with Silverlight (Compiles first try)

So I've been thinking lately of a problem that we will be solving at work that will require a user to follow these steps...

1. User selects some set of data to work with
2. Some number crunching has to happen to give the user a report like interface
3. User analyses that data, tweaks some value and will basically go to step 2 above...

This application is being built with Microsoft Silverlight, and in thinking about how to make step 2 above as smooth and responsive as possible, I thought "Hey, what about i4o?"

A quick Google for "i4o Silverlight" and, without digging too far, I basically found nobody had tried it, or at least nobody had tried and told about i4o in Silverlight. So I thought I'd give it a go...

Step 1: Go get the code from codeplex/i4o. I downloaded the latest source bits.

Step 2: After extracting the .zip, open up the solution i4o.sln.

Step 3: Add a new Silverlight class project

Step 4: Now use the "Add existing item" option to add files to the Silverlight project. Browse to the i4o project folder to select the 3 i4o class files. You can add those to your Silverlight project as is, or use the "Add as Link" feature to share the code across the platforms.

Step 5: Once you've added the files to the Silverlight project, the final step is to "BUILD" the solution...

What's that you say? "That's all there was to it?" YA, my thoughts exactly, the project just compiled on the first try...

DISCLAIMER: I haven't tried to use it yet... It's late and this idea popped into my head, distracting me from getting to sleep. So I tried it and am quite satisfied for now.

Up NEXT: I don't know if I'll do this sooner rather than later, if at all, but I'd like to figure out how hard it would be to port the existing unit tests and see how they run in Silverlight...