Developing on Staxmanade

i4o: IndexSpecification<T> for the IndexableCollection<T>

We've removed the IndexableAttribute from the i4o library and replaced it with the IndexSpecification<T>. Below I'll explain how you can add/remove/change the index for an IndexableCollection<T>.

Here are several examples of how to create an IndexableCollection<T>.

Given any enumeration of objects, you can translate it into an IndexableCollection<T>. For the examples below we're going to use an enumeration of the System.IO.FileInfo class. We are going to index the list of FileInfos by file Extension and by whether the file IsReadOnly or not.

Set up the list:

// Get our thing to index
string dir = @"C:\Windows\System32\";
var fileInfosFromDir = (from f in System.IO.Directory.GetFiles(dir, "*.*", SearchOption.AllDirectories)
                        select new FileInfo(f)).ToList();


Create the IndexSpecification<T>:

Create the IndexSpecification for the FileInfo's "Extension" and "IsReadOnly" properties, then turn the list of FileInfos into an IndexableCollection using that IndexSpecification.

// Create the index specification
var spec = new IndexSpecification<FileInfo>()
    .Add(i => i.Extension)
    .Add(i => i.IsReadOnly);

// Turn the list of files into an Indexed collection of files
var indexedFileInfosFromDir = fileInfosFromDir.ToIndexableCollection(spec);
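
To make the idea of "indexing by a property" concrete, here is a minimal sketch that uses only standard LINQ (ToLookup). It is not how i4o is implemented; FileEntry, PropertyIndexSketch, SampleFiles and BuildIndex are hypothetical names made up for this illustration, and FileEntry stands in for FileInfo so the sketch runs without touching the disk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for FileInfo so the sketch runs without touching the disk.
public sealed class FileEntry
{
    public FileEntry(string name, string extension, bool isReadOnly)
    {
        Name = name;
        Extension = extension;
        IsReadOnly = isReadOnly;
    }

    public string Name { get; }
    public string Extension { get; }
    public bool IsReadOnly { get; }
}

public static class PropertyIndexSketch
{
    // A few in-memory entries to index.
    public static List<FileEntry> SampleFiles()
    {
        return new List<FileEntry>
        {
            new FileEntry("notes.txt", ".txt", false),
            new FileEntry("readme.txt", ".txt", true),
            new FileEntry("kernel32.dll", ".dll", true),
        };
    }

    // Group the items by one property up front. An equality query on that
    // property then becomes a single lookup instead of a scan of every item.
    public static ILookup<TKey, T> BuildIndex<T, TKey>(IEnumerable<T> items, Func<T, TKey> property)
    {
        return items.ToLookup(property);
    }
}
```

For example, PropertyIndexSketch.BuildIndex(PropertyIndexSketch.SampleFiles(), f => f.Extension)[".txt"] returns the two .txt entries without visiting the .dll entry at query time. The trade-off is that the lookup is a snapshot: it has to be rebuilt or maintained when the underlying list changes.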

Create IndexableCollection<T> without IndexSpecification<T>:

You are not required to specify an IndexSpecification<T> when creating the IndexableCollection<T>. You can translate the list into an IndexableCollection<T> and add properties to index after the fact. For example:

var indexedFileInfosFromDir = fileInfosFromDir.ToIndexableCollection();

// Specify the properties to index dynamically (more late bound)
indexedFileInfosFromDir
    .CreateIndexFor(i => i.Extension)
    .CreateIndexFor(i => i.IsReadOnly);
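
If you are curious how a lambda such as i => i.Extension can name a property to index, the sketch below shows one common approach: accept the lambda as an expression tree and read the member name from it. This is only an illustration under my own assumptions, not the i4o source; TinyIndexedList<T>, Find and IndexedProperties are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical miniature of a late-bound indexed collection; not the i4o class.
public sealed class TinyIndexedList<T>
{
    private readonly List<T> _items;
    private readonly Dictionary<string, ILookup<object, T>> _indexes =
        new Dictionary<string, ILookup<object, T>>();

    public TinyIndexedList(IEnumerable<T> items)
    {
        _items = items.ToList();
    }

    // Accepts a lambda such as x => x.Extension, reads the property name from
    // the expression tree, and builds a value -> items lookup for it.
    public TinyIndexedList<T> CreateIndexFor<TKey>(Expression<Func<T, TKey>> property)
    {
        var member = property.Body as MemberExpression;
        if (member == null)
            throw new ArgumentException("Expected a simple property access.", nameof(property));

        Func<T, TKey> getter = property.Compile();
        _indexes[member.Member.Name] = _items.ToLookup(item => (object)getter(item));
        return this; // returning this allows the chained call style
    }

    // Names of the properties that currently have an index.
    public IEnumerable<string> IndexedProperties
    {
        get { return _indexes.Keys; }
    }

    // Equality search against a previously created index.
    public IEnumerable<T> Find(string propertyName, object value)
    {
        return _indexes[propertyName][value];
    }
}
```

Usage might look like new TinyIndexedList<string>(words).CreateIndexFor(w => w.Length).Find("Length", 2). Keying the indexes by property name keeps the sketch short; a real library would more likely match the property inside the Where expression itself.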

Swap one IndexSpecification<T> for another:

If you want to completely swap out the index at run time, you can give the IndexableCollection a new IndexSpecification:

var list = new List<FileInfo>();
var indexedList = list.ToIndexableCollection();

indexedList.UseIndexSpecification(
    new IndexSpecification<FileInfo>()
        .Add(o => o.Directory)
        .Add(o => o.Name));

I think that should cover most of the general cases. Hope this helps...

Comments

Satyanarayana Muddu
Hi Jason,

Thank you for your response.
My Xml looks like

University1
Course1
Student1
Student2
University1
Course2
Student3
Student4
University2
Course1
Student5
etc...

XElement cimXml = XElement.Load(@"C:\Students.xml");

So, here I am grouping on Universities.
var universityGroupedElements = from ele in cimXml.Elements() group ele by ele.Name;

Now I want to add indexes to University, Course, Student elements to make queries faster.

Also, I want to group the courses in each university and also students in each course

Please help, in writing some code.
Jason.Jarrett
Hello @Satyanarayana

I would be happy to help you, but have a request and a couple questions...

1. I've realized my blog comments are starting to become a bit more of the i4o knowledge base than it deserves. Would you please ask your question over on the i4o Discussion board?

2. I'd like a little more detail. What does the xml structure look like? What does the linq query look like that returns your "grouped" data? What is it you are trying to index/search?

Thanks,
Jason
Satyanarayana Muddu
I have a question about how to use IndexSpecification for nested collections in an XML file. For example
University
Course
Student

I have a list of Students from various Courses and various Universities. I used Linq for grouping the Universities and Courses of the Students. Now, I have a question about how to implement indexing on Students, Courses and Universities.
Please give the implementation details.

Thanks
Satya
Paiwan
At first, I was thinking of modifying i4o, but after testing I have found the power behind i4o with simple changes to the way you write a query.

For example:

Instead of writing this..
var f = indexedFileInfosFromDir.Where(fi => fi.Extension == ".txt" && fi.IsReadOnly == true);

Try this..
var f = indexedFileInfosFromDir.Where(fi => fi.Extension == ".txt");
var f2 = f.Where(fi => fi.IsReadOnly == true);

(fi.Extension should produce a smaller group than fi.IsReadOnly)

From what I have tested,
it responds within a few milliseconds.
I am happy with this.

Thank you for the great work.
Jason.Jarrett
@Paiwan - I'm glad to see you're running your own benchmarks to test out the library.

Please keep in mind that this library is very simple, and is not a complete Linq implementation.

It provides some great benefits in the scenarios it was designed for. And I know Aaron has some more improvements on the way.

If you create any patches for the project, we are happy to take a look at any improvements you can come up with.

Thanks again
Paiwan
I have found another limitation of i4o and would like to share.

From the 'Demoi4o' project.

If I change the query below

var studentsNamedAaronFromConstant =
from student in _testStudents
where student.FirstName == studentNameBox.Text
select student;

To

var studentsNamedAaronFromConstant =
from student in _testStudents
where student.FirstName.Contains(studentNameBox.Text)
select student;


It takes the longest time!!
Paiwan
Thanks so much for the unit test.
Jason.Jarrett
@Paiwan To test your question I wrote a unit test and just checked it in. You can view the test here http://i4o.codeplex.com/SourceControl/changeset/view/31862#398075

If this is what you were asking, then no it will not update the index.

You need to call the IndexableCollection<T>.Add/Remove and not its base type Collection<T>.Add/Remove for the indexes to be updated.

This is however something we'd like to support in the future, which is why I checked in the failing test. We probably just need to implement ICollection<T> etc. to fully support this.
Paiwan
Hi Jason,

I have a question.
If we make changes to our base collection, like an add or delete, do we have to re-create the index or do some special steps?

Thanks,
Paiwan
Kevin
Ok, thanks for the confirmation. I wanted to make sure I wasn't crazy. :-)

I was excited when I saw your post and really like the syntax for adding properties to the IndexSpecification. Now, I guess I'll just wait until I can actually use it like that. From Aaron's blog, it sounds like he plans to introduce updates to the Where expression in the next release.

Thanks for your work on this library. I'm hoping this will make it possible to replace a subsystem of our app which currently relies on looking up factors in large in-memory sets of data by using XML. It loads large XML documents into memory and builds XPath queries to get to individual factors that it needs. The XPath queries themselves are pretty fast, but I'm trying to use Linq-to-XML to project the XML into collections of strongly typed objects that I can query with Where lambdas instead. It works well, but when the collections are large enough the queries are much slower than the XPath version. I'm hoping the IndexableCollection will make it possible.

Kevin
Jason.Jarrett
That's a good point and I think you are correct...

I was just illustrating the power of the IndexSpecification. Unfortunately the extension methods used to evaluate the expression need some work.
Kevin
Hi Jason,

Thanks for this post. I got to it from the i4o home page on Codeplex. I noticed in the discussions that someone mentioned that if you have multiple conditions in your Where expression, it won't use the index.

So does that mean your example showing multiple properties being added to the index doesn't really help if you want to search your collection on both of the properties in the index?

For example, would this code use the index?

var f = indexedFileInfosFromDir.Where(fi => fi.Extension == ".txt" && fi.IsReadOnly == true);

From my testing, it appears that no benefit is achieved from using an IndexableCollection vs. a regular List in this case. Seems like there's no point in using the IndexableCollection if you need more than one property in the index. Am I missing something or is that correct?

Thanks,
Kevin