The tedious truth is that most IdM projects must begin with a phase of data matching and cleaning. Before you can start to automate management of identities, you need a predictable data set around which to base your rules.
Really, it’s all about joins. Once existing accounts are joined to their correct data source (such as matching an HR record to an AD account) you can begin to flow updates. Once you have a clear idea who does and does not have an account in a target directory, you can begin to make provisioning and deprovisioning decisions. But be careful – your joins must be reliable! The last thing you want is to kill the credibility of your fledgling IdM project by updating someone’s account with another person’s details or, even worse, deleting their account because you thought they’d left!
When you first start data matching, you will use a lot of different join rules, and you will make joins manually and with CSV files (more on that below). But keep this in mind:
Any join made manually, or with extra effort, should be considered temporary.
There are various situations in ILM which can only be reliably rectified by a clear-out and re-import of a connector space. Always plan for this. Ideally, when you re-import a connector space, you will have a single, direct join rule which effortlessly re-joins all your objects.
And to achieve this we use...
Once the join is verified you should export a uniquely identifying attribute, such as an employee number, to the target directory. After that a simple “employeeID = employeeID” type join rule is all you will need.
Sometimes you are faced with an import-only system. For either technical or political reasons you are not able to export the breadcrumb attribute. There are a couple of things you can do:
There are some key points to note about Join Rules:
At the bottom of the Join Rule configuration you will see a check box “Use rules extension to resolve”. Here you can link to code you write under the ResolveJoinSearch subroutine in the MA extension. The Resolve rule is used when ILM finds multiple possible matches in the Metvserve (including already-joined objects). All the possible matches into the collection rgmventry and your job, in the code, is to check through them, looking for the best possible match. If your code finds an ideal match you return the index number of the object in imventry, and set the value of ResolveJoinSearch to true. The following example only joins if a single, unjoined Metaverse object was found.
Public Function ResolveJoinSearch(ByVal joinCriteriaName As String, ByVal csentry As CSEntry, ByVal rgmventry() As MVEntry, ByRef imventry As Integer, ByRef MVObjectType As String) As Boolean Implements IMASynchronization.ResolveJoinSearch Select Case joinCriteriaName Case "Resolve_NotYetJoined" If rgmventry.Length = 1 AndAlso _ rgmventry(0).ConnectedMAs("AD").Connectors.Count = 0 Then imventry = 0 Return True Else Return False End If End Select End Function
A simple join rule directly matches a connector space attribute to a Metaverse attribute:
With an Advanced Join Rule you construct a list of possible values with which to find an exact match in the Metaverse:
Sometimes you can use code rules to make your list of possible matches (eg., presenting a phone number in different formats); other times you have to use long look-up lists of possible variations (try genealogy websites for name-variation lists). The following example uses a lookup file of aliases, where each line has the possible variations on a name. If a match to the first name is found on the connector space object, the whole line is added to the possible values to search for in the Metaverse.
Elizabeth,Liz,Beth,Betty David,Dave,Davey Jerome,Jérôme ...
Public Class MAExtensionObject Implements IMASynchronization Dim fileAliases As System.IO.StreamReader Dim arrAliases As String() Dim i As Integer Public Sub Initialize() Implements IMASynchronization.Initialize fileAliases = New System.IO.StreamReader("C:\aliases.txt", System.Text.Encoding.Default) i = 0 While Not fileAliases.EndOfStream ReDim Preserve arrAliases(i) arrAliases(i) = fileAliases.ReadLine i = i + 1 End While fileAliases.Close() End Sub Public Sub MapAttributesForJoin(ByVal FlowRuleName As String, ByVal csentry As CSEntry, ByRef values As ValueCollection) Implements IMASynchronization.MapAttributesForJoin Select FlowRuleName Case "Join_aliases" Dim aliasList As String Dim value As String For Each aliasList In arrAliases If aliasList.Contains(csentry("givenName").Value) Then For Each value In aliasList.Split(",".ToCharArray) values.Add(value) Next End If Next End Select End Sub End Class
A lot of the difficult account matching will probably be done outside ILM. It is therefore useful to be able to “export” lists of possible matches, and later, “import” the joins from a CSV file. Be careful when writing to files from extension code – the DLL doesn’t unload for five minutes after the MA run completes, which means you may have to wait for it to finish writing to the file. If you’re in a hurry, recompiling the code will force the DLL to finish writing to the file.
You can use the Resolve rule to export possible matches to a CSV file. The following example resolves the join rule “sn | Direct | lastname”. As a match on the last name alone is too weak for an immediate join, we just write the possible matches to the text file.
Public Class MAExtensionObject Implements IMASynchronization Dim fileMatches As System.IO.StreamWriter Public Sub Initialize() Implements IMASynchronization.Initialize fileMatches = New System.IO.StreamWriter("C:\possible matches.txt", System.Text.Encoding.Default) End Sub Public Sub Terminate() Implements IMASynchronization.Terminate fileMatches.Close() End Sub Public Function ResolveJoinSearch(ByVal joinCriteriaName As String, ByVal csentry As CSEntry, ByVal rgmventry() As MVEntry, ByRef imventry As Integer, ByRef MVObjectType As String) As Boolean Implements IMASynchronization.ResolveJoinSearch Select Case joinCriteriaName Case "Resolve_Lastname" Dim MAName As String = csentry.MA.Name Dim mvobject As MVEntry Dim cFirstname, mFirstname As String If csentry("givenName").IsPresent Then cFirstname = csentry("givenName").StringValue Else cFirstname = "UNKNOWN" End If If mvobject("firstname").IsPresent Then mFirstname = mvobject("firstname").StringValue Else mFirstname = "UNKNOWN" End If For Each mvobject In rgmventry If mvobject.ConnectedMAs(MAName).Connectors.Count = 0 Then fileMatches.WriteLine(csentry("sn").StringValue & ";" _ & cFirstname & ";" _ & mvobject("lastname").StringValue & ";" _ & mFirstname) End If Next Return False End Select End Function End Class
You will need to adapt this code of course. Firstly, you probably want to export a lot more identifying information in your CSV file – department, email address, dn ... whatever helps. Next, it can really help to supplement your possible matches with a probability score. This is where you do a series of tests and add points- the more points, the higher the chance of the match. For example:
*Some tips for testing if names are similar:
You can use an Advanced Join Rule to “import” joins from a CSV file. Firstly, our CSV file must be constructed like this:
CS_identifier;MV_identifier
For example, if we are trying to match an AD account against Metaverse objects imported from the HR system, we populate the CSV with the AD DN and the employeeID:
CN=Fred Bloggs,OU=User,OU=MyOrg,DC=mydomain,DC=com;0012988
Next we create the Advanced Join Rule which will look up the csobject’s DN in the text file, but use the employeeID to search the Metaverse.
Note that you can’t actually use the DN in the join rule – but that’s ok, just use any attribute that definitely exists. Eg.,
sAMAccountName | Rules extension – Join_CSV | employeeID
And now for the code :
Imports Microsoft.MetadirectoryServices Public Class MAExtensionObject Implements IMASynchronization Dim joins As String() Dim i As Integer Public Sub Initialize() Implements IMASynchronization.Initialize Dim fileJoins As System.IO.StreamReader Dim strLine As String 'Open the csv file and read into an array fileJoins = New System.IO.StreamReader("C:\joins.csv", System.Text.Encoding.Default) i = 0 While Not fileJoins.EndOfStream ReDim Preserve joins(i) joins(i) = fileJoins.ReadLine i = i + 1 End While fileJoins.Close() End Sub Public Sub MapAttributesForJoin(ByVal FlowRuleName As String, ByVal csentry As CSEntry, ByRef values As ValueCollection) Implements IMASynchronization.MapAttributesForJoin Select FlowRuleName Case "Join_CSV" 'If the csentry DN is found in the joins array, then 'use the paired employeeID to search the Metaverse. For i = 0 To joins.Length - 1 If joins(i).Contains(csentry.DN.ToString) Then values.Add(joins(i).Split(";")(1)) End If Next End Select End Sub
People will ask you questions like “How many people have you joined in system X but not in Y?”, “How sure are you that the joins are correct?”, “Which department has the most unidentified accounts?” It’s best to be prepared for these sorts of questions.
Once data is in the Metaverse it is a simple matter to access it for reporting, either by exporting it into a reporting table, or by directly querying the underlying tables (as long as you’re careful to do it when ILM is idle, or else use NOLOCK).
So consider this: During the data cleaning phase, import all identifying attributes into the Metaverse from all sources.
For example: You’ve made a join between a user in AD and an HR record. Under normal operations you would consider HR as the master source for the name, and you would only flow it from there. You wouldn’t bother importing the name attributes from AD – in fact you’re more likely to be overwriting them with export flow rules.
However, during the data matching phase, you’re probably not ready to start overwriting attributes, and the information about current values can be very important in your verification and reporting.
Now, if you have a look at the mms_metaverse table in the ILM database, you will see how simple it is to query the progress of your joins, and also to judge on what criteria the joins were made. Some example queries…
/* HR person with no join to AD */ select HR_lastname, HR_firstname, HR_employeeid from mms_metaverse with (nolock) where AD_dn is null
/* HR person with join to AD */ select HR_lastname, HR_firstname, HR_employeeid, AD_lastname,AD_firstname,AD_dn from mms_metaverse with (nolock) where AD_dn is not null
Unfortunately it is not so simple to query the connector space to, for example, report on that state of your disconnected objects, but here are a few ways you can export the data: SQL query - The CS table holds data differently to the Metaverse tables. It is possible to query for disconnectors in this way, however you will only be able to retrieve the CN of objects – which may not be sufficient to identify them.
select cs.rdn from dbo.mms_connectorspace cs with (nolock) join dbo.mms_management_agent ma with (nolock) on cs.ma_id = ma.ma_id left outer join dbo.mms_csmv_link mv with (nolock) on mv.cs_object_id = cs.object_id where ma.ma_name = 'My MA' and mv.mv_object_id is null and cs.connector_state = 0
CSExport - The command-line utility csexport.exe, found in the <ILM program>\bin folder, will allow you to dump connector space objects to an XML file.
Search Connector Space
Report directly from the data source - Once an object in the data source has been correctly identified you will ideally export a unique attribute out to it. It may then be possible to identify the non-joined objects as those which don’t possess this attribute.
Project to a different object type - ILM processes joins before projections, so it is fairly simple to project all non-joined objects to a different Metaverse object type – for example, one called “disconnectors”. This may help in reporting on an overall status direct from the Metaverse tables. Note however that to make these objects once again available for joins they will have to be disconnected from the MVExtension provisioning code.
Sometimes you may need to join to a value in a multi-value attribute. An example is searching through all the proxyAddresses for a match against a single email address.
When the multi-valued attribute exists in the connector space, and the single valued attribute is in the Metaverse, this is very easily accomplished. Just use an Advanced Join Rule to break the multi-valued attribute down into the values list used by the join rule.
Public Sub MapAttributesForJoin(ByVal FlowRuleName As String, ByVal csentry As CSEntry, ByRef values As ValueCollection) Implements IMASynchronization.MapAttributesForJoin Select FlowRuleName Case "Join_proxyAddresses" Dim alias As String For Each alias In csentry("givenName").Values values.Add(alias) Next End Select End Sub
However, when the multi-valued attribute has already been imported into the Metaverse you will have a problem. While you can join on a multi-valued attribute, you have to join on the whole thing. There is no way that you can match one value out of a multi-valued Metaverse attribute against a single-valued connector space attribute.
Some possible options:
All the joining techniques so far have been based around a single connector space object, with one or more possible matches in the Metaverse. But what do you do if you want to work the other way around, where there are multiple possible matches in the connector space for a single Metaverse object, and you want to pick the best one? Unfortunately this is not straight-forward. There is no way to offer a selection of connector space objects in a join rule, as joins always work on a single connector space object at a time. You can judge the merits of the current CS object, but you can’t tell if there’s a better one coming up. One way around this is to do everything with CSV files.
If you have an enormous number of accounts, and different data sources to trawl through, you may be best off doing your data matching outside of ILM, and then just using the CSV join rule above to make the joins.
Check out the Fuzzy Lookup Transformation from the Enterprise version of SQL SSIS for help here.
UNIFYBob edited Revision 6. Comment: Added "with (nolock)" directives where appropriate
Ed Price - MSFT edited Revision 3. Comment: TOC