I was working on something like this recently (for entity resolution
in records of free software projects). It's a surprisingly hard
problem, particularly if you want to deal with variations in names
(e.g. match "Howison, J", "J Howison", "J L Howison" and "James Linton
Howison" but not "J K Howison").
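To make the name-variation problem concrete, here is a minimal Python
sketch of a pairwise compatibility check. It is not the algorithm from
the paper cited below, only an illustration; the function names
(parse_name, tokens_compatible, names_compatible) are hypothetical, and
it assumes names are written either as "Surname, Given ..." or as
"Given ... Surname".

```python
import re

def parse_name(raw):
    """Split a name into (surname, list of given-name tokens), lowercased.

    Assumption: the name is either "Surname, Given ..." or
    "Given ... Surname". Anything else will be parsed incorrectly.
    """
    raw = raw.strip()
    if "," in raw:
        surname, given = raw.split(",", 1)
    else:
        parts = raw.split()
        if not parts:
            return "", []
        surname, given = parts[-1], " ".join(parts[:-1])
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", given)]
    return surname.strip().lower(), tokens

def tokens_compatible(a, b):
    """An initial matches any name starting with that letter;
    two full names must be identical."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b

def names_compatible(x, y):
    """True if the two names could plausibly refer to the same person."""
    sx, gx = parse_name(x)
    sy, gy = parse_name(y)
    if sx != sy:
        return False
    # Compare given names position by position. zip() stops at the
    # shorter list, so a missing middle name is treated as compatible.
    return all(tokens_compatible(a, b) for a, b in zip(gx, gy))

if __name__ == "__main__":
    target = "James Linton Howison"
    for other in ["Howison, J", "J Howison", "J L Howison", "J K Howison"]:
        print(other, "->", names_compatible(other, target))
```

Note that this kind of compatibility is not transitive: "J Howison" is
compatible with both "J L Howison" and "J K Howison", even though those
two conflict with each other. That is a large part of why the problem
is harder than it looks.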
I found this paper quite helpful (and the author was happy to share
his Perl code):
D. G. Feitelson, “On identifying name equivalences in digital
libraries”. Information Research 9(4) paper 192, Jul 2004.
http://InformationR.net/ir/9-4/paper192.html
The 'typos' aspect of the matching is easier; the usual measure is
the Levenshtein (edit) distance:
http://en.wikipedia.org/wiki/Approximate_string_matching
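For the typo case, a minimal Python sketch of the Levenshtein distance
is below, together with a small helper that lists candidate pairs for
manual review. The helper name suggest_matches and its max_distance
parameter are made up for illustration, and comparing every pair is
quadratic, so this is only practical as-is for modest lists.

```python
from itertools import combinations

def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn s into t (row-by-row dynamic
    programming)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def suggest_matches(names, max_distance=2):
    """Return (distance, a, b) for each pair of distinct names whose
    case-insensitive edit distance is at most max_distance, closest
    pairs first. The caller decides which suggestions to accept."""
    unique = sorted(set(names))
    pairs = []
    for a, b in combinations(unique, 2):
        d = levenshtein(a.lower(), b.lower())
        if d <= max_distance:
            pairs.append((d, a, b))
    return sorted(pairs)

if __name__ == "__main__":
    data = ["John Smith", "Jhon Smith", "Jane Doe", "Merrill Lynch",
            "Merrill Lynch Fenner Smith"]
    for d, a, b in suggest_matches(data):
        print(d, a, "|", b)
```

Two caveats. Plain Levenshtein counts a swapped pair of letters as two
edits, so "Jhon Smith" vs. "John Smith" is at distance 2, not 1. And
edit distance does not help with cases like "Merrill Lynch" vs.
"Merrill Lynch Fenner Smith", which are 13 edits apart; those need some
other rule, such as prefix or token overlap.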
It's implemented in many languages; I don't know of a GUI version,
but perhaps the keywords will help you search. DDupe might also
help, especially if you are working with network data, although I
haven't used it.
http://www.cs.umd.edu/projects/linqs/ddupe/
Let us know if you find a better tool.
--J
On Oct 7, 2009, at 23:32, Andrew Von Nordenflycht wrote:
> I have several large datasets containing names of companies and
> individual people. The companies or people can and do appear multiple
> times (e.g., in different years) and I want to link all instances of
> the same name. This is easy when the match is exact.
>
> However, for a variety of reasons, such as typos or 'nicknames', there
> are also many "close" matches - where the text does not match exactly
> but is very likely to refer to the same entity (e.g., "Jhon Smith" vs.
> "John Smith" or "Merrill Lynch" vs. "Merrill Lynch Fenner Smith").
>
> My goal is to identify these close matches in a systematic way without
> manually going over the data. I presume the main function of such a
> program or algorithm would be to identify "all but 1 character"
> matches, and then "all but 2 character matches", etc. Preferably the
> program would suggest close matches and let me decide if they are
> matched.
>
> Any ideas on useful software for this task would be appreciated.
>
> Andrew von Nordenflycht
> Assistant Professor, Strategy
> Simon Fraser University
> vonetc@sfu.ca
>
> View my research on my SSRN Author page:
> <http://ssrn.com/author=100363>