Showing posts with label Software - Outwit Hub. Show all posts

Wednesday, 10 April 2013

How I use Outwit Hub for the TILLIN One Name Study

Following my post on how I conduct my One Name Study, I received a tweet from @genejean (aka Pam Smith) asking how I use Outwit Hub, and after a couple more tweets this post was born. (Apologies for the poor-quality photos; I'm not an expert blogger yet!)

If I want to scrape some data, I open the page from within Outwit itself. I like to think of Outwit as a geeky web browser that lets you see what's behind a page as well as what you can normally see. For this example I'm going to search for all instances of WILLIAM TILLIN in the 1841 England census, which keeps it small: 3 records.



Normally, if I wanted to collect the information for each of these records, I would open each individual record as below and type the information from the screen into my database, or at best copy and paste it using something like a Firefox add-on.



But Outwit works slightly differently.

This is the page of data that I want to "scrape", and it's the page my scraper will visit to collect the data. I wrote the scraper myself based on the HTML behind this page, but I won't go into detail about how here as it would make this a really long post. I can explain it in another post if that would help anyone.
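For anyone curious about the general idea behind a scraper, here is a rough Python sketch. It assumes a scraper built from rules that each pick out the text sitting between a "marker before" and a "marker after" in the page's HTML, which is roughly how Outwit Hub's scraper editor presents things. It is not Outwit's actual code, and the field names, markers and sample HTML are made up for illustration rather than copied from Ancestry's pages.

```python
from typing import Dict, Optional, Tuple


def extract_between(html: str, before: str, after: str) -> Optional[str]:
    """Return the text between the first `before` marker and the next `after` marker."""
    start = html.find(before)
    if start == -1:
        return None  # marker not on this page
    start += len(before)
    end = html.find(after, start)
    if end == -1:
        return None
    return html[start:end].strip()


def apply_scraper(html: str, rules: Dict[str, Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """Apply each (marker_before, marker_after) rule and return one row of fields."""
    return {field: extract_between(html, before, after) for field, (before, after) in rules.items()}


# Hypothetical rules for an 1841 census record page.
RULES_1841 = {
    "Name": ("<td>Name:</td><td>", "</td>"),
    "Age": ("<td>Age:</td><td>", "</td>"),
    "Parish": ("<td>Parish:</td><td>", "</td>"),
}

# Made-up sample markup, not a real record.
sample_page = (
    "<table><tr><td>Name:</td><td>William Tillin</td></tr>"
    "<tr><td>Age:</td><td>25</td></tr>"
    "<tr><td>Parish:</td><td>Example Parish</td></tr></table>"
)

print(apply_scraper(sample_page, RULES_1841))
```

The appeal of marker-based rules is that you only need to spot the text that reliably surrounds each value in the HTML; if the site changes its layout, the markers need updating.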

Using the back button in the top left-hand corner (just like other web browsers), I go back to my original search results. Then I click the links button (circled in red below), which gives me a completely different view of the page: a list of the links it contains rather than the page as it normally appears.


Then I select the rows labelled View Record. These are going to act as an address for the scraper so it knows where to go and find the data. Once it gets there it will apply the rules within the scraper to the data it finds and then return that data to the data catch area.
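As a rough illustration of what selecting those rows amounts to (not how Outwit does it internally), this Python sketch uses the standard library's `html.parser` to collect the address behind every link whose text is exactly "View Record". The class name, the sample search-results HTML and the example.com addresses are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ViewRecordLinks(HTMLParser):
    """Collect the full address of every link whose text is exactly 'View Record'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside, if any
        self._text = []  # text fragments seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "".join(self._text).strip() == "View Record":
                # Resolve relative links against the results page address.
                self.links.append(urljoin(self.base_url, self._href))
            self._href = None


# Hypothetical search-results markup; real pages will look different.
results_html = """
<table>
  <tr><td>William Tillin</td><td><a href="/record?id=1">View Record</a></td></tr>
  <tr><td>William Tillin</td><td><a href="/record?id=2">View Record</a></td></tr>
  <tr><td colspan="2"><a href="/help">Search help</a></td></tr>
</table>
"""

parser = ViewRecordLinks("https://search.example.com/results")
parser.feed(results_html)
parser.close()
print(parser.links)
```

The "Search help" link is skipped because its text doesn't match, which mirrors only highlighting the View Record rows.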


So I highlight the rows and tell Outwit to explore them using my 1841 scraper (making sure I don't overload the site by exploring too many records too quickly), and this is the data it brings back in about 10-15 seconds.
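Pacing the requests is the important part of that step. Here is a minimal Python sketch of the idea, assuming you already have a list of record addresses and a scraper function; `explore`, `fetch` and `scrape` are hypothetical names of my own, not Outwit features. Fetching is passed in as a parameter because the record pages sit behind a subscription login, which a real script would have to handle itself.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]


def explore(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    scrape: Callable[[str], Row],
    delay_seconds: float = 5.0,
) -> List[Row]:
    """Visit each record URL in turn, scrape it and collect one row per record.

    `fetch` and `scrape` are supplied by the caller (hypothetical hooks), and the
    loop sleeps between requests so the site isn't hit too quickly.
    """
    rows: List[Row] = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # pause before every request after the first
        row = scrape(fetch(url))
        row["Source URL"] = url  # keep a note of where each row came from
        rows.append(row)
    return rows


# Offline demo: fake pages stand in for real record pages.
fake_pages = {"record-1": "first record page", "record-2": "second record page"}
rows = explore(
    fake_pages,
    fetch=fake_pages.__getitem__,
    scrape=lambda page: {"Details": page},
    delay_seconds=0.1,
)
print(rows)
```

A delay of a few seconds per record is slow for a human watching, but it keeps the load on the site gentle, which matters more than speed for a study built up over months.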

The detailed information for the 3 records from the original search appears at the bottom. The data can then be exported as a CSV or Excel file and added to the database.
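If you were doing the export step yourself in Python, the standard `csv` module covers it. This sketch assumes each scraped record is a dictionary of field names to values (the field names and values shown are made up); opening the output file with `newline=""` follows the `csv` module's documentation.

```python
import csv
import io
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[str]]


def rows_to_csv_text(rows: List[Row], fieldnames: Sequence[str]) -> str:
    """Render rows as CSV text; None values and missing fields become empty cells."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops any fields not listed in fieldnames.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def write_csv(path: str, rows: List[Row], fieldnames: Sequence[str]) -> None:
    """Write rows to a CSV file ready to import into a database."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(rows_to_csv_text(rows, fieldnames))


# Made-up rows in the shape a scraper might return.
scraped = [
    {"Name": "William Tillin", "Age": "25", "Source URL": "record-1"},
    {"Name": "William Tillin", "Age": None, "Source URL": "record-2"},
]
print(rows_to_csv_text(scraped, ["Name", "Age"]))
```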

I hope this makes sense - I've found it exceptionally useful for census data as well as civil registration data. I would not have all the data I have if I'd had to collate the information in the traditional way.

I'd be happy to provide more examples or go into more detail, so please leave any questions in the comments below or ask me on Twitter, where you can find me @Wibblingjo, or on my Google+ page +Wibbling Jo Genealogy.



Tuesday, 9 April 2013

Tech Tuesday - How I collect and analyse data for my One Name Study

I'm always interested in how people actually "do" their One Name Study on a practical level. My study is still in its infancy, so my approach has been quite haphazard.

Recently I sat down and decided on a more structured plan of attack.

I knew that I wanted to collect some data, store it somewhere and then start to group the individuals named in the data into families. Ultimately the goal is to work out whether everyone is related, but that's a long way off.

As my first mission, I decided to collect all references to TILL?N (using ? as a wildcard for a single letter, so TILLIN, TILLAN and so on) and TILLING from the England censuses 1841-1911 and the England and Wales Births, Marriages and Deaths indices. I used Ancestry as the source because it is the website I currently subscribe to; ultimately I'd like to cross-reference with other sources (e.g. FindMyPast) to help catch any transcription errors.
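To make the wildcard concrete: TILL?N matches TILL, then exactly one character, then N. Python's `fnmatch.fnmatchcase` uses the same `?` convention, so here is a small sketch for checking surnames from an exported file against the study's two patterns. `matches_study` is a hypothetical helper, and Ancestry's own wildcard rules may differ in detail, so treat this as an approximation.

```python
from fnmatch import fnmatchcase

# The study's search patterns: "?" stands for exactly one character.
STUDY_PATTERNS = ["TILL?N", "TILLING"]


def matches_study(surname: str) -> bool:
    """True if the surname (in any case) matches one of the study patterns."""
    name = surname.strip().upper()
    return any(fnmatchcase(name, pattern) for pattern in STUDY_PATTERNS)


for surname in ["Tillin", "TILLEN", "Tilling", "Tillings", "Tilln"]:
    print(surname, matches_study(surname))
```

Upper-casing first and then using `fnmatchcase` keeps the comparison case-insensitive without depending on the operating system, which plain `fnmatch.fnmatch` would.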

To collect my data I used Outwit Hub, which scrapes data from web pages and exports it as a CSV or Excel file.

I imported the output files into Custodian, database software written specifically for family history. It generates a Name Index in which the references are consolidated, allowing me to see all the links to a particular name.
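A name index is, at heart, a mapping from a name to every source that mentions it. This Python sketch is a simplified stand-in for that idea, not how Custodian works internally; `build_name_index`, the field names and the source labels are hypothetical. Names are normalised (surname upper-cased, forename title-cased) so the same name transcribed in different cases ends up under one entry.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_name_index(records: List[Dict[str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group source references under a normalised (surname, forename) key."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for rec in records:
        key = (rec["Surname"].strip().upper(), rec["Forename"].strip().title())
        index[key].append(rec["Source"])
    return dict(index)


# Hypothetical rows, as they might look after importing scraped CSV files.
records = [
    {"Surname": "Tillin", "Forename": "Charles", "Source": "1851 census (example)"},
    {"Surname": "TILLIN", "Forename": "charles", "Source": "Births index (example)"},
    {"Surname": "Tilling", "Forename": "Mary", "Source": "1841 census (example)"},
]

for (surname, forename), sources in sorted(build_name_index(records).items()):
    print(f"{surname}, {forename}: {sources}")
```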

I already have some information in Family Historian, so I've begun going through each TILL?N and TILLING individual and allocating a personal and family reference to each record I now hold in Custodian. This is slow going, but it has already shown me where I've missed people and is generating a new research list.

Once I've gone through the individuals I've already got in Family Historian I'll start to try and group the other individuals recorded in Custodian.

I've already spotted some unallocated records.


The picture above shows the name index filtered for everyone in my records with a forename starting with Charles and the surname recorded as Tillin. From here I can start to group records together and spot records I haven't yet allocated to people.
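The filter in that picture can be sketched the same way: keep records whose forename starts with a given prefix and whose surname matches, then list any that have no personal reference yet. This is an illustrative Python sketch with hypothetical field names (`Forename`, `Surname`, `Person Ref`) and made-up rows, not Custodian's own filtering.

```python
from typing import Dict, List


def unallocated(records: List[Dict[str, str]], forename_prefix: str, surname: str) -> List[Dict[str, str]]:
    """Return records matching the name filter that have no personal reference yet."""
    prefix = forename_prefix.lower()
    wanted = surname.lower()
    return [
        rec
        for rec in records
        if rec["Forename"].lower().startswith(prefix)
        and rec["Surname"].lower() == wanted
        and not rec.get("Person Ref")  # missing or empty means not yet allocated
    ]


# Hypothetical index rows; "P12" stands in for whatever reference scheme you use.
records = [
    {"Forename": "Charles", "Surname": "Tillin", "Source": "1851 census (example)", "Person Ref": "P12"},
    {"Forename": "Charles Henry", "Surname": "Tillin", "Source": "1861 census (example)", "Person Ref": ""},
    {"Forename": "Charlotte", "Surname": "Tillin", "Source": "1871 census (example)", "Person Ref": ""},
    {"Forename": "Charles", "Surname": "Tilling", "Source": "1881 census (example)"},
]

for rec in unallocated(records, "Charles", "Tillin"):
    print(rec["Forename"], rec["Source"])
```

Matching on "starts with" rather than an exact forename catches middle names such as "Charles Henry", while Charlotte and the Tilling record fall outside the filter.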

I'm looking forward to starting to fill in some gaps even though this will be a long process.

How do you conduct your One Name Study? Any tips? Do you use any of these programs?