Detailed Methodology

Note: this page describes the "given name" collection, though the surnames and place names are handled in a very similar fashion.

On this page I'd like to explain some of the inner workings of this site. I doubt it will interest anyone, but at the very least it will help keep things straight in my mind!

  A. Data collections
    i. Names
    ii. Popularities
    iii. Namesakes
    iv. Name days
  B. Methodology
    i. Adding new names
    ii. Categorizing names

A.i) Names

The name database is the backbone of this website. Below is a description of the main data fields associated with the names.

The name itself is stored in three different ways. First, it is stored in a flat form, with accents, punctuation, and subscripts removed. This is useful for matching user-entered searches. Second, it is stored in an encoded form, with accents, punctuation and subscripts translated. This results in a unique key. Last, it is stored in a display form, with accents, punctuation and subscripts included. This is the form that is displayed in the name's definition.
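As a rough illustration of the flat form, here is a minimal Python sketch. It is not the code this site runs. The function name flat_form, the lowercasing, and the choice to drop every non-letter character (spaces included) are all assumptions made for the example.

```python
import unicodedata


def flat_form(display_name: str) -> str:
    """Hypothetical helper: reduce a display-form name to a flat search form."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", display_name)
    # Keep letters only. Combining marks, punctuation, digits and spaces are
    # not alphabetic, so they are dropped here (an assumption of this sketch).
    return "".join(ch for ch in decomposed if ch.isalpha()).lower()


def matches(user_query: str, stored_display_name: str) -> bool:
    """Compare a user-entered search with a stored name via their flat forms."""
    return flat_form(user_query) == flat_form(stored_display_name)
```

With this sketch, a search for "zoe" would match a name displayed as "Zoë". Letters that have no Unicode decomposition (such as "ø") pass through unchanged, so a real system would need extra rules for them.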

Examples of how some different names are stored

When a name is not natively written in Latin characters, the name is stored under its transcribed spelling. A good transcription is:

Variant transcriptions can also be included in the database.

The gender can be m, f, mf, or fm.

The pronunciation must use symbols found in the pronunciation table. Syllables must be separated by dashes. Each syllable must have one and only one vowel symbol. One and only one syllable must be written in CAPITALS to indicate that it is stressed. This rule should be changed in the future, since some languages place equal stress on all syllables. This system is also inadequate for representing the sounds of tonal languages such as Chinese.
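The three rules above (dash-separated syllables, exactly one vowel symbol per syllable, exactly one stressed syllable) could be checked mechanically. The sketch below is only an illustration: the VOWEL_SYMBOLS list is a made-up stand-in for the real pronunciation table, and the function names are hypothetical.

```python
# Hypothetical vowel symbols. The real set comes from the pronunciation table.
VOWEL_SYMBOLS = ("ee", "oo", "ie", "a", "e", "i", "o", "u")
# Try longer symbols first so that "ee" counts as one vowel, not two.
_LONGEST_FIRST = sorted(VOWEL_SYMBOLS, key=len, reverse=True)


def count_vowel_symbols(syllable: str) -> int:
    """Count vowel symbols in one syllable using greedy longest-match."""
    s = syllable.lower()
    count = 0
    i = 0
    while i < len(s):
        for symbol in _LONGEST_FIRST:
            if s.startswith(symbol, i):
                count += 1
                i += len(symbol)
                break
        else:
            i += 1  # not a vowel symbol, treat as a consonant character
    return count


def check_pronunciation(pronunciation: str) -> list:
    """Return a list of rule violations (empty if the string passes)."""
    problems = []
    syllables = pronunciation.split("-")
    for syllable in syllables:
        if not syllable:
            problems.append("empty syllable")
        elif count_vowel_symbols(syllable) != 1:
            problems.append(
                f"syllable {syllable!r} must contain exactly one vowel symbol"
            )
    stressed = [s for s in syllables if s and s.isupper()]
    if len(stressed) != 1:
        problems.append(
            f"expected exactly one stressed syllable, found {len(stressed)}"
        )
    return problems
```

Under these assumed symbols, "MAY-ree" passes, while "may-ree" (no stressed syllable) and "MAY-REE" (two stressed syllables) are each reported.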

Pronunciation table

The usage is a list of cultures and/or categories to which the name belongs. A name can be assigned any number of usages, which are stored with the name as a coded list. All names should have at least one usage, though there are a small number of problem names in the database which have none. Besides providing information, these usages also determine how the name will be categorized in the database. Optionally, the usages can be marked as (Modern), (Archaic), or (Rare). They can also be marked with one style: (Anglicized) or (Latinized).
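One way to picture a usage and its optional markers is as a small record. This is a sketch under assumptions, not the coded list format the database actually uses: the Usage class, its field names, and the rule that a usage carries at most one frequency marker are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

FREQUENCY_MARKS = {"Modern", "Archaic", "Rare"}
# Only the styles named in the text above are listed here.
STYLE_MARKS = {"Anglicized", "Latinized"}


@dataclass(frozen=True)
class Usage:
    culture: str
    frequency: Optional[str] = None  # assumed: at most one per usage
    style: Optional[str] = None      # the text allows one style

    def __post_init__(self):
        if self.frequency is not None and self.frequency not in FREQUENCY_MARKS:
            raise ValueError(f"unknown frequency marker: {self.frequency!r}")
        if self.style is not None and self.style not in STYLE_MARKS:
            raise ValueError(f"unknown style marker: {self.style!r}")

    def label(self) -> str:
        """Render the usage the way it might be displayed."""
        parts = [self.culture]
        parts += [f"({mark})" for mark in (self.frequency, self.style) if mark]
        return " ".join(parts)


def names_without_usage(names: dict) -> list:
    """List the 'problem names' that have no usage assigned."""
    return sorted(name for name, usages in names.items() if not usages)
```

For example, Usage("Irish", "Rare", "Anglicized").label() gives "Irish (Rare) (Anglicized)", and names_without_usage can be run over the whole collection to find entries that still need a usage.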

Usage table

The transcription is a spelling of the name in non-Latin scripts. To implement transcriptions I use a rather complicated system of codes. The codes are sequences of letters which can be typed on a regular keyboard. This saves me from having to type the Unicode characters and probably prevents some errors. There are plenty of exceptions in the system, which made for some interesting programming.
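To give a feel for how keyboard codes might expand into Unicode characters, here is a minimal sketch. The CODE_TABLE below is invented for illustration (a few Cyrillic letters) and is not the site's real code system. It also ignores the exceptions mentioned above.

```python
# Hypothetical codes: keyboard letter sequences mapped to Cyrillic characters.
CODE_TABLE = {
    "a": "а",
    "e": "е",
    "n": "н",
    "s": "с",
    "sh": "ш",
    "zh": "ж",
}
# Longer codes are tried first so that "sh" is not read as "s" followed by "h".
_CODES_LONGEST_FIRST = sorted(CODE_TABLE, key=len, reverse=True)


def decode(coded: str) -> str:
    """Expand a string of keyboard codes into the target script."""
    out = []
    i = 0
    while i < len(coded):
        for code in _CODES_LONGEST_FIRST:
            if coded.startswith(code, i):
                out.append(CODE_TABLE[code])
                i += len(code)
                break
        else:
            # No code matches here: better to fail loudly than guess.
            raise ValueError(f"no code matches at position {i}: {coded[i:]!r}")
    return "".join(out)
```

With this made-up table, decode("sasha") yields "саша". Raising an error on an unknown sequence is a design choice of the sketch: it surfaces typos in the codes instead of silently producing a wrong spelling.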

Transliteration tables

The description is a block of text about the name's etymology and history. It can also give famous bearers and more detailed information about how the name is used. In name definitions I only allow myself two HTML tags: <i> and <p>. All other formatting is created automatically. The definition can also include transcriptions, links to other names, links to the glossary, and images.
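The two-tag rule is easy to check automatically. The following sketch uses Python's standard html.parser module. The names ALLOWED_TAGS, TagCollector, and disallowed_tags are hypothetical, and this is not necessarily how the site enforces the rule.

```python
from html.parser import HTMLParser

ALLOWED_TAGS = {"i", "p"}


class TagCollector(HTMLParser):
    """Record every tag in a description that is not on the allowed list."""

    def __init__(self):
        super().__init__()
        self.disallowed = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag not in ALLOWED_TAGS:
            self.disallowed.append(tag)

    def handle_endtag(self, tag):
        if tag not in ALLOWED_TAGS:
            self.disallowed.append(tag)


def disallowed_tags(description: str) -> list:
    """Return the sorted, de-duplicated tags that break the two-tag rule."""
    parser = TagCollector()
    parser.feed(description)
    parser.close()
    return sorted(set(parser.disallowed))
```

A description such as "<p>From <i>Maria</i>.</p>" returns an empty list, while one containing <b> or <a> tags returns those tag names.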

A.ii) Popularities

no information yet

A.iii) Namesakes

no information yet

A.iv) Name Days

no information yet

B.i) Adding New Names

The scope of this site is broad. All given names (whether given at birth, given later in life, or self-adopted) from all cultures are eligible to be listed in the database. Beyond this, notable names from fiction, mythology, and popular culture are also eligible. Invented names which have not been used by a real person or a notable fictional person are ineligible.

Despite this inclusive scope, a top-down approach is taken to adding new names. This means that more notable names are added before less notable ones, lest the database become too unbalanced (it is certainly already unbalanced in favour of English names).

B.ii) Categorizing Names

Names are categorized in two ways: by gender and by usage.

Assigning usages is not overly scientific. A usage may have been assigned for any of the following reasons: