Ponto Help

Ponto is a tool intended for scoring the spellings of young learners.

Trials Data Format

A typical way to use the tool is to first prepare one’s data in a spreadsheet program, copy and paste it into the Ponto tool, select the type of analysis one wants, and click the Score! button. A server on the Web will do the scoring and then return a new table like the one you submitted, but with additional columns scoring each of the spellings. It also returns a table that summarizes the statistics for each speller. These tables can be selected and pasted into a spreadsheet.

Ponto expects the table of data to be arranged by trials. That is, each row except for the column headers contains a stimulus (the word presented to the subject) and a response (the subject’s attempted spelling). It typically will also include a cell identifying the subject. Thus this is a typical table format:

    subject  stimulus  response
    1        haʊs      hous
    1        fɑrm      farm
    2        haʊs      house
    2        fɑrm      farm
    3        haʊs      howse
    3        fɑrm      frm

The three column headers must be included exactly as shown here, always in lowercase. The columns can be in any order. You may include other columns in your data; they will be ignored by the program, but echoed back to you in the results. These rules provide for some flexibility, but also a little trap for the unwary: if you label the subject column something other than subject, perhaps sujet, everything will look fine except that the summary table will not break things down by subject: It will just have a single row labelled ALL.

Except for the requirement that the first row must be column headers, the subsequent rows can come in any order. Thus one may choose to arrange the above information like this:

    subject  stimulus  response
    1        haʊs      hous
    2        haʊs      house
    3        haʊs      howse
    1        fɑrm      farm
    2        fɑrm      farm
    3        fɑrm      frm

and still get a summary by subject.

You may omit a response, in which case it will be treated as a response with no letters – probably getting a big error score. However, omitting the stimulus cell in a row simply means there was no trial; Ponto will just return --- as the score, and will not include a big error score in the summary statistics for the subject. You can also achieve the same effect by typing --- (3 dashes) as the stimulus or the response.
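The rules above can be sketched in a few lines of Python. This is only an illustration of the table conventions, not Ponto's actual code: the function name read_trials and the scored flag are hypothetical, and the sketch assumes a comma-separated table.

```python
NO_TRIAL = "---"  # the three-dash marker described above


def read_trials(text, sep=","):
    """Parse a Trials table into a list of dicts keyed by column header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    headers = [h.strip() for h in lines[0].split(sep)]
    trials = []
    for line in lines[1:]:
        cells = [c.strip() for c in line.split(sep)]
        cells += [""] * (len(headers) - len(cells))  # pad short rows
        row = dict(zip(headers, cells))  # extra columns are simply kept
        # Without a column named exactly 'subject', everything is ALL.
        row.setdefault("subject", "ALL")
        stimulus = row.get("stimulus", "")
        response = row.get("response", "")
        # An empty or '---' stimulus, or a '---' response, means no trial.
        # An empty response is still a trial (a response with no letters).
        row["scored"] = stimulus not in ("", NO_TRIAL) and response != NO_TRIAL
        trials.append(row)
    return trials
```

Because cells are looked up by header name, the columns and the rows can come in any order, and a misspelt header such as sujet silently produces a single ALL subject.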

Perhaps counterintuitively, it is possible to have two or more stimuli for a given trial. Multiple stimuli are separated by the vertical bar character |. For example, a possible entry might be:

    subject  stimulus  response
    1        tri|tʃri  cre

One way to think of multiple stimuli is that the entry in the stimulus column is an indirect representation of the actual stimulus, which the participant may interpret in different ways. For example, the experimenter’s pronunciation is a sequence of phones, such as [t̠ʰɹi], but our unit of analysis is phonemes. Some speakers may interpret this as /tri/, others as /tʃri/. Another reason to use multiple stimuli is that different experimenters may pronounce a word slightly differently, or the participants may reinterpret a stimulus in terms of their own pronunciation.
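As a minimal sketch of how a multi-stimulus cell might be handled, the Python below splits the cell on | and keeps the alternative with the lowest distance. That the best-scoring alternative is the one that counts is an assumption made for illustration; the function name and the distance argument (any function that scores one stimulus against one response) are hypothetical.

```python
def best_over_alternatives(stimulus_cell, response, distance):
    """Score a response against each |-separated stimulus; keep the best.

    Returns a (distance, stimulus) pair for the lowest-scoring alternative.
    """
    alternatives = [s.strip() for s in stimulus_cell.split("|")]
    scored = [(distance(s, response), s) for s in alternatives]
    return min(scored)
```

A cell with no | simply yields one alternative, so the same function works for ordinary single-stimulus trials.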

All the examples so far have shown the stimulus as a pronunciation transcription. In other analyses, it may make sense to have the stimulus be a target spelling. For example:

    subject  stimulus  response
    1        tree      cre

Of course it is unlikely that the actual stimulus in a spelling experiment will be a display of the correct answer. But it may make sense to represent the stimulus as the orthographic spelling in Ponto if your model of analysis is that the participant is reaching for a particular spelling, and you wish to score her response in terms of how different it is from that target.

Your table of trial data is entered into the text box labelled Trials, in any of several ways. Note that a possible header field is already entered for you, but in most cases you will want to delete the contents of the box before you enter your own data (e.g., click in the box, Ctrl-a, Delete).

  1. You may type directly into the text box. In such a case, the least error-prone way is to separate columns by putting a comma between each cell in the row:
    subject,stimulus,response
    1,haʊs,hous
    1,fɑrm,farm
    2,haʊs,house
    2,fɑrm,farm
    3,haʊs,howse
    3,fɑrm,frm
    
    Instead of a comma you can use a tab or a space as a separator, but those have the disadvantage that it is very hard to tell when you have two or more of them in a sequence. Recall that a cell can be empty, so sometimes separators will be adjacent to each other. Tabs have the additional disadvantage that most browsers are set up so that hitting the Tab key is a special shortcut for taking you out of the Trials box and into the Correspondences box. Whichever separator you use, you must use the same one throughout the whole table.
  2. You may also copy and paste your text from a plaintext editor. For real data that will be much more convenient than typing afresh each time you use Ponto. Commas are still recommended for plaintext editing. Transfer the data into Ponto by selecting the table, copying (Ctrl-c), clicking the mouse in the Trials box, and pasting (Ctrl-v).
  3. A document editor such as Open Office Writer or Microsoft Word provides structured table objects that can typically be selected, copied, and pasted into the Trials box. Typically, this will result in columns that are separated by tabs, which is fine for Ponto. One thing to keep in mind is that document editors provide all sorts of special formatting, which will not be carried over when you copy and paste into the form. There is no point in trying to make your headers bold, or your spellings italic, although there is no harm in doing so. There is also no point in trying to distinguish between fonts.
  4. For large projects, you are likely to use a spreadsheet, such as Open Office Calc, or Microsoft Excel. These too can be copied and pasted from, typically resulting in a tab-separated table.

Spaces are ignored at the beginning or end of a cell.

If you wish, you may include quotation marks around your items, e.g., 'hous' or "hous". This is provided mostly as a convenience for those who save CSV files from spreadsheets that add quotation marks automatically. But you may also like the idea of representing empty content, such as null responses, as "" instead of just leaving invisible space. Ponto does not do anything with quotation marks but ignore them. In particular, if you use a comma or space as a separator character, Ponto will not treat them as data just because they appear inside quotes.
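The cell-level rules (one separator throughout, spaces trimmed, quotation marks ignored) might be sketched as follows. The function name split_row is hypothetical, and dropping every quotation mark wherever it occurs is one reading of "ignore them", not a confirmed detail of Ponto.

```python
def split_row(line, sep=","):
    """Split one table row into cells, the way the help text describes.

    The separator always splits, even inside quotes; quotation marks are
    then discarded and surrounding spaces trimmed.
    """
    cells = []
    for raw in line.split(sep):
        cell = raw.replace('"', "").replace("'", "")
        cells.append(cell.strip())
    return cells
```

Note how this differs from a standard CSV reader: in '3,"a,b",x' the quoted comma still separates two cells.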

The Ponto tool operates with Unicode UTF-8 encoding. If you are running a good, standard, modern computer, everything should work fine and you won’t need to wonder what UTF-8 means. If the characters change to some other character when you copy and paste between programs, then it is likely that one of your programs does not understand Unicode, or that you are using fonts in some nonstandard way. In particular, some old schemes for typing the International Phonetic Alphabet involved using special fonts that might, for example, display the shape ʃ when they see the code for ordinary S. If you use such a scheme, pasting your text into the Ponto tool will result in your seeing the letter for what it really is, an S.

Correspondences

Ponto uses a simple model of spelling: it is context-free. For example, it can be told that /s/ is spelt <s>, <c>, or <ss>, but it cannot be told to accept <ss> only after vowel letters.

As discussed a little later below, there are several predefined correspondence sets you can use. But if you wish to define your own, or to make modifications to existing schemes, you can enter the rules into the Correspondences text box. Some examples:

    stimulus  response  penalty
    s         s         0
    s         c         0
    s         ss        0
    s         cc        0.2
    ə                   0.5
    i0
              -         0.1

The first three rows of this incomplete table illustrate how to enter the rule just discussed. Each row gives one way of spelling /s/. The heading stimulus actually means ‘unit of correspondence in the entries in the stimulus column of the trials’, and response refers to the units of correspondence in the response column of the trials. Thus if your stimuli are pronunciations, the units in the stimulus column here will be sounds; if your stimuli are represented in terms of target spellings, the stimulus column here will give letters.

The penalty cell tells how big an error it would be to use the spelling in the response column to spell the unit in the stimulus column. The intention is that a correct spelling should receive the penalty 0, and that errors would get positive numbers. The table above illustrates that <s>, <c>, or <ss> are all considered perfectly fine spellings for /s/, but that the spelling <cc>, though close and credible, should be counted slightly incorrect. But the only firm requirement is that the entries be numbers: all digits, with possibly one decimal point. Ponto will score subjects’ spellings by adding up the penalty scores for all the correspondence rules needed to align the stimulus with the spelling. It is perfectly normal to specify only correct correspondences, and thus to give all of the correspondence rules the penalty 0, and let Ponto assign default penalties (see below) for alignments that are not specified explicitly.

It is possible to leave out the response unit from a row, in which case the correspondence tells how big a penalty to assign if the subject does not spell the stimulus unit. Most users will prefer to just use the Deletion box below the Correspondences box to indicate a uniform penalty to assign to deletions, but here one may enter specific penalties for specific deletions. The table above gives the example of assigning a penalty to not spelling a schwa; one might reasonably want to count its omission as less important than the omission of other sounds. Or, if the stimuli are expressed as target spellings, one might imagine wanting to give a smaller penalty to leaving out nonalphabetic characters such as hyphens.

Conversely, it is possible to omit a stimulus unit and give only a response unit. That tells the penalty for inserting letters into the spelling which do not spell anything in the input. A uniform penalty for insertions is given in the insertion box, but these entries override that box. The table above gives an example of assigning only a tiny penalty for putting an illegal hyphen in a word.

Unlike the Trials table, there is no role for the | separator: You cannot, for example, specify multiple stimulus units or multiple response units in the same row. Just enter multiple rows, even if some of them have the same stimulus or the same response as another row.

The rules for formatting the Correspondences table are much the same as for Trials. Columns can be separated by comma, tab, or space.
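One convenient way to think about a Correspondences table is as a lookup from (stimulus unit, response unit) pairs to penalties, with an empty string standing for a missing unit. The Python sketch below builds such a lookup from comma-separated text; the function name and the dictionary representation are illustrative assumptions, not a description of Ponto's internals.

```python
def read_correspondences(text, sep=","):
    """Parse a Correspondences table into {(stimulus, response): penalty}.

    An empty stimulus marks an insertion rule; an empty response marks
    a deletion rule.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    headers = [h.strip() for h in lines[0].split(sep)]
    rules = {}
    for line in lines[1:]:
        cells = [c.strip() for c in line.split(sep)]
        row = dict(zip(headers, cells))
        rules[(row["stimulus"], row["response"])] = float(row["penalty"])
    return rules
```

For example, the row "ə,,0.5" becomes the deletion rule ("ə", "") with penalty 0.5, and ",-,0.1" becomes the insertion rule ("", "-") with penalty 0.1.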

Sometimes it is best to use somebody else’s set of correspondences, not only to save time but also to ensure some standardization across studies. You can, of course, achieve this by exchanging files containing the correspondences and copying and pasting their contents. Ponto also has several correspondences built in, which you can use simply by selecting a pulldown option. These are listed underneath the Correspondences text box. Hover your mouse over their names to get a very brief description as a tool tip; click on their names to select them. Currently available sets are:

  1. Identity. Under this scheme, every single character is treated as an acceptable (penalty 0) response for itself. This is mostly useful for direct comparison of the child’s spelling to the target spelling, where it constitutes a good, generic, gradient scoring method that can work for all languages.
  2. Čestina non-exhaustive for Czech.
  3. AMPR GA: the original AMPR scheme, for rhotic English.
  4. AMPR GA SAMPA: very similar to AMPR GA, but stimulus is to be written in SAMPA.
  5. AMPR RP: very similar to AMPR GA, but for nonrhotic English.
  6. Grade 1 Monosyllables: exhaustive parse based on US Grade 1 monosyllables.
  7. Heather: Non-exhaustive parse based on US K–12 words.
  8. Standard Northern English pronunciation, exhaustive list based on English Grade 1 words from CPWD.
  9. Non-exhaustive RP: Non-exhaustive parse for non-rhotic English.
  10. Español Non-exhaustive for Spanish.
  11. Français Non-exhaustive for French.
  12. Tatiana for General Brazilian Portuguese.
  13. Slovenčina Non-exhaustive for Slovak.

Some issues to keep in mind when using correspondence sets or defining your own:

  1. Many correspondence sets are designed for non-exhaustive parses. That means that they will often just look for the main letters that represent sounds, and ignore auxiliary letters and silent letters. For example, in scoring the English word though /ðo/, a typical non-exhaustive parse might give full credit for <t> and <o>, and not even ask whether <t> is followed by <h> or whether <o> is followed by <ugh>. Therefore, for such schemes, the Insertion Penalty should be set to 0, because one does not want to punish the speller for writing <h> after <t>. Of course, that also means that the speller will not be punished for other, less sensible insertions.
  2. You will normally want to represent correspondences at a fairly fine level, leveraging the generality of alphabetic writing systems. For example, while one could give a separate entry for every syllable in a language, it is more efficient and satisfying to give one for every phoneme.
  3. Nevertheless even alphabetic writing systems often deviate from the one-to-one ideal. One should not hesitate to enter digraphs or trigraphs as spelling units. Somewhat less commonly, there will be cases where a spelling unit inexorably corresponds to a sequence of phonemes, such as /ks/=<x> in Latin.
  4. When you define such a correspondence with multiple letters or phones, Ponto infers that those constitute a potential spelling unit, and may use them as units when making matches not specified in the table. Imagine you have used <ch> in a correspondence (such as /tʃ/=<ch>), and that a child spells milk as “milch”. Since Ponto now realizes that <ch> is a spelling unit, it will potentially treat the <ch> here as a single substitution for the <k>. If <ch> were not a unit, it would have to do something like treat the <c> as a substitution for the expected <k> and treat the <h> as an extraneous insertion.
  5. Ponto pays attention to every detail of the spelling and pronunciation, sometimes more so than you may want. It treats uppercase letters differently from lowercase ones. So if you actually want Ponto to take off points for using case incorrectly, you will probably need to have separate correspondences for uppercase and lowercase, with different penalties. Perhaps more typically you will want to ignore case. You can do that by having separate correspondences for uppercase and lowercase, but giving them the same penalty. A somewhat simpler solution is to write all your correspondences in the same case (lowercase is our official recommendation) and then make sure all your Trials give the data in that case.
  6. Diacritics or accent marks are treated by Ponto as separate characters. E.g., <é> is <e> followed by a <´>. In a typical configuration, spelling été as ete would therefore be treated as two deletions. In many experiments, you may simply want to ignore diacritics, which can be done by removing them from your data before pasting them into the Trials box.
  7. The Require correct sequence checkbox, described later, has a big influence on what the correspondence set needs to include. If that box is checked, then Ponto tries to do an exhaustive alignment of stimulus and response. That means that you will need to consider issues such as silent letters. One may think to give a rule such as “,e,0”, that is, assigning no penalty when the subject adds an <e> that does not correspond to anything in the pronunciation. Note, however, that while that will work fine for a child who spells /mek/ as “make”, it will also fail to take off for spelling /kæt/ as “eeceaet”. There is no perfect solution to this problem in a context-free rule set. One intermediate solution is to treat silent letters as parts of digraphs, entering rules such as “k,ke,0”, i.e., /k/ can be spelt <ke>. Of course, that increases the size of the rule set and still allows many unreasonable spellings such as “keat” for /kæt/.
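For the case and diacritic issues in points 5 and 6 above, the data can be cleaned before it is pasted into the Trials box. The Python sketch below is one possible way to do that with the standard unicodedata module; the function names are hypothetical, and treating Ponto's "separate characters" as Unicode canonical decomposition (NFD) is an assumption.

```python
import unicodedata


def decompose(text):
    """Split accented letters into base letter plus combining mark (NFD)."""
    return unicodedata.normalize("NFD", text)


def strip_diacritics(text):
    """Drop combining marks, so that 'été' and 'ete' compare equal."""
    return "".join(ch for ch in decompose(text)
                   if not unicodedata.combining(ch))


def prepare_spelling(text, ignore_case=True, ignore_diacritics=True):
    """Clean one spelling before it goes into the Trials table."""
    if ignore_diacritics:
        text = strip_diacritics(text)
    if ignore_case:
        text = text.lower()
    return text
```

This is meant for spellings only. Applying it to phonetic transcriptions would also remove any combining diacritics used in the transcription, which is usually not what one wants.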

Ponto will use all correspondences in all the named sets you check as well as those you include in the Correspondences box. If a row in the Correspondences box has the same stimulus unit and the same response unit as a row in one of the named sets, the penalty specified in the Correspondences box prevails. This is one way of making small alterations to a named set.

The boxes under Score correspondences not found in above tables specify how Ponto will score correspondences not explicitly specified elsewhere. They start up with some useful defaults. Insertion tells how much of a penalty to give when the child inserts a spelling unit that doesn’t correspond to any unit of the stimulus; the default is 1. Deletion is the opposite: the penalty for not spelling a unit of the stimulus; its default is also 1. Substitution is the penalty for spelling a stimulus unit with a response unit other than those specified in the correspondence sets. Its default is 1.4, because of Pythagoras.
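The precedence just described (named sets, then the Correspondences box, then the default penalties) can be sketched in Python as a pair of small functions. The names and the dictionary representation are hypothetical; how Ponto resolves a conflict between two named sets is not stated here, so the later-set-wins behaviour below is an assumption.

```python
DEFAULTS = {"insertion": 1.0, "deletion": 1.0, "substitution": 1.4}


def merge_rules(named_sets, box_rules):
    """Combine rule dicts; entries from the Correspondences box prevail."""
    merged = {}
    for rules in named_sets:  # assumption: later named sets win conflicts
        merged.update(rules)
    merged.update(box_rules)
    return merged


def penalty(rules, stimulus_unit, response_unit, defaults=DEFAULTS):
    """Look up an explicit penalty, falling back to the default boxes."""
    key = (stimulus_unit, response_unit)
    if key in rules:
        return rules[key]
    if stimulus_unit == "":
        return defaults["insertion"]
    if response_unit == "":
        return defaults["deletion"]
    return defaults["substitution"]
```

Keys are (stimulus unit, response unit) pairs, with an empty string for the missing side of an insertion or deletion.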

Scoring

The checkbox Require correct sequence switches between two rather different ways of scoring.

When it is not checked, Ponto performs a very lenient scoring. It looks at each unit in the stimulus, then tries to find in the response the best representation of that unit; that is, the correspondence with the lowest penalty. It does not care about order, or even about reusing a spelling that was matched with a prior stimulus unit. Any extraneous letters are ignored, i.e., given a penalty of 0, regardless of the setting of the Insertion box. For example, if the correspondences include /æ/=<a>, /k/=<c> and /t/=<t>, then “attacks” as a spelling of /kæt/ would get an error score of 0. Some correspondence sets are based on the assumption that this box will be unchecked.
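A minimal Python sketch of this lenient mode follows. It simplifies in two ways that are assumptions rather than documented behaviour: every stimulus unit is a single character, and a unit with no listed spelling anywhere in the response is charged the deletion penalty. The function name is hypothetical.

```python
def lenient_score(stimulus, response, rules, deletion=1.0):
    """Order-free scoring: each stimulus unit takes its cheapest spelling
    found anywhere in the response; leftover letters cost nothing.

    rules maps (stimulus_unit, response_unit) pairs to penalties.
    """
    total = 0.0
    for unit in stimulus:
        # Penalties of every listed spelling of this unit that occurs
        # somewhere in the response (position and reuse do not matter).
        found = [pen for (stim, resp), pen in rules.items()
                 if stim == unit and resp and resp in response]
        # Otherwise the unit counts as unspelt.
        fallback = rules.get((unit, ""), deletion)
        total += min(found + [fallback])
    return total
```

With the three correspondences from the example, lenient_score("kæt", "attacks", rules) is 0, whereas "at" scores 1 because nothing in it spells /k/.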

When the box is checked, Ponto attempts a strict sequential match. It tries every possible alignment of stimulus units with response units, then selects the alignment that gives the best score (lowest cumulative penalty). It does not allow alignment lines to cross: spellings out of sequential order are considered to be insertions. Thus “attacks” for /kæt/ would likely get credit for the initial <a> and one of the two following <t>s, but not for the following <c> or <k>, because those come out of sequence. Thus the overall error score would be fairly high, with the “acks” tail all being considered insertions. Of course, the precise details depend on the correspondence set. This sort of alignment is essentially a Levenshtein metric, except that it is not symmetrical: switching the stimulus and the response will usually give a different score.
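The strict mode can be approximated by a weighted edit distance computed with dynamic programming. The Python sketch below is an illustration under stated assumptions, not Ponto's implementation: single characters are always units, multi-character units are taken from the rule table, and the three default penalties apply to anything unlisted. The function name and argument layout are hypothetical.

```python
def strict_score(stimulus, response, rules,
                 insertion=1.0, deletion=1.0, substitution=1.4):
    """Lowest total penalty over all non-crossing alignments.

    rules maps (stimulus_unit, response_unit) pairs to penalties.
    """
    stim_units = {s for s, _ in rules if s}
    resp_units = {r for _, r in rules if r}

    def units_ending_at(text, end, known):
        # Single characters are always units; longer ones must be known.
        for start in range(end - 1, -1, -1):
            unit = text[start:end]
            if len(unit) == 1 or unit in known:
                yield start, unit

    def pen(s, r):
        if (s, r) in rules:
            return rules[(s, r)]
        if not s:
            return insertion
        if not r:
            return deletion
        return substitution

    n, m = len(stimulus), len(response)
    inf = float("inf")
    # dist[i][j]: best score aligning stimulus[:i] with response[:j].
    dist = [[inf] * (m + 1) for _ in range(n + 1)]
    dist[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            best = dist[i][j]
            for si, s in units_ending_at(stimulus, i, stim_units):
                best = min(best, dist[si][j] + pen(s, ""))  # deletion
                for rj, r in units_ending_at(response, j, resp_units):
                    best = min(best, dist[si][rj] + pen(s, r))  # match
            for rj, r in units_ending_at(response, j, resp_units):
                best = min(best, dist[i][rj] + pen("", r))  # insertion
            dist[i][j] = best
    return dist[n][m]
```

Under these assumptions, “attacks” for /kæt/ with the correspondences /æ/=<a>, /k/=<c>, /t/=<t> scores 6: one deletion for the unspelt /k/ and five insertions. The asymmetry mentioned above arises because penalties are looked up by (stimulus, response) pairs, so swapping the two arguments consults different rules.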

The checkbox Run Monte Carlo significance test asks for additional information to be returned with the results, telling how much better each subject did than would be expected by chance. Chance levels are interpreted as the score a speller would get if they were just generating spellings without any regard for the rules. This is determined by randomly rearranging the pairing between the stimuli and the response 1,000 times, each time measuring the total error rate. If the child was actually using at least some of the rules some of the time, and paying attention to what word he was supposed to be spelling, his actual total score (distSum, for sum of distances) should be less than the average total scores across those 1,000 random rearrangements (distRand). This number will be reported as improve, the proportional improvement over chance levels. Additionally, Ponto will compute a p value: the proportion of the rearrangements whose score was better than the observed distSum. This may be interpreted as the probability of seeing a score at least this good if the null hypothesis were true: namely, if any improvement over chance were in fact coincidental. A low p value means that the child is probably paying attention (perhaps implicitly) to the correspondences and the stimuli. Naturally, it takes much longer for Ponto to return results if the Monte Carlo option is checked, so that box should be left unchecked when this information is not needed.
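The procedure can be sketched in Python as a permutation test over one subject's trials. The function name, the distance argument (any function scoring one stimulus against one response), and the fixed random seed are hypothetical, and the formula used for improve, (distRand - distSum) / distRand, is an assumed reading of "proportional improvement".

```python
import random


def monte_carlo(stimuli, responses, distance, n_iter=1000, seed=0):
    """Permutation test for one subject's trials."""
    rng = random.Random(seed)
    dist_sum = sum(distance(s, r) for s, r in zip(stimuli, responses))
    shuffled = list(responses)
    rand_sums = []
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # break the stimulus-response pairing
        rand_sums.append(sum(distance(s, r)
                             for s, r in zip(stimuli, shuffled)))
    dist_rand = sum(rand_sums) / n_iter
    improve = (dist_rand - dist_sum) / dist_rand if dist_rand else 0.0
    # Proportion of rearrangements that beat the observed score.
    p_rand = sum(x < dist_sum for x in rand_sums) / n_iter
    return {"distSum": dist_sum, "distRand": dist_rand,
            "improve": improve, "pRand": p_rand}
```

The cost is n_iter full rescoring passes per subject, which is why the option slows the results down so noticeably.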

The button labelled Score! sends the information to the server, performs the analysis, and returns the results.

Results

Results are returned as a new page that replaces your original query page, although you should be able to use the browser’s Go Back command to get your original query page back. The results page begins with a Parameters section that basically repeats the options you specified on the query page; the idea is that if you save this page as a file, you will have enough context to understand the options that can affect the results. In the Correspondences subsection, all correspondences are listed, including the ones you had brought in from named correspondence sets. Though it can be tedious to scroll past this information, occasionally it can be vitally important.

The meat of the results page is the next two sections. The Scores section basically repeats the data you submitted in the Trials section, but adds two new columns at the end of the table: distance and align. The distance is the error: how far away the child’s spelling is from correct. This is the sum of all penalties incurred when attempting to align the stimulus with the response: explicit correspondence penalties as well as default penalties for alignments not explicitly licensed. There are often multiple ways to align the stimulus with the response; Ponto always picks the best alignment, the one that gives the child the most credit, that is, the lowest distance score. The align column shows the alignment that gives that lowest score; if more than one alignment has that same score, one of those alignments is selected arbitrarily. An example of the format is [c=p*][a=a][=s*][t=t][=a*], which is a plausible alignment when one spells “pasta” for cat. Each correspondence is enclosed in square brackets, with the stimulus unit preceding the response unit, separated by an = sign. For an insertion, no stimulus unit appears; for a deletion, no response unit appears. Whenever a correspondence has a penalty greater than 0, it is followed by an asterisk. The alignments give a wealth of information that can be examined to do detailed error analyses. But even when such detailed analysis is not required, it is a good idea to examine a representative sample of alignments, in order to fully understand how the scoring process works under a given correspondence set.
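For detailed error analyses, the align strings can be taken apart mechanically. The Python sketch below parses the bracketed format described above with a regular expression; the function name and the returned field names are hypothetical, and the pattern assumes that units never contain ], = or * themselves.

```python
import re

# One bracketed correspondence: [stimulus=response] with optional '*'.
ALIGN_RE = re.compile(r"\[(?P<stim>[^=\]]*)=(?P<resp>[^\]*]*)(?P<star>\*?)\]")


def parse_alignment(align):
    """Split an align cell into a list of correspondence records."""
    steps = []
    for m in ALIGN_RE.finditer(align):
        stim, resp = m.group("stim"), m.group("resp")
        if not stim:
            kind = "insertion"
        elif not resp:
            kind = "deletion"
        else:
            kind = "match"
        steps.append({"stimulus": stim, "response": resp, "kind": kind,
                      "penalized": m.group("star") == "*"})
    return steps
```

Applied to [c=p*][a=a][=s*][t=t][=a*], this yields five records: a penalized match of c with p, an unpenalized a, an inserted s, an unpenalized t, and an inserted a.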

The Summaries section aggregates the information. If the Trials data contains a subject column, then the aggregation is by subject, giving one row of information for each subject; if not, the section just contains a single row for the entire run. In all cases, the Summaries table contains a distSum column, which sums all the distance scores for the subject; nTrials, which tells how many trials the subject had; and distMean, which divides distSum by nTrials to give the subject’s average error score. If Monte Carlo tests were asked for, there are 3 additional columns: distRand, improve, and pRand, as described earlier: The distRand column is the average score after random rearrangement; improve is the proportional improvement of distSum over distRand; and pRand is the p value of the hypothesis that improve is due to chance.
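The basic aggregation is easy to reproduce from the Scores table, for instance to double-check a summary after excluding some trials. The Python sketch below is illustrative only: the function name and row representation are hypothetical, and it assumes that rows scored --- are left out of nTrials as well as distSum.

```python
from collections import defaultdict


def summarize(scored_rows):
    """Aggregate distance scores by subject into distSum/nTrials/distMean."""
    totals = defaultdict(lambda: [0.0, 0])  # subject -> [sum, count]
    for row in scored_rows:
        if row["distance"] == "---":  # no trial: skip entirely
            continue
        entry = totals[row.get("subject", "ALL")]
        entry[0] += float(row["distance"])
        entry[1] += 1
    return {subject: {"distSum": total, "nTrials": count,
                      "distMean": total / count}
            for subject, (total, count) in totals.items()}
```

Rows without a subject key all fall under ALL, mirroring the single-row summary described above.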

The Results tables are returned as preformatted HTML text, not as HTML tables. This means that they may not be lined up very nicely, but the browser will display them fast and will not be put off by tables that contain hundreds or thousands of rows. If you wish, you can select a table and paste it into your favourite software, such as a spreadsheet. If your first attempt does not give favourable results, you may find that there is a command such as paste special that lets you get fine-grained control over how the data is entered. The browser presents the tables as tab-separated tables of Unicode text.

Please keep in mind that the Web is not inherently secure: eavesdroppers can potentially intercept and read your Trials data. Subject-identifying information should be omitted. If this is not possible, Ponto should be run only behind a firewall on a local network, or even on one’s own computer; or it should be run under HTTPS.


Last change 2009-11-21T00:49:05-0600