Revised proposal to encode Tamil fractions and symbols
Shriramana Sharma, jamadagni-at-gmail-dot-com, India
2013-Mar-05
§1. Background
This document replaces my previous document L2/12-231, which proposed 62 characters
for encoding in the Tamil block and a new Tamil Supplement block. Now 55 characters are
proposed, as per the recommendations of the prelim review committee L2/13-028 §17 (p 8)
which were later approved at the UTC meeting last month. The changes are:
1. The following characters with identical glyphs were earlier proposed to be disunified
on the basis of a difference in General Category (GC). Since this was found to be
insufficient grounds for disunification, they are now unified:
Glyph  Existing character    Proposed character
ப      0BAA LETTER PA Lo     11FC8 FRACTION ONE-TWENTIETH No
வ      0BB5 LETTER VA Lo     11FCF FRACTION ONE-QUARTER No
ங      0B99 LETTER NGA Lo    11FDA SIGN KURUNI So
த      0BA4 LETTER TA Lo     11FDD SIGN TUUNI So
ள      0BB3 LETTER LLA Lo    11FDE SIGN KALAM So
       - -                   11FD1 FRACTION THREE-QUARTERS No
                             11FD6 SIGN UZHAKKU So
2. The gaps resulting from the unification of these six pairs are fixed and annotations
are added to the unified characters to explain the additional usage.
3. Since it is found desirable to have more evidence for the proposed 0BFB TAMIL
RINGGIT SIGN, it is withdrawn from this proposal and may be separately proposed.
4. In fixing this gap, TAMIL CURRENT SIGN has been moved to 0BDF and TAMIL TRADITIONAL
NUMBER/CREDIT SIGNs have been moved to 0BFB and 0BFC closer to their counterparts
at 0BFA/0BF7. TAMIL PUNCTUATION END OF TEXT has been moved to 11FFF.
5. The informative aliases for TAMIL SPENT SIGN and TAMIL TOTAL SIGN (now at 0BFE and
0BFF) are slightly changed, and the two glyph changes (originally requested in
L2/12-106 §2; see also L2/12-231 §5.3) are effected.
L2/13-047
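The General Category point in item 1 can be checked against any copy of the Unicode Character Database. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the `UNIFIED` mapping and the `describe` helper are hypothetical names introduced here for illustration only. It shows that the five existing letters carry GC Lo and no numeric value, so after unification the fraction and measure usages live in annotations rather than in character properties.

```python
import unicodedata

# Existing Tamil letters that the revised proposal reuses (illustrative mapping,
# taken from the unification table in item 1).
UNIFIED = {
    "\u0BAA": "fraction one-twentieth",
    "\u0BB5": "fraction one-quarter",
    "\u0B99": "sign kuruni",
    "\u0BA4": "sign tuuni",
    "\u0BB3": "sign kalam",
}


def describe(ch):
    """Return (code point, name, General Category, numeric value or None)."""
    return (
        f"{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.numeric(ch, None),  # None when no numeric value is defined
    )


for ch, extra_use in UNIFIED.items():
    code, name, gc, num = describe(ch)
    print(code, name, gc, num, "| additional usage:", extra_use)
```

Because a single code point has exactly one General Category, software that needs the numeric reading of these letters has to supply it from context.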

Note that this revised document provides only the bare minimum technical material for the proposal. Detailed discussion may be found in the previous proposal L2/12-231.

Since the UnicodeData.txt and NamesList.txt updates are non-trivial, I include them as text file attachments to this PDF for the convenience of the editors.

§2. On the font for the code charts

I would also like to repeat my request in L2/12-231 §9.5 (p 62) to use the Lohit Tamil font, which is freely and publicly available under the Open Font Licence, for the official code charts of the Tamil and Tamil Supplement blocks. This font is of good aesthetic quality, and my newly designed glyphs are based on the existing Tamil glyphs in it. While the official homepage of this font is https://fedorahosted.org/lohit/, the download available from there does not yet contain these glyphs, as the characters are not yet encoded. A derivative version available from http://pravins.fedorapeople.org/tamil-fraction-symbol-proposal-fonts/ contains my new glyphs, and these will be added to the upstream font pending Unicode publication of these characters (see https://bugzilla.redhat.com/show_bug.cgi?id=839303).

§3. Unicode Character Properties etc

(These may also be found as attachments to this PDF.)

§3.1. Additions to UnicodeData.txt

0BDF;TAMIL CURRENT SIGN;So;0;ON;;;;;N;;;;;
0BFB;TAMIL TRADITIONAL NUMBER SIGN;So;0;ON;;;;;N;;;;;
0BFC;TAMIL TRADITIONAL CREDIT SIGN;So;0;ON;;;;;N;;;;;
0BFD;TAMIL AND ODD SIGN;So;0;ON;;;;;N;;;;;
0BFE;TAMIL SPENT SIGN;So;0;ON;;;;;N;;;;;
0BFF;TAMIL TOTAL SIGN;So;0;ON;;;;;N;;;;;

11FC0;TAMIL FRACTION ONE THREE-HUNDRED-AND-TWENTIETH;No;0;L;;;;1/320;N;;;;;
11FC1;TAMIL FRACTION ONE ONE-HUNDRED-AND-SIXTIETH;No;0;L;;;;1/160;N;;;;;
11FC2;TAMIL FRACTION ONE EIGHTIETH;No;0;L;;;;1/80;N;;;;;
11FC3;TAMIL FRACTION ONE SIXTY-FOURTH;No;0;L;;;;1/64;N;;;;;
11FC4;TAMIL FRACTION ONE FORTIETH;No;0;L;;;;1/40;N;;;;;
11FC5;TAMIL FRACTION ONE THIRTY-SECOND;No;0;L;;;;1/32;N;;;;;
11FC6;TAMIL FRACTION THREE EIGHTIETHS;No;0;L;;;;3/80;N;;;;;
11FC7;TAMIL FRACTION THREE SIXTY-FOURTHS;No;0;L;;;;3/64;N;;;;;
11FC8;TAMIL FRACTION ONE SIXTEENTH;No;0;L;;;;1/16;N;;;;;
11FC9;TAMIL FRACTION ONE TENTH;No;0;L;;;;1/10;N;;;;;
11FCA;TAMIL FRACTION ONE EIGHTH;No;0;L;;;;1/8;N;;;;;
11FCB;TAMIL FRACTION THREE TWENTIETHS;No;0;L;;;;3/20;N;;;;;
11FCC;TAMIL FRACTION THREE SIXTEENTHS;No;0;L;;;;3/16;N;;;;;
11FCD;TAMIL FRACTION ONE FIFTH;No;0;L;;;;1/5;N;;;;;
11FCE;TAMIL FRACTION ONE HALF;No;0;L;;;;1/2;N;;;;;
11FCF;TAMIL FRACTION THREE QUARTERS;No;0;L;;;;3/4;N;;;;;
11FD0;TAMIL FRACTION DOWNSCALING FACTOR KIIZH;No;0;L;;;;1/320;N;;;;;
11FD1;TAMIL SIGN NEL;So;0;ON;;;;;N;;;;;
11FD2;TAMIL SIGN SUVADU;So;0;ON;;;;;N;;;;;
11FD3;TAMIL SIGN AAZHAAKKU;So;0;ON;;;;;N;;;;;
11FD4;TAMIL SIGN URI;So;0;ON;;;;;N;;;;;
11FD5;TAMIL SIGN MUUVUZHAKKU;So;0;ON;;;;;N;;;;;
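Each record above follows the 15-field, semicolon-separated layout of UnicodeData.txt. As a rough sketch of how a reviewer might sanity-check such records, the Python below splits a line into fields and reads the numeric value as an exact fraction. The field labels in `FIELDS` and the helper `parse_record` are hypothetical names chosen for this illustration; only the field order is taken from the file format.

```python
from fractions import Fraction

# Illustrative labels for the 15 UnicodeData.txt fields, in file order.
FIELDS = (
    "code", "name", "gc", "ccc", "bidi", "decomp", "decimal", "digit",
    "numeric", "mirrored", "old_name", "comment", "upper", "lower", "title",
)

SAMPLE = """\
0BDF;TAMIL CURRENT SIGN;So;0;ON;;;;;N;;;;;
11FC0;TAMIL FRACTION ONE THREE-HUNDRED-AND-TWENTIETH;No;0;L;;;;1/320;N;;;;;
11FCE;TAMIL FRACTION ONE HALF;No;0;L;;;;1/2;N;;;;;
"""


def parse_record(line):
    """Split one UnicodeData-style line into a dict keyed by FIELDS."""
    parts = line.strip().split(";")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    # The numeric field is empty for symbols and a rational such as 1/320 for fractions.
    record["numeric"] = Fraction(record["numeric"]) if record["numeric"] else None
    return record


for line in SAMPLE.splitlines():
    rec = parse_record(line)
    print(rec["code"], rec["gc"], rec["bidi"], rec["numeric"])
```

A check of this kind catches a dropped semicolon quickly, since the field count would no longer be 15.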

§3.2. Changes to NamesList.txt (BMP)

 0BEF TAMIL DIGIT NINE
 @ Tamil numerics
+@+ Tamil fractions are encoded in the Tamil Supplement block starting at 11FC
 0BF0 TAMIL NUMBER TEN
 0BF1 TAMIL NUMBER ONE HUNDRED
 0BF2 TAMIL NUMBER ONE THOUSAND
-@ Tamil symbols
+@ Tamil calendrical symbols
 0BF3 TAMIL DAY SIGN
     = naal
+    = naazhi/padi
+    * denotes a measure of grain that equals 2 uri or 4 uzhakku
+    x (tamil sign uri - 11FD4)
+    = pillaiyaar suzhi
+    * denotes auspiciousness
 0BF4 TAMIL MONTH SIGN
     = maatham
 0BF5 TAMIL YEAR SIGN
     = varudam
+@ Tamil clerical symbols
 0BF6 TAMIL DEBIT SIGN
     = patru
 0BF7 TAMIL CREDIT SIGN
-    = varavu
+    = eduppu
+    * denotes incoming cash which is set aside for unknown expenses
+    * sometimes used as the credit sign
+    * the traditional credit sign is different
+    x (tamil traditional credit sign - 0BFC)
 0BF8 TAMIL AS ABOVE SIGN
     = merpadi
-@ Currency symbol
+@ Tamil currency symbol
 0BF9 TAMIL RUPEE SIGN
     = rupai
-@ Tamil symbol
+@ Additional Tamil symbols
 0BFA TAMIL NUMBER SIGN
+    = niluvai
+    * denotes balance
+    * sometimes used as the number sign
+    * the traditional number sign is different
+    x (tamil traditional number sign - 0BFB)
+0BFB TAMIL TRADITIONAL NUMBER SIGN
     = enn
+    * this is the traditional number sign
+    x (tamil number sign - 0BFA)
+0BFC TAMIL TRADITIONAL CREDIT SIGN
+    = varavu
+    * this is the traditional credit sign
+    x (tamil credit sign - 0BF7)
+0BFD TAMIL AND ODD SIGN
+    = silvaanam/sillarai
+    * not to be confused with the sign for "long-lived"
+    x (tamil sign ciranjiivi - 11FEA)
+0BFE TAMIL SPENT SIGN
+    = poka
+    * not to be confused with the abbreviation for "pillai"
+    x (tamil sign pillai - 11FEB)
+0BFF TAMIL TOTAL SIGN
+    = aaka
+@+ More symbols are encoded in the Tamil Supplement block 11FC0-11FFF starting at 11FD

§3.3. Additions to NamesList.txt (SMP)

@@ 11FC0 Tamil Supplement 11FFF
@ Fractions
11FC0 TAMIL FRACTION ONE THREE-HUNDRED-AND-TWENTIETH
    = mundiri
11FC1 TAMIL FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
    = araikkaani
11FC2 TAMIL FRACTION ONE EIGHTIETH
    = kaani
11FC3 TAMIL FRACTION ONE SIXTY-FOURTH
    = kaalviisam
11FC4 TAMIL FRACTION ONE FORTIETH
    = araimaa
11FC5 TAMIL FRACTION ONE THIRTY-SECOND
    = araiviisam
11FC6 TAMIL FRACTION THREE EIGHTIETHS
    = mukkaani
11FC7 TAMIL FRACTION THREE SIXTY-FOURTHS
    = mukkaalviisam
    * for one twentieth "maa" use 0BAA
    x 0baa tamil letter pa
11FC8 TAMIL FRACTION ONE SIXTEENTH
    = viisam/maakaani
11FC9 TAMIL FRACTION ONE TENTH
    = irumaa
11FCA TAMIL FRACTION ONE EIGHTH
    = araikkaal
11FCB TAMIL FRACTION THREE TWENTIETHS
    = mummaa
11FCC TAMIL FRACTION THREE SIXTEENTHS
    = muuviisam/mummaamukkaani
11FCD TAMIL FRACTION ONE FIFTH
    = naalumaa
    * for one quarter "kaal" use 0BB
    x 0bb5 tamil letter va
11FCE TAMIL FRACTION ONE HALF
    = arai
11FCF TAMIL FRACTION THREE QUARTERS
    = mukkaal
11FD0 TAMIL FRACTION DOWNSCALING FACTOR KIIZH
    * when prefixed to a fraction, reduces its value by a factor of 1/
@ Measures of grain
11FD1 TAMIL SIGN NEL
    * one grain of paddy
11FD2 TAMIL SIGN SUVADU
    * equals 360 nel
11FD3 TAMIL SIGN AAZHAAKKU
    * equals 5 suvadu
    * for the measure uzhakku which equals 2 aazhaakku, use 11FCF
    x 11fcf tamil fraction three quarters
11FD4 TAMIL SIGN URI
    * equals 2 uzhakku
11FD5 TAMIL SIGN MUUVUZHAKKU
    * equals 3 uzhakku
    * for the measure naazhi/padi which equals 2 uri or 4 uzhakku, use 0BF
    x (tamil day sign - 0BF3)
    * for the measure kuruni/marakkaal which equals 8 naazhi/padi, use 0B
    x (tamil letter nga - 0B99)
11FD6 TAMIL SIGN PADAKKU
    * equals 2 kuruni/marakkaal
11FD7 TAMIL SIGN MUKKURUNI
    * equals 3 kuruni
    * for the measure tuuni which equals 2 padakku or 4 kuruni, use 0BA
    x (tamil letter ta - 0BA4)
    * for the measure kalam which equals 3 tuuni, use 0BB
    x (tamil letter lla - 0BB3)
@ Old currency symbols
11FD8 TAMIL SIGN PAISAA
    * old paisa comprises 3 pai and equals 1/64 of a rupee
    * new or naya paisa equals 1/100 of a rupee
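The grain measures in the annotations above form a chain of fixed ratios, which can be worked through mechanically. The following Python is a small sketch under that reading; `RELATIONS`, `sizes_in_nel` and `convert` are hypothetical names for this illustration, the unit names are shortened (naazhi for naazhi/padi, kuruni for kuruni/marakkaal), and the ratios are copied from the annotations rather than from any independent source.

```python
from fractions import Fraction

# (unit, multiplier, base unit), in dependency order, as stated in the annotations.
RELATIONS = [
    ("suvadu", 360, "nel"),
    ("aazhaakku", 5, "suvadu"),
    ("uzhakku", 2, "aazhaakku"),
    ("uri", 2, "uzhakku"),
    ("muuvuzhakku", 3, "uzhakku"),
    ("naazhi", 4, "uzhakku"),
    ("kuruni", 8, "naazhi"),
    ("padakku", 2, "kuruni"),
    ("mukkuruni", 3, "kuruni"),
    ("tuuni", 4, "kuruni"),
    ("kalam", 3, "tuuni"),
]


def sizes_in_nel():
    """Express every unit as a whole number of nel (single grains of paddy)."""
    size = {"nel": 1}
    for unit, factor, base in RELATIONS:
        size[unit] = factor * size[base]
    return size


def convert(amount, src, dst):
    """Convert an amount between two units, keeping the result exact."""
    size = sizes_in_nel()
    return Fraction(amount * size[src], size[dst])


if __name__ == "__main__":
    print("1 kalam in nel:", sizes_in_nel()["kalam"])
    print("1 naazhi in uri:", convert(1, "naazhi", "uri"))
    print("1 tuuni in padakku:", convert(1, "tuuni", "padakku"))
```

The alternative statements in the annotations (naazhi as 2 uri or 4 uzhakku, tuuni as 2 padakku or 4 kuruni) agree with each other under these ratios, which is a useful consistency check on the list.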

§4. Code Charts

     0B8  0B9  0BA  0BB  0BC  0BD  0BE  0BF
0    -    ஐ    -    ர    ◌ீ    ௐ    -    ௰
1    -    -    -    ற    ◌ு    -    -    ௱
2    ◌ஂ    ஒ    -    ல    ◌ூ    -    -    ௲
3    ஃ    ஓ    ண    ள    -    -    -    ௳
4    -    ஔ    த    ழ    -    -    -    ௴
5    அ    க    -    வ    -    -    -    ௵
6    ஆ    -    -    ஶ    ◌ெ    -    ௦    ௶
7    இ    -    -    ஷ    ◌ே    ◌ௗ    ௧    ௷
8    ஈ    -    ந    ஸ    ◌ை    -    ௨    ௸
9    உ    ங    ன    ஹ    -    -    ௩    ௹
A    ஊ    ச    ப    -    ◌ொ    -    ௪    ௺
B    -    -    -    -    ◌ோ    -    ௫    □
C    -    ஜ    -    -    ◌ௌ    -    ௬    □
D    -    -    -    -    ◌்    -    ௭    □
E    எ    ஞ    ம    ◌ா    -    -    ௮    □
F    ஏ    ட    ய    ◌ி    -    □    ௯    □

(- = no character; □ = proposed character, glyph not reproducible as text)

Tamil Supplement

11FC 11FD 11FE 11FF

[Code chart of the proposed Tamil Supplement characters, rows 0 to F; the glyphs are not reproducible as text.]

2b. If YES, with whom?

G Balachandran of ICTA Sri Lanka, various Tamil scholars participating in the C-Tamil mailing list (ctamil-at-services.cnrs.fr), some members of INFITT WG02. See §1 of L2/12-231.

2c. If YES, available relevant documents

The matter was largely discussed in person or via email.

3. Information on the user community for the proposed characters (for example: size, demographics, information technology use, or publishing use) is included?

Those who desire to store as digital text old Tamil manuscripts involving these characters, and those who may desire to revive the use of at least some of these characters.

4a. The context of use for the proposed characters (type of use; common or rare)

Rare

4b. Reference

See detailed proposal.

5a. Are the proposed characters in current use by the user community?

Scholars who work with manuscripts will use these characters.

5b. If YES, where?

Largely in research institutions around the world involved with Tamil and some Grantha manuscripts.

6a. After giving due considerations to the principles in the P&P document must the proposed characters be entirely in the BMP?

No

6b. If YES, is a rationale provided?

6c. If YES, reference

7. Should the proposed characters be kept together in a contiguous range (rather than being scattered)?

Yes

8a. Can any of the proposed characters be considered a presentation form of an existing character or character sequence?

No.

8b. If YES, is a rationale for its inclusion provided?

8c. If YES, reference

9a. Can any of the proposed characters be encoded using a composed character sequence of either existing characters or other proposed characters?

No

9b. If YES, is a rationale for its inclusion provided?

9c. If YES, reference

10a. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character?

Some characters are similar (but not identical) to existing Tamil/Grantha letters or ligatures thereof.

10b. If YES, is a rationale for its inclusion provided?

Yes

10c. If YES, reference

They have a consistently distinct shape and meaning.

11a. Does the proposal include use of combining characters and/or use of composite sequences?

No

11b. If YES, is a rationale for such use provided?

11c. If YES, reference

11d. Is a list of composite sequences and their corresponding glyph images (graphic symbols) provided?

12a. Does the proposal contain characters with any special properties such as control function or similar semantics?

No.

12b. If YES, describe in detail (include attachment if necessary)

13a. Does the proposal contain any Ideographic compatibility character(s)?

No

13b. If YES, is the equivalent corresponding unified ideographic character(s) identified?

13c. If YES, reference:

-o-o-o-